Research Interests
My research develops Bayesian statistical methods for modeling stochastic processes in biological and epidemiological systems. By integrating computational statistics, applied probability, and evolutionary biology, I work to understand how these complex systems evolve over time.
Methodology
- Bayesian Statistics — hierarchical and nonparametric modeling, probabilistic inference, and prior construction.
- Computational Statistics — algorithm development for high-dimensional models, including Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC).
- Applied Probability and Mathematical Biology — continuous-time Markov chains (CTMCs) and diffusion models.
Areas of Application
- Evolutionary Processes — phylogenetics, phylogeography, phylodynamics, and coalescent theory.
- Epidemiology and Infectious Diseases — statistical modeling of epidemic dynamics and viral evolution to uncover the ecological and demographic drivers of transmission.
Projects
Inferring the infinitesimal rates of continuous-time Markov chains (CTMCs) is a central challenge in many scientific domains. This task is hindered by three factors: the number of rates grows quadratically as the CTMC state space expands, the rates are strongly dependent on one another, and information about many transitions is incomplete. We introduce a new Bayesian framework that flexibly models the CTMC rates by incorporating covariates through Gaussian processes (GPs). This approach improves inference by integrating covariate information and sheds light on potential external drivers of the CTMC's stochastic behavior. Unlike previous approaches limited to linear covariate effects, our method captures complex nonlinear relationships, enabling fuller use of covariate information and a more accurate characterization of covariate influence. To perform efficient inference, we employ a scalable Hamiltonian Monte Carlo (HMC) sampler. We address the prohibitive cost of computing the exact likelihood gradient by simulating HMC trajectories with a scalable gradient approximation, reducing the computational complexity from ${\cal O}(K^5)$ to ${\cal O}(K^2)$, where $K$ is the number of CTMC states. Finally, we demonstrate our method on Bayesian phylogeographic inference, a domain where CTMCs are central, and show its effectiveness on both synthetic and real datasets.
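To make the setup concrete, here is a minimal sketch of how a GP over covariates could generate the off-diagonal rates of a CTMC generator $Q$. It assumes a squared-exponential kernel, a single scalar covariate per ordered state pair, and a log link $q_{ij} = \exp(f(x_{ij}))$. The function names (`se_kernel`, `sample_gp_log_rates`, `build_generator`) and the covariate values are hypothetical illustrations, not the framework's actual parameterization or code. The sketch only shows how covariate values feed into the rates and why their number grows as $K(K-1)$.

```python
import numpy as np


def se_kernel(x1, x2, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between two 1-D arrays of covariate values."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return amplitude**2 * np.exp(-0.5 * sq_dist / length_scale**2)


def sample_gp_log_rates(x, rng, jitter=1e-6, **kernel_args):
    """Draw one zero-mean GP prior sample f(x) at the given covariate values."""
    # Small jitter on the diagonal keeps the Cholesky factorization numerically stable.
    cov = se_kernel(x, x, **kernel_args) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(len(x))


def build_generator(K, log_rates):
    """Fill the K*(K-1) off-diagonal entries of Q with exp(log_rates), row by row.

    Diagonal entries make each row sum to zero, as required of a CTMC generator.
    """
    Q = np.zeros((K, K))
    off_diag = ~np.eye(K, dtype=bool)
    # Assumption: log link, q_ij = exp(f(x_ij)), so every rate is positive.
    Q[off_diag] = np.exp(log_rates)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


rng = np.random.default_rng(0)
K = 4
n_rates = K * (K - 1)  # 12 free rates for 4 states; grows quadratically in K
# Hypothetical covariate: one value per ordered state pair (i, j), e.g. a log distance.
x = rng.normal(size=n_rates)
log_rates = sample_gp_log_rates(x, rng, amplitude=1.0, length_scale=0.5)
Q = build_generator(K, log_rates)
print(Q.round(3))
```

Replacing the kernel or adding covariates changes only `se_kernel` and `x`. Because every one of the $K(K-1)$ rates is a separate parameter, the quantities a gradient-based sampler such as HMC must differentiate with respect to grow quickly with $K$.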
Effective population size ($N_e(t)$) is a fundamental parameter in population genetics and phylodynamics that quantifies genetic diversity and reveals demographic history. Coalescent-based methods infer $N_e(t)$ trajectories through time from time-scaled phylogenies reconstructed from molecular sequence data. Understanding the ecological and environmental drivers of population dynamics requires linking $N_e(t)$ to external covariates such as climate or epidemiological variables. Existing approaches typically impose log-linear relationships between covariates and $N_e(t)$, which may fail to capture complex biological processes and can introduce bias when the true relationship is nonlinear. We present a flexible Bayesian framework that integrates covariates into coalescent models with piecewise-constant $N_e(t)$ through a Gaussian process (GP) prior. The GP, a distribution over functions controlled by a kernel with data-driven hyperparameters, naturally accommodates nonlinear covariate effects without restrictive parametric assumptions. This formulation improves estimation of covariate-$N_e(t)$ relationships, mitigates bias when associations are nonlinear, and yields interpretable uncertainty quantification that varies across the covariate space. To balance global covariate-driven patterns with local temporal dynamics, we couple the GP prior with a Gaussian Markov random field that enforces smoothness in $N_e(t)$ trajectories.
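The ingredients of this model can be written down compactly. Below is a minimal sketch assuming isochronous samples (all tips at time zero), a squared-exponential kernel, and a hypothetical additive split $\log N_e = f(x) + u$, where $f$ is a GP over the covariate $x$ and $u$ is a first-order random-walk GMRF over intervals. The coupling used in the actual framework may differ. `coalescent_loglik` is the standard Kingman coalescent log-likelihood with piecewise-constant $N_e(t)$: while $k$ lineages remain, coalescence occurs at rate $\binom{k}{2}/N_e(t)$. Each interval therefore contributes a log-rate term at the event and an integrated-hazard penalty over its length. All function names and numbers are illustrative.

```python
import numpy as np


def coalescent_loglik(coal_times, grid, log_ne):
    """Kingman coalescent log-likelihood with piecewise-constant Ne(t).

    grid: increasing change points with grid[0] == 0; log_ne[m] applies on
    [grid[m], grid[m+1]), and the last value applies beyond grid[-1].
    Assumption: all samples are taken at time 0 (isochronous).
    """
    coal_times = np.sort(np.asarray(coal_times, dtype=float))
    grid = np.asarray(grid, dtype=float)
    log_ne = np.asarray(log_ne, dtype=float)
    k = len(coal_times) + 1  # lineages present at time 0
    loglik, start = 0.0, 0.0
    for end in coal_times:
        rate = k * (k - 1) / 2.0  # number of lineage pairs that can coalesce
        # Integrate rate / Ne(t) over [start, end), splitting at grid change points.
        cuts = np.concatenate(([start], grid[(grid > start) & (grid < end)], [end]))
        for a, b in zip(cuts[:-1], cuts[1:]):
            m = np.searchsorted(grid, a, side="right") - 1
            loglik -= rate * (b - a) * np.exp(-log_ne[m])
        # Log density of a coalescence at time `end`.
        m_end = np.searchsorted(grid, end, side="right") - 1
        loglik += np.log(rate) - log_ne[m_end]
        k -= 1
        start = end
    return loglik


def gp_logpdf(f, x, amplitude, length_scale, jitter=1e-6):
    """Zero-mean GP log density of f at covariate values x (squared-exponential kernel)."""
    cov = amplitude**2 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length_scale**2)
    chol = np.linalg.cholesky(cov + jitter * np.eye(len(x)))
    alpha = np.linalg.solve(chol, f)  # L^{-1} f, so alpha @ alpha = f^T K^{-1} f
    return -0.5 * alpha @ alpha - np.log(np.diag(chol)).sum() - 0.5 * len(x) * np.log(2 * np.pi)


def rw1_logpdf(u, precision):
    """Intrinsic first-order random walk (improper, up to a constant): penalizes jumps between adjacent intervals."""
    d = np.diff(u)
    return 0.5 * len(d) * np.log(precision) - 0.5 * precision * (d @ d)


grid = np.array([0.0, 1.0, 2.0])   # change points of piecewise-constant Ne(t)
x = np.array([0.2, -0.1, 0.7])     # hypothetical covariate value per interval
f = np.array([0.5, 0.4, 0.9])      # covariate-driven (GP) component
u = np.array([0.1, 0.2, -0.2])     # local temporal (GMRF) component
log_ne = f + u                     # assumption: additive coupling
coal_times = [0.5, 1.5, 3.0]       # 4 samples at time 0 -> 3 coalescent events
log_post = (coalescent_loglik(coal_times, grid, log_ne)
            + gp_logpdf(f, x, amplitude=1.0, length_scale=1.0)
            + rw1_logpdf(u, precision=4.0))
print(log_post)
```

With this split, the GP term captures a smooth, possibly nonlinear response to the covariate, while the random-walk term lets $N_e(t)$ deviate locally yet still penalizes abrupt jumps between neighboring intervals.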