Bayesian inference of infectious disease transmission from whole-genome sequence data

Mol Biol Evol. 2014 Jul;31(7):1869-79. doi: 10.1093/molbev/msu121. Epub 2014 Apr 8.


Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered: how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely sampled outbreaks from genomic data while considering within-host diversity. We infer a time-labeled phylogeny using Bayesian evolutionary analysis by sampling trees (BEAST), and then infer a transmission network via Markov chain Monte Carlo (MCMC). We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.
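To make the idea of "inferring a transmission network via MCMC" concrete, the following is a deliberately simplified toy sketch, not the authors' method. It assumes known, distinct infection times, a hypothetical pairwise SNP distance matrix, and a hypothetical Poisson model in which the SNP distance between an infector and an infectee has mean `MU * (t_infectee - t_infector)`. Unlike the approach described above, it ignores within-host diversity and does not use a BEAST time-labeled phylogeny. A Metropolis-Hastings sampler reassigns one case's infector at a time and reports posterior probabilities for each who-infected-whom link; all names and data are illustrative assumptions.

```python
import math
import random
from collections import Counter

# Hypothetical toy data (not from the paper): infection times in days
# (assumed distinct) and pairwise SNP distances between sampled genomes.
TIMES = [0.0, 20.0, 35.0, 50.0]
SNPS = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 3],
    [3, 2, 3, 0],
]
MU = 0.05  # hypothetical substitutions per genome per day


def log_poisson(k, lam):
    """Log Poisson probability of k events with mean lam (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def candidates(j, times):
    """Possible infectors of case j: cases infected strictly earlier."""
    return [i for i in range(len(times)) if times[i] < times[j]]


def log_lik(infectors, times, snps, mu):
    """Toy log-likelihood of a transmission network (infector per case)."""
    total = 0.0
    for j, i in enumerate(infectors):
        if i is None:  # the index case has no infector
            continue
        total += log_poisson(snps[i][j], mu * (times[j] - times[i]))
    return total


def mcmc(times, snps, mu, n_iter=20000, seed=1):
    """Metropolis-Hastings over infector assignments; returns link frequencies."""
    rng = random.Random(seed)
    # Initial state: each case infected by the case immediately before it.
    order = sorted(range(len(times)), key=lambda k: times[k])
    infectors = [None] * len(times)
    for prev, j in zip(order, order[1:]):
        infectors[j] = prev
    movable = order[1:]  # every case except the earliest (index) case
    current = log_lik(infectors, times, snps, mu)
    counts = Counter()
    for _ in range(n_iter):
        # Propose a new infector for one randomly chosen case. The candidate
        # set depends only on the fixed times, so the proposal is symmetric.
        j = rng.choice(movable)
        proposal = list(infectors)
        proposal[j] = rng.choice(candidates(j, times))
        proposed = log_lik(proposal, times, snps, mu)
        # Flat prior + symmetric proposal: accept with the likelihood ratio.
        if math.log(1.0 - rng.random()) < proposed - current:
            infectors, current = proposal, proposed
        for k in movable:
            counts[(infectors[k], k)] += 1
    return {pair: c / n_iter for pair, c in counts.items()}


if __name__ == "__main__":
    for (src, dst), p in sorted(mcmc(TIMES, SNPS, MU).items()):
        print(f"case {src} -> case {dst}: {p:.2f}")
```

Even in this toy, case 2 has two plausible infectors with comparable posterior support, which loosely mirrors the abstract's point that reconstructions can remain uncertain. A realistic model would replace the pairwise Poisson term with a likelihood over a time-labeled phylogeny that accounts for within-host evolution.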

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Communicable Diseases / microbiology
  • Communicable Diseases / transmission*
  • Computational Biology / methods*
  • Computer Simulation
  • Disease Outbreaks
  • Genome, Microbial*
  • Humans
  • Markov Chains
  • Mutation Rate
  • Phylogeny