Within-host bacterial diversity hinders accurate reconstruction of transmission networks from genomic distance data

PLoS Comput Biol. 2014 Mar 27;10(3):e1003549. doi: 10.1371/journal.pcbi.1003549. eCollection 2014 Mar.


The prospect of using whole genome sequence data to investigate bacterial disease outbreaks has been keenly anticipated in many quarters, and the large-scale collection and sequencing of isolates from cases is becoming increasingly feasible. While sequence data can provide many important insights into disease spread and pathogen adaptation, it remains unclear how successfully they may be used to estimate individual routes of transmission. Several studies have attempted to reconstruct transmission routes using genomic data; however, these have typically relied upon restrictive assumptions, such as a shared topology of the phylogenetic tree and a lack of within-host diversity. In this study, we investigated the potential for bacterial genomic data to inform transmission network reconstruction. We used simulation models to investigate the origins, persistence and onward transmission of genetic diversity, and examined the impact of such diversity on our estimation of the epidemiological relationship between carriers. We used a flexible distance-based metric to provide a weighted transmission network, and used receiver-operating characteristic (ROC) curves and network entropy to assess the accuracy and uncertainty of the inferred structure. Our results suggest that sequencing a single isolate from each case is inadequate in the presence of within-host diversity, and is likely to result in misleading interpretations of transmission dynamics. Under many plausible conditions, this may be little better than selecting transmission links at random. Sampling more frequently improves accuracy, but much uncertainty remains, even if all genotypes are observed. While it is possible to discriminate between clusters of carriers, individual transmission routes cannot be resolved by sequence data alone. Our study demonstrates that bacterial genomic distance data alone provide only limited information on person-to-person transmission dynamics.
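The evaluation pipeline described above (a distance-based weighted network, scored by ROC analysis and entropy) can be illustrated with a minimal sketch. The abstract does not specify the distance metric or the entropy definition, so everything below is an assumption made for illustration: the exponential-decay weighting, the `alpha` parameter, the per-case Shannon entropy, and the function names `edge_weights`, `source_entropy` and `auc` are hypothetical and are not taken from the study.

```python
import math


def edge_weights(dist, alpha=1.0):
    """Return w[j][i], the normalised weight that case i infected case j.

    ASSUMPTION: weight decays exponentially with pairwise genetic distance.
    This is an illustrative choice, not the metric used in the study.
    """
    n = len(dist)
    weights = []
    for j in range(n):
        # A case cannot infect itself, so its own weight is fixed at zero.
        raw = [0.0 if i == j else math.exp(-alpha * dist[i][j]) for i in range(n)]
        total = sum(raw)
        weights.append([r / total for r in raw])
    return weights


def source_entropy(row):
    """Shannon entropy (in nats) of one case's distribution over candidate sources."""
    return -sum(p * math.log(p) for p in row if p > 0)


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a true
    link scores higher than a false one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: four cases with made-up pairwise SNP distances
# and an assumed true transmission chain 0 -> 1 -> 2 -> 3.
dist = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
true_links = {(0, 1), (1, 2), (2, 3)}  # (source, recipient) pairs

weights = edge_weights(dist)
pairs = [(i, j) for i in range(4) for j in range(4) if i != j]
scores = [weights[j][i] for i, j in pairs]
labels = [(i, j) in true_links for i, j in pairs]

print("AUC:", auc(scores, labels))
print("Per-case entropy:", [source_entropy(row) for row in weights])
```

In this sketch an AUC near 0.5 would correspond to selecting links at random, and a higher per-case entropy indicates greater uncertainty about that case's source. Because the distances here are symmetric, they carry no information about the direction of transmission, which is one reason a distance-only score cannot separate a true link from its reverse.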

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Area Under Curve
  • Bacteria / genetics*
  • Bacterial Infections / epidemiology
  • Bacterial Infections / transmission
  • Computational Biology
  • Computer Simulation
  • Disease Outbreaks*
  • Entropy
  • Epidemics
  • Genetic Variation*
  • Genomics
  • Genotype
  • Humans
  • Mutation
  • Phylogeny
  • Polymorphism, Single Nucleotide
  • Population Density
  • Population Dynamics
  • ROC Curve
  • Staphylococcus aureus / physiology
  • Stochastic Processes