Reconstructing disease outbreaks from genetic data: a graph approach

Heredity (Edinb). 2011 Feb;106(2):383-90. doi: 10.1038/hdy.2010.78. Epub 2010 Jun 16.


Epidemiology and public health planning will increasingly rely on the analysis of genetic sequence data. In particular, genetic data coupled with dates and locations of sampled isolates can be used to reconstruct the spatiotemporal dynamics of pathogens during outbreaks. Thus far, phylogenetic methods have been used to tackle this issue. Although these approaches have proved useful for informing on the spread of pathogens, they do not aim at directly reconstructing the underlying transmission tree. Instead, phylogenetic models infer most recent common ancestors between pairs of isolates, which can be inadequate for densely sampled recent outbreaks, where the sample includes ancestral and descendent isolates. In this paper, we introduce a novel method based on a graph approach to reconstruct transmission trees directly from genetic data. Using simulated data, we show that our approach can efficiently reconstruct genealogies of isolates in situations where classical phylogenetic approaches fail to do so. We then illustrate our method by analyzing data from the early stages of the swine-origin A/H1N1 influenza pandemic. Using 433 isolates sequenced at both the hemagglutinin and neuraminidase genes, we reconstruct the likely history of the worldwide spread of this new influenza strain. The presented methodology opens new perspectives for the analysis of genetic data in the context of disease outbreaks.
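
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of inferring ancestries directly among sampled isolates rather than among hypothetical common ancestors. It assumes, purely for illustration, that the ancestor of each isolate is the strictly earlier-sampled isolate at the smallest Hamming distance. The function names (`hamming`, `infer_ancestries`), the tie-breaking rule, and the toy sequences are hypothetical and are not taken from the paper.

```python
from datetime import date


def hamming(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def infer_ancestries(isolates):
    """Pick one ancestor per isolate among strictly earlier-sampled isolates.

    isolates: dict mapping isolate name -> (sampling date, aligned sequence).
    Returns a dict mapping isolate name -> ancestor name, or None when no
    earlier isolate exists (such isolates act as roots).

    Assumption (not from the paper): the ancestor is the earlier isolate with
    the smallest Hamming distance; ties go to the earliest date, then to the
    alphabetically first name.
    """
    ancestors = {}
    for name, (day, seq) in isolates.items():
        # Edges only run forward in time, so the graph has no cycles and the
        # cheapest incoming edge can be chosen independently for each isolate.
        candidates = [
            (hamming(seq, other_seq), other_day, other)
            for other, (other_day, other_seq) in isolates.items()
            if other_day < day
        ]
        ancestors[name] = min(candidates)[2] if candidates else None
    return ancestors


# Toy data (invented): four isolates with sampling dates and 4-site sequences.
toy = {
    "A": (date(2009, 4, 1), "AAAA"),
    "B": (date(2009, 4, 5), "AAAT"),
    "C": (date(2009, 4, 9), "AATT"),
    "D": (date(2009, 4, 12), "CAAA"),
}

tree = infer_ancestries(toy)
for child, parent in tree.items():
    print(f"{parent} -> {child}")
```

In this toy example the sampled isolates A and B are themselves inferred as ancestors of later isolates, which is the situation the abstract describes as problematic for methods that place every sample at a tip of a tree. A realistic analysis would need a proper evolutionary distance, a treatment of isolates sampled on the same date, and some measure of uncertainty, none of which this sketch attempts.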

MeSH terms

  • Computer Simulation*
  • Hemagglutinins / genetics
  • Humans
  • Influenza A Virus, H1N1 Subtype / genetics*
  • Influenza A Virus, H1N1 Subtype / isolation & purification
  • Influenza, Human / epidemiology*
  • Influenza, Human / virology
  • Models, Genetic*
  • Neuraminidase / genetics
  • Pandemics
  • Pedigree
  • Phylogeny
  • Poisson Distribution
  • Population Dynamics

Substances

  • Hemagglutinins
  • Neuraminidase