Sequencing pathogen samples during a communicable disease outbreak is becoming an increasingly common procedure in epidemiologic investigations. Identifying who infected whom sheds considerable light on transmission patterns, high-risk settings and subpopulations, and the effectiveness of infection control. Genomic data shed new light on transmission dynamics and can be used to identify clusters of individuals likely to be linked by direct transmission. However, identification of individual routes of infection via single genome samples typically remains uncertain. We investigated the potential of deep sequence data to provide greater resolution on transmission routes, via the identification of shared genomic variants. We assessed several easily implemented methods to identify transmission routes using both shared variants and genetic distance, demonstrating that shared variants can provide considerable additional information in most scenarios. While shared-variant approaches identify relatively few links in the presence of a small transmission bottleneck, these links are highly accurate. Furthermore, we propose a hybrid approach that also incorporates phylogenetic distance to provide greater resolution. We applied our methods to data collected during the 2014 Ebola outbreak, identifying several likely routes of transmission. Our study highlights the power of data from deep sequencing of pathogens as a component of outbreak investigation and epidemiologic analyses.
Keywords: Ebola virus; epidemics; genomics; infection control; infectious disease outbreaks; molecular epidemiology.
© The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.