Virus Evol. 2016 Oct 20;2(2):vew031.
doi: 10.1093/ve/vew031. eCollection 2016 Jul.

Impacts and shortcomings of genetic clustering methods for infectious disease outbreaks

Art F Y Poon. Virus Evol.

Abstract

For infectious diseases, a genetic cluster is a group of closely related infections that is usually interpreted as representing a recent outbreak of transmission. Genetic clustering methods are becoming increasingly popular for molecular epidemiology, especially in the context of HIV, where there is now considerable interest in applying these methods to prioritize groups for public health resources such as pre-exposure prophylaxis. To date, genetic clustering has generally been performed with ad hoc algorithms, only some of which have since been encoded and distributed as free software. These algorithms have seldom been validated on simulated data where clusters are known, and their interpretation and similarities are not transparent to users outside of the field. Here, I provide a brief overview of the development and inter-relationships of genetic clustering methods, and an evaluation of six methods on data simulated under an epidemic model in a risk-structured population. The simulation analysis demonstrates that the majority of clustering methods are systematically biased to detect variation in sampling rates among subpopulations, not variation in transmission rates. I discuss these results in the context of previous work and the implications for public health applications of genetic clustering.

Keywords: genetic clustering; infectious diseases; molecular epidemiology; phylodynamics.
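
To make the idea of "a group of closely related infections" concrete, the following is a minimal sketch of one simple family of clustering rules: link any two sequences whose pairwise distance falls at or below a cutoff, then report the connected components. This is an illustration under stated assumptions, not one of the six methods evaluated in the article. The function names (`p_distance`, `threshold_clusters`), the toy sequences, and the cutoff value are all hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def threshold_clusters(seqs, cutoff):
    """Connected components of the graph whose edges join pairs of
    sequences with p-distance <= cutoff. Returns sorted lists of labels."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the component root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Made-up toy alignment, for illustration only.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",  # 1 difference from s1
    "s3": "ACGAACGTAT",  # 1 difference from s2, 2 from s1
    "s4": "TTGTCCGAAG",  # distant from the rest
}
clusters = threshold_clusters(toy, cutoff=0.15)
print(clusters)
```

In this toy example s1 and s3 end up in the same cluster even though their own distance exceeds the cutoff, because each is linked to s2. That chaining behaviour is a property of this particular linkage rule and is one reason different clustering rules can disagree on the same data.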


Figures

Figure 1.
A hierarchical clustering dendrogram of nonparametric genetic clustering methods. This dendrogram was generated from a binary character state matrix that encodes ten different features for nine categories of nonparametric methods. Internal nodes of the dendrogram are labeled with features that distinguish the categories below the node. Each category is annotated with a small number of citations to publications that either describe the method or provide examples of its usage; these are not meant to be exhaustive lists.
Figure 2.
Forward simulation of trees from an SIR model. Branch lengths are in units of the simulation processes. (A) Distribution of mean internal and terminal branch lengths from the two subpopulations under four different transmission and sampling scenarios. Upper and lower case symbols correspond to the majority and minority subpopulations, respectively, where the latter has the potential to form clusters in the tree. Lower numbers of sampled lineages from minority subpopulations under the “control” and faster “transmission” scenarios resulted in greater dispersion in estimates of mean branch lengths. (B) An example tree simulated under a scenario where both transmission and sampling rates are elevated in the minority subpopulation (cyan).
Figure 3.
Receiver operating characteristic (ROC) curves summarizing the performance of six clustering methods on simulated data. An ideal method would reach the extreme upper-left of a plot region, with a zero false positive rate (FPR) and 100% true positive rate (TPR). The FPR = TPR line indicates the expected performance of a random classifier. A reference point (cross) at FPR = 20% and TPR = 80% is drawn in each plot to facilitate comparisons across methods. The methods were evaluated on ten replicate phylogenies generated under three scenarios in which the minority population exhibited: faster sampling rates (dashed, green); faster transmission rates (dotted, blue); or both (solid, red). For methods that use bootstrap support values, ROC curves are displayed for two different support cutoffs (labeled by percentiles to the right of each curve). Results obtained using Cluster Picker with a bootstrap support cutoff of 99% were not qualitatively different from the results under a cutoff of 95%. There was no tuning parameter used for the Gap Procedure method, so the results for each replicate tree were plotted directly on the graph.
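
As a rough illustration of how the points on such a curve can be obtained, the sketch below computes one (FPR, TPR) pair per value of a tuning parameter. It assumes per-tip boolean labels (positive meaning the tip truly belongs to the subpopulation of interest) and a per-tip score where lower values mean "more clustered"; the caption does not specify the article's exact definition of a positive, so these choices, the function names, and the toy numbers are hypothetical.

```python
def roc_point(truth, predicted):
    """Return (FPR, TPR) for two equal-length lists of booleans."""
    pos = sum(truth)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    return fp / neg, tp / pos


def roc_curve(truth, scores, cutoffs):
    """One (FPR, TPR) point per cutoff; a tip is called 'clustered'
    when its score is at or below the cutoff (an assumed convention)."""
    return [roc_point(truth, [s <= c for s in scores]) for c in cutoffs]


# Made-up labels and scores, for illustration only.
truth = [True, True, True, False, False, False, False, False]
scores = [0.01, 0.02, 0.06, 0.015, 0.05, 0.07, 0.08, 0.09]
curve = roc_curve(truth, scores, cutoffs=[0.0, 0.02, 0.05, 0.1])
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A method with no tuning parameter, such as the Gap Procedure as described in the caption, yields a single point per tree rather than a curve, which corresponds to one call to `roc_point` in this sketch.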

Similar articles

  • A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.
    McCloskey RM, Poon AFY. PLoS Comput Biol. 2017 Nov 13;13(11):e1005868. doi: 10.1371/journal.pcbi.1005868. eCollection 2017 Nov. PMID: 29131825 Free PMC article.
  • Biased phylodynamic inferences from analysing clusters of viral sequences.
    Dearlove BL, Xiang F, Frost SDW. Virus Evol. 2017 Aug 3;3(2):vex020. doi: 10.1093/ve/vex020. eCollection 2017 Jul. PMID: 28852573 Free PMC article.
  • Public health in genetic spaces: a statistical framework to optimize cluster-based outbreak detection.
    Chato C, Kalish ML, Poon AFY. Virus Evol. 2020 Mar 13;6(1):veaa011. doi: 10.1093/ve/veaa011. eCollection 2020 Jan. PMID: 32190349 Free PMC article.
  • Genetic Cluster Analysis for HIV Prevention.
    Grabowski MK, Herbeck JT, Poon AFY. Curr HIV/AIDS Rep. 2018 Apr;15(2):182-189. doi: 10.1007/s11904-018-0384-1. PMID: 29460226 Free PMC article. Review.
  • Tuberculosis.
    Bloom BR, Atun R, Cohen T, Dye C, Fraser H, Gomez GB, Knight G, Murray M, Nardell E, Rubin E, Salomon J, Vassall A, Volchenkov G, White R, Wilson D, Yadav P. In: Holmes KK, Bertozzi S, Bloom BR, Jha P, editors. Major Infectious Diseases. 3rd edition. Washington (DC): The International Bank for Reconstruction and Development / The World Bank; 2017 Nov 3. Chapter 11. PMID: 30212088 Free Books & Documents. Review.
