Comparative Study
PLoS Genet. 2018 May 23;14(5):e1007279.
doi: 10.1371/journal.pgen.1007279. eCollection 2018 May.

Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens

Lyndal Henden et al. PLoS Genet. 2018.

Abstract

Identification of genomic regions that are identical by descent (IBD) has proven useful for human genetic studies, where analyses have led to the discovery of familial relatedness and fine-mapping of disease critical regions. However, IBD analyses have been underutilized in the analysis of other organisms, including human pathogens. This is due in part to the lack of statistical methodologies for non-diploid genomes and to the added complexity of multiclonal infections. We have therefore developed an IBD methodology, called isoRelate, for analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection and propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes that are known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Additionally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries. IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a valuable tool for monitoring control and elimination efforts for many diseases.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Power and accuracy of isoRelate to detect IBD in simulated sequencing data for P. falciparum.
The performance results are segregated by the clonal fraction of the related clone in the isolate. Clones that make up the highest proportion of an isolate are referred to as the major clone, while those that make up the smallest proportion are the minor clone. For MOI = 3 isolates, the clone that is neither the major nor the minor clone is referred to as the middle clone.
Fig 2. Power and accuracy of isoRelate given a uniform allele frequency spectrum.
The performance results are segregated by the clonal fraction of the related clone in the isolate. Clones that make up the highest proportion of an isolate are referred to as the major clone, while those that make up the smallest proportion are the minor clone. For MOI = 3 isolates, the clone that is neither the major nor the minor clone is referred to as the middle clone.
Fig 3. Power and accuracy results of isoRelate, iHS and HaploPS in detecting complex sweeps.
The performance results are segregated by sweep type, where the results for selection on standing variation are shown for a selection coefficient of 0.1. Power is defined as the proportion of sweeps (calculated over 10 replicates) with at least one 20 kb bin within 50 kb either side of the selected SNP that either contains three or more significant SNPs (isoRelate and iHS, alpha = 5%) or is in the top 1% of bins with respect to the average number of haplotype counts per bin (haploPS), as a function of the number of generations since the sweep was introduced. Accuracy is calculated as either the proportion of 20 kb bins with at least three significant SNPs (isoRelate and iHS) that are within 50 kb of the selected SNP, or the proportion of 20 kb bins within the top 1% of bins with respect to haplotype counts (haploPS) that are within 50 kb of the selected SNP, as a function of the number of generations since the sweep was introduced. If there are no bins with at least three significant SNPs in any of the 10 replicates, the accuracy is set to NA.
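The power and accuracy definitions in this caption can be made concrete with a small sketch for the significant-SNP case (isoRelate and iHS). It is an illustration under stated assumptions, not the authors' code: the input format (one list of significant SNP positions per replicate), the function names, the rule that a bin counts as "within 50 kb" when it overlaps the window, and the pooling of bins across replicates for accuracy are all assumptions made here.

```python
from collections import Counter

BIN_SIZE = 20_000   # 20 kb bins, as in the caption
WINDOW = 50_000     # 50 kb either side of the selected SNP
MIN_SNPS = 3        # a bin needs at least three significant SNPs


def significant_bins(sig_positions):
    """Start coordinates of 20 kb bins holding at least three significant SNPs."""
    counts = Counter(pos // BIN_SIZE for pos in sig_positions)
    return sorted(b * BIN_SIZE for b, n in counts.items() if n >= MIN_SNPS)


def bin_near(bin_start, selected_pos):
    """Assumption: a bin is 'within 50 kb' if it overlaps the 50 kb window."""
    return (bin_start <= selected_pos + WINDOW
            and bin_start + BIN_SIZE > selected_pos - WINDOW)


def power_and_accuracy(replicates, selected_pos):
    """replicates: one list of significant SNP positions (bp) per replicate."""
    detected = 0      # replicates with at least one qualifying bin near the SNP
    total_bins = 0    # all qualifying bins, pooled over replicates (assumption)
    near_bins = 0     # qualifying bins that lie near the selected SNP
    for sig_positions in replicates:
        bins = significant_bins(sig_positions)
        near = [b for b in bins if bin_near(b, selected_pos)]
        detected += bool(near)
        total_bins += len(bins)
        near_bins += len(near)
    power = detected / len(replicates)
    # None stands in for NA when no replicate has a qualifying bin.
    accuracy = near_bins / total_bins if total_bins else None
    return power, accuracy


# Toy data: replicate 1 has one bin near the selected SNP and one far away;
# replicate 2 has no significant SNPs at all.
reps = [
    [495_000, 498_000, 499_000, 900_100, 900_200, 900_300],
    [],
]
power, accuracy = power_and_accuracy(reps, selected_pos=500_000)
```

In this toy example one of two replicates detects the sweep, and one of the two qualifying bins lies near the selected SNP, so both quantities equal 0.5.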
Fig 4. The proportion of pairs within each country who are IBD at each SNP.
Chromosome boundaries are indicated by grey dashed vertical lines and positive control genes are identified by tick marks on the top x-axis. Countries that are part of the African continent are shades of red and orange while countries in Southeast Asia are shades of blue and Papua New Guinea is pink.
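The quantity plotted here, the proportion of pairs that are IBD at each SNP, can be computed directly from a list of inferred IBD segments. The sketch below is a minimal illustration; the segment tuple format and the function name are hypothetical and do not reflect isoRelate's actual output or API.

```python
def ibd_proportion_per_snp(snp_positions, ibd_segments, n_isolates):
    """Fraction of isolate pairs inferred IBD at each SNP on one chromosome.

    snp_positions: base-pair positions of the SNPs.
    ibd_segments:  (isolate_a, isolate_b, start_bp, end_bp) tuples, one per
                   inferred IBD segment (hypothetical input format).
    n_isolates:    number of isolates sampled in the country.
    """
    n_pairs = n_isolates * (n_isolates - 1) // 2
    proportions = []
    for pos in snp_positions:
        # Use a set of unordered pairs so a pair is counted once per SNP,
        # even if it appears in overlapping segments or in either order.
        pairs_ibd = {
            frozenset((a, b))
            for a, b, start, end in ibd_segments
            if start <= pos <= end
        }
        proportions.append(len(pairs_ibd) / n_pairs)
    return proportions


# Three isolates give three possible pairs.
segments = [
    ("s1", "s2", 100, 500),
    ("s1", "s3", 400, 900),
    ("s2", "s1", 450, 600),  # same pair as the first segment, reversed order
]
props = ibd_proportion_per_snp([50, 450, 800], segments, n_isolates=3)
```

With this toy input no pair is IBD at position 50, two of three pairs are IBD at position 450, and one of three at position 800.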
Fig 5. Relatedness network for pairs of isolates identified as having high proportions of IBD sharing.
Each node identifies a unique isolate and an edge is drawn between two isolates if they share more than 90% of their genome IBD. Isolates with MOI = 1 are represented by circles while isolates with MOI > 1 are represented by squares. There are 264 clusters in this network comprising 805 isolates (out of 2,377 isolates) in total. Isolates that do not share more than 90% of their genome IBD with any other isolate are omitted from the network.
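The network construction described in this caption amounts to thresholding pairwise IBD sharing and taking connected components. The following sketch shows one way to do that; the input dictionary and function name are assumptions for illustration, and the paper's own tooling may build the network differently.

```python
from collections import defaultdict


def relatedness_clusters(pair_ibd_fraction, threshold=0.9):
    """Clusters of isolates linked by pairs sharing more than `threshold` IBD.

    pair_ibd_fraction: dict mapping (isolate_a, isolate_b) to the fraction of
                       the genome inferred IBD (hypothetical input format).
    Isolates with no edge above the threshold are omitted, as in the figure.
    """
    adjacency = defaultdict(set)
    for (a, b), fraction in pair_ibd_fraction.items():
        if fraction > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


pairs = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.92,
    ("C", "D"): 0.40,
    ("E", "F"): 0.99,
    ("A", "G"): 0.90,  # exactly 90% is not "more than 90%", so no edge
}
clusters = relatedness_clusters(pairs)
```

Here isolates A, B and C form one cluster and E and F another, while D and G are dropped because they have no edge above the threshold.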
Fig 6. Selection signals from isoRelate on Pf3k dataset.
-log10(p-values) of XiR, calculated by transforming and normalizing the IBD proportions within each country. Dashed horizontal lines represent a 5% significance threshold. Grey dashed vertical lines indicate chromosome boundaries. Positive control genes are identified by gene symbol and tick marks on the upper x-axis.
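The caption does not spell out the transformation behind XiR, so the sketch below is only a stand-in to show the general shape of such a calculation, not the paper's statistic. It assumes a plain z-score normalization of per-SNP IBD proportions, treats the squared z-score as chi-squared with one degree of freedom, and reports -log10 of the resulting p-value. The function name and these modelling choices are assumptions.

```python
import math
from statistics import mean, pstdev


def neg_log10_pvalues(ibd_proportions):
    """-log10 p-values from z-scored IBD proportions (assumed normalization)."""
    mu = mean(ibd_proportions)
    sigma = pstdev(ibd_proportions)
    if sigma == 0:
        # No variation across SNPs, so nothing stands out.
        return [0.0 for _ in ibd_proportions]
    scores = []
    for p in ibd_proportions:
        z = (p - mu) / sigma
        # For one degree of freedom, P(chi2 > z^2) equals erfc(|z| / sqrt(2)).
        pval = math.erfc(abs(z) / math.sqrt(2))
        scores.append(-math.log10(pval) if pval > 0 else math.inf)
    return scores


# Nine SNPs with background sharing and one SNP with elevated IBD.
proportions = [0.01] * 9 + [0.5]
scores = neg_log10_pvalues(proportions)
threshold = -math.log10(0.05)  # the 5% significance line in the figure
```

In this toy input only the last SNP exceeds the 5% line. Because the test is based on a squared z-score, unusually low IBD would also score highly under this stand-in, which may not match how the published statistic behaves.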
Fig 7. Relatedness network for pairs of isolates inferred IBD over Pfcrt.
Each node identifies a unique isolate and an edge is drawn between two isolates if they were inferred either partially or completely IBD over Pfcrt. Isolates with MOI = 1 are represented by circles while isolates with MOI > 1 are represented by squares. There are 178 clusters in this network comprising 1,563 isolates in total, with the largest cluster containing 1,134 isolates. Isolates that are not IBD over Pfcrt are omitted from the network. (A) Isolates are coloured according to country. (B) Isolates are coloured if they carry the K76T mutation associated with chloroquine resistance.
Fig 8. Relatedness network for pairs of isolates inferred IBD over Pfk13.
Each node identifies a unique isolate and an edge is drawn between two isolates if they were inferred either partially or completely IBD over Pfk13. Isolates with MOI = 1 are represented by circles while isolates with MOI > 1 are represented by squares. There are 242 clusters in this network comprising 1,148 isolates in total, with the largest cluster containing 335 isolates. Isolates that are not IBD over Pfk13 are omitted from the network. (A) Isolates are coloured according to country. (B) Isolates are coloured if they carry the C580Y mutation associated with artemisinin resistance.
Fig 9. Relatedness networks for pairs of isolates inferred IBD over the interval chr6: 1,001,000–1,300,000.
Each node identifies a unique isolate and an edge is drawn between two isolates if they were inferred IBD anywhere over this interval. Isolates with MOI = 1 are represented by circles while isolates with MOI > 1 are represented by squares. There are 93 clusters in this network comprising 1,862 isolates in total, with the largest cluster containing 1,643 isolates. Isolates that are not IBD over this interval are omitted from the network.
Fig 10. Relatedness network for pairs of isolates inferred IBD over the interval chr12: 700,000–1,100,000.
Each node identifies a unique isolate and an edge is drawn between two isolates if they were inferred IBD anywhere over this interval. Isolates with MOI = 1 are represented by circles while isolates with MOI > 1 are represented by squares. There are 149 clusters in this network comprising 1,569 isolates in total, with the largest cluster containing 1,089 isolates. Isolates that are not IBD over this interval are omitted from the network.
Fig 11. Selection signals in Ghana stratified by pairs who are IBD or non-IBD over Pfmdr1.
Pairs that are IBD over Pfmdr1 represent the red signal while pairs that are not IBD over Pfmdr1 represent the black signal. The dashed horizontal line represents a 5% significance threshold and the dashed vertical lines identify the chromosome boundaries.

Grants and funding

This work was supported by a National Health and Medical Research Council (NHMRC) Program Grant (APP1054618) and an NHMRC Senior Research Fellowship (1002098) to MB, and an NHMRC Project Grant (APP1027108) awarded to AB. LH was supported by the John and Patricia Farrant Scholarship and the Australian Postgraduate Award Scholarship. SL was also supported by the Australian Postgraduate Award Scholarship. This work was also supported by Victorian State Government Operational Infrastructure Support and the Australian Government NHMRC IRISS funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.