Front Genet. 3, 211. eCollection.

Clinical Implications of Human Population Differences in Genome-Wide Rates of Functional Genotypes

Ali Torkamani et al. Front Genet.

Abstract

There have been a number of recent successes in using whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry, or genetic background, of the patient. This is because potential pathogenic variants in a patient's genome must be contrasted with variants in a reference set of genomes from other individuals of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both the individual patient and the individuals whose genomes make up the reference set. We pursued this exploration in two major steps. First, we considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, as inferred from the chimpanzee genome) single nucleotide variants and small indels in 52 whole human genomes sampled from 10 different global populations. We used a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5 to 6.1 million derived variants in total, of which ∼12,000 are likely to have a functional effect (∼5,000 coding and ∼7,000 non-coding). We also found that the rate of functional genotypes, relative to the total number of genotypes in an individual whole genome, differs dramatically between human populations. We then created tables showing how using comparator or reference genome panels composed of individuals who do not share a patient's ancestral background can hamper pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
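The contrast step described above can be sketched as a set operation: a patient's variants that are predicted to be functional and are absent from every genome in the reference panel are kept as candidates. This is a minimal sketch, not the authors' pipeline; it assumes variants are keyed by (chromosome, position, reference allele, alternate allele) tuples and that functional predictions arrive as a precomputed set. The function name `novel_functional_variants` is hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def novel_functional_variants(
    patient: Set[Variant],
    reference_panel: Iterable[Set[Variant]],
    functional: Set[Variant],
) -> Set[Variant]:
    """Return the patient's predicted-functional variants seen in no panel genome."""
    seen_in_panel: Set[Variant] = set()
    for genome in reference_panel:
        seen_in_panel |= genome
    # Keep functional variants, then drop anything observed in the panel.
    return (patient & functional) - seen_in_panel


if __name__ == "__main__":
    patient = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
    functional = {("1", 100, "A", "G"), ("2", 50, "G", "A")}
    panel = [{("1", 100, "A", "G")}, {("1", 200, "C", "T")}]
    print(sorted(novel_functional_variants(patient, panel, functional)))
    # prints [('2', 50, 'G', 'A')]
```

If the panel comes from a different ancestry than the patient, variants that are common in the patient's own population never enter `seen_in_panel` and so inflate the candidate list. That inflation is the effect the study sets out to quantify.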

Keywords: clinical sequencing; congenital disease; population genetics; whole genome sequencing.

Figures

Figure 1
Boxplots reflecting the differences in the number and rates of specific variant types across the 10 populations. (A) Number of loci on individual genomes with at least one non-reference allele (i.e., homozygous or heterozygous non-reference allele genotypes); (B) Number of coding loci on individual genomes with at least one non-reference allele that results in a non-synonymous amino acid substitution predicted to have a functional effect; (C) Number of loci on individual genomes with at least one derived allele (i.e., homozygous or heterozygous derived allele genotypes); (D) Number of coding loci on individual genomes with at least one derived allele that results in a non-synonymous amino acid substitution predicted to have a functional effect.
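Panels A/B and C/D differ only in how an allele is classified: against the human reference sequence, or against the ancestral state inferred from the chimpanzee genome. A minimal sketch of the two counts at a single biallelic locus follows. The function names are hypothetical, and treating a locus as unpolarizable when the chimpanzee allele matches neither human allele is an assumption of this sketch, not a rule stated in the source.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # the two alleles observed at one locus


def count_non_reference(genotype: Genotype, ref: str) -> int:
    """Number of alleles (0, 1 or 2) that differ from the human reference allele."""
    return sum(allele != ref for allele in genotype)


def count_derived(genotype: Genotype, ref: str, alt: str, ancestral: str) -> Optional[int]:
    """Number of derived (non-ancestral) alleles, polarized by the chimpanzee allele.

    Returns None when the chimpanzee allele matches neither human allele,
    i.e. the ancestral state is treated as unknown (an assumption).
    """
    if ancestral not in (ref, alt):
        return None
    return sum(allele != ancestral for allele in genotype)


if __name__ == "__main__":
    # Hypothetical locus: human reference A, alternate G, chimpanzee G.
    for gt in [("A", "A"), ("A", "G"), ("G", "G")]:
        print(gt, count_non_reference(gt, "A"), count_derived(gt, "A", "G", "G"))
    # prints:
    # ('A', 'A') 0 2
    # ('A', 'G') 1 1
    # ('G', 'G') 2 0
```

The example shows why the two panel pairs can disagree. Where the human reference itself carries the derived allele, a genotype that matches the reference is homozygous derived, and a genotype that is homozygous non-reference is ancestral.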
Figure 2
Boxplots reflecting the differences in the number and rates of specific variant types across the 10 populations. (A) Number of loci on individual genomes that are homozygous for a derived allele; (B) Number of coding loci on individual genomes that are homozygous for a derived allele that results in a non-synonymous amino acid substitution predicted to have a functional effect; (C) Number of loci on individual genomes that are homozygous for a derived allele predicted to have a functional effect; (D) Rate of loci on individual genomes that are homozygous for a derived allele predicted to have a functional effect, relative to all loci on individual genomes with at least one derived allele.
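Panel D divides a count by a genome-wide denominator, so genomes that carry more derived variants overall are not automatically scored higher. Below is a minimal sketch of that ratio. It assumes each locus has already been polarized (copies of the derived allele) and annotated with a boolean functional prediction. The `LocusCall` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LocusCall:
    derived_alleles: int  # copies of the derived allele at this locus: 0, 1 or 2
    functional: bool      # derived allele predicted to have a functional effect


def homozygous_derived_functional_rate(calls: Iterable[LocusCall]) -> float:
    """Functional homozygous-derived loci divided by loci carrying >= 1 derived allele."""
    carriers = 0        # denominator: loci with at least one derived allele
    hom_functional = 0  # numerator: homozygous derived and predicted functional
    for call in calls:
        if call.derived_alleles >= 1:
            carriers += 1
            if call.derived_alleles == 2 and call.functional:
                hom_functional += 1
    return hom_functional / carriers if carriers else 0.0


if __name__ == "__main__":
    calls = [
        LocusCall(2, True),   # counted in numerator and denominator
        LocusCall(2, False),  # denominator only
        LocusCall(1, True),   # denominator only (heterozygous)
        LocusCall(0, True),   # no derived allele: ignored
    ]
    print(round(homozygous_derived_functional_rate(calls), 3))  # prints 0.333
```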
Figure 3
Number of non-synonymous coding SNVs (ns cSNVs) with PolyPhen-2 scores > 0.8 that would be declared novel if a European individual's ns cSNVs were compared to a reference panel made up of European, African, or Asian individuals (A), or if an African individual's ns cSNVs were compared to a reference panel made up of European (light dash-dotted line), African (black solid line), or Asian (light dashed line) individuals (B), as a function of the number of individuals in the panel. Standard errors were computed by repeatedly drawing, at random, the number of individuals given on the x axis from our collection of European, African, and Asian genomes.
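The resampling behind these curves can be sketched as follows. For each panel size, draw that many genomes at random from the pool, count the patient's variants absent from all of them, and summarize over replicates. This is a sketch under stated assumptions, not the authors' code. It assumes the patient's variants are already filtered to ns cSNVs with PolyPhen-2 > 0.8, that draws are without replacement, and that the error estimate is the standard deviation across draws. The caption does not specify the replicate count or the exact statistic, so `replicates=100` and the function name `novelty_curve` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def novelty_curve(
    patient_variants: Set[Variant],
    pool: Sequence[Set[Variant]],
    panel_sizes: Sequence[int],
    replicates: int = 100,  # assumption; must be >= 2 for a standard deviation
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """Per panel size: (mean, standard deviation) of patient variants absent from a random panel."""
    rng = random.Random(seed)
    genomes = list(pool)
    curve: Dict[int, Tuple[float, float]] = {}
    for size in panel_sizes:
        counts: List[int] = []
        for _ in range(replicates):
            panel = rng.sample(genomes, size)  # random draw without replacement
            seen = set().union(*panel)         # every variant present in the panel
            counts.append(len(patient_variants - seen))
        curve[size] = (statistics.mean(counts), statistics.stdev(counts))
    return curve
```

Run once per ancestry-specific pool. When the pool matches the patient's ancestry, the mean should fall faster with panel size. When it does not, the mean should level off at a higher count, which is the pattern the two panels compare.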
Figure A1
Multidimensional scaling plot of the similarity of the 54 unrelated individuals with complete genome data. (A) The 54 individuals (black dots) overlaid on 4,123 individuals of known ancestry, based on 16,411 ancestry informative markers. The individuals of known ancestry were obtained from public repositories and are color coded by continent, with shading indicating subpopulations within each continent (Blue, Europeans; Yellow, Yorubans; Purple, East Asians; Red, Native Americans; Green, Central Asians; Grey, African Americans). (B) Multidimensional scaling plot of the similarity of the 54 individuals' complete genomes, without the overlay of other individuals. Color coding for these 54 individuals by known ancestry is given in the inset.
Figure A2
PCA plot of the similarity of the 54 unrelated individuals with complete genome data, based on 19,208,882 SNVs obtained from the complete sequencing data without reference to a panel of individuals of global ancestries (shading by population is given in the inset).
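With a few dozen individuals and millions of SNVs, one inexpensive way to get principal-component scores is to work with the small individual-by-individual Gram matrix of the mean-centered genotype matrix. Its leading eigenvector, scaled by the square root of its eigenvalue, gives each individual's PC1 score. This is a minimal pure-Python sketch. The caption does not say how genotypes were encoded or scaled, so the 0/1/2 non-reference dosage coding, centering without variance scaling, and the function name `pc1_scores` are assumptions. Dedicated tools such as PLINK or EIGENSOFT are the usual choice at genome scale.

```python
import math
import random
from typing import List, Sequence


def pc1_scores(dosages: Sequence[Sequence[int]], iterations: int = 200, seed: int = 0) -> List[float]:
    """PC1 score per individual; dosages[i][j] = non-reference alleles (0-2) of individual i at SNV j."""
    n = len(dosages)
    if n == 0:
        return []
    m = len(dosages[0])
    # Center each SNV (column) on its mean dosage across individuals.
    means = [sum(row[j] for row in dosages) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in dosages]
    # n x n Gram matrix: small when individuals are few and SNVs are many.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k])) for k in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of the (positive semidefinite) Gram matrix.
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return [0.0] * n  # no variation among individuals
        vec = [x / norm for x in nxt]
        eigenvalue = norm  # ||G v|| converges to the leading eigenvalue
    # Scores = eigenvector * sqrt(eigenvalue), i.e. U * S from the SVD of the centered matrix.
    return [x * math.sqrt(eigenvalue) for x in vec]


if __name__ == "__main__":
    # Two hypothetical groups with opposite dosages at four SNVs.
    toy = [[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0], [2, 2, 0, 0]]
    print([round(s, 6) for s in pc1_scores(toy)])
```

The sign of a principal component is arbitrary. Only the separation between groups along PC1 is meaningful.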
Figure A3
Number of ns cSNVs with PolyPhen-2 scores > 0.8 that would be declared novel if an African individual's ns cSNVs were compared to a reference panel made up of European, African Yoruban, or African non-Yoruban individuals, as a function of the number of individuals in the panel. Standard errors were computed by repeatedly drawing, at random, the number of individuals given on the x axis from our collection of European, African Yoruban, and African non-Yoruban individuals.