Am J Hum Genet. 2013 Sep 5;93(3):411-21.
doi: 10.1016/j.ajhg.2013.07.002. Epub 2013 Aug 8.

Mapping the Human Reference Genome's Missing Sequence by Three-Way Admixture in Latino Genomes

Free PMC article

Giulio Genovese et al. Am J Hum Genet. 2013.

Abstract

A principal obstacle to completing maps and analyses of the human genome involves the genome's "inaccessible" regions: sequences (often euchromatic and containing genes) that are isolated from the rest of the euchromatic genome by heterochromatin and other repeat-rich sequence. We describe a way to localize these sequences by using ancestry linkage disequilibrium in populations that derive ancestry from at least three continents, as is the case for Latinos. We used this approach to map the genomic locations of almost 20 megabases of sequence unlocalized or missing from the current human genome reference (NCBI Genome GRCh37), a substantial fraction of the human genome's remaining unmapped sequence. We show that the genomic locations of most sequences that originated from fosmids and larger clones can be admixture mapped in this way, by using publicly available whole-genome sequence data. Genome assembly efforts and future builds of the human genome reference will be strongly informed by this localization of genes and other euchromatic sequences that are embedded within highly repetitive pericentromeric regions.
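
The idea of localizing a sequence through ancestry linkage disequilibrium can be illustrated with a toy LOD score. The following is a minimal sketch and not the model implemented by the authors: it assumes hard-called local ancestry (one label per haplotype) at a candidate locus, hard-called genotypes for a SNP on an unplaced scaffold, and known ancestral allele frequencies. The function names, the ancestry labels used as dictionary keys, and the clamping constant are all hypothetical. The score compares the likelihood that the SNP sits at the candidate locus (genotype depends on local ancestry) with the likelihood that it is unlinked (genotype depends only on genome-wide ancestry proportions).

```python
import math

EPS = 1e-6  # assumption: clamp frequencies so log10 never sees zero


def genotype_prob(g, p1, p2):
    """P(alt-allele count g) for two haplotypes with alt-allele probabilities p1, p2."""
    if g == 0:
        return (1 - p1) * (1 - p2)
    if g == 1:
        return p1 * (1 - p2) + (1 - p1) * p2
    if g == 2:
        return p1 * p2
    raise ValueError("genotype must be 0, 1 or 2")


def lod_score(genotypes, local_ancestry, global_ancestry, freqs):
    """log10 likelihood ratio: SNP linked to the candidate locus vs. unlinked.

    genotypes       -- alt-allele counts (0/1/2), one per sample
    local_ancestry  -- (label, label) pairs at the candidate locus, one per sample
    global_ancestry -- dicts label -> genome-wide proportion, one per sample
    freqs           -- dict label -> alt-allele frequency in that ancestry
    """
    f = {k: min(max(v, EPS), 1 - EPS) for k, v in freqs.items()}
    lod = 0.0
    for g, (a, b), q in zip(genotypes, local_ancestry, global_ancestry):
        linked = genotype_prob(g, f[a], f[b])
        # Under the null, each haplotype carries the ancestry-averaged frequency.
        p_bar = sum(q[k] * f[k] for k in f)
        unlinked = genotype_prob(g, p_bar, p_bar)
        lod += math.log10(linked) - math.log10(unlinked)
    return lod


# Toy data: an allele common in one ancestry and rare in the other two.
freqs = {"EUR": 0.05, "AFR": 0.9, "NAT": 0.05}
uniform = {"EUR": 1 / 3, "AFR": 1 / 3, "NAT": 1 / 3}
ancestry = [("AFR", "AFR"), ("EUR", "EUR"), ("NAT", "EUR"), ("AFR", "EUR")]
concordant = lod_score([2, 0, 0, 1], ancestry, [uniform] * 4, freqs)
discordant = lod_score([0, 2, 2, 1], ancestry, [uniform] * 4, freqs)
```

Genotypes that track local ancestry at the candidate locus push the score up, and genotypes that contradict it push the score down. A real analysis would scan many candidate loci, work from genotype likelihoods rather than hard calls, and estimate the ancestral frequencies instead of assuming them.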

Figures

Figure 1
Ancestry Proportions for Admixed Samples from the 1000 Genomes Project Phase 1. Abbreviations for ancestral populations are as follows: EUR, European; AFR, West African; NAT, Native American; UNK, Unknown. Abbreviations from the 1000 Genomes Project: ASW, African American; CLM, Colombian; MXL, Mexican; PUR, Puerto Rican.
Figure 2
Admixture Mapping Flowchart for the LATOOLS Software Tool Described in This Study. Local ancestry deconvolution for multiple samples can be input in LATOOLS in unionbedg format, which can be easily generated with the bedtools suite starting from single-sample deconvolutions in BedGraph format. Genotype likelihoods can be input from a VCF file, without further processing if directly generated with GATK.
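
To make the unionbedg input format concrete, the sketch below mimics in pure Python what a union of per-sample BedGraph tracks looks like: the genome is cut at every interval boundary seen in any sample, and each resulting segment gets one value column per sample. This is only an illustration of the format, not a replacement for bedtools (whose `unionbedg` subcommand produces this layout directly from sorted BedGraph files). The function name, the sample names, the ancestry encodings in the value column, and the `NA` filler are assumptions made for the example.

```python
def union_bedgraph(samples, filler="NA"):
    """Merge per-sample BedGraph intervals into unionbedg-style rows.

    samples -- dict: sample name -> list of (chrom, start, end, value),
               with non-overlapping intervals within each sample
    Returns (names, rows); each row is (chrom, start, end, value_per_sample...).
    """
    # Collect every interval boundary seen on each chromosome.
    breaks = {}
    for intervals in samples.values():
        for chrom, start, end, _ in intervals:
            breaks.setdefault(chrom, set()).update((start, end))

    names = list(samples)
    rows = []
    for chrom in sorted(breaks):
        points = sorted(breaks[chrom])
        for start, end in zip(points, points[1:]):
            values, covered = [], False
            for name in names:
                value = filler
                for c, s, e, v in samples[name]:
                    if c == chrom and s <= start and end <= e:
                        value, covered = v, True
                        break
                values.append(value)
            if covered:  # skip segments that no sample covers
                rows.append((chrom, start, end, *values))
    return names, rows


# Hypothetical local-ancestry calls for two samples on one chromosome.
calls = {
    "S1": [("chr1", 0, 100, "EUR/EUR"), ("chr1", 100, 200, "EUR/AFR")],
    "S2": [("chr1", 50, 150, "NAT/NAT")],
}
names, rows = union_bedgraph(calls)
for row in rows:
    print("\t".join(str(x) for x in row))
```

Each output row describes a segment over which every sample's local ancestry call is constant, which is a convenient unit for testing an unplaced SNP against the whole cohort at once.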
Figure 3
Ancestral Allele Frequencies Spectrum for Mapped SNPs and Power Estimates for the Mappability of a SNP Given Its Ancestral Allele Frequencies. (A–C) Ancestral frequency estimates for SNPs from unlocalized scaffolds that mapped with a LOD score greater than or equal to 6. (D–F) Probability of obtaining a LOD score greater than or equal to 6 for a SNP that is monomorphic for the reference allele in one ancestral population, with the alternate allele frequencies in the other two populations given on the x and y axes.
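
A power estimate of this kind can be approximated by simulation. The sketch below is a rough Monte Carlo illustration under strong simplifying assumptions and is not the authors' calculation: every sample shares the same ancestry proportions, local ancestry and genotypes are observed without error, the true ancestral frequencies are treated as known, and the LOD score is a simple log10 likelihood ratio of "linked to the locus" versus "unlinked". The function names, ancestry proportions, and default trial count are hypothetical; the sample size of 242 and the threshold of 6 are taken from the captions.

```python
import math
import random


def genotype_prob(g, p1, p2):
    """P(alt-allele count g) for two haplotypes with alt probabilities p1, p2."""
    return [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2][g]


def estimate_power(freqs, proportions, n_samples=242, threshold=6.0,
                   n_trials=200, seed=0):
    """Fraction of simulated cohorts whose LOD score reaches the threshold.

    freqs       -- dict: ancestry label -> alt-allele frequency
    proportions -- dict: ancestry label -> ancestry proportion (assumed
                   identical for all samples, a simplification)
    """
    rng = random.Random(seed)
    labels = list(freqs)
    weights = [proportions[k] for k in labels]
    eps = 1e-6  # clamp so that log10 never receives zero
    f = {k: min(max(v, eps), 1 - eps) for k, v in freqs.items()}
    p_bar = sum(proportions[k] * f[k] for k in labels)

    hits = 0
    for _ in range(n_trials):
        lod = 0.0
        for _ in range(n_samples):
            # Draw the local ancestry of both haplotypes, then their alleles.
            a, b = rng.choices(labels, weights=weights, k=2)
            g = (rng.random() < freqs[a]) + (rng.random() < freqs[b])
            lod += math.log10(genotype_prob(g, f[a], f[b]))
            lod -= math.log10(genotype_prob(g, p_bar, p_bar))
        hits += lod >= threshold
    return hits / n_trials


# SNP monomorphic for the reference allele in one ancestry, common elsewhere.
mix = {"EUR": 0.5, "AFR": 0.25, "NAT": 0.25}
power_diff = estimate_power({"EUR": 0.9, "AFR": 0.0, "NAT": 0.9}, mix, n_trials=50)
# No frequency differences between ancestries: nothing to map with.
power_flat = estimate_power({"EUR": 0.3, "AFR": 0.3, "NAT": 0.3}, mix, n_trials=50)
```

Sweeping the two non-zero frequencies over a grid and plotting the resulting fractions would give a surface comparable in spirit to panels D–F, although the numbers from this simplified model should not be expected to match the published ones.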
Figure 4
Percentage of hs37d5 Unlocalized Contigs that Were Localized in This Study as a Function of Contig Size. Percentage of localized contigs (blue) and localized sequence (green) that were admixture mapped among all unlocalized contigs from the hs37d5 reference larger than a given size, using sequence data from the 242 admixed samples from the 1000 Genomes Project Phase 1.
Figure 5
Regions of Excess Coverage and Mapping for Unlocalized Scaffolds from hs37d5
Figure 6
A Common Structural Polymorphism at 16p11.2. Sequence read coverage for 826 samples from the 1000 Genomes Project Phase 1 within regions 16p11.2, 6p25.3, and 20q11.2. For clarity, 42 samples that were not classified as copy number two over the two mostly unaffected windows chr16: 32,699,009–32,829,564 and chr16: 33,142,816–33,339,320 were excluded, because these may harbor larger and rarer CNVs. Median coverage for samples genotyped as CN = 2, 3, 4 over chr6: 257,000–295,000 is displayed. Notably, a strong correlation emerges between coverage over the genotyped region and sequence within window chr16: 32,258,540–32,659,102. Coordinates in Mbp on the horizontal axis are with respect to GRCh37.

