Nature. 2007 Oct 18;449(7164):851-61.
doi: 10.1038/nature06258.

A Second Generation Human Haplotype Map of Over 3.1 Million SNPs

International HapMap Consortium, Kelly A Frazer, Dennis G Ballinger, David R Cox, David A Hinds, Laura L Stuve, Richard A Gibbs, John W Belmont, Andrew Boudreau, Paul Hardenbol, Suzanne M Leal, Shiran Pasternak, David A Wheeler, Thomas D Willis, Fuli Yu, Huanming Yang, Changqing Zeng, Yang Gao, Haoran Hu, Weitao Hu, Chaohua Li, Wei Lin, Siqi Liu, Hao Pan, Xiaoli Tang, Jian Wang, Wei Wang, Jun Yu, Bo Zhang, Qingrun Zhang, Hongbin Zhao, Hui Zhao, Jun Zhou, Stacey B Gabriel, Rachel Barry, Brendan Blumenstiel, Amy Camargo, Matthew Defelice, Maura Faggart, Mary Goyette, Supriya Gupta, Jamie Moore, Huy Nguyen, Robert C Onofrio, Melissa Parkin, Jessica Roy, Erich Stahl, Ellen Winchester, Liuda Ziaugra, David Altshuler, Yan Shen, Zhijian Yao, Wei Huang, Xun Chu, Yungang He, Li Jin, Yangfan Liu, Yayun Shen, Weiwei Sun, Haifeng Wang, Yi Wang, Ying Wang, Xiaoyan Xiong, Liang Xu, Mary M Y Waye, Stephen K W Tsui, Hong Xue, J Tze-Fei Wong, Luana M Galver, Jian-Bing Fan, Kevin Gunderson, Sarah S Murray, Arnold R Oliphant, Mark S Chee, Alexandre Montpetit, Fanny Chagnon, Vincent Ferretti, Martin Leboeuf, Jean-François Olivier, Michael S Phillips, Stéphanie Roumy, Clémentine Sallée, Andrei Verner, Thomas J Hudson, Pui-Yan Kwok, Dongmei Cai, Daniel C Koboldt, Raymond D Miller, Ludmila Pawlikowska, Patricia Taillon-Miller, Ming Xiao, Lap-Chee Tsui, William Mak, You Qiang Song, Paul K H Tam, Yusuke Nakamura, Takahisa Kawaguchi, Takuya Kitamoto, Takashi Morizono, Atsushi Nagashima, Yozo Ohnishi, Akihiro Sekine, Toshihiro Tanaka, Tatsuhiko Tsunoda, Panos Deloukas, Christine P Bird, Marcos Delgado, Emmanouil T Dermitzakis, Rhian Gwilliam, Sarah Hunt, Jonathan Morrison, Don Powell, Barbara E Stranger, Pamela Whittaker, David R Bentley, Mark J Daly, Paul I W de Bakker, Jeff Barrett, Yves R Chretien, Julian Maller, Steve McCarroll, Nick Patterson, Itsik Pe'er, Alkes Price, Shaun Purcell, Daniel J Richter, Pardis Sabeti, Richa Saxena, Stephen F Schaffner, Pak C Sham, Patrick Varilly, David Altshuler, Lincoln D Stein, Lalitha Krishnan, Albert Vernon Smith, Marcela K Tello-Ruiz, Gudmundur A Thorisson, Aravinda Chakravarti, Peter E Chen, David J Cutler, Carl S Kashuk, Shin Lin, Gonçalo R Abecasis, Weihua Guan, Yun Li, Heather M Munro, Zhaohui Steve Qin, Daryl J Thomas, Gilean McVean, Adam Auton, Leonardo Bottolo, Niall Cardin, Susana Eyheramendy, Colin Freeman, Jonathan Marchini, Simon Myers, Chris Spencer, Matthew Stephens, Peter Donnelly, Lon R Cardon, Geraldine Clarke, David M Evans, Andrew P Morris, Bruce S Weir, Tatsuhiko Tsunoda, James C Mullikin, Stephen T Sherry, Michael Feolo, Andrew Skol, Houcan Zhang, Changqing Zeng, Hui Zhao, Ichiro Matsuda, Yoshimitsu Fukushima, Darryl R Macer, Eiko Suda, Charles N Rotimi, Clement A Adebamowo, Ike Ajayi, Toyin Aniagwu, Patricia A Marshall, Chibuzor Nkwodimmah, Charmaine D M Royal, Mark F Leppert, Missy Dixon, Andy Peiffer, Renzong Qiu, Alastair Kent, Kazuto Kato, Norio Niikawa, Isaac F Adewole, Bartha M Knoppers, Morris W Foster, Ellen Wright Clayton, Jessica Watkin, Richard A Gibbs, John W Belmont, Donna Muzny, Lynne Nazareth, Erica Sodergren, George M Weinstock, David A Wheeler, Imtaz Yakub, Stacey B Gabriel, Robert C Onofrio, Daniel J Richter, Liuda Ziaugra, Bruce W Birren, Mark J Daly, David Altshuler, Richard K Wilson, Lucinda L Fulton, Jane Rogers, John Burton, Nigel P Carter, Christopher M Clee, Mark Griffiths, Matthew C Jones, Kirsten McLay, Robert W Plumb, Mark T Ross, Sarah K Sims, David L Willey, Zhu Chen, Hua Han, Le Kang, Martin Godbout, John C Wallenburg, Paul L'Archevêque, Guy Bellemare, Koji Saeki, Hongguang Wang, Daochang An, Hongbo Fu, Qing Li, Zhen Wang, Renwu Wang, Arthur L Holden, Lisa D Brooks, Jean E McEwen, Mark S Guyer, Vivian Ota Wang, Jane L Peterson, Michael Shi, Jack Spiegel, Lawrence M Sung, Lynn F Zacharia, Francis S Collins, Karen Kennedy, Ruth Jamieson, John Stewart
Free PMC article

International HapMap Consortium et al. Nature. 2007.

Abstract

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
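The "average maximum r²" figures quoted above can be illustrated with a minimal sketch. It assumes phased haplotypes coded 0/1 per SNP and uses the standard pairwise definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)); the function names `r_squared` and `max_r_squared` are hypothetical, and this is not the consortium's actual pipeline.

```python
from typing import Iterable, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise r^2 between two biallelic SNPs on phased haplotypes (alleles coded 0/1)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("both SNPs need the same non-zero number of haplotypes")
    p_a = sum(a) / n                                # frequency of allele 1 at SNP A
    p_b = sum(b) / n                                # frequency of allele 1 at SNP B
    p_ab = sum(x * y for x, y in zip(a, b)) / n     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                            # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                  # a monomorphic SNP carries no LD information
    return d * d / denom


def max_r_squared(target: Sequence[int], typed: Iterable[Sequence[int]]) -> float:
    """Best r^2 between an untyped target SNP and any SNP in a typed set."""
    return max((r_squared(target, t) for t in typed), default=0.0)


# Illustrative (made-up) haplotypes: one untyped SNP and two typed SNPs.
untyped = [0, 0, 1, 1, 0, 1, 0, 1]
typed_panel = [
    [0, 1, 0, 1, 0, 1, 0, 1],   # weakly correlated with the untyped SNP
    [0, 0, 1, 1, 0, 1, 0, 1],   # a perfect proxy
]
best = max_r_squared(untyped, typed_panel)
```

Averaging `max_r_squared` over all common untyped SNPs gives a summary of the kind reported in the abstract; a SNP whose best value stays low against every other SNP is what the abstract calls untaggable.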

Figures

Figure 1. SNP density in the Phase II HapMap
a, SNP density across the genome. Colours indicate the number of polymorphic SNPs per kb in the consensus data set. Gaps in the assembly are shown as white. b, Example of the fine-scale structure of SNP density for a 100-kb region on chromosome 17 showing Perlegen amplicons (black bars), polymorphic Phase I SNPs in the consensus data set (red triangles) and polymorphic Phase II SNPs in the consensus data set (blue triangles). Note the relatively even spacing of Phase I SNPs. c, The distribution of polymorphic SNPs in the consensus Phase II HapMap data (blue line and left-hand axis) around coding regions. Also shown is the density of SNPs in dbSNP release 125 around genes (red line and right-hand axis). Values were calculated separately 5′ from the coding start site (the left dotted line) and 3′ from the coding end site (right dotted line) and were joined at the median midpoint position of the coding unit (central dotted line).
Figure 2. Haplotype structure and recombination rate estimates from the Phase II HapMap
a, Haplotypes from YRI in a 100 kb region around the β-globin (HBB) gene. SNPs typed in Phase I are shown in dark blue. Additional SNPs in the Phase II HapMap are shown in light blue. Only SNPs for which the derived allele can be unambiguously identified by parsimony (by comparison with an outgroup sequence) are shown (89% of SNPs in the region); the derived allele is shown in colour. b, Recombination rates (lines) and the location of hotspots (horizontal blue bars) estimated for the same region from the Phase I (dark blue) and Phase II HapMap (light blue) data. Also shown are the locations of genes within the region (grey bars) and the location of the experimentally verified recombination hotspot at the 5′ end of the HBB gene (black bar).
Figure 3. The extent of recent co-ancestry among HapMap individuals
a, Three pairs of individuals with varying levels of identity-by-descent (IBD) sharing illustrate the continuum between very close and very distant relatedness and its relation to segmental sharing. The three pairs are: high sharing (NA19130 and NA19192 from YRI; previously identified as second-degree relatives (ref. 3)), moderate sharing (NA06994 and NA12892 from CEU) and low sharing (NA12006 and NA12155 from CEU). Along each chromosome, the probability of sharing at least one chromosome IBD is plotted, based on the HMM method described in Supplementary Text 5. Red sections indicate regions called as segments: in general, the proportion of the genome in segments is similar to each pair's estimated global relatedness. b, The extent of homozygosity on each chromosome for each individual in each analysis panel. Excludes segments <106 kb and chromosome X in males. Asterisk, NA12874, length = 107 Mb. YRI, green; CEU, orange; CHB, blue; JPT, magenta.
Figure 4. Properties of untaggable SNPs
a–e, Properties of the genomic regions surrounding untaggable SNPs in terms of: a, the density of polymorphic SNPs within the consensus data set; b, mean minor allele frequency of polymorphic SNPs; c, maximum r² of SNPs to any others in the Phase II data; d, the density of estimated recombination hotspots (defined from hotspot centres); and e, the estimated mean recombination rate. YRI, green; CEU, orange; CHB+JPT, purple.
Figure 5. Recombination rates around genes
a, The recombination rate, density of recombination-hotspot-associated motifs (all motifs with up to 1 bp different from the consensus CCTCCCTNNCCAC) and G+C content around genes. The blue line indicates the mean. For the recombination rate, grey lines indicate the quartiles of the distribution. Values were calculated separately 5′ from the transcription start site (the first dotted line) and 3′ from the transcription end site (third dotted line) and were joined at the median midpoint position of the transcription unit (central dotted line). Note the sharp drop in recombination rate within the transcription unit, the local increase around the transcription start site and the broad decrease away from the 3′ end of genes. These patterns only partly reflect the distribution of G+C content and the hotspot-associated motif, suggesting that additional factors influence recombination rates around genes. b, Recombination rates within genes of different molecular function. The chart shows the increase or decrease for each category compared to the genome average. P values were estimated by permutation of category; numbers of genes are shown in parentheses.
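The motif criterion in panel a (matches within 1 bp of the consensus CCTCCCTNNCCAC) can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions: N is treated as a wildcard that never counts as a mismatch, only the forward strand is scanned, and the helper names `count_mismatches`, `find_motif_hits` and `motif_density_per_kb` are hypothetical rather than taken from the paper.

```python
from typing import List

CONSENSUS = "CCTCCCTNNCCAC"  # hotspot-associated motif quoted in the caption


def count_mismatches(window: str, consensus: str = CONSENSUS) -> int:
    """Number of positions where the window differs from the consensus; N matches any base."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w != c)


def find_motif_hits(seq: str, consensus: str = CONSENSUS, max_mismatches: int = 1) -> List[int]:
    """Start positions (0-based, forward strand only) of windows within max_mismatches of the consensus."""
    seq = seq.upper()
    k = len(consensus)
    return [
        i
        for i in range(len(seq) - k + 1)
        if count_mismatches(seq[i:i + k], consensus) <= max_mismatches
    ]


def motif_density_per_kb(seq: str, max_mismatches: int = 1) -> float:
    """Motif hits per kilobase of sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return len(find_motif_hits(seq, max_mismatches=max_mismatches)) / (len(seq) / 1000)


# Illustrative (made-up) sequence with one motif instance after four padding bases.
example = "AAAA" + "CCTCCCTTTCCAC" + "AAAA"
hits = find_motif_hits(example)
```

A fuller analysis would presumably also scan the reverse complement and bin the hit positions by distance from the transcription start and end sites.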
Figure 6. Properties of non-synonymous and synonymous SNPs
a, The derived allele frequency (DAF) spectrum in each analysis panel for all SNPs (black), synonymous SNPs (green) and non-synonymous SNPs (red). Note the excess of rare variants for coding sequence SNPs but no excess of high-frequency derived variants. b, Enrichment of non-synonymous SNPs among genic SNPs showing high differentiation. For each of ten classes of derived allele frequency (averaged across analysis panels) the fraction of non-synonymous (red) and synonymous (green) variants in that class that show FST > 0.5 is shown. Note the strong enrichment of non-synonymous SNPs among SNPs of moderate to high derived-allele frequency (asterisk, P < 0.05; double asterisk, P < 0.01). c, Lack of enrichment of non-synonymous SNPs among those showing long-range haplotype structure. The integrated extended haplotype homozygosity (iEHH) statistic was calculated for non-synonymous and synonymous SNPs in each analysis panel (YRI, green; CEU, orange; CHB+JPT, purple). For each of ten derived allele frequency classes, the proportion of non-synonymous SNPs among those showing the 5% most extreme statistics (within the allele frequency class) is shown (points). Also shown is the proportion of non-synonymous SNPs among SNPs in the coding sequence for each frequency class (dotted lines). Differences between synonymous and non-synonymous SNPs are tested for using a contingency table test.
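The FST > 0.5 threshold in panel b can be made concrete with a small sketch. The caption does not say which FST estimator was used, so the code below assumes the simple textbook form FST = (H_T − H_S) / H_T with equally weighted panels and no sample-size correction; the function names `fst` and `fraction_highly_differentiated` are hypothetical.

```python
from typing import Iterable, Sequence


def fst(freqs: Sequence[float]) -> float:
    """Simple FST for one biallelic SNP from per-panel allele frequencies.

    Uses FST = (H_T - H_S) / H_T with equal panel weights, where H_T is the
    expected heterozygosity at the mean frequency and H_S is the mean of the
    per-panel expected heterozygosities. An illustrative estimator only.
    """
    if not freqs:
        raise ValueError("at least one panel frequency is required")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SNP is fixed for the same allele in every panel
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def fraction_highly_differentiated(snps: Iterable[Sequence[float]], threshold: float = 0.5) -> float:
    """Fraction of SNPs whose FST exceeds the threshold (0.0 if no SNPs are given)."""
    values = [fst(f) for f in snps]
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


# Illustrative (made-up) derived allele frequencies in three analysis panels.
example_snps = [
    (0.05, 0.95, 0.90),  # strongly differentiated between panels
    (0.50, 0.50, 0.50),  # identical frequencies, FST = 0
    (0.20, 0.50, 0.80),  # moderate differentiation
]
share_high = fraction_highly_differentiated(example_snps)
```

Computing this fraction separately for synonymous and non-synonymous SNPs within each derived-allele-frequency class gives the kind of comparison plotted in panel b.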
