Am J Hum Genet. 2007 Sep;81(3):559-75.
doi: 10.1086/519795. Epub 2007 Jul 25.

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

Free PMC article

Shaun Purcell et al. Am J Hum Genet.


Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
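As an illustration of the identity-by-state (IBS) information mentioned above, the following minimal sketch computes the proportion of alleles shared IBS between pairs of individuals and the corresponding distance, 1 minus that proportion. It is not PLINK's implementation. The genotype encoding (minor-allele counts 0, 1, 2, with `None` for a missing call) and the function names `ibs_similarity` and `ibs_distance_matrix` are assumptions made for this example.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared IBS between two individuals.

    g1, g2: equal-length sequences of allele counts (0, 1, or 2) per marker;
    None marks a missing genotype (an assumed encoding for this sketch).
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip markers missing in either individual
        shared += 2 - abs(a - b)  # 0, 1, or 2 alleles shared IBS at this marker
        compared += 2             # two alleles compared per marker
    if compared == 0:
        raise ValueError("no markers genotyped in both individuals")
    return shared / compared


def ibs_distance_matrix(genotypes):
    """Symmetric matrix of pairwise distances, defined here as 1 - IBS similarity."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - ibs_similarity(genotypes[i], genotypes[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Toy data: three individuals typed at four markers (hypothetical values).
toy = [
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [2, 1, 0, None],
]
distances = ibs_distance_matrix(toy)
```

The double loop is quadratic in the number of individuals and linear in the number of markers, which is why a whole-genome tool needs a compact genotype representation; this sketch favors readability over speed.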


Figure A1.
Example transmissions and corresponding IBD states. For two haploid genomes, C1 and C2, the figure illustrates four (of many) possible patterns of transmission and the corresponding IBD states at two positions, U and V. The text describes how consideration of these possible scenarios leads to the specification of transition matrices for IBD state along the chromosome.
Figure 1.
MDS and classification of Asian HapMap individuals. MDS reveals in each panel two clear clusters that correspond to CHB (left) and JPT (right) HapMap populations. The figure’s three panels differ only in the color scheme, which represents classification according to PPC thresholds of 0.01 (A), 0.001 (B), and 0.0001 (C).
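To make the MDS step concrete, the sketch below extracts the first coordinate of classical (Torgerson) multidimensional scaling from a symmetric distance matrix, such as one built from pairwise IBS distances. It is a generic textbook procedure written for this example, not PLINK's code. The function name `classical_mds_1d`, the use of power iteration, and the assumption that the largest-magnitude eigenvalue of the centred matrix is positive are all choices made here.

```python
import math


def classical_mds_1d(dist, iters=200):
    """First classical-MDS coordinate for each item in a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances:
    # B[i][j] = -0.5 * (d_ij^2 - rowmean_i - rowmean_j + grandmean)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B.
    # Assumes the start vector is not orthogonal to that eigenvector.
    v = [float(i + 1) for i in range(n)]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all distances are zero; nothing to scale
        v = [x / norm for x in w]
        eigval = norm  # converges to the magnitude of the leading eigenvalue
    # Coordinates are the eigenvector scaled by the square root of its eigenvalue.
    return [math.sqrt(eigval) * x for x in v]


# Toy check: four points on a line at positions 0, 1, 3, and 10.
positions = [0.0, 1.0, 3.0, 10.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
coords = classical_mds_1d(toy_dist)
```

The sign of the recovered axis is arbitrary, so only differences between coordinates are meaningful. Further dimensions would require deflating the centred matrix or using a full eigendecomposition.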
Figure 2.
Example segment shared IBD between two HapMap CEU offspring individuals and their parents. The main set of plots shows the multipoint estimate of IBD sharing, P(Z=1), for a 25-Mb region of chromosome 9, for pairs of individuals drawn from two families (CEPH1375 and CEPH1341). The region was selected because the two offspring (NA10863 and NA06991) showed sharing in this region, shown in plot a. The three other segments shared between seemingly unrelated individuals are also shown: between the offspring in one family and a parent in the other family (plots b and c) and between those two parents (plot d). The lower-left diagram illustrates the region shared; this extended haplotype spans multiple haplotype blocks and recombination hotspots in the full phase II data. The lower-right diagram depicts the pattern of gene flow for this particular region: a segment of the original common chromosome (dark rectangles) appears in the two families as shown.
Figure 3.
Schema of integration of PLINK, gPLINK, and Haploview. PLINK is the main C/C++ WGAS analytic engine that can run either as a stand-alone tool (from the command line or via shell scripting) or in conjunction with gPLINK, a Java-based graphical user interface (GUI). gPLINK also offers a simple project management framework to track PLINK analyses and facilitates integration with Haploview. It is easy to configure these tools, such that the whole-genome data and PLINK analyses (i.e., the computationally expensive aspects of this process) can reside on a remote server, but all initiation and viewing of results is done locally—for example, on a user’s laptop, connected to the whole-genome data via the Internet, by use of gPLINK’s secure shell networking.
