Population substructure and control selection in genome-wide association studies
- PMID: 18596976
- PMCID: PMC2432498
- DOI: 10.1371/journal.pone.0002551
Abstract
Determination of the relevance of both demanding classical epidemiologic criteria for control selection and robust handling of population stratification (PS) represents a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies nested in corresponding prospective cohorts, a minor confounding effect due to PS (inflation factor lambda of 1.025 and 1.005) was observed. In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (lambda of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pair-wise r² < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS that identified a smaller set of principal components and achieved a better control of type I error (to lambda of 1.032 and 1.006, respectively) than currently used methods. The overlap between sets of SNPs in the bottom 5% of p-values based on the new test and the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.
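The abstract reports confounding as an inflation factor lambda but does not say how it was computed. The sketch below shows one common convention, given here as an assumption rather than the authors' procedure: divide the median of the observed 1-degree-of-freedom chi-square association statistics by the median expected under the null (about 0.455). The function names `inflation_factor` and `p_to_chi2` are illustrative only.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
# A 1-df chi-square is a squared standard normal, so its median is the
# square of the 75th percentile of N(0, 1), roughly 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value (0 < p < 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(chi2_stats):
    """Genomic inflation factor: observed median statistic over the null median.

    Values near 1 suggest little inflation; values above 1 suggest
    confounding such as population stratification (assumed convention).
    """
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Synthetic, evenly spaced p-values standing in for a null GWAS scan.
    n = 1001
    null_stats = [p_to_chi2((i - 0.5) / n) for i in range(1, n + 1)]
    print(round(inflation_factor(null_stats), 3))
```

Under this convention, a value of 1.090 means the median test statistic is 9% larger than expected under the null, so multiplying every statistic by a constant multiplies lambda by the same constant.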
Grant support
- R01 CA050385/CA/NCI NIH HHS/United States
- P01 CA087969/CA/NCI NIH HHS/United States
- R01 CA067262/CA/NCI NIH HHS/United States
- 5U01CA098233/CA/NCI NIH HHS/United States
- CA87969/CA/NCI NIH HHS/United States
- CA49449/CA/NCI NIH HHS/United States
- U01 CA067262/CA/NCI NIH HHS/United States
- R01 CA065725/CA/NCI NIH HHS/United States
- CA67262/CA/NCI NIH HHS/United States
- Intramural NIH HHS/United States
- U01 CA098233/CA/NCI NIH HHS/United States
- R01 CA049449/CA/NCI NIH HHS/United States
- CA50385/CA/NCI NIH HHS/United States
- N01CO12400/CA/NCI NIH HHS/United States
- U01 CA049449/CA/NCI NIH HHS/United States
- N01-CO-12400/CO/NCI NIH HHS/United States
- CA65725/CA/NCI NIH HHS/United States
