Nat Genet, 50 (8), 1112-1121

Gene Discovery and Polygenic Prediction From a Genome-Wide Association Study of Educational Attainment in 1.1 Million Individuals

James J Lee (1), Robbee Wedow (2, 3, 4), Aysu Okbay (5, 6), Edward Kong (7), Omeed Maghzian (7), Meghan Zacher (8), Tuan Anh Nguyen-Viet (9), Peter Bowers (7), Julia Sidorenko (10, 11), Richard Karlsson Linnér (12, 13, 14), Mark Alan Fontana (9, 15), Tushar Kundu (9), Chanwook Lee (7), Hui Li (7), Ruoxi Li (9), Rebecca Royer (9), Pascal N Timshel (16, 17), Raymond K Walters (18, 19), Emily A Willoughby (1), Loïc Yengo (10), 23andMe Research Team, COGENT (Cognitive Genomics Consortium), Social Science Genetic Association Consortium, Maris Alver (11), Yanchun Bao (20), David W Clark (21), Felix R Day (22), Nicholas A Furlotte (23), Peter K Joshi (21, 24), Kathryn E Kemper (10), Aaron Kleinman (23), Claudia Langenberg (22), Reedik Mägi (11), Joey W Trampush (25, 26), Shefali Setia Verma (27), Yang Wu (10), Max Lam (28, 29), Jing Hua Zhao (22), Zhili Zheng (10, 30), Jason D Boardman (2, 3, 4), Harry Campbell (21), Jeremy Freese (31), Kathleen Mullan Harris (32, 33), Caroline Hayward (34), Pamela Herd (20, 35), Meena Kumari (20), Todd Lencz (36, 37, 38), Jian'an Luan (22), Anil K Malhotra (36, 37, 38), Andres Metspalu (11, 39), Lili Milani (11), Ken K Ong (22), John R B Perry (22), David J Porteous (40), Marylyn D Ritchie (27), Melissa C Smart (21), Blair H Smith (41, 42), Joyce Y Tung (23), Nicholas J Wareham (22), James F Wilson (21, 34), Jonathan P Beauchamp (43), Dalton C Conley (44), Tõnu Esko (11), Steven F Lehrer (45, 46, 47), Patrik K E Magnusson (48), Sven Oskarsson (49), Tune H Pers (16, 17), Matthew R Robinson (10, 50), Kevin Thom (51), Chelsea Watson (9), Christopher F Chabris (52), Michelle N Meyer (53), David I Laibson (7), Jian Yang (10, 54), Magnus Johannesson (55), Philipp D Koellinger (12, 13, 14), Patrick Turley (18, 19), Peter M Visscher (56, 57), Daniel J Benjamin (58, 59, 60), David Cesarini (47, 51, 61)


James J Lee et al. Nat Genet.


Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identified 1,271 independent genome-wide-significant SNPs. Considering the SNPs jointly, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identified 10 independent genome-wide-significant SNPs and estimated a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generated polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as research tools.
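A polygenic score (PGS) is, at its simplest, a weighted sum of an individual's effect-allele counts, with one weight per SNP taken from association summary statistics (GWAS or, for the joint analysis, MTAG). The sketch below is a minimal illustration under stated assumptions: it uses raw per-SNP weights, whereas real pipelines usually adjust weights for linkage disequilibrium, and the abstract does not specify the weighting scheme. The function names, toy dosages, and weights are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual: the weighted sum of effect-allele counts.

    dosages: (n_individuals, n_snps) array of effect-allele counts (0, 1, 2 or imputed dosages)
    weights: (n_snps,) per-SNP weights, e.g. GWAS or MTAG effect estimates, assumed to be
             aligned to the same effect allele as the dosages (hypothetical inputs)
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_snps) and match len(weights)")
    return dosages @ weights


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 within the prediction sample."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()


# Toy example: 3 individuals x 4 SNPs with hypothetical weights
toy_dosages = [[0, 1, 2, 1],
               [2, 2, 0, 0],
               [1, 0, 1, 2]]
toy_weights = [0.02, -0.01, 0.03, 0.015]
raw = polygenic_score(toy_dosages, toy_weights)  # approximately [0.065, 0.02, 0.08]
z = standardize(raw)
```

Standardizing the score within the prediction sample is a common convenience so that regression coefficients on the PGS are expressed per standard deviation of the score.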

Conflict of interest statement

COMPETING FINANCIAL INTERESTS: Anil Malhotra is a consultant to Genomind Inc., Informed DNA, Concert Pharmaceuticals, and Biogen. Nicholas A. Furlotte, Aaron Kleinman, and Joyce Tung are employees of 23andMe, Inc.


Fig. 1. Manhattan Plot for GWAS of EduYears (N = 1,131,881).
P values and the mean χ² shown in the figure are based on inflation-adjusted test statistics. The x-axis is chromosomal position, and the y-axis is significance on a −log10(P) scale. The dashed line marks the threshold for genome-wide significance (P = 5×10⁻⁸).
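The quantities in this legend can be sketched in a few lines. The legend does not state how the inflation adjustment was done, so this sketch assumes the simplest form: divide each 1-df χ² statistic (the squared z-statistic) by a single inflation factor, such as an LD score regression intercept, and recompute P. The function names and the inflation value are hypothetical.

```python
import math

GENOME_WIDE_P = 5e-8  # threshold marked by the dashed line in Fig. 1


def p_from_chi2_1df(chi2):
    """Upper-tail P value of a 1-df chi-square statistic: P = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def adjusted_chi2(z, inflation):
    """Square a GWAS z-statistic and divide by an assumed inflation factor (>= 1)."""
    return (z * z) / inflation


def manhattan_points(z_scores, inflation=1.0):
    """Return ([(-log10 P, is_genome_wide_significant), ...], mean adjusted chi2)."""
    chi2s = [adjusted_chi2(z, inflation) for z in z_scores]
    points = []
    for chi2 in chi2s:
        # Floor P at 1e-300 so extreme statistics do not underflow to 0 before the log
        p = max(p_from_chi2_1df(chi2), 1e-300)
        points.append((-math.log10(p), p < GENOME_WIDE_P))
    mean_chi2 = sum(chi2s) / len(chi2s)
    return points, mean_chi2


points, mean_chi2 = manhattan_points([1.0, 6.0, -7.0], inflation=1.0)
points_adj, _ = manhattan_points([6.0], inflation=1.5)  # same SNP after a hypothetical adjustment
```

Dividing by an inflation factor greater than 1 shrinks every statistic, so a SNP at z = 6 that passes the 5×10⁻⁸ threshold unadjusted (P ≈ 2×10⁻⁹) no longer does once its χ² of 36 is divided by 1.5.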
Fig. 2. Sign Concordance in Within-Family Association Analyses.
The set of LD-pruned SNPs is limited to SNPs with (a) P < 5×10⁻³, (b) P < 5×10⁻⁵, or (c) P < 5×10⁻⁸. Each panel compares the observed sign concordance between within-family and GWAS estimates to the distributions expected (i) by chance alone (pink); (ii) according to a Bayesian framework that adjusts the GWAS estimates for bias due to winner’s curse (green); and (iii) according to the same framework with an additional adjustment for bias due to assortative mating (blue). These results are based on a GWAS sample size of 1,070,751 individuals and a within-family sample of 22,135 sibling pairs (44,270 individuals).
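The "chance alone" benchmark (pink) can be illustrated as a binomial test: if GWAS and within-family estimates were unrelated, each SNP's signs would agree with probability 0.5. The sketch below covers only that null; it does not reproduce the Bayesian winner's-curse or assortative-mating expectations (green and blue), and it assumes LD pruning and P-value filtering have already been applied so that SNPs are approximately independent. Function names and toy estimates are hypothetical.

```python
import math


def sign_concordance(gwas_betas, within_family_betas):
    """Count SNPs whose GWAS and within-family estimates share a sign.

    Pairs in which either estimate is exactly zero carry no sign and are dropped.
    Returns (number_concordant, number_compared).
    """
    pairs = [(g, w) for g, w in zip(gwas_betas, within_family_betas) if g != 0 and w != 0]
    concordant = sum(1 for g, w in pairs if (g > 0) == (w > 0))
    return concordant, len(pairs)


def chance_p_value(concordant, n):
    """One-sided P(X >= concordant) for X ~ Binomial(n, 0.5), the 'chance alone' null.

    Exact integer arithmetic avoids float overflow in the binomial coefficients
    even when thousands of SNPs are compared.
    """
    tail = sum(math.comb(n, k) for k in range(concordant, n + 1))
    return tail / 2 ** n


# Hypothetical estimates for four LD-pruned SNPs
k, n = sign_concordance([0.1, -0.2, 0.3, 0.05], [0.02, -0.1, -0.01, 0.2])  # 3 of 4 agree
p_chance = chance_p_value(k, n)
```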
Fig. 3. Tissue-specific expression of genes in DEPICT-defined loci.
(a) We took microarray measurements from the Gene Expression Omnibus and determined whether the genes overlapping EduYears-associated loci (as defined by DEPICT) are significantly overexpressed (relative to genes in random sets of loci) in each of 180 tissues/cell types. These types are grouped in the figure by Medical Subject Headings (MeSH) first-level term. The y-axis is the one-sided P value from DEPICT on a −log10 scale. The 28 dark bars correspond to tissues/cell types in which the genes are significantly overexpressed (FDR < 0.01), including all 22 classified as part of the central nervous system (see Supplementary Table 6 for identifiers of all tissues/cell types). (b) Whereas genes prioritized by DEPICT in a previous analysis based on a smaller sample tend to be more strongly expressed in the brain prenatally (red curve), the 1,703 newly prioritized genes show a flat trajectory of expression across development (blue curve). Both groups of DEPICT-prioritized genes show elevated levels of expression relative to protein-coding genes that are not prioritized (gray curve). Analyses were based on RNA-seq data from the BrainSpan Developmental Transcriptome. These results are based on the full GWAS sample of 1,131,881 individuals. Error bars represent 95% confidence intervals.
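To show what an FDR cutoff such as "FDR < 0.01 across 180 tissues/cell types" does, here is a generic Benjamini–Hochberg step-up procedure. This is a swapped-in, textbook technique for illustration only; DEPICT reports its own FDR estimates, which are not necessarily computed this way. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return booleans marking which tests are significant at the given FDR.

    Step-up rule: sort the m P values, find the largest rank k with
    p_(k) <= k / m * fdr, and declare the k smallest P values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank  # keep the largest passing rank (step-up)
    significant = [False] * m
    for idx in order[:cutoff]:
        significant[idx] = True
    return significant


# Hypothetical one-sided P values for four tissues, in arbitrary order
flags = benjamini_hochberg([0.039, 0.001, 0.205, 0.008], fdr=0.05)
```

Because the rule is step-up, a P value that misses its own rank threshold can still be declared significant if a larger P value further down the sorted list passes its threshold.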
Fig. 4. Prediction Accuracy.
(a) Mean prevalence of college completion by EduYears PGS quintile. Error bars show the 95% confidence interval for the mean. (b) Incremental R² of the EduYears PGS compared to that of other variables. (c) Incremental R² of the PGS for EduYears and Cognitive Performance constructed from the respective GWAS or MTAG summary statistics. Error bars for the R² values show bootstrapped 95% confidence intervals based on 1,000 iterations each. Sample sizes are N = 4,775 for Add Health and N = 8,609 for HRS.
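Incremental R² is the gain in variance explained when the PGS is added to a regression that already contains baseline covariates: R²(covariates + PGS) − R²(covariates). The sketch below computes it by ordinary least squares, forms a bootstrap interval by resampling individuals (1,000 iterations by default, matching the legend), and tabulates outcome prevalence by PGS quintile as in panel (a). The legend does not name the baseline covariates or the bootstrap interval method, so the percentile method, the two simulated covariates, and all data here are hypothetical; this is not Add Health or HRS data.

```python
import numpy as np


def r_squared(y, X):
    """R^2 from an OLS regression of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def incremental_r2(y, covariates, pgs):
    """Gain in R^2 from adding the PGS to a baseline covariate model."""
    return r_squared(y, np.column_stack([covariates, pgs])) - r_squared(y, covariates)


def bootstrap_ci(y, covariates, pgs, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for incremental R^2, resampling individuals with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = incremental_r2(y[idx], covariates[idx], pgs[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi


def prevalence_by_quintile(pgs, outcome):
    """Mean of a 0/1 outcome (e.g. college completion) within each PGS quintile."""
    edges = np.quantile(pgs, [0.2, 0.4, 0.6, 0.8])
    quintile = np.digitize(pgs, edges)  # 0 = lowest quintile, 4 = highest
    return [outcome[quintile == q].mean() for q in range(5)]


# Simulated toy data with two hypothetical baseline covariates
rng = np.random.default_rng(1)
n = 2000
covariates = rng.normal(size=(n, 2))
pgs = rng.normal(size=n)
years = 13 + 0.5 * covariates[:, 0] + 0.8 * pgs + rng.normal(scale=2.0, size=n)
college = (years > 15).astype(float)

delta = incremental_r2(years, covariates, pgs)
lo, hi = bootstrap_ci(years, covariates, pgs, n_boot=200)  # fewer draws for a quick run
by_quintile = prevalence_by_quintile(pgs, college)
```

Resampling whole individuals (rows) keeps each person's outcome, covariates, and score together, so the interval reflects sampling variability in the prediction sample rather than uncertainty in the GWAS weights.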

Comment in

  • Lessons from 1 million genomes.
    Trenkmann M. Nat Rev Genet. 2018 Oct;19(10):592-593. doi: 10.1038/s41576-018-0047-5. PMID: 30093724.

Similar articles

  • GWAS of 126,559 individuals identifies genetic variants associated with educational attainment.
    Rietveld CA, Medland SE, Derringer J, Yang J, Esko T, Martin NW, Westra HJ, Shakhbazov K, Abdellaoui A, Agrawal A, Albrecht E, Alizadeh BZ, Amin N, Barnard J, Baumeister SE, Benke KS, Bielak LF, Boatman JA, Boyle PA, Davies G, de Leeuw C, Eklund N, Evans DS, Ferhmann R, Fischer K, Gieger C, Gjessing HK, Hägg S, Harris JR, Hayward C, Holzapfel C, Ibrahim-Verbaas CA, Ingelsson E, Jacobsson B, Joshi PK, Jugessur A, Kaakinen M, Kanoni S, Karjalainen J, Kolcic I, Kristiansson K, Kutalik Z, Lahti J, Lee SH, Lin P, Lind PA, Liu Y, Lohman K, Loitfelder M, McMahon G, Vidal PM, Meirelles O, Milani L, Myhre R, Nuotio ML, Oldmeadow CJ, Petrovic KE, Peyrot WJ, Polasek O, Quaye L, Reinmaa E, Rice JP, Rizzi TS, Schmidt H, Schmidt R, Smith AV, Smith JA, Tanaka T, Terracciano A, van der Loos MJ, Vitart V, Völzke H, Wellmann J, Yu L, Zhao W, Allik J, Attia JR, Bandinelli S, Bastardot F, Beauchamp J, Bennett DA, Berger K, Bierut LJ, Boomsma DI, Bültmann U, Campbell H, Chabris CF, Cherkas L, Chung MK, Cucca F, de Andrade M, De Jager PL, De Neve JE, Deary IJ, Dedoussis GV, Deloukas P, Dimitriou M, Eiríksdóttir G, Elderson MF, Eriksson JG, Evans DM, Faul JD, Ferrucci L, Garcia ME, Grönberg H, Guðnason V, Hall P, Harris JM, Harris TB, Hastie ND, Heath AC, Hernandez DG, Hoffmann W, Hofman A, Holle R, Holliday EG, Hottenga JJ, Iacono WG, Illig T, Järvelin MR, Kähönen M, Kaprio J, Kirkpatrick RM, Kowgier M, Latvala A, Launer LJ, Lawlor DA, Lehtimäki T, Li J, Lichtenstein P, Lichtner P, Liewald DC, Madden PA, Magnusson PK, Mäkinen TE, Masala M, McGue M, Metspalu A, Mielck A, Miller MB, Montgomery GW, Mukherjee S, Nyholt DR, Oostra BA, Palmer LJ, Palotie A, Penninx BW, Perola M, Peyser PA, Preisig M, Räikkönen K, Raitakari OT, Realo A, Ring SM, Ripatti S, Rivadeneira F, Rudan I, Rustichini A, Salomaa V, Sarin AP, Schlessinger D, Scott RJ, Snieder H, St Pourcain B, Starr JM, Sul JH, Surakka I, Svento R, Teumer A; LifeLines Cohort Study, Tiemeier H, van Rooij FJ, Van Wagoner DR, Vartiainen E, Viikari J, Vollenweider P, Vonk JM, Waeber G, Weir DR, Wichmann HE, Widen E, Willemsen G, Wilson JF, Wright AF, Conley D, Davey-Smith G, Franke L, Groenen PJ, Hofman A, Johannesson M, Kardia SL, Krueger RF, Laibson D, Martin NG, Meyer MN, Posthuma D, Thurik AR, Timpson NJ, Uitterlinden AG, van Duijn CM, Visscher PM, Benjamin DJ, Cesarini D, Koellinger PD. Science. 2013 Jun 21;340(6139):1467-71. doi: 10.1126/science.1235488. Epub 2013 May 30. PMID: 23722424. Free PMC article.
  • Educational attainment: a genome wide association study in 9538 Australians.
    Martin NW, Medland SE, Verweij KJ, Lee SH, Nyholt DR, Madden PA, Heath AC, Montgomery GW, Wright MJ, Martin NG. PLoS One. 2011;6(6):e20128. doi: 10.1371/journal.pone.0020128. Epub 2011 Jun 9. PMID: 21694764. Free PMC article.
  • Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112 151).
    Davies G, Marioni RE, Liewald DC, Hill WD, Hagenaars SP, Harris SE, Ritchie SJ, Luciano M, Fawns-Ritchie C, Lyall D, Cullen B, Cox SR, Hayward C, Porteous DJ, Evans J, McIntosh AM, Gallacher J, Craddock N, Pell JP, Smith DJ, Gale CR, Deary IJ. Mol Psychiatry. 2016 Jun;21(6):758-67. doi: 10.1038/mp.2016.45. Epub 2016 Apr 5. PMID: 27046643. Free PMC article.
  • Polygenic risk scores: a biased prediction?
    De La Vega FM, Bustamante CD. Genome Med. 2018 Dec 27;10(1):100. doi: 10.1186/s13073-018-0610-x. PMID: 30591078. Free PMC article. Review.
  • Complex Trait Prediction from Genome Data: Contrasting EBV in Livestock to PRS in Humans: Genomic Prediction.
    Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM. Genetics. 2019 Apr;211(4):1131-1141. doi: 10.1534/genetics.119.301859. PMID: 30967442. Free PMC article. Review.
