Am J Hum Genet. 2021 Jul 1;108(7):1217-1230.
doi: 10.1016/j.ajhg.2021.05.004. Epub 2021 Jun 1.

Large-scale machine-learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology

Babak Alipanahi et al. Am J Hum Genet. 2021.

Abstract

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB, in which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes that are involved in neuronal and synaptic biology or that harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.
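As an illustration of the genome-wide significance criterion, the sketch below filters association results at p ≤ 5 × 10⁻⁸ and groups the surviving hits into loci. The `Hit` record, the `group_into_loci` helper, and the 500 kb merging window are hypothetical; the abstract does not state how the 156 loci or the 299 independent hits were defined, and in practice such definitions typically rely on LD-based clumping or conditional analysis rather than a distance window alone.

```python
from dataclasses import dataclass
from typing import Iterable, List

GWS_P = 5e-8  # genome-wide significance threshold (p <= 5 x 10^-8)


@dataclass(frozen=True)
class Hit:
    """Hypothetical per-variant association result."""
    rsid: str
    chrom: str
    pos: int
    p: float


def gws_hits(results: Iterable[Hit]) -> List[Hit]:
    """Keep variants that reach genome-wide significance."""
    return [h for h in results if h.p <= GWS_P]


def group_into_loci(hits: Iterable[Hit], window_bp: int = 500_000) -> List[List[Hit]]:
    """Merge hits on the same chromosome that lie within window_bp of the previous hit.

    The 500 kb default window is a hypothetical choice for illustration only.
    """
    loci: List[List[Hit]] = []
    for h in sorted(hits, key=lambda h: (h.chrom, h.pos)):
        last = loci[-1][-1] if loci else None
        if last is not None and last.chrom == h.chrom and h.pos - last.pos <= window_bp:
            loci[-1].append(h)
        else:
            loci.append([h])
    return loci
```

Sorting chromosome labels as strings does not give genomic order (for example, "10" sorts before "2"), but that does not affect the grouping, because hits are merged only within a chromosome.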

Keywords: GWAS, phenotyping, machine learning, glaucoma.

Conflict of interest statement

P.J.F. and A.P.K. are employees of the UCL Institute of Ophthalmology, London, UK. The remaining authors are employees and shareholders of Google LLC.

Figures

Figure 1
ML-based phenotyping concept and its application to VCDR (A) “Model training” phase in which a phenotype prediction model is trained with expert-labeled data. (B) “Model application” phase in which the validated phenotype prediction model is applied to new, unlabeled data followed by genomic discovery. (C) Definition of vertical cup-to-disc ratio (VCDR) in a real fundus image. (D) Schematic of the multi-task ensemble model used in phenotype prediction. (E–H) Scatterplots of the ML-based VCDR versus expert-labeled VCDR values for the train (E), tune (F), test (G), and UK Biobank (H) datasets. Number of grades per image is shown in parentheses.
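Panel (C) defines VCDR as the ratio of the vertical diameter of the optic cup to the vertical diameter of the optic disc. A minimal sketch of that definition follows, assuming hypothetical binary cup and disc segmentation masks. It only illustrates the ratio; the multi-task ensemble in panel (D) is described as predicting VCDR as a phenotype, and this sketch does not reproduce that model.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # hypothetical binary mask: rows of 0/1 pixels


def vertical_extent(mask: Mask) -> int:
    """Number of image rows spanned by the foreground, top row to bottom row inclusive."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vcdr(cup_mask: Mask, disc_mask: Mask) -> float:
    """Vertical cup-to-disc ratio: vertical cup diameter divided by vertical disc diameter."""
    disc = vertical_extent(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc
```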
Figure 2
ML-based VCDR GWAS results and comparison to known associations (A) Manhattan plot depicting ML-based VCDR-associated GWAS p values from the BOLT-LMM analysis. There are 156 GWS (genome-wide significant) loci, representing 299 independent (R² = 0.1) GWS hits. For each locus, the closest gene is shown. Blue gene names and dots indicate loci also identified in the Craig et al. study, and red dots and black gene names indicate novel loci. The dashed red line denotes the GWS p value, 5 × 10⁻⁸. (B) Venn diagram of loci overlap for three VCDR GWASs. The ML-based GWAS replicates all 22 loci of the IGGC VCDR meta-analysis and 62 of 65 loci identified by Craig et al., while discovering 93 novel loci (supplemental information). (C) Effect sizes for the 73 GWS hits shared by the Craig et al. and ML-based VCDR GWASs. The three Craig et al. hits not included failed the ML-based GWAS QC (rs61952219 for low imputation quality, and rs7039467 and rs146055611 for violating Hardy-Weinberg equilibrium). Blue and red dots denote SNPs that are more significant in the ML-based and Craig et al. GWAS, respectively. Error bars depict standard errors. The banding in the Craig et al. effect sizes arises because large effect sizes were reported in multiples of 0.01. The blue line is the best-fit line, and the shaded area shows the 95% confidence interval.
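Panel (C) summarizes agreement between the two studies' effect sizes with a best-fit line and colors each SNP by the study in which it is more significant. An ordinary least-squares fit is one plausible way to compute such a line; the caption does not name the fitting method, so OLS, treating the Craig et al. estimates as x and the ML-based estimates as y, and the `dot_color` tie rule are all assumptions here. The confidence band is not computed.

```python
from typing import Sequence, Tuple


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def dot_color(p_ml: float, p_craig: float) -> str:
    """Blue if the SNP is more significant in the ML-based GWAS, red otherwise (ties go to red)."""
    return "blue" if p_ml < p_craig else "red"
```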
Figure 3
VCDR polygenic risk score performance metrics (A and B) Pearson’s correlations between measured VCDR values and the predictions of the pruning and thresholding (P+T) (A) and elastic net (B) models are shown for the PRSs learned from ML-based and Craig et al. hits. Error bars depict 95% confidence intervals. Numbers above bars are the observed Pearson’s correlations. Indications of p value ranges (permutation test): ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001. The Craig et al. P+T model uses 58 of 76 hits. Measured VCDR values were obtained from adjudicated expert labeling of fundus images (UKB, n = 2,076) and from scanning laser ophthalmoscopy (HRT) (EPIC-Norfolk, n = 5,868).
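A pruning and thresholding (P+T) score keeps SNPs below a p value cutoff, discards SNPs in linkage disequilibrium (LD) with a more significant SNP that was already kept, and sums effect-size-weighted allele dosages. The sketch below follows that outline and then scores predictions with Pearson’s r. The `SumStat` record, the `ld_r2` callback, both cutoffs, and treating missing dosages as zero are illustrative assumptions, not the study's settings; the elastic net PRS and the permutation test are not shown.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence


@dataclass(frozen=True)
class SumStat:
    """Hypothetical GWAS summary statistic for one SNP."""
    snp: str
    beta: float
    p: float


def prune_and_threshold(
    stats: Iterable[SumStat],
    ld_r2: Callable[[str, str], float],
    p_max: float,
    r2_max: float,
) -> List[SumStat]:
    """Greedy P+T: walk SNPs from most to least significant, keeping each one
    whose LD r^2 with every already-kept SNP is below r2_max."""
    kept: List[SumStat] = []
    for s in sorted((s for s in stats if s.p <= p_max), key=lambda s: s.p):
        if all(ld_r2(s.snp, k.snp) < r2_max for k in kept):
            kept.append(s)
    return kept


def polygenic_score(dosages: Dict[str, float], weights: Iterable[SumStat]) -> float:
    """Sum of beta x effect-allele dosage; SNPs missing from dosages contribute 0 (a simplification)."""
    return sum(w.beta * dosages.get(w.snp, 0.0) for w in weights)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation between predicted and measured values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```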
Figure 4
Relationship between glaucoma and VCDR (A) Glaucoma odds ratios for each ML-based VCDR bin versus the bottom bin are shown. The fraction of individuals in each bin is shown (n = 65,193). (B) Glaucoma odds ratios for different VCDR elastic net PRS bins versus the bottom bin, for individuals with a glaucoma phenotype who were not used in the GWAS or in developing the PRS (n = 98,151). The bin fractions are selected to match those in (A). (C) A histogram of ML-based glaucoma liability versus ML-based VCDR (Pearson’s correlation R = 0.91, n = 65,680, p < 1 × 10⁻³⁰⁰). (D) LocusZoom plot for the most strongly associated variant (rs12913832, p = 2.2 × 10⁻⁶⁶) in the ML-based glaucoma liability GWAS conditioned on the ML-based VCDR.
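Panels (A) and (B) compare glaucoma odds in each bin against the bottom bin, with bin sizes given as fractions of the cohort. A crude (unadjusted) version of that calculation might look like the following; `assign_bins` and `odds_ratios_vs_bottom` are hypothetical helpers, and the sketch assumes every bin contains at least one case and one control.

```python
from typing import List, Sequence


def assign_bins(values: Sequence[float], fractions: Sequence[float]) -> List[int]:
    """Assign each value to a rank-based bin (0 = bottom); fractions give bin sizes, lowest first."""
    n = len(values)
    edges: List[int] = []
    cum = 0.0
    for f in fractions:
        cum += f
        edges.append(round(cum * n))
    edges[-1] = n  # guard against floating-point shortfall in the last edge
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    b = 0
    for rank, i in enumerate(order):
        while rank >= edges[b]:
            b += 1
        bins[i] = b
    return bins


def odds_ratios_vs_bottom(bins: Sequence[int], is_case: Sequence[bool], n_bins: int) -> List[float]:
    """Crude odds ratio of being a case in each bin relative to bin 0."""
    cases, controls = [0] * n_bins, [0] * n_bins
    for b, c in zip(bins, is_case):
        if c:
            cases[b] += 1
        else:
            controls[b] += 1
    ref_odds = cases[0] / controls[0]
    return [(cases[k] / controls[k]) / ref_odds for k in range(n_bins)]
```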
Figure 5
Primary open-angle glaucoma (POAG) prediction in the EPIC-Norfolk cohort (A–C) Odds ratios and 95% CIs for POAG prevalence by decile of VCDR PRS; the reference is decile 1. Results are from logistic regression models adjusted for age and sex for primary open-angle glaucoma (175 cases, 5,693 controls) (A), high-tension glaucoma (HTG; 98 cases, 5,693 controls) (B), and normal-tension glaucoma (NTG; 77 cases, 5,693 controls) (C). Results are presented for the ML-based elastic net VCDR PRS (blue) and the Craig et al. elastic net VCDR PRS (yellow). Note the log-scale y axis.
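The models in this figure estimate decile odds ratios by logistic regression with age and sex as covariates and decile 1 as the reference. A dependency-free sketch of that design, fitted by Newton-Raphson, is given below. The function names, the 0/1 sex coding, and the standardization of age (which only aids numerical stability and does not change the decile odds ratios) are assumptions, and confidence intervals are not computed.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_fit(X: List[List[float]], y: List[float], iters: int = 25) -> List[float]:
    """Maximum-likelihood logistic regression by Newton-Raphson; each row of X includes an intercept term."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu)
        hess = [[0.0] * p for _ in range(p)]  # X^T W X, W = mu (1 - mu)
        for xi, yi in zip(X, y):
            mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def deciles(scores: Sequence[float]) -> List[int]:
    """0-based decile (0 = lowest) of each score, by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0] * len(scores)
    for rank, i in enumerate(order):
        out[i] = rank * 10 // len(scores)
    return out


def decile_odds_ratios(prs: Sequence[float], age: Sequence[float],
                       sex: Sequence[int], case: Sequence[bool]) -> List[float]:
    """Odds ratios for PRS deciles 2-10 versus decile 1, adjusted for age and sex (0/1)."""
    d = deciles(prs)
    mean_age = sum(age) / len(age)
    sd_age = (sum((a - mean_age) ** 2 for a in age) / len(age)) ** 0.5 or 1.0
    X = [[1.0] + [1.0 if d[i] == k else 0.0 for k in range(1, 10)]
         + [(age[i] - mean_age) / sd_age, float(sex[i])]
         for i in range(len(prs))]
    beta = logistic_fit(X, [float(c) for c in case])
    return [math.exp(b) for b in beta[1:10]]
```

With a single binary exposure and an intercept, the fitted odds ratio reduces to the crude 2 × 2 odds ratio, which makes a convenient sanity check on the solver.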
