Genet Epidemiol. 2020 Oct;44(7):759-777.
doi: 10.1002/gepi.22336. Epub 2020 Aug 2.

Chances and challenges of machine learning-based disease classification in genetic association studies illustrated on age-related macular degeneration

Felix Guenther et al. Genet Epidemiol. 2020 Oct.

Abstract

Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and open promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We establish machine learning-based phenotyping in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automatically classified age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified the misclassification of automatically derived AMD in internal validation data (4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. In simulation studies, we demonstrate that our MLA guards against bias and artifacts. By combining a GWAS on automatically derived AMD with our MLA in UK Biobank data, we were able to separate true associations (ARMS2/HTRA1, CFH) from artifacts (near HERC2) and identified eye color as associated with the misclassification. With this example, we provide a proof of concept that a GWAS using machine learning-derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.

Keywords: UK Biobank; age-related macular degeneration (AMD); genome-wide association study; machine learning-based disease classification; response misclassification.
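
To make the idea of a likelihood that accounts for response misclassification concrete, the following is a minimal sketch and not the authors' MLA. It assumes a single genotype covariate, a logistic model for the true disease status, and a constant sensitivity and specificity of the automated classifier that do not depend on any covariate (the abstract itself reports that misclassification was associated with eye color, which this sketch ignores). Under these assumptions, the probability of an observed positive label is sens * p + (1 - spec) * (1 - p), where p is the true disease probability. All function names, the Fisher scoring fit, and the numeric values (effect sizes, allele frequency, sensitivity 0.8, specificity 0.95) are hypothetical choices for illustration only.

```python
import math
import random


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_prob(x, b0, b1, sens, spec):
    """P(observed label = 1 | x) when the true status follows a logistic model
    and labels are flipped with fixed sensitivity/specificity (assumed known)."""
    p_true = expit(b0 + b1 * x)
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def fit_misclassified_logit(xs, ys, sens=1.0, spec=1.0, n_iter=50, tol=1e-8):
    """Maximize the misclassification-adjusted likelihood by Fisher scoring.
    With sens = spec = 1 this reduces to ordinary (naive) logistic regression."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0              # score vector
        i00 = i01 = i11 = 0.0      # expected information matrix (2 x 2)
        for x, y in zip(xs, ys):
            p_true = expit(b0 + b1 * x)
            p_obs = sens * p_true + (1.0 - spec) * (1.0 - p_true)
            # derivative of p_obs with respect to the linear predictor
            dp = (sens + spec - 1.0) * p_true * (1.0 - p_true)
            var = p_obs * (1.0 - p_obs)
            score = (y - p_obs) * dp / var
            weight = dp * dp / var
            g0 += score
            g1 += score * x
            i00 += weight
            i01 += weight * x
            i11 += weight * x * x
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def simulate(n, b0, b1, sens, spec, maf=0.3, seed=0):
    """Simulate genotypes (0/1/2), true status, and error-prone labels."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = (rng.random() < maf) + (rng.random() < maf)
        true_y = rng.random() < expit(b0 + b1 * x)
        p_flag = sens if true_y else 1.0 - spec
        xs.append(float(x))
        ys.append(1 if rng.random() < p_flag else 0)
    return xs, ys


if __name__ == "__main__":
    # Hypothetical values chosen for illustration, not taken from the study.
    xs, ys = simulate(20000, b0=-2.0, b1=0.5, sens=0.8, spec=0.95, seed=1)
    print("naive estimate:    ", fit_misclassified_logit(xs, ys))
    print("adjusted estimate: ", fit_misclassified_logit(xs, ys, 0.8, 0.95))
```

In this toy setting, ignoring the misclassification tends to pull the genetic effect estimate toward zero, while the adjusted likelihood should recover a value closer to the simulated effect. Misclassification that depends on a genetically influenced trait would require letting sensitivity and specificity vary with covariates, which this sketch does not attempt.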
