Can Deep Learning Improve Genomic Prediction of Complex Human Traits?
- PMID: 30171033
- PMCID: PMC6218236
- DOI: 10.1534/genetics.118.301298
Abstract
The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in "deep learning" (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To provide an evaluation of MLPs and CNNs, we used data from distantly related white Caucasian individuals (n ∼100k individuals, m ∼500k SNPs, and k = 1000) of the interim release of the UK Biobank. We analyzed a total of five phenotypes: height, bone heel mineral density, body mass index, systolic blood pressure, and waist-hip ratio, with genomic heritabilities ranging from ∼0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) that were preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the rest of the phenotypes, the performance of some CNNs was comparable or slightly better than linear methods. Performance of MLPs was highly dependent on SNP set and phenotype. In all, over the range of traits evaluated in this study, CNN performance was competitive to linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.
Keywords: Convolutional Neural Networks; GenPred; Genomic Prediction regressions; Multilayer Perceptrons; UK Biobank; complex traits; deep learning; genomic prediction; whole-genome.
Copyright © 2018 by the Genetics Society of America.
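The workflow summarized in the abstract (rank SNPs by single-marker regression, keep the top set, then fit a whole-genome linear model and measure predictive performance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the genotypes and phenotype are simulated rather than taken from the UK Biobank, ridge regression stands in for the Bayesian linear regressions used in the study, the function names are hypothetical, and ranking SNPs on the training split only is a choice made here to avoid leakage, not a detail confirmed by the abstract.

```python
import numpy as np

def single_marker_scores(X, y):
    """Squared correlation between each SNP column and the phenotype.

    For a single-marker regression y ~ x_j this equals the model R^2,
    so ranking by it matches ranking by the marginal association strength.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    den[den == 0] = np.inf  # monomorphic SNPs receive a score of 0
    return (num / den) ** 2

def select_top_snps(X, y, k):
    """Indices of the k SNPs with the largest single-marker scores."""
    return np.argsort(single_marker_scores(X, y))[::-1][:k]

def ridge_fit(X, y, lam):
    """Ridge regression on centered data (a stand-in for a Bayesian linear model)."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return mu_x, mu_y, beta

def ridge_predict(model, X):
    mu_x, mu_y, beta = model
    return mu_y + (X - mu_x) @ beta

def simulate(n=600, m=2000, n_causal=50, h2=0.5, seed=0):
    """Toy genotypes coded 0/1/2 and an additive phenotype with heritability h2."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)
    X = rng.binomial(2, freqs, size=(n, m)).astype(float)
    causal = rng.choice(m, size=n_causal, replace=False)
    g = X[:, causal] @ rng.normal(size=n_causal)
    g = (g - g.mean()) / g.std()                      # genetic variance = 1
    e = rng.normal(scale=np.sqrt((1 - h2) / h2), size=n)
    return X, g + e

X, y = simulate()
train, test = np.arange(400), np.arange(400, 600)

# Rank SNPs on the training split only, then fit and evaluate on held-out data.
top = select_top_snps(X[train], y[train], k=200)
model = ridge_fit(X[train][:, top], y[train], lam=100.0)
pred = ridge_predict(model, X[test][:, top])
r = np.corrcoef(pred, y[test])[0, 1]
print(f"held-out correlation between predicted and observed phenotype: {r:.3f}")
```

The sizes (600 individuals, 2000 SNPs, 200 selected) and the penalty `lam` are arbitrary toy values chosen so the sketch runs quickly; they bear no relation to the scale of the study (roughly 100k individuals and SNP sets of 10k to 50k).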
Similar articles
- Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes. Genet Sel Evol. 2020 Feb 24;52(1):12. doi: 10.1186/s12711-020-00531-z. PMID: 32093611. Free PMC article.
- A Guide for Using Deep Learning for Complex Trait Genomic Prediction. Genes (Basel). 2019 Jul 20;10(7):553. doi: 10.3390/genes10070553. PMID: 31330861. Free PMC article. Review.
- deepGBLUP: joint deep learning networks and GBLUP framework for accurate genomic prediction of complex traits in Korean native cattle. Genet Sel Evol. 2023 Jul 31;55(1):56. doi: 10.1186/s12711-023-00825-y. PMID: 37525091. Free PMC article.
- Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice. G3 (Bethesda). 2022 Apr 4;12(4):jkac039. doi: 10.1093/g3journal/jkac039. PMID: 35166767. Free PMC article.
- The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review. Otolaryngol Head Neck Surg. 2024 Jun;170(6):1531-1543. doi: 10.1002/ohn.636. Epub 2024 Jan 3. PMID: 38168017. Review.
Cited by
- Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models. PLoS One. 2022 Aug 31;17(8):e0273293. doi: 10.1371/journal.pone.0273293. eCollection 2022. PMID: 36044406. Free PMC article.
- A review of model evaluation metrics for machine learning in genetics and genomics. Front Bioinform. 2024 Sep 10;4:1457619. doi: 10.3389/fbinf.2024.1457619. eCollection 2024. PMID: 39318760. Free PMC article. Review.
- Multi-Trait, Multi-Environment Genomic Prediction of Durum Wheat With Genomic Best Linear Unbiased Predictor and Deep Learning Methods. Front Plant Sci. 2019 Nov 8;10:1311. doi: 10.3389/fpls.2019.01311. eCollection 2019. PMID: 31787990. Free PMC article.
- A Machine-Learning-Based Approach to Prediction of Biogeographic Ancestry within Europe. Int J Mol Sci. 2023 Oct 11;24(20):15095. doi: 10.3390/ijms242015095. PMID: 37894775. Free PMC article.
- Fully-Connected Neural Networks with Reduced Parameterization for Predicting Histological Types of Lung Cancer from Somatic Mutations. Biomolecules. 2020 Aug 28;10(9):1249. doi: 10.3390/biom10091249. PMID: 32872133. Free PMC article.
