A deep convolutional neural network approach for predicting phenotypes from genotypes

Planta. 2018 Nov;248(5):1307-1318. doi: 10.1007/s00425-018-2976-9. Epub 2018 Aug 12.

Abstract

Deep learning is a promising technology for accurately selecting individuals with high phenotypic values from genotypic data. Genomic selection (GS) is a promising breeding strategy in which the phenotypes of individual plants are predicted from genome-wide genotypic markers. In this study, we present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Built on a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypes when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional genotypic data. We trained DeepGS on a large GS dataset and compared its performance with that of other methods. The experimental results indicate that DeepGS can be used as a complement to the commonly used RR-BLUP in the prediction of phenotypes from genotypes. The complementarity between DeepGS and RR-BLUP can be exploited with an ensemble learning approach to select individuals with high phenotypic values more accurately, even in the absence of outlier individuals and subsets of genotypic markers. The source code of DeepGS and the ensemble learning approach has been packaged into Docker images to facilitate their application in different GS programs.
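
The abstract does not say how the ensemble combines the two predictors, so the following is only a minimal sketch under stated assumptions: the ensemble is taken to be a weighted average of the two models' predictions, with the weight chosen by grid search to maximise the Pearson correlation with observed phenotypes on a validation set. The function names (`pearson`, `ensemble`, `pick_weight`) and the toy numbers are hypothetical and are not taken from the DeepGS source code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ensemble(pred_a, pred_b, w):
    """Weighted average of two prediction vectors: w * a + (1 - w) * b."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

def pick_weight(pred_a, pred_b, y_val, steps=10):
    """Grid-search w in [0, 1] for the best validation correlation.

    Assumption: correlation with observed phenotypes is the selection
    criterion; the paper may use a different weighting scheme.
    """
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: pearson(ensemble(pred_a, pred_b, w), y_val))

# Toy values for illustration only (not data from the study).
y_val = [1.0, 2.0, 3.0, 4.0, 5.0]        # observed phenotypes
pred_cnn = [1.5, 1.5, 3.5, 3.5, 5.5]     # stand-in for DeepGS predictions
pred_ridge = [0.5, 2.5, 2.5, 4.5, 4.5]   # stand-in for RR-BLUP predictions

w = pick_weight(pred_cnn, pred_ridge, y_val)
combined = ensemble(pred_cnn, pred_ridge, w)
```

In this toy case the two stand-in predictors make errors in opposite directions, so the averaged prediction correlates more strongly with the observed values than either predictor alone. That is the kind of complementarity an ensemble can exploit.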

Keywords: Deep learning; Ensemble learning; Genomic selection; Genotypic marker; High phenotypic values; Machine learning.

MeSH terms

  • Genetic Association Studies / methods*
  • Genome-Wide Association Study / methods
  • Machine Learning
  • Models, Genetic
  • Neural Networks, Computer*
  • Plants / genetics*
  • Selection, Genetic