TrG2P: A transfer learning-based tool integrating multi-trait data for accurate prediction of crop yield

Plant Commun. 2024 May 14:100975. doi: 10.1016/j.xplc.2024.100975. Online ahead of print.

Abstract

Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, predicting it from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different but related source domain, which makes it a promising way to improve yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction because an efficient implementation framework has been lacking. We therefore developed TrG2P, a transfer learning-based framework. TrG2P first trains convolutional neural networks (CNNs) on genotypic data and non-yield trait phenotypes to obtain pre-trained models. Next, the convolutional layer parameters of these pre-trained models are transferred to the yield prediction task and the fully connected layers are retrained, producing fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum), and compared its prediction accuracy with that of seven other popular GS tools: rrBLUP, Random Forest, Support Vector Regression, LightGBM, CNN, DeepGS, and DNNGP. Compared with the best-performing comparison model, TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield trait data. We attribute the enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation.
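The three-stage procedure described above (pre-train, transfer and fine-tune, fuse) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the TrG2P implementation. The following are assumptions for illustration only:

- The genotypes are synthetic 0/1/2 codes.
- The two non-yield traits (`trait_a`, `trait_b`) and every function name (`trg2p_sketch`, `train`, `features`, and so on) are hypothetical.
- The "CNN" is a single 1-D convolution kernel followed by two small fully connected layers.
- Finite-difference gradient descent with step halving stands in for backpropagation and a real optimizer.
- The transferred convolution weights are kept frozen during fine-tuning.
- "Fusion" is read as concatenating the first-fully-connected-layer outputs of the fine-tuned models and training a new final linear layer on them.

```python
import random

# Toy sizes (assumptions): real SNP panels and networks are far larger.
N_SNPS, KERNEL, HIDDEN = 6, 3, 3
ALL = ["conv", "fc1_w", "fc1_b", "fc2_w", "fc2_b"]
FC_ONLY = ALL[1:]  # after transfer, the conv kernel is kept frozen (assumption)

def conv1d(x, kernel):
    """Valid 1-D cross-correlation of a genotype vector with a single kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

def dense(v, w, b):
    """Fully connected layer; w is a flat, row-major len(b) x len(v) weight list."""
    n = len(v)
    return [sum(w[o * n + i] * v[i] for i in range(n)) + b[o] for o in range(len(b))]

def features(p, x):
    """Conv layer + first FC layer: the part of each fine-tuned model that is fused."""
    return relu(dense(relu(conv1d(x, p["conv"])), p["fc1_w"], p["fc1_b"]))

def predict(p, x):
    return dense(features(p, x), p["fc2_w"], p["fc2_b"])[0]

def init_params(rng):
    def u(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]
    n_conv = N_SNPS - KERNEL + 1
    return {"conv": u(KERNEL), "fc1_w": u(HIDDEN * n_conv), "fc1_b": u(HIDDEN),
            "fc2_w": u(HIDDEN), "fc2_b": [0.0]}

def mse_loss(X, y):
    def loss(p):
        return sum((predict(p, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)
    return loss

def train(params, trainable, loss_fn, lr=0.05, steps=60, eps=1e-5):
    """Update only the `trainable` groups; central finite differences replace backprop,
    and the step is halved until the loss drops, so the loss never increases."""
    loss = loss_fn(params)
    for _ in range(steps):
        grads = {}
        for name in trainable:
            g = []
            for i, old in enumerate(params[name]):
                params[name][i] = old + eps
                up = loss_fn(params)
                params[name][i] = old - eps
                down = loss_fn(params)
                params[name][i] = old
                g.append((up - down) / (2 * eps))
            grads[name] = g
        step = lr
        while step > 1e-8:
            trial = dict(params)  # frozen groups are shared, trainable ones replaced
            for name in trainable:
                trial[name] = [v - step * gv for v, gv in zip(params[name], grads[name])]
            trial_loss = loss_fn(trial)
            if trial_loss < loss:
                params, loss = trial, trial_loss
                break
            step /= 2
        else:
            break  # no descent step found; stop early
    return params, loss

def trg2p_sketch(seed=0, n=12):
    rng = random.Random(seed)
    # Synthetic genotypes and two made-up non-yield traits that share signal with yield.
    X = [[rng.choice((0, 1, 2)) for _ in range(N_SNPS)] for _ in range(n)]
    aux = {"trait_a": [x[0] + x[1] + rng.gauss(0, 0.1) for x in X],
           "trait_b": [x[3] + x[4] + rng.gauss(0, 0.1) for x in X]}
    yld = [0.5 * (a + b) + rng.gauss(0, 0.1) for a, b in zip(aux["trait_a"], aux["trait_b"])]
    fused = [[] for _ in X]
    conv_frozen = True
    for values in aux.values():
        # Stage 1: pre-train the whole network on one non-yield trait.
        pre, _ = train(init_params(rng), ALL, mse_loss(X, values))
        # Stage 2: copy the conv kernel, re-initialise FC layers, retrain only them on yield.
        ft = init_params(rng)
        ft["conv"] = list(pre["conv"])
        ft, _ = train(ft, FC_ONLY, mse_loss(X, yld))
        conv_frozen = conv_frozen and ft["conv"] == pre["conv"]
        for row, x in zip(fused, X):
            row.extend(features(ft, x))  # conv + FC1 outputs of this fine-tuned model
    # Stage 3: train a new last layer on the concatenated features of all fine-tuned models.
    head = {"w": [0.0] * len(fused[0]), "b": [0.0]}
    def head_loss(p):
        return sum((dense(f, p["w"], p["b"])[0] - yi) ** 2 for f, yi in zip(fused, yld)) / n
    start = head_loss(head)
    head, end = train(head, ["w", "b"], head_loss)
    return {"conv_frozen": conv_frozen, "fused_loss_start": start, "fused_loss_end": end}
```

Calling `trg2p_sketch(seed=0)` reports whether the transferred kernels stayed unchanged during fine-tuning, along with the training loss of the fused head before and after fitting. A real implementation would differ in several ways: it would use a deep-learning framework, many convolution kernels, and real multi-trait data, and it would measure prediction accuracy on held-out samples rather than training loss.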
The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.

Keywords: Crop; Genotype to phenotype; Multi-trait; Transfer learning; Yield prediction.