Optimizing genomic prediction model given causal genes in a dairy cattle population

Jinyan Teng; Shuwen Huang; Zitao Chen; Ning Gao; Shaopan Ye; Shuqi Diao; Xiangdong Ding; Xiaolong Yuan; Hao Zhang; Jiaqi Li; Zhe Zhang

doi:10.3168/jds.2020-18233

J Dairy Sci. 2020 Nov;103(11):10299-10310. doi: 10.3168/jds.2020-18233. Epub 2020 Sep 18.

Authors

Jinyan Teng¹, Shuwen Huang¹, Zitao Chen¹, Ning Gao², Shaopan Ye¹, Shuqi Diao¹, Xiangdong Ding³, Xiaolong Yuan¹, Hao Zhang¹, Jiaqi Li¹, Zhe Zhang⁴

Affiliations

¹ Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
² State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, North Third Road, Guangzhou Higher Education Mega Center, Guangzhou 510006, China.
³ National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
⁴ Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China. Electronic address: zhezhang@scau.edu.cn.

PMID: 32952023
DOI: 10.3168/jds.2020-18233

Abstract

As genotypic data move from SNP chips toward whole-genome sequence, the accuracy of genomic prediction (GP) shows only a marginal gain, although all genetic variation, including causal genes, is contained in whole-genome sequence data. Meanwhile, genetic analyses of complex traits, such as genome-wide association studies, have identified an increasing number of genomic regions, including potential causal genes, which would be reliable prior knowledge for GP. Many studies have tried to improve the performance of GP by modifying the prediction model to incorporate prior knowledge. Although several plausible results have been obtained from model modification or strategy optimization, most of them were validated in a specific empirical population with a limited variety of genetic architectures for complex traits. An alternative approach is to use simulated genetic architecture with known causal genes (e.g., simulated causative SNP) to evaluate different GP models with given causal genes. Our objectives were to (1) evaluate the performance of GP under a variety of genetic architectures with a subset of known causal genes and (2) compare different GP models modified by highlighting causal genes and different strategies to weight causal genes. In this study, we simulated pseudo-phenotypes under a variety of genetic architectures based on the real genotypes and phenotypes of a dairy cattle population. Besides classical genomic best linear unbiased prediction, we evaluated 3 modified GP models that highlight causal genes as follows: (1) by treating them as fixed effects, (2) by treating them as a separate random component, and (3) by combining them into the genomic relationship matrix as random effects. Our results showed that highlighting the known causal genes, which explained a considerable proportion of genetic variance in the GP models, increased the predictive accuracy. Combining all given causal genes into the genomic relationship matrix was the optimal strategy under all the scenarios validated, and treating causal genes as a separate random component is also recommended when more than 20% of the genetic variance is explained by known causal genes. Moreover, assigning differential weights to each causal gene further improved the predictive accuracy.
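
The third strategy, combining causal genes into the genomic relationship matrix, can be illustrated with a minimal sketch. This is not the authors' code: it assumes a VanRaden-type relationship matrix, blends a causal-SNP matrix and a remaining-SNP matrix with a hypothetical mixing parameter `causal_share` (standing in for the proportion of genetic variance attributed to the known causal genes), and uses an assumed shrinkage parameter `lam` (residual-to-genetic variance ratio) in the prediction step. The optional `causal_weights` argument is one possible way to give each causal gene a differential weight. All function names and the toy data are illustrative.

```python
import numpy as np

def vanraden_grm(M, weights=None):
    """Relationship matrix from a 0/1/2 genotype matrix (animals x SNPs).

    Optional per-SNP weights rescale each SNP's contribution (assumed form).
    """
    p = M.mean(axis=0) / 2.0                  # sample allele frequencies
    Z = M - 2.0 * p                           # centered genotypes
    d = np.ones(M.shape[1]) if weights is None else np.asarray(weights, float)
    scale = np.sum(d * 2.0 * p * (1.0 - p))   # weighted sum of 2p(1-p)
    return (Z * d) @ Z.T / scale

def combined_grm(M, causal_idx, causal_share, causal_weights=None):
    """Blend causal-SNP and remaining-SNP matrices (hypothetical blending rule)."""
    mask = np.zeros(M.shape[1], dtype=bool)
    mask[causal_idx] = True
    G_causal = vanraden_grm(M[:, mask], causal_weights)
    G_rest = vanraden_grm(M[:, ~mask])
    return causal_share * G_causal + (1.0 - causal_share) * G_rest

def gblup_predict(G, y, train, lam):
    """Predict genetic values of all animals from training phenotypes."""
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[:, train] @ alpha

# Toy data only: 60 animals, 200 SNPs, the first 3 SNPs treated as known causal.
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(60, 200)).astype(float)
y = M[:, :3] @ np.array([1.0, 0.5, 0.8]) + rng.normal(size=60)
G = combined_grm(M, causal_idx=[0, 1, 2], causal_share=0.3)
pred = gblup_predict(G, y, train=np.arange(40), lam=1.0)
```

In this sketch, setting `causal_share` to 0 gives a relationship matrix built from the non-causal SNPs only, while larger values put more emphasis on the causal SNPs. The variance components used in the study were estimated from data rather than fixed by hand as they are here.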

Keywords: causal gene; genomic selection; prior knowledge; whole-genome sequence data.

MeSH terms

Animals
Cattle / genetics*
Female
Genome / genetics*
Genome-Wide Association Study / veterinary
Genomics*
Genotype
Models, Genetic
Multifactorial Inheritance / genetics*
Phenotype
Whole Genome Sequencing / veterinary