Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts

Aniek C Bouwman; Ben J Hayes; Mario P L Calus

doi:10.1186/s12711-017-0355-9

Genet Sel Evol. 2017 Oct 30;49(1):79. doi: 10.1186/s12711-017-0355-9.

Authors

Aniek C Bouwman¹, Ben J Hayes² ³, Mario P L Calus⁴

Affiliations

¹ Animal Breeding and Genomics Centre, Wageningen Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands. Aniek.Bouwman@wur.nl.
² Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, Brisbane, QLD, Australia.
³ Department of Economic Development, Jobs, Transport and Resources, Government of Victoria, 5 Ring Rd., Bundoora, VIC, 3083, Australia.
⁴ Animal Breeding and Genomics Centre, Wageningen Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands.

Abstract

Background: Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because it results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants are used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance.

Results: In a dairy cattle dataset with 630K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and one using centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE from RRc and RRcs was 0.27. The main difference in ASE between the two methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter, from -1 (RRcs) to 0 (RRc) in steps of 0.1, in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of -0.8, while the mean squared error of prediction was minimized with a scaling parameter of -1, i.e., RRcs.
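
The role of the scaling parameter can be illustrated with a minimal sketch. It assumes a form of scaling that is not spelled out in this abstract: centered allele counts are multiplied by [2p(1-p)]^(s/2), where p is the allele frequency of the SNP and s is the scaling parameter, so that s = 0 gives centered counts (RRc) and s = -1 gives centered and scaled counts (RRcs). The function name `scale_allele_counts` and the toy genotypes are hypothetical and are not taken from the paper.

```python
import numpy as np


def scale_allele_counts(counts, s):
    """Center allele counts (0/1/2) and scale them per SNP.

    Assumed scaling (not stated in the abstract): each centered column is
    multiplied by [2p(1-p)]**(s/2), so s = 0 leaves the counts centered
    only (RRc) and s = -1 divides by sqrt(2p(1-p)) (RRcs).
    """
    counts = np.asarray(counts, dtype=float)
    p = counts.mean(axis=0) / 2.0        # allele frequency per SNP
    het = 2.0 * p * (1.0 - p)            # expected heterozygosity 2p(1-p)
    if np.any(het == 0.0):
        raise ValueError("monomorphic SNPs cannot be scaled")
    centered = counts - 2.0 * p          # subtract the mean allele count
    return centered * het ** (s / 2.0)


# Toy genotypes: rows are animals, columns are SNPs.
# The first SNP is common (p = 0.5), the second is rare (p = 1/12).
counts = np.array([[0, 0],
                   [1, 0],
                   [2, 0],
                   [1, 1],
                   [0, 0],
                   [2, 0]])

rrc = scale_allele_counts(counts, s=0.0)     # centered only
rrcs = scale_allele_counts(counts, s=-1.0)   # centered and scaled

# Per-SNP factor by which RRcs columns are stretched relative to RRc.
stretch = np.abs(rrcs).max(axis=0) / np.abs(rrc).max(axis=0)
print(stretch)
```

Under this assumed scaling, the rare SNP is stretched more than the common one. An effect estimated on scaled counts is expressed per unit of the scaled covariate, so it has to be multiplied by the same factor [2p(1-p)]^(s/2) to be read as an effect per allele copy.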

Conclusions: Large differences in estimated ASE for low MAF SNPs were observed between scaled and unscaled allele counts, because there is less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power reduces the difference in ASE between scaled and unscaled allele counts.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Animals
Cattle / genetics
Female
Gene Frequency*
Genome-Wide Association Study / methods*
Genome-Wide Association Study / standards
Male
Models, Genetic
Polymorphism, Single Nucleotide*