A function accounting for training set size and marker density to model the average accuracy of genomic prediction

PLoS One. 2013 Dec 5;8(12):e81046. doi: 10.1371/journal.pone.0081046. eCollection 2013.

Abstract

Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations have been suggested that predict the accuracy of genomic breeding values in a given design from training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]). The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets were available: 5'698 Holstein Friesian bulls genotyped with 50 K SNPs, and 1'332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When the number of SNPs was varied, w was found to be proportional to the log of the marker density up to a limit that is population and trait specific and that was reached with ∼20'000 SNPs in the Brown Swiss population studied.
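
The abstract does not reproduce the equation itself, so the following is only a minimal sketch under stated assumptions. It uses the commonly cited form of the Daetwyler et al. (2010) equation, sqrt(N r^2 / (N r^2 + M_e)), and assumes that the weighting factor w enters as a simple multiplier. The function names, the symbol names (n_train, reliability, m_e), and all numeric inputs other than the 5'698 bulls are hypothetical choices made for illustration, not values or code from the study.

```python
import math


def expected_accuracy(n_train, reliability, m_e, w=1.0):
    """Sketch of a weighted Daetwyler-type accuracy equation.

    Assumed form (not quoted from the abstract):
        accuracy = w * sqrt(N * r2 / (N * r2 + M_e))
    n_train     -- training set size N
    reliability -- reliability of the phenotypes r2, in (0, 1]
    m_e         -- number of independent chromosome segments M_e
    w           -- weighting factor; w = 1 gives the unweighted equation
    """
    if n_train < 0 or m_e <= 0 or not 0 < reliability <= 1:
        raise ValueError("invalid design parameters")
    signal = n_train * reliability
    return w * math.sqrt(signal / (signal + m_e))


def training_size(n_total, k):
    """Approximate training set size in k-fold cross-validation.

    One fold of roughly n_total / k animals is held out for validation;
    the remaining animals form the training set.
    """
    if k < 2 or k > n_total:
        raise ValueError("k must be between 2 and n_total")
    return n_total - n_total // k


if __name__ == "__main__":
    n_bulls = 5698          # Holstein Friesian data set size from the abstract
    hypothetical_r2 = 0.9   # assumed phenotype reliability (illustrative)
    hypothetical_me = 1000  # assumed M_e (illustrative)
    hypothetical_w = 0.9    # assumed weighting factor (illustrative)

    for k in (2, 5, 10, 20):
        n = training_size(n_bulls, k)
        acc = expected_accuracy(n, hypothetical_r2, hypothetical_me, hypothetical_w)
        print(f"k={k:2d}  N_train={n}  expected accuracy={acc:.3f}")
```

Under these assumptions, varying k changes only the training set size, which is how a set of cross-validation scenarios can trace out accuracy as a function of N. With w below 1, the curve approaches w rather than 1 as N grows.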

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Breeding*
  • Cattle
  • Dairying
  • Female
  • Genetic Markers / genetics*
  • Genomics / methods*
  • Genotyping Techniques
  • Likelihood Functions
  • Male
  • Models, Statistical*
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results

Substances

  • Genetic Markers

Grant support

This research was funded by the German Federal Ministry of Education and Research within the AgroClustEr ‘Synbreed – Synergistic plant and animal breeding’ (Funding ID: 0315528C) in association with the Deutsche Forschungsgemeinschaft (DFG) research training group ‘Scaling problems in statistics’ (RTG1644). The authors gratefully acknowledge co-funding from the European Commission, under the Seventh Framework Program for Research and Technological Development, for the Collaborative Project LowInputBreeds (Grant agreement No 222623). However, the views expressed by the authors do not necessarily reflect the views of the European Commission, nor do they in any way anticipate the Commission’s future policy in this area. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors further acknowledge support by the Open Access Publication Funds of the Göttingen University.