Predicting genomic selection efficiency to optimize calibration set and to assess prediction accuracy in highly structured populations

Theor Appl Genet. 2017 Nov;130(11):2231-2247. doi: 10.1007/s00122-017-2956-7. Epub 2017 Aug 9.


We propose a criterion to predict genomic selection efficiency in structured populations. This criterion is useful for defining an optimal calibration set and for estimating prediction reliability in multiparental populations.

Genomic selection refers to the use of genotypic information to predict the performance of selection candidates. Prediction accuracy has been shown to depend on various parameters, including the composition of the calibration set (CS). Assessing the accuracy of a given prediction scenario is of great importance: it can be used to optimize CS sampling before phenotypes are collected and, once breeding values have been predicted, it informs breeders about the reliability of these predictions. Different criteria have been proposed to optimize CS sampling in highly diverse panels, which can be useful for screening collections of genotypes. However, plant breeders often work with structured material such as biparental or multiparental populations, for which these criteria are less well suited. From the theory of the generalized coefficient of determination (CD), we derived different criteria to optimize CS sampling and to assess the reliability of predictions in structured populations. These criteria were evaluated on two nested association mapping (NAM) populations and two highly diverse panels of maize. They sampled optimized CS efficiently in most situations. They could also estimate, at least partly, the reliability of predictions between NAM families, but they could not estimate differences in the reliability of predictions for NAM families when the highly diverse panels were used as calibration sets. We illustrate that the CD criteria can be adapted to various prediction scenarios, including inter- and intra-family predictions, resulting in higher prediction accuracies.
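
As a rough illustration of the CD idea referred to in the abstract, the sketch below computes the generalized CD of the contrast between each unphenotyped candidate and the population mean for a given calibration set, using the usual mixed-model expression CD(c) = 1 - λ c'(Z'MZ + λK^-1)^-1 c / (c'Kc). It is not the set of criteria derived in the paper for structured populations. The function name `generalized_cd`, the intercept-only fixed effect, the simulated markers, and λ = 1 (ratio of residual to genetic variance) are all assumptions made for illustration.

```python
import numpy as np


def generalized_cd(K, cs_idx, target_idx, lam):
    """Generalized CD of the contrast 'target minus population mean'.

    K          : (n, n) positive-definite relationship matrix (assumed given)
    cs_idx     : indices of the phenotyped individuals (calibration set)
    target_idx : indices of the individuals whose prediction is assessed
    lam        : ratio of residual to genetic variance (assumed known)
    """
    n = K.shape[0]
    cs_idx = np.asarray(cs_idx)
    n_cs = cs_idx.size

    # Z links each phenotypic record to one of the n individuals.
    Z = np.zeros((n_cs, n))
    Z[np.arange(n_cs), cs_idx] = 1.0

    # M projects out the fixed effects; here an intercept only (assumption).
    M = np.eye(n_cs) - np.full((n_cs, n_cs), 1.0 / n_cs)

    # Prediction error variance of the genetic values, scaled by the
    # genetic variance: lam * (Z'MZ + lam * K^-1)^-1.
    pev = lam * np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(K))

    cds = []
    for t in target_idx:
        # Contrast between individual t and the mean of all n individuals.
        c = np.full(n, -1.0 / n)
        c[t] += 1.0
        cds.append(1.0 - (c @ pev @ c) / (c @ K @ c))
    return np.array(cds)


# Toy data: 30 individuals, 200 simulated biallelic markers (hypothetical).
rng = np.random.default_rng(0)
markers = rng.choice([-1.0, 1.0], size=(30, 200))
K = markers @ markers.T / 200 + 1e-3 * np.eye(30)  # small ridge keeps K invertible

targets = np.arange(20, 30)
cd_small = generalized_cd(K, np.arange(0, 10), targets, lam=1.0)
cd_large = generalized_cd(K, np.arange(0, 20), targets, lam=1.0)
print("mean CD, 10 phenotyped:", cd_small.mean())
print("mean CD, 20 phenotyped:", cd_large.mean())
```

Averaging the CD over the target individuals gives a single score per candidate calibration set, so that alternative sets can be compared before any phenotype is collected. How that score should be adapted to family structure is the subject of the paper and is not reproduced here.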

MeSH terms

  • Calibration
  • Genetics, Population*
  • Genomics / methods*
  • Genotype
  • Models, Genetic*
  • Phenotype
  • Plant Breeding
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results
  • Selection, Genetic*
  • Zea mays / genetics