Optimization of genomic selection training populations with a genetic algorithm

Genet Sel Evol. 2015 May 6;47(1):38. doi: 10.1186/s12711-015-0116-6.


In this article, we consider a breeding scenario in which a population of individuals has been genotyped but not phenotyped. We derived a computationally efficient statistic that uses this genotypic information to measure the reliability of genomic estimated breeding values (GEBV) for a given set of individuals (the test set) when they are predicted from a given training set. We used this reliability measure within a genetic algorithm to find an optimized training set among a larger set of candidate individuals. The selected subset was then phenotyped and used to train a genomic selection model that estimated GEBV in the test set. We applied this training set selection method to four data sets, on Arabidopsis, wheat, rice and maize. Our results show that training sets selected by our method gave more accurate GEBV than random samples of the same size. Because the genotypes of the test individuals are taken into account when the training individuals are chosen, this dynamic model-building process improves the performance of genomic selection models.
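The selection scheme described above can be illustrated with a minimal sketch. The abstract does not give the form of the reliability statistic, so the code assumes a stand-in: the mean prediction error variance of the test individuals under ridge regression on the markers (lower is better). The function names, the ridge parameter `lam`, and the genetic algorithm settings (population size, number of elites, mutation probability) are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def pev_criterion(X_train, X_test, lam=1.0):
    """Assumed stand-in criterion: mean of x' (X'X + lam*I)^-1 x over test rows.

    Up to the residual variance, this is the prediction error variance of a
    ridge regression fitted on X_train. It uses genotypes only, no phenotypes.
    """
    p = X_train.shape[1]
    A = X_train.T @ X_train + lam * np.eye(p)
    sol = np.linalg.solve(A, X_test.T)  # shape (p, n_test)
    return float(np.mean(np.sum(X_test.T * sol, axis=0)))


def select_training_set(X_cand, X_test, n_train, pop_size=30, n_elite=5,
                        n_iter=50, mut_prob=0.5, lam=1.0, seed=0):
    """Genetic algorithm over subsets of candidate rows (requires n_elite >= 2
    and n_train < number of candidates). Returns (sorted indices, criterion)."""
    rng = np.random.default_rng(seed)
    n_cand = X_cand.shape[0]

    def score(idx):
        return pev_criterion(X_cand[idx], X_test, lam)

    # Initial population: random subsets of the candidates.
    pop = [rng.choice(n_cand, size=n_train, replace=False)
           for _ in range(pop_size)]

    for _ in range(n_iter):
        pop.sort(key=score)      # best (lowest criterion) first
        elites = pop[:n_elite]   # elites survive unchanged
        children = []
        while len(children) < pop_size - n_elite:
            i, j = rng.choice(n_elite, size=2, replace=False)
            # Crossover: draw the child from the union of two elite parents.
            pool = np.union1d(elites[i], elites[j])
            child = rng.choice(pool, size=n_train, replace=False)
            # Mutation: swap one member for a candidate outside the subset.
            if rng.random() < mut_prob:
                outside = np.setdiff1d(np.arange(n_cand), child)
                child[rng.integers(n_train)] = rng.choice(outside)
            children.append(child)
        pop = elites + children

    best = min(pop, key=score)
    return np.sort(best), score(best)
```

Here `X_cand` and `X_test` would be marker matrices (individuals in rows, markers in columns) for the candidate and test individuals. In the workflow described above, only the individuals whose indices are returned would then be phenotyped and used to fit the genomic selection model.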

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Arabidopsis / genetics
  • Genomics / methods*
  • Genotype
  • Models, Genetic
  • Oryza / genetics
  • Phenotype
  • Plant Breeding / methods*
  • Triticum / genetics
  • Zea mays / genetics