Application of support vector regression to genome-assisted prediction of quantitative traits

Theor Appl Genet. 2011 Nov;123(7):1065-74. doi: 10.1007/s00122-011-1648-y. Epub 2011 Jul 8.

Abstract

A byproduct of genome-wide association studies is the possibility of carrying out genome-enabled prediction of disease risk or of quantitative traits. This study is concerned with predicting two quantitative traits, milk yield in dairy cattle and grain yield in wheat, using dense molecular markers as predictors. Two support vector regression (SVR) models, ε-SVR and least-squares SVR, were explored and compared to a widely applied linear regression model, the Bayesian Lasso, the latter assuming additive marker effects. Predictive performance was measured using predictive correlation and mean squared error of prediction. Depending on the kernel function chosen, SVR can model either linear or nonlinear relationships between phenotypes and marker genotypes. For milk yield, where phenotypes were estimated breeding values of bulls (a linear combination of the data), SVR with a Gaussian radial basis function (RBF) kernel performed slightly better than SVR with a linear kernel, and was similar to the Bayesian Lasso. For the wheat data, where the phenotype was raw grain yield, the RBF kernel provided clear advantages over the linear kernel, e.g., a 17.5% increase in correlation when using ε-SVR. SVR with an RBF kernel also compared favorably to the Bayesian Lasso in this case. It is concluded that a nonlinear RBF kernel may be an optimal choice for SVR, especially when the phenotypes to be predicted have a nonlinear dependency on genotypes, as may have been the case for the wheat data.
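To make the linear-versus-RBF comparison concrete, the following is a minimal NumPy sketch, not the authors' code or data. It implements least-squares SVR (LS-SVR) in its standard dual form, where training reduces to solving one linear system, using either a linear kernel or a Gaussian RBF kernel, and then scores held-out predictions by predictive correlation and mean squared error. The helper names, the simulated genotypes (coded 0/1/2), the epistatic term in the simulated phenotype, and the hyperparameters `gamma = 1.0` and `sigma = 10.0` are all illustrative assumptions. ε-SVR is not shown, because its ε-insensitive loss requires solving a quadratic program rather than a linear system.

```python
import numpy as np

# Hypothetical helper names; a sketch of LS-SVR, not the paper's implementation.

def linear_kernel(A, B):
    # K(x, z) = x . z  (models additive marker effects)
    return A @ B.T

def rbf_kernel(A, B, sigma):
    # Gaussian RBF kernel, one common parameterization:
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvr_fit(K, y, gamma):
    # LS-SVR dual system:
    # [0   1^T          ] [b    ]   [0]
    # [1   K + I/gamma  ] [alpha] = [y]
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvr_predict(K_new_train, b, alpha):
    # f(x) = sum_j alpha_j K(x, x_j) + b
    return K_new_train @ alpha + b

def predictive_correlation(y, yhat):
    return float(np.corrcoef(y, yhat)[0, 1])

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

# Simulated (assumed) data: 300 individuals, 200 biallelic markers coded 0/1/2.
rng = np.random.default_rng(1)
n, p = 300, 200
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:10] = rng.normal(0.0, 1.0, 10)
# Additive effects plus one two-marker interaction, plus noise.
y = X @ beta + 1.5 * X[:, 0] * X[:, 1] + rng.normal(0.0, 1.0, n)
train, test = np.arange(200), np.arange(200, 300)

results = {}
kernels = [
    ("linear", linear_kernel),
    ("rbf", lambda A, B: rbf_kernel(A, B, sigma=10.0)),
]
for name, kern in kernels:
    b, alpha = lssvr_fit(kern(X[train], X[train]), y[train], gamma=1.0)
    yhat = lssvr_predict(kern(X[test], X[train]), b, alpha)
    results[name] = (predictive_correlation(y[test], yhat), mse(y[test], yhat))
    print(f"{name}: r = {results[name][0]:.3f}, MSE = {results[name][1]:.3f}")
```

For ε-SVR, scikit-learn's `sklearn.svm.SVR` (with `kernel="linear"` or `kernel="rbf"` and its `C` and `epsilon` parameters) could stand in for `lssvr_fit`. In practice, the regularization and bandwidth hyperparameters would be tuned by cross-validation rather than fixed as they are here.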

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Alleles
  • Animals
  • Bayes Theorem
  • Cattle
  • Computational Biology / methods*
  • Genome-Wide Association Study
  • Genomics / methods
  • Genotype
  • Humans
  • Models, Statistical
  • Normal Distribution
  • Phenotype
  • Predictive Value of Tests
  • Regression Analysis
  • Support Vector Machine*
  • Triticum / genetics*