Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction

J Dairy Sci. 2012 Aug;95(8):4629-45. doi: 10.3168/jds.2011-4982.

Abstract

The single-step genomic BLUP (SSGBLUP) is a method that can integrate pedigree and genotypes at molecular markers in an optimal way. However, its present form (regular SSGBLUP) has a high computational cost (cubic in the number of genotyped animals) and may need extensive rewriting of genetic evaluation software. In this work, we propose several strategies to implement the single step in a simpler manner. The first one expands the single-step mixed-model equations to obtain equivalent equations from which the regular (including pedigree and records only) mixed-model equations are a subset. These new equations (unsymmetric extended SSGBLUP) have low computational cost, but require a nonsymmetric solver such as the biconjugate gradient stabilized method or successive underrelaxation, which is a variant of successive overrelaxation, with a relaxation factor lower than 1. In addition, we show a new derivation of the single-step method, which includes, as an extra effect, deviations from strictly polygenic breeding values. As a result, the same set of equations as above is obtained. We show that, whereas the new derivation shows apparent problems of nonpositive definiteness for certain covariance matrices, a proper equivalent model including imaginary effects always exists, leading always to the regular SSGBLUP mixed model equations. The system of equations can be solved (iterative SSGBLUP) by iterating between a pedigree and records evaluation and a genomic evaluation (each one solved by any iterative or direct method), whereas global iteration can use a block version of successive underrelaxation, which ensures convergence. The genomic evaluation can explicitly include marker or haplotype effects and possibly involve nonlinear (e.g., Bayesian by Markov chain Monte Carlo) methods. In a simulated example with 28,800 individuals and 1,800 genotyped individuals, all methods converged quickly to the same solutions. 
Using existing efficient methods with limited memory requirements to compute the products Gt and A₂₂t for any t (where G and A₂₂ are genomic and pedigree relationships for genotyped animals, and t is a vector), all strategies can be converted to iteration on data procedures for which the total number of operations is linear in the number of animals + number of genotyped animals × number of markers.
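
The claim about operation counts can be illustrated with a minimal sketch of the genomic product only. It assumes, since the abstract does not say how G is built, that G = ZZ'/k, with Z the matrix of 0/1/2 genotype codes centered by twice the allele frequency and k = 2Σp(1-p). The function names, the toy genotypes, and the allele frequencies are hypothetical. Under that assumption, Gt can be formed as Z(Z't)/k in two passes over the genotypes, so the cost is proportional to number of genotyped animals × number of markers and the square matrix G is never stored. The pedigree product with the relationship matrix of genotyped animals needs a different algorithm and is not shown.

```python
def centered_genotypes(M, p):
    """Center 0/1/2 genotype codes by twice the allele frequency (assumed coding)."""
    return [[g - 2.0 * pj for g, pj in zip(row, p)] for row in M]


def g_times_t(Z, t, k):
    """Return G t for G = Z Z' / k without forming the n x n matrix G."""
    n_markers = len(Z[0])
    # Pass 1: s = Z' t, a vector with one entry per marker.
    s = [0.0] * n_markers
    for row, ti in zip(Z, t):
        for j, z in enumerate(row):
            s[j] += z * ti
    # Pass 2: G t = Z s / k, one entry per genotyped animal.
    return [sum(z * sj for z, sj in zip(row, s)) / k for row in Z]


def g_times_t_dense(Z, t, k):
    """Reference version that builds G explicitly (quadratic memory in animals)."""
    n = len(Z)
    G = [[sum(a * b for a, b in zip(Z[i], Z[j])) / k for j in range(n)]
         for i in range(n)]
    return [sum(G[i][j] * t[j] for j in range(n)) for i in range(n)]


# Toy data (hypothetical): 3 genotyped animals, 4 markers.
M = [[0, 1, 2, 1],
     [1, 1, 0, 2],
     [2, 0, 1, 1]]
p = [0.5, 0.3, 0.4, 0.6]                      # assumed allele frequencies
k = 2.0 * sum(pj * (1.0 - pj) for pj in p)    # assumed scaling constant
Z = centered_genotypes(M, p)
t = [1.0, -2.0, 0.5]

fast = g_times_t(Z, t, k)
dense = g_times_t_dense(Z, t, k)
```

Here `fast` and `dense` agree up to rounding error. Only `g_times_t` would be called inside an iterative solver, once per iteration, with the current vector in place of `t`.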

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cattle / genetics*
  • Computer Simulation
  • Female
  • Genome*
  • Male
  • Models, Genetic*
  • Multifactorial Inheritance*
  • Pedigree*