Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle

J Dairy Sci. 2013 Oct;96(10):6716-29. doi: 10.3168/jds.2012-6237. Epub 2013 Aug 9.

Abstract

Feed efficiency is an economically important trait in the beef and dairy cattle industries. Residual feed intake (RFI) is a measure of partial efficiency that is independent of production level per unit of body weight. The objective of this study was to identify significant associations between single nucleotide polymorphism (SNP) markers and RFI in dairy cattle using the Random Forests (RF) algorithm. Genomic data included 42,275 SNP genotypes for 395 Holstein cows, whereas phenotypic measurements were daily RFI from 50 to 150 d postpartum. Residual feed intake was defined as the difference between an animal's feed intake and the average intake of its cohort, after adjustment for year and season of calving, year and season of measurement, age at calving nested within parity, days in milk, milk yield, body weight, and body weight change. Random Forests is a widely used machine-learning algorithm that has been applied to classification and regression problems. Analysis of the tree structures produced within RF identified the 25 most frequent pairwise SNP interactions, which were reported as possible epistatic interactions. The importance scores generated by RF take into account both main effects of variables and interactions between variables, and the absolute value of the most negative importance score can be used as the cutoff level for declaring SNP effects significant. When ranked by importance score, 188 SNP surpassed the threshold, among which 38 SNP were mapped to RFI quantitative trait loci (QTL) regions reported in a previous study in beef cattle, and 2 SNP were also detected by a genome-wide association study in beef cattle. The ratio of the number of SNP located in RFI QTL to the total number of SNP was significantly higher among the top 188 SNP chosen by RF than among all 42,275 whole-genome markers. Pathway analysis indicated that many of the top 188 SNP are in genomic regions that contain annotated genes with biological functions that may influence RFI.
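The cutoff rule described above can be illustrated with a minimal sketch. It assumes that "the most negative value" is applied as its absolute value, so that a SNP is declared significant when its importance score exceeds the magnitude of the most negative score. The function name, SNP labels, and scores below are hypothetical and are not data from the study.

```python
def select_by_negative_importance(importance):
    """Return (cutoff, selected SNP names sorted by decreasing importance).

    Assumption: the cutoff is the absolute value of the most negative
    importance score, and only scores strictly above it are kept.
    """
    most_negative = min(importance.values())
    if most_negative >= 0:
        raise ValueError("no negative importance score, so the rule gives no cutoff")
    cutoff = abs(most_negative)
    selected = sorted(
        (snp for snp, score in importance.items() if score > cutoff),
        key=lambda snp: importance[snp],
        reverse=True,
    )
    return cutoff, selected


# Made-up importance scores, for illustration only.
toy_importance = {
    "snp_a": 0.042,
    "snp_b": -0.011,
    "snp_c": 0.009,
    "snp_d": 0.015,
    "snp_e": -0.003,
    "snp_f": 0.0,
}

cutoff, selected = select_by_negative_importance(toy_importance)
print(cutoff, selected)
```

The reasoning behind such a rule is that negative importance scores can only arise from random variation, so their largest magnitude gives a rough scale for noise around zero.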
Frequently occurring ancestor-descendant SNP pairs within the trees can be explored in further studies as candidate epistatic interactions. The importance scores generated by RF can be used effectively to identify SNP with large additive or epistatic effects and informative QTL. The consistency between the results of our study and those of previous studies in beef cattle suggests that the genetic architecture of RFI in dairy cattle might be similar to that in beef cattle.
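One way to tally ancestor-descendant SNP pairs across a forest is sketched below. This is an illustration under stated assumptions, not the authors' implementation: each tree is represented as a hypothetical nested dictionary, pairs are ordered (ancestor, descendant), and each pair is counted at most once per tree. The helper names and the toy forest are invented for the example.

```python
from collections import Counter


def split(snp, left=None, right=None):
    """Build a hypothetical tree node that splits on `snp`; None marks a leaf."""
    return {"snp": snp, "left": left, "right": right}


def ancestor_descendant_pairs(node, ancestors=()):
    """Yield (ancestor_snp, descendant_snp) for every split below another split."""
    if node is None:  # reached a leaf, nothing to pair
        return
    snp = node["snp"]
    for ancestor in ancestors:
        if ancestor != snp:  # skip a SNP paired with itself
            yield (ancestor, snp)
    path = ancestors + (snp,)
    yield from ancestor_descendant_pairs(node["left"], path)
    yield from ancestor_descendant_pairs(node["right"], path)


def count_pairs(forest):
    """Count, over all trees, how many trees contain each ordered pair."""
    counts = Counter()
    for tree in forest:
        # set() ensures a pair contributes at most once per tree (assumption)
        counts.update(set(ancestor_descendant_pairs(tree)))
    return counts


# Toy forest of three small trees, for illustration only.
toy_forest = [
    split("snp_a", split("snp_b"), split("snp_c", split("snp_b"))),
    split("snp_a", split("snp_b")),
    split("snp_c", split("snp_b"), split("snp_a")),
]

pair_counts = count_pairs(toy_forest)
for pair, n_trees in pair_counts.most_common():
    print(pair, n_trees)
```

Sorting the resulting counts and keeping the top entries gives a ranked list of candidate pairs, analogous in spirit to reporting the most frequent pairwise interactions.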

Keywords: Random Forest; dairy cattle; residual feed intake; single nucleotide polymorphism.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animal Feed
  • Animals
  • Artificial Intelligence
  • Body Weight / genetics
  • Cattle
  • Eating / genetics*
  • Epistasis, Genetic*
  • Female
  • Genetic Markers
  • Genome-Wide Association Study
  • Genotype
  • Meat*
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci*
  • Random Allocation

Substances

  • Genetic Markers