Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data

J Dairy Sci. 2021 Jul;104(7):8107-8121. doi: 10.3168/jds.2020-19861. Epub 2021 Apr 15.


Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank reduction, and variable selection, and that can model nonlinear relations between phenotype and FTIR spectra, might offer improvements in predictive ability and model robustness. This study aimed to compare the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and penalized regression against PLS regression for predicting 3 phenotypes that differ in biological meaning and in their relationship with milk composition (i.e., phenotypes measurable directly and not directly in milk, reflecting different biological processes that can be captured using milk spectra) in Holstein-Friesian cattle under 2 cross-validation scenarios. The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split according to 2 cross-validation scenarios: samples-out random, in which the population was randomly split into 10 folds (8 folds for training and 1 fold each for validation and testing); and herd/date-out, in which the population was randomly assigned to training (70% herd), validation (10%), and testing (20% herd) subsets based on the herd and date on which the samples were collected. A random grid search for hyperparameter optimization was performed on the training subset, and the validation subset was used to assess the generalization of the prediction error. The trained model was then used to obtain the final predictions in the testing subset.
The grid search for penalized regression showed that the elastic net (EN) was the best regularization, with an increase in predictive ability of 5%. The performance of PLS (standard model) was compared against 2 machine learning techniques and penalized regression using 2 cross-validation scenarios. Machine learning methods showed a greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out cross-validation. In herd/date-out cross-validation, these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform the other methods in predictive ability, by around 4%, 1%, and 7% relative to EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar, and both differed statistically from that of the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, no significant differences in predictive ability were observed, owing to the large standard deviation observed for the predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions for different phenotypes in dairy cattle.

Keywords: dairy cattle; gradient boosting machine; milk spectra; phenotypic prediction.

MeSH terms

  • 3-Hydroxybutyric Acid
  • Animals
  • Cattle
  • Female
  • Machine Learning*
  • Milk*
  • Phenotype
  • Spectroscopy, Fourier Transform Infrared / veterinary

