Portable near-infrared hyperspectral technology fused with machine learning enables simultaneous prediction and classification of protein solubility in peas

Food Chem. 2026 Feb 28:503:147805. doi: 10.1016/j.foodchem.2025.147805. Epub 2025 Dec 31.

Abstract

Portable near-infrared (NIR) spectroscopy combined with multivariate analysis was used to develop models for quantifying and classifying pea protein solubility. The optimal quantitative model (D1-CARS-PLSR) used fourteen characteristic wavelengths, achieving a coefficient of determination for the prediction set (R²_P) of 0.93 and a root mean square error of prediction (RMSEP) of 0.03. Cross-year validation yielded a coefficient of determination (R²) of 0.99, with a mean absolute error (MAE) of 0.01 and a mean relative error (MRE) of 3.86 %. The optimal classification model (S-G-D2-UVE-RF) required only four characteristic wavelengths, attaining 100.00 % accuracy on the prediction set. The average precision, recall, and F1-score across all categories exceeded 0.90, alongside a five-fold cross-validation accuracy of 96.25 % and an overall cross-year validation accuracy of 91.30 %. In the permutation test, the average accuracy was 38.60 %, well below the cross-validation and cross-year accuracies reported above. This study enables rapid, non-destructive analysis of pea protein solubility, providing a theoretical foundation for food quality control.
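
The regression metrics quoted in the abstract (coefficient of determination, root mean square error, MAE, and MRE) can be illustrated with a minimal sketch. It assumes the common textbook definitions, which the abstract does not spell out, so the authors' exact formulas may differ (for example, R² is sometimes reported as a squared correlation instead). The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MRE (%) for paired reference/predicted values.

    Assumed definitions (not confirmed by the abstract):
      R^2  = 1 - SS_res / SS_tot
      RMSE = sqrt(mean(residual^2))
      MAE  = mean(|residual|)
      MRE  = 100 * mean(|residual| / |reference|)
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    if any(t == 0 for t in y_true):
        raise ValueError("MRE is undefined when a reference value is zero")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "mre_percent": 100.0 * sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n,
    }


if __name__ == "__main__":
    # Hypothetical solubility values, for illustration only.
    reference = [0.20, 0.40, 0.60, 0.80]
    predicted = [0.22, 0.38, 0.63, 0.79]
    for name, value in regression_metrics(reference, predicted).items():
        print(f"{name}: {value:.4f}")
```

When the same function is applied to a prediction set, the RMSE value corresponds to what the abstract calls RMSEP; applied to samples from a different harvest year, it gives the cross-year figures.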

Keywords: Cross-validation; Eigenwavelengths; Machine learning (ML); Near-infrared (NIR) spectroscopy; Pea protein; Solubility; Varieties classification.

Publication types

  • Evaluation Study

MeSH terms

  • Machine Learning*
  • Pea Proteins* / chemistry
  • Pisum sativum* / chemistry
  • Plant Proteins* / chemistry
  • Solubility
  • Spectroscopy, Near-Infrared / instrumentation
  • Spectroscopy, Near-Infrared / methods

Substances

  • Pea Proteins
  • Plant Proteins