Portable near-infrared (NIR) spectroscopy combined with multivariate analysis was used to develop models for quantifying and classifying pea protein solubility. The optimal quantitative model (D1-CARS-PLSR) used fourteen characteristic wavelengths and achieved a coefficient of determination for the prediction set (RP²) of 0.93 and a root mean square error of prediction (RMSEP) of 0.03. Cross-year validation of this model yielded a coefficient of determination (R²) of 0.99, a mean absolute error (MAE) of 0.01, and a mean relative error (MRE) of 3.86 %. The optimal classification model (S-G-D2-UVE-RF) required only four characteristic wavelengths and attained 100.00 % accuracy on the prediction set. Its average precision, recall, and F1-score across all categories exceeded 0.90, with a five-fold cross-validation accuracy of 96.25 % and an overall cross-year validation accuracy of 91.30 %. In a permutation test, the average accuracy was only 38.60 %, well below the model's accuracy, which suggests that the classification performance was not due to chance. This study enables rapid, non-destructive analysis of pea protein solubility and provides a theoretical foundation for food quality control.
Keywords: Cross-validation; Eigenwavelengths; Machine learning (ML); Near-infrared (NIR) spectroscopy; Pea protein; Solubility; Varieties classification.
Copyright © 2024. Published by Elsevier Ltd.
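
For readers less familiar with the regression metrics reported in the abstract (R², RMSEP, MAE, MRE), the following is a minimal sketch of how they are commonly computed from reference and predicted solubility values. It is not the authors' code: the function name `regression_metrics`, the example arrays, and the exact definitions (R² as 1 minus the ratio of residual to total sum of squares, MRE as the mean of absolute errors divided by the reference values) are assumptions, and the paper may define them differently.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MRE (%) for paired reference/predicted values.

    Hypothetical helper for illustration; the definitions are common
    conventions, not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    ss_res = np.sum(residuals ** 2)                   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        # Called RMSEP when evaluated on the prediction set
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        # Relative error is undefined for reference values equal to zero
        "mre_percent": float(100.0 * np.mean(np.abs(residuals) / np.abs(y_true))),
    }


# Made-up values purely for illustration (not data from the study)
reference = [0.20, 0.40, 0.60, 0.80]
predicted = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(reference, predicted)
```

Applied to a prediction set, the `rmse` entry corresponds to RMSEP; applied to samples from a different harvest year, the same function would give the cross-year R², MAE, and MRE, under the assumed definitions.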