The traditional Soxhlet method of lipid extraction is destructive and cannot measure the oil content (OC) of a single olive. Because color and the near-infrared (NIR) spectrum are key parameters for building an oil estimation model (EM), this study grouped olives with similar color and NIR spectra and built the EM from the OC obtained by Soxhlet extraction for each cluster of similar olives. The objective was to estimate the OC of individual olives from clusters of similar color and NIR spectra over two seasons. The study was performed with Arbequina olives in 2016 and 2017. The cluster descriptor of each olive consisted of the three channels of the c1c2c3 color model plus 11 reflectance points between 1710 and 1735 nm, normalized with the Z-score. Clusters of similar color and NIR spectrum were formed with the k-means++ algorithm while keeping enough olives per cluster to perform the Soxhlet analysis of OC, which served as the reference value for the EM. The EM was trained with a Support Vector Machine and tested with leave-one-out cross-validation in different training-testing combinations. The best EM predicted the OC with a deviation of 6% and 13% with respect to the real value when a season was tested against itself and against the other season, respectively. The use of clustering in the EM is discussed.
Keywords: infrared spectroscopy; olive quality; support vector machine; visible image.
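
As an illustration of the clustering step summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It standardizes a 14-value descriptor per olive (3 color channels plus 11 reflectance points) with the Z-score and groups the olives with k-means++ seeding followed by standard Lloyd iterations. The data are synthetic, the function names (`zscore_columns`, `kmeanspp_init`, `kmeans`, `synthetic_olives`) are hypothetical, and the choice of two clusters is an assumption made only for the example. The Support Vector Machine training and the leave-one-out cross-validation are not shown.

```python
import random


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd iterations started from k-means++ centers."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def synthetic_olives(n_per_group=20, seed=1):
    """Synthetic stand-in for real measurements: two groups of olives,
    each with 14 descriptor values (3 color channels + 11 reflectances)."""
    rng = random.Random(seed)
    rows = []
    for offset in (0.0, 1.0):
        for _ in range(n_per_group):
            rows.append([offset + rng.gauss(0.0, 0.05) for _ in range(14)])
    return rows


descriptors = zscore_columns(synthetic_olives())
labels, centers = kmeans(descriptors, k=2)
cluster_sizes = [labels.count(j) for j in range(2)]
print(cluster_sizes)
```

In a pipeline like the one described, the cluster sizes would be checked before the destructive analysis, since each cluster must contain enough olives for a Soxhlet measurement; the measured OC of a cluster would then serve as the reference value for the olives in it.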