Using spatial information for evaluating the quality of prediction maps from hyperspectral images: A geostatistical approach

Anal Chim Acta. 2019 Oct 24;1077:116-128. doi: 10.1016/j.aca.2019.05.067. Epub 2019 May 30.


Applying a calibration model to hyperspectral (HS) images is of great interest because it produces images of chemical or physical properties. HS imaging is widely used in this way in the food processing industry for monitoring product quality and for process control. In this context, one of the main difficulties in applying regression models to HS images is evaluating the error of the resulting predictions, since in a proximal imaging setup the pixel size is usually much smaller than the area required to obtain a wet chemical reference. Moreover, the choice of regression model parameters, such as the number of latent variables (LV) in a partial least squares (PLS) model, can modify the appearance of the prediction maps. The objective of this work is to propose an approach based on geostatistical indices that uses the spatial information of prediction maps to support the evaluation of regression models applied to HS images. This work establishes a theoretical connection between linear regression model performance estimates and the spatial decomposition of variance in prediction maps when the ground truth to be estimated is spatially structured. The approach was tested on a simulated dataset and two real case studies. Geostatistical indices of the prediction maps were compared with model performance metrics for PLS models with an increasing number of LV. The theoretical framework was confirmed by the results on the simulated dataset. In particular, the evolution of the nugget effect, C0, corresponded to the evolution of the random error of the model. Conversely, the error term related to the slope of the model corresponded to the evolution of the structured variance observed in the prediction maps.
In the real case studies, geostatistical indices extracted from the prediction maps made it possible to quantitatively evaluate the spatial structure of the estimations and to complement the Root Mean Square Error of Cross Validation (RMSECV) when choosing the optimal number of LV for the model. The main advantage of this approach is that it does not require ground truth values. It could be used as a source of information to support the choice of optimal calibration parameters, such as the number of latent variables, or the choice of pre-treatments, complementing the traditional visual inspection of prediction maps with quantitative and objective metrics.
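
To illustrate the idea that the nugget effect C0 of a prediction map tracks the random error of a model, the following is a minimal sketch rather than the authors' procedure. It assumes a prediction map stored as a 2D NumPy array, computes an empirical semivariogram from pixel pairs along rows and columns, and approximates C0 as the intercept of a straight line fitted to the first few lags. The function names `empirical_semivariogram` and `nugget_estimate`, the synthetic map, and the linear extrapolation are all assumptions made for illustration; a fitted variogram model (spherical, exponential, etc.) would normally be used instead.

```python
import numpy as np

def empirical_semivariogram(pred_map, max_lag=10):
    """Semivariance gamma(h) for integer pixel lags h = 1..max_lag,
    pooling pixel pairs separated along rows and along columns."""
    z = np.asarray(pred_map, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.empty(max_lag)
    for i, h in enumerate(lags):
        d_rows = z[h:, :] - z[:-h, :]   # pairs h pixels apart vertically
        d_cols = z[:, h:] - z[:, :-h]   # pairs h pixels apart horizontally
        sq = np.concatenate([d_rows.ravel() ** 2, d_cols.ravel() ** 2])
        gamma[i] = 0.5 * sq.mean()
    return lags, gamma

def nugget_estimate(lags, gamma, n_points=3):
    """Crude C0 (assumption): intercept at h = 0 of a straight line
    fitted to the first n_points lags, clipped at zero."""
    slope, intercept = np.polyfit(lags[:n_points], gamma[:n_points], 1)
    return max(float(intercept), 0.0)

# Synthetic stand-in for a prediction map: a smooth, spatially structured
# field plus white noise that mimics the random error of a model.
yy, xx = np.mgrid[0:100, 0:100]
structured = np.sin(2 * np.pi * xx / 100) + np.cos(2 * np.pi * yy / 100)

rng = np.random.default_rng(0)
for noise_sd in (0.1, 0.3):
    noisy_map = structured + rng.normal(0.0, noise_sd, structured.shape)
    lags, gamma = empirical_semivariogram(noisy_map)
    c0 = nugget_estimate(lags, gamma)
    print(f"noise variance = {noise_sd**2:.3f}, estimated C0 = {c0:.3f}")
```

In this toy setting the estimated C0 grows with the variance of the added noise, which is the behaviour the abstract describes for the random error of the model. Applied to prediction maps from PLS models with an increasing number of LV, the same computation would give one C0 value per model without requiring reference values.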

Keywords: Geostatistics; Hyperspectral; Image; Overfitting; Regression; Spatial.