Soil contamination by potentially toxic elements (PTEs) is intensifying under increasing industrialization. Thus, the ability to efficiently delineate contaminated sites is crucial. Visible-near infrared (vis-NIR: 350-2500 nm) and X-ray fluorescence (XRF: 0.02-41.08 keV) spectroscopic techniques have attracted considerable attention for the assessment of PTEs. Recently, the use of fused vis-NIR and XRF spectroscopy, which exploits the complementary information of the two sensors, has also been increasing. Moreover, data manipulation methods, including feature selection approaches, affect prediction performance. This study investigated the feasibility of using single and fused vis-NIR and XRF spectra, combined with feature selection algorithms, for the assessment of key soil PTEs. Soil samples were collected from one of the most heavily polluted areas of the Czech Republic and scanned using laboratory vis-NIR and XRF spectrometers. A univariate filter (UF) and a genetic algorithm (GA) were used to select the bands of greatest importance for PTE prediction. Support vector machine (SVM) models were then trained on the full-range and feature-selected spectra of the single sensors and of their fusion. XRF spectra alone (primarily GA-selected) performed better than both single vis-NIR spectra and fused spectral data for the prediction of PTEs. Moreover, models derived from the fused data set (particularly the GA-selected spectra) were more accurate than those based on vis-NIR spectra alone. In general, the results suggest that GA-selected spectra obtained from the single XRF spectrometer (for As and Pb) and from the fusion of vis-NIR and XRF (for Pb) are promising for accurate quantitative estimation of these PTEs.
Keywords: XRF spectroscopy; data fusion; feature selection; genetic algorithm; soil contamination; univariate filter; vis–NIR spectroscopy.
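
As an illustration of the fusion and univariate-filter steps summarized in the abstract, the following minimal sketch concatenates the vis-NIR and XRF spectra of each sample and then ranks the fused bands by their absolute Pearson correlation with the measured PTE concentration. The abstract does not state which scoring criterion the UF used, so the correlation score is an assumption, as are the function names (`fuse`, `univariate_filter`) and the toy numbers, which are invented for demonstration and are not data from the study. The GA selection and SVM training steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant band carries no information
    return sxy / sqrt(sxx * syy)


def fuse(vis_nir, xrf):
    """Simple fusion (assumed): concatenate both spectra per sample."""
    return [v + x for v, x in zip(vis_nir, xrf)]


def univariate_filter(spectra, target, k):
    """Return the indices of the k bands with the highest |r| to the target."""
    n_bands = len(spectra[0])
    scores = [
        abs(pearson([row[j] for row in spectra], target))
        for j in range(n_bands)
    ]
    ranked = sorted(range(n_bands), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy, invented data: 4 samples, 2 vis-NIR bands and 2 XRF channels each.
vis_nir = [[0.10, 0.50], [0.20, 0.40], [0.30, 0.60], [0.40, 0.50]]
xrf = [[1.0, 7.0], [2.1, 3.0], [2.9, 9.0], [4.2, 5.0]]
pb = [10.0, 20.0, 30.0, 40.0]  # hypothetical Pb concentrations

fused = fuse(vis_nir, xrf)
selected = univariate_filter(fused, pb, k=2)
print(selected)  # indices into the fused (vis-NIR + XRF) band vector
```

The selected indices refer to positions in the fused vector, so the first vis-NIR band has index 0 and the first XRF channel follows the last vis-NIR band. In practice the two sensors have very different scales, so some form of per-block scaling before fusion would likely be needed; that step is omitted here.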