Development of a Soil Organic Matter Content Prediction Model Based on Supervised Learning Using Vis-NIR/SWIR Spectroscopy

Sensors (Basel). 2022 Jul 8;22(14):5129. doi: 10.3390/s22145129.


In the current context of anthropogenic climate change, securing carbon credits is becoming increasingly important worldwide. Topsoil is the terrestrial ecosystem component with the largest carbon sequestration capacity. Soil organic matter (SOM) is composed mostly of organic carbon and can be affected by rainfall, cultivation, and pollutant inflow, so SOM content must be predicted through regular monitoring to secure a stable carbon sink. In addition, topsoil in the Republic of Korea is vulnerable to erosion owing to climate, topography, and other natural and anthropogenic causes, a problem that is also serious worldwide. To mitigate topsoil erosion, establish an efficient topsoil management system, and maximize topsoil utilization, data on topsoil environmental factors and topsoil composition must be gathered and compiled into a database. Recent studies have used spectroscopic techniques to measure topsoil composition rapidly. In this study, we investigated the spectral characteristics of topsoil from the four major rivers of the Republic of Korea and developed a machine learning-based SOM content prediction model using spectroscopic techniques. A total of 138 topsoil samples were collected from the waterfront area and drinking water protection zone of each river. Reflectance spectra were measured three times over the 350 to 2500 nm wavelength range using a spectroradiometer (Fieldspec4, ASD Inc., Alpharetta, GA, USA) with an exposure time of 136 ms. Partial least squares regression (PLSR) and support vector regression (SVR) were used to predict SOM content, and the performance of each model was evaluated using the coefficient of determination (R²) and the root mean square error (RMSE). For the total topsoil dataset, the SOM content prediction model achieved R² = 0.706.
Using spectroscopy, we identified the wavelengths important for SOM in topsoil and confirmed that SOM content can be predicted. These results could support the construction of a national topsoil database.

Keywords: partial least squares regression; reflectance spectroscopy; soil organic matter; support vector machine regression; topsoil.

MeSH terms

  • Carbon
  • Climate Change
  • Ecosystem*
  • Soil* / chemistry
  • Supervised Machine Learning


Substances

  • Soil
  • Carbon