Spatial modeling of land subsidence using machine learning models and statistical methods

Environ Sci Pollut Res Int. 2022 Apr;29(19):28866-28883. doi: 10.1007/s11356-021-18037-6. Epub 2022 Jan 6.

Abstract

Land subsidence causes many problems every year, damaging residential areas and agricultural land. The purpose of this study is to prepare a land subsidence susceptibility map for the central and eastern plains of Fars province, Iran, using statistical and machine learning models. First, land subsidence locations in the study region were recorded in an extensive survey using the global positioning system (GPS), and a map of the spatial distribution of subsidence was prepared. To build and evaluate the models, the data were partitioned into calibration (70%) and testing (30%) datasets. Next, maps of the factors affecting land subsidence were prepared in raster format from basic information (geological and topographic maps and satellite images), and the relationship between land subsidence locations and these factors was examined across the study area. The factors were slope percentage, slope aspect, distance from the road, distance from the river, land use, plan curvature, topographic wetness index, geology (lithological units), distance from the faults, and groundwater level changes. Multicollinearity among the independent variables was assessed using tolerance and the variance inflation factor (VIF), and the factors were prioritized using the random forest (RF) algorithm. The results indicated that the most important factors affecting land subsidence were, in order, groundwater level changes, land use, height, distance from the fault, distance from the river, and topographic wetness index. For further analysis, a land subsidence susceptibility zoning map was prepared using logistic regression (LR), random forest (RF), boosting regression tree (BRT), and support vector machine (SVM) models, and the results were evaluated.
The evaluation indicated that all four models predict land subsidence accurately: the boosting regression tree and logistic regression models achieved high accuracy (0.873 and 0.853, respectively), and the random forest and support vector machine models achieved very high accuracy (0.953 and 0.926, respectively). These findings indicate that machine learning techniques and the resulting maps can be applied to land use planning, groundwater management, and management of the study area for future agricultural activities.
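
The tolerance and VIF screening mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the software or thresholds used, and the factor names and synthetic values below are hypothetical stand-ins for raster values sampled at point locations. The sketch relies on the standard identity that the VIF of factor j equals the j-th diagonal entry of the inverse correlation matrix, with tolerance defined as 1/VIF.

```python
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """Return (VIF list, tolerance list), one entry per input column."""
    inverse = invert(correlation_matrix(columns))
    vif = [inverse[j][j] for j in range(len(columns))]
    tolerance = [1.0 / v for v in vif]
    return vif, tolerance


# Synthetic, hypothetical factor values (NOT data from the study).
rng = random.Random(0)
slope = [rng.gauss(0, 1) for _ in range(200)]
# Built to correlate with slope so that its VIF is visibly inflated.
twi = [-0.8 * s + 0.6 * rng.gauss(0, 1) for s in slope]
# Independent of the other two, so its VIF should stay close to 1.
dist_river = [rng.gauss(0, 1) for _ in range(200)]

vif, tolerance = vif_and_tolerance([slope, twi, dist_river])
for name, v, t in zip(["slope", "twi", "dist_river"], vif, tolerance):
    print(f"{name}: VIF={v:.2f}, tolerance={t:.2f}")
```

A VIF near 1 (tolerance near 1) suggests a factor is largely independent of the others. Commonly used rules of thumb flag a VIF above 5 or 10, but the cutoff applied in the study is not given in the abstract.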

Keywords: Boosting regression tree; Land subsidence; Logistic regression; Random forest; Support vector machine.

MeSH terms

  • Geographic Information Systems
  • Geology
  • Groundwater* / analysis
  • Machine Learning
  • Rivers