Mapping the geogenic radon potential for Germany by machine learning

Sci Total Environ. 2021 Feb 1;754:142291. doi: 10.1016/j.scitotenv.2020.142291. Epub 2020 Sep 14.


The radioactive gas radon (Rn) is considered an indoor air pollutant due to its detrimental effects on human health. Exposure to Rn is among the most important causes of lung cancer after tobacco smoking. The dominant source of indoor Rn is the ground beneath the house. The geogenic Rn potential (GRP), a function of soil gas Rn concentration and soil gas permeability, quantifies what "earth delivers in terms of Rn" and represents a hazard indicator for elevated indoor Rn concentration. In this study, we aim to develop an improved, spatially continuous GRP map based on 4448 field measurements of GRP distributed across Germany. We fitted three machine learning algorithms (multivariate adaptive regression splines, random forest and support vector machines) using 36 candidate predictors. Predictor selection, hyperparameter tuning and performance assessment were conducted using a spatial cross-validation in which the data were iteratively left out by spatial blocks of 40 km × 40 km. This procedure counteracts the effect of spatial auto-correlation in predictor and response data and minimizes the dependence between training and test data. The spatially cross-validated performance statistics revealed that random forest provided the most accurate predictions. The predictors selected as informative reflect geology, climate (temperature, precipitation and soil moisture), soil hydraulic properties, soil physical properties (field capacity, coarse fraction) and soil chemical properties (potassium and nitrogen concentration). Model interpretation techniques such as predictor importance as well as partial and spatial dependence plots confirmed the hypothesized dominant effect of geology on GRP, but also revealed significant contributions of the other predictors. Partial and spatial dependence plots gave further valuable insight into the quantitative predictor-response relationship and its spatial distribution.
A comparison with a previous version of the German GRP map using 1359 independent test data points indicates a significantly better performance of the random forest-based map.
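
The spatial cross-validation described above, in which whole 40 km × 40 km blocks are held out together, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `block_id` and `spatial_block_folds`, the use of projected coordinates in metres, the random assignment of blocks to folds and the number of folds are all assumptions made for illustration. Only the 40 km block size comes from the abstract.

```python
import math
import random
from collections import defaultdict

BLOCK_SIZE_M = 40_000  # 40 km x 40 km blocks (block size taken from the abstract)


def block_id(x_m, y_m, block_size_m=BLOCK_SIZE_M):
    """Return the grid-block index of a point.

    Assumes x_m and y_m are projected coordinates in metres (hypothetical choice).
    """
    return (math.floor(x_m / block_size_m), math.floor(y_m / block_size_m))


def spatial_block_folds(coords, n_folds=5, block_size_m=BLOCK_SIZE_M, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole blocks are held out together.

    Assumption: blocks are assigned to folds at random. n_folds should not
    exceed the number of occupied blocks, otherwise some test folds are empty.
    """
    # Group sample indices by the block they fall into.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[block_id(x, y, block_size_m)].append(i)

    # Shuffle the blocks reproducibly, then deal them out to the folds.
    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)

    for k in range(n_folds):
        test_blocks = set(block_keys[k::n_folds])
        test_idx = sorted(i for b in test_blocks for i in blocks[b])
        train_idx = sorted(
            i for b in block_keys if b not in test_blocks for i in blocks[b]
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Synthetic points in a 200 km x 200 km square (illustrative data only).
    rng = random.Random(42)
    points = [(rng.uniform(0, 200_000), rng.uniform(0, 200_000)) for _ in range(200)]
    for fold, (train, test) in enumerate(spatial_block_folds(points, n_folds=5)):
        print(f"fold {fold}: {len(train)} training points, {len(test)} test points")
```

Because every point in a held-out block is removed from training at once, nearby and therefore spatially auto-correlated measurements cannot appear on both sides of a split. This is the property the abstract relies on when it says the procedure minimizes the dependence between training and test data.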

Keywords: Digital soil mapping; Geogenic radon potential; Machine learning; Partial dependence; Soil radon; Spatial cross-validation.