Bayesian machine learning ensemble approach to quantify model uncertainty in predicting groundwater storage change

Sci Total Environ. 2021 May 15;769:144715. doi: 10.1016/j.scitotenv.2020.144715. Epub 2021 Jan 20.


Agricultural water demand, groundwater extraction, surface water delivery, and climate have complex nonlinear relationships with groundwater storage in agricultural regions. As an alternative to elaborate, computationally intensive physical models, machine learning methods are often adopted as surrogates to capture such relationships because of their high computational efficiency. However, relying on a single machine learning model tends to underestimate prediction uncertainty and can yield poor accuracy. This study presents a novel machine learning-based groundwater ensemble modeling framework, combined with a Bayesian model averaging approach, to predict groundwater storage change and improve overall predictive reliability. Three machine learning models were developed: an artificial neural network, a support vector machine, and a response surface regression. To explicitly quantify uncertainty arising from machine learning model parameters and structure, Bayesian model averaging is employed to produce a forecast distribution associated with each machine learning prediction. Model weights and variances are obtained from model performance to construct the ensemble models. The individual and Bayesian model averaging ensemble models are then applied, evaluated, and validated at subregional and regional scales in an overdrafted agricultural region, the San Joaquin River Basin, using independent training and testing datasets. Results show that the machine learning models have remarkable predictive capability, achieving higher computational efficiency without sacrificing accuracy. Compared to a single-model approach, the ensemble model produces consistently reliable predictions across the basin, yet it does not always outperform the best model in the ensemble.
Additionally, model results suggest that groundwater pumping for agricultural irrigation is the primary driving force of groundwater storage change across the region. The modeling framework can serve as an alternative approach to simulating groundwater response, especially in agricultural regions where a lack of subsurface data hinders physically based modeling.
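
To make the Bayesian model averaging step concrete, the following is a minimal sketch of how model weights and variances might be estimated from training performance and then combined into an ensemble forecast. The abstract does not state the fitting procedure, so this sketch assumes Gaussian forecast distributions centered on each model's prediction and an expectation-maximization (EM) fit, which is a common choice for BMA but not necessarily the one used in the study. The function names `fit_bma` and `bma_predict` and the synthetic inputs are hypothetical.

```python
import numpy as np


def fit_bma(preds, obs, n_iter=500, tol=1e-8):
    """Estimate BMA weights and per-model variances by EM.

    Assumption: each model's forecast distribution is Gaussian,
    centered on that model's prediction.
    preds: array (n_models, n_samples) of training predictions
    obs:   array (n_samples,) of observed values
    """
    preds = np.asarray(preds, dtype=float)
    obs = np.asarray(obs, dtype=float)
    k, _ = preds.shape
    sq_err = (obs - preds) ** 2                      # (k, n)
    w = np.full(k, 1.0 / k)                          # start from equal weights
    var = np.maximum(sq_err.mean(axis=1), 1e-12)     # start from each model's MSE
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each model for each observation
        dens = np.exp(-0.5 * sq_err / var[:, None]) / np.sqrt(2 * np.pi * var[:, None])
        mix = w[:, None] * dens
        total = mix.sum(axis=0) + 1e-300             # guard against division by zero
        z = mix / total
        # M-step: update weights and variances from the responsibilities
        w = z.mean(axis=1)
        var = (z * sq_err).sum(axis=1) / np.maximum(z.sum(axis=1), 1e-300)
        var = np.maximum(var, 1e-12)
        # Stop when the log-likelihood no longer improves
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, var


def bma_predict(preds, w, var):
    """Return the BMA mean and total predictive variance per sample."""
    preds = np.asarray(preds, dtype=float)
    mean = w @ preds                                 # weighted ensemble mean
    within = w @ var                                 # within-model variance
    between = w @ (preds - mean) ** 2                # between-model spread
    return mean, within + between
```

In this sketch, `preds` would hold the training-period outputs of the three models (for example, the neural network, support vector machine, and response surface regression), and the returned variance separates into a within-model term and a between-model term, which is how BMA accounts for both parameter and structural uncertainty.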

Keywords: Bayesian model averaging; Groundwater storage change; Irrigation pumping; Machine learning ensemble; Uncertainty quantification.