Hydrogeochemistry and prediction of arsenic contamination in groundwater of Vehari, Pakistan: comparison of artificial neural network, random forest and logistic regression models

Environ Geochem Health. 2023 Dec 26;46(1):14. doi: 10.1007/s10653-023-01782-7.


Arsenic contamination of groundwater occurs in various parts of the world due to anthropogenic and natural sources, adversely affecting human health and ecosystems. The current study examines the hydrogeochemistry of groundwater containing elevated arsenic (As), predicts As levels in groundwater, and determines the suitability of groundwater for drinking in the Vehari district, Pakistan. Four hundred groundwater samples from the study region were collected for physicochemical analysis. As levels in the groundwater samples ranged from 0.1 to 52 μg/L, with an average of 11.64 μg/L, and 43.5% of the samples exceeded the WHO 2022 recommended limit of 10 μg/L for drinking purposes. Ion-exchange processes and the adsorption of ions significantly impacted the concentration of As. HCO3- and Na+ are the dominant ions in the study area, and the water types of the samples were CaHCO3, mixed CaMgCl, and CaCl, demonstrating that rock-water contact significantly impacts hydrochemical behavior. The geochemical modeling indicated negative saturation indices for calcium carbonate and other salt minerals, including aragonite, calcite, dolomite, and halite. The dissolution mechanism suggested that these minerals might have implications for the mobilization of As in groundwater. Principal component analysis (PCA) revealed a combination of human-induced and natural sources of contamination. Artificial neural networks (ANN), random forest (RF), and logistic regression (LR) were used to predict As in the groundwater. The data were divided into two parts for statistical analysis: 80% for training and 20% for testing. The most significant input variables for As prediction were determined using Chi-squared analysis. The area under the receiver operating characteristic curve and the confusion matrix were used to evaluate the models; the RF, ANN, and LR accuracies were 0.89, 0.85, and 0.76, respectively.
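The evaluation steps described above (an 80/20 split, then a confusion matrix and accuracy) can be illustrated with a minimal sketch. It assumes a binary target, with a sample labelled 1 when its As concentration exceeds the 10 μg/L WHO limit; the abstract does not state how the target was encoded, so that labelling, along with every function name below, is a hypothetical choice made for illustration rather than the authors' implementation.

```python
import random

WHO_LIMIT_UG_L = 10.0  # WHO guideline value cited in the abstract


def label_exceedance(as_ug_l):
    """Return 1 if the As concentration exceeds the WHO limit, else 0 (assumed encoding)."""
    return 1 if as_ug_l > WHO_LIMIT_UG_L else 0


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["fp" if pred == 1 else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Illustrative labels only; these are not data from the study.
example_counts = confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
example_accuracy = accuracy(example_counts)
```

With 400 samples, `train_test_split` yields 320 training rows and 80 test rows, matching the proportions reported in the abstract.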
Permutation feature importance and mean decrease in impurity identified ten parameters that influence groundwater arsenic in the study region: F-, Fe2+, K+, Mg2+, Ca2+, Cl-, SO42-, NO3-, HCO3-, and Na+. The present study shows that RF is the best of the three models for predicting groundwater As contamination in the research area. The water quality index classified 161 samples as poor water and 121 samples as unsuitable for drinking. Establishing effective strategies and regulatory measures is imperative in Vehari to ensure the sustainability of groundwater resources.
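Permutation feature importance, one of the two ranking methods named in the abstract, measures how much a model's score drops when the values of a single input are shuffled. The sketch below is a generic illustration of that idea using accuracy as the score; the function names, the toy data, and the stand-in classifier are all hypothetical, and the abstract does not say which software or scoring metric the authors used.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled_col = [row[col] for row in X]
            rng.shuffle(shuffled_col)
            # Copy each row with only the chosen column replaced.
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled_col)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 drives the label, column 1 is unrelated.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_model(row):
    """Stand-in classifier that only looks at column 0."""
    return 1 if row[0] > 0.5 else 0


scores = permutation_importance_sketch(toy_model, X, y)
```

Because the stand-in classifier ignores column 1, shuffling that column never changes a prediction and its importance is exactly zero, while shuffling column 0 can only lower accuracy. In a study like this one, the columns would be the measured ions and the model would be the fitted RF, ANN, or LR.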

Keywords: Artificial neural network; Groundwater arsenic; Logistic regression; Machine learning; Random forest; Water quality index.

MeSH terms

  • Arsenic*
  • Ecosystem
  • Groundwater*
  • Humans
  • Ions
  • Logistic Models
  • Neural Networks, Computer
  • Pakistan
  • Random Forest


Substances

  • Arsenic
  • Ions