Sci Total Environ. 2019 Mar 10;655:512-519.
doi: 10.1016/j.scitotenv.2018.11.022. Epub 2018 Nov 5.

Modeling Groundwater Nitrate Exposure in Private Wells of North Carolina for the Agricultural Health Study

Free PMC article

Kyle P Messier et al. Sci Total Environ. 2019.

Abstract

Unregulated private wells in the United States are susceptible to many groundwater contaminants. Ingestion of nitrate, the most common anthropogenic private well contaminant in the United States, can lead to the endogenous formation of N-nitroso compounds, which are known human carcinogens. In this study, we expand upon previous efforts to model private well groundwater nitrate concentration in North Carolina by developing multiple machine learning models and testing them on out-of-sample predictions. Our purpose was to develop exposure estimates in unmonitored areas for use in the Agricultural Health Study (AHS) cohort. Using approximately 22,000 private well nitrate measurements in North Carolina, we trained and tested continuous models including a censored maximum likelihood-based linear model, random forest, gradient boosted machine, support vector machine, neural networks, and kriging. Continuous nitrate models had low predictive performance (R² < 0.33), so multiple random forest classification models were also trained and tested. The final classification approach predicted three categories (<1 mg/L, 1-5 mg/L, and ≥5 mg/L) using a random forest model with 58 variables, tuned to maximize Cohen's kappa statistic. The final model had an overall accuracy of 0.75, with high specificity for the two higher categories and high sensitivity for the lowest category. The results will be used for the categorical prediction of private well nitrate for AHS cohort participants who reside in North Carolina.

Keywords: Agricultural Health Study; Exposure assessment; Groundwater contamination; Nitrate; Random Forest.
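
The evaluation metrics named in the abstract (overall accuracy, Cohen's kappa, and per-category sensitivity and specificity) can be illustrated with a minimal Python sketch. This is not the authors' code, and it does not train a random forest; it only shows how the metrics are computed for three nitrate categories. The function names, the handling of values exactly at 1 and 5 mg/L, and the ten example wells are all assumptions made for illustration, not data from the study.

```python
from collections import Counter

# Category labels mirror the abstract's three classes (mg/L).
LABELS = ("<1", "1-5", ">=5")


def nitrate_category(mg_per_l):
    """Bin a nitrate concentration into one of the three classes.

    Assumption: a value of exactly 1 falls in "1-5" and exactly 5 in ">=5".
    """
    if mg_per_l < 1.0:
        return "<1"
    if mg_per_l < 5.0:
        return "1-5"
    return ">=5"


def accuracy(observed, predicted):
    """Fraction of samples whose predicted class equals the observed class."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def cohens_kappa(observed, predicted, labels=LABELS):
    """Cohen's kappa: agreement beyond what the class marginals give by chance."""
    n = len(observed)
    p_o = accuracy(observed, predicted)
    obs_marginal = Counter(observed)
    pred_marginal = Counter(predicted)
    p_e = sum(obs_marginal[c] * pred_marginal[c] for c in labels) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


def sensitivity_specificity(observed, predicted, label):
    """One-vs-rest sensitivity and specificity for a single class."""
    tp = fn = fp = tn = 0
    for o, p in zip(observed, predicted):
        if o == label:
            if p == label:
                tp += 1
            else:
                fn += 1
        else:
            if p == label:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: ten made-up well measurements and made-up predictions.
example_mg_per_l = [0.2, 0.5, 0.8, 0.3, 2.0, 3.5, 6.0, 0.1, 1.2, 7.5]
example_observed = [nitrate_category(x) for x in example_mg_per_l]
example_predicted = ["<1", "<1", "<1", "<1", "<1",
                     "1-5", ">=5", "<1", "1-5", "1-5"]

if __name__ == "__main__":
    print("accuracy:", accuracy(example_observed, example_predicted))
    print("kappa:", round(cohens_kappa(example_observed, example_predicted), 3))
    for label in LABELS:
        sens, spec = sensitivity_specificity(
            example_observed, example_predicted, label)
        print(label, "sensitivity:", sens, "specificity:", spec)
```

Kappa is a plausible tuning target when categories are imbalanced, because a model that always predicts the most common category can reach high accuracy while its kappa stays at zero.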

Figures

Figure 1.
Variable importance for top 20 most important variables in random forest continuous model for nitrate (all predictor variables are defined in the SI).
Figure 2.
Variable importance for the random forest classification model that maximized the kappa agreement for three categories of nitrate (all predictor variables are defined in the SI).
Figure 3.
(Top) Max kappa classification model predictions from the tuned random forest model. (Bottom) Observed test set nitrate concentrations (mg/L NO₃-N) in the three categories. The areas of Duplin County and Sampson County with high observed nitrate concentrations are highlighted with solid fill.
