Unregulated private wells in the United States are susceptible to many groundwater contaminants. Ingestion of nitrate, the most common anthropogenic private well contaminant in the United States, can lead to the endogenous formation of N-nitroso-compounds, which are known human carcinogens. In this study, we expand upon previous efforts to model private well groundwater nitrate concentration in North Carolina by developing multiple machine learning models and testing against out-of-sample prediction. Our purpose was to develop exposure estimates in unmonitored areas for use in the Agricultural Health Study (AHS) cohort. Using approximately 22,000 private well nitrate measurements in North Carolina, we trained and tested continuous models including a censored maximum likelihood-based linear model, random forest, gradient boosted machine, support vector machine, neural networks, and kriging. Continuous nitrate models had low predictive performance (R2 < 0.33), so multiple random forest classification models were also trained and tested. The final classification approach predicted <1 mg/L, 1-5 mg/L, and ≥5 mg/L using a random forest model with 58 variables and maximizing the Cohen's kappa statistic. The final model had an overall accuracy of 0.75 and high specificity for the higher two categories and high sensitivity for the lowest category. The results will be used for the categorical prediction of private well nitrate for AHS cohort participants that reside in North Carolina.
Keywords: Agricultural Health Study; Exposure assessment; Groundwater contamination; Nitrate; Random Forest.
Copyright © 2018. Published by Elsevier B.V.