Evaluating the impact of land uses on stream integrity using machine learning algorithms

Sci Total Environ. 2019 Dec 15:696:133858. doi: 10.1016/j.scitotenv.2019.133858. Epub 2019 Aug 9.

Abstract

A general pattern of declining aquatic ecological integrity with increasing urban land use has been well established for a number of watersheds worldwide. A more nuanced characterization of the influence of different urban land uses, together with the determination of cumulative thresholds, will further inform watershed planning and management. To this end, we investigated the utility of two machine learning algorithms, Random Forests (RF) and Boosted Regression Trees (BRT), for modeling stream impairment as measured by a multimetric macroinvertebrate index, the High Gradient Macroinvertebrate Index (HGMI), in an urbanizing watershed located in north-central New Jersey, United States. Both algorithms explained at least 50% of the variability in stream integrity based on watershed land use/land cover. While the two produced comparable results, RF was easier to train and somewhat more robust to overfitting than BRT. Our results document the negative influence on stream biological integrity of increasing high-medium density urban land (> 30% impervious surface cover (ISC)), low density urban land (15-30% ISC), and transitional/barren land. The thresholds indicated by partial dependence plots suggest that stream integrity decreased abruptly when the percentages of high-medium density urban, low density urban, and transitional/barren land exceeded 10%, 8%, and 2% of the watershed, respectively. Additionally, when rural residential land surpassed a 30% threshold, its effect on stream integrity was similar to that of low density urban land. Identification of such cumulative thresholds can help watershed managers and policymakers craft land use zoning regulations and design restoration programs that are grounded in objective scientific criteria.
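The workflow summarized above (fit RF and BRT, compare explained variance, read thresholds off partial dependence curves) can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the data are synthetic, the variable names are hypothetical, scikit-learn's `GradientBoostingRegressor` is used as a stand-in for BRT, and the step declines built into the synthetic response are chosen only to echo the thresholds reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Hypothetical predictors: percent of each watershed in a land use class.
high_med = rng.uniform(0, 30, n)      # high-medium density urban
low_dens = rng.uniform(0, 30, n)      # low density urban
trans_barren = rng.uniform(0, 6, n)   # transitional/barren
X = np.column_stack([high_med, low_dens, trans_barren])

# Synthetic index score with abrupt declines (NOT the study's data).
y = (70
     - 25 * (high_med > 10)
     - 10 * (low_dens > 8)
     - 8 * (trans_barren > 2)
     + rng.normal(0, 5, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hyperparameters are illustrative assumptions, not tuned values.
models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "BRT": GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=3, random_state=0),
}


def partial_dependence_1d(model, X, col, grid):
    """Average prediction when column `col` is forced to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, col] = value
        curve.append(model.predict(X_mod).mean())
    return np.array(curve)


grid = np.linspace(0, 30, 31)  # 0% to 30% in 1% steps
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    pd_curve = partial_dependence_1d(model, X_train, 0, grid)
    # Take the steepest drop between adjacent grid points as the threshold.
    i = int(np.argmin(np.diff(pd_curve)))
    threshold = (grid[i] + grid[i + 1]) / 2
    results[name] = {"r2": r2, "pd": pd_curve, "threshold": threshold}
    print(f"{name}: test R^2 = {r2:.2f}, "
          f"steepest drop near {threshold:.1f}% high-medium density urban")
```

Computing the partial dependence by hand, rather than through a library helper, keeps the averaging step explicit: each point on the curve is the mean prediction over all watersheds with one land use percentage held fixed. Reading a threshold as the steepest drop is one simple heuristic; visual inspection of the curve is an equally reasonable choice.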

Keywords: Boosted regression trees; Eco-hydrology; Machine learning algorithms; Macroinvertebrate index; Partial dependence plot; Random forests; Relative influence; Spearman correlation matrix; Stream integrity.