Predictive modeling of nitrogen and phosphorus concentrations in rivers using a machine learning framework: A case study in an urban-rural transitional area in Wenzhou, China

Sci Total Environ. 2024 Feb 1;910:168521. doi: 10.1016/j.scitotenv.2023.168521. Epub 2023 Nov 18.

Abstract

Rapid urbanization in China since 1980 has intensified non-point source pollution (NPSP) and raised widespread public concern. Although nitrogen (N) and phosphorus (P) are essential nutrients for aquatic life, excessive quantities of them are a significant source of aquatic pollution. In this study, we present a new framework that uses random forest (RF), a machine learning algorithm, driven by geospatial datasets to estimate and map total nitrogen (TN) and total phosphorus (TP) concentrations at high spatial resolution in the Wen-Rui Tang River (WRTR) watershed, a typical urban-rural transitional area on the east coast of China. A comprehensive GIS database of 26 in-house environmental variables was used to build predictive models of TN and TP in open waters across the watershed. The RF regression models were evaluated against in-situ measurements, and the results indicated that they can accurately predict the spatiotemporal distribution of N and P concentrations in rivers. Characterizing the variable importance measures of the calibrated RF regression models identified the variables that most strongly affect N and P contamination in open waters across the urban-rural transitional area: aquaculture, direct discharge of domestic sewage, industrial wastewater discharge, and changing meteorological variables. In addition, the maps of TN and TP concentrations along the continuous river at high spatiotemporal resolution (daily, 1 km × 1 km) proved informative. These results provide valuable data for stakeholders managing water quality and pollution control, particularly in similar regions with rapid urbanization and limited water quality monitoring data.

Keywords: Non-point source pollution; Random forest regression model; Total nitrogen; Total phosphorus; Urban-rural transitional area; Water quality.
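The abstract describes two steps after fitting the RF models: checking predictions against in-situ measurements and ranking explanatory variables by importance. It does not name the accuracy metrics or the importance measure used (RF implementations commonly offer impurity-based or permutation importance). The sketch below assumes mean squared error and R² for evaluation and uses permutation importance, a model-agnostic measure. The fitted model is represented by any `predict(row)` callable. The variable names, toy data, and the lambda-like stand-in model are hypothetical and are not the authors' data or code.

```python
"""Sketch: evaluating a fitted regressor and ranking explanatory variables.

Assumptions (not from the paper): the fitted model is any callable
predict(row) -> float, accuracy is summarized with MSE and R^2, and
importance is measured by permutation.
"""
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted concentrations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination of predictions against in-situ values."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each column of X is shuffled.

    Shuffling a column breaks its link to the target; a large increase
    in error means the model relies heavily on that variable.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with only column j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: the target depends only on the first column.
    names = ["aquaculture_area", "road_density"]
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [0.5 * row[0] + 1.0 for row in X]

    def model(row: Row) -> float:
        # Stand-in for a fitted RF; a real workflow would call the model's predict.
        return 0.5 * row[0] + 1.0

    print("R^2:", r_squared(y, [model(r) for r in X]))  # R^2: 1.0
    for name, score in zip(names, permutation_importance(model, X, y)):
        print(name, round(score, 3))
```

In the toy run, shuffling the first column raises the error while shuffling the second leaves it unchanged, so only the first variable receives a positive score. In a Python workflow, scikit-learn's `RandomForestRegressor` and `sklearn.inspection.permutation_importance` provide the forest and a vectorized version of the same measure. The callable interface here simply keeps the sketch dependency-free.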