Urban ozone variability using automated machine learning: inference from different feature importance schemes

Environ Monit Assess. 2024 Mar 23;196(4):393. doi: 10.1007/s10661-024-12549-7.

Abstract

Tropospheric ozone is an air pollutant at ground level and a greenhouse gas that contributes significantly to global warming. Strong anthropogenic emissions in and around urban environments enhance surface ozone pollution, adversely affecting human health and vegetation. However, observations are often scarce and the factors driving ozone variability remain uncertain in the developing regions of the world. Here, we conducted machine learning (ML) simulations of ozone variability and comprehensively examined the governing factors over a major urban environment (Ahmedabad) in western India. Ozone precursors (NO2, NO, CO, C5H8 and CH2O) from the CAMS (Copernicus Atmosphere Monitoring Service) reanalysis and meteorological parameters from ERA5 (the fifth-generation reanalysis of the European Centre for Medium-Range Weather Forecasts, ECMWF) were included as features in the ML models. Automated ML (AutoML) optimally fitted a deep learning model, which simulated daily ozone with a root mean square error (RMSE) of ~2 ppbv and reproduced 84-88% of the variability. The model performance achieved here is comparable to that of widely used ML models (Random Forest, RF, and eXtreme Gradient Boosting, XGBoost). Explainability of the models is discussed through different feature importance schemes, including SAGE (Shapley Additive Global importancE) and permutation importance. The leading features differ from one feature importance scheme to another. We show that urban ozone can be simulated well (RMSE = 2.5 ppbv and R2 = 0.78) using the first four leading features from the different schemes, which are consistent with ozone photochemistry. Our study underscores the need for science-informed analysis of feature importance from multiple schemes to infer the roles of input variables in ozone variability. AutoML-based studies, exploiting the potential of long-term observations, can strongly complement conventional chemistry-transport modelling and can also help in the accurate simulation and forecasting of urban ozone.
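
As an illustration of the evaluation metrics and of one of the feature importance schemes named above, the following is a minimal sketch of RMSE, R2 and permutation importance in plain Python. It is not the study's code: the data are synthetic, the three features are hypothetical stand-ins for precursor and meteorological inputs, and a fixed linear function (predict) takes the place of the trained AutoML, RF or XGBoost model. SAGE is not implemented here.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: fraction of variance reproduced."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when one feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data (assumption): three standardized features, of which the
# first drives the target strongly, the second weakly, the third not at all.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [30.0 + 5.0 * a + 1.0 * b + data_rng.gauss(0, 0.5) for a, b, _ in X]


def predict(row):
    """Hypothetical stand-in for a trained model's prediction function."""
    return 30.0 + 5.0 * row[0] + 1.0 * row[1]


predictions = [predict(row) for row in X]
baseline_rmse = rmse(y, predictions)
baseline_r2 = r2(y, predictions)
importances = permutation_importance(predict, X, y)

print(f"RMSE = {baseline_rmse:.2f}, R2 = {baseline_r2:.2f}")
for name, value in zip(["feature_0", "feature_1", "feature_2"], importances):
    print(f"{name}: mean RMSE increase = {value:.3f}")
```

In this sketch the unused third feature receives an importance of zero, while the two features the model uses are ranked by how much shuffling them degrades the fit. Rankings from a scheme like this can differ from those of SAGE, which is why the abstract recommends comparing several schemes.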

Keywords: Air pollution; Air quality; Artificial intelligence; Atmospheric chemistry; AutoML; Machine learning; Meteorology; Modelling; Ozone; Precursors; Random Forest; XGBoost.

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Environmental Monitoring
  • Humans
  • Machine Learning
  • Ozone* / analysis

Substances

  • Ozone
  • Air Pollutants