Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China
- PMID: 29679837
- DOI: 10.1016/j.scitotenv.2018.04.040
Abstract
A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Before modeling, feature extraction procedures (simplification, polynomial, transformation and combination) are applied to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and to evaluate feature importance. Single models, including LASSO, AdaBoost, XGBoost and a multi-layer perceptron optimized by a genetic algorithm (GA-MLP), are established in the level 0 space and are then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. The feature importance analysis identifies nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured in the city of Zhangjiakou as the most important pollution factors for forecasting PM2.5 concentrations. Among meteorological factors, local extreme wind speed and maximal wind speed have the greatest effect on the cross-regional transport of contaminants. Pollutants in the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data, with a coefficient of determination (R2) of 0.90 and a root mean squared error (RMSE) of 23.69 μg/m3. For single pollutant grade recognition, the proposed model performs better on days with good air quality than on days with high levels of pollution. The overall classification accuracy is 73.93%, with most misclassifications occurring between adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model.
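The two-level structure described in the abstract (single models in the level 0 space, SVR in the level 1 space) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available and makes several substitutions that should be read as assumptions: `GradientBoostingRegressor` stands in for XGBoost, a plain `MLPRegressor` stands in for the GA-optimized MLP (no genetic algorithm is run), the data are synthetic rather than Beijing measurements, and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 400 "days" with 6 generic features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (60 + 25 * X[:, 0] + 15 * X[:, 1] ** 2 - 10 * X[:, 2]
     + rng.normal(scale=5, size=400))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level 0: heterogeneous single models (stand-ins noted in the lead-in).
level0 = [
    ("lasso", make_pipeline(StandardScaler(), Lasso(alpha=0.1))),
    ("adaboost", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("gbr", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )),
]

# Level 1: SVR trained on the out-of-fold predictions of the level 0 models.
level1 = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))

stack = StackingRegressor(estimators=level0, final_estimator=level1, cv=5)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The same two metrics the abstract reports, computed on held-out data.
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

`StackingRegressor` fits the level 1 model on cross-validated predictions from the level 0 models, which keeps the meta-learner from simply memorizing base models that have already seen the training targets. The numbers printed by this sketch describe the synthetic data only and are not comparable to the R2 and RMSE reported in the abstract.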
Keywords: Air quality forecast; Feature extraction; Feature importance analysis; Feature selection; Stacked generalization strategy.
Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
