Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Sep 1:635:644-658.
doi: 10.1016/j.scitotenv.2018.04.040. Epub 2018 Apr 24.

Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China

Affiliations
Free article

Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China

Binxu Zhai et al. Sci Total Environ. .
Free article

Abstract

A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Special feature extraction procedures, including those of simplification, polynomial, transformation and combination, are conducted before modeling to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degrees of feature importance. Single models including LASSO, Adaboost, XGBoost and multi-layer perceptron optimized by the genetic algorithm (GA-MLP) are established in the level 0 space and are then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured from the city of Zhangjiakou are taken as the most important elements of pollution factors for forecasting PM2.5 concentrations. Local extreme wind speeds and maximal wind speeds are considered to extend the most effects of meteorological factors to the cross-regional transportation of contaminants. Pollutants found in the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data with a coefficient of determination (R2) of 0.90 and a root mean squared error (RMSE) of 23.69μg/m3. For single pollutant grade recognition, the proposed model performs better when applied to days characterized by good air quality than when applied to days registering high levels of pollution. The overall classification accuracy level is 73.93%, with most misclassifications made among adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model.

Keywords: Air quality forecast; Feature extraction; Feature importance analysis; Feature selection; Stacked generalization strategy.

PubMed Disclaimer

Similar articles

Cited by

MeSH terms

LinkOut - more resources