Environmental variable importance for under-five mortality in Malaysia: A random forest approach

Sci Total Environ. 2022 Nov 1:845:157312. doi: 10.1016/j.scitotenv.2022.157312. Epub 2022 Jul 13.

Abstract

Background: Environmental factors have been associated with adverse health effects in epidemiological studies. The main exposure variable is usually chosen on the basis of prior knowledge or statistical methods. This choice can be difficult when evidence supporting prior knowledge is scarce, or when statistical methods must contend with collinearity. This study aimed to investigate the relative importance of environmental variables for under-five mortality in Malaysia using a random forest approach.

Method: We applied conditional permutation importance via a random forest (CPI-RF) to evaluate the relative importance of weather- and air pollution-related environmental factors for daily under-five mortality in Malaysia. The study period ran from January 1, 2014 to December 31, 2016. In data preparation, deviation mortality counts were derived through a generalized additive model, adjusting for long-term trend and seasonality. Analyses were conducted by mortality cause (all-cause, natural-cause, or external-cause) and by data structure (continuous, categorical, or all types [i.e., including all continuous and all categorical variables]). The main analysis comprised two stages. In Stage 1, Boruta selection was applied as a preliminary screen to remove highly unimportant variables. In Stage 2, the variables retained by Boruta were used in the CPI-RF analysis. The final importance value was the average across a 10-fold cross-validation.
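The idea behind conditional permutation can be illustrated with a minimal sketch. It assumes that conditioning is done by shuffling a variable only within strata of a correlated covariate; the abstract does not describe the study's actual CPI-RF implementation. The function names, the toy data, and the stand-in predictor below are all hypothetical, and no random forest is fitted here.

```python
import random
from collections import defaultdict


def conditional_permute(values, strata, rng):
    """Shuffle values only among rows that share the same stratum label."""
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)
    permuted = list(values)
    for indices in groups.values():
        block = [values[i] for i in indices]
        rng.shuffle(block)
        for i, v in zip(indices, block):
            permuted[i] = v
    return permuted


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def stratified_permutation_importance(predict, rows, y, column, strata, rng,
                                      n_repeats=20):
    """Mean increase in MSE after shuffling `column` within each stratum."""
    baseline = mse(y, [predict(r) for r in rows])
    original = [r[column] for r in rows]
    increases = []
    for _ in range(n_repeats):
        shuffled = conditional_permute(original, strata, rng)
        permuted_rows = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        increases.append(mse(y, [predict(r) for r in permuted_rows]) - baseline)
    return sum(increases) / n_repeats


# Hypothetical toy data: heat_wave is fully determined by tmax (>= 35).
tmax = [30.0, 31.0, 32.0, 33.0, 36.0, 37.0, 38.0, 39.0]
rows = [{"tmax": t, "heat_wave": int(t >= 35.0)} for t in tmax]


def toy_predict(row):
    # Stand-in for a fitted model (assumption); not the study's random forest.
    return row["tmax"] - 30.0


y = [toy_predict(r) for r in rows]
rng = random.Random(0)

# A single stratum for all rows reduces to ordinary (marginal) permutation.
marginal = stratified_permutation_importance(
    toy_predict, rows, y, "tmax", [0] * len(rows), rng)
# Shuffling tmax only within heat_wave groups conditions on that covariate.
conditional = stratified_permutation_importance(
    toy_predict, rows, y, "tmax", [r["heat_wave"] for r in rows], rng)
print(f"marginal: {marginal:.2f}, conditional on heat_wave: {conditional:.2f}")
```

In this toy setting, shuffling maximum temperature within heat-wave strata disturbs the predictions less than shuffling it across all days, because the heat-wave indicator already carries part of the same information. This is the kind of overlap between collinear variables that a conditional importance measure is intended to account for.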

Result: Some heat-related variables (maximum temperature, heat wave), temperature variability, and haze-related variables (PM10, PM10-derived haze index, PM10- and fire-derived haze index, fire hotspot) were among the prominent variables associated with under-five mortality in Malaysia. The important variables were consistent across all-cause and natural-cause mortality and across the sensitivity analyses. However, the most important variables differed between natural-cause and external-cause under-five mortality.

Conclusion: Heat-related variables, temperature variability, and haze-related variables were consistently prominent for all-cause and natural-cause under-five mortality, but not for external-cause mortality.

Keywords: Children; Conditional permutation importance; Environmental factor; Feature selection; Mortality; Under-five.

MeSH terms

  • Air Pollutants* / analysis
  • Air Pollution* / analysis
  • Environmental Exposure / analysis
  • Hot Temperature
  • Malaysia / epidemiology
  • Mortality
  • Particulate Matter / analysis
  • Weather

Substances

  • Air Pollutants
  • Particulate Matter