Impacts of socioeconomic and environmental factors on neoplasms incidence rates using machine learning and GIS: a cross-sectional study in Iran

Sci Rep. 2024 May 8;14(1):10604. doi: 10.1038/s41598-024-61397-5.

Abstract

Neoplasm is an umbrella term used to describe either benign or malignant conditions. The correlations between socioeconomic and environmental factors and the occurrence of new-onset of neoplasms have already been demonstrated in a body of research. Nevertheless, few studies have specifically dealt with the nature of relationship, significance of risk factors, and geographic variation of them, particularly in low- and middle-income communities. This study, thus, set out to (1) analyze spatiotemporal variations of the age-adjusted incidence rate (AAIR) of neoplasms in Iran throughout five time periods, (2) investigate relationships between a collection of environmental and socioeconomic indicators and the AAIR of neoplasms all over the country, and (3) evaluate geographical alterations in their relative importance. Our cross-sectional study design was based on county-level data from 2010 to 2020. AAIR of neoplasms data was acquired from the Institute for Health Metrics and Evaluation (IHME). HotSpot analyses and Anselin Local Moran's I indices were deployed to precisely identify AAIR of neoplasms high- and low-risk clusters. Multi-scale geographically weight regression (MGWR) analysis was worked out to evaluate the association between each explanatory variable and the AAIR of neoplasms. Utilizing random forests (RF), we also examined the relationships between environmental (e.g., UV index and PM2.5 concentration) and socioeconomic (e.g., Gini coefficient and literacy rate) factors and AAIR of neoplasms. AAIR of neoplasms displayed a significant increasing trend over the study period. According to the MGWR, the only factor that significantly varied spatially and was associated with the AAIR of neoplasms in Iran was the UV index. A good accuracy RF model was confirmed for both training and testing data with correlation coefficients R2 greater than 0.91 and 0.92, respectively. UV index and Gini coefficient ranked the highest variables in the prediction of AAIR of neoplasms, based on the relative influence of each variable. More research using machine learning approaches taking the advantages of considering all possible determinants is required to assess health strategies outcomes and properly formulate policy planning.

Keywords: Incidence rate of neoplasms; Multiscale geographical weighted regression; Random forest; Risk factors; Spatiotemporal analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cross-Sectional Studies
  • Environmental Exposure / adverse effects
  • Female
  • Geographic Information Systems
  • Humans
  • Incidence
  • Iran / epidemiology
  • Machine Learning*
  • Male
  • Neoplasms* / epidemiology
  • Neoplasms* / etiology
  • Risk Factors
  • Socioeconomic Factors*