Thousands of deaths associated with air pollution each year could be prevented by forecasting the behavior and geographical distribution of factors that pose risks to human health. Proximity to pollution sources, degree of urbanization, and population density are some of the factors whose spatial distribution can help identify their possible influence on the occurrence of respiratory diseases (RD). Bogotá is currently among the Latin American cities with the poorest air quality. In particular, the locality of Kennedy is one of the zones of the city with the highest recorded concentrations of local pollutants over the last 10 years, and from 2009 to 2016 there were 8619 deaths associated with respiratory and cardiovascular diseases in the locality. Given these characteristics, this study set out to identify and analyze the areas in which the main socioeconomic and environmental conditions contribute to the presence of symptoms associated with RD. To this end, data collected in the field through georeferenced surveys were analyzed with geostatistical and machine learning tools for cluster and pattern analysis. Random forests and AdaBoost were applied to identify hot spots where RD could occur, given the combination of predictor variables in the micro-territory. Random forests outperformed AdaBoost, reaching an AUC of 0.63. This approach is particularly applicable to densely populated municipalities with high levels of air pollution, where these tools can help authorities anticipate environmental health situations and reduce the cost of treating respiratory diseases.
Keywords: Air quality; Geostatistics; Hot spots; Machine learning; Sustainable development.
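To help interpret the reported AUC of 0.63, the following minimal sketch computes the area under the ROC curve in its rank form: the probability that a randomly chosen positive case (here, hypothetically, a surveyed respondent reporting RD symptoms) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical illustrations, not the study's data or code. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.63 means roughly 63% of positive–negative pairs are ordered correctly.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs where the positive scores higher.

    labels: 1 for a positive case (e.g. hypothetical respondent with RD symptoms), 0 otherwise.
    scores: model outputs, such as predicted probabilities from a random forest.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and scores: 3 of the 4 positive-negative pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

This pairwise form is quadratic in the number of cases, which is fine for illustration; for large survey samples a sort-based rank computation gives the same value more efficiently.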