Statistically enriched geospatial datasets of Brazilian municipalities for data-driven modeling

Sci Data. 2022 Aug 10;9(1):489. doi: 10.1038/s41597-022-01581-2.


The lack of georeferencing in geospatial datasets hinders scientific studies that rely on accurate data. This is particularly concerning in the health sciences, where georeferenced data could lead to scientific results of great relevance to society. In practice, the Brazilian health systems, especially those for Notifiable Diseases, do not register georeferenced data; instead, the records indicate only the municipality in which the event occurred. In data-driven modeling, accurate occurrence-based disease prediction models typically require the socioenvironmental characteristics of the exact location of each event, information that is often unavailable. To enrich the expressiveness of data-driven models when the municipality of the event is the best available information, we produced datasets with a statistical characterization of all 5,570 Brazilian municipalities across 642 layers of thematic data that represent the natural and artificial characteristics of the municipalities' landscapes over time. The result is a collection of datasets comprising a total of 11,556 descriptive statistics attributes for each municipality.
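
The per-municipality characterization described above can be illustrated with a minimal sketch of zonal statistics: every cell of a thematic layer is assigned to a municipality, and the cell values are then summarized per municipality. Dividing the stated totals gives 11,556 / 642 = 18 attributes per layer; the six statistics computed below are an illustrative subset chosen for this sketch, not the authors' actual list. The function name `zonal_statistics`, the toy grid, and the zone labels "A" and "B" are all hypothetical, and the sketch uses only the Python standard library rather than the tooling used to build the datasets.

```python
import math
import statistics
from collections import defaultdict


def zonal_statistics(layer, zones):
    """Summarize the cell values of one thematic layer per zone.

    layer: 2D list of numeric cell values (NaN or None marks missing data).
    zones: 2D list of the same shape giving the zone (municipality) of each cell.
    Returns a dict mapping each zone id to a dict of descriptive statistics.
    """
    # Group valid cell values by the zone they fall in.
    groups = defaultdict(list)
    for layer_row, zone_row in zip(layer, zones):
        for value, zone_id in zip(layer_row, zone_row):
            if value is None or math.isnan(value):
                continue  # skip cells without data
            groups[zone_id].append(value)

    # Compute an illustrative set of descriptive statistics per zone.
    return {
        zone_id: {
            "count": len(cells),
            "min": min(cells),
            "max": max(cells),
            "mean": statistics.mean(cells),
            "median": statistics.median(cells),
            "std": statistics.pstdev(cells),  # population standard deviation
        }
        for zone_id, cells in groups.items()
    }


# Toy 3x4 layer (hypothetical values) and a matching grid of zone labels.
layer = [
    [1, 2, 10, 20],
    [3, 4, 30, 40],
    [5, float("nan"), 50, 60],
]
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]

stats = zonal_statistics(layer, zones)
for zone_id, summary in sorted(stats.items()):
    print(zone_id, summary)
```

Repeating such a summary for every layer and joining the results on the municipality identifier would yield one wide row of attributes per municipality, which can then be merged with event records that carry only a municipality field.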