Mapping chronic disease prevalence based on medication use and socio-demographic variables: an application of LASSO on administrative data sources in healthcare in the Netherlands

BMC Public Health. 2021 Jun 2;21(1):1039. doi: 10.1186/s12889-021-10754-4.

Abstract

Background: Policymakers generally lack sufficiently detailed health information to develop localized health policy plans. Chronic disease prevalence mapping is difficult because accurate direct sources are often lacking. Improvement is possible by adding extra information, such as medication use and demographic information, to identify disease. The aim of the current study was to obtain small geographic area prevalence estimates for four common chronic diseases by modelling based on medication use and socio-economic variables, and subsequently to investigate regional patterns of disease.

Methods: Administrative hospital records and general practitioner (GP) registry data were linked to medication use and socio-economic characteristics. The training set (n = 707,021) contained GP diagnosis and/or hospital admission diagnosis as the standard for disease prevalence. For the entire Dutch population (n = 16,777,888), all information except GP diagnosis and hospital admission was available. LASSO regression models for binary outcomes were used to select variables strongly associated with disease. Standardized and non-standardized prevalence estimates per Dutch municipality for stroke, coronary heart disease (CHD), chronic obstructive pulmonary disease (COPD) and diabetes were then computed as averages of the predicted probabilities of the individual inhabitants.
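
The two modelling steps described in the Methods (an L1-penalized model for a binary outcome, followed by averaging individual predicted probabilities per municipality) can be illustrated with a minimal sketch. This is not the authors' implementation: the solver (proximal gradient descent with soft-thresholding), the toy data, the two predictor names, the penalty value and all function names are assumptions chosen only for illustration.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrinks a coefficient toward
    # zero and sets it to exactly zero when |value| <= threshold.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, iters=500):
    # L1-penalized logistic regression fitted by proximal gradient descent.
    # The intercept b is not penalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(X, w, b):
    # Individual predicted disease probabilities.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def mean_probability_by_region(regions, probs):
    # Regional prevalence estimate = average predicted probability
    # over all inhabitants of that region.
    totals, counts = defaultdict(float), defaultdict(int)
    for region, prob in zip(regions, probs):
        totals[region] += prob
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


# Hypothetical training data with a known diagnosis.
# Columns (assumed): [uses_disease_specific_medication, aged_65_or_over]
X_train = [[1, 1], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 1], [1, 0]]
y_train = [1, 1, 1, 0, 0, 0, 1, 0]

w, b = fit_l1_logistic(X_train, y_train, lam=0.05)

# Hypothetical population without diagnosis information.
X_pop = [[1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
regions = ["Municipality A", "Municipality A", "Municipality A",
           "Municipality B", "Municipality B"]

prevalence = mean_probability_by_region(regions, predict_proba(X_pop, w, b))
print(prevalence)
```

Because the aggregation works on individual probabilities, the same `mean_probability_by_region` call could be reused with any other regional key, which is the property the Conclusion relies on.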

Results: Adding medication use data as a predictor substantially improved model performance. Estimates at the municipality level performed best for diabetes, with a weighted percentage error (WPE) of 6.8%, and worst for COPD (WPE 14.5%). Disease prevalence showed clear regional patterns, also after standardization for age.

Conclusion: Adding medication use as an indicator of disease prevalence alongside socio-economic variables substantially improved estimates at the municipality level. The resulting individual disease probabilities can be aggregated to any desired regional level and provide a useful tool to identify regional patterns and inform local policy.

Keywords: Disease prevalence; Machine learning; Small area estimates.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chronic Disease
  • Delivery of Health Care*
  • Humans
  • Information Storage and Retrieval*
  • Netherlands / epidemiology
  • Prevalence