Objectives: As highlighted by the COVID-19 pandemic, researchers are eager to make use of a wide variety of data sources, both government-sponsored and alternative, to characterise the epidemiology of infectious diseases. The objective of this study is to investigate the strengths and limitations of the data sources currently being used for such research.
Design: Retrospective descriptive analysis.
Primary and secondary outcome measures: Yearly national-level and state-level disease-specific case counts and disease clusters for three diseases (measles, mumps and varicella) during a 5-year study period (2013-2017) across four data sources: Optum (health insurance billing claims data), HealthMap (online news surveillance data), Morbidity and Mortality Weekly Reports (official government reports) and the National Notifiable Diseases Surveillance System (government case surveillance data).
Results: Our study demonstrated drastic differences in reported infectious disease incidence across data sources. Compared with the other three sources, Optum data showed substantially higher, and implausible, standardised case counts for all three diseases. Although there was some concordance in identified state-level case counts and disease clusters, all four sources identified variations in state-level reporting.
Conclusions: Researchers should consider data source limitations when attempting to characterise the epidemiology of infectious diseases. Some data sources, such as billing claims data, may be unsuitable for epidemiological research within the infectious disease context.
Keywords: Epidemiology; Health informatics; Health policy; Public health.
© Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. Published by BMJ.