Who are we missing? Underrepresentation of data sources used for pharmacoepidemiology research in the United States

Pharmacoepidemiol Drug Saf. 2020 Nov;29(11):1494-1498. doi: 10.1002/pds.5087. Epub 2020 Aug 20.


Purpose: Research using healthcare databases often includes patients who are frequently excluded from clinical trials; yet it is not known whether commonly used data sources represent the overall population or specific sub-populations of interest. We aimed to examine the population representativeness of data sources used in recent research studies in the United States (US).

Methods: We identified data sources from abstracts accepted to the 34th International Conference on Pharmacoepidemiology & Therapeutic Risk Management. The final sample included research studies using ≥1 data source from the US. We classified data sources broadly as claims, linkage, electronic health records (EHR), survey, distributed data network, and other. Studies using claims and EHRs were further classified into more specific categories, including special populations of interest (eg, children).

Results: We identified 356 abstracts. The majority used claims data (n = 201, 56.5%), followed by data linkages (n = 46, 12.9%) and EHR data (n = 39, 11.0%). Among EHR studies, the largest share (n = 16, 41.0%) came from network data sources (eg, Kaiser Permanente). Almost half (49.4%) of claims-based studies used commercial claims data sources, followed by Medicare (22.1%), Medicaid (11.3%), and Medicare Supplemental (6.1%). Only 15% of studies included children in the study population (n = 53), with 8% focused on a pediatric topic (n = 27).
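
As an arithmetic check, the reported shares can be recomputed from the counts given in the Results. The sketch below is illustrative only and is not part of the study's analysis: the names `TOTAL_ABSTRACTS`, `counts`, and `pct` are hypothetical, and it assumes every share except the EHR network figure uses the 356 identified abstracts as its denominator. The claims sub-category percentages are left out because their underlying counts are not reported.

```python
# Counts taken from the Results; names here are illustrative, not the authors'.
TOTAL_ABSTRACTS = 356  # assumed denominator for the study-level shares

counts = {
    "claims": 201,
    "data linkage": 46,
    "EHR": 39,
    "included children": 53,
    "pediatric topic": 27,
}


def pct(n: int, total: int) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Share of all abstracts for each category.
shares = {label: pct(n, TOTAL_ABSTRACTS) for label, n in counts.items()}

# The network-source share is relative to EHR studies, not to all abstracts.
ehr_network_share = pct(16, counts["EHR"])

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"EHR studies from network sources: {ehr_network_share}%")
```

Under that assumption, the recomputed values agree with the reported figures; the shares for children (14.9%) and pediatric topics (7.6%) round to the 15% and 8% given above.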

Conclusions: We find that certain populations in the US are under-represented in pharmacoepidemiology, particularly Medicaid enrollees and children. Researchers should strive to utilize data sources that may be more representative of the US population, particularly vulnerable populations.

Keywords: claims; data sources; electronic health record; pharmacoepidemiology; representativeness.