Comparing person-level matching algorithms to identify risk across disparate datasets among patients with a controlled substance prescription: retrospective analysis

JAMIA Open. 2022 Mar 30;5(1):ooac020. doi: 10.1093/jamiaopen/ooac020. eCollection 2022 Apr.


Background: The opioid epidemic in the United States has precipitated a need for public health agencies to better understand risk factors associated with fatal overdoses. Matching person-level information stored in public health, medical, and human services datasets can enhance the understanding of opioid overdose risk factors and interventions.

Objective: This study compares, from an applied perspective, approximate match versus exact match algorithms for linking disparate datasets to identify persons at risk.

Methods: This study used statewide prescription drug monitoring program (PDMP), arrest, and mortality data matched at the person level using an approximate match and 2 exact match algorithms. Impact of matching was assessed by analyzing 3 independent concepts: (1) the prevalence of key risk indicators used by PDMPs in practice, (2) the prevalence of arrests and fatal opioid overdose, and (3) the performance of a multivariate logistic regression for fatal opioid overdose. The PDMP key risk indicators included (1) multiple provider episodes (MPEs), or patients with prescriptions from multiple prescribers and dispensers, (2) high morphine milligram equivalents (MMEs), which represent an opioid's potency relative to morphine, and (3) overlapping opioid and benzodiazepine prescriptions.
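The difference between the exact and approximate strategies can be illustrated with a minimal sketch. The abstract does not specify which identifiers, similarity measure, or threshold the study used, so everything below is an assumption made for illustration: the field names (`first`, `last`, `dob`, `zip`), the use of Python's `difflib.SequenceMatcher` as the similarity measure, the 0.85 threshold, and the two made-up records.

```python
from difflib import SequenceMatcher

# ASSUMPTION: hypothetical identifier sets; the study's actual fields are not given.
BASIC_FIELDS = ("first", "last", "dob")
ZIP_FIELDS = BASIC_FIELDS + ("zip",)


def normalize(value):
    """Lowercase and trim so trivial formatting differences do not block a match."""
    return value.strip().lower()


def exact_match(a, b, fields):
    """Two records match only if every listed field agrees after normalization."""
    return all(normalize(a[f]) == normalize(b[f]) for f in fields)


def approximate_match(a, b, threshold=0.85):
    """Illustrative fuzzy rule: identical date of birth plus similar full names.

    ASSUMPTION: the rule and the threshold are placeholders, not the study's method.
    """
    if normalize(a["dob"]) != normalize(b["dob"]):
        return False
    name_a = normalize(a["first"]) + " " + normalize(a["last"])
    name_b = normalize(b["first"]) + " " + normalize(b["last"])
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


# Made-up records: one spelling variant in the first name, different zip codes.
pdmp_record = {"first": "Katherine", "last": "Smith", "dob": "1980-02-14", "zip": "40202"}
arrest_record = {"first": "Katharine", "last": "Smith", "dob": "1980-02-14", "zip": "40203"}

basic_result = exact_match(pdmp_record, arrest_record, BASIC_FIELDS)   # False
zip_result = exact_match(pdmp_record, arrest_record, ZIP_FIELDS)       # False
approx_result = approximate_match(pdmp_record, arrest_record)          # True
```

In this toy case, the approximate rule links records that both exact rules keep apart. This is one way the matched population, and therefore the prevalence of any risk indicator computed on it, can differ between algorithms.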

Results: Prevalence of PDMP-based risk indicators was higher in the approximate match population for MPEs (n = 4893/1 859 445 [0.26%]) and overlapping opioid/benzodiazepines (n = 57 888/1 859 445 [4.71%]), but the exact-basic match population had the highest prevalence of individuals with high MMEs (n = 664/1 910 741 [3.11%]). Prevalence of arrests and deaths was higher in the approximate match population than in the exact match populations. Model performance was comparable across the 3 matching algorithms (exact-basic validation area under the receiver operating characteristic curve [AUC]: 0.854; approximate validation AUC: 0.847; exact + zip validation AUC: 0.826), but the algorithms produced different cutoff points for balancing sensitivity and specificity.
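To make the idea of a cutoff point that balances sensitivity and specificity concrete, the sketch below picks the cutoff that maximizes Youden's J statistic (sensitivity + specificity - 1). The abstract does not say which criterion the study applied, so Youden's J is an assumption here, and the scores and labels are made-up toy values rather than study data.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as positive and return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) for the cutoff maximizing Youden's J.

    ASSUMPTION: Youden's J is one common balancing criterion, not necessarily
    the one used in the study. Ties keep the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best


# Toy predicted probabilities and outcomes (1 = fatal overdose); illustrative only.
toy_scores = [0.02, 0.05, 0.10, 0.30, 0.40, 0.70]
toy_labels = [0, 0, 0, 1, 0, 1]

j_stat, cutoff, sens, spec = best_cutoff(toy_scores, toy_labels)
print(f"cutoff={cutoff}, sensitivity={sens}, specificity={spec}")
```

Because each matching algorithm yields a different linked population, the predicted-probability distribution shifts. The cutoff chosen by a rule like this can therefore differ even when the AUCs are similar.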

Conclusions: Our study illustrates the specific tradeoffs of different matching methods. Further research should compare matching algorithms and their impact on the prevalence of key risk indicators in an applied setting, which can improve understanding of risk within a population.

Keywords: analgesics, opioid; databases, factual; medical record linkage; overdose; public health.