Data Mining for Adverse Drug Events With a Propensity Score-matched Tree-based Scan Statistic

Epidemiology. 2018 Nov;29(6):895-903. doi: 10.1097/EDE.0000000000000907.

The tree-based scan statistic is a statistical data mining tool that has been used for signal detection with a self-controlled design in vaccine safety studies. This disproportionality statistic adjusts for multiple testing when evaluating thousands of potential adverse events. However, many drug safety questions are not well suited for self-controlled analysis. We propose a method that combines tree-based scan statistics with propensity score-matched analysis of new initiator cohorts, a robust design for investigations of drug safety. We conducted plasmode simulations to evaluate performance. In multiple realistic scenarios, tree-based scan statistics in cohorts that were propensity score matched to adjust for confounding outperformed tree-based scan statistics in unmatched cohorts. In scenarios where confounding moved point estimates away from the null, adjusted analyses recovered the prespecified type 1 error, while unadjusted analyses inflated type 1 error. In scenarios where confounding moved point estimates toward the null, adjusted analyses preserved power, whereas unadjusted analyses greatly reduced power. Although complete adjustment for true confounders had the best performance, matching on a moderately mis-specified propensity score substantially improved type 1 error and power compared with no adjustment. When there was a true elevation in risk of an adverse event, there were often co-occurring signals for clinically related concepts. TreeScan with propensity score matching shows promise as a method for screening and prioritization of potential adverse events. It should be followed by clinical review and safety studies specifically designed to quantify the magnitude of effect, with confounding control targeted to the outcome of interest.
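To make the core idea concrete, here is a minimal sketch of a one-sided Bernoulli tree-based scan statistic applied to a 1:1 propensity score-matched cohort. It assumes matching has already been done, so that under the null each case is equally likely to come from the exposed or the comparator group (null exposure probability p = 0.5). Every node in a hypothetical outcome hierarchy is scored with a log-likelihood ratio, and the maximum over all nodes is compared with its Monte Carlo null distribution, which is how the multiplicity adjustment arises. The toy tree, node names, counts, and function names (`bernoulli_llr`, `node_llrs`, `tree_scan`) are illustrative assumptions. This is not the TreeScan software or the authors' implementation. Excluding the root node from evaluation is also a simplifying choice made here.

```python
import math
import random

# Hypothetical outcome hierarchy: internal node -> children; leaves are
# the most specific outcome codes. Names are illustrative only.
TREE = {
    "root": ["cardio", "gi"],
    "cardio": ["mi", "stroke"],
    "gi": ["bleed", "ulcer"],
}


def leaves_under(node, tree):
    """Return the leaf codes beneath a node (a leaf returns itself)."""
    children = tree.get(node)
    if not children:
        return [node]
    out = []
    for child in children:
        out.extend(leaves_under(child, tree))
    return out


def evaluated_nodes(tree):
    """All nodes except the root(s), i.e. nodes that are nobody's child."""
    children = {kid for kids in tree.values() for kid in kids}
    roots = set(tree) - children
    return (set(tree) | children) - roots


def bernoulli_llr(c, n, p):
    """One-sided Bernoulli log-likelihood ratio for a node with c exposed
    cases out of n total cases, versus null exposure probability p."""
    if n == 0 or c / n <= p:
        return 0.0  # only excess risk in the exposed group is scored
    llr = 0.0
    if c > 0:
        llr += c * math.log(c / n)
    if n - c > 0:
        llr += (n - c) * math.log((n - c) / n)
    return llr - (c * math.log(p) + (n - c) * math.log(1 - p))


def node_llrs(exposed, total, tree, p):
    """Aggregate leaf counts up to each evaluated node and score it."""
    scores = {}
    for node in evaluated_nodes(tree):
        leaves = leaves_under(node, tree)
        c = sum(exposed[leaf] for leaf in leaves)
        n = sum(total[leaf] for leaf in leaves)
        scores[node] = bernoulli_llr(c, n, p)
    return scores


def tree_scan(exposed, total, tree, p=0.5, n_sims=999, seed=1):
    """Return (most likely node, its LLR, Monte Carlo p-value)."""
    rng = random.Random(seed)
    observed = node_llrs(exposed, total, tree, p)
    best_node = max(observed, key=observed.get)
    best_llr = observed[best_node]
    n_at_least = 0
    for _ in range(n_sims):
        # Under the null, each case is exposed independently with prob. p;
        # simulate at the leaves, then re-aggregate through the tree.
        sim_exposed = {
            leaf: sum(rng.random() < p for _ in range(n))
            for leaf, n in total.items()
        }
        sim_max = max(node_llrs(sim_exposed, total, tree, p).values())
        if sim_max >= best_llr:
            n_at_least += 1
    return best_node, best_llr, (n_at_least + 1) / (n_sims + 1)


# Hypothetical leaf counts from a 1:1 matched cohort:
# exposed cases and total (exposed + comparator) cases per leaf.
EXPOSED = {"mi": 30, "stroke": 10, "bleed": 12, "ulcer": 8}
TOTAL = {"mi": 40, "stroke": 20, "bleed": 25, "ulcer": 15}

if __name__ == "__main__":
    node, llr, p_value = tree_scan(EXPOSED, TOTAL, TREE)
    print(node, round(llr, 2), p_value)  # node "mi" with LLR of about 5.23
```

Because the test statistic is the maximum LLR across all nodes, a single Monte Carlo p-value for the top node already accounts for scanning the whole hierarchy. With the toy counts above, the parent node "cardio" also scores above zero because it contains "mi". This mirrors, in miniature, the co-occurring signals for clinically related concepts that the abstract reports.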

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Confounding Factors, Epidemiologic
  • Data Mining / methods*
  • Drug-Related Side Effects and Adverse Reactions / epidemiology*
  • Humans
  • Propensity Score
  • Software
  • Statistics as Topic