Background: Methodological research to evaluate the performance of methods requires a benchmark to serve as a referent comparison. In drug safety, the performance of analyses of spontaneous adverse event reporting databases and observational healthcare data, such as administrative claims and electronic health records, has been limited by the lack of such standards.
Objectives: To establish a reference set of test cases that contain both positive and negative controls, which can serve the basis for methodological research in evaluating methods performance in identifying drug safety issues.
Research design: Systematic literature review and natural language processing of structured product labeling was performed to identify evidence to support the classification of drugs as either positive controls or negative controls for four outcomes: acute liver injury, acute kidney injury, acute myocardial infarction, and upper gastrointestinal bleeding.
Results: Three-hundred and ninety-nine test cases comprised of 165 positive controls and 234 negative controls were identified across the four outcomes. The majority of positive controls for acute kidney injury and upper gastrointestinal bleeding were supported by randomized clinical trial evidence, while the majority of positive controls for acute liver injury and acute myocardial infarction were only supported based on published case reports. Literature estimates for the positive controls shows substantial variability that limits the ability to establish a reference set with known effect sizes.
Conclusions: A reference set of test cases can be established to facilitate methodological research in drug safety. Creating a sufficient sample of drug-outcome pairs with binary classification of having no effect (negative controls) or having an increased effect (positive controls) is possible and can enable estimation of predictive accuracy through discrimination. Since the magnitude of the positive effects cannot be reliably obtained and the quality of evidence may vary across outcomes, assumptions are required to use the test cases in real data for purposes of measuring bias, mean squared error, or coverage probability.