Background: The self-controlled case series (SCCS) offers potential as an statistical method for risk identification involving medical products from large-scale observational healthcare data. However, analytic design choices remain in encoding the longitudinal health records into the SCCS framework and its risk identification performance across real-world databases is unknown.
Objectives: To evaluate the performance of SCCS and its design choices as a tool for risk identification in observational healthcare data.
Research design: We examined the risk identification performance of SCCS across five design choices using 399 drug-health outcome pairs in five real observational databases (four administrative claims and one electronic health records). In these databases, the pairs involve 165 positive controls and 234 negative controls. We also consider several synthetic databases with known relative risks between drug-outcome pairs.
Measures: We evaluate risk identification performance through estimating the area under the receiver-operator characteristics curve (AUC) and bias and coverage probability in the synthetic examples.
Results: The SCCS achieves strong predictive performance. Twelve of the twenty health outcome-database scenarios return AUCs >0.75 across all drugs. Including all adverse events instead of just the first per patient and applying a multivariate adjustment for concomitant drug use are the most important design choices. However, the SCCS as applied here returns relative risk point-estimates biased towards the null value of 1 with low coverage probability.
Conclusions: The SCCS recently extended to apply a multivariate adjustment for concomitant drug use offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but on-going work should correct this short-coming.