Background: Secondary use of data from clinical information systems or registries is common in clinical research on rare diseases but is prone to error. We sought to elucidate some of the limitations of database research and to describe possible ways to overcome them.
Methods: Using a rare postsurgical outcome as a disease model, we evaluated the ability of four different data sources to correctly identify patients who had that outcome, both individually and in combination. These results were compared with manual chart review.
Results: The sensitivity of the individual databases for identifying a rare and specific outcome was poor (9.9%-37%), while their specificities were fairly good (91%-96.7%). Combining the databases increased the sensitivity to as much as 56.8% without a large decrease in specificity (85.2%-91.6%). The electronic medical record (EMR) search engine had the highest sensitivity (96.9%) and a high specificity (89.3%), with a very high negative predictive value (99.4%).
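As an illustration of how such figures are derived, the following minimal sketch computes sensitivity, specificity, and negative predictive value for one data source and for the union of two sources, using chart review as the reference standard. The function name, the patient IDs, and the toy cohort are all hypothetical and are not the study's data; "combining" is assumed here to mean flagging a patient if any source flags them.

```python
def diagnostic_metrics(flagged, true_cases, cohort):
    """Compare flagged patient IDs against a chart-review reference standard.

    Assumes the cohort contains both cases and non-cases, so no
    denominator is zero.
    """
    tp = len(flagged & true_cases)           # flagged and truly had the outcome
    fp = len(flagged - true_cases)           # flagged but did not have the outcome
    fn = len(true_cases - flagged)           # had the outcome but were missed
    tn = len(cohort - flagged - true_cases)  # correctly not flagged
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy cohort of 20 patients; chart review found the outcome in IDs 1-4.
cohort = set(range(1, 21))
true_cases = {1, 2, 3, 4}

# Hypothetical query results from two overlapping data sources.
db_a = {1, 5}      # finds 1 of 4 cases, plus 1 false positive
db_b = {2, 3, 6}   # finds 2 of 4 cases, plus 1 false positive

single = diagnostic_metrics(db_a, true_cases, cohort)
combined = diagnostic_metrics(db_a | db_b, true_cases, cohort)  # union of sources

print(single)
print(combined)
```

In this toy example the union captures more true cases than either source alone, at the cost of accumulating the false positives of both, which mirrors the trade-off between sensitivity and specificity described above.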
Conclusion: For rare and specific diseases or outcomes, a search that relies on a single data source can miss large numbers of patients and potentially bias study results. Combining overlapping databases can improve the capture of these rare diseases or outcomes. While chart review remains the most accurate way to obtain complete case capture, new tools such as EMR search engines can make this process more efficient without sacrificing search quality.
Keywords: arrhythmia; congenital heart disease; congenital heart surgery; database.