Background: Electronic Medical Records (EMRs) are increasingly used in the provision of primary care and have been compiled into databases which can be utilized for surveillance, research and informing practice. The primary purpose of these records is for the provision of individual patient care; validation and examination of underlying limitations is crucial for use for research and data quality improvement. This study examines and describes the validity of chronic disease case definition algorithms and factors affecting data quality in a primary care EMR database.
Methods: A retrospective chart audit of an age stratified random sample was used to validate and examine diagnostic algorithms applied to EMR data from the Manitoba Primary Care Research Network (MaPCReN), part of the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). The presence of diabetes, hypertension, depression, osteoarthritis and chronic obstructive pulmonary disease (COPD) was determined by review of the medical record and compared to algorithm identified cases to identify discrepancies and describe the underlying contributing factors.
Results: The algorithm for diabetes had high sensitivity, specificity and positive predictive value (PPV) with all scores being over 90%. Specificities of the algorithms were greater than 90% for all conditions except for hypertension at 79.2%. The largest deficits in algorithm performance included poor PPV for COPD at 36.7% and limited sensitivity for COPD, depression and osteoarthritis at 72.0%, 73.3% and 63.2% respectively. Main sources of discrepancy included missing coding, alternative coding, inappropriate diagnosis detection based on medications used for alternate indications, inappropriate exclusion due to comorbidity and loss of data.
Conclusions: Comparison to medical chart review shows that at MaPCReN the CPCSSN case finding algorithms are valid with a few limitations. This study provides the basis for the validated data to be utilized for research and informs users of its limitations. Analysis of underlying discrepancies provides the ability to improve algorithm performance and facilitate improved data quality.