Background: Community associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic health record (EHR) based CA-MRSA phenotype algorithm utilizing both structured and unstructured data.
Methods: The algorithm was validated at three eMERGE consortium sites, and positive predictive value, negative predictive value and sensitivity, were calculated. The algorithm was then run and data collected across seven total sites. The resulting data was used in GWAS analysis.
Results: Across seven sites, the CA-MRSA phenotype algorithm identified a total of 349 cases and 7761 controls among the genotyped European and African American biobank populations. PPV ranged from 68 to 100% for cases and 96 to 100% for controls; sensitivity ranged from 94 to 100% for cases and 75 to 100% for controls. Frequency of cases in the populations varied widely by site. There were no plausible GWAS-significant (p < 5 E -8) findings.
Conclusions: Differences in EHR data representation and screening patterns across sites may have affected identification of cases and controls and accounted for varying frequencies across sites. Future work identifying these patterns is necessary.
Keywords: Electronic Health Record; GWAS; Phenotyping; ca-MRSA Phenotype; ca_MRSA.