Conversion of CPRD AURUM Data into the OMOP Common Data Model

Inform Med Unlocked. 2023:43:101407. doi: 10.1016/j.imu.2023.101407. Epub 2023 Nov 10.


Introduction: Efforts to standardize clinical data using Common Data Models (CDMs) have grown in recent years. Use of CDMs allows for quicker understanding of data structure and reuse of existing tools. One CDM is the Observational Medical Outcomes Partnership (OMOP) CDM. Clinical Practice Research Datalink (CPRD) is a program that collects general practitioner data in the UK.

Objective: Our objective was to convert a static copy of CPRD AURUM data into the OMOP CDM and run existing tools on the converted data.

Methods: Two methods were used to convert the CPRD files into the OMOP CDM. The first was direct mapping, used for CPRD files that had comparable tables in the OMOP CDM: the original names were changed to their OMOP equivalents and source values were converted to standardized OMOP concepts. The CPRD files Patient (to OMOP Person), Staff (to Provider), Drug Issue (to Drug Exposure) and Practice (to Care Site) were directly mapped. The second method was indirect mapping, used for the CPRD Observation file: all source values were converted to standard concepts, and the domain of each data row was then used to assign the data to the proper OMOP tables or columns.
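The two mapping methods can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the conversion code used in the study: the source codes, concept ids, and the function name `route_observation_rows` are hypothetical, and only the four file-to-table pairs come from the text above. The domain-to-table names follow standard OMOP CDM table naming.

```python
from collections import defaultdict

# Direct mapping: CPRD file -> OMOP table (pairs taken from the Methods text).
DIRECT_TABLE_MAP = {
    "Patient": "person",
    "Staff": "provider",
    "Drug Issue": "drug_exposure",
    "Practice": "care_site",
}

# Hypothetical lookup: source code -> (standard concept id, domain).
# These codes and ids are invented for illustration only.
SOURCE_TO_STANDARD = {
    "SRC_BP": (1001, "Measurement"),
    "SRC_ASTHMA": (1002, "Condition"),
    "SRC_SMOKER": (1003, "Observation"),
}

# Indirect mapping: the domain of the standard concept picks the OMOP table.
DOMAIN_TO_TABLE = {
    "Measurement": "measurement",
    "Condition": "condition_occurrence",
    "Observation": "observation",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
    "Device": "device_exposure",
}


def route_observation_rows(rows):
    """Assign CPRD Observation rows to OMOP tables by concept domain.

    Returns (routed, unmapped): routed maps an OMOP table name to its rows,
    unmapped holds rows with no standard concept or no target table.
    """
    routed = defaultdict(list)
    unmapped = []
    for row in rows:
        match = SOURCE_TO_STANDARD.get(row["source_code"])
        if match is None:
            unmapped.append(row)
            continue
        concept_id, domain = match
        table = DOMAIN_TO_TABLE.get(domain)
        if table is None:
            unmapped.append(row)
            continue
        # Keep the source value alongside the standard concept id.
        routed[table].append({**row, "concept_id": concept_id})
    return dict(routed), unmapped


rows = [
    {"patid": 1, "source_code": "SRC_BP"},
    {"patid": 1, "source_code": "SRC_ASTHMA"},
    {"patid": 2, "source_code": "SRC_UNKNOWN"},
]
routed, unmapped = route_observation_rows(rows)
```

In this sketch, rows that cannot be placed are collected rather than dropped silently, so the share of excluded rows can be reported afterwards.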

Results: The OMOP CDM conversion populated 12 tables and 20,240,453,339 rows, with the largest table being the Measurement table (5,202,579,174 data rows). When mapping source values to OMOP standard concepts, we found that 60.2% (46,413 of 77,149) of source concepts were also standard concepts. The Drug Exposure table had the fewest source values already in the standard form, as only 4.7% (1,433 of 30,194) of its source concepts were standard concepts. In terms of data retention, only 2.00% of all data rows were excluded, because they did not have a clear fit in the developed CDM and could not stand alone without additional information that was not present.
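The reported percentages can be reproduced from the counts given above. The short Python check below is only an arithmetic illustration; the helper name `percent` is hypothetical, and the Measurement table's share of all rows is derived here from the two row counts rather than stated in the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Source concepts that were already standard concepts (all tables).
overall_standard = percent(46_413, 77_149)

# Source concepts that were already standard in the Drug Exposure table.
drug_standard = percent(1_433, 30_194)

# Derived, not reported in the abstract: Measurement rows as a share of all rows.
measurement_share = percent(5_202_579_174, 20_240_453_339)

print(overall_standard, drug_standard, measurement_share)
```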

Conclusion: CPRD AURUM was successfully converted into the OMOP CDM with minimal data loss. Existing OHDSI tools were run on the converted data to demonstrate its efficacy. The existence of a standardized version of CPRD AURUM data vastly increases its reusability in future research, because the data structure is more readily understood and existing tools can be applied.

Keywords: Clinical Informatics; Common Data Model; Data Science; Real World Data.