Learning important common data elements from shared study data: The All of Us program analysis

PLoS One. 2023 Jul 7;18(7):e0283601. doi: 10.1371/journal.pone.0283601. eCollection 2023.

Abstract

There are many initiatives attempting to harmonize data collection across human clinical studies using common data elements (CDEs). The increased use of CDEs in large prior studies can guide researchers planning new studies. For that purpose, we analyzed the All of Us (AoU) program, an ongoing US study intending to enroll one million participants and serve as a platform for numerous observational analyses. AoU adopted the OMOP Common Data Model to standardize both research (Case Report Form [CRF]) and real-world (imported from Electronic Health Records [EHRs]) data. AoU standardized specific data elements and values by including CDEs from terminologies such as LOINC and SNOMED CT. For this study, we defined all elements from established terminologies as CDEs and all custom concepts created in the Participant Provided Information (PPI) terminology as unique data elements (UDEs). We found 1 033 research elements, 4 592 element-value combinations and 932 distinct values. Most elements were UDEs (869, 84.1%), while most CDEs were from LOINC (103 elements, 10.0%) or SNOMED CT (60, 5.8%). Of the LOINC CDEs, 87 (53.1% of 164 CDEs) originated from previous data collection initiatives, such as PhenX (17 CDEs) and PROMIS (15 CDEs). At the CRF level, The Basics (12 of 21 elements, 57.1%) and Lifestyle (10 of 14, 71.4%) were the only CRFs with multiple CDEs. At the value level, 61.7% of distinct values were from an established terminology. AoU demonstrates the use of the OMOP model for integrating research and routine healthcare data (64 elements appear in both contexts), which allows for monitoring lifestyle and health changes outside the research setting. Greater inclusion of CDEs in large studies such as AoU is important because it facilitates the use of existing tools and makes the collected data easier to understand and analyze, which is harder when study-specific formats are used.
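The CDE/UDE rule and the element-level shares above can be reproduced with a short calculation. The sketch below is illustrative only and assumes that each research element is keyed by the OMOP `vocabulary_id` of its concept, with `"PPI"` marking custom AoU concepts and `"SNOMED"` standing for SNOMED CT; the helper names `classify` and `share` are hypothetical and not part of any AoU tooling. The counts are the ones reported in this abstract.

```python
from collections import Counter

# Assumption: each research element is keyed by the OMOP vocabulary_id of
# its concept; "PPI" (Participant Provided Information) marks custom AoU concepts.
CUSTOM_VOCABULARY = "PPI"


def classify(vocabulary_id: str) -> str:
    """Label an element as a UDE (custom PPI concept) or a CDE (established terminology)."""
    return "UDE" if vocabulary_id == CUSTOM_VOCABULARY else "CDE"


def share(part: int, whole: int) -> float:
    """Percentage of `whole` rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Element counts reported in the abstract, keyed by assumed vocabulary_id.
TOTAL_ELEMENTS = 1033
reported = {"PPI": 869, "LOINC": 103, "SNOMED": 60}

counts = Counter()
for vocabulary_id, n in reported.items():
    counts[classify(vocabulary_id)] += n
# Elements not attributed to PPI, LOINC or SNOMED CT in the abstract come
# from other established terminologies, so they also count as CDEs.
counts["CDE"] += TOTAL_ELEMENTS - sum(reported.values())

for vocabulary_id, n in reported.items():
    print(f"{vocabulary_id}: {n} elements, {share(n, TOTAL_ELEMENTS)}%")
print(f"CDEs: {counts['CDE']}, UDEs: {counts['UDE']}")
```

Running it prints shares of 84.1%, 10.0% and 5.8% for PPI, LOINC and SNOMED CT, and totals of 164 CDEs and 869 UDEs. The same `share` helper gives the CRF-level figures, for example 57.1% for 12 of 21 elements.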

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Common Data Elements*
  • Data Collection
  • Delivery of Health Care
  • Humans
  • Population Health*
  • Systematized Nomenclature of Medicine