Common Data Elements (CDEs) are necessary for sharing data across studies, making results comparable, and enabling aggregation and meta-analyses. Developing a set of CDEs for a given clinical research area has typically been arduous and time-consuming. In this work we introduce an automated pipeline that can greatly aid this process by identifying, aggregating, and ranking relevant CDEs from the outcomes of studies registered on clinicaltrials.gov (CTG). The pipeline uses the Medical Subject Headings (MeSH) ontology to group and rank candidate CDEs by specific diseases. The initial CDE pipeline was tested on an emerging research domain, and the resulting CDE output was aligned with the current recommendations in the corresponding subject area. Further development of automated means for CDE generation based on structured information from CTG and MeSH is warranted.
Keywords: Common Data Elements; MeSH tree; automated data extraction; clinical trials; clinicaltrials.gov; outcomes; XML.