Data Discovery for Integration of Heterogeneous Medical Datasets in the German Center for Lung Research (DZL)

Stud Health Technol Inform. 2018:253:65-69.

Abstract

The German Center for Lung Research (DZL) is an association of Germany's leading university and non-university institutions dedicated to lung research. Institutes and disease areas within the DZL manage their own data in several databases and registers using different software tools. The aim of our data integration effort is to provide a single central data warehouse frontend where all patient-related data are combined and made accessible. A two-stage survey was used to determine which data collections were suitable for integration. Integration was performed via extract-transform-load (ETL) steps using custom software. The original software (e.g. eCRF) used by the data collections did not need any modification. The survey yielded 68 data collections. By January 2018, 20 collections had been successfully integrated. Ten collections were withdrawn by their owners, while the integration of 38 was delayed. Data discovery, the process of finding existing data collections in a large research network, proved to be the most underestimated step. From a technical point of view, data integration proved to be of minor complexity compared with the effort required for harmonization and mapping of data elements and for management of a common terminology.

Keywords: Database; data integration; data warehouse; lung research.

MeSH terms

  • Biomedical Research
  • Databases, Factual*
  • Germany
  • Humans
  • Lung Diseases*
  • Registries*
  • Software
  • Statistics as Topic