Practical Extension of Provenance to Healthcare Data Based on the W3C PROV Standard

Stud Health Technol Inform. 2018;253:28-32.


Secondary use of healthcare data depends on the availability of provenance data for assessing its quality, reliability, or trustworthiness. Instance-level data communicated via HL7 interfaces usually carry only limited metadata about the software systems involved or about the persons and organizations responsible for those systems. This paper proposes a strategy for capturing the interoperable provenance data that data stewards need to assess healthcare data reused in a research context. The strategy aims at a realistic level of granularity: even system-level metadata help a data steward trace the origins, or provenance, of healthcare data that have been transferred to the research context. These metadata are extracted from the 3LGM2 system, which is used for modelling hospital information systems. Based on the W3C provenance specification, interrelated activities, entities, and agents can be integrated and stored in RDF triple stores, where they can be queried and visualized.
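
To illustrate the kind of system-level provenance the abstract describes, the following minimal sketch holds a few PROV-style statements as plain subject-predicate-object triples and traces a research record back to its source system. It is not the paper's implementation: every `ex:` identifier (`ex:labSystem`, `ex:hospitalIT`, `ex:hl7Export`, `ex:labResult`, `ex:researchRecord`) and the helper functions `objects` and `trace_origin` are hypothetical, and an in-memory Python set stands in for an RDF triple store. Only the `prov:` terms are taken from the W3C PROV-O vocabulary.

```python
# Minimal in-memory sketch of system-level provenance as PROV-O style triples.
# Assumption: all "ex:" identifiers are invented for illustration; a real setup
# would load such statements into an RDF triple store and query them with SPARQL.

RDF_TYPE = "rdf:type"

triples = {
    # Agents: a software system and the organization responsible for it
    ("ex:labSystem", RDF_TYPE, "prov:SoftwareAgent"),
    ("ex:hospitalIT", RDF_TYPE, "prov:Organization"),
    ("ex:labSystem", "prov:actedOnBehalfOf", "ex:hospitalIT"),
    # Activity: a hypothetical HL7 export run by the software system
    ("ex:hl7Export", RDF_TYPE, "prov:Activity"),
    ("ex:hl7Export", "prov:wasAssociatedWith", "ex:labSystem"),
    ("ex:hl7Export", "prov:used", "ex:labResult"),
    # Entities: the clinical source data and the record reused in research
    ("ex:labResult", RDF_TYPE, "prov:Entity"),
    ("ex:researchRecord", RDF_TYPE, "prov:Entity"),
    ("ex:researchRecord", "prov:wasGeneratedBy", "ex:hl7Export"),
}


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?), sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def trace_origin(entity):
    """Walk one step back from an entity to the generating activities,
    the source entities they used, the associated agents, and the parties
    those agents acted on behalf of."""
    result = {"activities": [], "sources": [], "agents": [], "responsible": []}
    for activity in objects(entity, "prov:wasGeneratedBy"):
        result["activities"].append(activity)
        result["sources"].extend(objects(activity, "prov:used"))
        for agent in objects(activity, "prov:wasAssociatedWith"):
            result["agents"].append(agent)
            result["responsible"].extend(objects(agent, "prov:actedOnBehalfOf"))
    return result


origin = trace_origin("ex:researchRecord")
print(origin)
```

In a triple store, the same lookup would typically be expressed as a SPARQL query over the `prov:wasGeneratedBy`, `prov:used`, `prov:wasAssociatedWith`, and `prov:actedOnBehalfOf` properties; the dictionary returned here only mirrors what a data steward might want to see for one record.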

Keywords: Secondary use of EHR data; interoperable provenance data.

MeSH terms

  • Delivery of Health Care / statistics & numerical data*
  • Metadata*
  • Reproducibility of Results
  • Software
  • Statistics as Topic*