ProvCaRe Semantic Provenance Knowledgebase: Evaluating Scientific Reproducibility of Research Studies

AMIA Annu Symp Proc. 2018 Apr 16;2017:1705-1714. eCollection 2017.


Scientific reproducibility is critical for biomedical research: it enables us to advance science by building on previous results, helps ensure the success of increasingly expensive drug trials, and allows funding agencies to make informed decisions. However, there is a growing "crisis" of reproducibility, as evidenced by a recent survey of more than 1,500 researchers by the journal Nature, which found that 70% of researchers were not able to replicate results from other research groups and more than 50% were not able to reproduce their own research results. In 2016, the National Institutes of Health (NIH) announced the "Rigor and Reproducibility" guidelines to support reproducibility in biomedical research. A key component of these guidelines is the recording and analysis of "provenance" information, which describes the origin or history of data and plays a central role in ensuring scientific reproducibility. As part of the NIH Big Data to Knowledge (BD2K)-funded data provenance project, we have developed a new informatics framework called Provenance for Clinical and Healthcare Research (ProvCaRe) to extract, model, and analyze provenance information from published literature describing research studies. Using sleep medicine research studies that have made their data available through the National Sleep Research Resource (NSRR), we have developed an automated pipeline that identifies and extracts provenance metadata from published literature and makes it available for analysis in the ProvCaRe knowledgebase. NSRR is the largest repository of sleep data, comprising over 40,000 studies involving 36,000 participants, and we used 75 published articles describing 6 research studies to populate the ProvCaRe knowledgebase. We evaluated the ProvCaRe knowledgebase, which contains 28,474 "provenance triples", using hypothesis-driven queries to identify and rank research studies based on the provenance information extracted from published articles.
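The abstract does not describe how ProvCaRe scores studies against a query. Purely as an illustration of the idea, the sketch that follows models extracted provenance as (subject, predicate, object) triples tagged with a study identifier and ranks studies by how many of their triples mention terms from a hypothesis-driven query. The `ProvenanceTriple` class, the `rank_studies` function, predicate names such as `measuredBy`, the toy study data, and the keyword-overlap score are all hypothetical assumptions, not the ProvCaRe implementation or its ontology.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class ProvenanceTriple(NamedTuple):
    """Hypothetical (subject, predicate, object) provenance statement tied to a study."""
    study: str
    subject: str
    predicate: str
    obj: str


def rank_studies(triples: Iterable[ProvenanceTriple],
                 query_terms: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank studies by how many of their triples mention any query term.

    Assumption: a simple case-insensitive keyword-overlap score, used here only
    for illustration; it is not ProvCaRe's actual ranking method.
    """
    terms = {t.lower() for t in query_terms}
    scores = defaultdict(int)
    for t in triples:
        text = f"{t.subject} {t.predicate} {t.obj}".lower()
        # Count each triple at most once, if any query term appears in it.
        if any(term in text for term in terms):
            scores[t.study] += 1
    # Highest score first; ties broken alphabetically by study name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy, made-up triples for three hypothetical studies.
    kb = [
        ProvenanceTriple("StudyA", "participants", "hasAgeRange", "40-60 years"),
        ProvenanceTriple("StudyA", "sleep apnea", "measuredBy", "polysomnography"),
        ProvenanceTriple("StudyB", "hypertension", "measuredBy", "blood pressure cuff"),
        ProvenanceTriple("StudyB", "sleep apnea", "associatedWith", "hypertension"),
        ProvenanceTriple("StudyC", "insomnia", "measuredBy", "questionnaire"),
    ]
    print(rank_studies(kb, ["sleep apnea", "hypertension"]))
    # [('StudyB', 2), ('StudyA', 1)]
```

Studies with no matching triples are omitted from the result. A semantic knowledgebase would more likely match query terms against ontology classes (for example, through SPARQL queries over an RDF store) than against raw substrings; substring matching is used here only to keep the example self-contained.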

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Biological Ontologies
  • Biomedical Research / standards*
  • Guidelines as Topic
  • Health Services Research / standards
  • Humans
  • Knowledge Bases*
  • Metadata*
  • National Institutes of Health (U.S.)
  • Reproducibility of Results*
  • Semantics
  • Sleep
  • United States