A method and software framework for enriching private biomedical sources with data from public online repositories

J Biomed Inform. 2016 Apr:60:177-86. doi: 10.1016/j.jbi.2016.02.004. Epub 2016 Feb 10.

Abstract

Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin and identify elements (e.g. genes, compounds, pathways) that lead to interesting correlations. Normally, they must then refer to additional public databases to enrich the information about the identified entities (e.g. scientific literature, published clinical trial results). Semantic integration techniques have traditionally focused on providing homogeneous access to private datasets, thus helping automate the first part of the research, and different solutions exist for browsing public data; however, there is still a need for tools that facilitate merging public repositories with private datasets. This paper presents a framework that automatically locates public data of interest to the researcher and semantically integrates it with existing private datasets. The framework has been designed as an extension of traditional data integration systems and has been validated with an existing data integration platform from a European research project by integrating a private biological dataset with data from the National Center for Biotechnology Information (NCBI).
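
The general idea of enriching a private dataset with public statements about the entities it mentions can be illustrated with a minimal sketch. This is not the framework described in the paper: it assumes data are held as plain (subject, predicate, object) triples in the spirit of RDF, and every identifier, prefix, and value below (`ex:refersToGene`, `gene:A`, `pub:1`, the `enrich` function) is hypothetical.

```python
# Illustrative sketch only; not the paper's implementation.
# Triples are plain (subject, predicate, object) tuples.
# All prefixes, identifiers, and values are hypothetical.

def enrich(private_triples, public_triples, link_predicate="ex:refersToGene"):
    """Return the private triples plus public triples about entities they reference."""
    # Entities the private dataset points to through the linking predicate.
    linked = {o for (_, p, o) in private_triples if p == link_predicate}
    # Keep only public statements whose subject is one of those entities.
    relevant = [t for t in public_triples if t[0] in linked]
    return list(private_triples) + relevant


private = [
    ("ex:sample1", "ex:refersToGene", "gene:A"),
    ("ex:sample1", "ex:expression", "high"),
]
public = [
    ("gene:A", "ex:mentionedIn", "pub:1"),
    ("gene:B", "ex:mentionedIn", "pub:2"),
]

merged = enrich(private, public)
for triple in merged:
    print(triple)
```

In this toy example only the public statement about `gene:A` is merged, because it is the only entity the private data refers to. A real system would presumably also need to locate the public records and map identifiers between sources, which the sketch leaves out.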

Keywords: Public databases; RDF; Semantic integration.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research
  • Computational Biology / methods
  • Databases, Factual
  • Humans
  • Information Storage and Retrieval / methods*
  • MicroRNAs / genetics
  • Semantics*
  • Software*
  • Systems Integration*
  • User-Computer Interface
  • Wilms Tumor / genetics

Substances

  • MicroRNAs