IDPredictor: predict database links in biomedical database

Hendrik Mehlhorn; Matthias Lange; Uwe Scholz; Falk Schreiber

doi:10.2390/biecoll-jib-2012-190

J Integr Bioinform. 2012 Jun 26;9(2):190. doi: 10.2390/biecoll-jib-2012-190.

Authors

Hendrik Mehlhorn¹, Matthias Lange, Uwe Scholz, Falk Schreiber

Affiliation

¹ Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. mehlhorn@ipk-gatersleben.de

PMID: 22736059
DOI: 10.2390/biecoll-jib-2012-190

Abstract

Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content but differ substantially with respect to content detail, interface, formats and data structure. To support functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as semi-automated data exploration in information retrieval environments, an integrated view of the databases is essential. Search engines have the potential to assist in data retrieval from these structured sources, but fall short of providing a comprehensive knowledge excerpt from the interlinked databases. A prerequisite for supporting the concept of an integrated data view is to gain insight into cross-references among database entities. This is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available at http://dx.doi.org/10.5447/IPK/2012/4.
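
The abstract does not describe how IDPredictor works internally, so the following is only an illustrative sketch of the general idea of predicting cross-references and measuring retrieval quality, not the authors' method. It assumes a deliberately naive strategy (a link is predicted whenever a token in a source record equals an identifier in a target database) and scores the predictions with precision and recall against a hand-made gold standard. All record names, identifiers, database names and function names are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical target databases and the identifiers they contain.
TARGET_IDS: Dict[str, Set[str]] = {
    "proteins": {"P12345", "Q67890"},
    "compounds": {"C00031", "C00022"},
}

# Hypothetical source records whose free text may mention identifiers.
SOURCE_RECORDS: Dict[str, str] = {
    "rec1": "Catalyses conversion of C00031; see also P12345.",
    "rec2": "Unannotated entry mentioning C00022 and X99999.",
}

# A link is (source record, target database, target identifier).
Link = Tuple[str, str, str]

TOKEN = re.compile(r"[A-Za-z0-9]+")


def predict_links(records: Dict[str, str],
                  targets: Dict[str, Set[str]]) -> Set[Link]:
    """Predict a link for every token that exactly matches a target identifier."""
    links: Set[Link] = set()
    for rec_id, text in records.items():
        for token in TOKEN.findall(text):
            for db_name, ids in targets.items():
                if token in ids:
                    links.add((rec_id, db_name, token))
    return links


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Return (precision, recall) of the predicted links against the gold links."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard: one true link (rec2 -> Q67890) is not stated in
# the record text, so the naive matcher cannot recover it.
GOLD: Set[Link] = {
    ("rec1", "proteins", "P12345"),
    ("rec1", "compounds", "C00031"),
    ("rec2", "compounds", "C00022"),
    ("rec2", "proteins", "Q67890"),
}

predicted = predict_links(SOURCE_RECORDS, TARGET_IDS)
precision, recall = precision_recall(predicted, GOLD)
print(f"predicted={len(predicted)} precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting every predicted link is correct, but one gold link is missed, which illustrates why retrieval quality is usually reported as both precision and recall rather than a single number.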

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Databases, Factual*
Information Storage and Retrieval
Neural Networks, Computer
Software*