Automated gene data integration with Databio

BMC Res Notes. 2020 Apr 1;13(1):195. doi: 10.1186/s13104-020-05038-w.

Abstract

Objective: Although sequencing and other high-throughput data production technologies are increasingly affordable, data analysis and interpretation remain a significant factor in the cost of -omics studies. Despite broad acceptance of the findable, accessible, interoperable, and reusable (FAIR) data principles, which focus on data discoverability and annotation, data integration remains a significant bottleneck in linking prior work to better understand novel research. Relevant and timely information discovery is difficult for increasingly multidisciplinary projects, because scientists cannot easily keep up with work across multiple fields. Computational tools are needed to accurately describe data contents and to enable linkage to existing resources without requiring prior knowledge of the various databases.

Results: We developed the Databio tool, accessible at https://datab.io/, to automate data parsing and identifier detection and to streamline common tasks, providing a point-and-click approach to data manipulation and integration in life sciences research and translational medicine. Databio uses fast real-time data structures and a data warehouse of 137 million identifiers, together with automated heuristics that describe data provenance without requiring highly specialized knowledge or bioinformatics training.
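The abstract does not describe how Databio's identifier-detection heuristics work. As a hypothetical illustration only, one simple approach is to match each cell of a column against regular expressions for common identifier formats and then take a majority vote over the column. The pattern table, the function names `classify_value` and `detect_column_type`, and the `min_fraction` threshold below are assumptions made for this sketch, not Databio's actual rules or API.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical pattern table (NOT Databio's rules): regexes for a few
# widely used gene/protein identifier formats.
PATTERNS = {
    # Ensembl stable gene IDs, e.g. ENSG00000141510 or ENSMUSG..., optional version
    "ensembl_gene": re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$"),
    # Ensembl stable transcript IDs, e.g. ENST00000269305.9
    "ensembl_transcript": re.compile(r"^ENS[A-Z]*T\d{11}(\.\d+)?$"),
    # RefSeq nucleotide/protein accessions, e.g. NM_000546.6
    "refseq": re.compile(r"^(NM|NR|XM|XR|NP|XP)_\d+(\.\d+)?$"),
    # UniProtKB accession format, e.g. P04637
    "uniprot": re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
    ),
}


def classify_value(value: str) -> Optional[str]:
    """Return the first identifier type whose pattern matches, or None."""
    v = value.strip()
    for name, pattern in PATTERNS.items():
        if pattern.match(v):
            return name
    return None


def detect_column_type(values: List[str], min_fraction: float = 0.8) -> Optional[str]:
    """Guess a column's identifier type by majority vote over non-empty cells.

    Returns None if the column is empty or if no single recognized type
    covers at least `min_fraction` of the non-empty cells.
    """
    cells = [v for v in values if v.strip()]
    if not cells:
        return None
    counts = Counter(classify_value(v) for v in cells)
    best, n = counts.most_common(1)[0]
    if best is not None and n / len(cells) >= min_fraction:
        return best
    return None
```

For example, `detect_column_type(["ENSG00000141510", "ENSG00000139618"])` returns `"ensembl_gene"`, while a column of gene symbols such as `["TP53", "BRCA1"]` returns `None`, because symbols are not covered by this small pattern table. A voting threshold tolerates a few malformed or blank cells; a real system would also need to resolve ambiguous formats (for example, bare integers that could be Entrez Gene IDs) against a reference database rather than relying on regular expressions alone.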

Keywords: Data integration; Knowledge discovery; Workflow automation.

MeSH terms

  • Computational Biology*
  • Databases, Genetic*
  • Electronic Data Processing*
  • Internet
  • Software*
  • User-Computer Interface
  • Workflow