Sci Data. 2018 Feb 27;5:180023.
doi: 10.1038/sdata.2018.23.

Datasets2Tools, Repository and Search Engine for Bioinformatics Datasets, Tools and Canned Analyses

Free PMC article


Denis Torre et al. Sci Data. 2018.

Abstract

Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses, produced by applying bioinformatics tools to biomedical digital data objects, is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, and for evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes the sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Schematic illustration of the Datasets2Tools components and workflow.
Bioinformatics analyses, biomedical datasets, and computational tools are collected through web crawling, API access, and manual curation. These digital objects are indexed within the Datasets2Tools database and made available through a website, a Google Chrome extension, and an API. Users can additionally submit their own analyses and evaluate the FAIRness of indexed digital objects through a form.
Figure 2. Structure of the canned analysis digital object.
The canned analysis is a new type of digital object defined by three sets of components: (1) core elements, namely a title, a description, and a link to a webpage containing the results of the analysis; (2) references to the biomedical datasets and computational tools used to generate the object; and (3) metadata annotations consisting of keywords and structured key-value pairs.
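To make the three component groups concrete, the sketch below models a canned analysis as a Python dataclass. The class name, field names, and the `matches` helper are hypothetical illustrations, not the Datasets2Tools database schema or API. The example values (including the placeholder accession `GSE00000` and the `example.org` URL) are invented for demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class CannedAnalysis:
    """Hypothetical model of the canned analysis digital object (Figure 2).

    Field names are illustrative assumptions, not the Datasets2Tools schema.
    """
    # (1) Core elements
    title: str
    description: str
    results_url: str
    # (2) References to the datasets and tools used to generate the analysis
    dataset_accessions: list[str] = field(default_factory=list)
    tool_names: list[str] = field(default_factory=list)
    # (3) Metadata annotations: free keywords plus structured key-value pairs
    keywords: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)

    def matches(self, term: str) -> bool:
        """Case-insensitive exact match against keywords or metadata values."""
        t = term.lower()
        return any(t == k.lower() for k in self.keywords) or any(
            t == v.lower() for v in self.metadata.values()
        )


# Usage with invented placeholder values
analysis = CannedAnalysis(
    title="Enrichment analysis of an RNA-seq signature (example)",
    description="Placeholder description for illustration.",
    results_url="https://example.org/results/1",  # placeholder URL
    dataset_accessions=["GSE00000"],  # placeholder accession
    tool_names=["Enrichr"],
    keywords=["enrichment"],
    metadata={"organism": "human"},  # hypothetical key-value annotation
)
print(analysis.matches("Enrichment"))
```

Keeping the dataset and tool references as identifiers, rather than embedding full dataset or tool records, mirrors the idea that a canned analysis links to digital objects that are indexed separately.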
Figure 3. Examples of multiple canned analyses generated from the same dataset.
Canned analyses can be used to index information about different types of bioinformatics analyses. The figure shows how a single RNA-seq dataset can be used to generate eight different canned analyses, such as interactive clustered heatmap visualizations and gene expression signature analyses.
Figure 4. Screenshot of the tool search interface on the Datasets2Tools website.
(a) Results of a computational tool search for the keyword "enrichment". Search results are represented as cards and are sorted by a combination of each tool’s attention metrics, citations, FAIR evaluation results, and number of associated canned analyses. (b) Overview of the information shown on the tool card for Enrichr: the tool’s name and description, the number of canned analyses generated by the tool, the date and citations of the associated publication, Altmetric and PlumX Metrics badges, and a summary of the FAIRness evaluation results.
Figure 5. Screenshot of the FAIRness evaluation interfaces for a canned analysis.
(a) Screenshot of the FAIR evaluation form for a canned analysis, as displayed on an example canned analysis landing page. The form consists of nine yes/no questions concerning the findability, accessibility, interoperability, and reusability of the canned analysis. (b) Insignia representing the results of the FAIRness evaluations submitted by users for a bioinformatics tool. Each square represents the results for an individual question, and its color ranges from blue (100% positive answers) to red (100% negative answers).
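The insignia in panel (b) summarizes, per question, the share of positive answers across all submitted evaluations. A minimal sketch of that aggregation follows. It assumes that each evaluation is a list of nine booleans and that the color is a linear RGB interpolation between pure red (0% positive) and pure blue (100% positive). The actual palette and interpolation used by Datasets2Tools are not specified here, and the function names are hypothetical.

```python
N_QUESTIONS = 9  # the FAIR form has nine yes/no questions


def fraction_to_color(fraction: float) -> str:
    """Map a share of positive answers (0.0 to 1.0) to a hex color.

    Assumption: linear RGB interpolation from red (#ff0000, all negative)
    to blue (#0000ff, all positive).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    red = round(255 * (1 - fraction))
    blue = round(255 * fraction)
    return f"#{red:02x}00{blue:02x}"


def insignia_colors(evaluations: list[list[bool]]) -> list[str]:
    """Return one color per question, aggregating all user evaluations."""
    if not evaluations:
        raise ValueError("at least one evaluation is required")
    for answers in evaluations:
        if len(answers) != N_QUESTIONS:
            raise ValueError(f"each evaluation needs {N_QUESTIONS} answers")
    colors = []
    for q in range(N_QUESTIONS):
        # Fraction of users who answered "yes" to question q
        positive = sum(answers[q] for answers in evaluations)
        colors.append(fraction_to_color(positive / len(evaluations)))
    return colors


# Two hypothetical users: one answers yes to everything, the other
# answers no to the last question only.
example = [[True] * 9, [True] * 8 + [False]]
print(insignia_colors(example))
```

Averaging per question keeps every square comparable regardless of how many users have evaluated the tool. A fuller version might also report the number of evaluations, since a single vote already produces a fully saturated color.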
Figure 6. Interface embedded by the Chrome extension on a GEO search results page.
(a) The Chrome extension embeds toolbars below datasets indexed by Datasets2Tools. The icons on each toolbar represent the tools that have been used to analyze the dataset. (b) By clicking on an icon, the user can access the results of the analyses in which that tool was applied to the dataset, and view the metadata associated with the tool, dataset, and canned analysis.

