BMC Syst Biol. 2012;6 Suppl 3(Suppl 3):S20.
doi: 10.1186/1752-0509-6-S3-S20. Epub 2012 Dec 17.

A Semantic Proteomics Dashboard (SemPoD) for Data Management in Translational Research

Free PMC article

Catherine P Jayapandian et al. BMC Syst Biol. 2012.

Abstract

Background: One of the primary challenges in translational research data management is breaking down the barriers between multiple data silos and integrating 'omics data with clinical information to complete the cycle from the bench to the bedside. Contextual metadata, also called provenance information, is a key factor in effective data integration, reproducibility of results, correct attribution of original sources, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). At present, however, there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift toward a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and to integrate legacy data in support of their ongoing research.

Results: The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPoD uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set using a "smart filtering" approach. The SysPro ontology re-uses terms from the PROV ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx) guidelines, and the minimum information about a proteomics experiment (MIAPE) guidelines. SemPoD was evaluated in terms of both user feedback and system scalability.

Conclusions: SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guidelines to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. Initial user feedback on the usability and functionality of SemPoD has been very positive, and the platform is being considered for wider deployment beyond the proteomics domain and in other 'omics centers.
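
The "smart filtering" behaviour described in the Results can be illustrated with a minimal sketch. It assumes a small in-memory list of experiment records. The field names (`cell_line`, `bait_gene`, `bait_type`), the function name `smart_filter_options`, and the cell line and gene values are hypothetical; SemPoD itself derives the available options from instance-level relationships in the SysPro ontology, not from a Python list.

```python
from typing import Dict, List

# Hypothetical experiment metadata, used only for illustration.
# In SemPoD, these links come from the SysPro ontology.
EXPERIMENTS: List[Dict[str, str]] = [
    {"cell_line": "CellLineA", "bait_gene": "GeneX", "bait_type": "Tagged"},
    {"cell_line": "CellLineA", "bait_gene": "GeneY", "bait_type": "Endogenous"},
    {"cell_line": "CellLineB", "bait_gene": "GeneX", "bait_type": "Untagged"},
]


def smart_filter_options(selected: Dict[str, str], parameter: str) -> List[str]:
    """Return the values of `parameter` that can still produce results.

    `selected` holds the query parameters chosen so far. Only experiments
    consistent with every earlier selection contribute options, so values
    that would lead to an empty result set never appear in the drop-down.
    """
    matching = [
        exp for exp in EXPERIMENTS
        if all(exp[key] == value for key, value in selected.items())
    ]
    return sorted({exp[parameter] for exp in matching})


if __name__ == "__main__":
    # No prior selection: every bait gene is offered.
    print(smart_filter_options({}, "bait_gene"))
    # After choosing a cell line, only the bait genes linked to it remain.
    print(smart_filter_options({"cell_line": "CellLineB"}, "bait_gene"))
```

The sketch recomputes the options from all previous selections on every call, which mirrors the description of the drop-down list being updated after each choice. A real deployment would presumably query the ontology or database instead of scanning a list.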

Figures

Figure 1
The SemPoD architecture with the SysPro ontology. The figure shows the high-level architecture of the SemPoD web-based dynamic query interface, which uses the Model-View-Controller (MVC) design pattern. It shows how SemPoD integrates 'omics datasets from heterogeneous data sources like Proteus LIMS using an ontology-driven, provenance-based approach, and how SemPoD interfaces with data analysis and viewing applications like LabKey.
Figure 2
The SysPro class hierarchy and instances for class 'Cell Line'. This figure shows the hierarchy of classes in the SysPro ontology. It is a screenshot taken from the Protégé tool, which was used to create the SysPro ontology. On selecting a class, for example 'Bait Type', the instances of this class are shown in the right pane, namely 'Endogenous', 'Exogenous', 'KnockIn', 'Tagged' and 'Untagged'.
Figure 3
The process of mapping SysPro ontology terms in LabKey. This figure shows the steps for configuring a data source for querying the underlying 'omics data. A data source can be dynamically configured by mapping the SysPro ontology classes to the underlying database. After configuration, a query can be composed using the query builder and submitted to this data source.
Figure 4
The four constituent modules of SemPoD. This figure shows the four main components of SemPoD: the hierarchical ontology browser, with checkboxes for selecting multiple concepts that will be used as parameters in query composition; the query builder; the results viewer, which interfaces with third-party data analysis applications like LabKey; and the query manager, which shows the list of saved queries that act as templates for future querying.
Figure 5
SemPoD Query Builder. The Query Builder is an intuitive interface that allows users to select query conditions from the SysPro ontology browser and create dynamic queries by choosing different logical connectives and parameter instances.
Figure 6
Screenshot illustrating the "smart filtering" feature implemented in the query builder. Smart filtering is a feature that enables effective selection of query parameters and their instances during query composition. It updates the drop-down list for a selected query condition based on all the previously selected query parameters. The advantage of this approach is that it prevents the selection of query parameters that would not return any valid results.
Figure 7
Screenshot illustrating the use of a property linking two instances to populate drop-down menus. The smart filtering feature leverages instance-level relationships defined in the SysPro ontology, which link only specific instance values with each other.
Figure 8
Use of the property hadRole in the SysPro ontology to link the classes Cell Line and Bait Gene. This figure shows an example of instance-level relationships between the Bait Gene and Cell Line classes defined in the SysPro ontology.
Figure 9
The result explorer allows users to link out to the underlying LabKey database. Query results are shown in separate tabs, one for each project. The experiment files are listed and link to the underlying LabKey proteomics data.
Figure 10
The query manager showing a list of queries with their dates of creation and update. The Query Manager shows the list of queries saved by different users. On expanding a row, the details of the query are shown.
Figure 11
Results of user feedback after 2 months and 4 months of SemPoD deployment. User ratings for 2 surveys are shown for questions 1-16. Survey 1 was done after 2 months of deployment and survey 2 was done after 4 months of deployment.
Figure 12
Results for queries with increasing complexity over two datasets and two servers. Performance evaluation of the queries listed in Table 1 as query complexity increases.
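
The configuration and query-building steps described for Figures 3 and 5 (mapping ontology classes to the underlying database, then composing conditions with logical connectives) can be sketched as follows. This is an illustrative example only, not SemPoD's implementation: the `CLASS_TO_COLUMN` mapping, the `experiments` table, its columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the LabKey data source.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical mapping from SysPro class names to database columns.
CLASS_TO_COLUMN = {
    "Cell Line": "cell_line",
    "Bait Gene": "bait_gene",
    "Bait Type": "bait_type",
}

Condition = Tuple[str, str]  # (ontology class, selected instance value)


def build_query(conditions: List[Condition], connective: str = "AND") -> Tuple[str, List[str]]:
    """Translate ontology-level conditions into a parameterized SQL query."""
    if connective not in ("AND", "OR"):
        raise ValueError("connective must be 'AND' or 'OR'")
    base = "SELECT run_file FROM experiments"
    if not conditions:
        return base, []
    # Column names come from the fixed mapping; values are bound as parameters.
    clauses = [f"{CLASS_TO_COLUMN[cls]} = ?" for cls, _ in conditions]
    params = [value for _, value in conditions]
    return base + " WHERE " + f" {connective} ".join(clauses), params


def run_demo() -> List[str]:
    """Run one composed query against a small in-memory stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiments "
        "(run_file TEXT, cell_line TEXT, bait_gene TEXT, bait_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO experiments VALUES (?, ?, ?, ?)",
        [
            ("run1.raw", "CellLineA", "GeneX", "Tagged"),
            ("run2.raw", "CellLineA", "GeneY", "Endogenous"),
            ("run3.raw", "CellLineB", "GeneX", "Untagged"),
        ],
    )
    sql, params = build_query([("Cell Line", "CellLineA"), ("Bait Type", "Tagged")])
    rows = conn.execute(sql + " ORDER BY run_file", params).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(run_demo())
```

Keeping the class-to-column mapping in one place is what would let the same ontology-level query run against a differently structured data source, which is the benefit the configuration step appears to aim for.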
