BMC Bioinformatics. 2014 Dec 10;15(1):386.
doi: 10.1186/s12859-014-0386-y.

Finding Gene Regulatory Network Candidates Using the Gene Expression Knowledge Base

Aravind Venkatesan et al. BMC Bioinformatics.
Free PMC article

Abstract

Background: Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis.

Results: We have developed the Gene eXpression Knowledge Base (GeXKB), a Semantic Web technology-based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB, we demonstrate how this resource can be exploited to identify candidate regulatory network proteins. We present four use cases that were designed from a biological perspective to find candidate members relevant to the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria, derived from gene expression data and prior knowledge concerning candidate proteins, can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions.

Conclusions: Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

Figures

Figure 1
Core CCK2R network and novel candidate regulators. The core of the gastrin mediated signal transduction network (CCK2R), and the novel candidate regulators resulting from our queries are shown. The CCK2R DbTFs that were targeted in our queries are colored light green. The network components in grey and the solid lines connecting them are part of the core CCK2R network and documented as regulators of the CCK2R DbTFs and respond to gastrin. The dotted lines represent new relations identified by the queries which could be verified against literature: blue pointed arrows denote ‘activation or positive influence’ and red bar-headed arrows depict ‘repression or negative influence’. CREB1 candidate regulators identified through Q1, Q2 and Q3 are colored yellow. Candidate regulators of NFκB1 identified through Q4 are colored turquoise, and candidate regulators of TCF7L2 identified through Q5 are colored orange. The target genes shared by the CCK2R DbTFs (CREB1 and NFκB1) and the DbTF candidates identified through Q6 are colored light red (JUN and BRCA2) and their connections are shown as solid arrows.
Figure 2
The data integration pipeline. The integration starts by generating an Upper Level Ontology, which is then linked with the different ontologies: GO (Biological Process, Molecular Function and Cellular Component fragments), the MI ontology and the Biorel ontology, forming a seed ontology. Mouse, human and rat-specific data are integrated from Gene Ontology Annotation files and IntAct. Next, these species-specific ontologies are merged and additional data is integrated including protein information (UniProt), pathway annotations (KEGG), basic information for genes (NCBI) and orthology relations for proteins (orthAgogue). The final ontology is available in OBO and RDF formats.
Figure 3
Upper Level Ontology (ULO). The ULO was developed on the basis of terms imported from other ontologies. The three application ontologies have structurally identical ULOs, differing only in the sub-domain specific terms. The figure illustrates the ULO structure of GeXO.
Figure 4
Conceptual model of Q1. The figure displays the different concepts, ontology terms and relationships that together form a graph that was used as a SPARQL query to find matching patterns in GeXKB. The query specifies proteins that A) exhibit positive regulation of CREB transcription factor activity (GO:0032793); B) exhibit positive regulation of sequence-specific DNA binding transcription factor activity (GO:0051091) and are linked to the CREB1 protein through an association (MI:0914); C) are linked to the CREB1 protein through a direct interaction (MI:0407); and D) have function cAMP response element binding protein binding (GO:0008140).
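The four criteria in this caption can be illustrated with a small in-memory sketch. It is not the authors' SPARQL query and does not use the GeXKB schema: the function name, the tuple layout, the toy proteins P1 to P4, and the choice to treat interactions as undirected are all assumptions made for illustration. Only the GO and MI identifiers come from the caption. Because the caption does not say whether criteria A to D are combined as alternatives or as joint requirements, the sketch simply reports which criteria each protein satisfies.

```python
from collections import defaultdict

# Identifiers quoted in the Figure 4 caption.
GO_POS_REG_CREB_ACTIVITY = "GO:0032793"   # criterion A
GO_POS_REG_DBTF_ACTIVITY = "GO:0051091"   # criterion B (needs an MI:0914 link too)
GO_CREB_BINDING = "GO:0008140"            # criterion D
MI_ASSOCIATION = "MI:0914"
MI_DIRECT_INTERACTION = "MI:0407"         # criterion C


def match_q1_criteria(annotations, interactions, target="CREB1"):
    """Return {protein: set of matched criterion labels 'A'..'D'}.

    annotations:  iterable of (protein, go_id) pairs (hypothetical layout)
    interactions: iterable of (protein_a, mi_type, protein_b) triples
                  (hypothetical layout, treated as undirected by assumption)
    """
    go_terms = defaultdict(set)
    for protein, go_id in annotations:
        go_terms[protein].add(go_id)

    # Collect the interaction types that link each protein to the target.
    links = defaultdict(set)
    for a, mi_type, b in interactions:
        if b == target:
            links[a].add(mi_type)
        if a == target:
            links[b].add(mi_type)

    matches = {}
    for protein in sorted(set(go_terms) | set(links)):
        if protein == target:
            continue
        terms = go_terms.get(protein, set())
        mi = links.get(protein, set())
        hits = set()
        if GO_POS_REG_CREB_ACTIVITY in terms:
            hits.add("A")
        if GO_POS_REG_DBTF_ACTIVITY in terms and MI_ASSOCIATION in mi:
            hits.add("B")
        if MI_DIRECT_INTERACTION in mi:
            hits.add("C")
        if GO_CREB_BINDING in terms:
            hits.add("D")
        if hits:
            matches[protein] = hits
    return matches


# Toy data: P1..P4 are hypothetical placeholders, not proteins from the paper.
toy_annotations = [
    ("P1", "GO:0032793"),
    ("P2", "GO:0051091"),
    ("P3", "GO:0051091"),
    ("P4", "GO:0008140"),
]
toy_interactions = [
    ("P2", "MI:0914", "CREB1"),
    ("CREB1", "MI:0407", "P4"),
]

q1_matches = match_q1_criteria(toy_annotations, toy_interactions)
```

In this toy data, P3 carries the GO:0051091 annotation but has no association with CREB1, so it fails criterion B and is left out of the result.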
Figure 5
GeXKB ontologies. The illustration shows the layout of the nested GeXKB ontologies (GeXO, ReXO and ReTO). The blue nodes represent the upper level ontology (ULO), the common root of the three ontologies. The black and red edges depict ‘is_a’ and ‘part_of’ relations, respectively. The three ontologies cover an increasingly wide domain. Each GO sub-domain term (e.g. GO:0010467, denoting ‘gene expression’) and its descendants are linked to the ULO as a subclass of ‘Biological Process’, represented by the dotted edges.
Figure 6
Result evaluation. The flowchart illustrates the evaluation of the results returned for use cases I through IV. The proteins retrieved for use cases I, II and III were first classified based on their presence in the CCK2R map, constituting two groups, a and b. The proteins in group b were further evaluated for evidence of gastrin-induced regulation, constituting sub-group b1. Proteins in b1 were then divided into those with literature evidence of a response to stimuli other than gastrin (b1i) and those not reported to be responsive to other stimuli (b1j). Proteins qualifying both as b1 and b1i were considered to be the most promising new putative network members. Similarly, the target genes returned for use case IV were evaluated for their expression in the AR42J cell system and for whether they were gastrin responsive. Genes that satisfied both criteria were prioritized as putative network members.
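The decision steps in this flowchart can be written down as a short sketch. The function names and boolean inputs are hypothetical, and one reading is assumed that the caption does not spell out: group a is taken to mean proteins already present in the CCK2R map, and group b those absent from it.

```python
def classify_candidate(in_cck2r_map, gastrin_regulated, other_stimuli_evidence):
    """Assign a protein from use cases I-III to a flowchart group.

    Assumption: 'a' = already in the CCK2R map, 'b' = not in the map.
    """
    if in_cck2r_map:
        return "a"
    if not gastrin_regulated:
        return "b"       # outside the map, no evidence of gastrin-induced regulation
    # Sub-group b1: split by literature evidence of response to other stimuli.
    return "b1i" if other_stimuli_evidence else "b1j"


def prioritize_target_genes(genes):
    """Keep use case IV target genes that meet both criteria.

    genes: iterable of (name, expressed_in_ar42j, gastrin_responsive) tuples
           (hypothetical layout)
    """
    return [name for name, expressed, responsive in genes
            if expressed and responsive]


# Hypothetical inputs for illustration only.
example_group = classify_candidate(
    in_cck2r_map=False, gastrin_regulated=True, other_stimuli_evidence=True
)
example_genes = prioritize_target_genes([
    ("G1", True, True),
    ("G2", True, False),
    ("G3", False, True),
])
```

Under this reading, a protein labeled b1i corresponds to the caption's "most promising" category, and only genes that are both expressed in AR42J and gastrin responsive survive the use case IV filter.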
