Review
Genesis. 2015 Aug;53(8):547-60.
doi: 10.1002/dvg.22869. Epub 2015 Jul 8.

Cross-organism Analysis Using InterMine

Free PMC article

Rachel Lyne et al. Genesis. 2015.

Abstract

InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provides a rich environment for data exploration, including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms (budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat), together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine-based systems described in this article are resources freely available to the scientific community.

Keywords: comparative analysis; cross-organism analysis; data analysis; data integration; genomics; integrative analysis; proteomics.

Figures

Figure 1. Data exploration through the InterMine web interface
Data exploration through the InterMine web interface, illustrating navigation between data types within a report page, between report pages and between report pages for orthologous genes in different organisms. The workflow begins with a keyword search for ey in FlyMine followed by navigation to the D. melanogaster ey gene report page, where several data types are examined. Navigation to the corresponding protein report page, PAX6_DROME, allows protein domain data to be viewed. Links to report pages for orthologous genes allow data for equivalent genes in human, mouse, rat, zebrafish and yeast to be examined.
Figure 2. A template search from FlyMine
A template search, Expression + Interactions → Genes, from the FlyMine database, showing two constraints (filters), one for a tissue (in this case adult eye) and one for the interacting gene (in this case ey). This template will return any genes expressed in the adult eye that also interact (physically or genetically) with ey.
Figure 3. The InterMine query builder
The query builder allows navigation of the data model (left pane), where “Constrain” buttons allow the configuration of constraints (filters) on the attribute or class and “Show” buttons add an attribute to the results output. The right pane shows a summary of the query as it is built. A query that will return all Gene Ontology annotations for the D. melanogaster ey gene, together with the associated evidence code, is shown.
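The split between "Show" (output columns) and "Constrain" (filters) can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the InterMine API: the `Query` class, the dotted path names, and the records are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Toy query: a list of output paths plus equality constraints."""
    root: str
    views: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

    def show(self, path):
        # Analogue of the "Show" button: add a column to the output.
        self.views.append(path)
        return self

    def constrain(self, path, value):
        # Analogue of the "Constrain" button: keep rows where path == value.
        self.constraints.append((path, value))
        return self

    def run(self, records):
        # Each record is a flat dict keyed by dotted path (an assumption
        # made for this sketch; a real data model is a graph of classes).
        rows = []
        for rec in records:
            if all(rec.get(p) == v for p, v in self.constraints):
                rows.append(tuple(rec.get(p) for p in self.views))
        return rows


# Placeholder records, not annotations taken from any database.
records = [
    {"Gene.symbol": "ey", "Gene.go.term": "term 1", "Gene.go.evidence": "CODE_1"},
    {"Gene.symbol": "ey", "Gene.go.term": "term 2", "Gene.go.evidence": "CODE_2"},
    {"Gene.symbol": "other", "Gene.go.term": "term 1", "Gene.go.evidence": "CODE_1"},
]

query = (
    Query("Gene")
    .show("Gene.go.term")
    .show("Gene.go.evidence")
    .constrain("Gene.symbol", "ey")
)
result = query.run(records)
```

Here `result` holds one row per annotation of the constrained gene, with one column per path added through `show`.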
Figure 4. An InterMine results table
An InterMine results table, generated by running the template search “Gene -> GO terms” with the ey gene in the FlyMine database. A “column summary” for the Gene Ontology evidence code column is shown, allowing filtering of the results table to show only terms annotated through specific evidence codes. Note that some columns have been removed from the original results for illustration purposes.
Figure 5. Data Exploitation through the InterMine web interface
A hypothetical workflow in which a candidate gene list is filtered through several consecutive analysis tools. Step 1: A candidate gene list, identified through a screen for lipid and cholesterol markers as part of a study on atherosclerosis, is uploaded to the HumanMine database. Step 2: A search of the database identifies those genes from the candidate list that are already associated with the disease atherosclerosis. A list is made of these genes. Step 3: Using the list operation asymmetric difference, a new list is created that does not contain the genes identified as already being associated with atherosclerosis. This list is called the non-atherosclerosis set. Step 4: Links to MouseMine and ZebrafishMine directly from HumanMine allow lists of mouse (Step 4a) and zebrafish (Step 4b) genes orthologous to the non-atherosclerosis list to be analysed in the respective databases. Enrichment statistics for various annotations can be viewed, and in particular an enrichment for the Gene Ontology term “Cholesterol transport” is noted. Step 5: The zebrafish genes from the list annotated with the Gene Ontology term “Cholesterol transport” are saved as a list. Step 6: A database search and filtering for homologues of these genes reveals a gene, Cetp, present in human and zebrafish but not in mouse.
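The list subtraction in Step 3 and the presence/absence filter in Step 6 amount to simple set operations. The following is a minimal sketch using Python sets; the gene symbols, the homologue table, and both function names are invented placeholders, not data or functions from the article or from InterMine.

```python
# Hypothetical placeholder symbols, not data from the article.
candidates = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
known_disease = {"GENE_B", "GENE_D", "GENE_X"}


def subtract_list(left, right):
    """Return the items of `left` that do not occur in `right`."""
    return left - right


# Step 3 analogue: candidates not already associated with the disease.
non_disease = subtract_list(candidates, known_disease)

# Step 6 analogue: organisms in which each gene has a homologue
# (again an invented table for illustration only).
homologues = {
    "GENE_A": {"human", "zebrafish"},
    "GENE_C": {"human", "zebrafish", "mouse"},
}


def present_but_absent(table, required, excluded):
    """Genes with a homologue in every `required` organism and in no
    `excluded` organism, returned in sorted order."""
    return sorted(
        gene
        for gene, organisms in table.items()
        if required <= organisms and not (excluded & organisms)
    )


hits = present_but_absent(homologues, {"human", "zebrafish"}, {"mouse"})
```

With these placeholder inputs, `non_disease` keeps only the candidates missing from the disease list, and `hits` keeps only genes found in human and zebrafish but not in mouse.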
Figure 6. Gene Ontology enrichment analysis of a list of genes in ZebrafishMine
A Gene Ontology enrichment table showing terms from the Gene Ontology biological process ontology enriched in a set of zebrafish genes. A hypergeometric distribution is used to calculate the p-value, which is shown in the table after a Holm-Bonferroni test correction has been applied. The number of genes with each annotation is shown. Lists of genes with each Gene Ontology annotation can be created directly from the table.
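The calculation described in this caption can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than InterMine's own implementation: it assumes a one-sided (over-representation) hypergeometric test, and the function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from a
    population of N genes, K of which carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical counts: 3 of 4 listed genes carry a term that annotates
# 5 of the 20 genes in the background population.
p = hypergeom_pvalue(k=3, n=4, K=5, N=20)

# Hypothetical raw p-values for three terms tested together.
adjusted = holm_bonferroni([0.01, 0.04, 0.03])
```

Each term in an enrichment table would get its own raw p-value from `hypergeom_pvalue`, and the whole column would then be passed through `holm_bonferroni` once.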
