BMC Genomics. 2009 Sep 3;10:411.
doi: 10.1186/1471-2164-10-411.

GEM-TREND: A Web Tool for Gene Expression Data Mining Toward Relevant Network Discovery

Chunlai Feng et al. BMC Genomics. 2009.

Abstract

Background: DNA microarray technology provides a first step toward uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which is available in public databases such as the Gene Expression Omnibus (GEO). To date, most researchers have retrieved such data manually through web browsers, using accession numbers (IDs) or keywords; gene-expression patterns are not taken into account in these searches. The Connectivity Map was recently introduced to compare gene expression data by means of gene-expression signatures (sets of genes labeled as up- or down-regulated according to their biological state). It is available as a web tool for detecting similar gene-expression signatures, but only within a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). To help researchers use public gene expression data more effectively, we developed a web tool that finds similar gene expression data in a publicly available database and generates co-expression networks from them.

Results: GEM-TREND, a web tool for searching gene expression data, accepts gene-expression signatures or gene expression ratio data as a query and retrieves gene expression data from GEO by comparing the gene-expression pattern of the query with those of GEO entries. The comparison is based on the nonparametric, rank-based pattern-matching approach of Lamb et al. (Science 2006), with an additional calculation of statistical significance. The web tool was tested with gene expression ratio data randomly extracted from GEO and with in-house microarray data. The results validated the ability of GEM-TREND to retrieve GEO entries that are biologically related to the query. For further analysis, a network visualization interface is also provided, in which genes and gene annotations are dynamically linked to external data repositories.
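The abstract does not say how a ratio-data query is turned into a signature of up- and down-regulated genes. A minimal sketch, assuming a hypothetical fixed cutoff `n_top` on the most up- and most down-regulated genes; the function name and the cutoff are illustrative and are not GEM-TREND's documented behavior.

```python
from typing import Dict, List, Tuple


def signature_from_ratios(log_ratios: Dict[str, float],
                          n_top: int = 100) -> Tuple[List[str], List[str]]:
    """Hypothetical conversion of log-ratio data into an up/down signature.

    Assumption: the n_top genes with the largest positive log ratios form the
    "up" set and the n_top genes with the most negative log ratios form the
    "down" set. Each list is ordered from strongest to weakest change.
    """
    # Rank genes by descending log ratio (most up-regulated first).
    ranked = sorted(log_ratios, key=lambda g: log_ratios[g], reverse=True)
    # Keep only genes that are actually up (>0) or down (<0), so no gene
    # can land in both lists even when n_top exceeds half the gene count.
    up = [g for g in ranked[:n_top] if log_ratios[g] > 0]
    down = [g for g in reversed(ranked[-n_top:]) if log_ratios[g] < 0]
    return up, down
```

For example, `signature_from_ratios({"A": 2.0, "B": -1.5, "C": 0.3}, n_top=1)` yields `(["A"], ["B"])`.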

Conclusion: GEM-TREND was developed to retrieve gene expression data by comparing the gene-expression pattern of a query with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing their gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.

Figures

Figure 1
Construction of reference gene expression profiles and calculation of the similarity score. (1) Gene expression data annotated as treatment instances (i.e., treatment versus control) were extracted from GEO. (2) For each sample, genes were ranked in descending order of the log ratio of treatment to control. (3) The various gene identifiers (gene names/IDs) were converted to UniGene IDs using the associated platform annotation file. (4) Separate rank vectors were constructed for the up- and down-regulated query genes that matched genes in the reference, and their components were sorted in ascending order. (5) The similarity score was calculated.
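Steps (2), (4) and (5) can be sketched as follows, assuming the similarity score follows the Kolmogorov-Smirnov-style statistic published by Lamb et al. (2006): for the sorted ranks V(1), ..., V(t) of t query genes in a list of n reference genes, a = max_j [j/t - V(j)/n] and b = max_j [V(j)/n - (j-1)/t], and the score is a if a > b, else -b. The combined score is ks_up - ks_down when the two have opposite signs and zero otherwise. The helper names are hypothetical, identifier conversion (step 3) is assumed to be done already, and the Connectivity Map's normalization of scores across all reference profiles is omitted.

```python
from typing import Dict, Iterable, List


def rank_positions(reference: Dict[str, float], genes: Iterable[str]) -> List[int]:
    """Steps 2 and 4: rank reference genes by descending log ratio
    (rank 1 = most up-regulated) and return the ascending ranks of the
    query genes that are present in the reference."""
    order = sorted(reference, key=lambda g: reference[g], reverse=True)
    rank = {g: i + 1 for i, g in enumerate(order)}
    # Query genes absent from the reference are dropped (no match).
    return sorted(rank[g] for g in genes if g in rank)


def ks_score(positions: List[int], n: int) -> float:
    """KS-style enrichment of sorted tag ranks V(1..t) among n genes,
    following the formula of Lamb et al. (2006)."""
    t = len(positions)
    if t == 0:
        return 0.0
    a = max(j / t - v / n for j, v in enumerate(positions, start=1))
    b = max(v / n - (j - 1) / t for j, v in enumerate(positions, start=1))
    return a if a > b else -b


def similarity_score(reference: Dict[str, float],
                     up_genes: Iterable[str],
                     down_genes: Iterable[str]) -> float:
    """Step 5: combine the up and down KS scores into one raw score."""
    n = len(reference)
    ks_up = ks_score(rank_positions(reference, up_genes), n)
    ks_down = ks_score(rank_positions(reference, down_genes), n)
    # Zero when both scores share a sign (0 is treated as non-negative).
    if (ks_up >= 0) == (ks_down >= 0):
        return 0.0
    return ks_up - ks_down
```

A query whose up genes sit at the top of the reference ranking and whose down genes sit at the bottom receives a large positive score; the mirror-image arrangement receives a large negative score.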
Figure 2
The method for P-value calculation. (1) Count the up- and down-regulated genes that overlap between the query Q and a reference profile R, where u and v are the numbers of up- and down-regulated genes in Q; let the overlap counts be u' (≤ u) and v' (≤ v), respectively. (2) Randomly select u' and then v' genes, without replacement, from the n genes of R to construct a random signature. (3) Calculate the similarity score between R and the random signature. (4) Generate a total of 10,000 random scores by repeating steps 2 and 3. (5) The P-value of the similarity score (query score) between Q and R is the proportion of random scores that are no less than the observed query score.
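The permutation procedure can be sketched as below. Here `score_fn` is a hypothetical placeholder for whatever similarity score is used; it is assumed to take the sorted ranks of the random up genes, the sorted ranks of the random down genes, and n. The P-value follows the caption's definition exactly (the fraction of random scores ≥ the observed score, with no pseudocount); a common variant uses (hits + 1) / (n_perm + 1) to avoid reporting P = 0.

```python
import random
from typing import Callable, List, Optional

# Hypothetical score interface: (up ranks, down ranks, n) -> score.
ScoreFn = Callable[[List[int], List[int], int], float]


def permutation_p_value(observed: float, score_fn: ScoreFn, n: int,
                        u: int, v: int, n_perm: int = 10_000,
                        seed: Optional[int] = None) -> float:
    """Empirical P-value of an observed similarity score against scores
    of random signatures drawn from the same n-gene reference ranking."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Step 2: draw u + v distinct ranks without replacement;
        # the first u form the random "up" set, the rest the "down" set.
        drawn = rng.sample(range(1, n + 1), u + v)
        up = sorted(drawn[:u])
        down = sorted(drawn[u:])
        # Step 3: score the random signature against the reference.
        if score_fn(up, down, n) >= observed:
            hits += 1
    # Steps 4 and 5: proportion of random scores no less than the observed one.
    return hits / n_perm
```

Passing a fixed `seed` makes the estimate reproducible; with 10,000 permutations the smallest non-zero P-value this definition can report is 0.0001.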
Figure 3
Screenshot of GEM-TREND. a) Query input area. Gene-expression signatures, gene expression ratio data and text are accepted as queries. Network IDs can be used to retrieve previous networks. b) Results area. The GEO series ID (GSE ID), GEO platform ID (GPL ID), series title, similarity score and P-value of each search result are displayed. Each record corresponds to one GEO series and links to GEO via its GSE ID and GPL ID. Previous results can be retrieved by JOB ID. c) Network visualization (Gene Cluster tab): c-1) Network graphical display area. Genes (nodes) with a red background are query genes, while genes with a yellow background are user-selected. The number at the top right of a gene is the number of its hidden links; these links can be expanded or hidden by right-clicking the gene and choosing from the pop-up menu. Double-clicking a gene opens its UniGene entry. c-2) Gene cluster area, where gene clusters are shown. The number following each cluster is the number of member genes in that cluster. Clicking the UniGene icon opens a gene's UniGene entry. c-3) Gene search window. Matching genes are highlighted in the gene cluster area. d) Network visualization (GO tab): d-1) Network graphical display area. Genes with an orange background are associated with the common GO term. d-2) Gene annotation. For each cluster, the three most significant shared GO terms in each ontology are shown. The number following each term is the number of genes associated with that term. Clicking the GO icon opens the term in GO. d-3) Gene search window. e) Linkout to the GEO database. f) Linkout to the UniGene database. g) Linkout to the Gene Ontology database.
Figure 4
Distribution of the ratio of the query's MeSH terms in the top-ranked entries for 100 randomly selected queries. The differently colored groups are, respectively: the top 50 entries without a P-value filter, the top 50 entries with P-value ≤ 0.01, the top 30 entries without a P-value filter, the top 30 entries with P-value ≤ 0.01, the top 10 entries without a P-value filter, the top 10 entries with P-value ≤ 0.01, and all entries. "All entries" comprises every human microarray series (444 series).
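The caption does not define the ratio precisely. One possible reading, sketched here purely as an assumption, is the fraction of the query's MeSH terms that annotate at least one of the top-k retrieved entries, optionally after keeping only entries with P ≤ 0.01. The entry layout and the function name `mesh_ratio` are hypothetical, and the ranked list is assumed to be sorted by descending similarity score.

```python
from typing import Optional, Sequence, Set, Tuple

# Hypothetical entry layout: (series ID, similarity score, P-value, MeSH terms).
Entry = Tuple[str, float, float, Set[str]]


def mesh_ratio(query_terms: Set[str], ranked: Sequence[Entry],
               top_k: Optional[int] = None,
               max_p: Optional[float] = None) -> float:
    """Fraction of the query's MeSH terms found among the top-ranked entries
    (one assumed interpretation of the Figure 4 metric)."""
    if not query_terms:
        return 0.0
    # Apply the optional P-value filter first, then take the top k survivors.
    entries = [e for e in ranked if max_p is None or e[2] <= max_p]
    if top_k is not None:
        entries = entries[:top_k]
    covered: Set[str] = set()
    for _, _, _, terms in entries:
        covered |= terms & query_terms
    return len(covered) / len(query_terms)
```

Under this reading, leaving both `top_k` and `max_p` unset corresponds to the "all entries" baseline.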
Figure 5
A co-expression network generated from GSE1827 data. The yellow genes are annotated with GO:0003700 (transcription factor activity). Interestingly, these are hub genes or their neighbors in the sub-network, suggesting that transcription factors might be key molecules in bladder tumors.
