Nucleic Acids Res. 2017 Jan 4;45(D1):D389-D396.
doi: 10.1093/nar/gkw868. Epub 2016 Sep 26.

COEXPEDIA: Exploring Biomedical Hypotheses via Co-Expressions Associated With Medical Subject Headings (MeSH)

Free PMC article

Sunmo Yang et al. Nucleic Acids Res.

Abstract

The use of high-throughput array and sequencing technologies has produced unprecedented amounts of gene expression data in central public depositories, including the Gene Expression Omnibus (GEO). The immense amount of expression data in GEO provides both vast research opportunities and data analysis challenges. Co-expression analysis of high-dimensional expression data has proven effective for the study of gene functions, and several co-expression databases have been developed. Here, we present a new co-expression database, COEXPEDIA (www.coexpedia.org), which is distinct from other co-expression databases in three aspects: (i) it contains only co-functional co-expressions that passed a rigorous statistical assessment for functional association, (ii) the co-expressions were inferred from individual studies, each of which was designed to investigate gene functions with respect to a particular biomedical context such as a disease, and (iii) the co-expressions are associated with medical subject headings (MeSH) that provide biomedical information for anatomical, disease, and chemical relevance. COEXPEDIA currently contains approximately eight million co-expressions inferred from 384 and 248 GEO series for humans and mice, respectively. We describe how these MeSH-associated co-expressions enable the identification of diseases and drugs previously unknown to be related to a gene or a gene group of interest.

Figures

Figure 1.
Assessment of co-functional co-expression. (A) Co-functional co-expressions show a strong positive correlation between the strength of co-expression (measured by PCC) and the likelihood of functional association (measured by LLS). For example, co-expressions across healthy and malignant human B-lymphocytes with or without B-cell receptor stimulation (GSE39411) show a strong correlation with the likelihood of co-functional associations. LLS scores are assigned to all co-expressed gene pairs based on sigmoid regression fitting of data points between PCC and LLS, and only those with an LLS of at least 1 (i.e. ∼2.7 times more likely to be co-functional than expected by random chance) are included in the COEXPEDIA database. (B) In many cases, the co-expressions inferred from a GSE do not implicate a co-functional relationship. For example, the co-expressions across time-course samples during the activation of human regulatory and effector T cells (GSE11292) show a poor correlation between PCC and LLS, indicating that these co-expressions are unlikely to be co-functional.
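The filtering step in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the LLS is a natural-log likelihood ratio (consistent with an LLS of 1 corresponding to roughly 2.7-fold), and the logistic form, the parameter values (`top`, `slope`, `midpoint`) and the gene names are hypothetical stand-ins for a curve that would in practice be fitted per GSE.

```python
import math

def sigmoid_lls(pcc, top, slope, midpoint):
    """Map a PCC value to an LLS with a logistic curve (hypothetical parameters)."""
    return top / (1.0 + math.exp(-slope * (pcc - midpoint)))

def fold_over_random(lls):
    """Convert a natural-log likelihood score into a fold change over random chance."""
    return math.exp(lls)

def filter_pairs(pairs, params, min_lls=1.0):
    """Keep only gene pairs whose fitted LLS reaches the inclusion threshold."""
    kept = []
    for gene_a, gene_b, pcc in pairs:
        lls = sigmoid_lls(pcc, **params)
        if lls >= min_lls:
            kept.append((gene_a, gene_b, lls))
    return kept

# Toy values for illustration only; not taken from COEXPEDIA.
params = {"top": 4.0, "slope": 10.0, "midpoint": 0.7}
pairs = [("GENE_A", "GENE_B", 0.95), ("GENE_A", "GENE_C", 0.40)]
kept = filter_pairs(pairs, params)
```

With these toy parameters only the strongly co-expressed pair passes the threshold, and `fold_over_random(1.0)` returns e, which matches the ∼2.7 figure quoted in the caption.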
Figure 2.
Overview of the COEXPEDIA data structure. (A) Co-expressions are inferred from individual GSEs, each of which is associated with at least one PubMed article. Each article is indexed with multiple MeSH terms. Consequently, each co-expression can be associated with at least one MeSH term. (B) A real example of the data structure, using CTLA4 co-expressions. The enrichment score of each MeSH term for the given co-expressions is calculated by summing, across the GSEs annotated with that term, each GSE's sum of edge scores (i.e. LLS); for example, the enrichment score for T-lymphocytes was calculated by adding up the sums of edge scores from GSE12195, GSE20711 and GSE14924. This scoring scheme identified isotretinoin, T-lymphocytes, cyclophosphamide and epirubicin as the top four MeSH terms enriched for CTLA4 co-expressions.
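The scoring scheme described in panel (B) can be sketched in a few lines. This is an illustrative reading of the caption rather than the database's actual code; the function name `mesh_enrichment`, the dictionary layout, and the GSE identifiers, gene names, MeSH labels and LLS values below are all hypothetical.

```python
from collections import defaultdict

def mesh_enrichment(edges_by_gse, mesh_by_gse):
    """Score each MeSH term by summing, over its GSEs, the per-GSE sum of edge LLS."""
    scores = defaultdict(float)
    for gse, edges in edges_by_gse.items():
        # Sum of edge scores (LLS) for the query gene within this one GSE.
        gse_sum = sum(lls for _gene_a, _gene_b, lls in edges)
        # Every MeSH term indexed for this GSE's article receives that sum.
        for term in mesh_by_gse.get(gse, ()):
            scores[term] += gse_sum
    # Rank terms from highest to lowest enrichment score.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy inputs for illustration only.
edges_by_gse = {
    "GSE_X": [("QUERY", "G1", 2.0), ("QUERY", "G2", 1.5)],
    "GSE_Y": [("QUERY", "G1", 1.0)],
}
mesh_by_gse = {"GSE_X": ["Term A", "Term B"], "GSE_Y": ["Term A"]}
ranked = mesh_enrichment(edges_by_gse, mesh_by_gse)
```

In this toy case "Term A" is supported by both series and therefore outranks "Term B", which is supported by only one.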
Figure 3.
Screenshots of the BRCA1 query results. (A) A list of the co-expression partners of BRCA1 ranked by the sum of LLS scores. (B) A list of the enriched GOBP terms among BRCA1 co-expression partners ranked by the P-value from Fisher's exact test. (C) A list of the enriched DO terms among BRCA1 co-expression partners ranked by the P-value from Fisher's exact test. (D) A visualization of the gene network of BRCA1 and its co-expression partners. (E) A list of the enriched MeSH terms in the BRCA1 co-expression network. (F) A visualization of the gene network of BRCA1 and its co-expression partners restricted to the selected MeSH term ‘Heart’. (G) Information on the studies (GSEs and PubMed articles) that support the selected MeSH term ‘Heart’ for the co-expression network.
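For readers unfamiliar with the term ranking in panels (B) and (C), the sketch below shows one common way to compute an enrichment P-value with Fisher's exact test. It assumes a one-sided (over-representation) test, which the caption does not specify, and the function name and argument names are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test P-value for over-representation.

    k: co-expression partners annotated with the term
    n: total co-expression partners
    K: genes in the background annotated with the term
    N: total genes in the background
    Returns P(X >= k) for X drawn from the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: all 5 partners fall in a 5-gene term within a 10-gene background.
p_value = fisher_enrichment_p(k=5, n=5, K=5, N=10)
```

Terms would then be sorted by ascending P-value; any multiple-testing correction is outside the scope of this sketch.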
Figure 4.
Assessment of gene-to-MeSH predictions. Literature-based gene-to-MeSH links compiled from the Gene2MeSH database are used as gold-standard data to evaluate predictions based on co-expressions in COEXPEDIA. The cumulative number of gold-standard gene-to-MeSH links is counted in the given top N ranked MeSH predictions for the query gene while excluding (A) or including (B) the prevalent neoplasm MeSH terms. The predictions by COEXPEDIA are compared with those by 1000 sets of randomly sampled gene-to-MeSH links, which are summarized as distributions for the same ranks.
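The evaluation described in this caption can be approximated as follows. This is a simplified sketch under stated assumptions: the random baseline here draws MeSH terms uniformly from a term list, which may differ from how the authors sampled random gene-to-MeSH links, and all names (`cumulative_hits`, `random_baseline`, the toy terms) are hypothetical.

```python
import random

def cumulative_hits(ranked_terms, gold_terms):
    """Cumulative count of gold-standard terms at each rank of a prediction list."""
    hits, total = [], 0
    for term in ranked_terms:
        total += term in gold_terms
        hits.append(total)
    return hits

def random_baseline(all_terms, gold_terms, top_n, n_sets=1000, seed=0):
    """Gold-standard hits within the top N for repeated random term samples."""
    rng = random.Random(seed)
    return [
        cumulative_hits(rng.sample(all_terms, top_n), gold_terms)[-1]
        for _ in range(n_sets)
    ]

# Toy data for illustration only.
all_terms = [f"term_{i}" for i in range(20)]
gold = {"term_0", "term_3"}
predicted = ["term_0", "term_7", "term_3", "term_9"]
observed = cumulative_hits(predicted, gold)
baseline = random_baseline(all_terms, gold, top_n=4)
```

Comparing `observed[-1]` with the distribution in `baseline` gives a rough sense of whether the ranked predictions recover more gold-standard links than random sampling at the same rank.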
