BMC Bioinformatics. 2015 Feb 15;16:45.
doi: 10.1186/s12859-015-0453-z.

MeSH ORA Framework: R/Bioconductor Packages to Support MeSH Over-Representation Analysis

Free PMC article

Koki Tsuyuzaki et al. BMC Bioinformatics.

Abstract

Background: In genome-wide studies, over-representation analysis (ORA) against a set of genes is an essential step for biological interpretation. Many gene annotation resources and software platforms for ORA have been proposed. Recently, Medical Subject Headings (MeSH) terms, which are annotations of PubMed documents, have been used for ORA. MeSH enables the extraction of broader meaning from the gene lists and is expected to become an exhaustive annotation resource for ORA. However, the existing MeSH ORA software platforms are still not sufficient for several reasons.

Results: In this work, we developed an original MeSH ORA framework composed of six types of R packages, including MeSH.db, MeSH.AOR.db, MeSH.PCR.db, the org.MeSH.XXX.db-type packages, MeSHDbi, and meshr.

Conclusions: Using our framework, users can easily conduct MeSH ORA. By utilizing the enriched MeSH terms, related PubMed documents can be retrieved and saved on local machines within this framework.
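The over-representation analysis described in the abstract can be illustrated with a minimal sketch. It assumes the common formulation of ORA as a one-sided hypergeometric test; the abstract does not state which test meshr uses, and no multiple-testing correction is applied here. The function names, gene IDs, and term labels below are hypothetical and are not part of the framework.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric variable X.

    N: genes in the universe, K: genes annotated with the term,
    n: selected genes, k: selected genes annotated with the term.
    """
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / total


def over_representation(selected, universe, term_to_genes):
    """Return {term: p-value} for each term (assumed hypergeometric ORA)."""
    universe = set(universe)
    selected = set(selected) & universe
    p_values = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        overlap = len(annotated & selected)
        p_values[term] = hypergeom_upper_tail(
            overlap, len(annotated), len(selected), len(universe)
        )
    return p_values


# Toy data: hypothetical gene IDs and term labels, for illustration only.
universe = {f"g{i}" for i in range(1, 11)}
selected = {"g1", "g2", "g3"}
term_to_genes = {"T1": {"g1", "g2", "g3"}, "T2": {"g8", "g9"}}
p_values = over_representation(selected, universe, term_to_genes)
```

In this toy example, the term whose genes are all in the selected set receives a small p-value, while the term with no overlap receives a p-value of 1.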

Figures

Figure 1
MeSH ORA framework and package dependencies. Our MeSH framework consists of six types of R packages: MeSH.db, MeSH.AOR.db, MeSH.PCR.db, the org.MeSH.XXX.db-type packages, MeSHDbi, and meshr. MeSHDbi defines the class used in MeSH.db, MeSH.AOR.db, MeSH.PCR.db, and the org.MeSH.XXX.db-type packages, and thereby unifies the behavior of these packages. meshr imports the data from MeSH.db and org.MeSH.XXX.db and performs ORA.
Figure 2
Genome-wide tools. We focused on organisms that are used in genome-wide tools such as the UCSC Genome Browser, GeneChip, Gene Ontology, Gendoo, and Bioconductor.
Figure 3
120 organisms. To construct the org.MeSH.XXX.db-type packages, we focused on organisms satisfying three requirements: 1) use in at least one of five genome-wide tools; 2) possession of an Entrez Gene ID, rather than an Ensembl Gene ID; and 3) published data available in at least 100 papers. In the end, 120 organisms were selected for the framework.
Figure 4
Data sources. The data sources for the 120 organisms in our framework. Three data sources (RBBH, gene2pubmed, and Gendoo) were chosen.
Figure 5
Data retrieval schema for the construction of MeSH.db and org.MeSH.XXX.db. MeSH.db uses the MeSH term data from NLM. org.MeSH.XXX.db uses the data from Gendoo, gene2pubmed, and RBBH.
Figure 6
Three types of correspondence between Entrez Gene ID and MeSH ID. The org.MeSH.XXX.db-type packages provide three types of correspondence between Entrez Gene ID and MeSH ID: 1) Gendoo data, in which the correspondence is assigned by a text-mining technique; 2) gene2pubmed data, in which the correspondence is assigned through manual curation by NCBI; and 3) RBBH data, in which the correspondence is assigned by using reciprocal BLASTP best hits among all possible combinations of minor organisms and major organisms.
Figure 7
RBBH. A comparison of 100 minor organisms with 15 major organisms by RBBH was conducted.
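The reciprocal-best-hit idea behind the RBBH data can be sketched as follows. This is an illustration of the general technique only, not the authors' pipeline: it assumes BLASTP results are already available as (query, subject, score) tuples, and all gene IDs and scores are hypothetical.

```python
def best_hits(hits):
    """Keep the highest-scoring subject for each query.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_a_to_b, hits_b_to_a):
    """Return {a: b} pairs where a's best hit is b and b's best hit is a."""
    a_to_b = best_hits(hits_a_to_b)
    b_to_a = best_hits(hits_b_to_a)
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Hypothetical hits between a minor organism (m*) and a major organism (H*).
minor_vs_major = [("m1", "H1", 200), ("m1", "H2", 150), ("m2", "H2", 90)]
major_vs_minor = [("H1", "m1", 210), ("H2", "m3", 300), ("H2", "m2", 80)]
pairs = reciprocal_best_hits(minor_vs_major, major_vs_minor)
```

Only the m1/H1 pair survives, because m2's best hit (H2) points back to a different gene. Annotations attached to the major-organism gene could then be transferred to its minor-organism partner.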
Figure 8
SELECT function in R. The AnnotationDbi package specifies that data are to be retrieved with its select function. The syntax is simple and resembles that of SQL's SELECT statement.
Figure 9
Summary of MeSH assignment to Entrez Gene IDs. Detailed coverage of MeSH against all genes of 115 organisms, excluding Synechocystis, Miyakogusa, Mesorhizobium, Anabaena, and Silkworm. We assumed that genes assigned a gene name are well annotated and that genes assigned only a locus tag are not annotated. (A) The genes assigned a gene name (red), the genes assigned only a locus tag (blue), and the genes newly annotated by MeSH in this work (green). (B) The genes assigned only a gene name (red), the genes annotated by both gene name and MeSH (green), and the genes newly annotated by MeSH in this work (blue).
Figure 10
Simple R code for MeSH ORA. MeSH ORA for Homo sapiens can be performed with simple R code.
Figure 11
ORA with calorie-restricted rats. Enrichment analysis using Gendoo, gene2pubmed, and GO: A (Anatomy), B (Organisms), C (Diseases), D (Chemicals and Drugs), and G (Phenomena and Processes) of Gendoo; A, C, D, E (Analytical, Diagnostic and Therapeutic Techniques and Equipment), and I (Anthropology, Education, Sociology and Social Phenomena) of gene2pubmed; and BP (Biological Process), MF (Molecular Function), and CC (Cellular Component) of GO. Only enriched terms (p < 0.05) are drawn, using the tagcloud package. The negative logarithm of each p-value is used as the weight and is reflected in the font size.
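The weighting described in this caption can be sketched as a small filter-and-transform step. The caption does not state the base of the logarithm, so base 10 is an assumption here; the function name and the term labels are hypothetical, and the drawing itself (done with the R tagcloud package in the paper) is not reproduced.

```python
import math


def tagcloud_weights(p_values, alpha=0.05):
    """Keep terms with p < alpha and weight each by -log10(p).

    The log base (10) is an assumption; the source only says
    "minus logarithm".
    """
    return {
        term: -math.log10(p)
        for term, p in p_values.items()
        if 0 < p < alpha
    }


# Hypothetical terms and p-values, for illustration only.
weights = tagcloud_weights({"term_a": 0.01, "term_b": 0.2, "term_c": 0.001})
```

Terms with smaller p-values receive larger weights, so they would be drawn in a larger font.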
