Nucleic Acids Res. 2017 Feb 28;45(4):1657-1672.
doi: 10.1093/nar/gkw1256.

Large-scale Mapping of Mammalian Transcriptomes Identifies Conserved Genes Associated With Different Cell States

Free PMC article

Yang Yang et al. Nucleic Acids Res.

Abstract

Distinguishing cell states based only on gene expression data remains a challenging task, even within a single species. In cross-species comparisons, the results obtained by different groups have varied widely. Here, we integrate RNA-seq data from more than 40 cell and tissue types of four mammalian species to identify sets of associated genes as indicators of specific cell states in each species. We employ a statistical method, TROM, to identify both protein-coding and non-coding indicators. Next, we use these indicator genes to map cell states within each species and between species. We recapitulate known phenotypic similarities between related cell and tissue types and reveal the molecular basis of these similarities. We also report novel associations between several tissues and cell types, supported by functional evidence. Moreover, the conserved associated genes we identified provide a useful resource for studying cell differentiation and reprogramming. Lastly, long non-coding RNAs (lncRNAs) can serve as associated genes that indicate cell states, and we infer the biological functions of these non-coding associated genes from their co-expressed protein-coding genes. This study demonstrates that combining statistical modeling with public RNA-seq data can be a powerful way to improve our understanding of cell identity control.

Figures

Figure 1.
Overview of the RNA-seq data sets and the Transcriptome Overlap Measure (TROM) approach. (A) Numbers of RNA-seq data sets and sequencing reads for each mammalian species: human, chimpanzee, bonobo and mouse. (B) The TROM approach. First, the associated genes of each cell state are selected using thresholds on FPKMs and Z-scores (FPKMs normalized across cell states). In the within-species TROM (left panel), the significance of the number of associated genes shared by two cell states is assessed with an overlap test. In the between-species TROM (right panel), a similar overlap test is carried out, except that orthologous genes are used to connect the two species. Two cell states are called ‘mapped’ if the test is significant. (See Materials and Methods for details.)
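The associated-gene selection step in panel (B) can be sketched as follows. This is a minimal illustration, not the TROM package itself. It assumes a complete gene-by-cell-state FPKM table stored as nested dicts, computes each gene's Z-score across cell states using the population standard deviation, and uses hypothetical default thresholds (`fpkm_cutoff`, `z_threshold`). The function name `associated_genes` and the toy data are illustrative.

```python
import statistics


def associated_genes(fpkm, z_threshold=1.0, fpkm_cutoff=1.0):
    """Select associated genes for every cell state.

    fpkm: {gene: {cell_state: FPKM}}; every gene is assumed to have a value
    for every cell state. A gene is associated with a state when its FPKM
    there exceeds fpkm_cutoff AND its Z-score across states exceeds
    z_threshold. Both default thresholds are illustrative assumptions.
    """
    states = sorted({s for values in fpkm.values() for s in values})
    result = {s: set() for s in states}
    for gene, values in fpkm.items():
        expr = [values[s] for s in states]
        mean = statistics.mean(expr)
        sd = statistics.pstdev(expr)
        if sd == 0:
            continue  # constant across states: no state-specific signal
        for state, x in zip(states, expr):
            z = (x - mean) / sd  # FPKM normalized across cell states
            if x > fpkm_cutoff and z > z_threshold:
                result[state].add(gene)
    return result


fpkm = {
    "A": {"liver": 50.0, "heart": 1.0, "brain": 1.0},  # liver-specific
    "B": {"liver": 5.0, "heart": 5.0, "brain": 5.0},   # uniform
    "C": {"liver": 0.5, "heart": 0.0, "brain": 0.0},   # specific but lowly expressed
}
print(associated_genes(fpkm))
```

Gene C has a liver Z-score above the threshold but falls below the FPKM cutoff. This illustrates why the two filters are applied together: the Z-score captures specificity, and the cutoff excludes genes that are barely expressed.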
Figure 2.
Robust transcriptome mapping patterns. (A) A correspondence map of human cell states by TROM using associated protein-coding genes and lncRNAs under expression cutoff c = 1. (B) A correspondence map of human cell states by TROM using associated protein-coding genes and lncRNAs under expression cutoff c = 0. (C) A correspondence map of human cell states by TROM using associated protein-coding genes under a series of Z-score thresholds. (D) A correspondence map of human cell states by TROM using associated lncRNAs under a series of Z-score thresholds. Columns and rows correspond to biological samples of various cell states. Higher TROM scores (defined as –log10 transformed Bonferroni-corrected P-values from the overlap test) are shown in darker colors.
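The TROM score plotted in these maps (–log10 of the Bonferroni-corrected overlap P-value) can be sketched with the standard library alone. The sketch assumes the overlap test is a one-sided hypergeometric test, and it assumes `n_tests` equals the number of cell-state pairs compared. The function names are hypothetical.

```python
import math


def overlap_pvalue(k, universe, n_a, n_b):
    """One-sided hypergeometric P(X >= k): chance that two random gene sets of
    sizes n_a and n_b, drawn from `universe` genes, share at least k genes."""
    upper = min(n_a, n_b)
    if k > upper:
        return 0.0
    hits = sum(math.comb(n_a, i) * math.comb(universe - n_a, n_b - i)
               for i in range(k, upper + 1))
    return hits / math.comb(universe, n_b)  # exact integers, one final division


def trom_score(genes_a, genes_b, universe, n_tests):
    """-log10 of the Bonferroni-corrected overlap P-value (higher = stronger)."""
    p = overlap_pvalue(len(genes_a & genes_b), universe, len(genes_a), len(genes_b))
    p_bonf = min(1.0, p * n_tests)  # Bonferroni correction, capped at 1
    return math.inf if p_bonf == 0 else -math.log10(p_bonf)


a = {f"g{i}" for i in range(5)}
b = {f"g{i}" for i in range(5, 10)}
print(trom_score(a, a, universe=10, n_tests=1))  # identical sets
print(trom_score(a, b, universe=10, n_tests=1))  # disjoint sets
```

With realistic gene counts, the exact sum is slow, and the P-value can underflow to 0, which this sketch scores as infinity. `scipy.stats.hypergeom.sf(k - 1, M, n, N)` (or `logsf`) computes the same upper tail and is the usual practical choice.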
Figure 3.
Cell states encoded by associated genes in human. (A) A correspondence map of human cell states by TROM using associated protein-coding genes. Columns and rows correspond to biological samples of various cell states. Higher TROM scores (defined as –log10 transformed Bonferroni-corrected P-values from the overlap test) are shown in darker colors. Axis colors represent cell states, and colored boxes mark the prominent mapping patterns. (B) The numbers of protein-coding genes (top) and lncRNAs (bottom) associated with different numbers of human cell states. For associated protein-coding genes, the proportion of housekeeping genes in each group is shown. (C) Enriched gene ontology (GO) terms (biological processes) of 19 human tissues. More significant enrichment scores (defined as –log10 transformed Bonferroni-corrected P-values) are shown in darker colors. (D) Enriched KEGG cellular pathways of 19 human tissues. More significant enrichment scores (defined as –log10 transformed Bonferroni-corrected P-values) are shown in darker colors.
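Panel (B) groups genes by the number of cell states they are associated with. Given per-state associated-gene sets, the grouping and the housekeeping proportion per group can be sketched as follows. The housekeeping gene set is a hypothetical input, because the caption does not name its source. The function names and toy data are illustrative.

```python
from collections import Counter, defaultdict


def group_by_state_count(assoc):
    """assoc: {cell_state: set of associated genes}.
    Returns {k: set of genes associated with exactly k cell states}."""
    counts = Counter(g for genes in assoc.values() for g in genes)
    groups = defaultdict(set)
    for gene, k in counts.items():
        groups[k].add(gene)
    return dict(groups)


def housekeeping_fraction(groups, housekeeping):
    """Proportion of housekeeping genes within each group, ordered by k."""
    return {k: len(genes & housekeeping) / len(genes)
            for k, genes in sorted(groups.items())}


assoc = {"liver": {"A", "B"}, "heart": {"B", "C"}, "brain": {"B"}}
groups = group_by_state_count(assoc)
print(groups)
print(housekeeping_fraction(groups, housekeeping={"B", "C"}))
```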
Figure 4.
Cell state correspondence maps between human and other mammalian species. (A) A correspondence map of various cell states between human and mouse by TROM using associated protein-coding genes. Rows correspond to human cell states, and columns correspond to mouse cell states. (B) A correspondence map of various cell states between human and mouse by TROM using associated TFs. Rows correspond to human cell states, and columns correspond to mouse cell states. (C) A correspondence map of various cell states between human and chimpanzee by TROM using associated lncRNAs. Rows correspond to human cell states, and columns correspond to chimpanzee cell states. In (A–C), higher TROM scores (defined as –log10 transformed Bonferroni-corrected P-values from the overlap test) are shown in darker colors. Axis colors represent cell states, and colored boxes mark the prominent mapping patterns.
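The between-species maps differ from the within-species case only in that orthologous genes connect the two gene sets before the overlap test. A minimal sketch of that translation step follows. It assumes a one-to-one ortholog table, and it assumes that genes without an ortholog are dropped from both sides, with the gene universe taken as the set of ortholog pairs; both choices are assumptions, not the documented procedure. The gene IDs are toy examples.

```python
def between_species_counts(genes_a, genes_b, orthologs):
    """Counts for the between-species overlap test.

    genes_a: associated genes of a cell state in species A (A gene IDs).
    genes_b: associated genes of a cell state in species B (B gene IDs).
    orthologs: {B gene ID: A gene ID}, assumed one-to-one.
    Genes without an ortholog are dropped from both sides, and the gene
    universe is the set of orthologous pairs.
    """
    a_with_orth = genes_a & set(orthologs.values())
    b_as_a = {orthologs[g] for g in genes_b if g in orthologs}
    return {
        "overlap": len(a_with_orth & b_as_a),
        "n_a": len(a_with_orth),
        "n_b": len(b_as_a),
        "universe": len(orthologs),
    }


orthologs = {"Alb": "ALB", "Ttr": "TTR", "Myh6": "MYH6", "Nppa": "NPPA"}
human_liver = {"ALB", "TTR", "APOA1"}   # APOA1 is absent from this toy table
mouse_liver = {"Alb", "Ttr", "Cyp2c29"}  # Cyp2c29 is absent from this toy table
print(between_species_counts(human_liver, mouse_liver, orthologs))
```

The resulting counts would then feed the same one-sided overlap test used within a species.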
Figure 5.
Inferring functions of conserved associated long non-coding RNAs (lncRNAs). (A) Three examples of conserved associated lncRNAs: in embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) (LINC01108, ENSG00000226673) (top), kidney (CYP4A22-AS1, ENSG00000225506) (middle) and cerebellum (NTM-IT, ENSG00000238262) (bottom). The expression estimates of the three lncRNAs across seven cell states in human, chimpanzee and bonobo are shown. (B) The 19 largest clusters in the co-expression network of protein-coding genes and lncRNAs. Dot colors distinguish protein-coding genes (orange) from lncRNAs (blue). (C) Enriched GO terms (biological processes) of the protein-coding genes in the 19 largest clusters. Higher enrichment scores (defined as –log10 transformed Bonferroni-corrected P-values) are shown in darker colors. (D) Radar plots illustrate the extent to which the conserved associated lncRNAs of different cell states are enriched in different clusters. The cell states include cerebellum (top left), heart (top right), kidney (bottom left) and testis (bottom right).
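Inferring lncRNA function from co-expressed protein-coding genes (panels B and C) can be illustrated with a guilt-by-association sketch. The caption does not state how the network or its clusters were built, so this version uses hypothetical choices. Two genes are linked when their Pearson correlation across samples is at least `r_min`, and clusters are the connected components of that graph. The protein-coding members of an lncRNA's cluster are the genes whose GO terms would then be tested for enrichment. All names and data are illustrative.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def coexpression_clusters(expr, r_min=0.9):
    """expr: {gene: [expression across samples]}.
    Links genes with correlation >= r_min and returns the connected
    components (found with union-find), largest first."""
    genes = list(expr)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(expr[a], expr[b]) >= r_min:
                parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for g in genes:
        clusters[find(g)].add(g)
    return sorted(clusters.values(), key=len, reverse=True)


def coding_partners(lncrna, clusters, coding_genes):
    """Protein-coding genes that share a cluster with the given lncRNA."""
    for cluster in clusters:
        if lncrna in cluster:
            return cluster & coding_genes
    return set()


expr = {
    "LNC1": [1.0, 2.0, 3.0, 4.0],
    "PC1": [2.0, 4.0, 6.0, 8.0],
    "PC2": [10.0, 20.0, 30.0, 41.0],
    "PC3": [4.0, 3.0, 2.0, 1.0],
    "LNC2": [5.0, 1.0, 5.0, 1.0],
}
clusters = coexpression_clusters(expr)
print(coding_partners("LNC1", clusters, {"PC1", "PC2", "PC3"}))
```

The all-pairs loop is quadratic in the number of genes, which is fine for illustration. A genome-scale network would normally compute the correlation matrix in bulk and use a dedicated clustering method.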
