Genome Biol. 2013 Jul 1;14(7):R70.
doi: 10.1186/gb-2013-14-7-r70.

Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene


Mar Gonzàlez-Porta et al. Genome Biol. 2013.

Abstract

Background: RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene.

Results: Here we show that, in a given condition, most protein coding genes have one major transcript expressed at a significantly higher level than the others; that in human tissues the major transcripts contribute almost 85 percent of the total mRNA from protein coding loci; and that the same major transcript is often expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted, on the basis of proteomic data, to be translated. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code for a protein.

Conclusions: Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies.


Figures

Figure 1
Most protein coding genes express one predominant transcript. (a) Relative abundance of the subset of transcripts in each position of the ranking for the primary tissues dataset. For each gene, transcripts were ranked based on their relative abundances. There is generally one predominant transcript over the rest. (b) Percentage of the studied mRNA pool explained by each category of transcripts for the BM dataset. The mean percentage for all samples is represented here. Major transcripts represent approximately 85% of the studied mRNA population and were further classified into two-fold and five-fold dominant. (c) Expression distribution for major and minor transcripts in the tissue dataset. We detect a total of 31,902 transcripts expressed above 1 FPKM in at least one tissue and 26,641 different major transcripts.
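The ranking described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that the major transcript is the most abundant transcript of a gene and that two-fold or five-fold dominance is measured as the ratio of its FPKM to that of the second-ranked transcript; that ratio definition, the function names, and the example values are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple


def rank_transcripts(fpkm: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank one gene's transcripts by relative abundance.

    Relative abundance is the transcript's share of the gene's total FPKM.
    Returns (transcript_id, fraction) pairs, most abundant first.
    """
    total = sum(fpkm.values())
    if total <= 0:
        return []  # gene not expressed in this sample
    ranked = sorted(fpkm.items(), key=lambda kv: kv[1], reverse=True)
    return [(tid, value / total) for tid, value in ranked]


def dominance_class(fpkm: Dict[str, float]) -> str:
    """Classify the major transcript by fold dominance.

    Assumption: dominance is the ratio of the top-ranked FPKM to the
    second-ranked FPKM (thresholds of 2 and 5, as named in the caption).
    """
    values = sorted(fpkm.values(), reverse=True)
    if not values or values[0] <= 0:
        return "not expressed"
    if len(values) == 1 or values[1] == 0:
        return "five-fold dominant"  # no competing transcript is expressed
    ratio = values[0] / values[1]
    if ratio >= 5:
        return "five-fold dominant"
    if ratio >= 2:
        return "two-fold dominant"
    return "major, not dominant"


# Hypothetical FPKM values for one gene in one sample.
gene = {"T1": 60.0, "T2": 25.0, "T3": 15.0}
print(rank_transcripts(gene)[0][0], dominance_class(gene))
```

Computing the ratio on raw FPKM values rather than on the fractions avoids an extra division and gives the same classification.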
Figure 2
Example of a non-canonical major transcript common to all 16 tissues analysed: AES (amino-terminal enhancer of split, ENSG00000104964). Read coverage for the gene (a) and screenshot from the Zmap manual annotation interface (b). UTR exons and splice variants with no annotated CDS are shown in red, coding exons are shown in green and the CDS portion of models annotated as NMD are shown in purple. Clusters containing >8,000 CAGE tags defining transcription start sites are shown as small blue boxes, CpG islands are shown as yellow boxes broken by horizontal red bars representing TSS predictions from EPONINE [59]. The short horizontal green bars represent polyadenylation sites identified by polyAseq [60].
Figure 3
Expression patterns for major transcripts. (a) Percentage of genes with recurrent and non-recurrent major transcripts. Changes in the identity of major transcripts across samples were quantified with switch events. (b) Concept of switch event. A gene is considered to be involved in a switch event if we detect two different dominant major transcripts in two different samples. If the dominant transcripts involved in the switch are expressed above 5 FPKM, while the minor ones are expressed below 1 FPKM, we define the event as a strong switch.
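The switch-event definition in the Figure 3 caption can be sketched as a pairwise comparison of two samples. The sketch assumes that a transcript is "dominant" when its FPKM is at least twice that of the second-ranked transcript, and it reads the strong-switch rule as: each sample's dominant transcript is above 5 FPKM in that sample, while the other sample's dominant transcript is below 1 FPKM there. Both readings, the function names, and the example values are assumptions made for illustration.

```python
from typing import Dict, Optional

Expression = Dict[str, float]  # transcript id -> FPKM for one gene in one sample


def dominant_transcript(fpkm: Expression, min_fold: float = 2.0) -> Optional[str]:
    """Return the dominant major transcript, or None if no transcript dominates.

    Assumption: dominance means the top FPKM is at least `min_fold` times
    the second-ranked FPKM.
    """
    ranked = sorted(fpkm.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return None
    if len(ranked) > 1 and ranked[0][1] < min_fold * ranked[1][1]:
        return None
    return ranked[0][0]


def classify_switch(sample_a: Expression, sample_b: Expression,
                    high: float = 5.0, low: float = 1.0) -> str:
    """Classify a gene as 'no switch', 'switch', or 'strong switch'."""
    dom_a = dominant_transcript(sample_a)
    dom_b = dominant_transcript(sample_b)
    if dom_a is None or dom_b is None or dom_a == dom_b:
        return "no switch"
    # Strong switch: each dominant transcript is highly expressed where it
    # dominates and nearly absent in the other sample.
    strong = (
        sample_a[dom_a] > high
        and sample_b[dom_b] > high
        and sample_a.get(dom_b, 0.0) < low
        and sample_b.get(dom_a, 0.0) < low
    )
    return "strong switch" if strong else "switch"


# Hypothetical FPKM values for one gene in two samples.
sample_1 = {"T1": 40.0, "T2": 0.5}
sample_2 = {"T1": 0.2, "T2": 12.0}
print(classify_switch(sample_1, sample_2))
```

Applying `classify_switch` to every pair of samples for a gene would flag the gene as involved in a switch event if any pair returns something other than "no switch".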
Figure 4
Example of a switch event: MBP (myelin basic protein, ENSG00000197971). Read coverage for the gene in brain and kidney. Further tissues, as well as transcript annotation information, can be visualised in Additional File 1 - Figure S12.
Figure 5
Major non-coding transcripts in protein coding genes. (a) Proportion of the studied mRNA represented by different categories of transcripts. Average proportions were calculated including all the samples from each dataset. Major non-coding transcripts are more abundant in the nucleus, where the proportion of major coding ones is also reduced. (b) Transcript biotype categories for the major non-coding transcripts. Average proportions were calculated including all the samples from each dataset. Processed transcripts are more abundant in the cytosol, while retained introns represent the major fraction in the nucleus. Other minor categories that represented <1% of the transcripts were also identified, but are not visible in the plots.

References

    1. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kahari AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, et al. Ensembl 2012. Nucleic Acids Res. 2011;40:D84–D90.
    2. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484.
    3. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
    4. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511–515. doi: 10.1038/nbt.1621.
    5. Turro E, Su SY, Gonçalves Â, Coin LJM, Richardson S, Lewin A. Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. Genome Biol. 2011;12:R13. doi: 10.1186/gb-2011-12-2-r13.
