Sci Rep. 2020 Oct 1;10(1):16245.
doi: 10.1038/s41598-020-73081-5.

Top-ranked expressed gene transcripts of human protein-coding genes investigated with GTEx dataset

Kuo-Feng Tung et al. Sci Rep. 2020.

Abstract

With the considerable accumulation of RNA-Seq transcriptome data, our understanding of the transcript composition of protein-coding genes has expanded. However, the complex alternative splicing patterns of human protein-coding gene transcripts complicate the processing and interpretation of gene expression data. It is essential to interrogate the complex mRNA isoforms of protein-coding genes exhaustively with a unified data resource. To identify representative mRNA transcript isoforms that can serve as references for transcriptome analysis, we used GTEx data to establish a top-ranked transcript isoform expression data resource for human protein-coding genes. Individual top-ranked transcripts of protein-coding genes showed distinctive tissue-specific expression profiles and modulations. Protein-coding transcripts and genes account for a much higher fraction of expression in transcriptome data. In addition, top-ranked transcripts are the dominantly expressed ones in various normal tissues. Intriguingly, some of the top-ranked transcripts are noncoding splicing isoforms, which implies diverse gene regulation mechanisms. Comprehensive investigation of the tissue expression patterns of top-ranked transcript isoforms is therefore crucial. We established a web tool for examining top-ranked transcript isoforms in various normal human tissue types; it provides concise transcript information and easy-to-use graphical user interfaces. Investigating top-ranked transcript isoforms would contribute to understanding the functional significance of distinctive alternatively spliced transcript isoforms.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The numbers and average expression levels of protein-coding genes, grouped by transcript isoform count from single-transcript genes to genes with 10 transcript isoforms. Single-transcript protein-coding genes are more numerous and have considerably higher expression levels. Notably, the gene-level expression of protein-coding genes with two or three transcript isoforms was lower than that of the other classes.
Figure 2
Expression percentages of top-ranked transcript isoforms in human protein-coding genes with 1 to 10 transcripts per gene. We calculated the percentage distribution of expression across the transcript isoforms of human protein-coding genes. The rank1 transcript isoform was the dominantly expressed isoform, accounting for over 50% of gene expression in genes with 1 to 10 transcripts. Together, the rank1 to rank5 isoforms accounted for over 95% of gene expression.
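The ranking and percentage calculation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-transcript TPM values for a single gene are already available, and the function name `rank_isoforms`, the transcript identifiers, and the TPM values are all hypothetical.

```python
def rank_isoforms(tpm_by_transcript):
    """Rank a gene's transcripts by TPM and report each one's share of gene expression.

    tpm_by_transcript maps a transcript identifier to its TPM value.
    Returns a list of (rank, transcript_id, tpm, percent_of_gene) tuples,
    ordered from the most to the least expressed transcript.
    """
    # Gene-level expression is taken here as the sum over its transcripts.
    total = sum(tpm_by_transcript.values())
    # Sort by descending TPM; ties are broken by identifier for determinism.
    ordered = sorted(tpm_by_transcript.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = []
    for rank, (transcript_id, tpm) in enumerate(ordered, start=1):
        percent = 100.0 * tpm / total if total > 0 else 0.0
        ranked.append((rank, transcript_id, tpm, percent))
    return ranked


# Hypothetical TPM values for a gene with three transcript isoforms.
example_gene = {"tx_A": 60.0, "tx_B": 30.0, "tx_C": 10.0}
ranked = rank_isoforms(example_gene)
for rank, transcript_id, tpm, percent in ranked:
    print(f"rank{rank}\t{transcript_id}\t{tpm:.1f} TPM\t{percent:.1f}%")
```

In this made-up example the rank1 transcript carries more than half of the gene's expression, which is the pattern the caption reports on average.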
Figure 3
Tissue expression distribution of top-ranked transcript isoforms of human protein-coding genes. Expression percentages of rank1 to rank5 transcript isoforms in human tissues are shown. We calculated the expression percentage of each ranked transcript within each gene for the various tissue types and tabulated the average expression percentages of these top-ranked transcript isoforms by tissue type.
Figure 4
Tissue expression percentage distribution of rank1, rank2, and rank3 transcript isoforms of the human GLRX2 gene. GLRX2 is a protein-coding gene for glutaredoxin 2 and has three transcript isoforms. Rank1 is the dominant transcript type in almost all tissues; the exception is testis, where rank2 is the top-ranked transcript. (A) Expression percentages of the rank1 to rank3 transcript isoforms. (B) TPM expression values of the rank1 to rank3 transcript isoforms.
Figure 5
Web user interfaces for top-ranked transcript isoform expression in tissues. The Tie1 protein tyrosine kinase gene has 10 transcript isoforms. Two protein-coding transcripts and 8 processed transcripts are shown in order of their expression ranking. For each transcript isoform, the interface also reports the transcript length, CDS length, TPM value, gene expression percentage, and coefficient of variation. The MANE Select transcript is marked with a star symbol. In each tissue, the ranking of each transcript isoform is displayed with color-coded symbols for easy inspection. ENST00000372476 is the rank1 transcript in all tissues.
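The per-tissue rankings and the coefficient of variation reported in the web interface could be derived along the following lines. The source does not state how the coefficient of variation is computed; this sketch assumes the population standard deviation of a transcript's TPM values across tissues divided by their mean. The function names, transcript identifiers, tissue names, and TPM values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (an assumed definition)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("nan")


def rank_in_each_tissue(tpm):
    """Rank transcripts within every tissue by descending TPM.

    tpm maps transcript_id -> {tissue: TPM}.
    Returns {tissue: {transcript_id: rank}}, where rank 1 is the most expressed.
    """
    tissues = sorted({tissue for per_tissue in tpm.values() for tissue in per_tissue})
    ranks = {}
    for tissue in tissues:
        # Missing values are treated as 0 TPM; ties are broken by identifier.
        ordered = sorted(tpm, key=lambda tid: (-tpm[tid].get(tissue, 0.0), tid))
        ranks[tissue] = {tid: i for i, tid in enumerate(ordered, start=1)}
    return ranks


# Hypothetical two-isoform gene whose top-ranked transcript switches in testis.
example_tpm = {
    "tx_A": {"liver": 10.0, "lung": 12.0, "testis": 2.0},
    "tx_B": {"liver": 5.0, "lung": 4.0, "testis": 8.0},
}
tissue_ranks = rank_in_each_tissue(example_tpm)
cv_by_transcript = {
    tid: coefficient_of_variation(list(per_tissue.values()))
    for tid, per_tissue in example_tpm.items()
}
print(tissue_ranks)
print(cv_by_transcript)
```

A transcript that holds rank 1 in every tissue, as ENST00000372476 does for Tie1, would show the same rank in every entry of the returned mapping, whereas a tissue-specific switch like the one in this made-up example appears as a change of rank in a single tissue.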
