Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq

J Bioinform Comput Biol. 2010 Dec;8 Suppl 1:177-92. doi: 10.1142/s0219720010005178.


RNA-seq, built on next-generation high-throughput sequencing, offers unprecedented resolution and detail and has greatly improved our ability to study transcriptomes. Estimating transcript abundance, or gene expression level, has long been a central problem in research on transcriptional regulation and gene function. Two strategies, both based on the concept of Reads Per Kilobase per Million reads (RPKM), are currently used to estimate gene expression levels: using union-intersection gene models (UI-based) and summing inferred isoform abundances (isoform-based). The two produce different estimates. In this paper, we made the first attempt to compare their performance through a series of simulation studies. Our results showed that the isoform-based method gives more accurate estimates with less uncertainty than the UI-based strategy. Taking the non-uniformity of the read distribution into account further reduces the estimation errors of the isoform-based method. We also applied both strategies to real RNA-seq datasets of technical replicates and found that the isoform-based strategy again performed better. For a more accurate estimation of gene expression levels from RNA-seq data, even when the abundance levels of individual isoforms are not of interest, it is still better to first infer the isoform abundances and then sum them to obtain the expression level of the gene as a whole.
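
To make the contrast concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical gene with two isoforms (exon A in both, exon B only in isoform 1), uniform read coverage, and made-up read counts. The gene-level estimate shown for comparison simply pools reads over all exons; this is a simplified stand-in, since the abstract does not spell out the exact UI gene definition.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical toy data (not from the paper).
exon_len = {"A": 1000, "B": 1000}   # exon lengths in bases
exon_reads = {"A": 300, "B": 100}   # reads mapped to each exon
total_reads = 1_000_000             # total reads in the library

# Per-exon read density expressed in RPKM units.
density = {e: rpkm(exon_reads[e], exon_len[e], total_reads) for e in exon_len}

# Isoform-based: under uniform coverage,
#   density[A] = theta1 + theta2   (exon A is in both isoforms)
#   density[B] = theta1            (exon B is only in isoform 1)
theta1 = density["B"]
theta2 = max(density["A"] - theta1, 0.0)  # clamp to keep abundance non-negative
gene_isoform_based = theta1 + theta2

# Simplified gene-level estimate: pool all reads over all exonic bases.
gene_pooled = rpkm(sum(exon_reads.values()), sum(exon_len.values()), total_reads)

print("isoform-based gene RPKM:", gene_isoform_based)
print("pooled-exon gene RPKM:  ", gene_pooled)
```

With these made-up counts, the summed isoform abundances give 300 RPKM, whereas pooling reads over all exons gives 200 RPKM, because exon B is absent from the more abundant isoform and dilutes the pooled density. Real data would also require handling non-uniform coverage and sampling noise, which this sketch ignores.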

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Computer Simulation
  • Databases, Nucleic Acid / statistics & numerical data
  • Databases, Protein / statistics & numerical data
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Models, Statistical
  • Protein Isoforms / genetics*
  • Protein Isoforms / metabolism*
  • Sequence Analysis, RNA / statistics & numerical data*


Substances

  • Protein Isoforms