A multi-sample approach increases the accuracy of transcript assembly

Nat Commun. 2019 Nov 1;10(1):5000. doi: 10.1038/s41467-019-12990-0.

Abstract

Transcript assembly from RNA-seq reads is a critical step in gene expression and subsequent functional analyses. Here we present PsiCLASS, an accurate and efficient transcript assembler based on an approach that simultaneously analyzes multiple RNA-seq samples. PsiCLASS combines mixture statistical models for exonic feature selection across multiple samples with splice-graph-based dynamic programming algorithms and a weighted voting scheme for transcript selection. PsiCLASS achieves a significantly better sensitivity-precision tradeoff, with precision up to 2- to 3-fold higher than that of the StringTie system and of Scallop plus TACO, the two best current approaches. PsiCLASS is efficient and scalable, assembling 667 GEUVADIS samples in 9 h, and has robust accuracy with large numbers of samples.
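To make the sensitivity and precision figures above concrete, here is a minimal sketch of how the two measures are commonly computed for a set of assembled transcripts. It assumes a transcript counts as correct when its full intron chain exactly matches a reference transcript, which is a widespread convention but not necessarily the exact criterion used in this study. The function name `sensitivity_precision`, the `IntronChain` representation, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: a transcript is identified by its chromosome,
# strand, and the ordered tuple of (intron_start, intron_end) coordinates.
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def sensitivity_precision(
    predicted: Iterable[IntronChain],
    reference: Iterable[IntronChain],
) -> Tuple[float, float]:
    """Return (sensitivity, precision) under exact intron-chain matching."""
    pred: Set[IntronChain] = set(predicted)
    ref: Set[IntronChain] = set(reference)
    matched = len(pred & ref)  # transcripts present in both sets
    # Sensitivity: fraction of reference transcripts that were recovered.
    sensitivity = matched / len(ref) if ref else 0.0
    # Precision: fraction of predicted transcripts that are in the reference.
    precision = matched / len(pred) if pred else 0.0
    return sensitivity, precision


# Toy data (made up for illustration, not taken from the paper).
reference = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (350, 400))),
    ("chr2", "-", ((50, 80),)),
    ("chr2", "-", ((50, 80), (120, 150))),
]
predicted = [
    ("chr1", "+", ((100, 200), (300, 400))),  # matches a reference transcript
    ("chr2", "-", ((50, 80),)),               # matches a reference transcript
    ("chr1", "+", ((100, 210), (300, 400))),  # shifted splice site, no match
]

sens, prec = sensitivity_precision(predicted, reference)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

In this toy case two of four reference transcripts are recovered and two of three predictions are correct. An assembler that reports fewer spurious transcripts at the same number of matches raises precision without changing sensitivity, which is the kind of tradeoff the abstract refers to.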

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Exons / genetics*
  • Gene Expression Profiling / methods*
  • Humans
  • Liver / metabolism
  • RNA / genetics
  • RNA, Messenger / genetics
  • Reproducibility of Results
  • Sequence Analysis, RNA / methods
  • Software*

Substances

  • RNA, Messenger
  • RNA