BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S14.
doi: 10.1186/1471-2105-14-S5-S14. Epub 2013 Apr 10.

CLASS: Constrained Transcript Assembly of RNA-seq Reads

Free PMC article

Li Song et al. BMC Bioinformatics. 2013.

Abstract

Background: RNA-seq has revolutionized our ability to survey the cellular transcriptome in great detail. However, while several approaches have been developed, the problem of assembling the short reads into full-length transcripts remains challenging.

Results: We developed a novel algorithm and software tool, CLASS (Constraint-based Local Assembly and Selection of Splice variants), for accurately assembling splice variants using local read coverage patterns of RNA-seq reads, contiguity constraints from read pairs and spliced reads, and optionally information about gene structure extracted from cDNA sequence databases. The algorithmic underpinnings of CLASS are: i) a linear program to infer exons, ii) a compact splice graph representation of a gene and its splice variants, and iii) a transcript selection scheme that takes into account contiguity constraints and, where available, knowledge about gene structure.
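
The abstract does not specify how the splice graph is encoded or how transcripts are selected, so the following is only a toy sketch of the general idea, not the CLASS implementation. It assumes a gene is a directed acyclic graph over subexons, that each contiguity constraint is a run of subexons that a read or read pair shows to be adjacent, and that a greedy cover stands in for the selection scheme. The names `enumerate_paths`, `satisfies`, and `greedy_select`, and the subexon labels `e1` to `e4`, are hypothetical.

```python
def enumerate_paths(graph, sources, sinks):
    """List every source-to-sink path in a DAG given as {node: [successors]}."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if node in sinks:
            paths.append(tuple(prefix))
        for nxt in graph.get(node, []):
            walk(nxt, prefix)

    for source in sources:
        walk(source, [])
    return paths


def satisfies(path, constraint):
    """True if the constraint's subexons occur as one contiguous run in the path."""
    k = len(constraint)
    return any(path[i:i + k] == tuple(constraint)
               for i in range(len(path) - k + 1))


def greedy_select(paths, constraints):
    """Greedily pick candidate transcripts until every explainable constraint is covered.

    Returns the chosen paths and the indices of constraints no path explains.
    This greedy cover is an assumption made for illustration only.
    """
    uncovered = set(range(len(constraints)))
    chosen = []
    while uncovered:
        best = max(paths, key=lambda p: sum(satisfies(p, constraints[i])
                                            for i in uncovered))
        gained = {i for i in uncovered if satisfies(best, constraints[i])}
        if not gained:
            break  # no candidate explains the remaining constraints
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy gene: subexon e2 is either included (e1-e2-e3) or skipped (e1-e3).
graph = {"e1": ["e2", "e3"], "e2": ["e3"], "e3": ["e4"], "e4": []}
candidates = enumerate_paths(graph, sources={"e1"}, sinks={"e4"})

# Hypothetical constraints derived from spliced reads and read pairs.
constraints = [("e1", "e2"), ("e2", "e3", "e4"), ("e1", "e3")]
chosen, unexplained = greedy_select(candidates, constraints)
print(chosen, unexplained)
```

In this toy case both the inclusion and the skipping isoform are needed, because the constraint `("e1", "e3")` is only satisfied by a transcript in which `e1` is spliced directly to `e3`.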

Conclusion: In comparisons against leading transcript assembly programs, CLASS is more accurate on both simulated and real reads and produces results that are easier to interpret on large-scale real data, making it a promising analysis tool for next-generation sequencing data.

Availability: CLASS is available from http://sourceforge.net/projects/splicebox.

Figures

Figure 1. Region, intervals, exons and subexons.
Figure 2. Constraint graph for four read pairs c1, c2, c3, c4 and three predicted transcripts t1, t2, t3.
Figure 3. Examples of CLASS predictions outperforming those of other programs on the simulated single-end read data. (a) Cufflinks fails to predict a transcript at the DDX12P gene locus, whereas CLASS predicts the full transcript. (b) CLASS finds more of the splice forms, including alternative 5' and 3' terminal exons, for the C1S gene. (c) Cufflinks produces spurious isoforms, including a short single-exon transcript and a 5 bp variation on exon 22, for the DDX11 gene. The reference ENSEMBL transcripts sampled by the reads are shown in the top panels.
Figure 4. Performance curves of four programs on simulated reads: paired-end (top) and single-end (bottom).
Figure 5. Performance curves of four programs on the adrenal data set, on multi-exon transcripts only.
