Analysis of RNA-Seq Data Using TopHat and Cufflinks

Methods Mol Biol. 2016;1374:339-61. doi: 10.1007/978-1-4939-3167-5_18.


Recent advances in high-throughput RNA sequencing (RNA-Seq) have made it possible to generate huge amounts of data from a single sample in a very short time. Making sense of these data has required parallel advances in computational tools that can organize them and interpret their biological implications, while using minimal computing resources to keep costs down. Here we describe how to analyze RNA-Seq data with two open-source programs of the Tuxedo suite: TopHat and Cufflinks. TopHat aligns RNA-Seq reads to a reference genome, and Cufflinks assembles the mapped reads into candidate transcripts and then generates a final transcriptome assembly. The Cufflinks package also includes Cuffdiff, which takes the mapped reads from two or more biological conditions and tests genes and transcripts for differential expression. This aids the investigation of their transcriptional and post-transcriptional regulation under different conditions. We also describe an accessory tool, CummeRbund, which processes the Cuffdiff output files and produces publication-quality plots and figures of the user's choice. We demonstrate the effectiveness of the Tuxedo suite by analyzing RNA-Seq datasets from Arabidopsis thaliana roots subjected to two different conditions.
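The order of the steps (TopHat alignment, per-sample Cufflinks assembly, merging of the assemblies, Cuffdiff comparison) can be sketched as a small Python helper that builds, but does not run, the command lines for a two-condition experiment. This is a hypothetical illustration, not the chapter's exact protocol. It assumes single-end FASTQ files, and the sample names, file paths, thread count and the `tuxedo_commands` helper itself are invented for illustration. The flags shown (`-p`, `-G`, `-o`, `-g`, `-s`, `-b`, `-L`, `-u`) are standard TopHat, Cufflinks, Cuffmerge and Cuffdiff options, but check them against the manuals of the installed versions.

```python
"""Hypothetical sketch: build (but do not execute) Tuxedo command lines
for a two-condition RNA-Seq experiment. Sample names, paths and the
thread count are illustrative assumptions; reads are assumed single-end."""


def tuxedo_commands(genome_index, genome_fasta, annotation_gtf, samples, threads=8):
    """Return (commands, assembly_gtfs).

    samples maps a condition label to a list of (sample_name, fastq_path)
    pairs. Each command is an argument list suitable for subprocess.run.
    assembly_gtfs lists the per-sample Cufflinks GTFs that the caller
    would write, one per line, into assemblies.txt for Cuffmerge.
    """
    cmds, gtfs, bams = [], [], {}
    for condition, replicates in samples.items():
        for name, fastq in replicates:
            th_out = f"{name}_thout"
            # TopHat: spliced alignment of the reads to the reference genome,
            # guided by the known gene annotation (-G).
            cmds.append(["tophat", "-p", str(threads), "-G", annotation_gtf,
                         "-o", th_out, genome_index, fastq])
            bam = f"{th_out}/accepted_hits.bam"
            bams.setdefault(condition, []).append(bam)
            # Cufflinks: assemble this sample's mapped reads into transcripts.
            cl_out = f"{name}_clout"
            cmds.append(["cufflinks", "-p", str(threads), "-o", cl_out, bam])
            gtfs.append(f"{cl_out}/transcripts.gtf")
    # Cuffmerge: merge the per-sample assemblies into one transcriptome
    # (written by default to merged_asm/merged.gtf).
    cmds.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                 "-p", str(threads), "assemblies.txt"])
    # Cuffdiff: compare conditions; replicates of one condition are
    # comma-separated, conditions are separate arguments in label order.
    labels = list(bams)
    cmds.append(["cuffdiff", "-o", "diff_out", "-b", genome_fasta,
                 "-p", str(threads), "-L", ",".join(labels), "-u",
                 "merged_asm/merged.gtf"]
                + [",".join(bams[label]) for label in labels])
    return cmds, gtfs
```

For example, `tuxedo_commands("genome", "genome.fa", "genes.gtf", {"C1": [("C1_R1", "C1_R1.fq")], "C2": [("C2_R1", "C2_R1.fq")]})` yields the TopHat and Cufflinks calls for each sample, followed by one Cuffmerge and one Cuffdiff call. The Cuffdiff output directory can then be loaded into CummeRbund for plotting. Returning argument lists rather than shell strings keeps the sketch testable and avoids shell-quoting problems if the commands are later passed to `subprocess.run`.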

Keywords: Bowtie; Cuffcompare; Cuffdiff; Cufflinks; Cuffmerge; CummeRbund; Differential gene expression; RNA-seq; TopHat; Transcriptome assembly.

MeSH terms

  • Computational Biology / methods*
  • Gene Expression Profiling / methods
  • Genomics / methods*
  • Sequence Analysis, RNA / methods*
  • Software*
  • Transcriptome