Exploring the shallow end; estimating information content in transcriptomics studies

Front Plant Sci. 2012 Sep 10:3:213. doi: 10.3389/fpls.2012.00213. eCollection 2012.

Abstract

Transcriptomics is a major platform for studying organismal biology. The advent of new parallel sequencing technologies has opened a new avenue for transcriptomics, with ever deeper sequencing used to identify and quantify every transcript in a sample. However, this may not be the best use of parallel sequencing technology for all transcriptomics experiments. I used the Shannon entropy approach to estimate the information contained within a transcriptomics experiment and tested the ability of shallow RNAseq to capture the majority of this information. This analysis showed that it was possible to capture nearly all of the network or genomic information present in a variety of transcriptomics experiments using a subset of the 5,000 or fewer most abundant transcripts within any given sample. Thus, it appears that it should be possible and affordable to conduct large-scale factorial analyses with a high degree of replication using parallel sequencing technologies.

Keywords: RNAseq; eQTL; factorial genomics; genetical genomics; information content; microarray; sequencing depth; transcriptomics.
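
The abstract does not spell out how the Shannon entropy estimate was computed, so the following is only a minimal sketch of one possible reading, not the author's method. It assumes each transcript's relative abundance p_i is its count divided by the sample total, takes the sample entropy as H = -sum(p_i * log2(p_i)), and measures how much of H is contributed by the n most abundant transcripts. The function names `shannon_entropy` and `top_n_entropy_share` and the toy counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the abundance distribution implied by raw counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    # Zero counts contribute nothing, since p * log2(p) tends to 0 as p tends to 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def top_n_entropy_share(counts, n):
    """Fraction of the full-sample entropy contributed by the n most abundant transcripts.

    Assumption: "information captured" is read as the share of entropy terms
    belonging to the top-n transcripts, with p_i still computed against the
    full-sample total. The paper may define this differently.
    """
    total = sum(counts)
    full = shannon_entropy(counts)
    if full == 0:
        return 0.0
    top = sorted(counts, reverse=True)[:n]
    partial = -sum((c / total) * math.log2(c / total) for c in top if c > 0)
    return partial / full


if __name__ == "__main__":
    toy_counts = [8, 4, 2, 1, 1]  # hypothetical read counts for five transcripts
    print(shannon_entropy(toy_counts))
    print(top_n_entropy_share(toy_counts, 2))
```

With real data, `counts` would be the per-transcript read counts for one sample and `n` would be varied (for example up to 5,000) to see where the share levels off.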