Differential Expression Analysis in RNA-Seq by a Naive Bayes Classifier with Local Normalization

Biomed Res Int. 2015;2015:789516. doi: 10.1155/2015/789516. Epub 2015 Aug 3.

Abstract

To improve the applicability of RNA-seq technology, a large number of RNA-seq data analysis methods and correction algorithms have been developed. Although these new methods and algorithms have steadily improved transcriptome analysis, greater prediction accuracy is needed to better guide experimental designs with computational results. In this study, a new tool for the identification of differentially expressed genes with RNA-seq data, named GExposer, was developed. This tool introduces a local normalization algorithm to reduce the bias of nonrandomly positioned read depth. The naive Bayes classifier is employed to integrate fold change, transcript length, and GC content to identify differentially expressed genes. Results on several independent tests show that GExposer has better performance than other methods. The combination of the local normalization algorithm and naive Bayes classifier with three attributes can achieve better results; both false positive rates and false negative rates are reduced. However, only a small portion of genes is affected by the local normalization and GC content correction.
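To illustrate how a naive Bayes classifier can combine the three attributes named above, the following is a minimal sketch and not GExposer's actual implementation. It assumes Gaussian class-conditional densities, a feature vector of (absolute log2 fold change, log10 transcript length, GC fraction), and a handful of made-up training genes labeled 1 (differentially expressed) or 0 (not). The function names, the feature encoding, and the toy values are all hypothetical; the abstract does not specify how the attributes are modeled, and the local normalization step is not shown.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature (mean, variance) for each label."""
    model = {}
    n = len(y)
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / n, stats)
    return model

def predict(model, x):
    """Return the label with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Naive Bayes assumption: features are independent given the class,
        # so per-feature Gaussian log-likelihoods simply add up.
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (|log2 fold change|, log10 transcript length, GC fraction)
X = [
    (2.5, 3.2, 0.45), (3.1, 3.0, 0.50), (2.8, 3.4, 0.48),  # labeled DE
    (0.2, 3.1, 0.52), (0.4, 3.3, 0.47), (0.1, 2.9, 0.44),  # labeled non-DE
]
y = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X, y)
```

Calling `predict(model, (2.9, 3.1, 0.47))` classifies a gene with a large fold change, and `predict(model, (0.3, 3.1, 0.47))` one with a small fold change. In this toy data the fold change dominates the decision, while length and GC content contribute only small adjustments to the score.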

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Composition
  • Bayes Theorem*
  • Gene Expression*
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, RNA / methods*
  • Transcriptome / genetics*