Identification of Differentially Expressed Genes in RNA-seq Data of Arabidopsis thaliana: A Compound Distribution Approach

J Comput Biol. 2016 Apr;23(4):239-47. doi: 10.1089/cmb.2015.0205. Epub 2016 Mar 7.

Abstract

Gene expression is the process by which information from a gene is used to synthesize a functional gene product, such as a protein. A gene is declared differentially expressed if the observed difference in read counts or expression levels between two experimental conditions is statistically significant. To identify differentially expressed genes between two conditions, it is important to determine the statistical distribution of the data so that the behavior of differentially expressed genes can be modeled appropriately. The present study investigates differential gene expression analysis for sequence data based on a compound distribution model. The approach was applied to RNA-seq count data of Arabidopsis thaliana, and the compound Poisson distribution was found to capture the variability more appropriately than the Poisson distribution. Thus, fitting an appropriate distribution to gene expression data provides statistically sound cutoff values for identifying differentially expressed genes.

Keywords: compound distribution; differentially expressed genes; negative binomial.
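
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that the compound Poisson model is the gamma-Poisson mixture, that is, the negative binomial named in the keywords; the abstract itself does not state which compound distribution or fitting procedure was used. The counts are simulated and are not the Arabidopsis thaliana data. The function names, the method-of-moments fit, and the use of AIC for the comparison are choices made for this illustration only.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate rates used here; exp(-lam) underflows
    for very large lam, so this is not a general-purpose sampler.
    """
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n, shape, mean, rng):
    """Simulate gamma-Poisson (negative binomial) counts.

    Each count has its own Poisson rate drawn from a gamma distribution,
    which is what makes the marginal distribution overdispersed.
    """
    scale = mean / shape  # gammavariate takes (shape, scale)
    return [poisson_sample(rng.gammavariate(shape, scale), rng) for _ in range(n)]


def poisson_loglik(counts, mu):
    """Log-likelihood of the counts under Poisson(mu)."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, mu, size):
    """Log-likelihood under a negative binomial with mean mu and dispersion size."""
    p = size / (size + mu)
    return sum(
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1.0 - p)
        for k in counts
    )


def fit_and_compare(counts):
    """Fit both models and report AIC (lower means a better fit)."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((k - mu) ** 2 for k in counts) / (n - 1)
    if var <= mu:
        raise ValueError("no overdispersion: the Poisson model already suffices")
    size = mu * mu / (var - mu)  # method-of-moments estimate, not the MLE
    return {
        "mean": mu,
        "variance": var,
        "size": size,
        "aic_poisson": 2 * 1 - 2 * poisson_loglik(counts, mu),
        "aic_negbin": 2 * 2 - 2 * negbin_loglik(counts, mu, size),
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = simulate_counts(500, shape=2.0, mean=20.0, rng=rng)
result = fit_and_compare(counts)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

When the sample variance clearly exceeds the sample mean, as it does for these simulated counts, the Poisson model cannot reproduce the spread, and the negative binomial is expected to give the lower AIC. A significance cutoff derived from the better-fitting model would then reflect the actual variability of the counts.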

MeSH terms

  • Algorithms
  • Arabidopsis / genetics*
  • Gene Expression Regulation, Plant*
  • Genes, Plant*
  • Models, Genetic*
  • RNA, Messenger / genetics

Substances

  • RNA, Messenger