Motivation: RNA-seq has emerged as a powerful technology for the detection of differential gene expression in the transcriptome. The commonly used statistical methods for RNA-seq differential expression analysis were designed for individual genes, which may detect too many irrelevant significantly genes or too few genes to interpret the phenotypic changes. Recently network module-based methods have been proposed as a powerful approach to analyze and interpret expression data in microarray and shotgun proteomics. But the module-based statistical model has not been adequately addressed for RNA-seq data.
Result: we proposed a network module-based generalized linear model for differential expression analysis of the count-based sequencing data from RNA-seq. The simulation studies demonstrated the effectiveness of the proposed model and the improvement of the statistical power for identifying the differentially expressed modules in comparison to the existing methods. We also applied our method to tissue datasets and identified 207 significantly differentially expressed kidney-active or liver-active modules. For liver cancer datasets, significantly differentially expressed modules, including Wnt signaling pathway and VEGF pathway, were found to be tightly associated with liver cancer. Besides, in comparison with the single gene-level analysis, our method could identify more significantly biological modules, which related to the liver cancer.
Availability and implementation: The R package SeqMADE is available at https://cran.r-project.org/web/packages/SeqMADE/ .
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com