Powerful differential expression analysis incorporating network topology for next-generation sequencing data

Bioinformatics. 2017 May 15;33(10):1505-1513. doi: 10.1093/bioinformatics/btw833.


Motivation: RNA-seq has become the technology of choice for interrogating the transcriptome. However, most methods for RNA-seq differential expression (DE) analysis do not use prior knowledge of biological networks to detect DE genes. With the increased availability and quality of biological network databases, methods that can use this prior knowledge are needed and will offer biologists a viable, more powerful alternative when analyzing RNA-seq data.

Results: We propose a three-state Markov Random Field (MRF) method that uses known biological pathways and interactions to improve sensitivity and specificity, thereby reducing false discovery rates (FDRs), when detecting differentially expressed genes from RNA-seq data. The method requires normalized count data (e.g. in Fragments or Reads Per Kilobase of transcript per Million mapped reads (FPKM/RPKM) format) as input and is implemented in the R package pathDESeq, available from GitHub. Simulation studies demonstrate that our method outperforms the two-state MRF model for various sample sizes. Furthermore, for a comparable FDR, it has better sensitivity than DESeq, EBSeq, edgeR and NOISeq. The proposed method also identifies more top Gene Ontology terms and KEGG pathway terms when applied to real datasets from colorectal cancer and hepatocellular carcinoma studies, respectively. Overall, these findings highlight the power of our method relative to existing methods that do not use prior knowledge of biological networks.
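
As an illustration of the normalized input mentioned above, the following minimal Python sketch computes FPKM from raw fragment counts using the standard definition (fragments per kilobase of transcript per million mapped fragments). It is not part of pathDESeq; the function name fpkm, its arguments and the toy numbers are hypothetical and chosen only for this example.

```python
def fpkm(counts, lengths_bp, total_mapped=None):
    """Illustrative FPKM calculation (hypothetical helper, not from pathDESeq).

    counts       -- fragment counts, one per gene
    lengths_bp   -- gene (transcript) lengths in base pairs, same order
    total_mapped -- total mapped fragments in the sample; defaults to
                    the sum of counts, which is an assumption of this sketch
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")
    if total_mapped is None:
        total_mapped = sum(counts)
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    # FPKM = count / (length in kb) / (total mapped in millions)
    #      = count * 1e9 / (length in bp * total mapped)
    return [c * 1e9 / (length * total_mapped)
            for c, length in zip(counts, lengths_bp)]


# Toy example: three genes in a sample with 1000 mapped fragments.
toy_counts = [100, 200, 700]
toy_lengths = [1000, 2000, 500]
toy_fpkm = fpkm(toy_counts, toy_lengths)
```

In this toy example the first two genes receive the same FPKM even though the second has twice the count, because it is also twice as long; this length and depth adjustment is what makes the values comparable across genes and samples.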

Availability and implementation: As an R package at https://github.com/MalathiSIDona/pathDESeq.

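To install the package, type install_github("MalathiSIDona/pathDESeq", build_vignettes = TRUE) in R; install_github is provided by the devtools package, which must be installed and loaded first. After installation, type vignette("pathDESeq") to access the vignette.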

Contact: a.salim@latrobe.edu.au.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Carcinoma, Hepatocellular / genetics
  • Carcinoma, Hepatocellular / metabolism
  • Colorectal Neoplasms / genetics
  • Colorectal Neoplasms / metabolism
  • Gene Expression Regulation, Neoplastic
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Liver Neoplasms / genetics
  • Liver Neoplasms / metabolism
  • Sample Size
  • Sequence Analysis, RNA / methods*
  • Transcriptome*