INDEED: Integrated differential expression and differential network analysis of omic data for biomarker discovery

Methods. 2016 Dec 1;111:12-20. doi: 10.1016/j.ymeth.2016.08.015. Epub 2016 Aug 31.

Abstract

Differential expression (DE) analysis is commonly used to identify biomarker candidates whose expression levels change significantly between distinct biological groups. One drawback of DE analysis is that it considers changes only at the level of individual biomolecules. Recently, differential network (DN) analysis has become popular because it can measure changes at the level of biomolecular pairs. In DN analysis, the network is typically built from correlations, and biomarker candidates are selected by investigating the network topology. However, correlation tends to generate over-complicated networks, and selecting biomarker candidates purely on the basis of network topology ignores changes at the level of individual biomolecules. In this paper, we propose a novel approach, INDEED, that builds a sparse differential network based on partial correlation and integrates DE and DN analyses for biomarker discovery. We applied this approach to real proteomic and glycomic data generated by liquid chromatography coupled with mass spectrometry in a hepatocellular carcinoma (HCC) biomarker discovery study. For each omic data type, we used one dataset to select biomarker candidates, built a disease classifier, and evaluated the performance of the classifier on an independent dataset. The biomarker candidates selected by INDEED were more reproducible across independent datasets and led to higher classification accuracy in distinguishing HCC cases from cirrhotic controls than those selected by separate DE and DN analyses. INDEED also identified some candidates previously reported to be relevant to HCC, such as intercellular adhesion molecule 2 (ICAM2) and C4b-binding protein alpha chain (C4BPA), which were missed by both DE and DN analyses.
In addition, we applied INDEED to survival time prediction using transcriptomic data from breast cancer patient samples. We selected biomarker candidates and built a regression model for survival time prediction based on a gene expression dataset and the patients' survival records, then evaluated the performance of the regression model on an independent dataset. Compared with the biomarker candidates selected by DE and DN analyses, those selected through INDEED led to more accurate survival time prediction.
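
The general idea of combining a partial-correlation differential network with per-molecule DE evidence can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the estimator, the edge-selection rule, or the scoring formula, so everything below is an assumption made for illustration. In particular, the function names, the ridge term used to stabilize the covariance inverse, the fixed edge threshold (a crude stand-in for a proper sparse estimator), the Welch t-statistic as the DE measure, and the "own score plus neighbors' scores" rule are all hypothetical choices.

```python
import numpy as np

def partial_correlation(X, ridge=1e-3):
    """Partial correlations from a ridge-stabilized inverse covariance.

    X has shape (samples, biomolecules). The ridge term is an assumption
    added here so the inverse exists when samples are few.
    """
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)   # rho_ij = -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(pc, 1.0)
    return pc

def differential_edges(X_case, X_ctrl, threshold=0.3):
    """Boolean adjacency: pairs whose partial correlation differs between groups.

    The fixed threshold is a hypothetical sparsification rule.
    """
    diff = partial_correlation(X_case) - partial_correlation(X_ctrl)
    adj = np.abs(diff) > threshold
    np.fill_diagonal(adj, False)
    return adj

def welch_t(X_case, X_ctrl):
    """Per-biomolecule Welch t-statistic, used here as the DE measure."""
    m1, m2 = X_case.mean(axis=0), X_ctrl.mean(axis=0)
    v1 = X_case.var(axis=0, ddof=1) / X_case.shape[0]
    v2 = X_ctrl.var(axis=0, ddof=1) / X_ctrl.shape[0]
    return (m1 - m2) / np.sqrt(v1 + v2)

def combined_score(X_case, X_ctrl, threshold=0.3):
    """Illustrative integration: own |t| plus the |t| of differential neighbors."""
    adj = differential_edges(X_case, X_ctrl, threshold).astype(float)
    t_abs = np.abs(welch_t(X_case, X_ctrl))
    return t_abs + adj @ t_abs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cases = rng.normal(size=(40, 5))      # synthetic data, for illustration only
    controls = rng.normal(size=(40, 5))
    scores = combined_score(cases, controls)
    print(np.argsort(scores)[::-1])       # biomolecule indices, highest score first
```

Under this scoring rule, a biomolecule with a modest expression change of its own can still rank highly if it is connected in the differential network to strongly changed neighbors, which is the kind of candidate that a DE-only or topology-only ranking would tend to miss.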

Keywords: Differential expression analysis; Differential network analysis; Glycomics; Proteomics; Transcriptomics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Antigens, CD / genetics*
  • Biomarkers, Tumor / genetics*
  • Carcinoma, Hepatocellular / genetics
  • Carcinoma, Hepatocellular / metabolism
  • Cell Adhesion Molecules / genetics*
  • Chromatography, Liquid
  • Complement C4b-Binding Protein / genetics*
  • Gene Expression Regulation, Neoplastic
  • Glycomics / methods
  • Humans
  • Liver Neoplasms / genetics
  • Liver Neoplasms / metabolism
  • Mass Spectrometry
  • Proteomics / methods*
  • Transcriptome / genetics

Substances

  • Antigens, CD
  • Biomarkers, Tumor
  • C4BPA protein, human
  • Cell Adhesion Molecules
  • Complement C4b-Binding Protein
  • ICAM2 protein, human