Boosting edgeR (Robust) by dealing with missing observations and gene-specific outliers in RNA-Seq profiles and its application to explore biomarker genes for diagnosis and therapies of ovarian cancer

Genomics. 2024 May;116(3):110834. doi: 10.1016/j.ygeno.2024.110834. Epub 2024 Mar 26.

Abstract

edgeR (Robust) is a popular approach for identifying differentially expressed genes (DEGs) from RNA-Seq profiles. However, it performs poorly in the presence of gene-specific outliers and cannot handle missing observations. To address these issues, we proposed a pre-processing approach for RNA-Seq count data that combines iLOO-based outlier detection with random forest-based missing value imputation to boost the performance of edgeR (Robust). Results from both simulated and real RNA-Seq count data showed that the proposed edgeR (Robust) outperformed the conventional edgeR (Robust). To investigate the effectiveness of the identified DEGs for the diagnosis and therapy of ovarian cancer (OC), we selected the 12 top-ranked DEGs (IL6, XCL1, CXCL8, C1QC, C1QB, SNAI2, TYROBP, COL1A2, SNAP25, NTS, CXCL2, and AGT) and suggested 10 top-ranked candidate drug molecules, guided by the hub-DEGs, for the treatment of OC. Hence, the proposed procedure might be an effective computational tool for exploring potential DEGs from RNA-Seq profiles for the diagnosis and therapy of any disease.
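The pre-processing idea described above (flag gene-specific outliers, treat them as missing, then impute all missing values before running the DEG analysis) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the leave-one-out median/MAD rule below is a simple stand-in for iLOO, and within-group median imputation is a stand-in for random forest imputation. The function names, the `threshold` and `min_scale` parameters, and the example counts are all hypothetical.

```python
import math
from statistics import median


def mask_outliers(counts, threshold=5.0, min_scale=0.25):
    """Replace suspected outliers in one gene's counts (one condition) with None.

    Assumption: a leave-one-out median/MAD rule on log2(count + 1) is used
    here as a simple stand-in for iLOO; it is not the iLOO procedure itself.
    """
    masked = list(counts)
    observed = [i for i, c in enumerate(counts) if c is not None]
    for i in observed:
        # Summarise the remaining replicates with sample i left out.
        rest = [math.log2(counts[j] + 1) for j in observed if j != i]
        if len(rest) < 3:
            continue  # too few replicates to judge this sample
        centre = median(rest)
        mad = median([abs(r - centre) for r in rest])
        # Floor the scale so near-identical replicates do not inflate the score.
        scale = max(1.4826 * mad, min_scale)
        if abs(math.log2(counts[i] + 1) - centre) / scale > threshold:
            masked[i] = None
    return masked


def impute_missing(counts):
    """Fill None entries with the rounded median of the observed counts.

    Assumption: median imputation stands in for random forest imputation.
    """
    observed = [c for c in counts if c is not None]
    if not observed:
        return list(counts)  # nothing to impute from
    fill = round(median(observed))
    return [fill if c is None else c for c in counts]


# Hypothetical counts for one gene across seven replicates of one condition:
# one extreme value (500) and one missing observation (None).
gene_counts = [10, 12, 11, 500, None, 9, 13]
cleaned = impute_missing(mask_outliers(gene_counts))
print(cleaned)
```

In this sketch the extreme count is first converted to a missing value and then imputed together with the originally missing observation, so the cleaned vector contains only plausible integer counts that a downstream tool such as edgeR could accept.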

Keywords: Diagnosis and therapies; Differentially expressed genes (DEGs); Outlier diagnosis and missing value imputation; RNA-Seq profiles; edgeR (Robust).

MeSH terms

  • Biomarkers, Tumor* / genetics
  • Female
  • Gene Expression Profiling
  • Humans
  • Ovarian Neoplasms* / diagnosis
  • Ovarian Neoplasms* / genetics
  • Ovarian Neoplasms* / therapy
  • RNA-Seq*
  • Software
  • Transcriptome

Substances

  • Biomarkers, Tumor