A random forest classifier predicts recurrence risk in patients with ovarian cancer

Mol Med Rep. 2018 Sep;18(3):3289-3297. doi: 10.3892/mmr.2018.9300. Epub 2018 Jul 19.

Abstract

Ovarian cancer (OC) is associated with a poor prognosis due to difficulties in early detection. The aims of the present study were to construct a recurrence risk prediction model and to reveal important OC genes or pathways. RNA sequencing data were obtained for 307 OC samples, and the corresponding clinical data were downloaded from The Cancer Genome Atlas database. Additionally, two validation datasets, GSE44104 (20 recurrent and 40 non‑recurrent OC samples) and GSE49997 (204 OC samples), were obtained from the Gene Expression Omnibus database. Differentially expressed genes were screened using the differential expression via distance synthesis algorithm, followed by gene ontology enrichment analysis and weighted gene coexpression network analysis (WGCNA). Furthermore, subnetwork analysis was conducted for the protein‑protein interaction (PPI) network using the BioNet package. Finally, a random forest classifier was constructed based on the subnetwork nodes, and its reliability was validated using the GSE44104 and GSE49997 validation datasets. A total of 44 upregulated and 117 downregulated genes were identified in the recurrent samples. Enrichment analysis indicated that cytochrome P450 family 17 subfamily A member 1 (CYP17A1) was associated with 'positive regulation of steroid hormone biosynthetic processes'. WGCNA identified turquoise and grey modules that were significantly correlated with status and prognosis. A significant PPI subnetwork containing 16 nodes was also identified, including: transcription factor GATA‑4; fibroblast growth factor 9; aromatase; 3β‑hydroxysteroid dehydrogenase/Δ5‑4‑isomerase type 2; corticosteroid 11β‑dehydrogenase isozyme 1; CYP17A1; pituitary homeobox 2; left‑right determination factor 1; homeobox protein ARX; estrogen receptor β; steroidogenic factor 1; forkhead box protein L2; myocardin; steroidogenic acute regulatory protein, mitochondrial; vesicular inhibitory amino acid transporter; and twist‑related protein 1.
The random forest classifier, constructed using the subnetwork nodes as feature genes, exhibited a 92% true positive rate when classifying recurrent and non‑recurrent OC samples. Its classification performance was validated using the two independent datasets. Overall, 44 upregulated and 117 downregulated genes associated with OC recurrence were identified. Furthermore, the 16 subnetwork node genes that were identified may be important molecules in OC recurrence.
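
For readers unfamiliar with the metric, the true positive rate reported above is the fraction of truly recurrent samples that a classifier labels as recurrent, TP / (TP + FN). The following is a minimal sketch of that calculation only, not the study's pipeline: the function names are hypothetical, and the labels are invented toy values (25 recurrent, 40 non‑recurrent) chosen so that the arithmetic is easy to follow. They are not data from the present study.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, FP and TN, treating label 1 (recurrent) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["fp" if pred == 1 else "tn"] += 1
    return counts


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return TP / (TP + FN), also known as sensitivity or recall."""
    counts = confusion_counts(y_true, y_pred)
    positives = counts["tp"] + counts["fn"]
    if positives == 0:
        raise ValueError("true positive rate is undefined without positive samples")
    return counts["tp"] / positives


# Invented toy labels (assumption, not study data): 1 = recurrent, 0 = non-recurrent.
y_true = [1] * 25 + [0] * 40
# Hypothetical predictions: 23 of 25 recurrent samples caught, 4 false alarms.
y_pred = [1] * 23 + [0] * 2 + [0] * 36 + [1] * 4

tpr = true_positive_rate(y_true, y_pred)
print(f"True positive rate: {tpr:.2f}")
```

With these toy labels, 23 of the 25 recurrent samples are predicted correctly, so the rate is 23/25. The true positive rate says nothing about the non‑recurrent class, so the false positive count is tracked separately in the same dictionary.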

MeSH terms

  • Computational Biology* / methods
  • Databases, Genetic
  • Female
  • Gene Expression Profiling* / methods
  • Gene Expression Regulation, Neoplastic*
  • Gene Ontology
  • Gene Regulatory Networks
  • Humans
  • Ovarian Neoplasms / genetics*
  • Ovarian Neoplasms / pathology*
  • ROC Curve
  • Recurrence
  • Reproducibility of Results
  • Risk Assessment