Identification of Candidate Biomarkers Correlated With the Pathogenesis and Prognosis of Non-small Cell Lung Cancer via Integrated Bioinformatics Analysis

Front Genet. 2018 Oct 12;9:469. doi: 10.3389/fgene.2018.00469. eCollection 2018.

Abstract

Background and Objective: Non-small cell lung cancer (NSCLC) accounts for 80-85% of all lung cancer cases, and its 5-year relative overall survival (OS) rate is less than 20%, so novel diagnostic and prognostic biomarkers are urgently needed. The present study aimed to identify potential key genes associated with the pathogenesis and prognosis of NSCLC.
Methods: Four datasets (GSE18842, GSE19804, GSE43458, and GSE62113) were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between NSCLC and normal samples were identified using the limma package, and the RobustRankAggreg (RRA) package was used to integrate the gene lists across datasets. The Search Tool for the Retrieval of Interacting Genes (STRING) database, Cytoscape, and Molecular Complex Detection (MCODE) were used to construct a protein-protein interaction (PPI) network of these DEGs. Functional and pathway enrichment analyses of the DEGs were performed with FunRich and OmicShare. The expression and prognostic value of the top genes were assessed using Gene Expression Profiling Interactive Analysis (GEPIA) and the Kaplan-Meier plotter (KM) online database.
Results: A total of 249 DEGs (113 upregulated and 136 downregulated) were identified after gene integration. The PPI network comprised 166 nodes and 1784 protein pairs. Topoisomerase II alpha (TOP2A), a top gene and a hub node with a high node degree in module 1, was significantly enriched in the mitotic cell cycle pathway, and interleukin-6 (IL-6) was enriched in the amb2 integrin signaling pathway. The mitotic cell cycle was the most significantly enriched pathway in module 1. Five hub genes with a high degree of connectivity were selected (TOP2A, CCNB1, CCNA2, UBE2C, and KIF20A), and all were correlated with worse OS in NSCLC.
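As a rough illustration of the hub-gene step described above, the degree of a node in a PPI network is the number of distinct interaction partners of that protein. The following is a minimal Python sketch on a made-up toy edge list with placeholder node names. The helpers `node_degrees` and `top_hubs` are hypothetical and are not the study's code; the study's network was built with STRING and Cytoscape.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: degree} for an undirected edge list.

    Self-loops are ignored and duplicate pairs (in either order)
    are counted once, because partners are stored in a set.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hubs(edges, k):
    """Return the k nodes with the highest degree (ties broken by name)."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda n: (-degrees[n], n))[:k]


# Toy edges with placeholder names (illustrative only, not real interactions).
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"),
    ("C", "B"),  # same pair as above, reversed
    ("D", "E"),
    ("E", "E"),  # self-loop, ignored
]

toy_degrees = node_degrees(toy_edges)
toy_hubs = top_hubs(toy_edges, 2)
```

Ranking by degree alone is the simplest possible criterion. Module detection, as done by MCODE, additionally considers how densely a node's neighborhood is connected, which this sketch does not attempt.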
Conclusion: The results suggest that TOP2A, CCNB1, CCNA2, UBE2C, KIF20A, and IL-6 may be potential key genes, and that the mitotic cell cycle pathway may contribute to the progression of NSCLC. These genes could serve as new biomarkers for diagnosis and help guide the synthesis of drugs for NSCLC.
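The prognostic association reported in the abstract is typically examined with Kaplan-Meier survival curves. A minimal sketch of the product-limit estimator follows, using made-up follow-up times and event flags. `kaplan_meier` is a hypothetical helper written for illustration; it is not the code behind GEPIA or the Kaplan-Meier plotter.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) pairs, one per
    distinct time at which at least one death occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up data: months of follow-up and event flags (illustrative only).
toy_times = [2, 4, 4, 6, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

In practice, one curve would be estimated per patient group and the groups compared with a log-rank test, which this sketch leaves out.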

Keywords: GEO; bioinformatics; biomarker; differentially expressed genes; non-small cell lung cancer; survival.