Non-small-cell lung cancer pathological subtype-related gene selection and bioinformatics analysis based on gene expression profiles

Mol Clin Oncol. 2018 Feb;8(2):356-361. doi: 10.3892/mco.2017.1516. Epub 2017 Nov 27.

Abstract

Lung cancer is one of the most common malignant diseases and a major threat to public health worldwide. Non-small-cell lung cancer (NSCLC) has a higher degree of malignancy and a lower 5-year survival rate compared with that of small-cell lung cancer. NSCLC is mainly divided into two pathological subtypes, adenocarcinoma and squamous cell carcinoma. The aim of the present study was to identify disease genes using gene expression profiles and shortest path analysis of weighted functional protein association networks, based on existing protein-protein interaction data from the Search Tool for the Retrieval of Interacting Genes (STRING). The gene expression profile (GSE10245), which comprises 40 lung adenocarcinoma and 18 lung squamous cell carcinoma tissues, was downloaded from the National Center for Biotechnology Information Gene Expression Omnibus database. Following preprocessing, a total of 8 disease genes were identified using a Naïve Bayes classifier combined with the Maximum Relevance Minimum Redundancy (mRMR) feature selection method. An additional 21 candidate genes were selected by shortest path analysis using Dijkstra's algorithm. The AURKA and SLC7A2 genes were selected three and two times in the shortest path analysis, respectively. Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis showed that all of these genes participate in a number of important pathways, such as oocyte meiosis, cell cycle and cancer pathways. The present findings may provide novel insights into the pathogenesis of NSCLC and enable the development of novel therapeutic strategies. However, further investigation is required to confirm these findings.
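
The shortest path step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how edge weights were derived, so the sketch assumes a weight of 1000 minus a STRING-style confidence score (higher confidence gives a shorter edge), and it assumes that candidates are the intermediate genes on the shortest paths between every pair of disease genes. The gene names (S1, S2, S3, X, Y, Z), the scores and the function names are all hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations


def build_graph(scored_edges, max_score=1000):
    """Build an undirected weighted graph from (gene_a, gene_b, score) tuples.

    Assumption: weight = max_score - score, so high-confidence
    interactions become short edges.
    """
    graph = {}
    for a, b, score in scored_edges:
        weight = max_score - score
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def dijkstra_path(graph, source, target):
    """Return the minimum-weight path from source to target, or None."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                prev[neighbour] = node
                heapq.heappush(heap, (new_dist, neighbour))
    if target not in dist:
        return None
    # Walk back from the target to the source to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, seeds):
    """Count how often each non-seed gene lies on a seed-to-seed shortest path."""
    seed_set = set(seeds)
    counts = Counter()
    for s, t in combinations(seeds, 2):
        path = dijkstra_path(graph, s, t)
        if path:
            counts.update(g for g in path[1:-1] if g not in seed_set)
    return counts


# Hypothetical toy network; scores mimic STRING's 0-1000 confidence range.
toy_edges = [
    ("S1", "X", 900), ("X", "S2", 900), ("S1", "S2", 300),
    ("S2", "Y", 800), ("Y", "S3", 950), ("S1", "Z", 400),
    ("Z", "S3", 400), ("X", "S3", 800),
]
toy_graph = build_graph(toy_edges)
seeds = ["S1", "S2", "S3"]
counts = candidate_genes(toy_graph, seeds)
print(counts.most_common())
```

In this toy network, gene X lies on two of the three seed-to-seed shortest paths and gene Y on one, while Z is never used. This mirrors how a gene can be selected more than once in such an analysis, as reported for AURKA and SLC7A2.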

Keywords: bioinformatics analysis; feature selection; microarray; non-small-cell lung cancer; pathway.