Analysis of expression differences of immune genes in non-small cell lung cancer based on TCGA and ImmPort data sets and the application of a prognostic model

Ann Transl Med. 2020 Apr;8(8):550. doi: 10.21037/atm.2020.04.38.

Abstract

Background: The role of immune-related genes in the prognosis of non-small cell lung cancer (NSCLC) has received little investigation. This study screened for differentially expressed immune genes and analyzed their correlation with NSCLC prognosis. Based on the immune genes identified, we aimed to construct a prognostic risk model and to explore novel molecules with the potential to predict therapeutic effect and prognosis in lung cancer.

Methods: Immune gene transcriptome data and clinical data of NSCLC samples were extracted from The Cancer Genome Atlas (TCGA) database, and transcription factors in the ImmPort dataset were obtained. The samples were divided into two groups: normal tissues and tumor tissues. Differential expression of immune genes between the two groups was analyzed using the edgeR algorithm. Survival analysis combined the differentially expressed immune genes with clinical survival time to identify the immune genes influencing NSCLC prognosis. A prognostic risk model was constructed by calculating a risk score from the expression levels of the prognosis-related immune genes and their corresponding coefficients. This model was used to calculate patient risk scores and to perform clinical correlation analysis. The selected molecules were further verified in clinical samples.
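The risk score described in the Methods can be illustrated with a minimal Python sketch. It assumes a linear score (the sum of each gene's expression multiplied by its coefficient) and a median cutoff for separating high-risk from low-risk patients; the abstract states neither the cutoff nor how the coefficients were estimated, so both are assumptions. The coefficients, expression values, and patient identifiers below are hypothetical and are not taken from the study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score.

    Assumption: a median cutoff; patients exactly at the median are 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"S100A16": 0.5, "VIPR1": -0.25}
patients = {
    "P1": {"S100A16": 4.0, "VIPR1": 2.0},
    "P2": {"S100A16": 1.0, "VIPR1": 6.0},
    "P3": {"S100A16": 2.0, "VIPR1": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
```

In this toy cohort, a positive coefficient raises the score as expression rises and a negative coefficient lowers it, which is how a gene comes to be labeled high-risk or protective in a model of this kind.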

Results: By comparing NSCLC tissues with normal tissues, a total of 6,778 differentially expressed genes were found (P<0.05), of which 490 were differential immune-related genes. Survival analysis determined 28 differential immune genes to be associated with prognosis (P<0.05). We calculated the patient risk value based on the immune gene prognosis model. The survival curve drawn according to the patient risk score showed that survival differed significantly between the high-risk and the low-risk groups (P<0.05). The area under the receiver operating characteristic (ROC) curve (AUC) was 0.723, which indicated relatively good discriminative ability. These results support the reliability of our immune gene risk prognostic model. After drawing the risk curve, S100A16, IGKV4, S100P, ANGPTL4, SEMA4B, and LGR4 were found to be the high-risk immune genes in NSCLC. Clinical correlation analysis of survival-related differential immune genes revealed that in patients with lymph node metastasis, ANGPTL4 was positively correlated with T stage, S100A16 and SEMA4B were upregulated, and VIPR1 was downregulated. Further analysis revealed that VIPR1 was decreased in metastatic lung cancer compared to non-metastatic lung cancer. Furthermore, real-time PCR detection of the clinical samples showed that S100A16 expression was increased in lung cancer, while VIPR1 expression was decreased, which was consistent with the results of our bioinformatics analysis.
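As an aid to interpreting the reported AUC, the sketch below computes the area under an ROC curve as the fraction of (event, non-event) patient pairs in which the event patient has the higher risk score, with ties counted as half. This is a plain binary-outcome AUC written for illustration; the abstract does not say whether the study used a time-dependent ROC method, and the scores and labels here are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative case.

    labels: 1 for an event (e.g. death), 0 for no event. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes, for illustration only.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0, 0]
auc = roc_auc(example_scores, example_labels)
```

Under this definition, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value such as 0.723 describes how well the score orders patients rather than a true-positive rate at any single threshold.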

Conclusions: Based on big data from the TCGA and ImmPort databases, our study analyzed the relationship between the differential expression of immune-related genes and clinical data, and constructed a prognostic model based on the immune genes identified. Two novel molecules, S100A16 and VIPR1, were verified in clinical samples and may have significant biological functions in NSCLC. Our research may provide new insight into the immune genes that mediate the malignant biological behavior of NSCLC.

Keywords: Non-small cell lung cancer (NSCLC); clinical significance; immune gene; prognostic model; risk score.