LUADpp: an effective prediction model on prognosis of lung adenocarcinomas based on somatic mutational features

BMC Cancer. 2019 Mar 22;19(1):263. doi: 10.1186/s12885-019-5433-7.


Background: Lung adenocarcinoma is the most common type of lung cancer. Whole-genome sequencing studies have disclosed the genomic landscape of lung adenocarcinomas. However, it remains unclear whether the genetic alterations can guide prognosis prediction. Effective genetic markers, and prediction models based on them, are also lacking for prognosis evaluation.

Methods: We obtained somatic mutation data and clinical data for 371 lung adenocarcinoma cases from The Cancer Genome Atlas. The cases were classified into two prognostic groups by 3-year survival, and the somatic mutation frequencies of genes were compared between the groups. Computational models were then developed to discriminate between the two prognostic groups.
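The between-group comparison of mutation frequencies can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name the statistical test, so Fisher's exact test is an assumption here, and the function names and the counts in the usage example are hypothetical.

```python
from math import comb


def mutation_rate(mutated_cases, total_cases):
    """Fraction of cases in a group that carry a somatic mutation in a gene."""
    return mutated_cases / total_cases


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (not specified in the abstract):
      a = good-prognosis cases with the gene mutated
      b = good-prognosis cases without a mutation
      c = poor-prognosis cases with the gene mutated
      d = poor-prognosis cases without a mutation
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated cases in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts for a single gene, for illustration only.
good_mutated, good_total = 30, 200
poor_mutated, poor_total = 10, 171
rate_difference = (
    mutation_rate(good_mutated, good_total)
    - mutation_rate(poor_mutated, poor_total)
)
p_value = fisher_exact_two_sided(
    good_mutated, good_total - good_mutated,
    poor_mutated, poor_total - poor_mutated,
)
```

Ranking genes by a quantity such as `rate_difference` is one plausible way to produce a top gene list; the abstract does not state the ranking criterion actually used.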

Results: Genes were identified with higher mutation rates in the good-prognosis group (≥ 3-year survival) than in the poor-prognosis group (< 3-year survival) of lung adenocarcinoma patients. Genes participating in cell-cell adhesion and motility were significantly enriched among the top genes ranked by mutation rate difference between the good- and poor-prognosis groups. Support Vector Machine models built on gene somatic mutation features predicted prognosis well, and performance improved as the feature size increased. An 85-gene model reached an average cross-validated accuracy of 81% and an Area Under the Curve (AUC) of 0.896 for the Receiver Operating Characteristic (ROC) curves. The model also exhibited good inter-stage prognosis prediction performance, with an average AUC of 0.846 for the ROC curves.
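For readers unfamiliar with the reported metrics, the sketch below shows how accuracy and ROC AUC can be computed from a classifier's outputs and averaged over cross-validation folds. It is an illustration under stated assumptions, not the authors' evaluation code: the label encoding (1 = good prognosis, 0 = poor prognosis), the function names, and the toy scores are all hypothetical.

```python
def accuracy(labels, predictions):
    """Fraction of cases whose predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean(values):
    """Average of per-fold metric values."""
    return sum(values) / len(values)


# Toy example: decision scores from two hypothetical cross-validation folds.
fold_labels = [[1, 1, 0, 0], [1, 0, 1, 0]]
fold_scores = [[0.9, 0.4, 0.6, 0.1], [0.8, 0.3, 0.7, 0.2]]
average_auc = mean([roc_auc(y, s) for y, s in zip(fold_labels, fold_scores)])
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based computation would scale better for larger datasets.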

Conclusion: The prognosis of lung adenocarcinomas is related to somatic gene mutations. The genetic markers could be used for prognosis prediction and could further provide guidance for personalized medicine.

Keywords: Lung adenocarcinomas; Machine learning; Personal medicine; Somatic mutational; Support vector machine model.

MeSH terms

  • Adenocarcinoma of Lung / genetics
  • Adenocarcinoma of Lung / mortality*
  • Adenocarcinoma of Lung / pathology
  • Adenocarcinoma of Lung / therapy
  • Biomarkers, Tumor / genetics*
  • Computational Biology
  • Datasets as Topic
  • Feasibility Studies
  • Genomics / methods
  • Humans
  • Lung Neoplasms / genetics
  • Lung Neoplasms / mortality*
  • Lung Neoplasms / pathology
  • Lung Neoplasms / therapy
  • Models, Biological*
  • Mutation
  • Precision Medicine / methods
  • Prognosis
  • ROC Curve
  • Support Vector Machine*
  • Survival Analysis
  • Survival Rate

