Discriminant Analysis of Lung Cancer Using Nonlinear Clustering of Copy Numbers

Cancer Invest. 2020 Feb;38(2):102-112. doi: 10.1080/07357907.2020.1719501. Epub 2020 Jan 30.

Abstract

Background: Survival of patients with non-small cell lung cancer (NSCLC) remains poor and the recurrence rate is high, so early detection is crucial to improving survival. Gene-cancer mapping aims to discover genes associated with cancers, and advances in high-throughput genotyping now make it possible to screen for disease loci on a genome-wide scale. DNA copy numbers can potentially be used to distinguish cancer from normal cells for early detection of cancer.

Methods: We use a nonlinear clustering method, kernel K-means, to separate cancer from normal samples. Kernel K-means is applied to the copy numbers obtained for each chromosome to cluster 63 paired cancer-blood samples (126 samples in total) into two groups. Clustering performance is evaluated using true- and false-positive rates, true- and false-negative rates, and a nonlinear criterion, normalized mutual information (NMI).

Results: Copy numbers of paired cancer-blood samples from 63 NSCLC patients were used in this study. Kernel K-means was applied to cluster the 126 samples into two groups using the copy numbers on each chromosome separately. The clustering results for the 22 chromosomes were evaluated, and the discriminant power of each chromosome in identifying cancer was computed. We identified the top five and bottom five chromosomes by discriminant power.

Conclusions: The results reveal high discriminant power for chromosomes 8, 5, 1, 3, and 19 in identifying cancer, with the highest sensitivity, 75%, obtained for chromosome 5. The bottom five chromosomes, 9, 6, 4, 13, and 21, show low discriminant power, with accuracy below 54%: true cancer and normal samples fall into substantially overlapping groups when clustered by copy number. This indicates that the copy numbers obtained for cancer and normal samples are similar on these chromosomes.
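
The clustering and evaluation steps named in the abstract (kernel K-means into two groups, then sensitivity, specificity, accuracy, and NMI) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, its parameters, the initialisation, or the NMI normalisation, so the RBF kernel, `gamma`, the farthest-point seeding, and the geometric-mean normalisation below are all assumptions, and the toy "copy number" values are invented purely for illustration.

```python
import math
from collections import Counter

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (kernel choice is an assumption)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def feature_dist(K, i, members, within):
    """Squared feature-space distance from sample i to the centroid of `members`."""
    cross = sum(K[i][j] for j in members) / len(members)
    return K[i][i] - 2.0 * cross + within

def kernel_kmeans_two(K, max_iter=100):
    """Two-cluster kernel K-means on a precomputed Gram matrix K."""
    n = len(K)
    # Assumed initialisation: sample 0 and the sample farthest from it act as seeds.
    far = max(range(n), key=lambda i: K[i][i] - 2.0 * K[i][0] + K[0][0])
    labels = [0 if feature_dist(K, i, [0], K[0][0])
              <= feature_dist(K, i, [far], K[far][far]) else 1 for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in (0, 1)]
        if not all(members):
            break  # one cluster is empty; stop rather than divide by zero
        # Mean kernel value over all pairs inside each cluster.
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 for m in members]
        new = [min((0, 1), key=lambda c: feature_dist(K, i, members[c], within[c]))
               for i in range(n)]
        if new == labels:
            break  # assignments are stable
        labels = new
    return labels

def nmi(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred)) (normalisation is an assumption)."""
    n = len(truth)
    ct, cp, joint = Counter(truth), Counter(pred), Counter(zip(truth, pred))
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    ht = -sum(c / n * math.log(c / n) for c in ct.values())
    hp = -sum(c / n * math.log(c / n) for c in cp.values())
    return mi / math.sqrt(ht * hp) if ht > 0 and hp > 0 else 0.0

def rates(truth, pred):
    """(sensitivity, specificity, accuracy) with cancer = 1 and normal = 0.

    Cluster ids are arbitrary, so both labelings are scored and the more
    accurate one is kept. Both classes must be present in `truth`.
    """
    def score(p):
        tp = sum(t == 1 and q == 1 for t, q in zip(truth, p))
        tn = sum(t == 0 and q == 0 for t, q in zip(truth, p))
        pos = sum(truth)
        return tp / pos, tn / (len(truth) - pos), (tp + tn) / len(truth)
    return max(score(pred), score([1 - q for q in pred]), key=lambda s: s[2])

# Invented toy data: 4 "normal" and 4 "cancer" samples, two copy-number features each.
X = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.0], [2.0, 2.1],
     [3.0, 1.2], [3.2, 1.0], [2.9, 1.1], [3.1, 1.3]]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
K = rbf_gram(X, gamma=1.0)
labels = kernel_kmeans_two(K)
print("labels:", labels)
print("NMI:", nmi(truth, labels))
print("sensitivity, specificity, accuracy:", rates(truth, labels))
```

In a per-chromosome analysis like the one described, `X` would hold the copy numbers measured on a single chromosome for all 126 samples, and the same three calls would be repeated for each of the 22 chromosomes to rank them by discriminant power.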

Keywords: DNA copy numbers; Lung cancer; clustering; discriminant analysis; kernel K-means; normalized mutual information.

MeSH terms

  • Carcinoma, Non-Small-Cell Lung / blood
  • Carcinoma, Non-Small-Cell Lung / diagnosis
  • Carcinoma, Non-Small-Cell Lung / genetics*
  • Cluster Analysis
  • DNA Copy Number Variations*
  • Discriminant Analysis
  • Early Detection of Cancer / methods
  • Humans
  • Lung Neoplasms / blood
  • Lung Neoplasms / diagnosis
  • Lung Neoplasms / genetics*
  • Neoplasm Recurrence, Local
  • Polymorphism, Single Nucleotide*
  • Reproducibility of Results
  • Sensitivity and Specificity