Identifying Lung Cancer Cell Markers with Machine Learning Methods and Single-Cell RNA-Seq Data

Life (Basel). 2021 Sep 9;11(9):940. doi: 10.3390/life11090940.

Abstract

Non-small cell lung cancer is a major lethal subtype of epithelial lung cancer, with high morbidity and mortality. The single-cell sequencing technique plays a key role in exploring the pathogenesis of non-small cell lung cancer. We proposed a computational method for distinguishing cell subtypes from the different pathological regions of non-small cell lung cancer on the basis of transcriptomic profiles. The method includes a group of qualitative classification criteria (biomarkers) and various rules. The random forest classifier reached a Matthews correlation coefficient (MCC) of 0.922 using 720 features, and the decision tree reached an MCC of 0.786 using 1880 features. The obtained biomarkers and rules were analyzed at the end of this study.
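
As an illustration of the evaluation metric named above, the following is a minimal sketch of how an MCC can be computed from true and predicted cell labels. The abstract does not state which variant of the MCC was used; this sketch assumes the multiclass generalization, which reduces to the familiar binary formula when there are only two classes. The function name `multiclass_mcc` and the labels in the usage example are hypothetical and are not taken from the study.

```python
import math
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient (hypothetical helper).

    Assumption: the multiclass generalization is used; the source
    abstract does not specify the exact variant.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Build a confusion matrix: rows are true classes, columns are predictions.
    labels = sorted(set(y_true) | set(y_pred), key=repr)
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    confusion = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        confusion[index[t]][index[p]] += 1

    s = len(y_true)                                   # total samples
    c = sum(confusion[k][k] for k in range(n))        # correctly classified
    t_k = [sum(row) for row in confusion]             # true count per class
    p_k = [sum(confusion[i][k] for i in range(n)) for k in range(n)]  # predicted count per class

    numerator = c * s - sum(p * t for p, t in zip(p_k, t_k))
    denominator = math.sqrt(
        (s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k))
    )
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = ["A", "A", "B", "B", "C", "C"]
    guess = ["A", "A", "B", "C", "C", "C"]
    print(round(multiclass_mcc(truth, guess), 3))
```

An MCC of 1 indicates perfect agreement between predicted and true labels, 0 indicates performance no better than chance, and negative values indicate systematic disagreement. Unlike plain accuracy, the MCC accounts for every cell of the confusion matrix, which makes it a common choice when class sizes are unbalanced.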

Keywords: cell biomarker; decision tree; feature selection; lung cancer; quantitative rules; random forest.