A Machine Learning Approach for Tracing Tumor Original Sites With Gene Expression Profiles

Front Bioeng Biotechnol. 2020 Nov 24;8:607126. doi: 10.3389/fbioe.2020.607126. eCollection 2020.

Abstract

In some carcinomas, one or more metastatic sites are found while the primary site remains unknown. Identifying whether a tumor tissue is primary or metastatic, and where it originated, is crucial for physicians to develop precise treatment plans. When the primary site is unknown, it is difficult to design patient-specific plans: these patients usually receive broad-spectrum chemotherapy and still have a poor prognosis. Machine learning has been widely applied in clinical practice and has already shown significant advantages there. In this study, we classify a large number of tumor samples of uncertain origin and predict their tissue of origin by applying the random forest and Naive Bayes algorithms. We evaluate the performance of our approach with precision, recall, and other measures. The results show that the prediction accuracy of this method was 90.4% on 7,713 samples and 80% on 20 metastatic tumor samples. In addition, 10-fold cross-validation, used to evaluate classification accuracy, reached 91%.
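The abstract names the ingredients (a Naive Bayes classifier, precision and recall, 10-fold cross-validation) without showing how they fit together. The following is a minimal standard-library Python sketch of that evaluation loop, not the authors' implementation. It assumes a Gaussian Naive Bayes variant (the abstract does not state which likelihood was used), omits the random forest, and runs on synthetic data standing in for gene expression profiles. The function names (`fit_gaussian_nb`, `predict_gaussian_nb`, `k_fold_evaluate`, `precision_recall`) and the tissue labels are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log prior plus per-gene means and variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n_genes = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_genes)]
        # A small epsilon keeps the variance positive for constant genes.
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
                     for j in range(n_genes)]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (genes assumed independent)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(y_true, y_pred, label):
    """Per-class precision = TP / predicted positives, recall = TP / actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    return (tp / predicted if predicted else 0.0,
            tp / actual if actual else 0.0)


def k_fold_evaluate(X, y, k=10, seed=0):
    """Shuffle once, split into k folds, train on k-1 folds and test on the held-out one."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    y_true_all, y_pred_all = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            y_true_all.append(y[i])
            y_pred_all.append(predict_gaussian_nb(model, X[i]))
    accuracy = sum(t == p for t, p in zip(y_true_all, y_pred_all)) / len(y)
    return accuracy, y_true_all, y_pred_all


# Synthetic stand-in for expression profiles: 3 hypothetical tissues, 5 genes,
# 40 samples per tissue, with tissue-specific mean expression levels.
rng = random.Random(42)
tissues = ["tissue_A", "tissue_B", "tissue_C"]
X, y = [], []
for t_index, tissue in enumerate(tissues):
    for _ in range(40):
        X.append([rng.gauss(3.0 * t_index + g, 1.0) for g in range(5)])
        y.append(tissue)

acc, y_true_all, y_pred_all = k_fold_evaluate(X, y, k=10)
print(f"10-fold CV accuracy: {acc:.3f}")
for tissue in tissues:
    p, r = precision_recall(y_true_all, y_pred_all, tissue)
    print(f"{tissue}: precision={p:.2f} recall={r:.2f}")
```

Pooling the held-out predictions from all ten folds, as done here, gives one accuracy figure comparable in spirit to the 91% cross-validation result reported above. In practice a library such as scikit-learn (`RandomForestClassifier`, `GaussianNB`, `cross_val_score`) would usually replace these hand-written pieces and supply the random forest half of the comparison.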

Keywords: machine learning; naive Bayes; random forest; the ability of tissue tracing; uncertain origins.