Molecular signatures have been extensively reported for the diagnosis of many cancers over the last 20 years. However, statistical methods and machine learning approaches frequently yield false-positive signatures, which cause subsequent biological experiments to fail. As a result, signature discovery has gradually fallen out of the mainstream of bioinformatics research. We argue that three critical weaknesses make identified signatures unreliable. First, a signature is mistakenly regarded as a gene set in which every member is differentially expressed between or among sample groups. Second, many of the differentially expressed genes found may be false positives, even when cancer and normal samples can be separated in one-dimensional space. Third, cross-platform validation of a discovered signature often gives poor results. To address these problems, we propose a new feature selection framework based on ensemble classification for discovering cancer diagnostic signatures. We also design a procedure for transforming expression profiles across different platforms. Signatures are discovered in both simulated data and real data representing different carcinomas across different platforms, and false positives are suppressed. The experimental results demonstrate the effectiveness of our method.
Keywords: RNA-seq; ensemble classification; expression profiles; feature selection; signature.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com.