Robust microarray data feature selection using a correntropy based distance metric learning approach

Comput Biol Med. 2023 Jul:161:107056. doi: 10.1016/j.compbiomed.2023.107056. Epub 2023 May 22.

Abstract

Classification of high-dimensional microarray data is a challenge in bioinformatics and genetic data processing. A major difficulty in feature selection for such data is the presence of outliers, to which the Euclidean distance metric is sensitive. In this study, a distance metric learning based feature selection approach that uses the correntropy function as the discrimination metric is proposed. The metric learning problem is formulated as an optimization problem and solved using the Lagrange method. The output of the approach identifies the most important and robust features. After feature selection, different classifiers such as SVM, decision trees, and NN classifiers are used to evaluate the proposed method in terms of classification accuracy, precision, recall, and F-measure. Experiments carried out on 13 high-dimensional datasets show that the proposed method outperforms previous models in terms of accuracy and robustness.
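
The contrast between Euclidean distance and a correntropy-based measure can be illustrated with a minimal sketch. It is not the paper's formulation, which the abstract does not give. It assumes the standard sample estimator of correntropy with an unnormalized Gaussian kernel and the correntropy-induced metric derived from it. The function names, the kernel width `sigma`, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def euclidean_distance(x, y):
    """Plain Euclidean distance; every squared difference contributes without bound."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correntropy(x, y, sigma=1.0):
    """Sample correntropy with an unnormalized Gaussian kernel.

    Assumption: kernel k(e) = exp(-e^2 / (2 * sigma^2)), so the value lies in (0, 1].
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    total = sum(math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y))
    return total / len(x)


def correntropy_induced_metric(x, y, sigma=1.0):
    """Correntropy-induced metric: sqrt(k(0) - V(x, y)), with k(0) = 1 here."""
    return math.sqrt(max(0.0, 1.0 - correntropy(x, y, sigma)))


# Hypothetical toy data: one coordinate of `corrupted` is a gross outlier.
reference = [0.0, 0.0, 0.0, 0.0]
clean = [0.1, 0.2, 0.3, 0.4]
corrupted = [0.1, 0.2, 0.3, 40.0]

euclid_clean = euclidean_distance(clean, reference)
euclid_corrupted = euclidean_distance(corrupted, reference)
cim_clean = correntropy_induced_metric(clean, reference)
cim_corrupted = correntropy_induced_metric(corrupted, reference)

print(f"Euclidean: clean={euclid_clean:.3f}, corrupted={euclid_corrupted:.3f}")
print(f"CIM:       clean={cim_clean:.3f}, corrupted={cim_corrupted:.3f}")
```

In this sketch the single outlying coordinate dominates the Euclidean distance, whereas its kernel term saturates near zero, so the correntropy-induced metric stays bounded by 1. How the paper embeds correntropy in the learned metric and solves the resulting problem with the Lagrange method is not specified in the abstract and is not reproduced here.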

Keywords: Correntropy; Distance metric learning; Feature selection; Microarray data classifications; Robustness.

MeSH terms

  • Algorithms*
  • Computational Biology* / methods
  • Learning
  • Microarray Analysis