Unsupervised Feature Selection by a Genetic Algorithm for Mid-Infrared Spectral Data

Anal Chem. 2022 Nov 22;94(46):16050-16059. doi: 10.1021/acs.analchem.2c03118. Epub 2022 Nov 8.

Abstract

Dimensionality reduction of high-dimensional datasets, such as those acquired by Fourier transform infrared (FTIR) spectroscopy, is a critical step in the data analysis workflow. To this end, numerous feature selection methods have been developed and applied in a supervised context, i.e., using a priori knowledge about the data, usually in the form of labels for classification or quantitative values for regression. Genetic algorithms have been widely used for this purpose because of their flexibility and their global optimization principle. However, few unsupervised applications have been reported in infrared spectroscopy. This article proposes a new unsupervised feature selection method based on a genetic algorithm that uses a validity index computed from KMeans partitions as its fitness function. Evaluated on a simulated dataset, then validated and tested on three real-world infrared spectroscopic datasets, the algorithm finds spectral descriptors that improve clustering accuracy and simplify the spectral interpretation of the results.
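
The general idea of the abstract, a genetic algorithm that searches over feature subsets and scores each subset with a cluster validity index computed from a k-means partition, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which validity index, genetic operators, or parameters were used, so everything below is an assumption made for illustration: the Davies-Bouldin index as the validity index, tournament selection, uniform crossover, bit-flip mutation, two-individual elitism, a hand-written k-means, and the hypothetical helper names `make_toy_spectra`, `fitness`, and `select_features`. The toy data are random numbers, not spectra.

```python
import math
import random

def kmeans(points, k, iters=15, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)  # fixed seed keeps the fitness deterministic
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]

def davies_bouldin(points, labels, k):
    """Davies-Bouldin index (lower is better); inf for degenerate partitions."""
    groups = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]
    if any(not g for g in groups):
        return math.inf
    cents = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    total = 0.0
    for i in range(k):
        ratios = []
        for j in range(k):
            if j != i:
                d = math.dist(cents[i], cents[j])
                if d == 0.0:
                    return math.inf
                ratios.append((scatter[i] + scatter[j]) / d)
        total += max(ratios)
    return total / k

def fitness(mask, X, k):
    """Negated Davies-Bouldin index of a k-means partition on the masked features."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return -math.inf  # an empty feature subset is never acceptable
    sub = [[row[i] for i in cols] for row in X]
    return -davies_bouldin(sub, kmeans(sub, k), k)

def select_features(X, k, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Toy GA over binary feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, k)
        return cache[key]

    # Start from the full feature set plus random subsets (an assumption).
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(ranked, 3), key=score)  # tournament selection
            b = max(rng.sample(ranked, 3), key=score)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]  # mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)

def make_toy_spectra(n_per_class=30, n_noise=4, seed=1):
    """Two informative columns followed by pure-noise columns (synthetic, not real spectra)."""
    rng = random.Random(seed)
    X = []
    for cls in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(4.0 * cls, 0.5), rng.gauss(-4.0 * cls, 0.5)]
            X.append(informative + [rng.gauss(0.0, 2.0) for _ in range(n_noise)])
    return X

X = make_toy_spectra()
best_mask, best_fit = select_features(X, k=2)
print("selected feature mask:", best_mask, "fitness:", best_fit)
```

Because the sketch uses elitism and places the full feature set in the initial population, the returned mask never scores worse than using all features under the chosen index. One caveat of this simplification is that validity indices computed on subsets of different sizes are not strictly comparable, so a practical method may need to account for subset size; the abstract does not say how the paper handles this.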

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Spectrophotometry, Infrared
  • Spectroscopy, Fourier Transform Infrared