Visualization of a Machine Learning Framework toward Highly Sensitive Qualitative Analysis by SERS

Anal Chem. 2022 Jul 19;94(28):10151-10158. doi: 10.1021/acs.analchem.2c01450. Epub 2022 Jul 6.

Abstract

Surface-enhanced Raman spectroscopy (SERS), which provides fingerprint information at near-single-molecule sensitivity, is a powerful tool for the trace analysis of a target in a complicated matrix, and its use has been greatly facilitated by the development of modern machine learning algorithms. However, both the demand for large amounts of training data and the low interpretability of black-box models significantly limit the application of well-trained models to real systems. To address these two issues, we constructed a novel machine learning framework (Vis-CAD) that integrates a visual random forest, a characteristic amplifier, and data augmentation. Data augmentation significantly reduced the amount of data required, and the visualization of the random forest clearly presented the captured features, from which one can judge the reliability of the algorithm. Taking the trace analysis of individual polycyclic aromatic hydrocarbons in a mixture as an example, an accuracy of no less than 99% was achieved under the optimized conditions. The visualization of the framework demonstrated that the captured features correlated well with the characteristic Raman peaks of each individual compound. Furthermore, the sensitivity toward the trace component could be improved by at least 1 order of magnitude compared with inspection by the naked eye. The proposed algorithm, distinguished by its lower demand for data and the visualization of its operation, offers a new way toward the robust application of machine learning algorithms, which would bring push-to-the-limit sensitivity to the qualitative and quantitative analysis of trace targets, not only in SERS but also in the wider field of spectroscopy. It is implemented in the Python programming language and is open source at https://github.com/3331822w/Vis-CAD.
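
The abstract does not say which augmentation operations Vis-CAD applies, so the following is only a minimal sketch of spectral data augmentation in general, not the authors' implementation. It assumes three common perturbations (a small shift along the wavenumber axis, intensity scaling, and additive Gaussian noise). The function name augment_spectrum and all parameter values are hypothetical.

```python
import math
import random


def augment_spectrum(intensities, n_copies=5, noise_sd=0.02,
                     max_shift=2, scale_range=(0.9, 1.1), seed=0):
    """Return n_copies perturbed variants of one spectrum (a list of floats).

    Assumed perturbations (illustrative, not taken from the paper):
    an integer shift along the axis, a global intensity scale factor,
    and Gaussian noise proportional to the largest absolute intensity.
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    peak = max(abs(v) for v in intensities) or 1.0
    n = len(intensities)
    augmented = []
    for _ in range(n_copies):
        shift = rng.randint(-max_shift, max_shift)
        scale = rng.uniform(*scale_range)
        variant = []
        for i in range(n):
            # Clamp the source index so shifted edges repeat the end value.
            j = min(max(i - shift, 0), n - 1)
            noise = rng.gauss(0.0, noise_sd * peak)
            variant.append(scale * intensities[j] + noise)
        augmented.append(variant)
    return augmented


# Toy spectrum: one Gaussian-shaped band centred at index 50.
spectrum = [math.exp(-((i - 50) ** 2) / 20.0) for i in range(100)]
variants = augment_spectrum(spectrum, n_copies=3)
print(len(variants), len(variants[0]))
```

Each measured spectrum yields several slightly different training examples, which is the general way augmentation reduces the number of spectra that must be recorded. The perturbation magnitudes would need to be tuned to stay within the variation seen in real measurements.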

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Machine Learning*
  • Polycyclic Aromatic Hydrocarbons*
  • Reproducibility of Results
  • Spectrum Analysis, Raman / methods

Substances

  • Polycyclic Aromatic Hydrocarbons