Background: In recent years, high-throughput, non-invasive Raman spectrometry has matured into an effective approach for identifying individual cells by species, even in complex, mixed populations. Raman profiling is an appealing optical microscopic method for this purpose. To fully utilize Raman profiling for single-cell analysis, an extensive understanding of Raman spectra is necessary to answer questions such as which filtering methodologies are effective for pre-processing Raman spectra, which strains can be distinguished by their Raman spectra, and which features serve best as Raman-based biomarkers for single cells.
Results: In this work, we propose an approach called rDisc that discretizes the original Raman spectrum into only a few (usually fewer than 20) representative peaks (Raman shifts). The approach removes noise and condenses the original spectrum. In particular, effective signal processing procedures were designed to eliminate noise, using wavelet transform denoising, baseline correction, and signal normalization. In the discretization step, representative peaks were selected to significantly decrease the Raman data size. More importantly, the selected peaks are suitable to serve as key biological markers for differentiating species and other cellular features. Additionally, the classification performance of the discretized spectra was found to be comparable to that of full spectra with more than 1000 Raman shifts. Overall, a discretized spectrum needs about 5% of the storage space of a full spectrum, and processing is considerably faster. With comparable accuracy at a fraction of the storage and processing cost, rDisc is well suited to single-cell classification.
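The pipeline described above (wavelet denoising, baseline correction, normalization, then selection of at most about 20 representative peaks) can be illustrated with a minimal sketch. This is not the rDisc implementation. It assumes simple stand-ins for each step: a Haar wavelet with a universal soft threshold, a rolling-minimum baseline, max normalization, and greedy selection of local maxima. All function names, parameter values, and the synthetic three-band spectrum are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def haar_denoise(signal, levels=3):
    """Haar wavelet denoising with a universal soft threshold (assumed stand-in)."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    root2 = math.sqrt(2.0)
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    # Estimate the noise level from the finest detail coefficients (MAD rule).
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    # Shrink the details toward zero, then invert the transform level by level.
    for level in reversed(details):
        rebuilt = []
        for a, d in zip(approx, level):
            d = math.copysign(max(abs(d) - threshold, 0.0), d)
            rebuilt.extend([(a + d) / root2, (a - d) / root2])
        approx = rebuilt
    return approx


def subtract_baseline(y, half_window=50):
    """Subtract a rolling-minimum baseline (assumed stand-in for baseline correction)."""
    n = len(y)
    return [y[i] - min(y[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]


def normalize(y):
    """Scale intensities so that the largest value is 1."""
    top = max(y)
    return [v / top for v in y] if top > 0 else list(y)


def pick_peaks(shifts, y, max_peaks=20, min_height=0.1, min_sep=10):
    """Keep the strongest local maxima that are at least min_sep samples apart."""
    candidates = [i for i in range(1, len(y) - 1)
                  if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height]
    candidates.sort(key=lambda i: y[i], reverse=True)
    chosen = []
    for i in candidates:
        if all(abs(i - j) >= min_sep for j in chosen):
            chosen.append(i)
        if len(chosen) == max_peaks:
            break
    return [(shifts[i], y[i]) for i in sorted(chosen)]


# Synthetic spectrum: 1024 Raman shifts, three Gaussian bands,
# a sloping baseline, and Gaussian noise (all values are made up).
random.seed(0)
shifts = [400.0 + 1.5 * i for i in range(1024)]
bands = {1003.0: 1.0, 1450.0: 0.8, 1660.0: 0.6}  # centre (cm^-1) -> amplitude
raw = [0.5 + 0.0005 * i
       + sum(a * math.exp(-0.5 * ((s - c) / 8.0) ** 2) for c, a in bands.items())
       + random.gauss(0.0, 0.02)
       for i, s in enumerate(shifts)]

peaks = pick_peaks(shifts, normalize(subtract_baseline(haar_denoise(raw))))
print(f"{len(raw)} Raman shifts reduced to {len(peaks)} peaks")
for shift, height in peaks:
    print(f"{shift:7.1f} cm^-1  relative intensity {height:.2f}")
```

In this sketch the discretized spectrum is just a short list of (Raman shift, relative intensity) pairs, which is where the reduction in storage comes from. A real pipeline would likely use a smoother wavelet and a more careful baseline model than the ones assumed here.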