Background: Metabolomics LC-MS experiments yield large numbers of peaks, few of which can be identified by database matching. Many of the remaining peaks correspond to derivatives of identified peaks (e.g., isotope peaks, adducts, fragments and multiply charged molecules). In this article, we present a data-reduction approach that automatically identifies these derivative peaks.
Results: Data-driven clustering based on chromatographic peak shape correlation and intensity patterns across biological replicates reliably identifies derivative peaks. On a test data set obtained from Leishmania donovani extracts, we achieved a 60% reduction in the number of peaks. After quality control filtering, almost 80% of the peaks could be putatively identified by database matching.
Conclusion: Automated peak filtering substantially speeds up the data-interpretation process.
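The clustering idea summarized under Results can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the greedy grouping strategy, and the two correlation thresholds (0.9 each) are assumptions chosen for illustration. The sketch also assumes that all peak shapes have been sampled on a common retention-time grid. A peak is treated as a derivative candidate of a more intense "base" peak when both its chromatographic shape and its intensity pattern across replicates correlate strongly with those of the base peak.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two 1-D signals; returns 0.0 if either is constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def group_derivative_peaks(shapes, replicate_intensities,
                           shape_min=0.9, pattern_min=0.9):
    """Map each peak index to the index of its base peak.

    shapes: (n_peaks, n_timepoints) intensities on a shared retention-time grid.
    replicate_intensities: (n_peaks, n_replicates) peak intensities.
    shape_min, pattern_min: hypothetical correlation thresholds.
    """
    shapes = np.asarray(shapes, dtype=float)
    reps = np.asarray(replicate_intensities, dtype=float)
    # Visit peaks from most to least intense so base peaks are seen first.
    order = np.argsort(-reps.mean(axis=1))
    bases = []
    base_of = {}
    for i in order:
        i = int(i)
        for b in bases:
            if (pearson(shapes[i], shapes[b]) >= shape_min
                    and pearson(reps[i], reps[b]) >= pattern_min):
                base_of[i] = b  # derivative candidate of base peak b
                break
        else:
            bases.append(i)  # no matching base peak: start a new group
            base_of[i] = i
    return base_of


# Synthetic example: peak 1 co-elutes with peak 0 at one tenth of its
# intensity (as an isotope peak might); peak 2 elutes later.
t = np.linspace(0.0, 10.0, 51)
g5 = np.exp(-((t - 5.0) ** 2) / 2.0)
g8 = np.exp(-((t - 8.0) ** 2) / 2.0)
shapes = [g5, 0.1 * g5, g8]
replicates = [[100.0, 200.0, 150.0], [10.0, 20.0, 15.0], [50.0, 40.0, 60.0]]
groups = group_derivative_peaks(shapes, replicates)
```

In this synthetic example, peak 1 is assigned to base peak 0, while peak 2 forms its own group. Keeping only the base peaks then gives the reduced peak list that is passed on to database matching.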