Computational Variation: An Underinvestigated Quantitative Variability Caused by Automated Data Processing in Untargeted Metabolomics

Anal Chem. 2021 Jun 16. doi: 10.1021/acs.analchem.0c03381. Online ahead of print.

Abstract

Computational tools are commonly used in untargeted metabolomics to automatically extract metabolic features from liquid chromatography-mass spectrometry (LC-MS) raw data. However, because software cannot accurately determine chromatographic peak heights or areas for features with poor peak shape, automated data processing introduces an additional quantitative variation (i.e., computational variation) beyond the well-recognized analytical and biological variations. In this work, using multiple biological samples, we investigated how experimental factors, including sample concentrations, LC separation columns, and data processing programs, contribute to computational variation. For example, we found that peak height (PH)-based quantification is more precise when MS-DIAL is used for data processing. We further systematically compared the patterns of computational variation between PH- and peak area (PA)-based quantitative measurements. Our results suggest that the magnitude of computational variation is highly consistent at a given concentration. Hence, we proposed a quality control (QC) sample-based correction workflow that minimizes computational variation by automatically selecting the PH- or PA-based measurement for each intensity value. This bioinformatic solution was demonstrated in a metabolomic comparison of leukemia patients before and after chemotherapy. The workflow could be effectively applied to 652 of 915 metabolic features, and over 31% (206 of 652) of the corrected features showed distinctly changed statistical significance. Overall, this work highlights computational variation as a considerable but underinvestigated source of quantitative variability in omics-scale quantitative analyses. In addition, the proposed bioinformatic solution can minimize computational variation, thus providing more confident statistical comparisons among biological groups in quantitative metabolomics.
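The idea of choosing between PH and PA using QC samples can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the code below makes two assumptions: the measurement with the lower relative standard deviation (RSD) across replicate QC injections is preferred, and the choice is made once per feature rather than per intensity value. The function names and all numbers are hypothetical, and the sketch should not be read as the authors' implementation.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation (%) of replicate measurements."""
    m = mean(values)
    if m == 0:
        return float("inf")  # undefined RSD; treat as worst case
    return 100.0 * stdev(values) / m


def choose_measurement(qc_heights, qc_areas):
    """Return 'PH' or 'PA', whichever is more reproducible across QC replicates.

    Assumption: lower RSD in QC replicates means lower computational variation.
    """
    return "PH" if rsd(qc_heights) <= rsd(qc_areas) else "PA"


# Hypothetical QC replicate data for two features (illustrative numbers only).
qc = {
    "feature_1": {"PH": [1000, 1020, 990], "PA": [5000, 6200, 4300]},
    "feature_2": {"PH": [800, 1100, 650], "PA": [4000, 4100, 3950]},
}

selection = {
    name: choose_measurement(data["PH"], data["PA"]) for name, data in qc.items()
}
print(selection)
```

In this toy data, peak height is the more reproducible measurement for the first feature and peak area for the second, so a downstream statistical comparison would use a different measurement type for each.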