Common data models to streamline metabolomics processing and annotation, and implementation in a Python pipeline

Joshua M Mitchell; Yuanye Chi; Maheshwor Thapa; Zhiqiang Pang; Jianguo Xia; Shuzhao Li

doi:10.1101/2024.02.13.580048

bioRxiv [Preprint]. 2024 Feb 14:2024.02.13.580048. doi: 10.1101/2024.02.13.580048.

Authors

Joshua M Mitchell¹, Yuanye Chi¹, Maheshwor Thapa¹, Zhiqiang Pang², Jianguo Xia², Shuzhao Li¹,³

Affiliations

¹ The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA.
² Institute of Parasitology, McGill University, Montreal, Quebec, Canada.
³ University of Connecticut School of Medicine, Farmington, CT 06032, USA.

Abstract

To standardize metabolomics data analysis and facilitate future computational developments, it is essential to have a set of well-defined templates for common data structures. Here we describe a collection of data structures involved in metabolomics data processing and illustrate how they are utilized in a full-featured Python-centric pipeline. We demonstrate the performance of the pipeline, and detail its annotation and quality control steps, using large-scale LC-MS metabolomics and lipidomics data and LC-MS/MS data. Multiple previously published datasets are also reanalyzed to showcase the pipeline's utility in biological data analysis. This pipeline allows users to streamline data processing, quality control, annotation, and standardization in an efficient and transparent manner. This work fills a major gap in the Python ecosystem for computational metabolomics.

Publication types

Preprint
