Building blocks for automated elucidation of metabolites: natural product-likeness for candidate ranking

Kalai Vanii Jayaseelan; Christoph Steinbeck

doi:10.1186/1471-2105-15-234

BMC Bioinformatics. 2014 Jul 5;15:234. doi: 10.1186/1471-2105-15-234.

Authors

Kalai Vanii Jayaseelan¹, Christoph Steinbeck

Affiliation

¹ Cheminformatics and Metabolism, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. kalai@ebi.ac.uk.

Abstract

Background: In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass spectrum is usually not sufficient to establish the identity of a hitherto unknown compound. When a suite of 1D and 2D NMR spectra is available together with a molecular formula, successful elucidation of the chemical structure for candidates with up to 30 heavy atoms has been reported previously by one of the authors. In high-throughput metabolomics, however, usually only 1D NMR or mass spectrometry experiments are conducted for rapid analysis of samples. Such rapid analysis in turn requires that spectral patterns be analyzed automatically to identify known and unknown structures quickly. In this study, we investigated whether additional existing knowledge, such as the fact that the unknown compound is a natural product, can be used to improve the ranking of the correct structure in the result list after the structure elucidation process.

Results: To identify unknowns using as little spectroscopic information as possible, we implemented an evolutionary algorithm-based CASE mechanism that elucidates candidates in a fully automated fashion, taking the molecular formula and 13C NMR spectrum of the isolated compound as input. We also tested how filters such as natural product-likeness, a measure of how similar a compound is to known natural product space, might enhance the performance and quality of the structure elucidation. The evolutionary algorithm is implemented within the previously reported SENECA package for CASE, which is available for free download under the Artistic License at http://sourceforge.net/projects/seneca/. The natural product-likeness calculator is incorporated as a plugin within SENECA and is available as a GUI client and as a command-line executable. Significant improvements in candidate ranking were demonstrated for 41 small test molecules when the CASE system was supplemented by a natural product-likeness filter.
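
To illustrate the idea of supplementing a spectral ranking with natural product-likeness, the following is a minimal sketch in Python. It is not SENECA's implementation: the `Candidate` class, the weighted-sum combination, the `np_weight` parameter, and all scores are hypothetical and chosen only to show how a candidate that fits the 13C spectrum slightly worse can move up the result list once natural product-likeness is taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate structure with two scores (both scales are assumptions)."""
    name: str              # placeholder label for a candidate structure
    spectral_score: float  # agreement with the 13C spectrum, higher is better
    np_likeness: float     # natural product-likeness, higher is more NP-like


def rank_candidates(candidates, np_weight=0.0):
    """Sort candidates best-first by spectral score plus weighted NP-likeness.

    The weighted sum is an illustrative assumption, not the published scoring.
    With np_weight=0.0 the ranking uses the spectral score alone.
    """
    def combined(c):
        return c.spectral_score + np_weight * c.np_likeness

    return sorted(candidates, key=combined, reverse=True)


def rank_of(ranked, name):
    """Return the 1-based position of the named candidate in a ranked list."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate.name == name:
            return position
    raise ValueError(f"candidate {name!r} not found")


# Made-up scores for three hypothetical candidates; cand_B plays the role
# of the correct structure.
example = [
    Candidate("cand_A", spectral_score=0.92, np_likeness=-1.0),
    Candidate("cand_B", spectral_score=0.90, np_likeness=1.5),
    Candidate("cand_C", spectral_score=0.91, np_likeness=-0.5),
]

if __name__ == "__main__":
    spectral_only = rank_candidates(example, np_weight=0.0)
    with_np = rank_candidates(example, np_weight=0.1)
    print("spectral only:", rank_of(spectral_only, "cand_B"))
    print("with NP-likeness:", rank_of(with_np, "cand_B"))
```

With these invented numbers, `cand_B` is ranked third on spectral score alone and first once the natural product-likeness term is added. A hard cutoff that discards candidates below a likeness threshold would be an equally plausible reading of "filter"; the weighted sum is used here only because it keeps every candidate in the list.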

Conclusions: In spectroscopically underdetermined structure elucidation problems, natural product-likeness can contribute to a better ranking of the correct structure in the results list.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Automation
Biological Products / chemistry*
Biostatistics / methods*
Magnetic Resonance Spectroscopy
Mass Spectrometry
Metabolomics / methods*

Substances

Biological Products

Grants and funding

BB/K004301/1/Biotechnology and Biological Sciences Research Council/United Kingdom