BMC Bioinformatics. 2014 Jul 5;15:234.
doi: 10.1186/1471-2105-15-234.

Building Blocks for Automated Elucidation of Metabolites: Natural Product-Likeness for Candidate Ranking

Free PMC article

Kalai Vanii Jayaseelan et al. BMC Bioinformatics. 2014.

Abstract

Background: In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass spectrum is usually not sufficient to establish the identity of a hitherto unknown compound. When a suite of spectra from 1D and 2D NMR experiments, supplemented with a molecular formula, is available, the successful elucidation of the chemical structure for candidates with up to 30 heavy atoms has been reported previously by one of the authors. In high-throughput metabolomics, usually only 1D NMR or mass spectrometry experiments are conducted for rapid analysis of samples. This approach requires that the spectral patterns be analyzed automatically to quickly identify known and unknown structures. In this study, we investigated whether additional existing knowledge, such as the fact that the unknown compound is a natural product, can be used to improve the ranking of the correct structure in the result list after the structure elucidation process.

Results: To identify unknowns using as little spectroscopic information as possible, we implemented an evolutionary algorithm-based CASE mechanism that elucidates candidates in a fully automated fashion from the molecular formula and 13C NMR spectrum of the isolated compound. We also tested how filters such as natural product-likeness, a measure of the similarity of a compound to known natural product space, might enhance the performance and quality of the structure elucidation. The evolutionary algorithm is implemented within the previously reported SENECA package for CASE and is available for free download under the Artistic License at http://sourceforge.net/projects/seneca/. The natural product-likeness calculator is incorporated as a plugin within SENECA and is available as a GUI client and as a command-line executable. Significant improvements in candidate ranking were demonstrated for 41 small test molecules when the CASE system was supplemented by a natural product-likeness filter.

Conclusions: In spectroscopically underdetermined structure elucidation problems, natural product-likeness can contribute to a better ranking of the correct structure in the results list.
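
The ranking idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` fields, the weighted-sum combination, the `np_weight` value, and the three toy candidates with their scores are hypothetical and are not taken from SENECA or from the study's data. The sketch only shows how a secondary score can reorder candidates whose spectral scores are nearly tied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    nmr_score: float  # hypothetical: agreement with the 13C spectrum, scaled to [0, 1]
    np_score: float   # hypothetical: NP-likeness rescaled to [0, 1]


def rank(candidates, np_weight=0.0):
    """Return candidates best-first by nmr_score + np_weight * np_score.

    The weighted sum is an assumed combination rule, not the one used in SENECA.
    """
    return sorted(
        candidates,
        key=lambda c: c.nmr_score + np_weight * c.np_score,
        reverse=True,
    )


def rank_of(name, ranked):
    """1-based position of the named candidate in a ranked list."""
    return 1 + [c.name for c in ranked].index(name)


# Invented toy values: spectral scores are nearly tied, NP-likeness differs.
candidates = [
    Candidate("decoy_a", nmr_score=0.92, np_score=0.10),
    Candidate("decoy_b", nmr_score=0.91, np_score=0.30),
    Candidate("correct", nmr_score=0.90, np_score=0.85),
]

nmr_only = rank(candidates)                 # spectral score alone
nmr_np = rank(candidates, np_weight=0.5)    # spectral score plus NP-likeness

print(rank_of("correct", nmr_only), rank_of("correct", nmr_np))
```

With these invented scores the spectral score alone cannot separate the candidates well, so the NP-likeness term decides the order. This is the underdetermined situation the conclusions refer to.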

Figures

Figure 1
Evolutionary algorithm scheme for CASE. Evolution starts with an initial population of 16 individuals, seeded by mutating a single random structure generated from the molecular formula. The population is evaluated against the termination criteria: maximum fitness, maximum allowed runtime, maximum allowed generations, or maximum allowed generations with no improvement. Evolution continues until any one of these conditions is met. In each generation, the population is doubled by mutating every individual, fitness is evaluated, and the fittest individuals are promoted to the next generation by round-robin tournament selection. Once a termination condition is met, the solution set is reported.
Figure 2
Forty-one test structures collected from the Journal of Natural Products. Test molecules with a heavy atom count of ≤15 were collected only from recently published articles. All structures were cross-checked against the NMRShiftDB index to ensure that none was already present in the training index. The NP-Scores for these molecules were calculated using our previously reported NP-likeness calculator and are based on a 3-sphere signature height.
Figure 3
Performance of SENECA using the new fitness evaluators. 100 sequential runs were performed for each of the 41 test cases, i.e. 4100 runs each for the NMR_only and NMR_NP evaluations. The NMR_NP judge predicted the correct solution more often than the NMR_only judge: correct candidates were retrieved in the solution set in 1470/4100 runs with NMR_NP and in 1096/4100 runs with NMR_only.
Figure 4
Number of times the correct candidate was ranked first among the retrieved cases. Correct structures were predicted for 36/41 test cases in total. Of the correctly predicted cases, 9/34 and 17/35 were ranked first using the NMR_only and NMR_NP judges, respectively, showing that the application of NP-likeness frequently improved the rank of the correct solution.
Figure 5
Spread of ranks of all the correct solutions across all test cases and runs. The rank given for the correct structure among the other predicted candidates using the NMR_only and NMR_NP judges is shown. An empty index indicates that there was no successful prediction for that test case in any of the 100 runs.
Figure 6
Distribution of ranks of all the correct solutions across all test cases and runs. The overall distribution of ranks given to the correct structure, shown per test case in Figure 5, is summarised here as a violin plot.
Figure 7
Average rank of all the correct solutions over the 100 runs for each test case. In addition to predicting the correct structure more often (as shown in Figure 3), the application of NP-likeness improved, on average, the rank of the correct structure among the predicted candidates. The average ranks shown here are smoothed using LOESS regression.
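
The evolutionary loop described in the Figure 1 caption can be sketched on a toy problem. This is an illustrative sketch, not the SENECA implementation: individuals here are bit tuples rather than molecular structures, `fitness` and `mutate` are hypothetical stand-ins, the limit values are arbitrary, and the round-robin tournament is assumed to be a full all-against-all comparison in which individuals are ranked by their number of wins. Only the population size of 16, the seeding by mutation of one random individual, the doubling step, and the four termination criteria come from the caption.

```python
import random
import time

POP_SIZE = 16  # population size stated in the Figure 1 caption


def fitness(individual, target):
    """Toy stand-in for a spectral fitness: number of positions matching the target."""
    return sum(a == b for a, b in zip(individual, target))


def mutate(individual, rng):
    """Toy stand-in for a structure mutation: flip one randomly chosen bit."""
    i = rng.randrange(len(individual))
    child = list(individual)
    child[i] = 1 - child[i]
    return tuple(child)


def round_robin_select(pool, scores, k):
    """Assumed tournament: everyone meets everyone; keep the k with the most wins."""
    wins = [sum(s > t for t in scores) for s in scores]
    order = sorted(range(len(pool)), key=lambda i: wins[i], reverse=True)
    return [pool[i] for i in order[:k]]


def evolve(target, seed=0, max_generations=500, max_stagnation=100, max_seconds=5.0):
    rng = random.Random(seed)
    start = time.monotonic()

    # Seed the population by mutating a single random individual.
    founder = tuple(rng.randint(0, 1) for _ in target)
    population = [mutate(founder, rng) for _ in range(POP_SIZE)]
    best = max(fitness(p, target) for p in population)
    generation = stagnation = 0

    # Stop on maximum fitness, generation limit, stagnation limit, or runtime limit.
    while (best < len(target)
           and generation < max_generations
           and stagnation < max_stagnation
           and time.monotonic() - start < max_seconds):
        # Double the population by mutating every individual, then evaluate.
        pool = population + [mutate(p, rng) for p in population]
        scores = [fitness(p, target) for p in pool]
        population = round_robin_select(pool, scores, POP_SIZE)

        new_best = max(scores)
        stagnation = 0 if new_best > best else stagnation + 1
        best = new_best
        generation += 1

    return population, best, generation


target = tuple([1, 0] * 10)
population, best, generations = evolve(target)
print(best, generations)
```

Because parents stay in the pool and selection keeps the individuals with the most wins, the best fitness never decreases between generations in this sketch.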


