J Cheminform. 2017 Aug 16;9(1):46.
doi: 10.1186/s13321-017-0234-y.

Comparative Analysis of Chemical Similarity Methods for Modular Natural Products With a Hypothetical Structure Enumeration Algorithm

Free PMC article

Michael A Skinnider et al. J Cheminform. 2017.

Abstract

Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.

Keywords: Chemical fingerprints; Chemical similarity; Chemical structure enumeration; Natural products.

Figures

Fig. 1
Application of an algorithm for hypothetical modular natural product structure enumeration to comparative analysis of chemical similarity methods. A short, linear biomimetic polymer is generated by LEMONS, and one or more monomers are substituted. Tailoring reactions may be executed on the original and/or modified polymers. The modified polymer is compared to the entire original library with one of 18 chemical similarity algorithms. A correct match is scored if the modified structure displays greater chemical similarity to the original structure than to any of the other structures within the library. The process is repeated for each original polymer and the fraction of correct matches is calculated. Each experiment is repeated 100 times
Fig. 2
Examples of original and modified structures generated by LEMONS. (a) A linear hybrid nonribosomal peptide/polyketide natural product containing an alicyclic starter unit is derivatized by substitution of an amino acid, the starter unit, and a polyketide monomer, in addition to chlorination and heterocyclization/oxidation to form a thiazole. (b) A macrocyclic polyketide is derivatized by substitution of two polyketide monomers and glycosylation by the deoxysugar actinosamine
Fig. 3
Chemical similarity method performance on hypothetical libraries of linear peptides. (a) Percentage of correct matches after substitution of a single proteinogenic amino acid in a library of hypothetical linear oligopeptides. (b) Trends in percentage of correct matches with substitution of one to five proteinogenic amino acids
Fig. 4
Chemical similarity method performance on hypothetical libraries of linear products. Trends in percentage of correct matches with substitution of one to five monomers within a hypothetical nonribosomal peptide (a), polyketide (b), or hybrid natural product (c)
Fig. 5
Chemical similarity method performance on hypothetical libraries of linear natural products as a function of natural product size. Trends in percentage of correct matches with substitution of three monomers within hypothetical linear nonribosomal peptides (a), polyketides (b), hybrid natural products (c), or hybrids with starter units (d) containing five to fourteen monomers
Fig. 6
Chemical similarity method performance on hypothetical libraries of cyclic natural products. Trends in percentage of correct matches with substitution of one to five monomers or the site of macrocyclization within a hypothetical cyclic nonribosomal peptide (a), polyketide (b), or hybrid natural product (c)
Fig. 7
Chemical similarity method performance on hypothetical libraries of glycosylated natural products. Trends in percentage of correct matches with addition of one to three hexose or deoxysugars, or substitution of one to three glycosylation sites, within hypothetical linear or cyclic nonribosomal peptides (a), polyketides (b), or hybrid natural products (c)
Fig. 8
Chemical similarity method performance on hypothetical libraries of complex hybrid natural products. Percentage of correct matches after substitution of two monomers in hypothetical libraries of complex hybrid natural products with variable macrocyclization, glycosylation, heterocyclization, halogenation, and N-methylation
Fig. 9
Effect of random bond addition on chemical similarity method performance. Trend in percentage of correct matches after substitution of a single monomer in hypothetical libraries of linear hybrid natural products with one to eight random bonds
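
The benchmarking protocol summarized in the Fig. 1 caption (enumerate a library, substitute monomers, then check whether each modified structure is most similar to its own original) can be illustrated with a minimal Python sketch. This is not the LEMONS implementation. Polymers are represented here as tuples of monomer labels rather than chemical structures, the `features` function is a toy stand-in for a real chemical fingerprint, and all function names and parameters are hypothetical choices made for illustration. Only the Tanimoto coefficient and the strict "greater than any other structure" scoring rule follow the text above.

```python
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def features(polymer):
    """Toy fingerprint (assumption): single monomers plus adjacent monomer pairs."""
    singles = {(m,) for m in polymer}
    pairs = {(polymer[i], polymer[i + 1]) for i in range(len(polymer) - 1)}
    return singles | pairs


def make_library(n_polymers, length, monomers, rng):
    """Enumerate random linear polymers as tuples of monomer labels."""
    return [tuple(rng.choice(monomers) for _ in range(length))
            for _ in range(n_polymers)]


def substitute(polymer, n_subs, monomers, rng):
    """Replace n_subs distinct positions with a different monomer."""
    modified = list(polymer)
    for pos in rng.sample(range(len(polymer)), n_subs):
        choices = [m for m in monomers if m != modified[pos]]
        modified[pos] = rng.choice(choices)
    return tuple(modified)


def fraction_correct(library, n_subs, monomers, rng, similarity=tanimoto):
    """Fraction of modified polymers whose best match is their own original."""
    library_features = [features(p) for p in library]
    correct = 0
    for idx, original in enumerate(library):
        query = features(substitute(original, n_subs, monomers, rng))
        scores = [similarity(query, f) for f in library_features]
        # Correct only if strictly more similar to the original than to
        # every other structure in the library.
        if all(scores[idx] > s for j, s in enumerate(scores) if j != idx):
            correct += 1
    return correct / len(library)


if __name__ == "__main__":
    rng = random.Random(0)
    monomers = list("ABCDEFGHIJ")  # hypothetical monomer alphabet
    # Repeat the experiment a few times (the caption reports 100 repeats).
    results = []
    for _ in range(5):
        library = make_library(50, 8, monomers, rng)
        results.append(fraction_correct(library, 2, monomers, rng))
    print(sum(results) / len(results))
```

Passing a different `similarity` callable makes it possible to compare methods under the same substitution settings, which mirrors the structure of the comparison described in the caption.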
