Proc Natl Acad Sci U S A. 2016 Oct 18;113(42):E6343-E6351.
doi: 10.1073/pnas.1609014113. Epub 2016 Oct 3.

Genomic Charting of Ribosomally Synthesized Natural Product Chemical Space Facilitates Targeted Mining

Free PMC article

Michael A Skinnider et al. Proc Natl Acad Sci U S A.

Abstract

Microbial natural products are an evolved resource of bioactive small molecules, which form the foundation of many modern therapeutic regimens. Ribosomally synthesized and posttranslationally modified peptides (RiPPs) represent a class of natural products that has attracted extensive interest for its diverse chemical structures and potent biological activities. Genome sequencing has revealed that the vast majority of genetically encoded natural products remain unknown. Many bioinformatic resources have therefore been developed to predict the chemical structures of natural products, particularly nonribosomal peptides and polyketides, from sequence data. However, the diversity and complexity of RiPPs have hindered their systematic investigation, and consequently the vast majority of genetically encoded RiPPs remain chemical "dark matter." Here, we introduce an algorithm to catalog RiPP biosynthetic gene clusters and chart genetically encoded RiPP chemical space. A global analysis of 65,421 prokaryotic genomes revealed 30,261 RiPP clusters, encoding 2,231 unique products. We further leverage the structure predictions generated by our algorithm to facilitate the genome-guided discovery of a molecule from a rare family of RiPPs. Our results provide a systematic investigation of RiPP genetic and chemical space, revealing the widespread distribution of RiPP biosynthesis throughout the prokaryotic tree of life, and provide a platform for the targeted discovery of RiPPs based on genome sequencing.

Keywords: chemical space; cheminformatics; genome mining; natural product discovery; ribosomally synthesized natural product.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic overview of a genomic structure prediction algorithm for ribosomally synthesized and posttranslationally modified natural products. A library of 154 hidden Markov models and a set of heuristics for precursor peptides enable the identification and clustering of biosynthetic genes. A library of 53 motifs is used to predict precursor peptide N- and/or C-terminal cleavage. Finally, a set of 94 virtual tailoring reactions are executed based on identified biosynthetic information to generate a combinatorial library of predicted structures. The exact masses of predicted structures can subsequently be searched within a high-resolution LC/MS chromatogram.
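The last step in this caption, searching the exact masses of predicted structures in a high-resolution LC/MS run, can be illustrated with a minimal sketch. It is not the RiPP-PRISM implementation: the function names, the restriction to a handful of residues, the [M+H]+ adduct, the 10 ppm tolerance, and the peak list are all assumptions made for illustration. Only the monoisotopic masses are standard values.

```python
# Monoisotopic residue masses in Da (standard values) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "C": 103.00919,
    "T": 101.04768, "V": 99.06841, "L": 113.08406, "I": 113.08406,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton, used for the [M+H]+ adduct


def peptide_mass(seq, dehydrations=0):
    """Monoisotopic mass of a linear peptide.

    `dehydrations` is a hypothetical stand-in for a tailoring reaction:
    each one removes a single water (e.g. Ser -> dehydroalanine).
    """
    return sum(RESIDUE_MASS[r] for r in seq) + WATER - dehydrations * WATER


def match_peaks(mass, observed_mz, tol_ppm=10.0):
    """Return observed m/z values within tol_ppm of the [M+H]+ ion of `mass`."""
    target = mass + PROTON
    return [mz for mz in observed_mz
            if abs(mz - target) / target * 1e6 <= tol_ppm]


# Hypothetical usage: a dipeptide and an invented peak list.
hits = match_peaks(peptide_mass("GA"), [147.0764, 150.0000])
```

A real search would cover every structure in the predicted library and would also consider other adducts and charge states.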
Fig. 2.
Validation of RiPP-PRISM predictive accuracy. (A) Difference between true and predicted N- and C-terminal leader and follower peptide cleavage sites. (B) Average median Tanimoto coefficient between predicted structure libraries and true RiPP structures for 21 families of RiPPs. Error bars show SD.
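The Tanimoto coefficient used in panel B measures the overlap between two structural fingerprints: the number of shared features divided by the number of features present in either. The sketch below shows one plausible reading of "average median Tanimoto coefficient", assuming each structure is represented as a set of feature identifiers. The function names and the order of aggregation (median over a predicted library, then mean over true structures) are assumptions; the caption does not specify the fingerprint type.

```python
from statistics import mean, median


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / len(union) if union else 1.0


def median_tc(true_fp, predicted_fps):
    """Median similarity of one true structure to its predicted library."""
    return median(tanimoto(true_fp, p) for p in predicted_fps)


def average_median_tc(pairs):
    """Mean of the per-structure medians over (true, predicted library) pairs."""
    return mean(median_tc(true_fp, preds) for true_fp, preds in pairs)
```

For example, fingerprints `{1, 2, 3}` and `{2, 3, 4}` share two of four distinct features, giving a coefficient of 0.5.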
Fig. 3.
Genome mining for RiPP biosynthetic gene clusters and their unique products. (A) Biosynthetic gene clusters identified in a sample of 65,421 prokaryotic genomes, organized by RiPP family (most abundant first) and taxonomic class of producer organism. “Other” includes all classes with fewer than 100 RiPP clusters. (B) Unique products identified by Tanimoto coefficient matrix analysis, organized by RiPP family (most abundant first).
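One simple way to count unique products from pairwise Tanimoto coefficients, as in panel B, is to keep a structure only if it is not identical to any structure already kept. The sketch below is a hypothetical illustration of that idea, not the procedure used in the paper: the greedy strategy, the set-based fingerprints, and the default threshold of 1.0 (exact fingerprint identity) are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def unique_products(fingerprints, threshold=1.0):
    """Greedily pick representatives whose similarity to every
    previously kept representative is below `threshold`."""
    representatives = []
    for fp in fingerprints:
        if all(tanimoto(fp, rep) < threshold for rep in representatives):
            representatives.append(fp)
    return representatives


# Hypothetical usage: three predicted products, two of them identical.
products = unique_products([{1, 2}, {1, 2}, {1, 3}])
```

Lowering the threshold merges near-identical structures as well, at the cost of making the result depend on input order.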
Fig. 4.
Charting the chemical space of known and genetically encoded RiPPs. (A) Principal component analysis plot of 509 known ribosomal products, organized into 18 families, with node size corresponding to number of known RiPPs and node color corresponding to within-family chemical diversity (average median Tanimoto coefficient). (B) Principal component analysis plot for genetically encoded RiPPs, with node size corresponding to number of unique predicted RiPPs and node color corresponding to average median Tanimoto coefficient across all identified clusters.
Fig. 5.
Genome-guided isolation of a member of the rare YM-216391 family of RiPPs. Hypothetical structures for a S. aurantiacus RiPP were generated by RiPP-PRISM and were used to search LC-MS/MS chromatograms. A single peak corresponding to predicted structure no. 10 of 15 was identified, and its MS/MS data supported this candidate structure as the molecule, aurantizolicin.
