Comprehensive analysis of distinctive polyketide and nonribosomal peptide structural motifs encoded in microbial genomes

J Mol Biol. 2007 May 18;368(5):1500-17. doi: 10.1016/j.jmb.2007.02.099. Epub 2007 Mar 14.


We developed a highly accurate method to predict polyketide (PK) and nonribosomal peptide (NRP) structures encoded in microbial genomes. PKs/NRPs are polymers of carbonyl/peptidyl chains synthesized by polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). We analyzed domain sequences corresponding to specific substrates and physical interactions between PKSs/NRPSs in order to predict which substrates (carbonyl/peptidyl units) are selected and assembled into highly ordered chemical structures. The predicted PKs/NRPs were represented as sequences of carbonyl/peptidyl units so that structural motifs could be extracted efficiently. We applied our method to 4529 PKSs/NRPSs and found 619 PKs/NRPs. We also collected 1449 PKs/NRPs whose chemical structures have been determined experimentally. The structural sequences were compared using the Smith-Waterman algorithm and grouped into 271 clusters. From the compound clusters, we extracted 33 structural motifs that are significantly related to their bioactivities. We used the structural motifs to infer the functions of 13 novel PK/NRP clusters produced by Pseudomonas spp. and Burkholderia spp. and found a putative virulence factor. The integrative analysis of genomic and chemical information given here will provide a strategy to predict the chemical structures, biosynthetic pathways, and biological activities of PKs/NRPs, which is useful for the rational design of novel PKs/NRPs.
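
The comparison step can be illustrated with a minimal sketch of Smith-Waterman local alignment applied to sequences of building-block units rather than to residues. The abstract states only that the Smith-Waterman algorithm was used; the scoring values (match, mismatch, linear gap penalty), the function name, and the unit tokens below are assumptions chosen for illustration and are not taken from the paper.

```python
from typing import Sequence


def smith_waterman_score(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,      # assumed score for identical units
    mismatch: int = -1,  # assumed penalty for differing units
    gap: int = -1,       # assumed linear gap penalty
) -> int:
    """Return the best local alignment score between two unit sequences."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical unit sequences; the token names are illustrative only.
pk_a = ["mal", "mmal", "mal", "mal", "mmal"]
pk_b = ["emal", "mal", "mmal", "mal", "gly"]
score = smith_waterman_score(pk_a, pk_b)
```

Pairwise scores of this kind could then feed a clustering step. Treating each unit as a single token keeps the alignment independent of how units are named or abbreviated.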

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / chemistry*
  • Cluster Analysis
  • Genome
  • Macrolides / chemistry*
  • Molecular Sequence Data
  • Molecular Structure
  • Peptides / chemistry*
  • Peptides / genetics
  • Substrate Specificity

Substances

  • Bacterial Proteins
  • Macrolides
  • Peptides