SeMPI: a genome-based secondary metabolite prediction and identification web server

Nucleic Acids Res. 2017 Jul 3;45(W1):W64-W71. doi: 10.1093/nar/gkx289.


The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The steadily growing volume of published genomic data makes it possible to identify biosynthetic gene clusters efficiently by genome mining. Conversely, for many natural products with resolved structures, the encoding gene clusters have not yet been identified. Although genome mining tools have become significantly more efficient at identifying biosynthetic gene clusters, structural elucidation of the actual secondary metabolite remains challenging, especially because post-modifications are as yet unpredictable. Here, we introduce SeMPI, a web server providing a prediction and identification pipeline for natural products synthesized by modular type I polyketide synthases (PKS). To limit the possible structures of PKS products and to include putative tailoring reactions, the pipeline incorporates a structural comparison with annotated natural products. Furthermore, a benchmark was designed based on 40 gene clusters with annotated PKS products. The web server of the pipeline (SeMPI) is freely available at:
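The structural comparison with annotated natural products mentioned above can be pictured as a similarity ranking. The following is a minimal sketch only, not SeMPI's actual method: the abstract does not state which similarity measure or structure representation the pipeline uses. The sketch assumes that each structure is reduced to a set of feature labels and that candidates are ranked by the Tanimoto coefficient; the function names, feature labels and compound names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets.

    Returns 0.0 when both sets are empty (a convention chosen for this sketch).
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(
    predicted: FrozenSet[str],
    annotated: Dict[str, FrozenSet[str]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank annotated structures by similarity to a predicted scaffold.

    Ties are broken alphabetically by name so the output is deterministic.
    """
    scored = [(name, tanimoto(predicted, feats)) for name, feats in annotated.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy data (hypothetical): features of a predicted PKS scaffold and of
# three made-up annotated natural products.
predicted_scaffold = frozenset({"hydroxyl", "methyl_branch", "ketone", "double_bond"})
annotated_products = {
    "compound_A": frozenset({"hydroxyl", "methyl_branch", "ketone", "glycosyl"}),
    "compound_B": frozenset({"hydroxyl", "double_bond"}),
    "compound_C": frozenset({"epoxide"}),
}

ranking = rank_candidates(predicted_scaffold, annotated_products)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the best-scoring annotated product shares most features with the predicted scaffold but carries one extra feature (`glycosyl`). Under the assumptions stated above, this is the kind of difference a comparison step could flag as a putative tailoring reaction.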

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biological Products / chemistry*
  • Biological Products / metabolism
  • Genome
  • Genomics
  • Internet
  • Polyketide Synthases / metabolism
  • Secondary Metabolism / genetics*
  • Software*


Substances

  • Biological Products
  • Polyketide Synthases