Bioinformatics tools for the identification of gene clusters that biosynthesize specialized metabolites

Brief Bioinform. 2018 Sep 28;19(5):1022-1034. doi: 10.1093/bib/bbx020.


Specialized metabolites (also called natural products or secondary metabolites) derived from bacteria, fungi, marine organisms and plants constitute an important source of antibiotics, anti-cancer agents, insecticides, immunosuppressants and herbicides. Many specialized metabolites in bacteria and fungi are biosynthesized via metabolic pathways whose enzymes are encoded by genes clustered on a chromosome. A metabolic gene cluster comprises a group of physically co-localized genes that together encode the enzymes for the biosynthesis of a specific metabolite. Although metabolic gene clusters are not generally known to occur outside of microbes, several plant metabolic gene clusters have been discovered in recent years. The discovery of novel metabolic pathways is being enabled by the increasing availability of high-quality genome sequencing coupled with the development of powerful computational toolkits to identify metabolic gene clusters. To provide a comprehensive overview of the bioinformatics methods for detecting gene clusters, we compare and contrast key aspects of the algorithmic logic behind several computational tools, including 'NP.searcher', 'ClustScan', 'CLUSEAN', 'antiSMASH', 'SMURF', 'MIDDAS-M', 'ClusterFinder', 'CASSIS/SMIPS' and 'C-Hunter'. We also review additional tools, such as 'NRPSpredictor' and 'SBSPKS', that can infer substrate specificity for previously identified gene clusters. The continual development of bioinformatics methods to predict gene clusters will help shed light on how organisms assemble multi-step metabolic pathways for adaptation to various ecological niches.
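
As an illustration of the definition above (physically co-localized genes that together encode biosynthetic enzymes), the following is a minimal sketch of a distance-based grouping heuristic. It is not the algorithm of any tool named in the abstract; the `Gene` record, the `find_clusters` function, the `max_gap` and `min_genes` thresholds and the example coordinates are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record: coordinates plus a biosynthetic-enzyme flag."""
    name: str
    start: int
    end: int
    biosynthetic: bool


def find_clusters(genes: List[Gene], max_gap: int = 10_000,
                  min_genes: int = 3) -> List[List[Gene]]:
    """Group flagged genes that lie within max_gap bp of the previous flagged gene.

    Both thresholds are illustrative assumptions, not published values.
    """
    flagged = sorted((g for g in genes if g.biosynthetic), key=lambda g: g.start)
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in flagged:
        # Close the current group when the gap to the next flagged gene is too large.
        if current and gene.start - current[-1].end > max_gap:
            if len(current) >= min_genes:
                clusters.append(current)
            current = []
        current.append(gene)
    # Keep the final group only if it is large enough.
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


# Made-up coordinates on a single chromosome.
GENES = [
    Gene("geneA", 1_000, 2_000, True),
    Gene("geneB", 2_500, 4_000, True),
    Gene("geneC", 4_200, 5_000, False),
    Gene("geneD", 5_200, 7_000, True),
    Gene("geneE", 60_000, 61_000, True),
    Gene("geneF", 62_000, 63_000, True),
]

if __name__ == "__main__":
    for cluster in find_clusters(GENES):
        print([g.name for g in cluster])
```

With these made-up coordinates, geneA, geneB and geneD form one candidate cluster, while geneE and geneF lie close together but fall below the assumed minimum of three genes. Simple distance thresholds of this kind only capture the co-localization part of the definition; deciding which genes count as biosynthetic is a separate problem.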

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Algorithms
  • Animals
  • Bacteria / genetics
  • Bacteria / metabolism
  • Biosynthetic Pathways / genetics*
  • Computational Biology / methods*
  • Fungi / genetics
  • Fungi / metabolism
  • Humans
  • Models, Genetic
  • Multigene Family*
  • Plants / genetics
  • Plants / metabolism
  • Software