MIDDAS-M: motif-independent de novo detection of secondary metabolite gene clusters through the integration of genome sequencing and transcriptome data

PLoS One. 2013 Dec 31;8(12):e84028. doi: 10.1371/journal.pone.0084028. eCollection 2013.

Abstract

Many bioactive natural products are produced as "secondary metabolites" by plants, bacteria, and fungi. During the middle of the 20th century, several secondary metabolites from fungi, such as penicillin, lovastatin, and cyclosporine, revolutionized the pharmaceutical industry. They are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no method currently exists for detecting functional SMB gene clusters that lack core SMB genes. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes.
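The general idea described in the abstract, generating virtual clusters of neighboring genes and scoring their coordinated expression, can be illustrated with a minimal Python sketch. This is not the published MIDDAS-M scoring procedure: the z-score normalization, the size-normalized window sum, the window sizes, and all function names are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit variance (assumed normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def score_virtual_clusters(log_ratios, min_size=3, max_size=5):
    """Score every window of consecutive genes on one contig.

    log_ratios: per-gene log induction ratios, listed in chromosomal order.
    Returns (score, start_index, size) tuples, highest score first.
    """
    z = zscores(log_ratios)
    results = []
    for size in range(min_size, max_size + 1):
        for start in range(0, len(z) - size + 1):
            window = z[start:start + size]
            # Hypothetical score: summed z-scores, divided by sqrt(size)
            # so that windows of different sizes are comparable.
            score = sum(window) / size ** 0.5
            results.append((score, start, size))
    results.sort(reverse=True)
    return results


def best_cluster(log_ratios, min_size=3, max_size=5):
    """Return the top-scoring virtual cluster, or None if no window fits."""
    ranked = score_virtual_clusters(log_ratios, min_size, max_size)
    return ranked[0] if ranked else None


if __name__ == "__main__":
    # Toy data (invented): genes 3 to 6 are co-induced, the rest are not.
    ratios = [0.1, -0.2, 0.0, 3.0, 3.2, 2.9, 3.1, 0.1, -0.1, 0.2, 0.0, -0.3]
    score, start, size = best_cluster(ratios)
    print(f"best window: genes {start}..{start + size - 1}, score {score:.2f}")
```

On the toy input, the top-ranked window is the block of four co-induced genes. A realistic analysis would also need to handle contig boundaries and assess the statistical significance of the scores, both of which this sketch omits.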

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers / metabolism
  • Fungi / genetics*
  • Gene Expression Profiling*
  • Genome, Fungal*
  • Multigene Family*
  • Nucleotide Motifs / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Peptide Synthases / genetics
  • Polyketide Synthases / genetics
  • RNA, Messenger / genetics
  • Real-Time Polymerase Chain Reaction
  • Reverse Transcriptase Polymerase Chain Reaction
  • Software*

Substances

  • Biomarkers
  • RNA, Messenger
  • Polyketide Synthases
  • Peptide Synthases
  • non-ribosomal peptide synthase

Grant support

This work was partly supported by the commission for Development of Artificial Gene Synthesis Technology for Creating Innovative Biomaterial from the Ministry of Economy, Trade and Industry (METI), Japan (http://www.meti.go.jp/information/data/c120522aj.html). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.