PLoS One. 2013 Dec 31;8(12):e84028.
doi: 10.1371/journal.pone.0084028. eCollection 2013.

MIDDAS-M: Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters Through the Integration of Genome Sequencing and Transcriptome Data

Free PMC article


Myco Umemura et al. PLoS One.


Many bioactive natural products are produced as "secondary metabolites" by plants, bacteria, and fungi. In the mid-20th century, several fungal secondary metabolites, such as penicillin, lovastatin, and cyclosporine, revolutionized the pharmaceutical industry. Secondary metabolites are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, at present no method exists for detecting functional SMB gene clusters that lack core SMB genes. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1
Figure 1. Principle of the MIDDAS-M algorithm.
(A) Virtual cluster (VC) generation for SMB gene cluster detection. Gene clusters on a genome are evaluated comprehensively by a moving window with a specific cluster size; the cluster size can be changed from 3 to 30 or another appropriate size. (B) Schematic representation of MIDDAS-M. Candidate SMB gene clusters show large deviations from the standard deviation after summing the induction ratios of member genes and statistical enhancement. (C) Flow chart of the MIDDAS-M algorithm.
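The moving-window idea in panels A and B can be illustrated with a minimal sketch. It is not the authors' implementation: the exact M score and the "statistical enhancement" step are not given in this text, so a plain z-score of the window sums stands in for them, and the function names `virtual_cluster_scores` and `best_window` are hypothetical.

```python
from statistics import mean, pstdev


def virtual_cluster_scores(induction_ratios, cluster_size):
    """Sum induction ratios over every window of `cluster_size` adjacent genes,
    then standardize the window sums as z-scores.

    Assumption: the z-score is a simple stand-in for the paper's scoring,
    which is not specified in the abstract or captions.
    """
    n_genes = len(induction_ratios)
    if not 1 <= cluster_size <= n_genes:
        raise ValueError("cluster_size must be between 1 and the number of genes")
    sums = [
        sum(induction_ratios[start:start + cluster_size])
        for start in range(n_genes - cluster_size + 1)
    ]
    mu = mean(sums)
    sigma = pstdev(sums)
    if sigma == 0:
        # All windows are identical, so no window deviates from the rest.
        return [0.0] * len(sums)
    return [(s - mu) / sigma for s in sums]


def best_window(induction_ratios, sizes=range(3, 31)):
    """Scan cluster sizes (3 to 30 by default, as in the caption) and return
    (start_index, cluster_size, z_score) for the most deviant window."""
    best = None
    for size in sizes:
        if size > len(induction_ratios):
            break
        for start, z in enumerate(virtual_cluster_scores(induction_ratios, size)):
            if best is None or abs(z) > abs(best[2]):
                best = (start, size, z)
    return best


# Toy genome: 20 genes, three adjacent genes strongly induced together.
ratios = [0.0] * 20
ratios[8:11] = [5.0, 5.0, 5.0]
start, size, z = best_window(ratios)
print(start, size, round(z, 2))
```

In this toy input the window that exactly covers the three co-induced genes deviates most from the background, which mirrors how a candidate cluster would stand out in panel B.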
Figure 2
Figure 2. Behavior and performance of MIDDAS-M in A. oryzae.
(A) Histograms of M scores at ncl = 1, 3, 5, 7, and 10 in the transcriptomes at 7 vs. 4 days of cultivation in kojic acid (KA)-production medium. The symmetry broke at a cluster size of 3 because of the emergence of large M scores due to the induction of the KA cluster genes. Arrows at the termini of the x-axis indicate the smallest and the largest values. (B) Emergence of an ωmax peak by MIDDAS-M from the raw induction ratio. The x-axis designates the relative position of the genes on the A. oryzae RIB40 genome when the eight chromosomes are concatenated into one. The y-axis scales are the same for all three datasets in the same row. The ωmax peak indicated by the red arrow corresponds exactly to the three genes responsible for KA production.
Figure 3
Figure 3. Clear detection of known SMB gene clusters in F. verticillioides by MIDDAS-M.
(A) Expression levels of each gene on the F. verticillioides genome in 4 samples of a transcriptome time series at 24, 48, 72, and 96 h in liquid fumonisin-inducing media. The highest of the 4 expression levels was plotted for each gene. (B) Absolute maximum cluster scores (|ωmax|) from the comprehensive pair-wise calculation (4C2) for each gene, detected from the same transcriptome data as in A. The step line plot in gray denotes the individual chromosomes. The peaks designated a through e correspond to the 5 experimentally validated SMB clusters: a, fumonisin; b, perithecium pigment; c, fusaric acid; d, bikaverin; e, fusarin. Two peaks that do not correspond to any known gene cluster were designated y1 and y2.
Figure 4
Figure 4. SMB gene cluster detection by MIDDAS-M in A. flavus.
(A) A 3D view of the ωmax scores for all genes and combinations of culture conditions. Comprehensive detection of SMB gene clusters was performed on all 378 pairwise combinations of culture conditions from 28 transcriptomes. The gray and green areas denote blocks of synteny and non-synteny, respectively, with the A. nidulans genome. The positions of gene clusters possessing PKS and NRPS core genes predicted by SMURF are shown in orange and blue, respectively. The chemical structures of four A. flavus secondary metabolites are shown at the positions of corresponding SMB gene clusters; the ustiloxin B gene cluster was first identified in this paper. (B) Magnified view of the area on chromosome 2 corresponding to the black square in A. As an example, a yellow circle designates the peak observed specifically at particular positions, from which conditions for producing the corresponding compound were determined.
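The pairwise counts quoted in the captions (4C2 for the four F. verticillioides time points, 378 combinations from 28 A. flavus transcriptomes) follow from the binomial coefficient, as the sketch below checks. The helper `pairwise_induction_ratios`, the log2 form of the ratio, and the pseudocount are assumptions made for illustration; the captions do not state how the induction ratio is computed.

```python
from itertools import combinations
from math import comb, log2

# Number of unordered condition pairs: C(n, 2) = n * (n - 1) / 2.
print(comb(4, 2))   # four time points (Figure 3)
print(comb(28, 2))  # 28 transcriptomes (Figure 4)


def pairwise_induction_ratios(expression, pseudocount=1.0):
    """Return {(cond_a, cond_b): [log2 ratio per gene]} for every pair of conditions.

    `expression` maps a condition name to a list of per-gene expression levels.
    Assumption: log2((b + pseudocount) / (a + pseudocount)) is used here as a
    generic induction ratio; it is not taken from the paper.
    """
    ratios = {}
    for cond_a, cond_b in combinations(expression, 2):
        ratios[(cond_a, cond_b)] = [
            log2((b + pseudocount) / (a + pseudocount))
            for a, b in zip(expression[cond_a], expression[cond_b])
        ]
    return ratios


# Toy data: two genes measured at the four time points named in Figure 3.
toy_expression = {
    "24h": [1.0, 3.0],
    "48h": [3.0, 3.0],
    "72h": [7.0, 3.0],
    "96h": [7.0, 1.0],
}
pairs = pairwise_induction_ratios(toy_expression)
print(len(pairs))
print(pairs[("24h", "48h")])
```

Each pair of conditions yields one genome-wide profile of induction ratios, so a 28-transcriptome dataset gives 378 profiles to scan for cluster peaks.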
Figure 5
Figure 5. Frequency of SMB-related genes in clusters detected by MIDDAS-M.
(A) Ratios of SMB-related genes (Q-genes) detected by KOG analysis among the cluster genes detected by MIDDAS-M (hatched bars) and among all the genes in the corresponding genome (gray bars). (B) The proportions of clusters containing genes annotated as P450 enzymes (pink), C6 transcription factors (blue), and major facilitator superfamily members (green) were calculated for the clusters detected at each ωmax threshold score in A. flavus. The values are plotted up to an ωmax of 18,350, at which 10 clusters remain to be detected.
Figure 6
Figure 6. Identification of the ustiloxin B cluster in A. flavus based on the MIDDAS-M prediction.
(A) MIDDAS-M results from a combination of culture conditions in maize at 28°C versus 37°C. The leftmost distinct peak corresponds to the aflatoxin gene cluster. The other two peaks were designated as clusters a and b. The step line plot in gray denotes the chromosomes. (B) Peaks at a retention time of 8.9 min detected in the extracted ion chromatograms of m/z 644.2±0.1 in negative ion mode were not observed in the A. flavus deletion mutants of the genes in cluster a (red). Chromatograms are for medium only (blue, negative control), the control strain (pyrG revertant, black), the aflatoxin cluster deletion mutant, and three mutants with deletions in cluster b (gray). (C) The mass spectra of the peaks at a retention time of 8.9 min in the control strain (above) and the deletion mutant ΔAF_a (below). The MS peak of m/z 644.2 in the control strain was not present in the deletion mutant. (D) Comparison of the mass spectra for ustiloxin B and the compound with m/z 644.2 (in negative ion mode) isolated from the control strain. (E) Comparison of the chromatograms of the ustiloxin B reference standard and the compound isolated in this study. The extracted ion chromatogram of m/z 644.23 in negative ion mode and UV chromatograms at 290, 254, and 220 nm are indicated.



    1. Yu J, Bhatnagar D, Ehrlich KC (2002) Aflatoxin biosynthesis. Rev Iberoam Micol 19: 191–200.
    2. Rank C, Larsen TO, Frisvad JC (2010) Functional systems biology of Aspergillus. In: Machida M, Gomi K, editors. Aspergillus Molecular Biology and Genomics. Norfolk, UK: Caister Academic Press. pp. 173–198.
    3. Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, et al. (2010) SMURF: Genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol 47: 736–741.
    4. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, et al. (2011) antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res 39: W339–346.
    5. Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, et al. (2013) antiSMASH 2.0 − a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res 41: W204–W212.

Grant support

This work was partly supported by the commission for Development of Artificial Gene Synthesis Technology for Creating Innovative Biomaterial from the Ministry of Economy, Trade and Industry (METI), Japan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.