Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors

Proc Natl Acad Sci U S A. 2015 Apr 7;112(14):4221-6. doi: 10.1073/pnas.1501124112. Epub 2015 Mar 23.


In molecular evolutionary analyses, short DNA sequences are used to infer phylogenetic relationships among species. Here we apply this principle to the study of bacterial biosynthesis, enabling the targeted isolation of previously unidentified natural products directly from complex metagenomes. Our approach uses short natural product sequence tags derived from conserved biosynthetic motifs to profile biosynthetic diversity in the environment and then guide the recovery of gene clusters from metagenomic libraries. The methodology is conceptually simple, requires only a small investment in sequencing, and is not computationally demanding. To demonstrate the power of this approach to natural product discovery, we conducted a computational search for epoxyketone proteasome inhibitors within 185 globally distributed soil metagenomes. This led to the identification of 99 unique epoxyketone sequence tags, falling into 6 phylogenetically distinct clades. Complete gene clusters associated with nine unique tags were recovered from four saturating soil metagenomic libraries. Using heterologous expression methodologies, seven potent epoxyketone proteasome inhibitors (clarepoxcins A-E and landepoxcins A and B) were produced from these pathways, including compounds with different warhead structures and a naturally occurring halohydrin prodrug. This study provides a template for the targeted expansion of bacterially derived natural products using the global metagenome.
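
The tag-guided step described in the abstract (profile samples with short sequence tags, then use the tags to decide which libraries to screen) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, sample identifiers, and sequences below are hypothetical placeholders, and the sketch assumes that tag reads have already been matched to a conserved motif upstream.

```python
from collections import defaultdict


def index_tags(hits):
    """Collapse identical tag sequences and record the samples each occurs in.

    `hits` is an iterable of (sample_id, sequence) pairs; both are
    hypothetical inputs standing in for motif-matched amplicon reads.
    """
    samples_by_tag = defaultdict(set)
    for sample_id, sequence in hits:
        # Normalise case so the same sequence is counted once.
        samples_by_tag[sequence.upper()].add(sample_id)
    return dict(samples_by_tag)


def recoverable_tags(samples_by_tag, library_samples):
    """Keep tags seen in at least one sample that has a metagenomic library."""
    library_samples = set(library_samples)
    result = {}
    for tag, samples in samples_by_tag.items():
        shared = samples & library_samples
        if shared:
            result[tag] = sorted(shared)
    return result


# Made-up placeholder data, for illustration only.
hits = [
    ("soil_A", "ACGTAC"),
    ("soil_B", "acgtac"),
    ("soil_B", "TTGACA"),
    ("soil_C", "GGCATT"),
]
tags = index_tags(hits)
targets = recoverable_tags(tags, {"soil_B"})
```

In this toy input, `tags` holds three unique tags, and `targets` keeps only the two that occur in the one sample assumed to have a library. Phylogenetic placement of tags into clades would be a separate step and is not shown.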

Keywords: drug discovery; environmental DNA; nonribosomal peptide; polyketide; proteasome inhibitor.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • DNA / chemistry
  • Drug Design
  • Drug Discovery
  • Genetic Variation
  • Genome
  • Genome, Bacterial
  • Geography
  • Ketones / chemistry*
  • Magnetic Resonance Spectroscopy
  • Metagenome
  • Metagenomics
  • Molecular Sequence Data
  • Multigene Family
  • Peptides / chemistry
  • Phylogeny
  • Polyketides / chemistry
  • Proteasome Endopeptidase Complex / chemistry
  • Proteasome Inhibitors / chemistry*
  • Software
  • Soil Microbiology*


Substances

  • Ketones
  • Peptides
  • Polyketides
  • Proteasome Inhibitors
  • DNA
  • Proteasome Endopeptidase Complex

Associated data

  • BioProject/PRJNA258222
  • GENBANK/KP830089
  • GENBANK/KP830090
  • GENBANK/KP830091
  • GENBANK/KP830092
  • GENBANK/KP830093
  • GENBANK/KP830094
  • GENBANK/KP830095
  • GENBANK/KP830096
  • GENBANK/KP830097