A roadmap for metagenomic enzyme discovery

Nat Prod Rep. 2021 Nov 17;38(11):1994-2023. doi: 10.1039/d1np00006c.


Covering: up to 2021

Metagenomics has yielded massive amounts of sequencing data offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on earth is now available, the ability to accurately predict biocatalytic functions directly from sequencing data remains challenging. Compared to primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is similar to searching for a 'needle in a haystack' without direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery including phylogenetics, sequence similarity networks, genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss various experimental strategies to test computational predictions including heterologous expression and screening. Finally, we provide an outlook for future directions in the field with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Amino Acid Sequence
  • Biological Products / metabolism
  • Catalytic Domain
  • Cytochrome P-450 Enzyme System / chemistry
  • Cytochrome P-450 Enzyme System / physiology
  • Enzymes / chemistry
  • Enzymes / isolation & purification*
  • Machine Learning
  • Metagenomics / methods*
  • Microbiota
  • Phylogeny


Substances

  • Biological Products
  • Enzymes
  • Cytochrome P-450 Enzyme System