Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites

Proc Natl Acad Sci U S A. 2019 Apr 9;116(15):7298-7307. doi: 10.1073/pnas.1818877116. Epub 2019 Mar 25.


Thousands of biochemical reactions with characterized activities are "orphan," meaning they cannot be assigned to a specific enzyme, leaving gaps in metabolic pathways. Novel reactions predicted by pathway-generation tools also lack associated sequences, limiting protein engineering applications. Associating orphan and novel reactions with known biochemistry and suggesting enzymes to catalyze them is a daunting problem. We propose the method BridgIT to identify candidate genes and catalyzing proteins for these reactions. This method introduces information about the enzyme binding pocket into reaction-similarity comparisons. BridgIT assesses the similarity of two reactions, one orphan and one well-characterized nonorphan, using their substrate reactive sites, the structures surrounding those sites, and the structures of the generated products. It then suggests the enzymes that catalyze the most-similar nonorphan reactions as candidates for catalyzing the orphan ones. We performed two large-scale validation studies to test BridgIT predictions against experimental biochemical evidence. Of the 234 orphan reactions from the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2011 (a comprehensive enzymatic-reaction database) that became nonorphan in KEGG 2018, BridgIT predicted the exact or a highly related enzyme for 211. Moreover, for 334 of the 379 novel reactions in 2014 that were later cataloged in KEGG 2018, BridgIT predicted the exact or highly similar enzymes. BridgIT requires knowledge about only four connecting bonds around the atoms of the reactive sites to correctly annotate proteins for 93% of analyzed enzymatic reactions. Increasing this to seven connecting bonds allowed accurate identification of a sequence for nearly all known enzymatic reactions.
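
The comparison described above can be illustrated with a minimal sketch. It is not BridgIT's implementation. It assumes that each reaction is summarized as a set of fragment labels per bond distance from the reactive site (distance 0 is the site itself), that layers are compared with a Tanimoto (Jaccard) score, and that the per-layer scores are averaged. The names LayeredFingerprint, tanimoto, layered_similarity, and rank_candidates, and the fragment labels used with them, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation (an assumption, not BridgIT's data model):
# bond distance from the reactive site -> fragment labels seen at that distance.
LayeredFingerprint = Dict[int, FrozenSet[str]]


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity of two label sets."""
    if not a and not b:
        return 1.0  # two empty layers are treated as identical
    return len(a & b) / len(a | b)


def layered_similarity(
    orphan: LayeredFingerprint,
    reference: LayeredFingerprint,
    max_bonds: int = 7,
) -> float:
    """Average the per-layer Tanimoto scores for distances 0..max_bonds."""
    empty: FrozenSet[str] = frozenset()
    scores = [
        tanimoto(orphan.get(d, empty), reference.get(d, empty))
        for d in range(max_bonds + 1)
    ]
    return sum(scores) / len(scores)


def rank_candidates(
    orphan: LayeredFingerprint,
    references: Dict[str, LayeredFingerprint],
    max_bonds: int = 7,
) -> List[Tuple[str, float]]:
    """Return (reference id, score) pairs, most similar reference first."""
    scored = [
        (ref_id, layered_similarity(orphan, fp, max_bonds))
        for ref_id, fp in references.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data with made-up fragment labels, for illustration only.
    orphan = {0: frozenset({"C-O"}), 1: frozenset({"C-C", "C=O"})}
    references = {
        "ref_A": {0: frozenset({"C-O"}), 1: frozenset({"C-C"})},
        "ref_B": {0: frozenset({"C-N"}), 1: frozenset({"C-C", "C=O"})},
    }
    for ref_id, score in rank_candidates(orphan, references, max_bonds=1):
        print(ref_id, round(score, 3))
```

In this toy setup, the enzymes annotated to the top-ranked nonorphan reference would be proposed as candidates for the orphan reaction. The max_bonds parameter stands in for the number of connecting bonds around the reactive site that are taken into account.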

Keywords: novel (de novo) reactions; orphan reactions; reaction similarity; reactive site recognition; sequence similarity.

MeSH terms

  • Binding Sites
  • Databases, Protein*
  • Enzymes* / chemistry
  • Enzymes* / genetics
  • Molecular Sequence Annotation*
  • Sequence Analysis, Protein*

Substances

  • Enzymes