Proc Natl Acad Sci U S A. 2019 Apr 9;116(15):7298-7307.
doi: 10.1073/pnas.1818877116. Epub 2019 Mar 25.

Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites

Noushin Hadadi et al. Proc Natl Acad Sci U S A.

Abstract

Thousands of biochemical reactions with characterized activities are "orphan," meaning they cannot be assigned to a specific enzyme, leaving gaps in metabolic pathways. Novel reactions predicted by pathway-generation tools also lack associated sequences, limiting protein engineering applications. Associating orphan and novel reactions with known biochemistry and suggesting enzymes to catalyze them is a daunting problem. We propose the method BridgIT to identify candidate genes and catalyzing proteins for these reactions. This method introduces information about the enzyme binding pocket into reaction-similarity comparisons. BridgIT assesses the similarity of two reactions, one orphan and one well-characterized nonorphan reaction, using their substrate reactive sites, their surrounding structures, and the structures of the generated products to suggest enzymes that catalyze the most-similar nonorphan reactions as candidates for also catalyzing the orphan ones. We performed two large-scale validation studies to test BridgIT predictions against experimental biochemical evidence. For the 234 orphan reactions from the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2011 (a comprehensive enzymatic-reaction database) that became nonorphan in KEGG 2018, BridgIT predicted the exact or a highly related enzyme for 211 of them. Moreover, for 334 of 379 novel reactions in 2014 that were later cataloged in KEGG 2018, BridgIT predicted the exact or highly similar enzymes. BridgIT requires knowledge about only four connecting bonds around the atoms of the reactive sites to correctly annotate proteins for 93% of analyzed enzymatic reactions. Increasing to seven connecting bonds allowed for the accurate identification of a sequence for nearly all known enzymatic reactions.
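The ranking idea described in the abstract (compare an orphan reaction with characterized reactions, then propose the enzymes of the most similar ones) can be sketched as follows. This is a minimal illustration and not the BridgIT implementation. It assumes that a reaction fingerprint can be reduced to a set of fragment labels and that similarity is scored with the Tanimoto (Jaccard) coefficient, which the abstract does not specify. The fragment labels, reaction IDs, and enzyme names are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is modelled as a set of fragment labels.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: shared fragments over all fragments."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(
    query: Fingerprint,
    references: Dict[str, Tuple[Fingerprint, str]],
) -> List[Tuple[str, str, float]]:
    """Score each characterized reaction against the query, best first.

    `references` maps a reaction ID to (fingerprint, enzyme label).
    """
    scored = [
        (reaction_id, enzyme, tanimoto(query, fingerprint))
        for reaction_id, (fingerprint, enzyme) in references.items()
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical toy data, not taken from KEGG.
orphan = frozenset({"frag_a", "frag_b", "frag_c"})
known = {
    "ref_A": (frozenset({"frag_a", "frag_b", "frag_c", "frag_d"}), "enzyme_1"),
    "ref_B": (frozenset({"frag_a", "frag_x"}), "enzyme_2"),
}

ranking = rank_candidates(orphan, known)
for reaction_id, enzyme, score in ranking:
    print(f"{reaction_id}\t{enzyme}\t{score:.2f}")
```

In this toy case the enzyme attached to the top-ranked reference reaction would be the first candidate proposed for the orphan reaction.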

Keywords: novel (de novo) reactions; orphan reactions; reaction similarity; reactive site recognition; sequence similarity.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Main steps of the BridgIT workflow: step 1, reactive site recognition for an input reaction (de novo or orphan); step 2, reaction fingerprint construction; step 3, reaction similarity evaluation; and step 4, sorting, ranking, and gene assignment. Steps 1.a through 1.c illustrate how reactive sites are identified for the orphan reaction R02763. Step 1.a: two candidate reactive sites of 3-carboxy-2-hydroxymuconate semialdehyde (substrate A) were recognized by the rules 4.1.1. (green) and 1.13.11 (red). Step 1.b: both rules recognized the connectivity of atoms within the two candidate reactive sites. Step 1.c: only reaction rule 4.1.1. can explain the transformation of substrate A to products. Step 2.a shows the fragmentation of reaction compounds, whereas step 2.b illustrates the mathematical representations of the corresponding BridgIT reaction fingerprints.
Fig. 2.
Comparison of the results obtained with the BridgIT and standard fingerprints on two example KEGG reactions. (A) The input reaction R00722 (Left) and the most-similar reactions (Right) identified with the BridgIT and standard fingerprints. Note that the standard fingerprinting method failed to find a reaction similar to R00722 because of cancellations inside all fingerprint description layers. (B) The input reaction R00691 (Left) and the most-similar reactions (Right) identified with the BridgIT and standard fingerprints.
Fig. 3.
A multienzyme reaction such as R00217 can be catalyzed by more than one enzyme. BridgIT identified two distinct fingerprints for this reaction that correspond to two reactive sites of oxaloacetate (A). The reactive site recognized by the 1.1.1. rule (B) is more specific (blue substructure) than the one recognized by the 4.1.1. rule (C) (green substructure).
Fig. 4.
Multifunctional enzymes can catalyze reactions with two different reactive sites. R03539 (A) and R03208 (B) are catalyzed by the same enzyme, 1.11.1.8. However, the reactive sites of these substrates are completely different.
Fig. 5.
Details of the BridgIT verification procedure performed on ATLAS reaction rat132341, which was novel in KEGG 2014 and was later experimentally identified and cataloged in KEGG 2018; that is, it became a nonorphan reaction (R10392). (A) rat132341 converts (R)-(homo)2-citrate to cis-(homo)2-aconitate. (B) Using the biochemical knowledge of KEGG 2014, BridgIT predicts the KEGG reaction R03444, which is catalyzed by a 4.2.1.114-class enzyme, as the known reaction most similar to rat132341. Remarkably, the same enzyme was later assigned to R10392 in KEGG 2018 with the corresponding biochemical confirmation. (C) The identified EC number (4.2.1.114) can be used to extract the corresponding protein sequences along with their crystal structures.
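
The "cancellations" mentioned in the Fig. 2 caption can be illustrated with a toy difference fingerprint, in which product fragment counts minus substrate fragment counts leave nothing behind when both sides contain the same fragments. The sketch below is an assumption-laden illustration, not the paper's algorithm. The second function shows one conceivable way to keep reactive-site fragments from cancelling (tagging them by reaction side), and that tagging scheme is hypothetical, as are the fragment labels.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: a fragment is (label, touches_reactive_site).
Fragment = Tuple[str, bool]


def difference_fingerprint(
    substrate: List[Fragment], product: List[Fragment]
) -> Dict[str, int]:
    """Plain difference fingerprint: product counts minus substrate counts."""
    diff: Counter = Counter()
    for label, _ in product:
        diff[label] += 1
    for label, _ in substrate:
        diff[label] -= 1
    # Fragments present equally on both sides cancel to zero and are dropped.
    return {label: count for label, count in diff.items() if count != 0}


def site_aware_fingerprint(
    substrate: List[Fragment], product: List[Fragment]
) -> Dict[str, int]:
    """Hypothetical variant: reactive-site fragments are tagged by side
    ("S:" or "P:") so that they can never cancel against each other."""
    diff: Counter = Counter()
    for label, reactive in product:
        diff[f"P:{label}" if reactive else label] += 1
    for label, reactive in substrate:
        diff[f"S:{label}" if reactive else label] -= 1
    return {label: count for label, count in diff.items() if count != 0}


# Toy reaction in which both sides yield the same multiset of fragments.
substrate = [("C-C", False), ("C-O", True), ("C=O", False)]
product = [("C-C", False), ("C-O", True), ("C=O", False)]

plain = difference_fingerprint(substrate, product)
site_aware = site_aware_fingerprint(substrate, product)
print(plain)
print(site_aware)
```

With the plain fingerprint the toy reaction has an empty description and cannot be compared with anything, whereas the side-tagged variant still records which fragment sat at the reactive site.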
