Assignment of EC numbers to enzymatic reactions with MOLMAP reaction descriptors and random forests

J Chem Inf Model. 2009 Jul;49(7):1839-46. doi: 10.1021/ci900104b.

Abstract

The MOLMAP descriptor relies on a Kohonen self-organizing map (SOM) that defines types of covalent bonds on the basis of their physicochemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds present in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and it numerically encodes the pattern of bond changes during a chemical reaction. In this study, a genome-scale data set of enzymatic reactions available in the KEGG database was encoded by MOLMAP descriptors and explored for the assignment of the official EC number from the reaction equation, with Random Forests as the machine learning algorithm. EC numbers were correctly assigned in 95%, 90%, and 85% of cases (for independent test sets) at the class, subclass, and subsubclass EC number levels, respectively, with training sets including one reaction from each available full EC number. Increasing differences between training and test sets were explored, leading to decreased percentages of correct assignments. Classification of reactions from only the main reactants and products achieved accuracies of 78%, 74%, and 63% at the class, subclass, and subsubclass levels, respectively.
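The definition of a reaction MOLMAP as "products minus reactants" can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 3x3 grid, the `BOND_TO_NEURON` lookup (a stand-in for a trained Kohonen SOM, which in the actual method maps bonds from their physicochemical and topological properties), the use of plain activation counts, and the function names `molmap` and `reaction_molmap`.

```python
from collections import Counter

# Hypothetical 3x3 map; a real SOM would be larger and trained on bond properties.
GRID = (3, 3)

# Hypothetical lookup standing in for a trained SOM: bond label -> winning neuron.
BOND_TO_NEURON = {
    "C-O": (0, 0),
    "O-H": (0, 1),
    "H-H": (1, 0),
    "C=O": (1, 1),
    "C-C": (2, 0),
    "C-H": (2, 2),
}


def molmap(bonds):
    """Flattened grid counting how many bonds of a molecule hit each neuron."""
    counts = Counter(BOND_TO_NEURON[b] for b in bonds)
    rows, cols = GRID
    return [counts.get((r, c), 0) for r in range(rows) for c in range(cols)]


def reaction_molmap(reactants, products):
    """Sum of product MOLMAPs minus sum of reactant MOLMAPs."""
    def side(molecules):
        total = [0] * (GRID[0] * GRID[1])
        for bonds in molecules:
            total = [t + m for t, m in zip(total, molmap(bonds))]
        return total

    return [p - r for p, r in zip(side(products), side(reactants))]


# Toy example: ethanol -> acetaldehyde + H2, each molecule given as a bond list.
ethanol = ["C-C", "C-O", "O-H"] + ["C-H"] * 5
acetaldehyde = ["C-C", "C=O"] + ["C-H"] * 4
hydrogen = ["H-H"]

descriptor = reaction_molmap([ethanol], [acetaldehyde, hydrogen])
print(descriptor)
```

In this toy setting, negative entries mark bond types lost in the reaction and positive entries mark bond types formed; a fixed-length vector of this kind is what would be passed to a classifier such as a random forest.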

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Biocatalysis
  • Databases, Factual
  • Enzymes / metabolism*
  • Models, Biological

Substances

  • Enzymes