Enzyme specificity prediction using cross-attention graph neural networks

Nature. 2025 Nov;647(8090):639-647. doi: 10.1038/s41586-025-09697-2. Epub 2025 Oct 8.

Abstract

Enzymes are the molecular machines of life, and a key property that governs their function is substrate specificity: the ability of an enzyme to recognize and selectively act on particular substrates. This specificity originates from the three-dimensional (3D) structure of the enzyme active site and the complicated transition state of the reaction[1,2]. Many enzymes can promiscuously catalyse reactions or act on substrates beyond those for which they originally evolved[1,3-5]. However, millions of known enzymes still lack reliable substrate specificity information, which impedes both their practical application and a comprehensive understanding of biocatalytic diversity in nature. Here we developed EZSpecificity, a cross-attention-empowered SE(3)-equivariant graph neural network architecture for predicting enzyme substrate specificity, trained on a comprehensive, tailor-made database of enzyme-substrate interactions at the sequence and structural levels. EZSpecificity outperformed existing machine learning models for enzyme substrate specificity prediction, both on a database of unknown substrates and enzymes and on seven proof-of-concept protein families. Experimental validation with eight halogenases and 78 substrates showed that EZSpecificity achieved 91.7% accuracy in identifying the single potential reactive substrate, significantly higher than that of the state-of-the-art Enzyme Substrate Prediction model (58.3%). EZSpecificity is a general machine learning model for accurately predicting enzyme substrate specificity, relevant to fundamental and applied research in biology and medicine.
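The abstract does not describe the architecture in detail. As a rough illustration of the cross-attention idea only, the following is a minimal sketch in which substrate atom feature vectors attend over enzyme residue feature vectors using plain scaled dot-product attention. It also computes a top-1 accuracy of the kind the halogenase experiment appears to report. The sketch omits learned query/key/value projections, graph message passing and SE(3) equivariance. All function names, toy feature vectors and the assumed evaluation protocol are hypothetical and are not EZSpecificity's code. That protocol ranks every candidate substrate per enzyme and counts a hit when the top-ranked substrate is the known reactive one.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention with identity projections (assumption).

    Each query (here, a hypothetical substrate atom feature vector) attends over
    all keys/values (here, hypothetical enzyme residue feature vectors) and
    receives a softmax-weighted average of the value vectors.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim_v)])
    return out


def top1_accuracy(score_matrix: List[Vector], true_index: List[int]) -> float:
    """Fraction of enzymes whose highest-scoring candidate is the true substrate.

    score_matrix[i][j]: hypothetical model score for enzyme i and substrate j.
    true_index[i]: index of the known reactive substrate for enzyme i.
    """
    hits = 0
    for scores, truth in zip(score_matrix, true_index):
        best = max(range(len(scores)), key=lambda j: scores[j])
        hits += int(best == truth)
    return hits / len(true_index)


# Toy usage: 2 substrate atoms attend over 3 enzyme residues (2-D features).
atoms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(atoms, residues, residues)
print(len(attended), len(attended[0]))  # 2 2
print(top1_accuracy([[0.9, 0.1, 0.3], [0.2, 0.8, 0.5]], [0, 2]))  # 0.5
```

In a trained model the attended substrate features would typically be pooled and passed to a learned readout to produce the per-pair scores that `top1_accuracy` consumes; that step is left out here because the abstract does not specify it.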

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biocatalysis
  • Catalytic Domain
  • Databases, Protein
  • Enzymes* / chemistry
  • Enzymes* / metabolism
  • Graph Neural Networks*
  • Machine Learning
  • Models, Molecular
  • Reproducibility of Results
  • Substrate Specificity

Substances

  • Enzymes