Chemical reaction vector embeddings: towards predicting drug metabolism in the human gut microbiome

Pac Symp Biocomput. 2018;23:56-67.


Bacteria in the human gut can activate, inactivate, and reactivate drugs, with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut actinobacterium Eggerthella lenta, and patients colonized with high levels of drug-metabolizing strains may have a limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Nearly a century of experimental research has identified and validated more than 50 drugs that are metabolized by bacterial enzymes. However, there are few computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned through unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to assess the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the resulting vector space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Commission (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be carried out by the human gut microbiome. The top results for these known drug transformations contained substructure modifications similar to those of the original drug pair. This work enables high-throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.
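
The querying step described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: a reaction vector is taken to be the product embedding minus the substrate embedding, and similarity is measured by cosine similarity. The helper names (reaction_vector, top_k_reactions), the reaction identifiers, and the 4-dimensional toy embeddings are all hypothetical and stand in for learned molecular vectors.

```python
import numpy as np


def reaction_vector(substrate_vec, product_vec):
    """Assumed representation: product embedding minus substrate embedding."""
    return np.asarray(product_vec, dtype=float) - np.asarray(substrate_vec, dtype=float)


def cosine_similarity(query, matrix):
    """Cosine similarity between one query vector and each row of a matrix."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    denom = np.linalg.norm(query) * np.linalg.norm(matrix, axis=1)
    sims = np.zeros(matrix.shape[0])
    nonzero = denom > 0  # leave similarity at 0 for zero-length vectors
    sims[nonzero] = (matrix[nonzero] @ query) / denom[nonzero]
    return sims


def top_k_reactions(query, reaction_matrix, reaction_ids, k=3):
    """Return the k reference reactions most similar to the query transformation."""
    sims = cosine_similarity(query, reaction_matrix)
    order = np.argsort(-sims, kind="stable")[:k]
    return [(reaction_ids[i], float(sims[i])) for i in order]


# Toy embeddings (made up for illustration; real vectors would be learned).
embeddings = {
    "substrate_a": [1.0, 0.0, 0.0, 0.0],
    "product_a": [1.0, 1.0, 0.0, 0.0],
    "substrate_b": [0.0, 0.0, 1.0, 0.0],
    "product_b": [0.0, 1.0, 1.0, 1.0],
    "substrate_c": [0.0, 0.0, 0.0, 1.0],
    "product_c": [1.0, 0.0, 0.0, 1.0],
    "drug": [0.5, 0.0, 0.5, 0.0],
    "metabolite": [0.5, 1.0, 0.5, 0.0],
}

# Hypothetical reference reactions, one row per reaction vector.
reaction_ids = ["rxn_a", "rxn_b", "rxn_c"]
reaction_matrix = np.array([
    reaction_vector(embeddings["substrate_a"], embeddings["product_a"]),
    reaction_vector(embeddings["substrate_b"], embeddings["product_b"]),
    reaction_vector(embeddings["substrate_c"], embeddings["product_c"]),
])

# Query: a drug-to-metabolite transformation expressed in the same vector space.
query = reaction_vector(embeddings["drug"], embeddings["metabolite"])
hits = top_k_reactions(query, reaction_matrix, reaction_ids, k=2)
for reaction_id, score in hits:
    print(reaction_id, round(score, 3))
```

In this toy setup the drug-metabolite pair ranks closest to the reference reaction that changes the same embedding dimension, which mirrors the idea of retrieving reactions with similar substructure modifications. The choice of difference vectors and cosine similarity is one plausible design, not necessarily the one used in the study.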

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bacteria / metabolism*
  • Biotransformation
  • Cluster Analysis
  • Computational Biology / methods
  • Databases, Pharmaceutical / statistics & numerical data
  • Drug Evaluation, Preclinical / statistics & numerical data
  • Gastrointestinal Microbiome / physiology*
  • High-Throughput Screening Assays / statistics & numerical data
  • Humans
  • Pharmaceutical Preparations / chemistry
  • Pharmaceutical Preparations / metabolism*
  • Quantitative Structure-Activity Relationship
  • Stochastic Processes


Substances

  • Pharmaceutical Preparations