Background: A recent comparison showed extensive similarities between the structural properties of metabolites in the reconstructed human metabolic network ("endogenites") and those of successful marketed drugs ("drugs").
Results: Clustering indicated that endogenites and drugs populate related but different regions of chemical space. Differences in drug-endogenite similarity between the various encodings, as judged by the Tanimoto coefficient, could be explained simply by the fraction of bits set to 1 in each bitstring. By extracting drug/endogenite substructures, we develop a novel family of fingerprints, the Drug Endogenite Substructure (DES) encodings, based on the ranked frequency of the various substructures. These provide a natural assessment of drug-endogenite likeness and may be used as descriptors from which to derive quantitative structure-activity relationships (QSARs).
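Why bit density alone can shift Tanimoto scores is illustrated by the minimal sketch below. It assumes fingerprints are represented as sets of on-bit indices and uses random, structureless bitstrings, so it does not reproduce the paper's encodings or data. For two independent random bitstrings in which each bit is set with probability p, the expected shared count is about n·p² and the expected union is about n·(2p - p²), giving a baseline Tanimoto near p/(2 - p). Denser encodings therefore score unrelated molecules as more "similar". The helper names (`random_fingerprint`, `mean_random_tanimoto`) are hypothetical.

```python
import random


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two bitstrings given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def random_fingerprint(n_bits: int, density: float, rng: random.Random) -> set:
    """Hypothetical random fingerprint: each bit is set to 1 with probability `density`."""
    return {i for i in range(n_bits) if rng.random() < density}


def expected_random_tanimoto(density: float) -> float:
    """Large-n approximation of the Tanimoto score of two independent random bitstrings."""
    return density / (2 - density)


def mean_random_tanimoto(n_bits: int, density: float, n_pairs: int = 200, seed: int = 0) -> float:
    """Average Tanimoto score over pairs of unrelated random fingerprints of a given density."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = random_fingerprint(n_bits, density, rng)
        b = random_fingerprint(n_bits, density, rng)
        total += tanimoto(a, b)
    return total / n_pairs


if __name__ == "__main__":
    # The baseline similarity of unrelated bitstrings rises with the fraction of bits set.
    for p in (0.05, 0.2, 0.5):
        sim = mean_random_tanimoto(2048, p)
        print(f"density={p:.2f} simulated={sim:.3f} expected={expected_random_tanimoto(p):.3f}")
```

Comparing the printed columns shows the simulated and approximate values agreeing closely. The baseline climbs from roughly 0.03 at 5% density to roughly 0.33 at 50%, which is why similarities from encodings with different bit densities are not directly comparable.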
Conclusions: "Drug-endogenite likeness" appears to be a useful concept, and it leads to a simple, novel and interpretable substructure-based molecular encoding for cheminformatics.
Keywords: cheminformatics; drug transporters; encodings; endogenites; metabolomics.