Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem

J Comput Aided Mol Des. 2011 Jul;25(7):649-62. doi: 10.1007/s10822-011-9437-x. Epub 2011 May 27.

Abstract

The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN-system, which consists of 42 integer-valued descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis and by plotting the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were organized according to ring count (PC1, 38% of variance), molecular size (PC2, 25% of variance), and H-bond acceptor count (PC3, 12% of variance). Compounds following Lipinski's bioavailability, Oprea's lead-likeness and Congreve's fragment-likeness criteria formed separate groups in MQN-space, visible in the (PC2, PC3) plane. MQN-similarity searches of the 4.5 million molecules (see the browser available at www.gdb.unibe.ch) gave significant enrichment factors for recovering groups of fragment-sized bioactive compounds related to ten different biological targets taken from ChEMBL, revealing lead-hopping relationships not seen with substructure fingerprint similarity searches. The diversity of different compound series was analyzed by MQN-distance histograms.
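As an illustration of how a similarity search over integer descriptor vectors and its enrichment factor can be computed, here is a minimal sketch in Python. It rests on several assumptions that the abstract does not state: the distance is taken to be the city-block (Manhattan) distance, the enrichment factor uses the common definition (fraction of actives in the top n hits divided by the fraction of actives in the whole library), and the 4-component vectors are invented toy data standing in for real 42-component MQN vectors. The function names are hypothetical.

```python
from typing import Sequence, Set, List


def cityblock(a: Sequence[int], b: Sequence[int]) -> int:
    """City-block distance between two equal-length integer vectors.

    Assumption: the abstract does not name the metric used for MQN-similarity.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_distance(query: Sequence[int],
                     library: Sequence[Sequence[int]]) -> List[int]:
    """Return library indices ordered from nearest to farthest from the query.

    sorted() is stable, so ties keep their original library order.
    """
    return sorted(range(len(library)),
                  key=lambda i: cityblock(query, library[i]))


def enrichment_factor(ranked: Sequence[int], actives: Set[int],
                      top_n: int) -> float:
    """EF = (actives among the top_n hits / top_n) / (all actives / library size)."""
    if top_n <= 0 or not actives or not ranked:
        raise ValueError("need top_n > 0, at least one active, non-empty ranking")
    hits = sum(1 for i in ranked[:top_n] if i in actives)
    return (hits / top_n) / (len(actives) / len(ranked))


# Toy data: 4 invented integer descriptors per molecule (NOT real MQN values).
library = [
    (6, 0, 2, 1),
    (6, 1, 1, 1),
    (12, 2, 0, 3),
    (5, 0, 1, 1),
    (20, 4, 2, 0),
    (7, 1, 1, 1),
]
query = (6, 0, 1, 1)
actives = {1, 3}  # hypothetical indices of known actives in the library

ranked = rank_by_distance(query, library)
ef_top3 = enrichment_factor(ranked, actives, top_n=3)
```

An enrichment factor above 1 means the nearest neighbours of the query contain proportionally more actives than a random draw of the same size from the library would.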

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Combinatorial Chemistry Techniques
  • Databases, Factual / classification*
  • Drug Discovery*
  • Humans
  • Informatics*
  • Ligands
  • Peptide Fragments / chemistry*
  • Pharmaceutical Preparations / chemistry*
  • Small Molecule Libraries / chemistry

Substances

  • Ligands
  • Peptide Fragments
  • Pharmaceutical Preparations
  • Small Molecule Libraries