Exploring Chemical Information in PubChem

Curr Protoc. 2021 Aug;1(8):e217. doi: 10.1002/cpz1.217.

Abstract

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical database that serves scientific communities as well as the general public. This database collects chemical information from hundreds of data sources and organizes it into multiple data collections, including Substance, Compound, BioAssay, Protein, Gene, Pathway, and Patent. These collections are interlinked with each other, allowing users to discover related records in the various collections (e.g., drugs targeting a protein or genes modulated by a chemical). PubChem can be searched by keyword (e.g., a chemical, protein, or gene name) as well as by chemical structure. The input structure can be provided using popular line notations or drawn with the PubChem Sketcher. PubChem supports various types of structure searches, including identity search, 2-D and 3-D similarity searches, and substructure and superstructure searches. Results from multiple searches can be combined using Boolean operators (i.e., AND, OR, and NOT) to formulate complex queries. PubChem also allows the user to quickly retrieve a list of records annotated with a particular classification or ontological term. This paper provides step-by-step instructions on how to explore PubChem data with examples of commonly requested tasks. © 2021. This article is a U.S. Government work and is in the public domain in the USA. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Finding genes and proteins that interact with a given compound
Basic Protocol 2: Finding drug-like compounds similar to a query compound through a two-dimensional (2-D) similarity search
Basic Protocol 3: Finding compounds similar to a query compound through a three-dimensional (3-D) similarity search
Support Protocol: Computing similarity scores between compounds
Basic Protocol 4: Getting the bioactivity data for the hit compounds from substructure search
Basic Protocol 5: Finding drugs that target a particular gene
Basic Protocol 6: Getting bioactivity data of all chemicals tested against a protein
Basic Protocol 7: Finding compounds annotated with classifications or ontological terms
Basic Protocol 8: Finding stereoisomers and isotopomers of a compound through identity search
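
The abstract describes structure searches and Boolean combination of results as they are offered by PubChem, without naming a programmatic route. As a rough illustration only, the sketch below assumes PubChem's PUG REST interface is used instead. It builds request URLs for a structure search from a SMILES string and combines two lists of compound identifiers (CIDs) with AND, OR, and NOT using set operations. The helper names `structure_search_url` and `combine_hits` are hypothetical, and the CID lists in the usage example are placeholders, not real search results.

```python
from urllib.parse import quote, urlencode

# Base address of PubChem's PUG REST service (assumed access route;
# the protocols in the paper may use a different interface).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def structure_search_url(search_type, smiles, **options):
    """Build a PUG REST URL that returns CIDs for a structure search.

    search_type: e.g. "fastidentity", "fastsimilarity_2d",
                 "fastsubstructure", or "fastsuperstructure".
    smiles:      query structure as a SMILES line notation.
    options:     optional query parameters, e.g. Threshold=90.
    """
    # Percent-encode the SMILES so characters such as '=', '(' and '#'
    # cannot be misread as part of the URL syntax.
    encoded = quote(smiles, safe="")
    url = f"{PUG_REST}/compound/{search_type}/smiles/{encoded}/cids/JSON"
    if options:
        url += "?" + urlencode(options)
    return url


def combine_hits(first, second, operator):
    """Combine two CID lists with a Boolean operator (AND, OR, NOT)."""
    a, b = set(first), set(second)
    if operator == "AND":
        combined = a & b      # records found by both searches
    elif operator == "OR":
        combined = a | b      # records found by either search
    elif operator == "NOT":
        combined = a - b      # records in the first search only
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return sorted(combined)


if __name__ == "__main__":
    # 2-D similarity search for ethanol at a 90% similarity threshold.
    print(structure_search_url("fastsimilarity_2d", "CCO", Threshold=90))

    # Placeholder CID lists standing in for two search results.
    similar = [101, 102, 103]
    substructure = [102, 103, 104]
    print(combine_hits(similar, substructure, "AND"))
```

Fetching a URL built this way (for example with `urllib.request.urlopen`) would be the next step; the sketch leaves it out so that it does not depend on network access.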

Keywords: PubChem; chemical structure search; cheminformatics; drug discovery; molecular similarity; public database.

MeSH terms

  • Biological Assay
  • Databases, Chemical*
  • Databases, Factual
  • Information Storage and Retrieval*
  • Proteins
  • United States

Substances

  • Proteins