Literature information in PubChem: associations between PubChem records and scientific articles

J Cheminform. 2016 Jun 10;8:32. doi: 10.1186/s13321-016-0142-6. eCollection 2016.

Abstract

Background: PubChem is an open archive consisting of three primary public databases (BioAssay, Compound, and Substance). It contains information on a broad range of chemical entities, including small molecules, lipids, carbohydrates, and (chemically modified) amino acid and nucleic acid sequences (including siRNA and miRNA). Currently (as of Nov. 2015), PubChem contains more than 150 million depositor-provided chemical substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results from over 1 million biological assay records.

Description: Many PubChem records (substances, compounds, and assays) include depositor-provided cross-references to scientific articles in PubMed. Some PubChem contributors provide bioactivity data extracted from scientific articles. Literature-derived bioactivity data complement high-throughput screening (HTS) data from the concluded NIH Molecular Libraries Program and other HTS projects. Some journals provide PubChem with information on chemicals that appear in their newly published articles, enabling concurrent publication of scientific articles in journals and associated data in public databases. In addition, PubChem links records to PubMed articles indexed with the Medical Subject Heading (MeSH) controlled vocabulary thesaurus.

Conclusion: Literature information, both provided by depositors and derived from MeSH annotations, can be accessed using PubChem's web interfaces, enabling users to explore information available in literature related to PubChem records beyond typical web search results.

Graphical abstract: Literature information for PubChem records is derived from various sources.