A dataset of 200 structured product labels annotated for adverse drug reactions

Sci Data. 2018 Jan 30;5:180001. doi: 10.1038/sdata.2018.1.


Adverse drug reactions (ADRs), unintended and sometimes dangerous effects that a drug may have, are one of the leading causes of morbidity and mortality during medical care. To date, there is no structured machine-readable authoritative source of known ADRs. The United States Food and Drug Administration (FDA) partnered with the National Library of Medicine to create a pilot dataset containing standardised information about known adverse reactions for 200 FDA-approved drugs. The Structured Product Labels (SPLs), the documents FDA uses to exchange information about drugs and other products, were manually annotated for adverse reactions at the mention level to facilitate development and evaluation of text mining tools for extraction of ADRs from all SPLs. The ADRs were then normalised to the Unified Medical Language System (UMLS) and to the Medical Dictionary for Regulatory Activities (MedDRA). We present the curation process and the structure of the publicly available database SPL-ADR-200db containing 5,098 distinct ADRs. The database is available at https://bionlp.nlm.nih.gov/tac2017adversereactions/; the code for preparing and validating the data is available at https://github.com/lhncbc/fda-ars.

Publication types

  • Dataset
  • Research Support, N.I.H., Intramural

MeSH terms

  • Databases, Factual
  • Drug Labeling*
  • Drug-Related Side Effects and Adverse Reactions*
  • United States
  • United States Food and Drug Administration