A method for systematic discovery of adverse drug events from clinical notes

J Am Med Inform Assoc. 2015 Nov;22(6):1196-204. doi: 10.1093/jamia/ocv102. Epub 2015 Jul 31.


Objective: Adverse drug events (ADEs) are undesired harmful effects resulting from the use of a medication and occur in 30% of hospitalized patients. The authors have developed a data-mining method for systematic, automated detection of ADEs from electronic medical records.

Materials and methods: This method uses the text from 9.5 million clinical notes, along with prior knowledge of drug usages and known ADEs, as inputs. These inputs are processed into statistics used by a discriminative classifier, which outputs the probability that a given drug-disorder pair represents a valid ADE association. Putative ADEs identified by the classifier are then filtered for positive support in 2 independent, complementary data sources. The authors evaluate this method by assessing support for the predictions in other curated data sources, including a manually curated, time-indexed reference standard of label change events.
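
The two-stage flow described above (score each drug-disorder pair, then keep only pairs with outside support) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, weights, bias, threshold, and the choice of a logistic model are all hypothetical, and the sketch assumes that "positive support" means the pair appears in both sources.

```python
import math

# Hypothetical feature weights and bias; the abstract does not report
# the real features or model parameters.
WEIGHTS = {"co_mention_rate": 2.0, "known_usage": -3.0}
BIAS = -1.0


def ade_probability(features):
    """Return a logistic-model probability that a drug-disorder pair is an ADE."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def filter_supported(scored_pairs, source_a, source_b, threshold=0.5):
    """Keep pairs scoring at or above the threshold that appear in both sources.

    Requiring both sources is an assumption made for this sketch.
    """
    return [
        pair
        for pair, probability in scored_pairs.items()
        if probability >= threshold and pair in source_a and pair in source_b
    ]


# Toy data: pair names and scores are invented for illustration only.
scored = {
    ("drugA", "rash"): 0.9,
    ("drugB", "nausea"): 0.95,
    ("drugC", "fever"): 0.2,
}
source_a = {("drugA", "rash"), ("drugC", "fever")}
source_b = {("drugA", "rash"), ("drugB", "nausea")}
kept = filter_supported(scored, source_a, source_b)
```

In this toy run only the pair that is both high-scoring and present in both sources survives; a pair with a high score but support in a single source is dropped.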

Results: This method uses a classifier that achieves an area under the curve of 0.94 on a held-out test set. The classifier is applied to 2,362,950 possible drug-disorder pairs, formed from the 1602 unique drugs and 1475 unique disorders for which data were available, resulting in 240 high-confidence, well-supported drug-adverse event (AE) associations. Eighty-seven of them (36%) are supported in at least one of the resources that have information that was not available to the classifier.
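
As a quick consistency check on the figures reported in the Results, the pair count is the product of the drug and disorder counts, and the supported share is 87 out of 240. The variable names below are illustrative only.

```python
# Counts as reported in the Results paragraph.
n_drugs = 1602
n_disorders = 1475
high_confidence = 240
supported = 87

# Every drug paired with every disorder.
n_pairs = n_drugs * n_disorders

# Share of high-confidence associations with independent support,
# rounded to a whole percentage.
percent_supported = round(100 * supported / high_confidence)

print(n_pairs)            # 2362950
print(percent_supported)  # 36
```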

Conclusion: This method demonstrates the feasibility of systematic post-marketing surveillance for ADEs using electronic medical records, a key component of the learning healthcare system.

Keywords: EMR mining; machine learning; pharmacovigilance; post market drug safety surveillance.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Validation Study

MeSH terms

  • Data Mining / methods*
  • Drug-Related Side Effects and Adverse Reactions / classification*
  • Drug-Related Side Effects and Adverse Reactions / diagnosis
  • Electronic Health Records*
  • Humans
  • Machine Learning
  • Product Surveillance, Postmarketing / methods*