High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):524-7. doi: 10.1136/jamia.2010.003939.


Objective: Medication information is among the most valuable sources of data in clinical records. This paper describes the use of a cascade of machine learners that automatically extracts medication information from clinical records.

Design: The authors developed a novel supervised learning model that incorporates two machine learning algorithms and several rule-based engines.

Measurements: Evaluation of each step included precision, recall and F-measure metrics. The final outputs of the system were scored using the i2b2 workshop evaluation metrics, including strict and relaxed matching with a gold standard.
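
As a rough illustration of the metrics named above, the sketch below computes precision, recall and F-measure from true-positive, false-positive and false-negative counts, and then a micro-averaged score by pooling the counts across entity types before computing the ratios. It is not the i2b2 scoring tool: the `Counts` class, the function names and the example counts are assumptions made up for illustration, and the strict versus relaxed matching rules (which decide what counts as a true positive) are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-entity tallies against a gold standard."""
    tp: int  # system spans that match a gold span
    fp: int  # system spans with no matching gold span
    fn: int  # gold spans the system missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure); 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def micro_average(per_entity: dict[str, Counts]) -> tuple[float, float, float]:
    """Pool the counts over all entity types, then compute the metrics once."""
    tp = sum(c.tp for c in per_entity.values())
    fp = sum(c.fp for c in per_entity.values())
    fn = sum(c.fn for c in per_entity.values())
    return prf(tp, fp, fn)


# Invented counts, for illustration only (not results from the paper).
example = {
    "medication": Counts(tp=90, fp=10, fn=10),
    "dosage": Counts(tp=40, fp=10, fn=20),
}

if __name__ == "__main__":
    p, r, f = micro_average(example)
    print(f"micro P={p:.4f} R={r:.4f} F={f:.4f}")
```

Because micro-averaging pools counts before dividing, frequent entity types weigh more heavily in the final score than rare ones.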

Results: Evaluation results showed greater than 90% accuracy on five out of seven entities in the named entity recognition task, and an F-measure greater than 95% on the relationship classification task. The strict micro-averaged F-measure for the system output was 85.65%, the best submitted performance of the competition.

Limitations: Clinical staff will only use practical processing systems if they have confidence in their reliability. The authors estimate that an acceptable accuracy for such a working system should be approximately 95%. This leaves a significant performance gap of 5 to 10% from the current processing capabilities.

Conclusion: A multistage method with mixed computational strategies using a combination of rule-based classifiers and statistical classifiers seems to provide a near-optimal strategy for automated extraction of medication information from clinical records.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Pharmaceutical Preparations*

