Extracting medication information from clinical text

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):514-8. doi: 10.1136/jamia.2010.003947.


The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration in discharge summaries. This challenge is referred to as the medication challenge. For the medication challenge, i2b2 released detailed annotation guidelines along with a set of annotated discharge summaries. Twenty teams representing 23 organizations and nine countries participated in the medication challenge. The teams produced rule-based, machine learning, and hybrid systems targeted to the task. Although rule-based systems dominated the top 10, the best performing system was a hybrid. Of all medication-related fields, durations and reasons were the most difficult for all systems to detect. While medications themselves were identified with better than 0.75 F-measure by all of the top 10 systems, the best F-measures for durations and reasons were 0.525 and 0.459, respectively. State-of-the-art natural language processing systems go a long way toward extracting medication names, dosages, modes, and frequencies. However, they are limited in recognizing duration and reason fields, and these fields would benefit from future research.
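
The results above are reported as F-measures. As a rough illustration only, the standard balanced F-measure (the harmonic mean of precision and recall) can be sketched as below. The abstract does not say how the challenge counted matches (for example, exact versus partial spans), so the function name `f_measure` and the counts in the usage lines are hypothetical and do not reproduce the challenge's scoring.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score from raw match counts (0.0 when undefined).

    Assumption: a "true positive" is a system-extracted field that matches
    a gold-standard annotation; the matching criterion is left to the caller.
    """
    predicted = true_positives + false_positives  # fields the system emitted
    actual = true_positives + false_negatives     # fields in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not challenge data).
balanced = f_measure(true_positives=3, false_positives=1, false_negatives=1)
low_recall = f_measure(true_positives=1, false_positives=1, false_negatives=3)
```

With `beta=1.0` precision and recall are weighted equally, so a system that finds few of the gold-standard duration or reason fields scores low even if most of what it does extract is correct.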

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computers, Hybrid
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Patient Dropouts
  • Pharmaceutical Preparations*

