Lancet: a high precision medication event extraction system for clinical text

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):563-7. doi: 10.1136/jamia.2010.004077.

Abstract

Objective: This paper presents Lancet, a supervised machine-learning system that automatically extracts medication events consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries.

Design: Lancet incorporates three supervised machine-learning models: a conditional random fields model for tagging individual medication names and associated fields, an AdaBoost model with decision stumps as weak learners for determining which medication names and fields belong to a single medication event, and a support vector machine disambiguation model for identifying the context style (narrative or list).
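
As an illustration of the second model, the following is a minimal sketch of AdaBoost with decision stumps applied to the pairing decision (does a tagged field belong to a given medication name?). It is not the authors' implementation: the two features (token distance and line gap between the name and the field), the toy data, and all function names are assumptions made for this example.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when feature j <= thr, else -sign.
                preds = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost(X, y, rounds=3):
    """Train `rounds` weighted stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified pairs, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical (token_distance, line_gap) features for name/field pairs;
# +1 means the field belongs to the medication, -1 means it does not.
X = [(1, 0), (2, 0), (3, 1), (2, 2), (7, 0), (9, 1)]
y = [1, 1, 1, -1, -1, -1]

model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy data no single stump separates the two classes (the pair at token distance 2 but two lines away is the hard case), whereas three boosted stumps classify every training pair correctly.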

Measurements: The authors, from the University of Wisconsin-Milwaukee, participated in the third i2b2 shared task on challenges in natural language processing for clinical data, the medication extraction challenge. Using the performance metrics provided by the i2b2 challenge, micro-averaged F1 scores (with precision and recall) are reported at both the horizontal and vertical levels.
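
For readers unfamiliar with micro-averaging, the sketch below pools true positives, false positives and false negatives across documents before computing precision, recall and F1. The function name and the per-document counts are invented for illustration and are not results from the challenge; the matching criteria that produce the counts are outside the scope of this sketch.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall and F1.

    counts: list of (true_pos, false_pos, false_neg) tuples, one per document.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Two hypothetical discharge summaries: pooled tp=27, fp=3, fn=9.
p, r, f1 = micro_prf([(9, 1, 3), (18, 2, 6)])
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because the counts are pooled first, documents with more medication events contribute proportionally more to the final score than short ones.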

Results: Among the top 10 teams, Lancet achieved the highest precision at 90.4% with an overall F1 score of 76.4% (horizontal system level with exact match), a gain of 11.2% and 12%, respectively, compared with the rule-based baseline system jMerki. Combining the two systems into a hybrid system further increased the F1 score from 76.4% to 79.0%, a relative gain of 3.4%.

Conclusions: Supervised machine-learning systems with minimal external knowledge resources can achieve high precision with a competitive overall F1 score. Lancet, which is based on this learning framework, does not rely on expensive manually curated rules. The system is available online at http://code.google.com/p/lancet/.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence*
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*