Extracting semantic predications from Medline citations for pharmacogenomics

Pac Symp Biocomput. 2007;209-20.


We describe a natural language processing system (Enhanced SemRep) that identifies core assertions on pharmacogenomics in Medline citations. Extracted information is represented as semantic predications covering a range of relations relevant to this domain. Because the system addresses specific relations, it achieves greater precision than methods that rely on entity co-occurrence. Enhanced SemRep was developed by adapting an existing system and depends crucially on domain knowledge in the Unified Medical Language System (UMLS). We provide a preliminary evaluation (55% recall and 73% precision) and discuss the system's potential to assist both clinical practice and scientific investigation.
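The precision advantage of typed predications over entity co-occurrence can be illustrated with a toy example. The Python sketch below assumes a single hypothetical sentence mentioning CYP2D6, codeine and morphine, plus a hand-written gold standard. The `Predication` type, the predicate labels and the helper functions are illustrative assumptions, not Enhanced SemRep's actual output format or API. Pairing every co-mentioned entity proposes relations the sentence never asserts, which lowers precision. A subject-predicate-object triple states only the asserted relation and also records its type.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Predication:
    """A semantic predication: subject, relation type, object (illustrative format)."""
    subject: str
    predicate: str
    object: str


def cooccurrence_pairs(entities):
    """Co-occurrence baseline: every unordered pair of entities mentioned together."""
    return {frozenset(pair) for pair in combinations(sorted(set(entities)), 2)}


def as_pairs(predications):
    """Drop predicate types so predications can be compared with co-occurrence pairs."""
    return {frozenset((p.subject, p.object)) for p in predications}


def precision_recall(predicted, gold):
    """Standard precision and recall over sets of extracted relations."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence-level data; names and predicate labels are for illustration only.
entities = ["CYP2D6", "codeine", "morphine"]
gold = {
    Predication("CYP2D6", "INTERACTS_WITH", "codeine"),
    Predication("codeine", "CONVERTS_TO", "morphine"),
}

if __name__ == "__main__":
    p, r = precision_recall(cooccurrence_pairs(entities), as_pairs(gold))
    # The baseline also proposes the unasserted pair {CYP2D6, morphine}.
    print(f"co-occurrence: precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=1.00
```

In this toy case the co-occurrence baseline finds both gold relations but adds a spurious one. A system that extracts only asserted predications avoids that false positive, at the cost of missing relations it fails to recognize. This trade-off is consistent with the abstract's emphasis on precision.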

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology
  • Cytochrome P-450 CYP2D6 / genetics
  • Cytochrome P-450 CYP2D6 / metabolism
  • Humans
  • Natural Language Processing
  • Pharmacogenetics / statistics & numerical data*
  • Semantics
  • Unified Medical Language System

Substances

  • Cytochrome P-450 CYP2D6