Supporting the classification of pathology reports: comparing two information retrieval methods

Comput Methods Programs Biomed. 2000 Jun;62(2):109-13. doi: 10.1016/s0169-2607(00)00056-0.

Abstract

This contribution compares two information retrieval methods. The goal of the retrieval is to select, from a library of pathology reports, those most similar to a given report. The SNOMED codes that accompany the retrieved reports are then presented to the pathologist who has to code the given report, with the aim of improving the quality of coding. The reports were represented either as a vector of words or as a vector of N-grams; 4-, 5- and 6-grams were used. The similarity of the reports was determined by comparing the SNOMED terms that were added to the reports. The word-based method was consistently better than the N-gram method.
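The two report representations can be illustrated with a minimal Python sketch. The abstract does not state the tokenization, term weighting, or ranking measure used, so everything here is an assumption made for illustration: N-grams are taken to be overlapping character N-grams, vectors hold raw counts, reports are ranked by cosine similarity, and the function names (word_vector, ngram_vector, cosine, most_similar) are hypothetical.

```python
from collections import Counter
from math import sqrt


def word_vector(text):
    """Bag-of-words counts over lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def ngram_vector(text, n):
    """Counts of overlapping character n-grams (assumed: character level)."""
    s = " ".join(text.lower().split())  # lowercase and normalize whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query, library, vectorize, top_k=3):
    """Return up to top_k (score, index) pairs, best match first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(doc)), idx) for idx, doc in enumerate(library)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


if __name__ == "__main__":
    # Invented example texts, not taken from the study's report library.
    library = [
        "chronic gastritis of the antrum",
        "basal cell carcinoma skin",
    ]
    query = "chronic gastritis antrum"
    print(most_similar(query, library, word_vector))
    print(most_similar(query, library, lambda t: ngram_vector(t, 4)))
```

In a setup like the one described, the SNOMED codes attached to the top-ranked library reports would be the ones shown to the pathologist; swapping the `vectorize` argument is all that distinguishes the word-based variant from the 4-, 5- or 6-gram variants in this sketch.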

Publication types

  • Comparative Study

MeSH terms

  • Databases, Factual*
  • Humans
  • Information Storage and Retrieval / methods*