Automated coding of diagnoses: three methods compared

Proc AMIA Symp. 2000:250-4.

Abstract

In Germany, new legal requirements have raised the importance of accurately encoding admission and discharge diagnoses for in- and outpatients. In response to the emerging need for computer-supported tools, we examined three methods for the automated coding of German-language free-text diagnosis phrases. We compared a language-independent, lexicon-free n-gram approach with one that uses a dictionary of medical morphemes and refines the query by mapping it to SNOMED codes. Both techniques produced a ranked list of candidate diagnoses within a vector space retrieval framework. The results did not reveal any significant difference between the two: the correct diagnosis was found in approximately 40% of cases for three-digit codes and 30% for four-digit codes. The lexicon-based method was then modified by replacing the vector space ranking with a heuristic approach that capitalizes on the semantic structure of SNOMED, which raised the number of correct diagnoses significantly (to approximately 50% for three-digit codes and 40% for four-digit codes). We therefore claim that lexicon-based retrieval methods do not perform better than lexicon-free ones unless conceptual knowledge is added.
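As a rough illustration of the lexicon-free idea, the sketch below ranks diagnosis codes for a free-text phrase by comparing character n-gram vectors with cosine similarity. It is not the authors' implementation: the trigram size, the TF-IDF weighting, the function names, and the three-entry toy catalog are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy catalog (code -> German label); not data from the study.
CATALOG = {
    "I21": "Akuter Myokardinfarkt",
    "J18": "Pneumonie, Erreger nicht näher bezeichnet",
    "E11": "Diabetes mellitus, Typ 2",
}

def char_ngrams(text, n=3):
    """Split lower-cased text into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_index(catalog, n=3):
    """Return TF-IDF weighted n-gram vectors per code, plus the IDF table."""
    counts = {code: Counter(char_ngrams(label, n)) for code, label in catalog.items()}
    doc_freq = Counter()
    for vec in counts.values():
        doc_freq.update(vec.keys())
    total = len(counts)
    # Smoothed IDF (an assumed weighting scheme, chosen for simplicity).
    idf = {g: math.log((1 + total) / (1 + d)) + 1.0 for g, d in doc_freq.items()}
    weighted = {
        code: {g: tf * idf[g] for g, tf in vec.items()}
        for code, vec in counts.items()
    }
    return weighted, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, weighted, idf, n=3):
    """Return (score, code) pairs sorted from best to worst match."""
    # N-grams never seen in the catalog get weight 0 and do not contribute.
    q = {g: tf * idf.get(g, 0.0) for g, tf in Counter(char_ngrams(query, n)).items()}
    scored = [(cosine(q, vec), code) for code, vec in weighted.items()]
    return sorted(scored, reverse=True)

weighted, idf = build_index(CATALOG)
ranking = rank("akuter Herzinfarkt", weighted, idf)
```

Because matching happens on character fragments rather than whole words, the query can still score against a label that shares only parts of a compound (here the fragment "infarkt"), which is the property that makes such an approach language-independent and lexicon-free.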

Publication types

  • Comparative Study

MeSH terms

  • Abstracting and Indexing / methods*
  • Algorithms
  • Disease / classification*
  • Electronic Data Processing*
  • Germany
  • Humans
  • Information Storage and Retrieval / methods*
  • Vocabulary, Controlled