Automated coding of diagnoses--three methods compared

P Franz; A Zaiss; S Schulz; U Hahn; R Klar

Proc AMIA Symp. 2000:250-4.

Authors

P Franz¹, A Zaiss, S Schulz, U Hahn, R Klar

Affiliation

¹ Freiburg University Hospital, Department of Medical Informatics.

PMID: 11079883
PMCID: PMC2243719

Abstract

In Germany, new legal requirements have raised the importance of accurately encoding admission and discharge diagnoses for inpatients and outpatients. In response to the emerging need for computer-supported tools, we examined three methods for automated coding of German-language free-text diagnosis phrases. We compared a language-independent, lexicon-free n-gram approach with one that uses a dictionary of medical morphemes and refines the query by mapping it to SNOMED codes. Both techniques produced a ranked output of possible diagnoses within a vector space retrieval framework. The results did not reveal any significant difference between the two: the correct diagnosis was found in approximately 40% of cases for three-digit codes and 30% for four-digit codes. The lexicon-based method was then modified by replacing the vector space ranking with a heuristic approach that capitalizes on the semantic structure of SNOMED, which raised the number of correct diagnoses significantly (approximately 50% for three-digit codes and 40% for four-digit codes). We therefore conclude that lexicon-based retrieval methods do not perform better than lexicon-free ones unless conceptual knowledge is added.
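
To make the lexicon-free idea concrete, the following is a minimal sketch of character n-gram matching with vector space ranking. It is not the authors' implementation: the trigram size, the plain count weighting, the cosine measure, the function names (`char_ngrams`, `cosine`, `rank_codes`) and the three-entry toy catalogue are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(query, catalogue, n=3):
    """Return catalogue codes ordered by decreasing similarity to the query."""
    q = char_ngrams(query, n)
    scored = [(cosine(q, char_ngrams(label, n)), code)
              for code, label in catalogue.items()]
    # Sort by score (descending), breaking ties by code for determinism.
    return [code for _, code in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical toy catalogue: three-digit codes with illustrative German labels.
CATALOGUE = {
    "J18": "Pneumonie, Erreger nicht näher bezeichnet",
    "I10": "Essentielle Hypertonie",
    "E11": "Diabetes mellitus Typ 2",
}

if __name__ == "__main__":
    print(rank_codes("Hypertonie", CATALOGUE))
```

Because the sketch compares character sequences only, it needs no dictionary or morphological analysis, which is what makes such an approach language-independent. It also has no notion of meaning, so it cannot use the kind of conceptual knowledge the SNOMED-based heuristic draws on.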

Publication types

Comparative Study

MeSH terms

Abstracting and Indexing / methods*
Algorithms
Disease / classification*
Electronic Data Processing*
Germany
Humans
Information Storage and Retrieval / methods*
Vocabulary, Controlled