Automatic inference of ICD-10 codes from German ophthalmologic physicians' letters using natural language processing

D Böhringer; P Angelova; L Fuhrmann; J Zimmermann; M Schargus; N Eter; T Reinhard

doi:10.1038/s41598-024-59926-3

Sci Rep. 2024 Apr 19;14(1):9035. doi: 10.1038/s41598-024-59926-3.

Authors

D Böhringer¹, P Angelova², L Fuhrmann³, J Zimmermann⁴, M Schargus³, N Eter⁴, T Reinhard²

Affiliations

¹ Eye Center of the University Hospital Freiburg, Medical Faculty of the Albert-Ludwigs-University Freiburg, Freiburg, Germany. daniel.boehringer@uniklinik-freiburg.de.
² Eye Center of the University Hospital Freiburg, Medical Faculty of the Albert-Ludwigs-University Freiburg, Freiburg, Germany.
³ Department of Ophthalmology, Asklepios Hospital Nord-Heidberg, Hamburg, Germany.
⁴ Department of Ophthalmology, Medical Center, University of Münster, Münster, Germany.

Abstract

Physicians' letters are the optimal source of diagnoses for registries. However, most registries require diagnosis codes such as ICD-10. We describe an algorithm that infers ICD-10 codes from German ophthalmologic physicians' letters and assess it in three German eye hospitals. The algorithm is based on the nearest-neighbor method and on a large thesaurus for ICD-10 codes. This thesaurus was embedded into a Word2Vec space created from anonymized physicians' reports of the first hospital. For evaluation, each of the three hospitals submitted all diagnoses taken from 100 letters. The inferred ICD-10 codes were evaluated for correctness by the senders. A total of 3332 natural language terms were submitted (812 from hospital one, 1473 from hospital two, 1047 from hospital three). Of these, 526 non-diagnoses were excluded upfront, and ICD-10 codes were inferred for the remaining 2806 terms (771 from hospital one, 1226 from hospital two, 809 from hospital three). In the first hospital, 98% were fully correct and 99% were correct at the level of the superordinate disease concept. The corresponding percentages were 69% and 86% in hospital two and 69% and 91% in hospital three. Our simple method is capable of inferring ICD-10 codes for German natural language diagnoses, especially when the embedding space has been built with physicians' letters from the same hospital. The method may yield sufficient accuracy for many tasks in the multi-centric setting and can easily be adapted to other languages and specialties.

Keywords: Artificial intelligence; Clinical registries; Diagnosis coding; Natural language processing.
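
The nearest-neighbor lookup outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors below are invented stand-ins for a trained Word2Vec space, the thesaurus has only three illustrative entries, and averaging token vectors to embed a multi-word term is an assumption made for this example. The names `TOY_VECTORS`, `THESAURUS`, `embed`, and `infer_code` are hypothetical.

```python
import math

# Invented toy vectors standing in for a Word2Vec space trained on
# physicians' letters. The values are for illustration only.
TOY_VECTORS = {
    "katarakt": [1.0, 0.0, 0.0],
    "cataracta": [0.9, 0.1, 0.0],
    "senilis": [0.8, 0.2, 0.1],
    "glaukom": [0.0, 1.0, 0.0],
    "offenwinkelglaukom": [0.1, 0.9, 0.0],
    "makuladegeneration": [0.0, 0.0, 1.0],
    "amd": [0.1, 0.0, 0.9],
}

# Tiny illustrative thesaurus of (term, ICD-10 code) pairs.
# The thesaurus described in the abstract is far larger.
THESAURUS = [
    ("katarakt", "H25.9"),
    ("offenwinkelglaukom", "H40.1"),
    ("makuladegeneration", "H35.3"),
]


def embed(text):
    """Average the vectors of all known tokens (an assumption of this sketch).

    Returns None when no token of the text is in the vocabulary.
    """
    vectors = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def infer_code(term):
    """Return (ICD-10 code, similarity) of the nearest thesaurus entry, or None."""
    query = embed(term)
    if query is None:
        return None
    best = None
    for thesaurus_term, code in THESAURUS:
        reference = embed(thesaurus_term)
        if reference is None:
            continue  # thesaurus term has no vector in this toy space
        similarity = cosine(query, reference)
        if best is None or similarity > best[1]:
            best = (code, similarity)
    return best


if __name__ == "__main__":
    for diagnosis in ["Cataracta senilis", "AMD", "Glaukom", "unbekannt"]:
        print(diagnosis, "->", infer_code(diagnosis))
```

In this toy space, a diagnosis whose tokens are all out of vocabulary yields `None` rather than a forced guess. A real pipeline would need its own policy for such terms, which the abstract does not specify.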

MeSH terms

Hospitals
Humans
International Classification of Diseases*
Natural Language Processing
Physicians*
Registries