The ngram chief complaint classifier: a novel method of automatically creating chief complaint classifiers based on international classification of diseases groupings

J Biomed Inform. 2010 Apr;43(2):268-72. doi: 10.1016/j.jbi.2009.08.015. Epub 2009 Aug 27.

Abstract

Introduction: The ngram classifier is created by using text fragments to measure associations between chief complaints (CC) and a syndromic grouping of ICD-9-CM codes.
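
The abstract does not spell out how the text fragments are scored, so the following is only a minimal sketch of one plausible reading: split each chief complaint into overlapping character n-grams and, for every n-gram, record the fraction of training visits containing it whose ICD-9-CM code falls in the syndromic grouping. The n-gram length of 4, the function names, and the mean-based `score` are all assumptions made for illustration, not the authors' published method.

```python
from collections import defaultdict


def char_ngrams(text, n=4):
    """Return the set of overlapping character n-grams in a chief complaint.

    The text is lowercased and whitespace is collapsed first. Strings shorter
    than n are returned as a single fragment (an illustrative choice).
    """
    s = " ".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def train_ngram_probs(visits, n=4):
    """Estimate, for each n-gram, the fraction of visits in the syndrome.

    visits: iterable of (chief_complaint, in_syndrome) pairs, where
    in_syndrome is True when the visit's ICD-9-CM code is in the grouping.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for cc, in_syndrome in visits:
        for gram in char_ngrams(cc, n):
            total[gram] += 1
            if in_syndrome:
                positive[gram] += 1
    return {gram: positive[gram] / total[gram] for gram in total}


def score(cc, probs, n=4):
    """Hypothetical score: mean association over n-grams seen in training.

    Returns 0.0 when none of the complaint's n-grams were seen.
    """
    seen = [probs[g] for g in char_ngrams(cc, n) if g in probs]
    return sum(seen) / len(seen) if seen else 0.0
```

Because the fragments are taken directly from the raw text, nothing in this sketch depends on a dictionary or spelling list, which is consistent with the claim that the approach may be used independent of language or dialect.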

Objectives: For gastrointestinal (GI) syndrome, to determine (1) the sensitivity and specificity of the ngram CC classifier and (2) the daily volumes for the ngram CC and ICD-9-CM classifiers.

Design: Retrospective cohort.

Setting: 19 Emergency Departments.

Participants: Consecutive visits (1/1/2000-12/31/2005).

Protocol: (1) An existing ICD-9-CM filter for "lower GI" was used to create the ngram CC classifier from a training set; sensitivity and specificity were then measured in a test set using an ICD-9-CM classifier as the criterion. (2) Daily volumes based on ICD-9-CM were compared with those predicted by the ngram classifier.
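
Step (1) of the protocol treats the ICD-9-CM classifier as the criterion standard. A minimal sketch of that evaluation, assuming each test-set visit has already been reduced to a pair of booleans (ngram classifier positive, ICD-9-CM classifier positive), might look like this. The function name and input layout are illustrative assumptions.

```python
def sensitivity_specificity(predicted, criterion):
    """Compute (sensitivity, specificity) of predictions against a criterion.

    predicted: booleans from the classifier under test (here, the ngram CC
    classifier). criterion: booleans from the reference classifier (here,
    the ICD-9-CM grouping). Both sequences must be aligned by visit.
    """
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, criterion):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

If the classifier produces a continuous score, the booleans in `predicted` would come from thresholding that score, and moving the threshold trades sensitivity against specificity.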

Results: For a specificity of 0.96, sensitivity was 0.70. The daily volume correlation for ngram vs. ICD-9-CM was R=0.92.
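
The abstract reports the daily volume agreement as R=0.92 without naming the statistic. Assuming it is a Pearson correlation between two aligned series of daily visit counts, the calculation can be sketched as follows; the function name and inputs are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    xs and ys would be daily visit counts from the two classifiers,
    aligned so that position i in each refers to the same calendar day.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```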

Conclusion: The ngram CC classifier performed similarly to manually developed CC classifiers. It has the advantages of rapid, automated creation and updating, and may be used independent of language or dialect.

MeSH terms

  • Cohort Studies
  • Diagnosis
  • Disease Outbreaks / prevention & control
  • Disease Outbreaks / statistics & numerical data*
  • Emergency Service, Hospital
  • Epidemiologic Methods*
  • Gastrointestinal Diseases
  • Humans
  • Medical Informatics / methods*
  • Natural Language Processing*
  • Population Surveillance / methods*
  • Retrospective Studies
  • Sensitivity and Specificity