A comparison of classification algorithms to automatically identify chest X-ray reports that support pneumonia

J Biomed Inform. 2001 Feb;34(1):4-14. doi: 10.1006/jbin.2001.1000.


We compared the performance of expert-crafted rules, a Bayesian network, and a decision tree at automatically identifying chest X-ray reports that support acute bacterial pneumonia. We randomly selected 292 chest X-ray reports, 75 (25%) of which were from patients with a hospital discharge diagnosis of bacterial pneumonia. The reports were encoded by our natural language processor, and encoding mistakes were then corrected manually. Three expert systems analyzed the encoded observations to determine whether each report supported pneumonia. The reference standard for radiologic support of pneumonia was the majority vote of three physicians. We compared (a) the performance of the expert systems against each other and (b) the performance of the expert systems against that of four physicians who were not part of the reference standard. Output from the expert systems and the physicians was transformed so that comparisons could be made with both binary and probabilistic output. The comparison metrics for binary output were sensitivity (sens), precision (prec), and specificity (spec); the metric for probabilistic output was the area under the receiver operating characteristic (ROC) curve (Az). We used McNemar's test to determine statistical significance for binary output and univariate z-tests for probabilistic output. Performance of the expert systems for binary (probabilistic) output was as follows: rules (sens 0.92, prec 0.80, spec 0.86; Az 0.960); Bayesian network (sens 0.90, prec 0.72, spec 0.78; Az 0.945); decision tree (sens 0.86, prec 0.85, spec 0.91; Az 0.940). Comparisons of the expert systems against each other using binary output showed a significant difference between the rules and the Bayesian network and between the decision tree and the Bayesian network. Comparisons of the expert systems using probabilistic output showed no significant differences. Comparisons of binary output against physicians showed differences between the Bayesian network and two physicians. Comparisons of probabilistic output against physicians showed a difference between the decision tree and one physician. The expert systems performed similarly for the probabilistic output but differed in the sensitivity, precision, and specificity produced by the binary output. All three expert systems performed similarly to physicians.
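The binary and probabilistic metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and several choices are assumptions: the toy labels and scores are hypothetical, McNemar's test (with continuity correction) is applied to paired correctness against the reference standard, and the ROC area is the nonparametric Mann-Whitney estimate, which may differ from the reported Az values if those came from a fitted (for example, binormal) curve. The univariate z-test used to compare areas is not shown.

```python
import math
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is empty.
    return num / den if den else float("nan")


def binary_metrics(truth: Sequence[int], pred: Sequence[int]) -> dict:
    """Sensitivity, precision (positive predictive value) and specificity."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return {
        "sens": _ratio(tp, tp + fn),
        "prec": _ratio(tp, tp + fp),
        "spec": _ratio(tn, tn + fp),
    }


def mcnemar(truth: Sequence[int], pred_a: Sequence[int],
            pred_b: Sequence[int]) -> tuple[float, float]:
    """McNemar's chi-square test (continuity-corrected) on paired classifiers.

    Assumption: discordance is defined by correctness against the reference
    standard. b = A correct and B wrong; c = A wrong and B correct.
    """
    b = sum(1 for t, a, p in zip(truth, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(truth, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square with 1 df: P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def auc(truth: Sequence[int], scores: Sequence[float]) -> float:
    """Nonparametric ROC area: fraction of (positive, negative) pairs ranked
    correctly, with ties counted as one half (Mann-Whitney statistic)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data (not from the study): 1 = report supports pneumonia.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    pred_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # binary output of one classifier
    pred_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]  # binary output of another
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.5, 0.05]

    m = binary_metrics(truth, pred_a)
    print({k: round(v, 3) for k, v in m.items()})  # {'sens': 0.75, 'prec': 0.75, 'spec': 0.833}
    stat, p = mcnemar(truth, pred_a, pred_b)
    print(f"McNemar chi2={stat:.3f}, p={p:.3f}")
    print(f"AUC={auc(truth, scores):.3f}")  # AUC=0.917
```

Because McNemar's test uses only the discordant pairs, two classifiers with similar overall accuracy can still differ significantly if they err on different reports, which is one way binary comparisons can disagree with comparisons of ROC area.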

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Acute Disease
  • Algorithms*
  • Bayes Theorem
  • Classification
  • Decision Trees
  • Diagnosis, Computer-Assisted*
  • Expert Systems
  • Humans
  • Natural Language Processing
  • Pneumonia, Bacterial / diagnosis*
  • Pneumonia, Bacterial / diagnostic imaging*
  • Radiography, Thoracic