Our objectives are to quickly interpret symptoms of emergency patients to identify likely syndromes and to improve population-wide disease outbreak detection. We constructed a database of 248 syndromes, each syndrome having an estimated probability of producing any of 85 symptoms, with some two-way, three-way, and five-way probabilities reflecting correlations among symptoms. Using these multi-way probabilities in conjunction with an iterative proportional fitting algorithm allows estimation of full conditional probabilities. Combining these conditional probabilities with misdiagnosis error rates and incidence rates via Bayes theorem, the probability of each syndrome is estimated. We tested a prototype of computer-aided differential diagnosis (CADDY) on simulated data and on more than 100 real cases, including West Nile Virus, Q fever, SARS, anthrax, plague, tularaemia and toxic shock cases. We conclude that: (1) it is important to determine whether the unrecorded positive status of a symptom means that the status is negative or that the status is unknown; (2) inclusion of misdiagnosis error rates produces more realistic results; (3) the naive Bayes classifier, which assumes all symptoms behave independently, is slightly outperformed by CADDY, which includes available multi-symptom information on correlations; as more information regarding symptom correlations becomes available, the advantage of CADDY over the naive Bayes classifier should increase; (4) overlooking low-probability, high-consequence events is less likely if the standard output summary is augmented with a list of rare syndromes that are consistent with observed symptoms, and (5) accumulating patient-level probabilities across a larger population can aid in biosurveillance for disease outbreaks.
c 2007 John Wiley & Sons, Ltd.