Assessing predictive accuracy: how to compare Brier scores

J Clin Epidemiol. 1991;44(11):1141-6. doi: 10.1016/0895-4356(91)90146-z.


Several investigators have used the Brier index to measure the predictive accuracy of a set of medical judgments; the Brier scores of different raters who have evaluated the same patients provide a measure of relative accuracy. However, such comparisons may be difficult to interpret because of the lack of a statistical test for differentiating between two Brier scores. To demonstrate a method for addressing this issue, we analyzed the judgments of five medical students, each of whom independently evaluated the same 25 patients with recurrent chest pain. Using the method, we determined that two of the students gave judgments that were incompatible with the actual observed outcomes (p < 0.05); among the three remaining students, we detected a significant difference between two (p < 0.05). These results differed from receiver operating characteristic curve area analysis, another technique used to evaluate predictive accuracy. We suggest that the proposed method can provide a useful tool for investigators using the Brier index to compare how well clinicians express uncertainty using probability judgments.
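For readers unfamiliar with the two accuracy measures named above, the following minimal Python sketch computes a Brier score (the mean squared difference between probability judgments and 0/1 outcomes) and a rank-based ROC curve area for two raters. It is an illustration only: the data are hypothetical, the function names `brier_score` and `roc_area` are our own, and the sketch does not implement the statistical test proposed in the paper, which the abstract does not describe.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between probability judgments and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def roc_area(probabilities, outcomes):
    """ROC curve area as the fraction of (positive, negative) pairs in which
    the positive case received the higher probability; ties count as 0.5."""
    positives = [p for p, o in zip(probabilities, outcomes) if o == 1]
    negatives = [p for p, o in zip(probabilities, outcomes) if o == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical data: four patients and two raters (not taken from the study).
outcomes = [1, 0, 1, 0]
rater_a = [0.9, 0.1, 0.8, 0.2]    # confident judgments
rater_b = [0.6, 0.4, 0.55, 0.45]  # same ranking, less extreme judgments

for name, judgments in [("A", rater_a), ("B", rater_b)]:
    print(name, brier_score(judgments, outcomes), roc_area(judgments, outcomes))
```

In this made-up example both raters rank the patients identically, so their ROC curve areas are equal, while their Brier scores differ because the Brier score also reflects how far each probability lies from the observed outcome. This is one way the two measures can disagree.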

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Chest Pain / diagnosis
  • Coronary Angiography
  • Diagnosis*
  • Electrocardiography
  • Humans
  • Judgment
  • Probability*
  • ROC Curve
  • Recurrence