The use of "overall accuracy" to evaluate the validity of screening or diagnostic tests

J Gen Intern Med. 2004 May;19(5 Pt 1):460-5. doi: 10.1111/j.1525-1497.2004.30091.x.


Objective: Evaluations of screening or diagnostic tests sometimes incorporate measures of overall accuracy, diagnostic accuracy, or test efficiency. These terms refer to a single summary measure, calculated from a 2 x 2 contingency table, that represents the overall probability that a patient will be correctly classified by a screening or diagnostic test. We assessed the value of overall accuracy in studies of test validity, a topic that has not received adequate emphasis in the clinical literature.
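To make the 2 x 2 calculation concrete, here is a minimal Python sketch. It assumes the usual cell labels (TP, FP, FN, TN for test result against the reference standard), and the class name `TwoByTwo` and the counts are hypothetical illustrations, not data from the studies reviewed. Overall accuracy is simply the proportion of all patients who fall in the two "correct" cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for a 2 x 2 table: test result vs. reference standard."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent

    @property
    def n(self) -> int:
        # Total number of patients in the table
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        # Proportion of patients with the condition who test positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of patients without the condition who test negative
        return self.tn / (self.tn + self.fp)

    def overall_accuracy(self) -> float:
        # Proportion of all patients correctly classified (TP + TN) / N
        return (self.tp + self.tn) / self.n


# Hypothetical counts: 100 of 1000 patients have the condition (10% prevalence)
table = TwoByTwo(tp=90, fp=360, fn=10, tn=540)
print(table.sensitivity(), table.specificity(), table.overall_accuracy())  # → 0.9 0.6 0.63
```

In this made-up example a single figure of 0.63 hides the fact that the test detects 90% of cases but correctly clears only 60% of non-cases.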

Design: Guided by previous reports, we summarize the issues concerning the use of overall accuracy. To document its use in contemporary studies, we searched the clinical literature for test evaluation studies published from 2000 to 2002 that reported overall accuracy derived from a 2 x 2 contingency table.

Measurements and main results: Overall accuracy is the weighted average of a test's sensitivity and specificity, where sensitivity is weighted by prevalence and specificity is weighted by the complement of prevalence. Overall accuracy becomes particularly problematic as a measure of validity when 1) the difference between sensitivity and specificity increases and/or 2) the prevalence moves away from 50%. Both situations lead to an increasing deviation between overall accuracy and either sensitivity or specificity. A summary of results from published studies (N = 25) illustrated that the prevalence-dependent nature of overall accuracy has potentially negative consequences that can lead to a distorted impression of the validity of a screening or diagnostic test.
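The weighted-average form follows directly from the 2 x 2 table: with prevalence p = (TP + FN)/N, overall accuracy (TP + TN)/N equals p x sensitivity + (1 - p) x specificity. The short Python sketch below applies that identity to a hypothetical test (sensitivity 0.90, specificity 0.60; the values and the function name `overall_accuracy` are illustrative assumptions, not results from the paper) to show how the single number drifts with prevalence while sensitivity and specificity stay fixed.

```python
def overall_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Prevalence-weighted average of sensitivity and specificity."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    # Sensitivity is weighted by prevalence, specificity by its complement
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Hypothetical test with a large gap between sensitivity and specificity
SE, SP = 0.90, 0.60
for p in (0.05, 0.50, 0.95):
    print(f"prevalence={p:.2f}  accuracy={overall_accuracy(SE, SP, p):.3f}")
# prevalence=0.05  accuracy=0.615
# prevalence=0.50  accuracy=0.750
# prevalence=0.95  accuracy=0.885
```

At 5% prevalence the reported accuracy sits close to specificity and says little about sensitivity; at 95% it approaches sensitivity and masks the weaker specificity. Only when sensitivity equals specificity does overall accuracy become independent of prevalence.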

Conclusions: Despite the intuitive appeal of overall accuracy as a single measure of test validity, its dependence on prevalence renders it inferior to the careful and balanced consideration of sensitivity and specificity.

Publication types

  • Research Support, U.S. Gov't, P.H.S.
  • Review
  • Validation Study

MeSH terms

  • Diagnostic Techniques and Procedures / standards*
  • Humans
  • Mass Screening / standards*
  • Reference Standards
  • Reproducibility of Results
  • Sensitivity and Specificity