Chapter 9: Options for summarizing medical test performance in the absence of a "gold standard"

J Gen Intern Med. 2012 Jun;27 Suppl 1(Suppl 1):S67-75. doi: 10.1007/s11606-012-2031-7.


The classical paradigm for evaluating test performance compares the results of an index test with those of a reference test. When the reference test does not mirror the "truth" adequately (i.e., it is an "imperfect" reference standard), the typical ("naïve") estimates of sensitivity and specificity are biased. A systematic review of test performance has at least four options when the reference standard is "imperfect": (a) to forgo the classical paradigm and assess the index test's ability to predict patient-relevant outcomes instead of test accuracy (i.e., treat the index test as a predictive instrument); (b) to assess whether the results of the two tests (index and reference) agree or disagree (i.e., treat them as two alternative measurement methods); (c) to calculate "naïve" estimates of the index test's sensitivity and specificity from each study included in the review and discuss in which direction they are biased; (d) to mathematically adjust the "naïve" estimates of sensitivity and specificity of the index test to account for the imperfect reference standard. We discuss these options and illustrate some of them through examples.
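Options (c) and (d) can be illustrated with a minimal sketch. The abstract does not say which adjustment the chapter uses, so the code below shows one well-known algebraic correction and should not be read as the authors' method. It rests on two assumptions introduced here: the sensitivity and specificity of the reference test are known, and the errors of the index and reference tests are conditionally independent given true disease status. The names `TwoByTwo`, `naive_estimates`, and `adjusted_estimates` are hypothetical, and the example counts are constructed for illustration, not taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of an index test (T) against a reference test (R)."""
    a: float  # T+ and R+
    b: float  # T+ and R-
    c: float  # T- and R+
    d: float  # T- and R-

    @property
    def n(self) -> float:
        return self.a + self.b + self.c + self.d


def naive_estimates(t: TwoByTwo) -> tuple[float, float]:
    """Naive (sensitivity, specificity): treats the reference as error-free."""
    return t.a / (t.a + t.c), t.d / (t.b + t.d)


def adjusted_estimates(t: TwoByTwo, se_ref: float, sp_ref: float) -> tuple[float, float]:
    """Adjusted (sensitivity, specificity) of the index test.

    ASSUMPTIONS (not from the abstract): se_ref and sp_ref are known, and
    index and reference errors are conditionally independent given the
    true disease status.
    """
    if se_ref + sp_ref <= 1:
        raise ValueError("reference test must be informative (se_ref + sp_ref > 1)")
    ref_pos = t.a + t.c
    # Both denominators are proportional to the implied prevalence and its
    # complement; they must be positive for the correction to be defined.
    se_den = t.n * (sp_ref - 1) + ref_pos
    sp_den = t.n * se_ref - ref_pos
    if se_den <= 0 or sp_den <= 0:
        raise ValueError("table is inconsistent with the assumed reference accuracy")
    se = ((t.a + t.b) * sp_ref - t.b) / se_den
    sp = ((t.c + t.d) * se_ref - t.c) / sp_den
    return se, sp


# Illustrative counts, constructed (not observed) from: prevalence 0.20,
# index test Se 0.90 / Sp 0.80, reference test Se 0.95 / Sp 0.90.
table = TwoByTwo(a=187, b=153, c=83, d=577)
naive = naive_estimates(table)
adjusted = adjusted_estimates(table, se_ref=0.95, sp_ref=0.90)
```

For this constructed table the naïve estimates are about 0.69 (sensitivity) and 0.79 (specificity), while the adjusted estimates recover the values of 0.90 and 0.80 used to build it. Under conditional independence the naïve estimates are therefore biased downward, which is the kind of directional statement option (c) calls for. When the two tests' errors are correlated, this correction no longer applies and the bias can go in either direction.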

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Diagnostic Techniques and Procedures / standards*
  • Evidence-Based Medicine / methods
  • Evidence-Based Medicine / standards
  • Guidelines as Topic*
  • Humans
  • Meta-Analysis as Topic*
  • Outcome and Process Assessment, Health Care / methods
  • Outcome and Process Assessment, Health Care / standards
  • Reference Standards
  • Reproducibility of Results
  • Review Literature as Topic*
  • Sensitivity and Specificity