Measuring Agreement in Diagnostics: A Practical Guide for Researchers

Stat Med. 2025 Oct;44(23-24):e70299. doi: 10.1002/sim.70299.

Abstract

Healthcare professionals routinely perform clinical examinations and diagnostic assessments, and how the findings of these assessments are interpreted can have significant implications for patient care and outcomes. A recent systematic review of reliability and agreement studies in intrapartum fetal heart rate monitoring highlighted three methodological issues: (1) confusion between the concepts of agreement and reliability, (2) lack of clarity on how agreement and reliability measures are calculated when more than two raters are involved, and (3) infrequent reporting of confidence intervals. This paper aims to clarify how agreement measures can be computed and interpreted when the outcome is binary (e.g., a normal/abnormal test result). Using a motivating example in which five experienced obstetricians assessed 20 cardiotocograms (CTGs), we demonstrate how agreement can be defined, computed, and interpreted in various scenarios. The paper further explains the relationship between agreement measures and the concept of reliability, the distinction between intra- and inter-observer studies, and approaches to statistical inference and sample size calculation. Particular emphasis is placed on the proportion of agreement, the proportion of specific agreement, and kappa coefficients. A Shiny application has also been developed to support researchers in their agreement studies. This work complements existing tools such as the Guidelines for Reporting Reliability and Agreement Studies (GRRAS), the Quality Appraisal Tool for Studies of Diagnostic Reliability (QAREL), and the STARD guidelines for reporting diagnostic accuracy studies. It is intended to help researchers improve the methodological quality of studies that evaluate the agreement of clinical tests.
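To make the measures named in the abstract concrete, here is a minimal sketch for a binary outcome with several raters, each rating every subject once. It assumes pairwise definitions: the proportion of agreement is averaged over all rater pairs, the proportions of specific positive and negative agreement use one common pairwise generalisation (which reduces to 2a/(2a+b+c) and 2d/(2d+b+c) for two raters), and Fleiss' kappa stands in for "kappa coefficients". The abstract does not say which kappa or which interval method the paper recommends. The percentile bootstrap over subjects is one option among several. All function names and the toy data are hypothetical; the data are not the study's 20 CTGs.

```python
import math
import random
from math import comb

# Hypothetical toy data (NOT the study's data): rows are subjects (e.g. CTGs),
# columns are raters; 1 = abnormal, 0 = normal.
RATINGS = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
]


def _positive_counts(ratings: list[list[int]]) -> tuple[int, list[int]]:
    """Return the number of raters m and the count of 1-ratings per subject."""
    m = len(ratings[0])
    if m < 2 or any(len(row) != m for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    return m, [sum(row) for row in ratings]


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when the denominator is zero (measure undefined)."""
    return num / den if den else math.nan


def proportion_agreement(ratings: list[list[int]]) -> float:
    """Overall proportion of agreement, averaged over all rater pairs and subjects."""
    m, xs = _positive_counts(ratings)
    # A subject with x positive ratings has C(x,2) + C(m-x,2) agreeing pairs.
    agreeing_pairs = sum(comb(x, 2) + comb(m - x, 2) for x in xs)
    return agreeing_pairs / (len(xs) * comb(m, 2))


def specific_agreement(ratings: list[list[int]]) -> tuple[float, float]:
    """Proportions of specific positive and negative agreement over rater pairs."""
    m, xs = _positive_counts(ratings)
    pos = _ratio(sum(x * (x - 1) for x in xs), sum(x * (m - 1) for x in xs))
    neg = _ratio(sum((m - x) * (m - x - 1) for x in xs),
                 sum((m - x) * (m - 1) for x in xs))
    return pos, neg


def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa for a binary outcome: chance-corrected pairwise agreement."""
    m, xs = _positive_counts(ratings)
    p_abnormal = sum(xs) / (len(xs) * m)
    p_chance = p_abnormal ** 2 + (1 - p_abnormal) ** 2
    return _ratio(proportion_agreement(ratings) - p_chance, 1 - p_chance)


def bootstrap_ci(ratings, stat, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for `stat`, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(ratings)
    draws = (stat([ratings[rng.randrange(n)] for _ in range(n)])
             for _ in range(n_boot))
    boots = sorted(v for v in draws if not math.isnan(v))  # drop undefined resamples
    if not boots:
        return math.nan, math.nan
    k = len(boots)
    return boots[int((1 - level) / 2 * k)], boots[int((1 + level) / 2 * k) - 1]


if __name__ == "__main__":
    pos, neg = specific_agreement(RATINGS)
    lo, hi = bootstrap_ci(RATINGS, proportion_agreement)
    print(f"proportion of agreement   = {proportion_agreement(RATINGS):.3f}")
    print(f"specific agreement (+/-)  = {pos:.3f} / {neg:.3f}")
    print(f"Fleiss' kappa             = {fleiss_kappa(RATINGS):.3f}")
    print(f"95% bootstrap CI (p_o)    = ({lo:.3f}, {hi:.3f})")
```

For the toy data this gives a proportion of agreement of 0.750, specific positive and negative agreement of 0.700 and 0.786, and a Fleiss' kappa of 0.486. Because the positive and negative counts pool to the overall count, the proportion of agreement equals (14 + 22) / (20 + 28) here. This shows why reporting both specific proportions adds information beyond the single overall figure.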

Keywords: clinical test; concordance; error; interobserver; intraobserver; observer variation; reliability; repeatability; reproducibility of results.

MeSH terms

  • Cardiotocography* / statistics & numerical data
  • Data Interpretation, Statistical
  • Female
  • Heart Rate, Fetal
  • Humans
  • Observer Variation
  • Pregnancy
  • Reproducibility of Results