Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement Instruments) checklist

Lidwine B Mokkink; Caroline B Terwee; Elizabeth Gibbons; Paul W Stratford; Jordi Alonso; Donald L Patrick; Dirk L Knol; Lex M Bouter; Henrica C W de Vet

doi:10.1186/1471-2288-10-82

BMC Med Res Methodol. 2010 Sep 22:10:82. doi: 10.1186/1471-2288-10-82.

Authors

Lidwine B Mokkink¹, Caroline B Terwee, Elizabeth Gibbons, Paul W Stratford, Jordi Alonso, Donald L Patrick, Dirk L Knol, Lex M Bouter, Henrica C W de Vet

Affiliation

¹ Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands. w.mokkink@vumc.nl

Abstract

Background: The COSMIN checklist is a tool for evaluating the methodological quality of studies on measurement properties of health-related patient-reported outcomes. The aim of this study is to determine the inter-rater agreement and reliability of each of the 114 item scores of the COSMIN checklist.

Methods: 75 articles evaluating measurement properties were randomly selected from the bibliographic database compiled by the Patient-Reported Outcome Measurement Group, Oxford, UK. Raters were asked to assess the methodological quality of three articles, using the COSMIN checklist. In a one-way design, percentage agreement and intraclass kappa coefficients or quadratic-weighted kappa coefficients were calculated for each item.
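As an illustration of the two statistics named in the Methods, the following is a minimal sketch for the simplified case of exactly two raters scoring the same articles on one ordinal item. It is not the authors' computation: the study used a one-way design with a varying number of raters per article, and the function names and the coding of response options as integers 0..k-1 are assumptions made here for illustration.

```python
from collections import Counter


def percentage_agreement(ratings_a, ratings_b):
    """Percentage of articles on which two raters chose the same option."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of articles")
    matches = sum(x == y for x, y in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Quadratic-weighted kappa for two raters.

    Assumes response options are coded as integers 0..n_categories-1
    (a hypothetical coding chosen for this sketch).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of articles")
    if n_categories < 2:
        raise ValueError("at least two response options are required")
    n = len(ratings_a)
    observed = Counter(zip(ratings_a, ratings_b))  # joint counts per option pair
    marginal_a = Counter(ratings_a)
    marginal_b = Counter(ratings_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the squared distance between options.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * (marginal_a[i] / n) * (marginal_b[j] / n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single option")
    return 1.0 - weighted_observed / weighted_expected
```

With identical ratings from both raters the functions return 100.0 and 1.0; with ratings that match only as often as chance predicts, the kappa is 0.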

Results: 88 raters participated. Of the 75 selected articles, 26 articles were rated by four to six participants, and 49 by two or three participants. Overall, percentage agreement was appropriate (68% of the items had above 80% agreement), but the kappa coefficients for the COSMIN items were low (61% were below 0.40 and 6% were above 0.75). Reasons for low inter-rater agreement were the need for subjective judgement and raters being accustomed to different standards, terminology and definitions.

Conclusions: The results indicate that raters often chose the same response option, but that it is difficult at the item level to distinguish between articles. When using the COSMIN checklist in a systematic review, we recommend getting some training and experience, having two independent raters complete it, and reaching consensus on one final rating. The instructions for using the checklist have been improved.
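The combination of high percentage agreement and low kappa can be illustrated with a small worked example. The sketch below uses invented ratings (not data from the study) for two hypothetical raters and an unweighted Cohen's kappa, which is a simpler statistic than the intraclass kappa used in the paper. When nearly every article receives the same response option, chance agreement is already very high, so kappa stays low even though the raters rarely disagree.

```python
from collections import Counter


def agreement_and_kappa(ratings_a, ratings_b):
    """Return (observed agreement, unweighted Cohen's kappa) for two raters."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both raters must score the same non-empty set of articles")
    p_observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    options = set(ratings_a) | set(ratings_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in options) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single option")
    return p_observed, (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings for 20 articles on one yes/no item (invented for illustration).
rater_a = ["yes"] * 19 + ["no"]
rater_b = ["yes"] * 18 + ["no", "yes"]

p_obs, kappa = agreement_and_kappa(rater_a, rater_b)
print(f"agreement = {p_obs:.0%}, kappa = {kappa:.3f}")
```

Here the raters agree on 18 of 20 articles (90%), yet kappa is slightly below zero because almost all ratings fall in one option, leaving little variation between articles for the raters to distinguish.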

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Checklist* / methods
Data Interpretation, Statistical
Health Status Indicators*
Observer Variation
Outcome Assessment, Health Care / standards*
Periodicals as Topic / standards*
Qualitative Research
Reproducibility of Results