Interrater reliability of scoring of pain drawings in a self-report health survey

Spine (Phila Pa 1976). 2005 Aug 15;30(16):E455-8. doi: 10.1097/


Study design: Study of interrater reliability.

Objective: To assess the interrater reliability of data from pain drawings scored by multiple raters and the consistency of the subsequent classification of cases of widespread pain.

Summary of background data: In large health surveys, pain drawings are used to capture self-reported pain and to classify cases of widespread pain, and they are often scored by several raters. The reliability of scoring of pain drawings by multiple raters has not been investigated.

Methods: As part of a postal survey sent to adults 50 years and older, subjects were asked to shade their pain on a blank body manikin. The first 50 pain drawings in which respondents had shaded pain were selected for this study. Eight nonclinical staff were trained to score pain drawings using transparent templates divided into 50 body areas. Interrater reliability was assessed by comparing the scoring of "pain" or "no pain" for all 50 areas of each pain drawing.

Results: For every body area, all raters agreed completely on the scoring of at least 78% of pain drawings (kappa > 0.60). In 42 of the 50 areas, the raters agreed completely on 90% or more of pain drawings. Based on the raters' scoring of pain areas, there was complete agreement on the presence or absence of widespread pain for 49 of 50 pain drawings (98% agreement, kappa = 0.98).
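To illustrate the kind of statistics reported above, the following is a minimal sketch of how complete agreement and a multi-rater kappa might be computed for a single body area. The abstract does not say which kappa statistic was used; Fleiss' kappa is assumed here purely for illustration, and the function names and toy ratings are hypothetical, not the study's data.

```python
import math
from typing import List

# ratings[d][r] is 1 ("pain") or 0 ("no pain"): the score given by rater r
# to one body area on pain drawing d. Hypothetical layout for illustration.


def complete_agreement(ratings: List[List[int]]) -> float:
    """Fraction of drawings on which every rater gave the same score."""
    agreed = sum(1 for row in ratings if len(set(row)) == 1)
    return agreed / len(ratings)


def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Fleiss' kappa for binary scores from a fixed number of raters.

    Assumption: the source does not name its kappa statistic; Fleiss' kappa
    is one common choice when there are more than two raters.
    """
    n_drawings = len(ratings)
    n_raters = len(ratings[0])

    # Overall proportion of "pain" scores, used for chance agreement.
    p_pain = sum(sum(row) for row in ratings) / (n_drawings * n_raters)
    p_chance = p_pain ** 2 + (1 - p_pain) ** 2

    # Mean proportion of agreeing rater pairs per drawing.
    p_observed = 0.0
    for row in ratings:
        k = sum(row)  # raters scoring "pain"
        m = n_raters - k  # raters scoring "no pain"
        p_observed += (k * (k - 1) + m * (m - 1)) / (n_raters * (n_raters - 1))
    p_observed /= n_drawings

    if p_chance == 1.0:
        # Every score falls in one category, so kappa is undefined.
        return math.nan
    return (p_observed - p_chance) / (1 - p_chance)


# Toy data: 4 drawings scored by 8 raters (invented, not from the study).
toy = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],  # one rater disagrees
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(f"complete agreement: {complete_agreement(toy):.2f}")
print(f"Fleiss' kappa: {fleiss_kappa(toy):.2f}")
```

In a setting like the one described, such a calculation would be repeated for each of the 50 body areas and once more for the derived widespread pain classification.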

Conclusions: This study shows that multiple raters, with training and guidelines, can reliably score pain drawings, and high consistency in the subsequent classification of cases of widespread pain can be obtained from such data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Art*
  • Health Surveys
  • Humans
  • Manikins
  • Middle Aged
  • Observer Variation
  • Pain / classification*
  • Pain / physiopathology*
  • Pain Measurement / methods*