Background: We present a fully articulated protocol for the Hamilton Rating Scale for Depression (HAM-D), including item scoring rules, rater training procedures, and a data management algorithm to increase accuracy of scores prior to outcome analyses. The latter involves identifying potentially inaccurate scores as interviews with discrepancies between two independent raters on the basis of either scores >=5-point difference) or meeting threshold for depression recurrence status, a long-term treatment outcome with public health significance. Discrepancies are resolved by assigning two new raters, identifying items with disagreement per an algorithm, and reaching consensus on the most accurate scores for those items.
Methods: These methods were applied in a clinical trial where the primary outcome was the Structured Interview Guide for the Hamilton Rating Scale for Depression-Seasonal Affective Disorder version (SIGH-SAD), which includes the 21-item HAM-D and 8 items assessing atypical symptoms. 177 seasonally depressed adult patients were enrolled and interviewed at 10 time points across treatment and the 2-year followup interval for a total of 1589 completed interviews with 1535 (96.6%) archived.
Results: Inter-rater reliability ranged from ICCs of .923-.967. Only 86 (5.6%) interviews met criteria for a between-rater discrepancy. HAM-D items "Depressed Mood", "Work and Activities", "Middle Insomnia", and "Hypochondriasis" and Atypical items "Fatigability" and "Hypersomnia" contributed most to discrepancies.
Limitations: Generalizability beyond well-trained, experienced raters in a clinical trial is unknown.
Conclusions: Researchers might want to consider adopting this protocol in part or full. Clinicians might want to tailor it to their needs.
Keywords: Depression assessment; Hamilton Rating Scale; Inter-rater reliability; Rater training; Scoring rules.
Copyright © 2016 Elsevier B.V. All rights reserved.