The National Institutes of Health (NIH) Oral chronic Graft-versus-Host Disease (cGVHD) Activity Assessment Instrument is intended to be simple to use and to provide a reproducible objective measure of disease activity over time. The objective of this study was to assess inter- and intraobserver variability in the component and composite scores in patients evaluated with oral cGVHD. Twenty-four clinicians (bone marrow transplant [BMT] oncologists [BMTE], n = 16; BMT midlevel providers [BMT MLP], n = 4; and oral medicine experts [OME], n = 4) from 6 major transplant centers scored high-quality intraoral photographs of 12 patients. The same photographs were evaluated 1 week later by the same evaluators. An intraclass correlation coefficient (ICC) was used to calculate intrarater reliability, and interrater agreement was analyzed using a weighted kappa statistic: 0 ≤ kappa ≤ 0.20 = poor, 0.21 ≤ kappa ≤ 0.40 = fair, 0.41 ≤ kappa ≤ 0.60 = moderate, 0.61 ≤ kappa ≤ 0.80 = good, 0.81 ≤ kappa ≤ 1.00 = very good. Data on participant experiences and demographics were also collected. Mean interrater reliability for each element was poor to moderate (range: 0.15-0.46). Overall mean kappa scores were highest for ulcers (0.46), followed by erythema (0.23), and lowest for lichenoid (0.15) and mucoceles (0.14). Kappa scores were higher in OME compared with BMTE and BMT MLP in ulcers and erythema (eg, 0.85, 0.44, 0.33 for ulcers, respectively), but similar in lichenoid and mucoceles. Overall intrarater reliability in all groups was very good (≥0.90) and highest for ulcers (0.97, 0.85, 0.94). Although 75% of OME were comfortable with their abilities to score the cases, approximately 50% of BMTE and BMT MLP were uncomfortable. The majority felt that their evaluations were accurate; however, 84% agreed that formal training is required. Interrater variability of the oral cGVHD instrument is unacceptable for the purposes of clinical trials.
Greater concordance among OME, high intrarater reliability, and participant feedback suggest that formal training may significantly decrease variability. Parallel investigations must be completed using the other organ-specific instruments prior to any revision and widespread prospective utilization of these tools as research endpoints.
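As an illustration of the agreement statistic and interpretation bands described above, the following is a minimal sketch of a weighted kappa for two raters scoring the same cases on an ordinal scale. The abstract does not state the weighting scheme or how agreement was pooled across the 24 raters, so the linear disagreement weights, the pairwise two-rater form, the treatment of negative values as "poor", and the function names `weighted_kappa` and `interpret_kappa` are all assumptions made for illustration, not the study's method.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's weighted kappa for two raters on ordinal scores 0..n_categories-1.

    Assumption: linear disagreement weights |i - j| / (k - 1); the source
    abstract does not specify which weights were used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    n = len(rater_a)
    k = n_categories

    def weight(i, j):
        # 0 for exact agreement, 1 for maximal disagreement
        return abs(i - j) / (k - 1)

    # Mean observed disagreement over the paired ratings
    observed = sum(weight(a, b) for a, b in zip(rater_a, rater_b)) / n

    # Disagreement expected by chance from each rater's marginal counts
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(
        weight(i, j) * count_a[i] * count_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if expected == 0:
        # Both raters used one identical category for every case
        return 1.0
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Map kappa to the bands quoted in the abstract.

    Assumption: values below 0 are reported as "poor"; the abstract
    defines bands only for 0 <= kappa <= 1.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"
```

Applied to the mean values reported above, `interpret_kappa(0.46)` places ulcers in the moderate band and `interpret_kappa(0.15)` places lichenoid changes in the poor band.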
Copyright (c) 2010 American Society for Blood and Marrow Transplantation. Published by Elsevier Inc. All rights reserved.