Purpose: To evaluate the within- and between-reader reliability and the interrelation between 2 methods of grading meibography images.
Methods: A video meibography sequence (1200 frames) was captured from 290 patients using near-infrared light (650-700 nm) and a near-infrared CCD camera. One frame was selected for grading by 2 masked readers using 2 scales; the first reader graded the image on 2 occasions and the second reader graded it on 1 occasion. The first grading scale was a gestalt assessment (categorically graded) of partial meibomian glands within the image. The second was a count of individual whole glands. Within- and between-reader reliability and concurrent validity between the scales were examined.
Results: Within-reader reliability of the gestalt scale was moderate to high (simple kappa = 0.78, 95% confidence interval [CI] = 0.71-0.85 and weighted kappa = 0.91, 95% CI = 0.88-0.95). Within-reader reliability of individual gland counting was moderate via a 95% limits of agreement analysis (-2.84 to 2.76 glands). Between-reader reliability of the gestalt scale was fair (simple kappa = 0.38, 95% CI = 0.30-0.46 and weighted kappa = 0.57, 95% CI = 0.47-0.68). Between-reader reliability of gland counting was fair via a 95% limits of agreement analysis (-4.46 to 5.08 glands). There was a strong relation between the gestalt scale and gland counting, indicating good concurrent validity (Z = -15.15, P < 0.0001).
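For readers less familiar with the agreement statistics reported in the Results, the following is a minimal Python sketch of how simple and weighted kappa and 95% limits of agreement are commonly computed. It is an illustration, not the analysis code used in the study. The function names (`cohen_kappa`, `limits_of_agreement`) are hypothetical, linear disagreement weights are an assumption (the weighting scheme is not stated here), and the limits of agreement use the usual mean difference plus or minus 1.96 standard deviations.

```python
from statistics import mean, stdev


def cohen_kappa(grades_a, grades_b, categories, weighted=False):
    """Cohen's kappa for two equal-length lists of ordinal grades.

    categories: the ordered list of possible grades (at least 2).
    weighted=True applies linear disagreement weights, which is an
    assumption; the weighting scheme is not specified in the text.
    Undefined (division by zero) if all grades fall in one category.
    """
    n = len(grades_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (first grading, second grading).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(grades_a, grades_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions, used for chance-expected agreement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, larger further away.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(weight(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


def limits_of_agreement(counts_a, counts_b):
    """95% limits of agreement for paired counts (e.g., gland counts)."""
    diffs = [a - b for a, b in zip(counts_a, counts_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Toy data for illustration only; these are not study data.
    first = [0, 1, 2, 3, 1, 2]
    second = [0, 1, 2, 2, 1, 3]
    print(cohen_kappa(first, second, [0, 1, 2, 3]))
    print(cohen_kappa(first, second, [0, 1, 2, 3], weighted=True))
    print(limits_of_agreement([10, 12, 14], [9, 12, 15]))
```

Weighted kappa gives partial credit when two gradings differ by only one category, which is why it is typically higher than simple kappa on an ordinal scale such as the gestalt grade.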
Conclusions: These methods of grading meibography images demonstrate good within-reader reliability and fair between-reader reliability. Responsiveness to change will need to be addressed in future studies.