Objective: Despite its quantitative definitions, the Spetzler-Martin grading scale for brain arteriovenous malformations (AVMs) is subject to interobserver variability, particularly when observers differ in their subspecialties. Interobserver variability between neuroradiologist and neurosurgeon grading was analyzed in a large AVM series to determine its extent, causes, and clinical implications.
Methods: In a consecutive surgical series of 224 AVM patients, Spetzler-Martin grades were assigned independently by a neuroradiologist and a neurosurgeon. Interobserver agreement was measured with Cohen kappa analysis, and the paired scores of the two observers were compared with the Wilcoxon signed-rank test.
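As a rough illustration of the kappa statistic named in the Methods, the sketch below computes an unweighted Cohen kappa for two observers grading the same cases. It assumes the unweighted form (the abstract does not say whether a weighted variant was used), the function name `cohen_kappa` is hypothetical, and the ten grades are invented for illustration and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases given identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical Spetzler-Martin grades for ten AVMs (illustrative only).
radiologist = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]
surgeon = [1, 2, 3, 3, 3, 4, 4, 4, 5, 2]
kappa = cohen_kappa(radiologist, surgeon)
print(round(kappa, 2))
```

The same paired lists could be passed to a signed-rank routine such as SciPy's `scipy.stats.wilcoxon` to test whether one observer scores systematically higher than the other, which is a different question from how often the two agree.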
Results: The two observers assigned different grades in 62 of 224 patients (27.7%). By kappa analysis, agreement among the grade components was highest for venous drainage pattern (kappa = 0.90), intermediate for eloquence (kappa = 0.71), and lowest for size (kappa = 0.67); agreement on overall grade was substantial (kappa = 0.61). By Wilcoxon signed-rank test, size scores and overall AVM grades differed significantly between the two observers. Causes of interobserver variability in grading included diffuse nidus, differences between angiographic and surgical borders, paradoxical venous drainage, and the ambiguity of eloquence. Interobserver variability affected outcome data but did not diminish the predictive value of the Spetzler-Martin scale.
Conclusion: The Spetzler-Martin grading system can be applied reliably to most AVMs with good agreement between observers, but some unusual AVMs expose the system's imprecision and subjectivity. Interobserver variability can affect reporting of results, surgical risk assessment, and patient selection. Undergrading may encourage borderline surgical candidates to choose surgery, with results that fall short of their expectations.