Interobserver and Intraobserver Reliability in the Radiologic Assessment of Lumbar Interbody Fusion

Clin Spine Surg. 2017 Jul;30(6):E853-E856. doi: 10.1097/BSD.0000000000000423.


Study design: Retrospective cohort study comparing intraobserver and interobserver reliability of 3 different radiologic fusion classifications following uninstrumented single-level anterior lumbar interbody fusion.

Objective of the study: To compare the intraobserver and interobserver reliability of 3 different radiologic spinal fusion scoring systems.

Summary of background data: Knowledge regarding radiologic spinal fusion is crucial when studying patients who were treated with lumbar interbody fusion. The scoring system should be reliable and reproducible. Various radiologic classification systems coexist, but the reliability of these systems has thus far not been compared in a single consecutive group of patients. The aim of the present study was to identify the most valid scoring system for the assessment of interbody fusion.

Methods: We studied a retrospective consecutive cohort of 50 patients who underwent anterior lumbar interbody fusion with a stand-alone cage, performed by a single surgeon between 1993 and 2002. Plain anteroposterior, lateral, and flexion-extension radiographs were obtained during follow-up visits and were used for analysis. The interbody fusion was scored on these radiographic images using the 3 classification systems (Brantigan, Burkus, and the Radiographic Score) by 2 experienced musculoskeletal radiologists and 2 senior orthopedic spinal surgeons, all of whom were blinded to clinical data and outcome.

Results: Of the 3 classifications included in the current study, the Burkus classification had a moderate interobserver agreement and a substantial to perfect intraobserver agreement. The other classifications (Brantigan and the Radiographic Score) showed only fair interobserver agreement and moderate to substantial agreement among all observers. No significant differences in reliability between orthopedic surgeons and radiologists were found for any of the 3 classifications.

Conclusions: The Burkus classification system was the most reliable in this study, but showed only moderate interobserver agreement. Therefore, a more reliable classification system for the radiographic assessment of lumbar interbody fusion is still needed.

MeSH terms

  • Adult
  • Aged
  • Demography
  • Female
  • Humans
  • Lumbar Vertebrae / diagnostic imaging*
  • Lumbar Vertebrae / surgery*
  • Male
  • Middle Aged
  • Observer Variation
  • Spinal Fusion*
  • Young Adult