A critical assessment of factors influencing reliability in the classification of fractures, using fractures of the tibial plafond as a model

J Orthop Trauma. 1997 Oct;11(7):471-6. doi: 10.1097/00005131-199710000-00003.

Abstract

Objective: To investigate three factors that may influence the reliability of a fracture classification system: (a) the quality of the radiographs; (b) the ability of observers to identify the fracture fragments; and (c) the use of binary decision making.

Design: Assessment of interobserver reliability of blinded observers.

Setting: Medical school department of orthopaedics.

Participants: Two attending orthopaedists, two PGY-5 orthopaedic residents, and two PGY-3 orthopaedic residents served as observers.

Intervention: Observers classified radiographs of twenty-five tibial plafond fractures according to the Rüedi-Allgöwer and binary classification systems, and also rated the quality of each radiograph as adequate or inadequate for accurately classifying the fracture. At a second session, observers classified the same radiographs after marking the fragments of the tibial articular surface themselves, and also classified radiographs on which the articular fragments had been premarked by the senior author.

Main outcome measures: Pairwise interobserver reliability was analyzed by kappa statistics, and mean kappa values were compared for each method of fracture classification.
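The pairwise analysis described under Main outcome measures can be sketched as follows. This is a minimal illustration, assuming unweighted Cohen's kappa for each pair of observers (the abstract does not state which kappa variant was used). The function names `cohen_kappa` and `mean_pairwise_kappa` are hypothetical and are not taken from the study. With six observers, the mean is taken over 15 observer pairs.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two observers who classified the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("observers must classify the same, non-empty set of cases")
    n = len(a)
    # Observed agreement: fraction of cases both observers put in the same category.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over categories of the product of each observer's
    # marginal proportions (categories used by only one observer contribute 0).
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        # Both observers used one identical category for every case; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings: dict[str, Sequence[Hashable]]) -> float:
    """Average Cohen's kappa over every unordered pair of observers."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(ratings[p], ratings[q]) for p, q in pairs) / len(pairs)
```

Kappa corrects raw percent agreement for the agreement expected by chance alone. Two observers who agree on half the cases can therefore still score a kappa of 0 if that level of agreement is what their category frequencies would produce at random.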

Results: No difference in interobserver reliability was detected between the Rüedi-Allgöwer and binary classification systems. Interobserver agreement on the adequacy of the radiographs was poorer than agreement on the classification of the fractures themselves. Having observers mark the fragments of the tibial articular surface had no effect on interobserver reliability; having the articular fragments premarked, however, significantly improved interobserver reliability in classifying the fractures.

Conclusions: The results of this study underscore the complexity of tibial plafond fractures and the difficulty observers have in reliably interpreting fracture radiographs. Fracture classification systems such as the Rüedi-Allgöwer, which are predicated on identifying the number and displacement of articular fragments, may inherently perform poorly in reliability analyses because observers have difficulty reliably identifying those fragments. Because binary decision making did not improve the reliability of fracture classification in this study, further investigation of its effectiveness may be advisable before such strategies are put into widespread use.

MeSH terms

  • Adult
  • Ankle Injuries / classification*
  • Ankle Injuries / diagnostic imaging*
  • Decision Trees
  • Humans
  • Models, Anatomic
  • Observer Variation
  • Radiography
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Tibial Fractures / classification*
  • Tibial Fractures / diagnostic imaging*