Intrarater and interrater reliability of assessment of lumbar multifidus muscle thickness using rehabilitative ultrasound imaging

J Orthop Sports Phys Ther. 2007 Oct;37(10):608-12. doi: 10.2519/jospt.2007.2418.


Study design: Within-session intrarater and interrater reliability study.

Objective: To establish the intrarater and interrater reliability of thickness measurements of the multifidus muscle in a parasagittal plane, conducted by an experienced ultrasound operator and a novice assessor.

Background: There is considerable evidence for the important role of the multifidus muscle in segmental stabilization of the lumbar spine. The cross-sectional area of the multifidus muscle has been assessed in healthy subjects and patients with low back pain using real-time ultrasound imaging. However, few studies have measured the thickness of the multifidus muscle using a parasagittal view.

Methods and measures: The thickness of the multifidus muscle was measured at rest, using real-time ultrasound imaging, in 10 subjects without a history of low back pain, at the levels of the L2-3 and L4-5 zygapophyseal joints. The measurement was taken 3 times at each level by each of 2 assessors (1 experienced, 1 novice). Intrarater (ICC model 3) and interrater (ICC model 2) reliability were assessed by calculation of an F statistic (analysis of variance), the intraclass correlation coefficient (ICC), and the standard error of measurement (SEM).
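As an illustration of the statistics named above, the following is a minimal sketch of how ICC model 2 and model 3 coefficients and the SEM can be derived from a two-way analysis of variance, using the standard Shrout and Fleiss mean-square formulas. The abstract does not report the authors' software or their exact SEM formula, so the function names (`icc_two_way`, `sem_from_icc`), the SEM definition (SD × sqrt(1 − ICC)), and the thickness values in the usage lines are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def icc_two_way(scores):
    """Return ICC(2,1), ICC(2,k) and ICC(3,1) from a subjects-by-raters table.

    scores: list of rows, one row per subject, one column per rater
    (or per trial, for an intrarater analysis).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters or trials
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)              # between-subjects mean square
    jms = ss_cols / (k - 1)              # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return {
        "icc_2_1": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "icc_2_k": (bms - ems) / (bms + (jms - ems) / n),
        "icc_3_1": (bms - ems) / (bms + (k - 1) * ems),
    }


def sem_from_icc(scores, icc):
    """SEM as pooled SD * sqrt(1 - ICC); one common definition (assumed here)."""
    sd = stdev(v for row in scores for v in row)
    return sd * sqrt(1 - icc)


# Hypothetical thickness values in cm (5 subjects x 2 raters), not study data.
example = [[2.4, 2.5], [3.1, 3.0], [2.8, 2.9], [3.5, 3.4], [2.2, 2.3]]
result = icc_two_way(example)
print(result, sem_from_icc(example, result["icc_2_1"]))
```

Model 2 treats raters as a random sample and penalizes systematic offsets between them, which suits the interrater question. Model 3 treats the columns as fixed, which suits repeated trials by one rater.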

Results: On the basis of an average of 3 trials, the 2 operators showed very high interrater agreement on thickness measurements at the L2-3 level (ICC(2,3) = 0.96; 95% CI: 0.84 to 0.99) and the L4-5 vertebral level (ICC(2,3) = 0.97; 95% CI: 0.87 to 0.99), with no systematic differences in muscle size across operators (P > .05). When a single trial per rater was used, interrater reliability was lower, and slightly lower at the L2-3 level (ICC(2,1) = 0.85; 95% CI: 0.51 to 0.96) than at the L4-5 level (ICC(2,1) = 0.87; 95% CI: 0.52 to 0.97), but these values still indicated a high level of agreement. In addition, the novice and the experienced operator produced reliable intrarater measurements at L2-3 (ICC(3,1) = 0.89; 95% CI: 0.72 to 0.97 and ICC(3,1) = 0.94; 95% CI: 0.86 to 0.99) and at L4-5 (ICC(3,1) = 0.88; 95% CI: 0.68 to 0.97 and ICC(3,1) = 0.95; 95% CI: 0.86 to 0.99), with no systematic differences in muscle size across trials (P > .05). The consistently low SEM values also indicate low measurement error.

Conclusion: A novice and an experienced assessor were both able to reliably perform this measure at rest for 2 vertebral levels using real-time ultrasound imaging. An average of 3 trials produced higher interrater reliability scores, though using a single trial per rater was also reliable.
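One way to see why averaging trials tends to raise reliability is the Spearman-Brown relationship, sketched below. This is a rough intuition only: the abstract does not state that the authors used this formula, the reported averaged-trial ICC values were presumably estimated directly from the data, and the function name `spearman_brown` is hypothetical. Projected values therefore need not match the reported ones.

```python
def spearman_brown(single_icc, k):
    """Project the reliability of the mean of k measurements
    from the reliability of a single measurement."""
    return k * single_icc / (1 + (k - 1) * single_icc)


# Single-trial interrater ICCs reported in the abstract, projected to 3 trials.
for level, icc in (("L2-3", 0.85), ("L4-5", 0.87)):
    print(level, round(spearman_brown(icc, 3), 3))
```

The projection always increases with the number of measurements averaged when the single-measurement ICC is between 0 and 1, which is consistent with the direction of the reported difference.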

Publication types

  • Comparative Study

MeSH terms

  • Adult
  • Female
  • Humans
  • Lumbosacral Region / diagnostic imaging*
  • Male
  • Middle Aged
  • Muscle, Skeletal / diagnostic imaging
  • Muscle, Skeletal / physiology*
  • Queensland
  • Reproducibility of Results
  • Ultrasonography, Doppler / methods