Interobserver agreement in the evaluation of B-lines using bedside ultrasound

J Crit Care. 2015 Dec;30(6):1395-9. doi: 10.1016/j.jcrc.2015.08.021. Epub 2015 Sep 1.


Purpose: We evaluated agreement among trained emergency physicians assessing the degree of B-line presence on bedside ultrasound in patients presenting to the emergency department (ED) with acute undifferentiated dyspnea. We also determined which thoracic zones offered the highest level of interobserver reliability for sonographic B-line assessment.

Materials and methods: We evaluated a prospective convenience sample of adult patients presenting with dyspnea to an academic ED. Two consecutive bedside lung ultrasounds were performed on 91 patients by a pair of physician-sonographers. The lung ultrasounds were structured 10-zone thoracic sonograms, documented as videos. Sonographer pairs were either expert/expert (experts had performed >100 lung ultrasounds) or expert/novice (novices had performed 5 supervised examinations after structured training), and all sonographers were blinded to clinical data. Sonographers reported B-line concentration in each zone with 3 assessment methods: (1) normal (<3 B-lines) or abnormal (≥3 B-lines); (2) ordinal (normal, mild, moderate, or severe); and (3) counting B-lines (0-10; >10). All statistical analyses were performed using SPSS version 18.0 (Chicago, IL) and Stata 12.1 (College Station, TX). We evaluated interrater and intrarater agreement using intraclass correlation coefficients (ICCs).

Results: The right and left anterior/superior lung zones showed substantial agreement in all assessment methods and demonstrated the best overall agreement (ICC for right: counting, ordinal, and normal/abnormal, 0.811 [0.714-0.875], 0.875 [0.810-0.917], and 0.729 [0.590-0.821], respectively). Furthermore, both expert/expert pairs and expert/novice pairs showed substantial agreement in the right and left anterior/superior thoracic zones (expert/expert, 0.904 and 0.777, respectively; expert/novice, 0.862 and 0.834, respectively). Second-best agreement was found for the lateral/superior lung zones (ICC for right: counting, ordinal, and normal/abnormal, 0.744 [0.612-0.831], 0.686 [0.524-0.792], and 0.639 [0.453-0.761], respectively; ICC for left: counting, ordinal, and normal/abnormal, 0.671 [0.501-0.782], 0.615 [0.417-0.746], and 0.720 [0.577-0.815], respectively). When comparing agreement to distinguish "normal vs abnormal" B-line findings, our results showed significant agreement in all zones with the exception of the right and left inferior/lateral lung fields and left posterior lung. Reinterpretation by 2 experts of all their own randomized video clips at a later date showed agreement of 0.697 (n=733 zones) and 0.647 (n=266 zones) for ordinal assessment of B-line concentration.

Conclusion: Interrater agreement was best in the anterior/superior thoracic zones, followed by the lateral/superior zones, for both expert/expert and expert/novice pairs. Agreement in the lateral/inferior lung zones was lower overall. Intrarater agreement was highest at extremely high or low numbers of B-lines.

Keywords: B-line; Emergency ultrasound; alveolar interstitial syndrome; comet tail; inter-rater reliability; lung ultrasound.

MeSH terms

  • Aged
  • Dyspnea / diagnostic imaging*
  • Emergency Medicine / methods
  • Emergency Service, Hospital
  • Female
  • Heart Failure / diagnostic imaging
  • Humans
  • Lung / diagnostic imaging*
  • Male
  • Middle Aged
  • Observer Variation*
  • Prospective Studies
  • Reproducibility of Results
  • Ultrasonography
  • Video Recording