Background: The effect of different probes and operator experience on the reliability of lung ultrasound (LU) interpretation has not been investigated. We studied the effect of probes and operator experience on the interpretation reliability of LU in critically ill neonates.
Methods: This was a prospective, blinded cohort study enrolling patients with basic LU patterns ("B," "severe B," consolidation). Patients were scanned with microlinear (15 MHz; L15), phased-array sectorial (6-12 MHz; S7), and microconvex (8 MHz; C8) probes, in random order. Static images were acquired in high resolution, anonymized, and included in a pictorial database in random sequences. Seventeen clinicians with different levels of LU experience were asked to assess the pictorial database blindly. Interrater agreement and interpretation reliability were analyzed. Subanalyses according to expertise and probe, and multivariate linear regression (including an "expertise × probe" interaction factor), were also performed.
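As an illustration of the interrater agreement metric named above, the following is a minimal sketch of an intraclass correlation coefficient computed from an images × raters score matrix. The abstract does not state which ICC form was used; this sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name `icc_2_1`, the numeric coding of the patterns, and the example ratings are hypothetical and are not study data.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_images, k_raters) array of numeric scores.
    Assumption: LU patterns have been coded as numbers beforehand.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for images (rows), raters (columns), and residual.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols

    # Corresponding mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores: 6 images (rows) rated by 4 raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

if __name__ == "__main__":
    print(round(icc_2_1(example_ratings), 2))
```

In this form, systematic differences between raters lower the coefficient, which is why it is a common choice when raters are treated as a sample from a larger population; a consistency-type ICC would ignore such offsets.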
Results: Agreement tended to be lower and more heterogeneous for residents (intraclass correlation coefficient [ICC], 0.82 [95% CI, 0.74-0.9], P < .001; I², 67%, P = .04) and for fellows (ICC, 0.93 [95% CI, 0.9-0.97], P < .001; I², 69%, P = .04), especially when using nonlinear probes, compared with senior physicians (ICC, 0.95 [95% CI, 0.93-0.96], P < .001; I², 0%, P = .433). Area under the curve (AUC) values were high for all probes (L15, 0.96 [95% CI, 0.93-0.99]; C8, 0.91 [95% CI, 0.85-0.98]; S7, 0.86 [95% CI, 0.82-0.91]) and physicians (senior physicians, 0.95 [95% CI, 0.83-0.99]; fellows, 0.95 [95% CI, 0.75-0.99]; residents, 0.86 [95% CI, 0.5-0.99]). Worse reliability and higher heterogeneity were found when the evaluation was performed by residents (AUC, 0.9 [95% CI, 0.85-0.94], P < .01; I², 93.6%, P < .001) than by fellows (AUC, 0.99 [95% CI, 0.9-0.999], P < .001; I², 34.3%, P = .09) or by senior physicians (AUC, 0.99 [95% CI, 0.9-0.999], P < .001; I², 18%, P = .236). The "expertise × probe" interaction factor was associated with lower ICC (standardized regression coefficient β, -0.69; P < .0001; adjusted R², 0.99) and AUC (standardized regression coefficient β, -0.76; P < .0001; adjusted R², 0.98).
Conclusions: LU interpretation in neonates shows good interrater agreement and reliability, irrespective of the probe and rater expertise. The use of nonlinear probes by novice operators is associated with the lowest agreement and reliability.
Keywords: agreement; diagnostic accuracy; expertise; lung ultrasound; neonate; probe.
Copyright © 2019 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.