Left ventricular function can be evaluated by qualitative grading and by eyeball estimation of ejection fraction (EF). We sought to define the reproducibility of these techniques and how they are affected by image quality, experience and accreditation. Twenty apical four-chamber echocardiographic cine loops (Online Resources 1-20) of varying image quality and left ventricular function were anonymized and presented to 35 operators. Operators were asked to provide (1) a one-phrase grading of global systolic function, (2) an "eyeball" EF estimate and (3) an image quality rating on a 0-100 visual analogue scale. Each observer unknowingly viewed every loop twice, giving a total of 1400 viewings. When grading LV function into five categories, an operator's chance of agreement with another operator was 50%, and with themselves on blinded re-presentation was 68%. Blinded eyeball LVEF re-estimates by the same operator had a standard deviation (SD) of difference of 7.6 EF units, and the SD across operators averaged 8.3 EF units. Image quality, defined as the average of all operators' assessments, correlated with EF estimate variability (r = -0.616, p < 0.01) and with visual grading agreement (r = 0.58, p < 0.01). However, an operator's own single quality assessment was not a useful forewarning that their estimate would be an outlier, partly because individual quality assessments had poor within-operator reproducibility (SD of difference 17.8). Reproducibility of visual grading of LV function and of LVEF estimation depends on image quality, but individuals cannot themselves identify when poor image quality is disrupting their LV function estimate. Clinicians should not assume that a change in a patient's grade or visually estimated EF reflects a genuine clinical change.
Keywords: Echocardiography; Heart failure; Reproducibility of results; Ventricular function.
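The reproducibility metrics reported above (within-operator SD of paired differences, within- and between-operator percent agreement) can be illustrated with a minimal sketch. The sketch below uses entirely hypothetical, randomly generated data and assumed array names (ef, grade); it is not the study's dataset or analysis code, only an outline of how such metrics could be computed under these assumptions.

```python
# Illustrative sketch only: hypothetical data, not the study's actual dataset.
# Assumes each operator's two blinded viewings per loop are stored as
# ef[operator, loop, viewing] (EF estimates, %) and grade[operator, loop, viewing]
# (five-category grading coded 0-4).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_operators, n_loops = 35, 20

# Hypothetical EF estimates for two blinded viewings of each loop.
ef = rng.normal(loc=45, scale=10, size=(n_operators, n_loops, 2))
# Hypothetical five-category gradings for the same viewings.
grade = rng.integers(0, 5, size=(n_operators, n_loops, 2))

# Within-operator reproducibility: SD of the difference between paired re-estimates.
within_sd = np.std(ef[:, :, 0] - ef[:, :, 1], ddof=1)

# Within-operator categorical agreement: proportion of loops graded identically twice.
within_agreement = np.mean(grade[:, :, 0] == grade[:, :, 1])

# Between-operator agreement: proportion of operator pairs assigning the same grade
# to the same loop (first viewing only).
pairs = list(itertools.combinations(range(n_operators), 2))
between_agreement = np.mean([grade[a, :, 0] == grade[b, :, 0] for a, b in pairs])

print(f"Within-operator SD of EF difference: {within_sd:.1f} EF units")
print(f"Within-operator grading agreement:   {within_agreement:.0%}")
print(f"Between-operator grading agreement:  {between_agreement:.0%}")
```

On random data these statistics hover around chance levels (20% agreement for five equiprobable categories); the study's observed values (50% between operators, 68% within) reflect real but limited consistency between readings.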