Purpose: To study whether compliance with methodological standards affected the reported accuracy of screening ultrasonography (US) for trauma.
Materials and methods: A meta-analysis was conducted of prospective studies in which US was compared with any diagnostic reference test in patients with suspected abdominal injury. Reports were retrieved from electronic databases without language restrictions, and additional information was obtained by manual searching. Two reviewers independently assessed methodological rigor by using 27 items drawn from the Standards for Reporting of Diagnostic Accuracy (STARD) checklist and the Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews (QUADAS) instrument. Disagreements were resolved by consensus. Summary receiver operating characteristic (SROC) analysis and random-effects meta-regression were used to model the effect of methodological standards and other study features on US accuracy.
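One common way to build an SROC curve is the Moses-Littenberg method, which regresses the log diagnostic odds ratio D on a threshold proxy S across studies. The sketch below assumes that method, an unweighted least-squares fit, and a 0.5 continuity correction. The abstract does not state which SROC model the authors used, and this sketch does not reproduce their random-effects meta-regression. The function names `moses_littenberg` and `sroc_tpr` and the example counts are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def moses_littenberg(studies):
    """Fit D = a + b*S across studies (hypothetical helper, unweighted OLS).

    studies: list of (tp, fp, fn, tn) counts, one 2x2 table per study.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) acts as a proxy for the positivity threshold.
    Requires at least two studies with differing S values.
    """
    ds, ss = [], []
    for tp, fp, fn, tn in studies:
        # Assumed 0.5 continuity correction avoids log(0) when a cell is empty,
        # which is likely for a test whose specificity is close to 100%.
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        tpr = tp / (tp + fn)  # sensitivity
        fpr = fp / (fp + tn)  # approximately 1 - specificity
        ds.append(logit(tpr) - logit(fpr))
        ss.append(logit(tpr) + logit(fpr))
    n = len(ds)
    s_mean = sum(ss) / n
    d_mean = sum(ds) / n
    sxx = sum((s - s_mean) ** 2 for s in ss)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(ss, ds))
    b = sxy / sxx  # slope of D on S
    a = d_mean - b * s_mean  # intercept
    return a, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Sensitivity on the SROC curve at a given false-positive rate.

    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = a / (1 - b) + logit(FPR) * (1 + b) / (1 - b).
    """
    x = logit(fpr)
    y = a / (1.0 - b) + x * (1.0 + b) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical 2x2 tables (tp, fp, fn, tn); not data from this meta-analysis.
example = [(40, 2, 10, 200), (25, 1, 15, 150), (60, 5, 8, 300)]
a, b = moses_littenberg(example)
print(f"a = {a:.3f}, b = {b:.3f}, SROC sensitivity at FPR 0.01: {sroc_tpr(0.01, a, b):.3f}")
```

Adding a study-level indicator (for example, whether a study used a single reference test) as an extra regressor in the D-on-S regression is one common way to estimate how a methodological feature shifts the diagnostic odds ratio; that extension is not shown here.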
Results: A total of 62 studies, comprising 18,167 participants, were eligible for meta-analysis. The average proportion of male patients (men or boys) was 71.7%, the mean age was 30.6 years ± 10.8 (standard deviation), and the mean Injury Severity Score was 16.7 ± 8.3. The prevalence of abdominal trauma was 25.1% (95% confidence interval [CI]: 21.1%, 29.1%). Pooled overall sensitivity and specificity of US were 78.9% (95% CI: 74.9%, 82.9%) and 99.2% (95% CI: 99.0%, 99.4%), respectively. These results did not change when different end points (hemoperitoneum or organ damage) were used. US accuracy was considerably lower in children (sensitivity, 57.9%; specificity, 94.3%). Sensitivity showed strong heterogeneity across studies, whereas specificity remained consistent. There was evidence of publication bias. Initial interobserver agreement on compliance with methodological standards ranged from poor (κ = 0.03 for independent verification of US findings) to perfect (κ = 1.00 for a sufficiently short interval between US and the reference test). After consensus, studies fulfilled a median of 13 methodological criteria (range, 5-20). Studies that did not meet individual methodological standards overestimated pooled sensitivity, with predicted differences of 9%-18%. In a multivariable model, use of a single reference test, specification of the number of excluded patients, and calculation of CIs were independent predictors of sensitivity. The 16 studies (1309 participants) that used a single reference test yielded a pooled sensitivity of 66.0% (95% CI: 56.2%, 75.8%).
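The results report κ values for reviewer agreement and pooled sensitivities with 95% CIs in the presence of strong heterogeneity. The sketch below shows two standard calculations of this kind, assuming Cohen's κ for a binary checklist item (met or not met) and DerSimonian-Laird random-effects pooling of logit-transformed sensitivity with a 0.5 continuity correction. The abstract does not specify which estimators were used, so these choices (including z = 1.96) are assumptions and may not reproduce the reported values; the function names and example counts are hypothetical.

```python
import math


def expit(t: float) -> float:
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-t))


def cohen_kappa(r1, r2):
    """Cohen's kappa for two reviewers rating one binary checklist item.

    r1, r2: equal-length sequences of booleans (True = criterion met),
    one entry per study. Undefined if chance agreement equals 1.
    """
    n = len(r1)
    p_obs = sum(x == y for x, y in zip(r1, r2)) / n  # observed agreement
    p1 = sum(r1) / n  # share of studies reviewer 1 rated "met"
    p2 = sum(r2) / n  # share of studies reviewer 2 rated "met"
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def pooled_sensitivity_dl(studies, z=1.96):
    """DerSimonian-Laird random-effects pooling of logit sensitivity.

    studies: list of (tp, fn) counts from at least two studies.
    Returns (pooled sensitivity, lower limit, upper limit, tau^2, I^2).
    """
    ys, vs = [], []
    for tp, fn in studies:
        tp, fn = tp + 0.5, fn + 0.5  # assumed continuity correction
        ys.append(math.log(tp / fn))  # logit of sensitivity
        vs.append(1 / tp + 1 / fn)  # approximate within-study variance
    w = [1 / v for v in vs]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation from heterogeneity
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    # Back-transforming a logit-scale CI gives limits that are asymmetric on the % scale.
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2, i2


# Hypothetical (tp, fn) counts; not data from this meta-analysis.
sens, lo, hi, tau2, i2 = pooled_sensitivity_dl([(40, 10), (25, 15), (60, 8), (12, 9)])
print(f"pooled sensitivity {sens:.3f} (95% CI {lo:.3f}, {hi:.3f}); tau^2 = {tau2:.3f}; I^2 = {i2:.2f}")
```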
Conclusion: Bias-adjusted sensitivity of screening US for trauma is low. Adherence to the methodological standards included in appraisal instruments such as STARD and QUADAS is crucial for obtaining valid estimates of test accuracy.
Copyright RSNA, 2005