Introduction: Intracranial hemorrhage (ICH) is a neurological emergency with high mortality that requires timely diagnosis. Although computed tomography (CT) remains the diagnostic gold standard, accuracy varies with radiologist experience and workload. This systematic review and meta-analysis evaluates the diagnostic performance of artificial intelligence (AI) algorithms in detecting ICH on CT imaging and explores key considerations for their clinical implementation in emergency and teleradiology settings.
Methods: We conducted a systematic review and meta-analysis following PRISMA-DTA guidelines, searching seven databases up to May 2025. Studies evaluating the diagnostic accuracy of AI for ICH detection on non-contrast CT scans were included. Study quality was assessed with QUADAS-2. Pooled estimates were calculated using random-effects models, with subgroup analyses by algorithm architecture and ICH subtype.
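As an illustration of random-effects pooling, the following is a minimal sketch that assumes a univariate DerSimonian-Laird estimator applied to logit-transformed sensitivities. The abstract does not state which estimator or transformation was used, and diagnostic-accuracy meta-analyses often fit bivariate models instead, so this should not be read as the authors' method. The function names and the per-study counts in the usage note are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random-effects estimator.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion of study effects around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of tau^2, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooled_sensitivity(tp_fn_pairs):
    """Pool sensitivity from (true positive, false negative) counts per study.

    Assumption: a 0.5 continuity correction is added to every cell.
    Returns (estimate, lower 95% bound, upper 95% bound, tau^2).
    """
    effects, variances = [], []
    for tp, fn in tp_fn_pairs:
        tp_c, fn_c = tp + 0.5, fn + 0.5
        effects.append(math.log(tp_c / fn_c))       # logit of sensitivity
        variances.append(1.0 / tp_c + 1.0 / fn_c)   # approximate variance of the logit
    pooled, se, tau2 = dersimonian_laird(effects, variances)
    lower, upper = pooled - 1.96 * se, pooled + 1.96 * se
    return inv_logit(pooled), inv_logit(lower), inv_logit(upper), tau2
```

For example, `pooled_sensitivity([(95, 5), (60, 40), (80, 20)])` pools three hypothetical studies and returns a positive `tau2`, reflecting between-study heterogeneity. Specificity can be pooled the same way from (true negative, false positive) counts.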
Results: A total of 45 studies met the inclusion criteria, comprising 29 research algorithm evaluations (n = 185,847 patients) and 16 studies of commercial AI system implementations (n = 94,523 patients). Research algorithms had a pooled sensitivity of 0.890 (95 % CI: 0.839-0.942) and specificity of 0.926 (95 % CI: 0.899-0.954). Commercial AI systems performed slightly better, with a sensitivity of 0.899 (95 % CI: 0.858-0.940) and specificity of 0.951 (95 % CI: 0.928-0.974). Diagnostic accuracy varied notably across ICH subtypes, with epidural hemorrhage presenting the greatest detection challenge (difficulty score: 0.251). Among algorithmic designs, convolutional recurrent neural networks (CNN-RNNs) achieved the highest diagnostic performance. In real-world clinical implementation, AI integration was associated with substantial workflow improvements: door-to-treatment decision time fell by 26 % (92 → 68 min), critical case notification time fell by 57 % (75 → 32 min), and triage accuracy improved by 8 percentage points (86 % → 94 %), directly impacting patient care pathways. Although sensitivity in clinical deployment was 7-8 % lower than in benchmark settings, these clinical benefits were consistent across implementations.
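The workflow figures above can be checked with simple arithmetic. The sketch below uses hypothetical helper names and only the before/after values reported in the Results. It separates relative change from percentage-point change, which matters for the triage figure: 86 % → 94 % is a gain of 8 percentage points, or roughly 9.3 % in relative terms.

```python
def relative_reduction(before, after):
    """Percent reduction relative to the starting value."""
    return (before - after) / before * 100.0


def relative_increase(before, after):
    """Percent increase relative to the starting value."""
    return (after - before) / before * 100.0


def point_change(before_pct, after_pct):
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


# Values taken from the reported results.
door_to_decision = relative_reduction(92, 68)   # minutes
notification = relative_reduction(75, 32)       # minutes
triage_points = point_change(86, 94)            # percentage points
triage_relative = relative_increase(86, 94)     # percent

print(f"Door-to-treatment decision time: -{door_to_decision:.1f} %")
print(f"Critical case notification time: -{notification:.1f} %")
print(f"Triage accuracy: +{triage_points} points (+{triage_relative:.1f} % relative)")
```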
Conclusions: AI algorithms show strong diagnostic performance in detecting ICH, with commercial systems achieving higher specificity than research models. Despite notable performance gaps for certain hemorrhage subtypes, particularly epidural hemorrhage, the clinical benefits of AI integration, including improved workflow efficiency and reduced time to treatment decisions, are substantial. Future research should prioritize prospective validation and the development of algorithms tailored to improve detection of challenging ICH subtypes.
Keywords: Artificial intelligence; Computed tomography; Deep learning; Diagnostic accuracy; Intracranial hemorrhage; Neuroimaging.
© 2025 The Authors.