Background: Post-endoscopic retrograde cholangiopancreatography pancreatitis (PEP) is the most common and clinically significant complication of ERCP, with an incidence of 3.5-9.7% in general populations and up to 14.7% in high-risk groups, leading to considerable morbidity, mortality, and healthcare costs. Although numerous multivariable prediction models have been developed, their predictor sets, methodological rigor, and clinical applicability remain highly variable.
Method: We conducted a PRISMA 2020-compliant systematic review and meta-analysis, prospectively registered in PROSPERO (CRD42024556967). Nine databases were searched to June 1, 2024, for studies developing or validating multivariable PEP risk prediction models. Data on study/model characteristics, predictors, and performance metrics were extracted. Risk of bias was assessed with PROBAST, and study quality with the Newcastle-Ottawa Scale. Random-effects meta-analyses pooled (i) PEP incidence, (ii) associations of individual predictors, and (iii) overall model performance.
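Illustration: the random-effects pooling of PEP incidence described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the review's actual analysis code. The abstract does not name the estimator or the transformation used, so the DerSimonian-Laird estimator and the logit transformation are assumed here. The function name `pool_logit_proportions` and the event counts in the usage note are hypothetical and are not data from the included studies.

```python
import math

def pool_logit_proportions(events, totals):
    """Pool study-level proportions with a DerSimonian-Laird random-effects
    model on the logit scale (assumed method; requires 0 < events < totals)."""
    # Per-study logit proportion and its approximate within-study variance
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and the I^2 heterogeneity statistic
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights, pooled logit, and its standard error
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se),
        "ci_high": inv_logit(mu + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

For example, `pool_logit_proportions([5, 30], [100, 100])` pools two hypothetical cohorts with 5% and 30% incidence and returns the pooled proportion, its 95% confidence interval, and the heterogeneity statistics. Back-transforming from the logit scale keeps the interval within 0 to 1.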
Results: Twenty-four studies (26 models; n = 38,016) published between 2002 and 2024 were included, predominantly retrospective cohorts from East Asia (n = 16). The pooled PEP incidence was 8.48% (95% CI: 6.90-10.39%; I² = 96.4%), highest in East Asia and retrospective cohorts. The strongest predictors were pancreatic duct cannulation (OR = 3.50), pancreatic injection (OR = 3.50), previous pancreatitis (OR = 3.32), and pancreatic guidewire use (OR = 2.63); additional consistent factors were female sex, difficult cannulation, elevated bilirubin, low albumin, choledocholithiasis, and prolonged procedure time. The pooled AUC for model performance was 0.81 (95% CI: 0.78-0.84; I² = 83.5%), with individual AUCs ranging from 0.560 to 0.915; however, calibration was infrequently reported (38%) and external validation was undertaken in only 46%. PROBAST indicated high overall risk of bias, chiefly in the analysis (92%) and participants (100%) domains.
Conclusion: Current PEP prediction models generally demonstrate moderate-to-high discrimination but are limited by suboptimal calibration, inadequate external validation, and methodological heterogeneity. Future research should adhere to TRIPOD guidelines, employ multicenter large-sample designs, retain continuous predictors, address missing data with robust imputation methods, and conduct comprehensive temporal, geographic, and domain-specific validation. Integration of artificial intelligence/machine learning with conventional modeling and embedding validated tools into clinical workflows may enhance predictive accuracy and real-world utility.
Copyright: © 2025 Mao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.