Predicting isocitrate dehydrogenase (IDH) mutations in gliomas using magnetic resonance imaging (MRI) is clinically important for treatment planning. This study compared two artificial intelligence (AI) models, GliomaDepth-IDH (ResNet34-based) and GliomaVista-IDH (Vision Transformer-based), with 18 physicians (eight neuroradiologists, five neurosurgeons, and five neurosurgery residents) in predicting IDH mutation status. On the Brain Tumor Segmentation Challenge dataset, GliomaVista-IDH achieved an area under the curve (AUC) of 0.97, significantly outperforming all physician groups. However, external validation on a Japanese cohort revealed performance degradation: the AUC of GliomaDepth-IDH declined to 0.75 and that of GliomaVista-IDH to 0.82, with GliomaVista-IDH also showing poor calibration (Brier score = 0.32). High-performing physicians achieved comparable results (AUC = 0.88) with superior calibration (Brier score = 0.19). Inter-rater reliability analysis revealed substantial variability across physician groups. These findings suggest that AI models can assist many physicians, while experienced practitioners remain competitive and give better-calibrated predictions in challenging settings.
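
The two metrics reported above measure different things: AUC reflects how well predicted probabilities rank mutant above wild-type cases, whereas the Brier score is the mean squared difference between predicted probability and observed outcome (lower is better). The following is a minimal sketch of the standard definitions in plain Python, not the study's evaluation code; the function names and the toy labels and probabilities are hypothetical and chosen only for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and binary outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = IDH-mutant, 0 = IDH-wild-type (not from the study).
labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]

auc = roc_auc(labels, probs)
brier = brier_score(labels, probs)
# Reference point: always predicting 0.5 yields a Brier score of 0.25.
baseline = brier_score(labels, [0.5] * len(labels))
print(f"AUC = {auc:.3f}, Brier = {brier:.3f}, constant-0.5 Brier = {baseline:.3f}")
```

Because a constant prediction of 0.5 always scores 0.25 under this definition, a Brier score above 0.25 can coexist with a reasonably high AUC when a model ranks cases well but assigns overconfident probabilities.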
© 2026. The Author(s).