Molecular subtyping is essential for guiding systemic therapy in breast cancer but currently requires invasive biopsy. Conventional B-mode ultrasound offers rich anatomical information, yet lacks the functional dynamics needed to capture the comprehensive biology of tumors. Here, we present the first multimodal ultrasound spatiotemporal transformer, MUST-Sub, which integrates paired B-mode morphological features with contrast-enhanced ultrasound (CEUS) hemodynamic patterns to classify Luminal, HER2-enriched, and triple-negative subtypes. Trained on a retrospective development cohort and validated on internal, prospective, and multicenter external cohorts, MUST-Sub achieved macro-average areas under the receiver operating characteristic curve (AUCs) of 0.94, 0.90, and 0.92, respectively, and Luminal versus non-Luminal AUCs of 0.92, 0.88, and 0.91, outperforming B-mode-only deep learning baselines. MUST-Sub also produced interpretable quantitative biomarkers derived from spatiotemporal attention: the morphology-associated biomarker correlated inversely with tumor size (Spearman ρ = [-0.34, -0.23]; all p < 0.05), while the hemodynamics-associated biomarker correlated positively with tumor size (ρ = [0.24, 0.32]; all p < 0.05) and Ki-67 proliferation index (ρ = [0.21, 0.24]; all p < 0.05). These results suggest that multimodal ultrasound with spatiotemporal modeling can serve as a promising adjunctive approach for non-invasive pre-biopsy molecular phenotyping of breast cancer.
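For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro-average AUC over three subtypes is commonly computed: one one-vs-rest AUC per class, averaged without weighting. The abstract does not state the exact averaging scheme used for MUST-Sub, so the one-vs-rest choice here is an assumption. The function names, the class indexing (0 = Luminal, 1 = HER2-enriched, 2 = triple-negative), and the toy probabilities are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    probs: Sequence[Sequence[float]], y_true: Sequence[int], n_classes: int
) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme)."""
    aucs = []
    for k in range(n_classes):
        scores = [row[k] for row in probs]            # predicted P(class k)
        labels = [1 if y == k else 0 for y in y_true]  # class k vs the rest
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


# Toy data, invented for illustration only.
# Assumed class indices: 0 = Luminal, 1 = HER2-enriched, 2 = triple-negative.
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
]
toy_labels = [0, 0, 1, 1, 2, 2]

macro_auc = macro_ovr_auc(toy_probs, toy_labels, n_classes=3)

# Luminal versus non-Luminal is the one-vs-rest AUC for class 0.
luminal_auc = binary_auc(
    [row[0] for row in toy_probs],
    [1 if y == 0 else 0 for y in toy_labels],
)
```

Because the macro average weights each subtype equally, a rare subtype influences the score as much as a common one, which is one reason it is often reported alongside a binary contrast such as Luminal versus non-Luminal.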
© 2026. The Author(s).