Objective: Sensorless alignment of two-dimensional (2D) freehand ultrasound (US) scans for three-dimensional US (3DUS) reconstruction offers significant advantages due to its ease of use. Prior approaches have relied on transducers fitted with motion sensors, which are cumbersome and inconvenient in a clinical setting, or on linear wobblers and motorized 2D scanning, which suffer from a small field of view (FOV) and low volume acquisition rates.
Method: Freehand transverse B-mode data loops from 20 human volunteers (10 males, 10 females) were used for 3DUS reconstruction. Our two-stream Physics-inspired Learning-based Prediction of Pose Information (PLPPI) model explicitly uses speckle decorrelation as an inductive bias (temporal information), together with spatial information extracted by 2D convolutions, for alignment. A correlation layer then combines the spatial and temporal cues for freehand frame alignment. A residual neural network (ResNet) predicted the spatial location of the input frames.
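To make the speckle decorrelation cue concrete, the following is a minimal sketch, assuming the common formulation in which the correlation between matching patches of two B-mode frames falls as the elevational distance between the frames grows. The function name `patch_correlation`, the `patch` size, and the use of non-overlapping patches are illustrative assumptions, not details of the PLPPI model.

```python
import numpy as np

def patch_correlation(frame_a, frame_b, patch=8):
    """Pearson correlation per non-overlapping patch of two 2D frames.

    Hypothetical helper for illustration only; not the PLPPI correlation layer.
    Returns an array of shape (rows, cols), one coefficient per patch.
    """
    h, w = frame_a.shape
    rows, cols = h // patch, w // patch

    def to_patches(f):
        # Crop to a whole number of patches, then flatten each patch.
        f = f[:rows * patch, :cols * patch].astype(np.float64)
        f = f.reshape(rows, patch, cols, patch).transpose(0, 2, 1, 3)
        return f.reshape(rows, cols, -1)

    a = to_patches(frame_a)
    b = to_patches(frame_b)
    # Remove the per-patch mean so the result is a correlation coefficient.
    a = a - a.mean(axis=-1, keepdims=True)
    b = b - b.mean(axis=-1, keepdims=True)
    num = (a * b).sum(axis=-1)
    denom = np.sqrt((a ** 2).sum(axis=-1) * (b ** 2).sum(axis=-1))
    # Constant patches have zero variance; report 0 correlation for them.
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
```

In a decorrelation-based approach, a map like this would typically be converted to an out-of-plane distance estimate through a calibrated decorrelation curve; here it is only meant to show the kind of temporal signal a learned model could be biased toward.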
Results: PLPPI outperformed baseline deep learning networks (DLN), i.e., 2D CNN, ConvLSTM, and DC2-Net, with a 13% improvement in global pixel reconstruction error, a 59.36% improvement in final drift, and a 35.74% improvement in final drift rate over the next best DLN, while requiring significantly less Graphics Processing Unit (GPU) memory.
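The abstract does not define the drift metrics. As a hedged sketch, assuming the definitions commonly used for freehand 3DUS evaluation (final drift as the distance between the predicted and reference positions of the last frame, and final drift rate as that distance divided by the reference scan length), they could be computed as follows. The function names and the (N, 3) frame-centre input are illustrative assumptions.

```python
import numpy as np

def final_drift(pred_positions, true_positions):
    """Distance between predicted and reference position of the last frame.

    Both inputs are (N, 3) arrays of frame-centre coordinates (assumed layout).
    """
    return float(np.linalg.norm(pred_positions[-1] - true_positions[-1]))

def final_drift_rate(pred_positions, true_positions):
    """Final drift divided by the length of the reference trajectory."""
    steps = np.diff(true_positions, axis=0)
    path_length = np.linalg.norm(steps, axis=1).sum()
    return final_drift(pred_positions, true_positions) / path_length
```

Under these assumed definitions, a lower value is better for both metrics, so an "improvement" corresponds to a relative decrease against the comparison network.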
Conclusion: Compared with the baseline DLN, our model has fewer parameters and requires less GPU memory to train for freehand 3DUS reconstruction, with a major reduction in computation time (106% speedup and 131% reduction in GPU memory usage).
Keywords: Carotid Artery; Freehand ultrasound; Reconstruction; Sensorless; Three-dimensional; Trackerless.
Copyright © 2026 World Federation for Ultrasound in Medicine and Biology. Published by Elsevier Inc. All rights reserved.