3D-DFM: Anchor-Free Multimodal 3-D Object Detection With Dynamic Fusion Module for Autonomous Driving

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10812-10822. doi: 10.1109/TNNLS.2022.3171553. Epub 2023 Nov 30.

Abstract

Recent advances in cross-modal 3D object detection rely heavily on anchor-based methods; however, intractable anchor parameter tuning and computationally expensive postprocessing severely impede deployment on embedded systems, such as those used in autonomous driving. In this work, we develop an anchor-free architecture for efficient camera-LiDAR (light detection and ranging) 3D object detection. To highlight the effect of foreground information from different modalities, we propose a dynamic fusion module (DFM) that adaptively fuses image and point features via learnable filters. In addition, the 3D distance intersection-over-union (3D-DIoU) loss is explicitly formulated as a supervision signal for 3D-oriented box regression and optimization. We integrate these components into an end-to-end multimodal 3D detector termed 3D-DFM. Comprehensive experimental results on the widely used KITTI dataset demonstrate the superiority and universality of the 3D-DFM architecture, with competitive detection accuracy and real-time inference speed. To the best of our knowledge, this is the first work that incorporates an anchor-free pipeline with multimodal 3D object detection.
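The abstract does not specify how the learnable filters of the DFM are realized. The following is a minimal PyTorch sketch of one plausible reading, in which per-point filters predicted from image features reweight the LiDAR point features; all class, tensor, and parameter names here are hypothetical and not the authors' implementation.

```python
# Hypothetical sketch of a dynamic fusion module: filters predicted from
# image features adaptively gate LiDAR point features before fusion.
import torch
import torch.nn as nn

class DynamicFusionSketch(nn.Module):
    def __init__(self, img_dim: int, pt_dim: int):
        super().__init__()
        # Learnable filter generator: maps each point's sampled image
        # feature to a per-channel weight for the point feature.
        self.filter_gen = nn.Sequential(
            nn.Linear(img_dim, pt_dim),
            nn.Sigmoid(),  # gate values in [0, 1]
        )
        self.out = nn.Linear(pt_dim, pt_dim)

    def forward(self, img_feat: torch.Tensor, pt_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (N, img_dim) image features sampled at projected points
        # pt_feat:  (N, pt_dim)  LiDAR point features
        gate = self.filter_gen(img_feat)   # (N, pt_dim) dynamic filter
        return self.out(gate * pt_feat)    # reweighted, fused point features

fusion = DynamicFusionSketch(img_dim=64, pt_dim=128)
fused = fusion(torch.randn(1024, 64), torch.randn(1024, 128))
print(fused.shape)  # torch.Size([1024, 128])
```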
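The 3D-DIoU loss extends the distance-IoU formulation, L = 1 − IoU + ρ²(b, b_gt)/c², to 3D boxes, where ρ is the distance between box centers and c is the diagonal of the smallest enclosing box. As a simplified sketch under the assumption of axis-aligned boxes (the paper's loss supervises oriented boxes, so the heading angle is omitted here for brevity):

```python
import torch

def diou_3d_axis_aligned(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
    """Simplified 3D-DIoU loss for axis-aligned boxes (x, y, z, w, l, h).

    Implements L = 1 - IoU + rho^2(centers) / c^2, where c is the diagonal
    of the smallest box enclosing both inputs. Boxes: (N, 6) tensors.
    """
    # Corner coordinates from the center/size parameterization.
    a_min, a_max = box_a[:, :3] - box_a[:, 3:] / 2, box_a[:, :3] + box_a[:, 3:] / 2
    b_min, b_max = box_b[:, :3] - box_b[:, 3:] / 2, box_b[:, :3] + box_b[:, 3:] / 2

    # Intersection and union volumes.
    inter = (torch.min(a_max, b_max) - torch.max(a_min, b_min)).clamp(min=0).prod(dim=1)
    union = box_a[:, 3:].prod(dim=1) + box_b[:, 3:].prod(dim=1) - inter
    iou = inter / union.clamp(min=1e-7)

    # Squared center distance and squared diagonal of the enclosing box.
    rho2 = ((box_a[:, :3] - box_b[:, :3]) ** 2).sum(dim=1)
    c2 = ((torch.max(a_max, b_max) - torch.min(a_min, b_min)) ** 2).sum(dim=1).clamp(min=1e-7)

    return 1.0 - iou + rho2 / c2
```

The distance term keeps the gradient informative even when the predicted and ground-truth boxes do not overlap (IoU = 0), which is the usual motivation for DIoU-style losses over plain IoU regression.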