COME: A Collaborative Optimization Framework With Low-Rank MoE for Indoor 3D Object Detection

IEEE Trans Image Process. 2026;35:685-700. doi: 10.1109/TIP.2025.3648200.

Abstract

Indoor 3D object detection is a fundamental task in computer vision and robotics. Existing research predominantly trains domain-specific models that are optimal for individual datasets, but it overlooks the value of capturing universal geometric attributes that could substantially improve detection performance across diverse domains. To close this gap, we propose COME, a collaborative optimization framework that integrates these universal attributes while preserving the domain-specific characteristics of each dataset. COME is built on VoteNet and incorporates a Cross-Domain Expert Parameter Sharing Strategy (CEPSS) inspired by the Mixture of Experts (MoE) framework. The core of CEPSS is a dual-expert design: domain-shared experts capture geometric relationships common to all datasets, while domain-specific experts encode features unique to each dataset. This separation lets the model learn both generic and domain-specialized visual cues without mutual interference. In addition, to adapt dynamically to different domains, we design a lightweight gating network that automatically selects the relevant experts, suppressing interference from irrelevant features and strengthening model specialization. Compared with standard parameter-sharing architectures, this design significantly reduces gradient conflicts during multi-domain training. We further improve computational efficiency by giving both the domain-shared and the domain-specific experts low-rank structures, striking a better balance between memory overhead and detection performance. Experiments show that COME achieves state-of-the-art results across benchmarks with acceptable parameter growth and outperforms existing multi-domain detection methods.
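The dual-expert idea can be illustrated with a small, dependency-free Python sketch. This is not the authors' CEPSS implementation. The class names (`LowRankExpert`, `LowRankMoELayer`), the expert counts, the rank, the per-domain linear gate with top-k selection, the averaging of shared experts, and the residual connection are all hypothetical choices made for illustration. The sketch shows three ideas from the abstract: each expert's d x d weight is replaced by a rank-r factorization B·A, the domain-shared experts are applied for every domain, and a lightweight gate keeps only the most relevant domain-specific experts for each input feature.

```python
import math
import random


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LowRankExpert:
    """Hypothetical low-rank expert: a d x d weight factored as B @ A (rank r)."""

    def __init__(self, dim, rank, rng):
        # A: rank x dim (down-projection), B: dim x rank (up-projection).
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rank)]
        self.B = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(dim)]

    def num_params(self):
        # 2 * dim * rank weights instead of dim * dim for a full-rank expert.
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))  # dim -> rank -> dim


class LowRankMoELayer:
    """Sketch of shared + gated domain-specific low-rank experts (assumed design)."""

    def __init__(self, dim, rank, domains, n_shared=2, n_specific=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Domain-shared experts: one pool used by every domain.
        self.shared = [LowRankExpert(dim, rank, rng) for _ in range(n_shared)]
        # Domain-specific experts: a separate pool per domain.
        self.specific = {d: [LowRankExpert(dim, rank, rng) for _ in range(n_specific)]
                         for d in domains}
        # Lightweight gate (assumption): one linear scoring map per domain.
        self.gate = {d: [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                         for _ in range(n_specific)] for d in domains}

    def gate_weights(self, x, domain):
        """Keep the top-k domain-specific experts and renormalise their weights."""
        probs = softmax(matvec(self.gate[domain], x))
        keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        total = sum(probs[i] for i in keep)
        return {i: probs[i] / total for i in keep}

    def __call__(self, x, domain):
        out = list(x)  # residual connection (assumption)
        for expert in self.shared:  # shared experts are always active, averaged
            out = [o + v / len(self.shared) for o, v in zip(out, expert(x))]
        for i, w in self.gate_weights(x, domain).items():  # unselected experts are skipped
            out = [o + w * v for o, v in zip(out, self.specific[domain][i](x))]
        return out


if __name__ == "__main__":
    # "domain_a" / "domain_b" are placeholder labels, not datasets named in the paper.
    layer = LowRankMoELayer(dim=64, rank=4, domains=["domain_a", "domain_b"])
    feat = [0.01 * i for i in range(64)]  # stand-in for one input feature vector
    print(len(layer(feat, "domain_a")))            # 64
    print(layer.shared[0].num_params(), 64 * 64)   # 512 4096
    print(layer.gate_weights(feat, "domain_b"))    # two selected experts, weights sum to 1
```

With dim=64 and rank=4, each expert in this sketch stores 2·64·4 = 512 weights instead of 64·64 = 4096, which is the kind of saving a low-rank expert structure targets. The ranks, expert counts, and gating details actually used in COME are not stated in this abstract.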