Pixel-level depth information is crucial to many applications, such as autonomous driving, robotics navigation, 3D scene reconstruction, and augmented reality. However, depth information, which is usually acquired by sensors such as LiDAR, is sparse. Depth completion is a process that predicts missing pixels' depth information from a set of sparse depth measurements. Most of the ongoing research applies deep neural networks on the entire sparse depth map and camera scene without utilizing any information about the available objects, which results in more complex and resource-demanding networks. In this work, we propose to use image instance segmentation to detect objects of interest with pixel-level locations, along with sparse depth data, to support depth completion. The framework utilizes a two-branch encoder-decoder deep neural network. It fuses information about scene available objects, such as objects' type and pixel-level location, LiDAR, and RGB camera, to predict dense accurate depth maps. Experimental results on the KITTI dataset showed faster training and improved prediction accuracy. The proposed method reaches a convergence state faster and surpasses the baseline model in all evaluation metrics.
Keywords: LiDAR; depth completion; instance segmentation; object detection; sensor fusion.