Progressive Hard-Mining Network for Monocular Depth Estimation

IEEE Trans Image Process. 2018 Aug;27(8):3691-3702. doi: 10.1109/TIP.2018.2821979.

Abstract

Depth estimation from a single monocular RGB image is a challenging computer vision task because no reliable cues are available as prior knowledge. Most existing monocular depth estimation methods, including various geometric and network-learning approaches, lack an effective mechanism for preserving the cross-border details of depth maps, which is nevertheless very important for improving performance. In this paper, we propose a novel end-to-end progressive hard-mining network (PHN) framework to address this problem. Specifically, we construct a hard-mining objective function together with intra-scale and inter-scale refinement subnetworks to accurately localize and refine the hard-mining regions. The intra-scale refining block recursively recovers details of depth maps from different semantic features within the same receptive field, while the inter-scale block favors a complementary interaction among multi-scale depth cues from different receptive fields. To further reduce the uncertainty of the network, we design a difficulty-aware refinement loss function to guide the depth learning process, which adaptively focuses on mining the hard regions where accumulated errors easily occur. All three modules work together to progressively reduce error propagation in the depth learning process and thereby boost the performance of monocular depth estimation to some extent. We conduct comprehensive evaluations on several public benchmark data sets (including NYU Depth V2, KITTI, and Make3D). The experimental results demonstrate the superiority of the proposed PHN framework over other state-of-the-art methods for the monocular depth estimation task.
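
The abstract does not give the form of the refinement loss, so the following is only a minimal sketch of the general idea of weighting hard regions more heavily, not the loss used in PHN. The function name `difficulty_weighted_l1`, the exponent `gamma`, and the choice of an L1 error weighted by its own relative magnitude are all assumptions made for illustration.

```python
import numpy as np


def difficulty_weighted_l1(pred, target, gamma=1.0, eps=1e-8):
    """Hypothetical difficulty-weighted L1 loss (illustrative, not the PHN loss).

    Pixels with a larger absolute depth error receive a larger weight, so
    hard regions dominate the loss. gamma = 0 gives a plain mean absolute error.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Per-pixel absolute depth error.
    err = np.abs(pred - target)

    # Assumed difficulty score: error relative to the mean error, raised to gamma.
    weights = (err / (err.mean() + eps)) ** gamma

    # Normalize so the weights sum to (approximately) one.
    weights = weights / (weights.sum() + eps)

    # Weighted average of the per-pixel errors.
    return float((weights * err).sum())


if __name__ == "__main__":
    pred = np.array([[1.0, 2.0], [3.0, 10.0]])
    target = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(difficulty_weighted_l1(pred, target, gamma=0.0))  # plain mean error
    print(difficulty_weighted_l1(pred, target, gamma=1.0))  # hard pixel dominates
```

In a training framework with automatic differentiation, the weights in a scheme like this would usually be treated as constants so that gradients flow only through the error term; that detail is likewise an assumption and is not stated in the abstract.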