Residual Learning for Salient Object Detection

IEEE Trans Image Process. 2020 Feb 28. doi: 10.1109/TIP.2020.2975919. Online ahead of print.

Abstract

Recent deep learning based salient object detection methods improve performance by introducing multi-scale strategies into fully convolutional neural networks (FCNs). The final result is obtained by integrating the predictions from all scales. However, existing multi-scale methods suffer from several problems: 1) it is difficult to directly learn discriminative features and filters to regress high-resolution saliency masks at each scale; 2) rescaling the multi-scale features can introduce many redundant and inaccurate values, which weakens the representational ability of the network. In this paper, we propose a residual learning strategy that gradually refines the coarse prediction scale by scale. Concretely, instead of directly predicting the finest-resolution result at each scale, we learn to predict residuals that remedy the errors between the coarse saliency map and the scale-matching ground truth masks. We employ a Dilated Convolutional Pyramid Pooling (DCPP) module to generate the coarse prediction and guide the residual learning process through several novel Attentional Residual Modules (ARMs). We name our network the Residual Refinement Network (R2Net). We demonstrate the effectiveness of the proposed method against other state-of-the-art algorithms on five released benchmark datasets. Our R2Net is a fully convolutional network that needs no post-processing and achieves a real-time speed of 33 FPS when run on a single GPU.
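
The residual idea described above can be illustrated with a minimal sketch in plain Python. It is not the R2Net implementation: the function names (`upsample2x`, `residual_target`, `refine`), the nearest-neighbour upsampling, the additive update, and the clamping to [0, 1] are all assumptions made for illustration. In the paper the residual is predicted by a learned module (an ARM); here an oracle residual computed from the ground truth stands in for that prediction.

```python
def upsample2x(m):
    """Nearest-neighbour 2x upsampling of a 2D list (assumed; the paper may differ)."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def clamp01(x):
    """Keep a saliency value inside [0, 1]."""
    return min(1.0, max(0.0, x))


def residual_target(coarse, gt_at_scale):
    """Residual a network would be trained to predict at this scale:
    scale-matching ground truth minus the upsampled coarse map."""
    up = upsample2x(coarse)
    return [[g - u for g, u in zip(g_row, u_row)]
            for g_row, u_row in zip(gt_at_scale, up)]


def refine(coarse, residual):
    """One refinement step: upsample the coarse map and add the residual."""
    up = upsample2x(coarse)
    return [[clamp01(u + r) for u, r in zip(u_row, r_row)]
            for u_row, r_row in zip(up, residual)]


if __name__ == "__main__":
    # Hypothetical 2x2 coarse prediction and 4x4 ground truth mask.
    coarse = [[0.75, 0.25],
              [0.5, 0.0]]
    gt = [[1.0, 1.0, 0.0, 0.0],
          [1.0, 1.0, 1.0, 0.0],
          [1.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
    # Oracle residual stands in for the learned prediction.
    residual = residual_target(coarse, gt)
    for row in refine(coarse, residual):
        print(row)
```

With a perfect residual the refined map matches the ground truth at that scale. The point of the sketch is that each scale only has to model a correction to the previous prediction rather than regress the full-resolution mask from scratch.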