Edge-Guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images

Xiaofei Zhou; Kunye Shen; Li Weng; Runmin Cong; Bolun Zheng; Jiyong Zhang; Chenggang Yan

doi:10.1109/TCYB.2022.3163152

IEEE Trans Cybern. 2023 Jan;53(1):539-552. doi: 10.1109/TCYB.2022.3163152. Epub 2022 Dec 23.

PMID: 35417369

Abstract

Optical remote sensing images (RSIs) have been widely used in many applications, and one of the interesting problems concerning optical RSIs is salient object detection (SOD). However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades substantially. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds, while neglecting the importance of edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, where the key point lies in the edge-aware position attention unit (EPAU). First, the encoder is used to give salient objects a good representation, that is, multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction module and 2) a feature fusion module. The edge extraction module and the encoder form a U-shaped architecture, which not only provides accurate salient edge clues but also ensures the completeness of edge information by additionally deploying an intraconnection. That is to say, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides position attention for salient objects, where position cues are sharpened by the effective edge information and are used to recurrently calibrate the misaligned decoding process. After that, the final saliency map is obtained by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects and consistently outperforms the state-of-the-art SOD models.
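
The data flow described in the abstract (edge cues sharpening a per-step position attention map, each map calibrating the next decoding step, and all maps fused at the end) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the operations inside the EPAU, so the sigmoid attention, the multiplicative edge sharpening, the reweighting used for calibration, and the averaging used for fusion are all assumptions chosen for illustration, as are the function names. Feature maps are assumed to be single-channel and already resized to a common resolution.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_aware_position_attention(feature, edge):
    """Hypothetical stand-in for the EPAU (the real unit is not given in the abstract).

    Turns a feature map into a position attention map in [0, 1] and
    boosts it where the edge map is strong (assumed sharpening rule).
    """
    return [
        [min(1.0, sigmoid(f) * (1.0 + e)) for f, e in zip(f_row, e_row)]
        for f_row, e_row in zip(feature, edge)
    ]


def recurrent_decode(features, edge):
    """Toy recurrent positioning loop over decoder steps.

    features: list of 2D maps (lists of lists), ordered coarse to fine,
              all assumed to share one spatial size.
    edge:     2D edge map with values in [0, 1], same size.
    Returns (saliency, cues), where cues holds one attention map per step.
    """
    attention = None
    cues = []
    for feat in features:
        if attention is not None:
            # Assumed calibration: reweight current features by the
            # attention produced at the previous decoding step.
            feat = [
                [f * (1.0 + a) for f, a in zip(f_row, a_row)]
                for f_row, a_row in zip(feat, attention)
            ]
        attention = edge_aware_position_attention(feat, edge)
        cues.append(attention)

    # Assumed fusion: plain average of all position attention cues.
    rows, cols = len(edge), len(edge[0])
    saliency = [
        [sum(cue[r][c] for cue in cues) / len(cues) for c in range(cols)]
        for r in range(rows)
    ]
    return saliency, cues


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    edge = [[0.0, 1.0], [0.0, 0.0]]
    saliency, cues = recurrent_decode([zeros, zeros, zeros], edge)
    print(len(cues), saliency)
```

In the actual network these steps would be learned convolutional operations on multichannel features; the sketch only shows the order in which edge cues, attention, calibration, and fusion interact.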