Proceedings ArticleDOI

Salient Object Detection by Contextual Refinement

14 Jun 2020 - pp. 1464-1472
TL;DR: A novel saliency detection framework with a Contextual Refinement Module (CRM) consisting of two sub-networks, an Object Relation Unit (ORU) and a Scene Context Unit (SCU), which capture complementary contextual information to give a holistic estimation of salient regions.
Abstract: Context plays an important role in the saliency prediction task. In this work, we propose a saliency detection framework that not only extracts visual features but also models two kinds of context: object-object relationships within a single image and scene contextual information. Specifically, we develop a novel saliency detection framework with a Contextual Refinement Module (CRM) consisting of two sub-networks, an Object Relation Unit (ORU) and a Scene Context Unit (SCU). ORU encodes the object-object relationship via a graph-based approach, using object relative positions and object co-occurrence patterns in an image, while SCU incorporates the scene contextual information of an image. ORU and SCU capture complementary contextual information to give a holistic estimation of salient regions. Extensive experiments show the effectiveness of modelling object relations and scene context in boosting the performance of saliency prediction. In particular, our framework outperforms state-of-the-art models on challenging benchmark datasets.
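As a rough illustration of the framework described above, the sketch below fuses the two contextual cues into a saliency map. The sub-network internals (graph reasoning over ROIs in ORU, scene-level encoding in SCU) are stand-ins rather than the paper's implementation, and all layer shapes are assumptions:

import torch
import torch.nn as nn

class ContextualRefinementModule(nn.Module):
    # Illustrative sketch only: the real ORU and SCU are full sub-networks.
    def __init__(self, channels: int):
        super().__init__()
        self.oru = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in for the Object Relation Unit
        self.scu = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in for the Scene Context Unit
        self.fuse = nn.Conv2d(2 * channels, 1, 1)               # merge the complementary cues

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        relation_cues = self.oru(feats)  # object-object relationship context
        scene_cues = self.scu(feats)     # scene-level context
        combined = torch.cat([relation_cues, scene_cues], dim=1)
        return torch.sigmoid(self.fuse(combined))  # per-pixel saliency in [0, 1]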


Citations
Proceedings ArticleDOI
01 Jun 2022
TL;DR: This work designs a new discriminative mask that makes the model attend to fixation and edge regions, and proposes an iterative refinement framework, coined SegMaR, which integrates Segment, Magnify and Reiterate in a multi-stage detection fashion.
Abstract: It is challenging to accurately detect camouflaged objects from their highly similar surroundings. Existing methods mainly adopt a single-stage detection fashion, neglecting that small objects with low-resolution, fine edges require more operations than larger ones. To tackle camouflaged object detection (COD), we take inspiration from human attention coupled with a coarse-to-fine detection strategy, and thereby propose an iterative refinement framework, coined SegMaR, which integrates Segment, Magnify and Reiterate in a multi-stage detection fashion. Specifically, we design a new discriminative mask that makes the model attend to fixation and edge regions. In addition, we leverage an attention-based sampler to magnify the object region progressively without enlarging the image size. Extensive experiments show that our SegMaR achieves remarkable and consistent improvements over other state-of-the-art methods. In particular, we surpass two competitive methods by 7.4% and 20.0%, respectively, averaged over standard evaluation metrics on small camouflaged objects. Additional studies provide more promising insights into SegMaR, including its effectiveness on the discriminative mask and its generalization to other network architectures. Code is available at https://github.com/dlut-dimt/SegMaR.
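A toy version of the segment-magnify-reiterate loop is sketched below. The paper magnifies with an attention-based sampler rather than the hard crop used here, and segment_fn, the padding, and the 0.5 threshold are all assumptions:

import numpy as np

def segment_magnify_reiterate(image, segment_fn, num_stages=3, pad=16):
    # segment_fn: any callable mapping an HxWx3 image to an HxW mask in [0, 1].
    h, w = image.shape[:2]
    mask = segment_fn(image)
    for _ in range(num_stages - 1):
        ys, xs = np.nonzero(mask > 0.5)
        if ys.size == 0:
            break
        # Magnify: crop a padded box around the current estimate and re-segment.
        y0, y1 = max(int(ys.min()) - pad, 0), min(int(ys.max()) + pad, h)
        x0, x1 = max(int(xs.min()) - pad, 0), min(int(xs.max()) + pad, w)
        crop_mask = segment_fn(image[y0:y1, x0:x1])
        # Reiterate: paste the refined prediction back at full resolution.
        mask = np.zeros((h, w), dtype=float)
        mask[y0:y1, x0:x1] = crop_mask
    return mask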

21 citations

Journal ArticleDOI
Zhengyi Liu, Yuan Wang, Yacheng Tan, Wei Li, Yun Xiao 
TL;DR: An Attention Gated Recurrent Unit (AGRU) is proposed for RGB-D saliency detection, which reduces the influence of low-quality depth images and retains more semantic features in the progressive fusion process.
Abstract: RGB-D saliency detection aims to identify the most attractive objects in a pair of color and depth images. However, most existing models adopt the classic U-Net framework, which progressively decodes two-stream features. In this paper, we decode the cross-modal and multi-level features in a unified unit, named Attention Gated Recurrent Unit (AGRU). It can reduce the influence of low-quality depth images and retain more semantic features in the progressive fusion process. Specifically, the features of different modalities and different levels are organized as a sequential input and recurrently fed into AGRU, which consists of a reset gate, an update gate and a memory unit, to be selectively fused and adaptively memorized based on an attention mechanism. Further, a two-stage AGRU serves as the decoder of the RGB-D salient object detection network, named AGRFNet. Owing to its recurrent nature, it achieves strong performance with few parameters. To further improve performance, three auxiliary modules are designed to better fuse semantic information, refine the features of the shallow layers and enhance local detail. Extensive experiments on seven widely used benchmark datasets demonstrate that AGRFNet performs favorably against 18 state-of-the-art RGB-D SOD approaches.
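The reset/update-gate idea behind AGRU can be sketched as a convolutional GRU that consumes cross-modal, multi-level features as a sequence. The attention mechanism and the three auxiliary modules are omitted, and all shapes are assumptions:

import torch
import torch.nn as nn

class GatedRecurrentFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.reset = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.update = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.candidate = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, features: list) -> torch.Tensor:
        # features: e.g. alternating RGB / depth feature maps, level by level.
        state = torch.zeros_like(features[0])
        for x in features:
            xh = torch.cat([x, state], dim=1)
            r = torch.sigmoid(self.reset(xh))    # reset gate: suppress unreliable memory
            z = torch.sigmoid(self.update(xh))   # update gate: how much new evidence to keep
            h_new = torch.tanh(self.candidate(torch.cat([x, r * state], dim=1)))
            state = (1 - z) * state + z * h_new  # selective fusion with adaptive memory
        return state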
Journal ArticleDOI
TL;DR: The DualRefine model uses a deep equilibrium framework to iteratively refine depth estimates and a hidden state of feature maps by computing local matching costs based on epipolar geometry.
Abstract: Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames, injecting geometric information into the network. These pixel-correspondence candidates are computed based on the relative pose estimates between the frames. Accurate pose predictions are essential for precise matching cost computation, as they influence the epipolar geometry. Furthermore, improved depth estimates can, in turn, be used to align pose estimates. Inspired by traditional structure-from-motion (SfM) principles, we propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop. Our novel update pipeline uses a deep equilibrium model framework to iteratively refine depth estimates and a hidden state of feature maps by computing local matching costs based on epipolar geometry. Importantly, we use the refined depth estimates and feature maps to compute pose updates at each step. These pose updates gradually alter the epipolar geometry during the refinement process. Experimental results on the KITTI dataset demonstrate competitive depth and odometry prediction performance, surpassing published self-supervised baselines.
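The deep equilibrium idea reduces to iterating an update function to a fixed point; in DualRefine that function would jointly update depth, pose, and a hidden feature state from epipolar matching costs. A generic sketch, where f and z0 are placeholders:

import numpy as np

def fixed_point_refine(f, z0, max_iters=40, tol=1e-4):
    # Iterate z <- f(z) until an equilibrium z* = f(z*) is (approximately) reached.
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iters):
        z_next = np.asarray(f(z))
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Example: the Babylonian update converges to sqrt(2).
# fixed_point_refine(lambda z: 0.5 * (z + 2.0 / z), 1.0)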
Journal ArticleDOI
TL;DR: A context feature extraction module refines the rough feature map in the intermediate stage to reduce misclassification of the target object, showing better results than most state-of-the-art methods.
Abstract: Scene segmentation is a very challenging task in which convolutional neural networks have achieved very good results. Current scene segmentation methods often ignore the internal consistency of the target object and fail to make full use of global and local context information, which leads to object misclassification. In addition, most previous work focused on segmenting the main body of the object, while there has been little research on the quality of object edge segmentation. In this article, building on the use of flow information to maintain body consistency, a context feature extraction module is designed to fully exploit the global and local body context of the target object, refining the rough feature map in the intermediate stage and thereby reducing misclassification. Besides, in the proposed edge attention module, the low-level feature map guided by the global feature and the edge feature map with semantic information obtained in the intermediate stage are combined to recover more accurate edge details. As a result, segmentation quality improves for both the noisy body part and the edge details. This paper conducts experiments not only on the classic FCN, PSPNet, and DeepLabv3+ mainstream architectures, but also on the recently proposed real-time SFNet architecture, where the mIoU of both object and boundary improves, verifying the effectiveness of the proposed method. Moreover, to demonstrate robustness, we experiment on three complex scene segmentation datasets, Cityscapes, CamVid, and KITTI, obtaining an mIoU of 80.52% on the Cityscapes validation set, and 71.4% and 56.53% on the CamVid and KITTI test sets, which compares favorably with most state-of-the-art methods.
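The edge attention module described above can be caricatured as a global feature gating a low-level feature map. Channel counts, the 1x1 gate, and the assumption that both inputs share a spatial size are illustrative, not the paper's design:

import torch
import torch.nn as nn

class EdgeAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, 1)

    def forward(self, low_level: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # Assumes global_feat was upsampled to low_level's spatial size.
        attn = torch.sigmoid(self.gate(global_feat))  # where edge detail is likely relevant
        return low_level * attn                       # pass only edge-relevant detail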
References
Posted Content
TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
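The RPN amounts to a small head on shared convolutional features: a 3x3 conv followed by sibling 1x1 convs predicting, for each of k anchors per position, an objectness score and four box deltas. A minimal sketch; the paper predicts two softmax scores per anchor, whereas a single logit is used here for brevity:

import torch
import torch.nn as nn

class RegionProposalHead(nn.Module):
    def __init__(self, in_channels: int, num_anchors: int = 9):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, 512, 3, padding=1)
        self.objectness = nn.Conv2d(512, num_anchors, 1)      # object-vs-background logit per anchor
        self.box_deltas = nn.Conv2d(512, num_anchors * 4, 1)  # (dx, dy, dw, dh) per anchor

    def forward(self, feats: torch.Tensor):
        x = torch.relu(self.shared(feats))
        return self.objectness(x), self.box_deltas(x)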

23,183 citations

Journal ArticleDOI
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
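For reference, the latent SVM objective sketched in the abstract has the familiar hinge-loss form, except that the score is a maximum over latent configurations Z(x) (e.g. part placements), with Phi(x, z) the corresponding feature vector:

    \min_{\beta}\; \tfrac{1}{2}\lVert\beta\rVert^{2} + C\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i f_\beta(x_i)\bigr),
    \qquad f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z)

Since f_beta is a max of linear functions, the objective becomes convex once z is fixed for the positive examples, which is exactly the alternation the abstract describes.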

10,501 citations


"Salient Object Detection by Context..." refers methods in this paper

  • ...Further, we use Non-Maximum Suppression [6] to choose a fixed number of Regions of Interest (ROIs)....

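For concreteness, a plain NMS routine of the kind referenced above; the box format and thresholds are assumptions:

import numpy as np

def nms(boxes, scores, iou_thresh=0.5, top_k=100):
    # boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    # Greedily keep the highest-scoring box and drop overlapping ones,
    # until top_k boxes (the "fixed number of ROIs") are selected.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0 and len(keep) < top_k:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-12)
        order = order[1:][iou <= iou_thresh]  # keep only boxes that overlap little
    return keep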

Proceedings Article
12 Dec 2011
TL;DR: This paper considers fully connected CRF models defined on the complete set of pixels in an image and proposes a highly efficient approximate inference algorithm in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels.
Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.

3,233 citations


"Salient Object Detection by Context..." refers methods in this paper

  • ...To further preserve boundary information and improve spatial coherence, we utilize fully connected Conditional Random Field (CRF) [11] to obtain the final saliency map output....

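A toy mean-field update in the spirit of the CRF refinement above. The fully connected CRF of [11] uses both a Gaussian smoothness kernel and a bilateral appearance kernel with efficient high-dimensional filtering; this sketch keeps a single Gaussian kernel and illustrative parameters:

import numpy as np
from scipy.ndimage import gaussian_filter

def crf_refine(prob, sigma=3.0, compat=3.0, iters=5):
    # prob: (H, W) foreground probability, e.g. a raw saliency map.
    eps = 1e-6
    unary_bg = -np.log(np.clip(1.0 - prob, eps, 1.0))
    unary_fg = -np.log(np.clip(prob, eps, 1.0))
    q = prob.copy()
    for _ in range(iters):
        msg_fg = gaussian_filter(q, sigma)        # smoothed foreground beliefs
        msg_bg = gaussian_filter(1.0 - q, sigma)  # smoothed background beliefs
        # Potts compatibility: each label is penalized by the other label's mass.
        logit_bg = -unary_bg - compat * msg_fg
        logit_fg = -unary_fg - compat * msg_bg
        q = 1.0 / (1.0 + np.exp(logit_bg - logit_fg))  # renormalized foreground belief
    return q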

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work considers both foreground and background cues in a different way, ranking the similarity of image elements to foreground or background cues via graph-based manifold ranking, with saliency defined by the elements' relevance to the given seeds or queries.
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) to foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevance to the given seeds or queries. We represent the image as a closed-loop graph with superpixels as nodes. These nodes are ranked based on their similarity to background and foreground queries, using affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate that the proposed method performs well against state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
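The ranking step has a closed form: with a normalized affinity S = D^{-1/2} W D^{-1/2} over superpixel nodes and an indicator vector y of the queries, the relevance scores are f* = (I - alpha S)^{-1} y. A sketch of that core step; the affinity construction and the paper's two-stage scheme are omitted:

import numpy as np

def manifold_rank(W, seeds, alpha=0.99):
    # W: (n, n) symmetric affinity matrix over superpixel nodes.
    # seeds: indices of query nodes, e.g. boundary superpixels as background queries.
    n = W.shape[0]
    d = np.maximum(W.sum(axis=1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    y = np.zeros(n)
    y[list(seeds)] = 1.0
    return np.linalg.solve(np.eye(n) - alpha * S, y)  # f* = (I - alpha S)^{-1} y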

2,278 citations


"Salient Object Detection by Context..." refers methods in this paper

  • ...We evaluate our framework on PASCAL-S [14], ECSSD [24], HKU-IS [12], DUTS-TE (DUTS test set) [26] and DUT-OMRON (partitioned for testing) [35] saliency datasets....


  • ...Our proposed architecture is trained on DUT-OMRON dataset [35]....


Journal ArticleDOI
TL;DR: On the basis of a Bayesian framework, an original approach to attentional guidance by global scene context is presented that combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
Abstract: Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach of attentional guidance by global scene context. The model comprises 2 parallel pathways; one pathway computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
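At its simplest, the model's two parallel pathways combine multiplicatively: bottom-up saliency is modulated by a scene-context prior over image locations. The sketch below keeps only that interaction; the actual model is Bayesian, dividing by the likelihood of local features and learning the prior from global scene features:

import numpy as np

def contextual_guidance(local_saliency: np.ndarray, context_prior: np.ndarray) -> np.ndarray:
    # Both inputs are (H, W) maps; the prior encodes where the target class
    # tends to appear given the scene gist (e.g. pedestrians near the horizon).
    guided = local_saliency * context_prior
    return guided / max(float(guided.max()), 1e-12)  # normalize to [0, 1]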

1,613 citations


"Salient Object Detection by Context..." refers background in this paper

  • ...In an image, contextual information determines the relative importance of objects in the image, which in turn determines the saliency of an object [25]....
