Visual Saliency Detection via Convolutional Gated Recurrent Units

doi:10.1007/978-3-030-36711-4_15

Home
/
Papers
/
Visual Saliency Detection via Convolutional Gated Recurrent Units

Book Chapter•DOI•

Visual Saliency Detection via Convolutional Gated Recurrent Units

Sayanti Bardhan¹, Sukhendu Das¹, Shibu Jacob²•Institutions (2)

Indian Institute of Technology Madras¹, National Institute of Ocean Technology²

12 Dec 2019-pp 162-174

TL;DR: This work proposes a proposed novel end-to-end framework with a Contextual Unit (CTU) module that models the scene contextual information to give efficient saliency maps with the help of Convolutional GRU (Conv-GRU).

read less

Abstract: Context is an important aspect for accurate saliency detection. However, the question of how to formally model image context within saliency detection frameworks is still an open problem. Recent saliency detection models designed using complex Deep Neural Networks to extract robust features, however often fail to select the right contextual features. These methods generally utilize physical attributes of objects for generating final saliency maps, but ignores scene contextual information. In this paper, we overcome such limitation using (i) a proposed novel end-to-end framework with a Contextual Unit (CTU) module that models the scene contextual information to give efficient saliency maps with the help of Convolutional GRU (Conv-GRU). This is the first work reported so far that utilizes Conv-GRU to generate image saliency maps. In addition, (ii) we propose a novel way of using the Conv-GRU that helps to refine saliency maps based on input image context. The proposed model has been evaluated on challenging benchmark saliency datasets, where it outperforms prominent state-of-the-art methods.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Salient Object Detection by Contextual Refinement

[...]

Sayanti Bardhan¹•Institutions (1)

Indian Institute of Technology Madras¹

14 Jun 2020

TL;DR: A novel saliency detection framework with a Contextual Refinement Module (CRM) which consists of two sub-networks, Object Relation Unit (ORU) and Scene Context Unit (SCU) which captures complementary contextual information to give a holistic estimation of salient regions.

...read moreread less

Abstract: Context plays an important role in the saliency prediction task. In this work, we propose a saliency detection framework that not only extracts visual features but also models two kinds of context including object-object relationships within a single image and scene contextual information. Specifically, we develop a novel saliency detection framework with a Contextual Refinement Module (CRM) which consists of two sub-networks, Object Relation Unit (ORU) and Scene Context Unit (SCU). ORU encodes the object-object relationship based on object relative position and object co-occurrence pattern in an image, by graphical approach, while SCU incorporates the scene contextual information of an image. Object Relation Unit (ORU) and Scene Context Unit (SCU) captures complementary contextual information to give a holistic estimation of salient regions. Extensive experiments show the effectiveness of modelling object relations and scene context in boosting the performance of saliency prediction. In particular, our frame-work outperforms the state-of-the-art models on challenging benchmark datasets.

...read moreread less

6 citations

Journal Article•DOI•

BGRDNet: RGB-D salient object detection with a bidirectional gated recurrent decoding network

[...]

Zhengyi Liu, Yuan Wang, Zhili Zhang, Yacheng Tan

23 Mar 2022-Multimedia Tools and Applications

1 citations

Journal Article•DOI•

An Image Saliency Detection Method Based on Combining Global and Local Information

[...]

Huanming Yang, Yong-chao Gong, Kai Wang

19 Apr 2022-Mathematical Problems in Engineering

TL;DR: Experimental results show that the proposed saliency target detection algorithm can not only accurately and comprehensively extract significant target regions but also retain more texture information and complete edge information while satisfying the human visual experience.

...read moreread less

Abstract: In the field of computer vision, image saliency target detection can not only improve the accuracy of image detection but also accelerate the speed of image detection. In order to solve the existing problems of the saliency target detection algorithms at present, such as inconspicuous texture details and incomplete edge contour display, this paper proposes a saliency target detection algorithm integrating multiple information. The algorithm consists of three processes: preprocessing process, multi-information extraction process, and fusion optimization process. The frequency domain features of the image are calculated, the algorithm calculates the frequency domain features of the image, introduces power law transform and feature normalization, improves the frequency domain features of the image, saves the information of the target region, and inhibits the information of the background region. On three public MSRA, SED2, and ECSSD image datasets, the proposed algorithm is compared with other classical algorithms in subjective and objective comparison experiments. Experimental results show that the proposed algorithm can not only accurately and comprehensively extract significant target regions but also retain more texture information and complete edge information while satisfying the human visual experience. All evaluation indexes are significantly better than the comparison algorithm, showing good reliability and adaptability.

...read moreread less

Journal Article•DOI•

AGRFNet: Two-stage cross-modal and multi-level attention gated recurrent fusion network for RGB-D saliency detection

[...]

Zhengyi Liu, Yuan Wang, Yacheng Tan, Wei Li, Yun Xiao - Show less +1 more

01 Mar 2022-Signal Processing-image Communication

TL;DR: Zhang et al. as discussed by the authors proposed an Attention Gated Recurrent Unit (AGRU) for RGB-D saliency detection, which can reduce the influence of low-quality depth image, and retain more semantic features in the progressive fusion process.

...read moreread less

Abstract: RGB-D saliency detection aims to identify the most attractive objects in a pair of color and depth images. However, most existing models adopt classic U-Net framework which progressively decodes two-stream features. In this paper, we decode the cross-modal and multi-level features in a unified unit, named Attention Gated Recurrent Unit (AGRU). It can reduce the influence of low-quality depth image, and retain more semantic features in the progressive fusion process. Specifically, the features of different modalities and different levels are organized as the sequential input, recurrently fed into AGRU which consists of reset gate, update gate and memory unit to be selectively fused and adaptively memorized based on attention mechanism. Further, two-stage AGRU serves as the decoder of RGB-D salient object detection network, named AGRFNet. Due to the recurrent nature, it achieves the best performance with the little parameters. In order to further improve the performance, three auxiliary modules are designed to better fuse semantic information, refine the features of the shallow layer and enhance the local detail. Extensive experiments on seven widely used benchmark datasets demonstrate that AGRFNet performs favorably against 18 state-of-the-art RGB-D SOD approaches.

...read moreread less

References

PDF

Open Access

More filters

Journal Article•DOI•

Deeply Supervised Salient Object Detection with Short Connections

[...]

Qibin Hou¹, Ming-Ming Cheng¹, Xiaowei Hu¹, Ali Borji², Zhuowen Tu³, Philip H. S. Torr⁴ - Show less +2 more•Institutions (4)

Nankai University¹, University of Central Florida², University of California, San Diego³, University of Oxford⁴

01 Apr 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new saliency method is proposed by introducing short connections to the skip-layer structures within the HED architecture, which produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency, effectiveness, and simplicity over the existing algorithms.

...read moreread less

Abstract: Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still a large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data on performance. We provide a training set for future research and fair comparisons.

...read moreread less

1,041 citations

Proceedings Article•DOI•

Saliency detection by multi-context deep learning

[...]

Rui Zhao¹, Wanli Ouyang², Hongsheng Li², Xiaogang Wang¹•Institutions (2)

Chinese Academy of Sciences¹, The Chinese University of Hong Kong²

07 Jun 2015

TL;DR: This paper proposes a multi-context deep learning framework for salient object detection that employs deep Convolutional Neural Networks to model saliency of objects in images and investigates different pre-training strategies to provide a better initialization for training the deep neural networks.

...read moreread less

Abstract: Low-level saliency cues or priors do not produce good enough saliency detection results especially when the salient object presents in a low-contrast background with confusing visual appearance. This issue raises a serious problem for conventional approaches. In this paper, we tackle this problem by proposing a multi-context deep learning framework for salient object detection. We employ deep Convolutional Neural Networks to model saliency of objects in images. Global context and local context are both taken into account, and are jointly modeled in a unified multi-context deep learning framework. To provide a better initialization for training the deep neural networks, we investigate different pre-training strategies, and a task-specific pre-training scheme is designed to make the multi-context modeling suited for saliency detection. Furthermore, recently proposed contemporary deep models in the ImageNet Image Classification Challenge are tested, and their effectiveness in saliency detection are investigated. Our approach is extensively evaluated on five public datasets, and experimental results show significant and consistent improvements over the state-of-the-art methods.

...read moreread less

983 citations

Proceedings Article•DOI•

Learning to Detect Salient Objects with Image-Level Supervision

[...]

Lijun Wang¹, Huchuan Lu¹, Yifan Wang¹, Mengyang Feng¹, Dong Wang¹, Baocai Yin¹, Xiang Ruan - Show less +3 more•Institutions (1)

Dalian University of Technology¹

21 Jul 2017

TL;DR: This paper develops a weakly supervised learning method for saliency detection using image-level tags only, which outperforms unsupervised ones with a large margin, and achieves comparable or even superior performance than fully supervised counterparts.

...read moreread less

Abstract: Deep Neural Networks (DNNs) have substantially improved the state-of-the-art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image-level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the usage of existing large scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised ones with a large margin, and achieves comparable or even superior performance than fully supervised counterparts.

...read moreread less

909 citations

Posted Content•

The Secrets of Salient Object Segmentation

[...]

Yin Li¹, Xiaodi Hou², Christof Koch³, James M. Rehg¹, Alan L. Yuille⁴ - Show less +1 more•Institutions (4)

Georgia Institute of Technology¹, California Institute of Technology², Allen Institute for Brain Science³, University of California, Los Angeles⁴

11 Jun 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors provide an extensive evaluation of fixation prediction and salient object segmentation algorithms as well as statistics of major datasets, and propose a new high quality dataset that offers both fixations and salient objects.

...read moreread less

Abstract: In this paper we provide an extensive evaluation of fixation prediction and salient object segmentation algorithms as well as statistics of major datasets. Our analysis identifies serious design flaws of existing salient object benchmarks, called the dataset design bias, by over emphasizing the stereotypical concepts of saliency. The dataset design bias does not only create the discomforting disconnection between fixations and salient object segmentation, but also misleads the algorithm designing. Based on our analysis, we propose a new high quality dataset that offers both fixation and salient object segmentation ground-truth. With fixations and salient object being presented simultaneously, we are able to bridge the gap between fixations and salient objects, and propose a novel method for salient object segmentation. Finally, we report significant benchmark progress on three existing datasets of segmenting salient objects

...read moreread less

878 citations

Proceedings Article•DOI•

DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection

[...]

Nian Liu, Junwei Han

27 Jun 2016

TL;DR: Evaluations on four benchmark datasets and comparisons with other 11 state-of-the-art algorithms demonstrate that DHSNet not only shows its significant superiority in terms of performance, but also achieves a real-time speed of 23 FPS on modern GPUs.

...read moreread less

Abstract: Traditional1 salient object detection models often use hand-crafted features to formulate contrast and various prior knowledge, and then combine them artificially. In this work, we propose a novel end-to-end deep hierarchical saliency network (DHSNet) based on convolutional neural networks for detecting salient objects. DHSNet first makes a coarse global prediction by automatically learning various global structured saliency cues, including global contrast, objectness, compactness, and their optimal combination. Then a novel hierarchical recurrent convolutional neural network (HRCNN) is adopted to further hierarchically and progressively refine the details of saliency maps step by step via integrating local context information. The whole architecture works in a global to local and coarse to fine manner. DHSNet is directly trained using whole images and corresponding ground truth saliency masks. When testing, saliency maps can be generated by directly and efficiently feedforwarding testing images through the network, without relying on any other techniques. Evaluations on four benchmark datasets and comparisons with other 11 state-of-the-art algorithms demonstrate that DHSNet not only shows its significant superiority in terms of performance, but also achieves a real-time speed of 23 FPS on modern GPUs.

...read moreread less

770 citations