Book ChapterDOI

Visual Saliency Detection via Convolutional Gated Recurrent Units

12 Dec 2019 - pp. 162-174

TL;DR: This work proposes a novel end-to-end framework with a Contextual Unit (CTU) module that models scene contextual information to produce efficient saliency maps with the help of a Convolutional GRU (Conv-GRU).
Abstract: Context is an important aspect for accurate saliency detection. However, the question of how to formally model image context within saliency detection frameworks is still an open problem. Recent saliency detection models use complex Deep Neural Networks to extract robust features, yet often fail to select the right contextual features. These methods generally utilize physical attributes of objects for generating final saliency maps, but ignore scene contextual information. In this paper, we overcome this limitation by (i) proposing a novel end-to-end framework with a Contextual Unit (CTU) module that models scene contextual information to give efficient saliency maps with the help of a Convolutional GRU (Conv-GRU). To the best of our knowledge, this is the first reported work that utilizes a Conv-GRU to generate image saliency maps. In addition, (ii) we propose a novel way of using the Conv-GRU that helps refine saliency maps based on the input image context. The proposed model has been evaluated on challenging benchmark saliency datasets, where it outperforms prominent state-of-the-art methods.
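
For orientation, the building block the abstract names can be sketched compactly. Below is a minimal convolutional GRU cell in PyTorch, assuming the usual formulation in which the GRU's matrix products are replaced by 2-D convolutions so the hidden state keeps its spatial layout; all names and sizes are illustrative and not taken from the authors' code.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        pad = k // 2  # same-padding keeps H x W unchanged
        # one conv emits both gate maps (update z and reset r)
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)  # candidate state

    def forward(self, x, h):
        # x: (B, in_ch, H, W); h: (B, hid_ch, H, W)
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde  # blend previous state and candidate

Iterating such a cell over the same image feature map, with the evolving hidden state read out as a saliency map, is one plausible reading of the "refinement based on input image context" the abstract describes.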
Citations

Proceedings ArticleDOI
Sayanti Bardhan
14 Jun 2020
TL;DR: A novel saliency detection framework with a Contextual Refinement Module (CRM) consisting of two sub-networks, an Object Relation Unit (ORU) and a Scene Context Unit (SCU), which capture complementary contextual information to give a holistic estimation of salient regions.
Abstract: Context plays an important role in the saliency prediction task. In this work, we propose a saliency detection framework that not only extracts visual features but also models two kinds of context: object-object relationships within a single image and scene contextual information. Specifically, we develop a novel saliency detection framework with a Contextual Refinement Module (CRM) which consists of two sub-networks, an Object Relation Unit (ORU) and a Scene Context Unit (SCU). ORU encodes the object-object relationship based on object relative position and object co-occurrence pattern in an image via a graphical approach, while SCU incorporates the scene contextual information of an image. The two units capture complementary contextual information to give a holistic estimation of salient regions. Extensive experiments show the effectiveness of modelling object relations and scene context in boosting the performance of saliency prediction. In particular, our framework outperforms state-of-the-art models on challenging benchmark datasets.

1 citation


References

Proceedings Article
Karen Simonyan, Andrew Zisserman
01 Jan 2015
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
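
The released models are now packaged in common frameworks; below is a hedged sketch of loading VGG-16 as a fixed feature extractor via torchvision (the weights enum is torchvision's convention, not part of the original release):

import torch
from torchvision import models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
features = vgg.features.eval()  # the stack of 3x3 conv layers the abstract describes
with torch.no_grad():
    fmap = features(torch.randn(1, 3, 224, 224))  # -> shape (1, 512, 7, 7)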

49,857 citations


Proceedings ArticleDOI
01 Jan 2014

Abstract: In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
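
For reference, the gating computation this paper introduced (the GRU) is usually written as follows, in the standard notation of the follow-up literature, where sigma is the logistic sigmoid and \odot the elementwise product:

r_t = \sigma(W_r x_t + U_r h_{t-1})
z_t = \sigma(W_z x_t + U_z h_{t-1})
\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t

The reset gate r_t lets the unit drop irrelevant history when forming the candidate state, while the update gate z_t interpolates between keeping the old state and adopting the candidate.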

14,140 citations


Posted Content
TL;DR: Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are found to outperform traditional tanh units, with the GRU performing comparably to the LSTM.
Abstract: In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.
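
The structural difference behind the comparison is easy to verify with stock layers; a small sketch using PyTorch's built-in modules (sizes here are arbitrary):

import torch.nn as nn

gru = nn.GRU(input_size=128, hidden_size=256)
lstm = nn.LSTM(input_size=128, hidden_size=256)
count = lambda m: sum(p.numel() for p in m.parameters())
# the GRU has three gate blocks to the LSTM's four, so roughly 3/4 the parameters
print(count(gru), count(lstm))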

7,349 citations


Proceedings Article
Philipp Krähenbühl, Vladlen Koltun
12 Dec 2011
TL;DR: This paper considers fully connected CRF models defined on the complete set of pixels in an image and proposes a highly efficient approximate inference algorithm in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels.
Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.
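
The model this TL;DR summarizes minimizes a Gibbs energy of the form (notation as in the fully connected CRF literature):

E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_m w^{(m)} k^{(m)}(f_i, f_j),

where each k^{(m)} is a Gaussian kernel over pixel features f (typically position p and color I), e.g. an appearance kernel \exp(-|p_i - p_j|^2 / 2\theta_\alpha^2 - |I_i - I_j|^2 / 2\theta_\beta^2) plus a smoothness kernel \exp(-|p_i - p_j|^2 / 2\theta_\gamma^2). The restriction to Gaussian kernels is what reduces inference to high-dimensional filtering and makes it efficient despite the billions of edges.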

2,822 citations


Performance
Metrics
No. of citations received by the paper in previous years:
Year    Citations
2020    1