Proceedings ArticleDOI

Deep Level Sets for Salient Object Detection

01 Jul 2017 - pp. 540-549
TL;DR: This work proposes a deep Level Set network that learns a Level Set function for salient objects, producing compact, uniform saliency maps with more accurate boundaries.
Abstract: Deep learning has been applied to saliency detection in recent years. Its superior performance has shown that deep networks can model the semantic properties of salient objects. Yet it is difficult for a deep network to discriminate pixels that fall within similar receptive fields around object boundaries, so deep networks may output maps with blurred saliency and inaccurate boundaries. To tackle this issue, we propose a deep Level Set network that produces compact and uniform saliency maps. Our method drives the network to learn a Level Set function for salient objects, so it outputs more accurate boundaries and compact saliency. In addition, to propagate saliency information among pixels and recover a full-resolution saliency map, we extend a superpixel-based guided filter into a layer of the network. The proposed network has a simple structure and is trained end-to-end. At test time, the network produces saliency maps in a single efficient feed-forward pass, running at over 12 FPS on a GPU. Evaluations on benchmark datasets show that the proposed method achieves state-of-the-art performance.
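To make the level-set idea concrete, here is a minimal PyTorch sketch, assuming a Chan-Vese-style two-region energy with an arctan-smoothed Heaviside applied to the network output. The function names and the exact energy are illustrative assumptions, not necessarily the paper's formulation.

```python
import math
import torch

def smoothed_heaviside(phi, eps=0.1):
    # Arctan approximation of the Heaviside step used in level-set
    # methods: maps the level-set output phi into (0, 1).
    return 0.5 * (1.0 + (2.0 / math.pi) * torch.atan(phi / eps))

def level_set_loss(phi, gt):
    # Hypothetical training loss, NOT the paper's exact objective.
    # phi: network output interpreted as a level-set function, (B,1,H,W).
    # gt:  binary ground-truth saliency mask in {0,1}, same shape.
    h = smoothed_heaviside(phi)
    # Chan-Vese-style two-region energy with the ground truth playing the
    # role of the image: c1/c2 are mean gt values inside/outside the contour.
    c1 = (gt * h).sum() / (h.sum() + 1e-8)
    c2 = (gt * (1.0 - h)).sum() / ((1.0 - h).sum() + 1e-8)
    energy = (h * (gt - c1) ** 2).sum() + ((1.0 - h) * (gt - c2) ** 2).sum()
    return energy / gt.numel()
```

Minimizing such an energy pushes the zero level set of phi toward the boundary between salient and non-salient regions, which is what gives the compact, sharp-boundary behavior the abstract describes.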


Citations
Proceedings ArticleDOI
01 Jun 2019
TL;DR: Experimental results on six public datasets show that the proposed predict-refine architecture, BASNet, outperforms the state-of-the-art methods both in terms of regional and boundary evaluation measures.
Abstract: Deep Convolutional Neural Networks have been adopted for salient object detection and have achieved state-of-the-art performance. Most previous works, however, focus on region accuracy rather than boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy -- pixel-, patch- and map-level -- by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods in terms of both regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.
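The hybrid loss is concrete enough to sketch. Below is a minimal PyTorch version that fuses the BCE, SSIM and IoU terms; the SSIM window here is a box filter via average pooling rather than the usual Gaussian, so treat it as an approximation of the paper's loss, not the reference implementation.

```python
import torch
import torch.nn.functional as F

def iou_loss(pred, gt, eps=1e-7):
    # Map-level term: 1 - soft intersection-over-union.
    inter = (pred * gt).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + gt.sum(dim=(2, 3)) - inter
    return (1.0 - (inter + eps) / (union + eps)).mean()

def ssim_loss(pred, gt, window=11, C1=0.01 ** 2, C2=0.03 ** 2):
    # Patch-level term: 1 - SSIM, with average pooling as a box-filter
    # window (simplification; the usual choice is a Gaussian window).
    pad = window // 2
    mu_p = F.avg_pool2d(pred, window, 1, pad)
    mu_g = F.avg_pool2d(gt, window, 1, pad)
    sigma_p = F.avg_pool2d(pred * pred, window, 1, pad) - mu_p ** 2
    sigma_g = F.avg_pool2d(gt * gt, window, 1, pad) - mu_g ** 2
    sigma_pg = F.avg_pool2d(pred * gt, window, 1, pad) - mu_p * mu_g
    ssim = ((2 * mu_p * mu_g + C1) * (2 * sigma_pg + C2)) / (
        (mu_p ** 2 + mu_g ** 2 + C1) * (sigma_p + sigma_g + C2))
    return (1.0 - ssim).mean()

def hybrid_loss(pred, gt):
    # pred: sigmoid probability map in [0,1], (B,1,H,W); gt: binary mask.
    # Pixel-, patch- and map-level terms fused with equal weights.
    return F.binary_cross_entropy(pred, gt) + ssim_loss(pred, gt) + iou_loss(pred, gt)
```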

962 citations


Cites background from "Deep Level Sets for Salient Object ..."

  • ...[18] proposed to learn a Level Set [48] function to output accurate boundaries and compact saliency....


Proceedings ArticleDOI
01 Oct 2019
TL;DR: An edge guidance network (EGNet) for salient object detection models salient edge information and salient object information simultaneously in a single network, in three steps, which helps locate salient objects, and especially their boundaries, more accurately.
Abstract: Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCN-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection with three steps to simultaneously model these two kinds of complementary information in a single network. In the first step, we extract the salient object features in a progressive fusion manner. In the second step, we integrate the local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the same salient edge features with salient object features at various resolutions. Benefiting from the rich edge and location information in the salient edge features, the fused features help locate salient objects, and especially their boundaries, more accurately. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods on six widely used datasets without any pre-processing or post-processing. The source code is available at http://mmcheng.net/egnet/.
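A minimal sketch of the coupling step (the third step) as read from the abstract: shared salient-edge features are resized to each resolution and fused with the salient-object features. The module and parameter names are hypothetical, not EGNet's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidedFusion(nn.Module):
    # Hypothetical sketch: couple salient-edge features with salient-object
    # features at one resolution by concatenation + convolution.
    def __init__(self, obj_channels, edge_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(obj_channels + edge_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, obj_feat, edge_feat):
        # Resize the shared edge features to this object-feature resolution,
        # then fuse; the same edge features are coupled at every scale.
        edge_feat = F.interpolate(edge_feat, size=obj_feat.shape[2:],
                                  mode='bilinear', align_corners=False)
        return self.fuse(torch.cat([obj_feat, edge_feat], dim=1))
```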

803 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: A pixel-wise contextual attention network (PiCANet) learns to selectively attend to informative context locations for each pixel, generating an attention map in which each attention weight corresponds to the contextual relevance at each context location.
Abstract: Contexts play an important role in the saliency detection task. However, given a context region, not all contextual information is helpful for the final task. In this paper, we propose a novel pixel-wise contextual attention network, i.e., the PiCANet, to learn to selectively attend to informative context locations for each pixel. Specifically, for each pixel, it can generate an attention map in which each attention weight corresponds to the contextual relevance at each context location. An attended contextual feature can then be constructed by selectively aggregating the contextual information. We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively. Both models are fully differentiable and can be embedded into CNNs for joint training. We also incorporate the proposed models with the U-Net architecture to detect salient objects. Extensive experiments show that the proposed PiCANets can consistently improve saliency detection performance. The global and local PiCANets facilitate learning global contrast and homogeneousness, respectively. As a result, our saliency model can detect salient objects more accurately and uniformly, thus performing favorably against the state-of-the-art methods.
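To make the mechanism concrete, here is a sketch of a global pixel-wise attention layer: every pixel produces a weight over all spatial locations and aggregates their features by its own weights. Note that the paper generates attention with recurrent (ReNet-style) layers; this sketch substitutes a plain dot-product formulation for brevity, so it is an assumption-laden stand-in rather than PiCANet's actual module.

```python
import torch
import torch.nn as nn

class GlobalPixelAttention(nn.Module):
    # Each pixel gets its own attention map over all HW context locations
    # and aggregates the context features accordingly.
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)    # (B, HW, C)
        k = self.key(x).flatten(2)                      # (B, C, HW)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (B, HW, HW): per-pixel maps
        v = x.flatten(2).transpose(1, 2)                # (B, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out                                      # attended contextual feature
```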

631 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: Exhaustive experiments confirm that the proposed pyramid attention and salient edges are effective for salient object detection, and the deep saliency model outperforms state-of-the-art approaches on several benchmarks with a fast processing speed (25 fps on one GPU).
Abstract: This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). The proposed network, named PAGE-Net, offers two key contributions. The first is the exploitation of an essential pyramid attention structure for salient object detection. This enables the network to concentrate more on salient regions while considering multi-scale saliency information. Such a stacked attention design provides a powerful tool to efficiently improve the representation ability of the corresponding network layer with an enlarged receptive field. The second contribution lies in the emphasis on the importance of salient edges. Salient edge information offers a strong cue to better segment salient objects and refine object boundaries. To this end, our model is equipped with a salient edge detection module, which is learned for precise salient boundary estimation. This encourages better edge-preserving salient object segmentation. Exhaustive experiments confirm that the proposed pyramid attention and salient edges are effective for salient object detection. We show that our deep saliency model outperforms state-of-the-art approaches on several benchmarks with a fast processing speed (25 fps on one GPU).
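A schematic of what stacked pyramid attention could look like: attention maps estimated at several pooled scales are upsampled and applied to the features, enlarging the effective receptive field while keeping multi-scale saliency information. This is a sketch under my own assumptions, not PAGE-Net's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidAttention(nn.Module):
    # Hypothetical sketch: per-scale attention maps, stacked residually.
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attend = nn.ModuleList(nn.Conv2d(channels, 1, 1) for _ in scales)

    def forward(self, x):
        out = x
        for s, conv in zip(self.scales, self.attend):
            feat = F.avg_pool2d(x, s) if s > 1 else x   # coarser context at scale s
            attn = torch.sigmoid(conv(feat))
            attn = F.interpolate(attn, size=x.shape[2:], mode='bilinear',
                                 align_corners=False)
            out = out * (1.0 + attn)                    # residual-style attention stacking
        return out
```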

464 citations


Cites methods from "Deep Level Sets for Salient Object ..."

  • ...We compare the proposed PAGE-Net against 19 recent deep learning based alternatives: MDF [21], LEGS [34], DS [24], DCL [22], ELD [20], MC [57], RFCN [36], DHS [26], HEDS [14], KSR [38], NLDF [29], DLS [15], AMU [54], UCF [55], SRM [37], FSN [8], PAGR [56], RAS [7] and C2S [23]. We use either the implementations with the recommended parameter settings or the saliency maps shared by the authors....


  • ...Fig. 7 shows a visual comparison of the results of our method against those of five other top-performing methods. Table 2: Runtime comparison (GPU time, in seconds) with previous deep learning based saliency models:

        LEGS [34]: 1.54    MDF [21]: 7.83    DS [24]: 0.13     DCL [22]: 0.39    ELD [20]: 0.55
        RFCN [36]: 4.65    DHS [26]: 0.04    HEDS [14]: 0.57   KSR [38]: 49.64   NLDF [29]: 0.09
        DLS [15]: 0.08     AMU [54]: 0.07    UCF [55]: 0.04    SRM [37]: 0.07    PAGE-Net: 0.04




  • ...For example, some methods integrate deep learning models with hand-crafted features [20], heuristic saliency priors [36], level set [15], contextual information [57], or explicit visual fixation [40]....


Book ChapterDOI
08 Sep 2018
TL;DR: An accurate yet compact deep network for efficient salient object detection employs residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keeping accuracy.
Abstract: Benefiting from the rapid development of deep learning techniques, salient object detection has achieved remarkable progress recently. However, two major challenges still hinder its application in embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keeping accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
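The erasing step has a simple form: weight the side-output features by the complement of the upsampled coarse prediction, so the refinement branch focuses on what is still missing. A minimal sketch (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def reverse_attention(side_feat, coarse_pred):
    # side_feat:   side-output features at some resolution, (B,C,H,W).
    # coarse_pred: saliency logits from a deeper / coarser stage, (B,1,h,w).
    pred = F.interpolate(coarse_pred, size=side_feat.shape[2:],
                         mode='bilinear', align_corners=False)
    attn = 1.0 - torch.sigmoid(pred)   # reverse attention: erase predicted regions
    return side_feat * attn            # residual branch sees only the remainder
```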

448 citations


Cites background or methods from "Deep Level Sets for Salient Object ..."

  • ...We compare the proposed method with 10 state-of-the-art ones, including 9 recent CNN-based approaches, DCL+ [22], DHS [26], SSD [16], RFCN [39], DLS [10], NLDF [30], DSS and DSS+ [8], Amulet [45], UCF [46], and one conventional top approach, DRFI [13], where symbol “+” indicates that the network includes CRF-based post-processing....



  • ...Recently, dilated convolution [23] and dense connections [17] have further been incorporated to obtain high-resolution saliency maps....


  • ...[23] extended a superpixel-based guided filter to be a layer in the network for boundary refinement....


  • ..., superpixel-based filter [23], fully connected conditional random field (CRF) [8,11,24]....


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
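For reference, the reformulation amounts to blocks that compute F(x) + x. A basic residual block in the identity-shortcut case:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # The stacked layers learn a residual F(x); the block outputs F(x) + x,
    # so optimization only has to model a correction to the identity.
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)   # identity shortcut
```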

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
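The update rule is compact enough to restate in code. A single-parameter sketch of one Adam step with the paper's default betas and epsilon (the 1e-4 learning rate simply echoes the setting quoted in the citing snippet below):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update for a scalar parameter theta at timestep t (1-based):
    # exponential moving averages of the gradient and squared gradient,
    # bias correction, then the scaled parameter update.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)      # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```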

111,197 citations


"Deep Level Sets for Salient Object ..." refers methods in this paper

  • ...We use Adam [23] with an initial learning rate of 1e-4 to update the weights....


Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
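The configuration is easy to sketch: VGG stacks very small 3x3 convolutions between 2x2 max-pooling stages; two stacked 3x3 convolutions cover a 5x5 receptive field with fewer parameters and an extra non-linearity. A representative block (channel counts and depth are left to the caller):

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    # A VGG-style stage: num_convs 3x3 conv + ReLU pairs, then 2x2 max-pool.
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)
```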

49,914 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
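The connectivity pattern can be sketched directly: layer i consumes the concatenation of the block input and all earlier layers' outputs, giving the L(L+1)/2 direct connections the abstract counts. A minimal dense block (omitting the paper's bottleneck and transition layers):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer sees the concatenated feature maps of all preceding layers
    # and contributes growth_rate new channels.
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```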

27,821 citations

Book
01 Jan 1998
TL;DR: This book presents representation formulas for solutions and a theory for linear and nonlinear PDEs: Sobolev spaces, second-order elliptic equations, linear evolution equations, the calculus of variations, Hamilton-Jacobi equations, and systems of conservation laws.
Abstract: Introduction. Part I, Representation formulas for solutions: four important linear partial differential equations; nonlinear first-order PDE; other ways to represent solutions. Part II, Theory for linear partial differential equations: Sobolev spaces; second-order elliptic equations; linear evolution equations. Part III, Theory for nonlinear partial differential equations: the calculus of variations; nonvariational techniques; Hamilton-Jacobi equations; systems of conservation laws. Appendices. Bibliography. Index.

25,734 citations