Journal ArticleDOI

Revisiting Co-Saliency Detection: A Novel Approach Based on Two-Stage Multi-View Spectral Rotation Co-clustering

01 Jul 2017 - IEEE Transactions on Image Processing (IEEE) - Vol. 26, Iss. 7, pp. 3196-3209
TL;DR: This paper revisits the co-saliency detection task and advances its development into a new phase, where the problem setting is generalized to allow the image group to contain objects in an arbitrary number of categories and the algorithms need to simultaneously detect multi-class co-salient objects from such complex data.
Abstract: With the goal of discovering the common and salient objects from the given image group, co-saliency detection has received tremendous research interest in recent years. However, as most of the existing co-saliency detection methods are performed based on the assumption that all the images in the given image group should contain co-salient objects in only one category, they can hardly be applied in practice, particularly for large-scale image sets obtained from the Internet. To address this problem, this paper revisits the co-saliency detection task and advances its development into a new phase, where the problem setting is generalized to allow the image group to contain objects in an arbitrary number of categories and the algorithms need to simultaneously detect multi-class co-salient objects from such complex data. To solve this new challenge, we decompose it into two sub-problems, i.e., how to identify subgroups of relevant images and how to discover relevant co-salient objects from each subgroup, and propose a novel co-saliency detection framework to correspondingly address the two sub-problems via two-stage multi-view spectral rotation co-clustering. Comprehensive experiments on two publicly available benchmarks demonstrate the effectiveness of the proposed approach. Notably, it can even outperform the state-of-the-art co-saliency detection methods, which are performed based on image subgroups carefully separated by human labor.
Citations
Journal ArticleDOI
TL;DR: This paper proposes a simple but effective method to learn discriminative CNNs (D-CNNs) to boost the performance of remote sensing image scene classification and comprehensively evaluates the proposed method on three publicly available benchmark data sets using three off-the-shelf CNN models.
Abstract: Remote sensing image scene classification is an active and challenging task driven by many applications. More recently, with the advances of deep learning models, especially convolutional neural networks (CNNs), the performance of remote sensing image scene classification has been significantly improved due to the powerful feature representations learnt through CNNs. Although great success has been obtained so far, the problems of within-class diversity and between-class similarity are still two big challenges. To address these problems, in this paper, we propose a simple but effective method to learn discriminative CNNs (D-CNNs) to boost the performance of remote sensing image scene classification. Different from the traditional CNN models that minimize only the cross entropy loss, our proposed D-CNN models are trained by optimizing a new discriminative objective function. To this end, apart from minimizing the classification error, we also explicitly impose a metric learning regularization term on the CNN features. The metric learning regularization forces the D-CNN models to be more discriminative so that, in the new D-CNN feature spaces, the images from the same scene class are mapped closely to each other and the images of different classes are mapped as far apart as possible. In the experiments, we comprehensively evaluate the proposed method on three publicly available benchmark data sets using three off-the-shelf CNN models. Experimental results demonstrate that our proposed D-CNN methods outperform the existing baseline methods and achieve state-of-the-art results on all three data sets.
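The objective described in this abstract, a classification loss plus a metric learning regularizer on the features, can be sketched roughly as follows. The abstract does not give the exact form of the regularizer, so the pairwise contrastive (hinge) term, the weight `lam`, the `margin`, and all function names below are assumptions chosen for illustration, not the authors' formulation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def contrastive_term(f1, f2, same_class, margin=1.0):
    """Assumed pairwise regularizer: pull same-class features together,
    push different-class features at least `margin` apart."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    if same_class:
        return d * d
    return max(0.0, margin - d) ** 2

def discriminative_loss(logits_batch, labels, features, lam=0.1, margin=1.0):
    """Mean cross-entropy plus lam times the mean pairwise regularizer."""
    n = len(labels)
    ce = sum(cross_entropy(z, y) for z, y in zip(logits_batch, labels)) / n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    reg = sum(
        contrastive_term(features[i], features[j], labels[i] == labels[j], margin)
        for i, j in pairs
    ) / max(len(pairs), 1)
    return ce + lam * reg
```

In this sketch the regularizer is zero once different-class features are farther apart than the margin, so only the cross-entropy term remains for well-separated batches.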

1,001 citations

Journal ArticleDOI
TL;DR: Wang et al. propose a skip-layer network structure that predicts human attention from multiple convolutional layers with various receptive fields, which significantly decreases the redundancy of previous approaches of learning multiple network streams with different input scales.
Abstract: In this paper, we aim to predict human eye fixation with view-free scenes based on an end-to-end deep learning architecture. Although convolutional neural networks (CNNs) have made substantial improvement on human attention prediction, there is still a need to improve the CNN-based attention models by efficiently leveraging multi-scale features. Our visual attention network is proposed to capture hierarchical saliency information from deep, coarse layers with global saliency information to shallow, fine layers with local saliency response. Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various receptive fields. Final saliency prediction is achieved via the cooperation of those global and local predictions. Our model is learned in a deep supervision manner, where supervision is directly fed into multi-level layers, instead of previous approaches of providing supervision only at the output layer and propagating this supervision back to earlier layers. Our model thus incorporates multi-level saliency predictions within a single network, which significantly decreases the redundancy of previous approaches of learning multiple network streams with different input scales. Extensive experimental analysis on various challenging benchmark data sets demonstrates that our method yields state-of-the-art performance with competitive inference time. Our source code is available at https://github.com/wenguanwang/deepattention .

532 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel deep-learning-based object detection framework including region proposal network (RPN) and local-contextual feature fusion network designed for remote sensing images that can deal with the multiangle and multiscale characteristics of geospatial objects.
Abstract: Most of the existing deep-learning-based methods struggle to deal effectively with the challenges of geospatial object detection, such as rotation variations and appearance ambiguity. To address these problems, this paper proposes a novel deep-learning-based object detection framework including region proposal network (RPN) and local-contextual feature fusion network designed for remote sensing images. Specifically, the RPN includes additional multiangle anchors besides the conventional multiscale and multiaspect-ratio ones, and thus can deal with the multiangle and multiscale characteristics of geospatial objects. To address the appearance ambiguity problem, we propose a double-channel feature fusion network that can learn local and contextual properties along two independent pathways. The two kinds of features are later combined in the final layers of processing in order to form a powerful joint representation. Comprehensive evaluations on a publicly available ten-class object detection data set demonstrate the effectiveness of the proposed method.

296 citations

Journal ArticleDOI
TL;DR: This letter proposes a novel feature representation method for scene classification, named bag of convolutional features (BoCF). Different from the traditional bag of visual words-based methods, in which the visual words are usually obtained by using handcrafted feature descriptors, the proposed BoCF generates visual words from deep convolutional features using off-the-shelf convolutional neural networks.
Abstract: More recently, remote sensing image classification has been moving from pixel-level interpretation to scene-level semantic understanding, which aims to label each scene image with a specific semantic class. While significant efforts have been made in developing various methods for remote sensing image scene classification, most of them rely on handcrafted features. In this letter, we propose a novel feature representation method for scene classification, named bag of convolutional features (BoCF). Different from the traditional bag of visual words-based methods in which the visual words are usually obtained by using handcrafted feature descriptors, the proposed BoCF generates visual words from deep convolutional features using off-the-shelf convolutional neural networks. Extensive evaluations on a publicly available remote sensing image scene classification benchmark and comparison with the state-of-the-art methods demonstrate the effectiveness of the proposed BoCF method for remote sensing image scene classification.

276 citations


Cites background from "Revisiting Co-Saliency Detection: A..."

  • ...More recently, various deep learning algorithms, especially convolutional neural networks (CNNs), have shown their much stronger feature representation power in the field of computer vision [26]–[30]....

    [...]

Journal ArticleDOI
TL;DR: A band grouping-based long short-term memory model and a multiscale convolutional neural network are proposed as the spectral and spatial feature extractors, respectively, for the hyperspectral image (HSI) classification.
Abstract: In this paper, we propose a spectral–spatial unified network (SSUN) with an end-to-end architecture for the hyperspectral image (HSI) classification. Different from traditional spectral–spatial classification frameworks where the spectral feature extraction (FE), spatial FE, and classifier training are separated, these processes are integrated into a unified network in our model. In this way, both FE and classifier training will share a uniform objective function and all the parameters in the network can be optimized at the same time. In the implementation of the SSUN, we propose a band grouping-based long short-term memory model and a multiscale convolutional neural network as the spectral and spatial feature extractors, respectively. In the experiments, three benchmark HSIs are utilized to evaluate the performance of the proposed method. The experimental results demonstrate that the SSUN can yield a competitive performance compared with existing methods.

259 citations


Cites background from "Revisiting Co-Saliency Detection: A..."

  • ...vision and artificial intelligence [28]–[31], a promising way to extract deep features for hyperspectral data has become...

    [...]

References
Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

28,225 citations

Posted Content
TL;DR: It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.

9,803 citations

Journal ArticleDOI
TL;DR: In this article, the authors present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches, and discuss the advantages and disadvantages of these algorithms.
Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance, spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
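As a rough illustration of the graph Laplacian idea mentioned in this abstract, the following sketch splits a weighted graph into two groups using the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the unnormalized Laplacian L = D - W). This is a simplified two-cluster variant, not one of the tutorial's algorithms verbatim: the general methods use the first k eigenvectors followed by k-means. The function name, the power-iteration solver, and the assumption of a connected graph with symmetric non-negative weights are all choices made for this sketch.

```python
import math

def fiedler_bipartition(W, iters=500):
    """Split a connected weighted graph in two by the sign of the Fiedler vector.

    W is a symmetric matrix (list of lists) of non-negative edge weights.
    The Fiedler vector is found by power iteration on (c*I - L), with the
    constant vector (eigenvalue 0 of L) projected out at every step.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # w = (c*I - L) v = (c - deg) * v + W v
        w = [
            (c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mean = sum(v) / n
    v = [x - mean for x in v]
    return [0 if x < 0 else 1 for x in v]

# Two triangles joined by one weak edge (between nodes 2 and 3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.01, 0, 0],
    [0, 0, 0.01, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = fiedler_bipartition(W)
```

Power iteration is used here only to keep the example free of external libraries; in practice a standard eigensolver would be the usual choice.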

9,141 citations

Journal ArticleDOI
TL;DR: A new superpixel algorithm is introduced, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels and is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
Abstract: Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.

7,849 citations


"Revisiting Co-Saliency Detection: A..." refers methods in this paper

  • ...For each image, the object proposals extraction was performed by objectness method [12] with the object proposals number Nop being 100 and the superpixel over segmentation was carried out by the SLIC method [13] with the superpixel number Nq being 200....

    [...]

  • ...images, we adopt SLIC (Simple Linear Iterative Clustering) method [13] to obtain Nq superpixels and use the same object proposals extracted in the first stage....

    [...]

  • ...In the second stage, we first extract superpixel regions from each image in the obtained same subgroup by using SLIC method [13] and then construct...

    [...]

Journal ArticleDOI
TL;DR: This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
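For orientation, here is a minimal sketch of plain Lloyd's iteration, the heuristic this abstract builds on. It is not the paper's filtering algorithm: it scans every center for every point instead of using a kd-tree. The function names and the choice to pass initial centers explicitly are assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def lloyd_kmeans(points, init_centers, iters=100):
    """Alternate assignment and centroid updates until the centers stop moving."""
    centers = [tuple(float(x) for x in c) for c in init_centers]
    for _ in range(iters):
        # Assignment step: group points by nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd_kmeans(points, init_centers=[(0, 0), (10, 10)])
```

Each iteration of this naive version costs on the order of n times k distance computations; reducing that per-iteration work is the role the abstract assigns to the kd-tree.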

5,288 citations


"Revisiting Co-Saliency Detection: A..." refers methods in this paper

  • ..., k-means [46] and spectral clustering [47] ) and SRCC algorithm proposed...

    [...]