Proceedings Article

Unsupervised GIST based Clustering for Object Localization

TL;DR: This paper localizes a single object instance in an image in a fully unsupervised manner, with performance comparable to various state-of-the-art weakly supervised and unsupervised approaches to object localization.
Abstract: In the past years, there have been several attempts at the task of object localization in an image. However, most of the algorithms for object localization have been either supervised or weakly supervised. The work presented in this paper is based on the localization of a single object instance in an image in a fully unsupervised manner. Initially, object proposals are generated from the input image, and a proposal score for each proposal is calculated using a saliency map. Next, a graph is constructed from the GIST feature similarity between each pair of proposals. Density-based spatial clustering of applications with noise (DBSCAN) is used to cluster the proposals based on GIST similarity, which eventually helps in the final localization of the object. The setup is evaluated on two challenging benchmark datasets: the PASCAL VOC 2007 dataset and the object discovery dataset. The performance of the proposed approach is observed to be comparable with various state-of-the-art weakly supervised and unsupervised approaches to object localization.
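The first two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scoring rule (mean saliency inside the box) is an assumption, since the abstract does not give the formula; the descriptors are plain numeric vectors standing in for GIST features; and the function names `proposal_score` and `similarity_graph` are hypothetical.

```python
import math


def proposal_score(saliency, box):
    """Score a proposal as the mean saliency inside its box.

    ASSUMPTION: the abstract only says the score is computed from a
    saliency map; mean saliency is one plausible choice.
    `saliency` is a list of rows; `box` is (x0, y0, x1, y1) with
    exclusive upper bounds.
    """
    x0, y0, x1, y1 = box
    values = [saliency[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values) if values else 0.0


def similarity_graph(descriptors):
    """Build a dense graph as a symmetric matrix of pairwise distances.

    ASSUMPTION: Euclidean distance between descriptor vectors; smaller
    distance means more similar proposals. Real GIST descriptors would
    replace the toy vectors used here.
    """
    n = len(descriptors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(descriptors[i], descriptors[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Toy 4x4 saliency map with a salient 2x2 centre.
saliency_map = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
tight_score = proposal_score(saliency_map, (1, 1, 3, 3))
loose_score = proposal_score(saliency_map, (0, 0, 4, 4))
graph = similarity_graph([[0.0, 0.0], [3.0, 4.0]])
```

With this scoring rule, a box that tightly covers the salient region scores higher than one that includes a lot of background, which is the behaviour one would want from a saliency-based proposal score.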
Citations
01 Jan 2006

3,012 citations

Journal Article
TL;DR: This research investigates human action recognition in still images, utilizes deep ensemble learning to automatically decompose the body pose and perceive its background information, and proposes an end-to-end deep ensemble learning based on the weight optimization (DELWO) model that contributes to fusing the deep information derived from multiple models automatically from the data.
Abstract: Numerous human actions such as “Phoning,” “PlayingGuitar,” and “RidingHorse” can be inferred by static cue-based approaches even if their motions in video are available considering one single still image may already sufficiently explain a particular action. In this research, we investigate human action recognition in still images and utilize deep ensemble learning to automatically decompose the body pose and perceive its background information. Firstly, we construct an end-to-end NCNN-based model by attaching the nonsequential convolutional neural network (NCNN) module to the top of the pretrained model. The nonsequential network topology of NCNN can separately learn the spatial- and channel-wise features with parallel branches, which helps improve the model performance. Subsequently, in order to further exploit the advantage of the nonsequential topology, we propose an end-to-end deep ensemble learning based on the weight optimization (DELWO) model. It contributes to fusing the deep information derived from multiple models automatically from the data. Finally, we design the deep ensemble learning based on voting strategy (DELVS) model to pool together multiple deep models with weighted coefficients to obtain a better prediction. More importantly, the model complexity can be reduced by lessening the number of trainable parameters, thereby effectively mitigating overfitting issues of the model in small datasets to some extent. We conduct experiments in Li’s action dataset, uncropped and 1.5x cropped Willow action datasets, and the results have validated the effectiveness and robustness of our proposed models in terms of mitigating overfitting issues in small datasets. Finally, we open source our code for the model in GitHub ( https://github.com/yxchspring/deep_ensemble_learning ) in order to share our model with the community.

38 citations


Cites methods from "Unsupervised GIST based Clustering ..."

  • ...For the nondeep ensemble learning approaches, only the performance of the voting-based approach surpasses that of GIST, while the remaining RF and GBM achieve worse results compared with the GIST approach....

    [...]

  • ...Figure 10 shows that almost all comparative methods including SURF, BOF, PBOF, GIST, and nondeep ensemble learning approaches including RF, GM, and voting fail in this task, and only BOF, PBOF, GIST, RF, GM, and voting approaches show good results in Li’s action dataset....

    [...]

  • ...Similar to the results in Table 2, the performance of nondeep ensemble learning approaches does not exceed the GIST approach....

    [...]

  • ...At the same time, the GIST [43, 44] (with SVM classifier) method was also attached to the comparison experiments....

    [...]

  • ...All the evaluation of nondeep ensemble learning approaches is based on the 512-dimensional GIST descriptors....

    [...]

References
Proceedings Article
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations


"Unsupervised GIST based Clustering ..." refers methods in this paper

  • ...They have formulated the task as an undirected graph using HOG descriptor [11] and performed iterative spectral clustering on the graph constructed....

    [...]

Proceedings Article
02 Aug 1996
TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

17,056 citations

Journal Article
TL;DR: The state-of-the-art in evaluated methods for both classification and detection are reviewed, whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations


"Unsupervised GIST based Clustering ..." refers background in this paper

  • ...Co-localization has the same type of input as co-segmentation ([15], [27], [29])....

    [...]

Proceedings Article
01 Jan 1996
TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

14,297 citations


"Unsupervised GIST based Clustering ..." refers methods in this paper

  • ...Based on the perceptual similarity, the proposals are clustered using DBSCAN....

    [...]

  • ...Finally, for obtaining the final localization window P_final, we consider the mean of all the coordinates of the clustered proposals obtained after DBSCAN in this final set....

    [...]

  • ...At each step, we change the input parameter, maximum distance between two points in a cluster, to DBSCAN by dividing the previous one by 2....

    [...]

  • ...Now, we apply DBSCAN [13] to make clusters of...

    [...]

  • ...Index Terms—Object Localization, Unsupervised Learning, GIST, DBSCAN I. INTRODUCTION Object localization is an important and highly challenging problem faced in the field of computer vision where the main aim is to figure out the location as well as estimating a bounding box around the different categories of objects present in an image....

    [...]
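The excerpts above mention three steps: clustering proposals with DBSCAN, halving the maximum-distance parameter at each step, and averaging the coordinates of the surviving proposals to obtain the final window. The sketch below illustrates one way these could fit together. It is not the paper's implementation. Several details are assumptions that the excerpts do not specify: keeping only the largest cluster after each step, running a fixed number of steps, the `min_pts` value, and using Euclidean distance on toy descriptors in place of GIST features. The names `dbscan`, `largest_cluster`, and `localize` are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # A point counts as its own neighbour, as in the usual definition.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_nbrs)
    return labels


def largest_cluster(labels):
    """Indices of the most populated non-noise cluster (empty if none)."""
    members = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            members.setdefault(lab, []).append(idx)
    return max(members.values(), key=len) if members else []


def localize(boxes, descriptors, eps, min_pts=2, steps=3):
    """Re-cluster with eps halved at each step, then average the boxes.

    ASSUMPTIONS: keep the largest cluster each step; stop after `steps`
    rounds or when no cluster is found.
    """
    keep = list(range(len(boxes)))
    for _ in range(steps):
        labels = dbscan([descriptors[i] for i in keep], eps, min_pts)
        best = largest_cluster(labels)
        if not best:
            break  # nothing clusters at this eps; keep the previous set
        keep = [keep[k] for k in best]
        eps /= 2.0  # "dividing the previous one by 2"
    # Final window: mean of each coordinate over the surviving proposals.
    return tuple(sum(boxes[i][c] for i in keep) / len(keep) for c in range(4))


boxes = [(0, 0, 10, 10), (2, 2, 12, 12), (4, 4, 14, 14), (50, 50, 60, 60)]
descriptors = [[0.0], [0.1], [0.2], [5.0]]
first_labels = dbscan(descriptors, eps=1.0, min_pts=2)
final_box = localize(boxes, descriptors, eps=1.0)
```

In this toy run the fourth proposal has a descriptor far from the others, so DBSCAN marks it as noise and it does not influence the averaged window.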

Journal Article
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI--SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.

10,501 citations


"Unsupervised GIST based Clustering ..." refers methods in this paper

  • ...• A simple but efficient algorithm for object localization is proposed and explored on challenging benchmark datasets such as the object discovery dataset [16] and the PASCAL VOC 2007 dataset [14]....

    [...]

  • ...This dataset has been widely used to benchmark algorithms for object discovery ([5], [16], [34])....

    [...]