Proceedings ArticleDOI

Pruning SIFT & SURF for Efficient Clustering of Near-duplicate Images

TL;DR: A simple approach to reduce the cardinality of the keypoint set and prune the dimension of SIFT and SURF feature descriptors for efficient image clustering is proposed, and clustering accuracy is found to be at par with traditional SIFT and SURF with a significant reduction in computational cost.
Abstract: Clustering and categorization of similar images using SIFT and SURF require a high computational cost. In this paper, a simple approach to reduce the cardinality of keypoint set and prune the dimension of SIFT and SURF feature descriptors for efficient image clustering is proposed. For this purpose, sparsely spaced (uniformly distributed) important keypoints are chosen. In addition, multiple reduced dimensional variants of SIFT and SURF descriptors are presented. Moreover, clustering time complexity is also improved by proposed contextual bag-of-features approach for partitioned keypoint set. The F-measure statistic is used to evaluate clustering performance on a California-ND dataset containing near-duplicate images. Clustering accuracy of the proposed pruned SIFT and SURF is found to be at par with traditional SIFT and SURF with a significant reduction in computational cost.
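The paper's exact pruning rule is not reproduced on this page, but the two ideas the abstract names — keeping sparsely spaced, uniformly distributed important keypoints and cutting descriptor dimensionality — can be sketched in NumPy. Function names, the greedy spacing heuristic, and the variance-based dimension choice below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def prune_keypoints(xy, response, min_dist=10.0, max_kp=100):
    """Greedily keep high-response keypoints that are at least
    min_dist apart, giving a sparse, roughly uniform spatial spread."""
    order = np.argsort(-response)  # strongest keypoints first
    kept = []
    for i in order:
        if all(np.linalg.norm(xy[i] - xy[j]) >= min_dist for j in kept):
            kept.append(i)
        if len(kept) == max_kp:
            break
    return np.array(kept)

def prune_descriptor(desc, keep_dims=64):
    """Keep only the highest-variance descriptor dimensions,
    e.g. shrinking 128-D SIFT vectors to a 64-D variant."""
    var = desc.var(axis=0)
    top = np.sort(np.argsort(-var)[:keep_dims])
    return desc[:, top]
```

Fewer keypoints and shorter descriptors shrink both the matching cost and the bag-of-features quantisation cost, which is where the reported speed-up would come from.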
Citations
Proceedings ArticleDOI
01 Sep 2019
TL;DR: This thesis proposes approaches for efficient clustering, fast direction oriented motion estimation algorithm, and an image reordering scheme with minimum predictive costs for better compression of near-duplicate image collection.
Abstract: The explosion of digital photos has posed a challenge for storage and transmission bandwidth. The thesis briefly discusses my work on efficient image set compression. The lossless compression of near-duplicate image collection is carried out using multiple steps, where each step is computationally demanding. My task is to make them work faster without compromising on compression efficiency. In this pursuit, we propose approaches for efficient clustering, fast direction oriented motion estimation algorithm, and an image reordering scheme with minimum predictive costs for better compression. The preliminary results for the proposed approach are promising. We also aim to extend our approach to hyperspectral and medical image set compression.

10 citations


Cites background from "Pruning SIFT & SURF for Efficient C..."

  • ...The preliminary clustering results on the publicly available California-ND dataset containing near-duplicate images have been promising (see [7])....


Proceedings ArticleDOI
01 Dec 2019
TL;DR: The proposed Bag-of-Visual Word Modelling in which Image Feature Extraction is achieved using Deep Feature Learning via Stacked-Autoencoder is tested and the ability of deep feature learning to yield optimum image categorisation performance is confirmed.
Abstract: The Bag-of-Visual Words has been recognised as an effective means of representing images for image classification. However, its reliance on hand-crafted image feature extraction algorithms often results in significant computational overhead and poor classification accuracies. Therefore, this paper presents a Bag-of-Visual Word Modelling in which Image Feature Extraction is achieved using Deep Feature Learning via Stacked-Autoencoder. The proposed method is tested using three image collections constituted from the Caltech 101 image collection and the results confirm the ability of deep feature learning to yield optimum image categorisation performance.

4 citations

Journal ArticleDOI
TL;DR: This study presents an adaptive BOVW modelling, in which image feature extraction is achieved using deep feature learning and the amount of computation required for the development of visual codebook is minimised using a batch implementation of particle swarm optimisation.
Abstract: The bag-of-visual words (BOVWs) have been recognised as an effective means of representing images for image classification. However, its reliance on a visual codebook developed using handcrafted image feature extraction algorithms and vector quantisation via k-means clustering often results in significant computational overhead and poor classification accuracies. Therefore, this study presents an adaptive BOVW modelling, in which image feature extraction is achieved using deep feature learning and the amount of computation required for the development of the visual codebook is minimised using a batch implementation of particle swarm optimisation. The proposed method is tested using the Caltech-101 image dataset, and the results confirm the suitability of the proposed method in improving the categorisation performance while reducing the computational load.

3 citations


Cites background from "Pruning SIFT & SURF for Efficient C..."

  • ...Although compared to SIFT and SURF, the image features generated for any given image collection with the Stacked-Autoencoder is considerably less, when the image collection is large, the number of image features generated using Stacked Autoencoder may still be numerous enough to cause lengthy computation during the implementation of the PSO clustering [59]....


  • ...In [8] the authors demonstrated that the opportunity to change the number of layers and the number of neurons in each layer of a Deep Learning algorithm allows the feature extraction process to be adaptable to the content diversity of the image collection during BOVW modelling, thus generating image feature vectors whose dimension guarantees optimum discrimination, unlike the fixed 128 dimensions of Scale Invariant Feature Transform (SIFT) and 64 dimensions of Speeded-Up Robust Feature (SURF) [57, 58, 59]....


  • ...An important advantage of the application of Deep Feature learning at this stage is the opportunity to control the number of image features to be collected from each image in the collection to be processed thus avoiding excessive computational overhead, commonly associated with sparse image features such as SIFT or SURF where the number of image features per image is not predetermined or in Dense-SIFT where the number of features per image can be more than 10,000 with no means of controlling the number of image features....


  • ...Although the time taken is higher than the time taken to complete the unsupervised categorisation with SURF features due to the time taken to train the Stacked-Autoencoder, the higher accuracy recorded by Stacked Autoencoder confirms its better efficiency....


References
Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
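The recall-versus-precision criterion described above can be reproduced with a short sketch: rank putative matches by descriptor distance and accumulate correct and false matches as the acceptance threshold loosens. The arrays and function name here are illustrative, not the paper's data:

```python
import numpy as np

def recall_vs_precision(correct, distance):
    """Trace recall against 1 - precision over a ranked match list,
    the evaluation curve used for comparing local descriptors."""
    order = np.argsort(distance)           # accept closest matches first
    ok = np.asarray(correct, bool)[order]
    tp = np.cumsum(ok)                     # correct matches accepted so far
    fp = np.cumsum(~ok)                    # false matches accepted so far
    recall = tp / ok.sum()
    one_minus_precision = fp / (tp + fp)
    return recall, one_minus_precision
```

A descriptor whose curve rises toward recall 1 while 1 - precision stays low separates correct from false matches better.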

7,057 citations


"Pruning SIFT & SURF for Efficient C..." refers background in this paper

  • ...Many feature extraction algorithms are available in literature [4]....


Journal ArticleDOI
01 Jun 2010
TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.
Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
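The K-means procedure the survey centres on is the alternation of an assignment step and a centroid-update step (Lloyd's algorithm). A minimal NumPy sketch, with illustrative initialisation and stopping choices, might look like:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centre, recompute centres as cluster means, repeat to convergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from data
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)                      # assignment step
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):                  # converged
            break
        centers = new
    return centers, labels
```

The survey's point stands out even in this sketch: the algorithm is simple, but k, the initialisation, and the distance measure are all choices the data cannot make for you.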

6,601 citations


"Pruning SIFT & SURF for Efficient C..." refers methods in this paper

  • ...K-means is the most famous clustering algorithm in literature [18]....


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper examines (and improves upon) the local image descriptor used by SIFT, and demonstrates that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation.
Abstract: Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid (June 2003) recently evaluated a variety of approaches and identified the SIFT [D. G. Lowe, 1999] algorithm as being the most resistant to common image deformations. This paper examines (and improves upon) the local image descriptor used by SIFT. Like SIFT, our descriptors encode the salient aspects of the image gradient in the feature point's neighborhood; however, instead of using SIFT's smoothed weighted histograms, we apply principal components analysis (PCA) to the normalized gradient patch. Our experiments demonstrate that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation. We also present results showing that using these descriptors in an image retrieval application results in increased accuracy and faster matching.
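The core of the PCA-SIFT idea above — learn a principal-component basis from normalized gradient patches, then project each patch onto it — can be sketched with a generic SVD-based PCA. The 36-dimensional target follows the paper's compact descriptors; the fitting code itself is a standard PCA sketch, not the authors' implementation:

```python
import numpy as np

def fit_pca(patches, n_dims=36):
    """Learn a PCA basis via SVD of the mean-centred patch matrix."""
    mean = patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, Vt[:n_dims]          # rows of Vt are principal axes

def pca_project(patches, mean, basis):
    """Project patches onto the learned components, e.g. long
    gradient-patch vectors down to a compact 36-D descriptor."""
    return (patches - mean) @ basis.T
```

The basis is trained once on a representative patch collection and then reused, so the per-keypoint cost at run time is a single matrix-vector product.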

3,325 citations


"Pruning SIFT & SURF for Efficient C..." refers background in this paper

  • ...Many researchers have reduced the dimension of the SIFT descriptor [12, 13]....


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM, using the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation.
Abstract: The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy [22], the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method by first performing a K-nearest-neighbor search and then solving a constrained least square fitting problem, bearing computational complexity of O(M + K2). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical values of LLC for real applications.
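The fast approximated LLC described above — a K-nearest-neighbour search followed by a small constrained least-squares fit — can be sketched per descriptor. The regularisation constant and function name are illustrative assumptions:

```python
import numpy as np

def llc_code(x, codebook, k=5, reg=1e-4):
    """Approximated LLC for one descriptor x: code only over the
    k nearest codebook atoms, solving a least-squares system whose
    weights are constrained to sum to one."""
    d = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(d)[:k]            # K-nearest-neighbour search
    B = codebook[idx] - x              # shift atoms into x's local frame
    C = B @ B.T + reg * np.eye(k)      # regularised local Gram matrix
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                       # enforce the sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w                      # sparse code over the full codebook
    return code
```

Per-image representations then follow by max pooling the codes of all descriptors, e.g. `np.max([llc_code(x, codebook) for x in descs], axis=0)`, which is what keeps the final classifier linear.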

3,307 citations


"Pruning SIFT & SURF for Efficient C..." refers methods in this paper

  • ...In the computer science community, Lloyd’s algorithm is widely used for generating visual dictionaries [20]....


Journal ArticleDOI
TL;DR: With the categorizing framework, the efforts toward building an integrated system for intelligent feature selection are continued, and an illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta algorithm that can take advantage of individual algorithms.
Abstract: This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals unattempted combinations, and provides guidelines in selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection. A unifying platform is proposed as an intermediate step. An illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta algorithm that can take advantage of individual algorithms. An added advantage of doing so is to help a user employ a suitable algorithm without knowing details of each algorithm. Some real-world applications are included to demonstrate the use of feature selection in data mining. We conclude this work by identifying trends and challenges of feature selection research and development.

2,605 citations


"Pruning SIFT & SURF for Efficient C..." refers methods in this paper

  • ...In general, image clustering involves the following steps: extracting features from the image, organizing them, and then classifying the image to specific cluster [3]....
