
Showing papers on "Scale-invariant feature transform published in 2011"


01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.

14,708 citations
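
To make the extract-and-match pipeline concrete, here is a minimal sketch using OpenCV's SIFT implementation and Lowe's ratio test (the image paths are placeholders, and the 0.75 ratio threshold is a conventional choice, not a value from this paper):

```python
import cv2

# Placeholder image paths; any two overlapping views will do.
img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test: accept a match only if its nearest neighbor is
# clearly closer than the second-nearest one.
bf = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in bf.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative matches")
```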


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments that ORB is two orders of magnitude faster than SIFT while performing as well in many situations.
Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.

8,702 citations
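
A minimal sketch of the same matching loop with ORB, again using OpenCV (placeholder image paths); the switch from L2 to Hamming distance on binary descriptors is where much of the speedup comes from:

```python
import cv2

# Placeholder image paths for two frames to match.
img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)           # oriented FAST + rotated BRIEF
kp1, des1 = orb.detectAndCompute(img1, None)  # 32-byte binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with Hamming distance (XOR + popcount),
# which is far cheaper than the L2 comparisons used for SIFT.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
```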


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A comprehensive evaluation on benchmark datasets reveals that BRISK achieves adaptive, high-quality performance on par with state-of-the-art algorithms at a dramatically lower computational cost (an order of magnitude faster than SURF in some cases).
Abstract: Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms, which exhibit great performance under a variety of image transformations, with SURF in particular considered the most computationally efficient among the high-performance methods to date. In this paper we propose BRISK, a novel method for keypoint detection, description and matching. A comprehensive evaluation on benchmark datasets reveals BRISK's adaptive, high-quality performance on par with state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in some cases). The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood.

3,292 citations
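
A short sketch of BRISK keypoint extraction via OpenCV's implementation (placeholder image path; the parameter values shown are OpenCV defaults, not tuned settings from the paper):

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# BRISK: keypoints from a scale-space FAST detector, described by a
# 512-bit string of intensity comparisons over a radial sampling pattern.
brisk = cv2.BRISK_create(thresh=30, octaves=3)
kp, des = brisk.detectAndCompute(img, None)  # des: 64 bytes per keypoint

# As with ORB, the bit-string descriptors are matched via Hamming distance.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
```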


Journal ArticleDOI
TL;DR: SIFT flow is proposed, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence.
Abstract: While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixelwise SIFT features between two images while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration, and face recognition.

1,726 citations
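
SIFT flow depends on densely sampled descriptors rather than detected keypoints. The sketch below shows only that dense description step, using OpenCV SIFT on an arbitrary grid (SIFT flow proper computes per-pixel descriptors and then solves a discrete flow problem, which is not shown; the step size here is arbitrary):

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Place a keypoint on every node of a coarse grid instead of running a
# detector; SIFT flow itself goes all the way to per-pixel descriptors.
step = 4
grid = [cv2.KeyPoint(float(x), float(y), step)
        for y in range(0, img.shape[0], step)
        for x in range(0, img.shape[1], step)]

sift = cv2.SIFT_create()
# compute() skips detection and describes the supplied grid points,
# yielding a dense SIFT "image" for a subsequent discrete matching stage.
grid, des = sift.compute(img, grid)
```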


Journal ArticleDOI
TL;DR: Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time that improve the performance of the generalized Hough transform for object detection on a categorical level and extend to new domains such as object tracking and action recognition.
Abstract: The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark data sets and comparisons with the state-of-the-art.

629 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper proposes a model-based approach, which detects humans using a 2-D head contour model and a 3-D head surface model, and proposes a segmentation scheme to segment the human from his/her surroundings and extract the whole contour of the figure based on the authors' detection point.
Abstract: Conventional human detection is mostly done in images taken by visible-light cameras. These methods imitate the detection process that humans use. They use features based on gradients, such as histograms of oriented gradients (HOG), or extract interest points in the image, such as the scale-invariant feature transform (SIFT). In this paper, we present a novel human detection method using depth information taken by the Kinect for Xbox 360. We propose a model-based approach, which detects humans using a 2-D head contour model and a 3-D head surface model. We propose a segmentation scheme to segment the human from his/her surroundings and extract the whole contour of the figure based on our detection point. We also explore a tracking algorithm based on our detection result. The methods are tested on our database taken by the Kinect in our lab and show superior results.

574 citations


Journal ArticleDOI
TL;DR: This system includes detecting and tracking a bare hand in a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizing hand gestures via bag-of-features and a multiclass support vector machine (SVM), and building a grammar that generates gesture commands to control an application.
Abstract: This paper presents a novel real-time system for interaction with an application or video game via hand gestures. Our system includes detecting and tracking a bare hand in a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizing hand gestures via bag-of-features and a multiclass support vector machine (SVM), and building a grammar that generates gesture commands to control an application. In the training stage, after extracting the keypoints for every training image using the scale invariant feature transform (SIFT), a vector quantization technique maps keypoints from every training image into a unified dimensional histogram vector (bag-of-words) after K-means clustering. This histogram is treated as an input vector for a multiclass SVM to build the training classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using our algorithm; then the keypoints are extracted for every small image that contains the detected hand gesture only and fed into the cluster model to map them into a bag-of-words vector, which is finally fed into the multiclass SVM training classifier to recognize the hand gesture.

419 citations
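
A hedged sketch of the training stage described above: SIFT descriptors, a K-means codebook, bag-of-words histograms, and a multiclass SVM. The names train_imgs and train_labels are hypothetical placeholders, and scikit-learn stands in for whatever clustering/SVM implementation the authors used:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_descriptors(img):
    """All 128-D SIFT descriptors found in one grayscale image."""
    _, des = cv2.SIFT_create().detectAndCompute(img, None)
    return des

# train_imgs and train_labels are hypothetical placeholders for the
# gesture training set (lists of grayscale images and class labels).
all_des = np.vstack([d for d in (sift_descriptors(im) for im in train_imgs)
                     if d is not None])

# Vector quantization: K-means over all training descriptors yields the
# visual vocabulary (codebook) used to build bag-of-words histograms.
k = 100
codebook = KMeans(n_clusters=k, n_init=4).fit(all_des)

def bow_histogram(img):
    # Assumes the image yields at least one keypoint.
    words = codebook.predict(sift_descriptors(img))
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / max(hist.sum(), 1)  # unified-length, normalized vector

X = np.array([bow_histogram(im) for im in train_imgs])
clf = SVC(kernel="linear").fit(X, train_labels)  # multiclass SVM (one-vs-one)
```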


Journal ArticleDOI
TL;DR: Affine-SIFT (ASIFT) simulates a set of sample views of the initial images, obtainable by varying the two camera-axis orientation parameters, namely the latitude and longitude angles, which are not treated by the SIFT method.
Abstract: If a physical object has a smooth or piecewise smooth boundary, its images obtained by cameras in varying positions undergo smooth apparent deformations. These deformations are locally well approximated by affine transforms of the image plane. In consequence, the solid object recognition problem has often been led back to the computation of affine invariant image local features. The similarity invariance (invariance to translation, rotation, and zoom) is dealt with rigorously by the SIFT method. The method illustrated and demonstrated in this work, Affine-SIFT (ASIFT), simulates a set of sample views of the initial images, obtainable by varying the two camera-axis orientation parameters, namely the latitude and longitude angles, which are not treated by the SIFT method. It then applies the SIFT method itself to all images thus generated. Thus, ASIFT effectively covers all six parameters of the affine transform. The source code (ANSI C), its documentation, and the online demo are accessible at the IPOL web page of this article.

329 citations
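
A much-simplified sketch of the view-simulation idea, not the reference IPOL implementation: each simulated view rotates the image by a longitude angle phi and compresses one axis by the tilt t = 1/cos(latitude) with an anti-aliasing blur, and plain SIFT is then run on every view. The sampling ranges and blur constant follow common ASIFT practice but are illustrative here:

```python
import cv2
import numpy as np

def simulate_view(img, tilt, phi):
    """Crude affine view simulation: rotate by longitude angle phi
    (degrees), then compress the x-axis by the tilt factor after an
    anti-aliasing blur. Simplified from the ASIFT scheme."""
    h, w = img.shape[:2]
    if phi != 0:
        M = cv2.getRotationMatrix2D((w / 2, h / 2), phi, 1.0)
        img = cv2.warpAffine(img, M, (w, h))
    if tilt > 1:
        sigma = 0.8 * np.sqrt(tilt ** 2 - 1)       # blur before subsampling
        img = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma, sigmaY=0.01)
        img = cv2.resize(img, (max(1, int(w / tilt)), h))
    return img

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
features = [sift.detectAndCompute(simulate_view(img, t, phi), None)
            for t in (1.0, np.sqrt(2.0), 2.0)
            for phi in np.arange(0.0, 180.0, 72.0 / t)]
```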


Journal ArticleDOI
TL;DR: This paper proposes a discriminative model to address face matching in the presence of age variation and shows that this approach outperforms a state-of-the-art commercial face recognition engine on two public domain face aging data sets: MORPH and FG-NET.
Abstract: Aging variation poses a serious problem to automatic face recognition systems. Most of the face recognition studies that have addressed the aging problem are focused on age estimation or aging simulation. Designing an appropriate feature representation and an effective matching framework for age invariant face recognition remains an open problem. In this paper, we propose a discriminative model to address face matching in the presence of age variation. In this framework, we first represent each face by designing a densely sampled local feature description scheme, in which scale invariant feature transform (SIFT) and multi-scale local binary patterns (MLBP) serve as the local descriptors. By densely sampling the two kinds of local descriptors from the entire facial image, sufficient discriminatory information, including the distribution of the edge direction in the face image (which is expected to be age invariant), can be extracted for further analysis. Since both SIFT-based local features and MLBP-based local features span a high-dimensional feature space, to avoid the overfitting problem we develop an algorithm, called multi-feature discriminant analysis (MFDA), to process these two local feature spaces in a unified framework. The MFDA is an extension and improvement of LDA using multiple features combined with two different random sampling methods in feature and sample space. By randomly sampling the training set as well as the feature space, multiple LDA-based classifiers are constructed and then combined to generate a robust decision via a fusion rule. Experimental results show that our approach outperforms a state-of-the-art commercial face recognition engine on two public domain face aging data sets: MORPH and FG-NET. We also compare the performance of the proposed discriminative model with a generative aging model. A fusion of discriminative and generative models further improves the face matching accuracy in the presence of aging.

265 citations


Journal ArticleDOI
TL;DR: Comprehensive evaluation of efficiency, distribution quality, and positional accuracy of the extracted point pairs proves the capabilities of the proposed matching algorithm on a variety of optical remote sensing images.
Abstract: Extracting well-distributed, reliable, and precisely aligned point pairs for accurate image registration is a difficult task, particularly for multisource remote sensing images that have significant illumination, rotation, and scene differences. The scale-invariant feature transform (SIFT) approach, as a well-known feature-based image matching algorithm, has been successfully applied to a number of automatic registration tasks for remote sensing images. Regardless of its distinctiveness and robustness, the SIFT algorithm suffers from some problems in the quality, quantity, and distribution of extracted features, particularly in multisource remote sensing imagery. In this paper, an improved SIFT algorithm is introduced that is fully automated and applicable to various kinds of optical remote sensing images, even those with up to a five-fold difference in scale. The key to the proposed approach is a selection strategy for SIFT features over the full distribution of location and scale, where feature quality is guaranteed by stability and distinctiveness constraints. The extracted features are then introduced to an initial cross-matching process followed by a consistency check in the projective transformation model. Comprehensive evaluation of efficiency, distribution quality, and positional accuracy of the extracted point pairs proves the capabilities of the proposed matching algorithm on a variety of optical remote sensing images.

255 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper proposes a novel approach to integrating such heterogeneous features in an unsupervised way, by performing multi-modal spectral clustering on unlabeled, unsegmented images using a commonly shared graph Laplacian matrix.
Abstract: In recent years, more and more visual descriptors have been proposed to describe objects and scenes appearing in images. Different features describe different aspects of the visual characteristics. How to combine these heterogeneous features has become an increasingly critical problem. In this paper, we propose a novel approach to integrating such heterogeneous features in an unsupervised way by performing multi-modal spectral clustering on unlabeled and unsegmented images. Considering each type of feature as one modality, our new multi-modal spectral clustering (MMSC) algorithm learns a commonly shared graph Laplacian matrix by unifying the different modalities (image features). A non-negative relaxation is also added to improve the robustness and efficiency of image clustering. We applied our MMSC method to integrate five popularly used image features, including SIFT, HOG, GIST, LBP, and CENTRIST, and evaluated the performance on two benchmark data sets: Caltech-101 and MSRC-v1. Compared with existing unsupervised scene and object categorization methods, our approach consistently achieves superior performance measured by three standard clustering evaluation metrics.

Proceedings ArticleDOI
21 Mar 2011
TL;DR: This work demonstrates a large-scale feature search approach to generating new, more powerful feature representations in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand.
Abstract: Many modern computer vision algorithms are built atop a set of low-level feature operators (such as SIFT [1], [2]; HOG [3], [4]; or LBP [5], [6]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations, in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [7] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.

Proceedings Article
12 Dec 2011
TL;DR: This paper proposes hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder that includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization.
Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports).

Journal ArticleDOI
TL;DR: A new statistical predictor based upon the Weibull distribution is developed, which produces accurate results on a per instance recognition basis across different recognition problems.
Abstract: In this paper, we define meta-recognition, a performance prediction method for recognition algorithms, and examine the theoretical basis for its postrecognition score analysis form through the use of the statistical extreme value theory (EVT). The ability to predict the performance of a recognition system based on its outputs for each match instance is desirable for a number of important reasons, including automatic threshold selection for determining matches and nonmatches, and automatic algorithm selection or weighting for multi-algorithm fusion. The emerging body of literature on postrecognition score analysis has been largely constrained to biometrics, where the analysis has been shown to successfully complement or replace image quality metrics as a predictor. We develop a new statistical predictor based upon the Weibull distribution, which produces accurate results on a per instance recognition basis across different recognition problems. Experimental results are provided for two different face recognition algorithms, a fingerprint recognition algorithm, a SIFT-based object recognition system, and a content-based image retrieval system.

Journal ArticleDOI
TL;DR: A new AIR method is proposed, based on the combination of image segmentation and SIFT complemented by a robust procedure of outlier removal; the combination allows tie points to be obtained accurately for a pair of remote sensing images, making it a powerful scheme for AIR.
Abstract: Automatic image registration (AIR) is still a present challenge for the remote sensing community. Although a wide variety of AIR methods have been proposed in the last few years, several drawbacks prevent their common use in practice. The recently proposed scale invariant feature transform (SIFT) approach has already proven to be a powerful tool for obtaining tie points in general image processing tasks, but it has limited performance when directly applied to remote sensing images. In this paper, a new AIR method is proposed, based on the combination of image segmentation and SIFT, complemented by a robust procedure of outlier removal. This combination allows tie points to be obtained accurately for a pair of remote sensing images, making it a powerful scheme for AIR. Both synthetic and real data have been considered in this work for the evaluation of the proposed methodology, comprising medium and high spatial resolution images, and single-band, multispectral, and hyperspectral images. A set of measures which allow for an objective evaluation of the quality of the geometric correction process has been used. The proposed methodology allows for a fully automatic registration of pairs of remote sensing images, leading to subpixel accuracy for the whole considered data set. Furthermore, it is able to account for differences in spectral content, rotation, scale, translation, different viewpoint, and change in illumination.
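
The outlier-removal stage in pipelines like this is commonly implemented by fitting a single global transformation with RANSAC; a minimal sketch using OpenCV follows, where kp1, kp2, and good are assumed outputs of a prior SIFT matching step (placeholder names):

```python
import cv2
import numpy as np

# kp1, kp2 (keypoints) and good (ratio-test matches) are assumed to come
# from a prior SIFT matching stage; the names are placeholders.
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC keeps only tie points consistent with one projective
# transformation; everything else is discarded as an outlier.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
```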

Proceedings Article
12 Dec 2011
TL;DR: A discriminatively trained model of person-object interactions for recognizing common human actions in still images that bypasses the difficult problem of estimating the complete human body pose configuration is investigated.
Abstract: We investigate a discriminatively trained model of person-object interactions for recognizing common human actions in still images. We build on the locally order-less spatial pyramid bag-of-features model, which was shown to perform extremely well on a range of object, scene and human action recognition tasks. We introduce three principal contributions. First, we replace the standard quantized local HOG/SIFT features with stronger discriminatively trained body part and object detectors. Second, we introduce new person-object interaction features based on spatial co-occurrences of individual body parts and objects. Third, we address the combinatorial problem of a large number of possible interaction pairs and propose a discriminative selection procedure using a linear support vector machine (SVM) with a sparsity inducing regularizer. Learning of action-specific body part and object interactions bypasses the difficult problem of estimating the complete human body pose configuration. Benefits of the proposed model are shown on human action recognition in consumer photographs, outperforming the strong bag-of-features baseline.

Proceedings ArticleDOI
06 Dec 2011
TL;DR: This paper summarizes the performance of two robust feature detection algorithms namely Scale Invariant Feature Transform (SIFT) and Speeded up Robust Features (SURF) on several classification datasets.
Abstract: Scene classification in indoor and outdoor environments is a fundamental problem to the vision and robotics community. Scene classification benefits from image features which are invariant to image transformations such as rotation, illumination, scale, viewpoint, noise etc. Selecting suitable features that exhibit such invariances plays a key part in classification performance. This paper summarizes the performance of two robust feature detection algorithms namely Scale Invariant Feature Transform (SIFT) and Speeded up Robust Features (SURF) on several classification datasets. In this paper, we have proposed three shorter SIFT descriptors. Results show that the proposed 64D and 96D SIFT descriptors perform as well as traditional 128D SIFT descriptors for image matching at a significantly reduced computational cost. SURF has also been observed to give good classification results on different datasets.

Journal ArticleDOI
TL;DR: A novel region-location algorithm is proposed, which exploits the clustering information from matched SIFT keypoints as well as the region information extracted through image segmentation; the approach outperforms existing algorithms in terms of detection accuracy.
Abstract: This letter presents a new method for airport detection from large high-spatial-resolution IKONOS images. To this end, we describe an airport by a set of scale-invariant feature transform (SIFT) keypoints and detect it using an improved SIFT matching strategy. After obtaining SIFT-matched keypoints, to discard redundant matched points and locate candidate regions that may contain the target, a novel region-location algorithm is proposed, which exploits the clustering information from matched SIFT keypoints as well as the region information extracted through image segmentation. Finally, airport recognition is achieved by applying prior knowledge to the candidate regions. Experimental results show that the proposed approach outperforms existing algorithms in terms of detection accuracy.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: A novel method to select informative object features using a more efficient algorithm called Sparse PCA is proposed, and it is shown that, using a large-scale multiple-view object database, informative features can be reliably identified from a high-dimensional visual dictionary by applying Sparse PCA on the histograms of each object category.
Abstract: Bag-of-words (BoW) methods are a popular class of object recognition methods that use image features (e.g., SIFT) to form visual dictionaries and subsequent histogram vectors to represent object images in the recognition process. The accuracy of BoW classifiers, however, is often limited by the presence of uninformative features extracted from the background or irrelevant image segments. Most existing solutions to prune out uninformative features rely on enforcing pairwise epipolar geometry via an expensive structure-from-motion (SfM) procedure. Such solutions are known to break down easily when the camera transformation is large or when the features are extracted from low-resolution, low-quality images. In this paper, we propose a novel method to select informative object features using a more efficient algorithm called Sparse PCA. First, we show that, using a large-scale multiple-view object database, informative features can be reliably identified from a high-dimensional visual dictionary by applying Sparse PCA on the histograms of each object category. Our experiments show that the new algorithm improves recognition accuracy compared to the traditional BoW methods and SfM methods. Second, we present a new solution to Sparse PCA as a semidefinite programming problem using the Augmented Lagrangian Method. The new solver outperforms the state of the art for estimating sparse principal vectors as a basis for a low-dimensional subspace model.

Journal ArticleDOI
TL;DR: In this article, the scale invariance of Lowe's Scale-Invariant Feature Transform (SIFT) has been investigated under the assumption that the Gaussian smoothing performed by SIFT gives an aliasing free sampling of the image evolution.
Abstract: This note is devoted to a mathematical exploration of whether Lowe's Scale-Invariant Feature Transform (SIFT)[21], a very successful image matching method, is similarity invariant as claimed. It is proved that the method is scale invariant only if the initial image blurs are exactly guessed. Yet, even a large error on the initial blur is quickly attenuated by this multiscale method, when the scale of analysis increases. In consequence, its scale invariance is almost perfect. The mathematical arguments are given under the assumption that the Gaussian smoothing performed by SIFT gives an aliasing free sampling of the image evolution. The validity of this main assumption is confirmed by a rigorous experimental procedure, and by a mathematical proof. These results explain why SIFT outperforms all other image feature extraction methods when it comes to scale invariance.
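
For orientation, the analysis rests on the Gaussian scale space and its semigroup property. The notation below is illustrative, following common presentations of the SIFT scale-space argument rather than quoting the paper:

```latex
% Gaussian scale space of an image u, and the semigroup property:
\[
  L(x, y, \sigma) = (G_{\sigma} * u)(x, y), \qquad
  G_{\sigma_1} * G_{\sigma_2} = G_{\sqrt{\sigma_1^2 + \sigma_2^2}}.
\]
% If the initial blur is assumed to be c but is really c', the image at
% analysis scale \sigma actually carries blur
% \sqrt{\sigma^2 + c'^2 - c^2} \approx \sigma for large \sigma,
% which is why an error in the guessed initial blur is quickly attenuated
% as the scale of analysis increases.
```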

Journal ArticleDOI
TL;DR: The experimental results showed that the detection algorithm based on the gradient-based Random Hough Transform adapted effectively to differences in plant density within the crop row, was faster, and achieved a high rate of correct detections.

Journal ArticleDOI
TL;DR: The experimental results show that Multi-region Histogram (MRH) feature is more discriminative for face recognition compared to Local Binary Patterns (LBP) and raw pixel intensity and is more suitable for CCTV surveillance systems with constraints on the number of images and the speed of processing.
Abstract: Although automatic face recognition has shown success for high-quality images under controlled conditions, it is hard to attain similar levels of performance for video-based recognition. We describe in this paper recent advances in a project being undertaken to trial and develop advanced surveillance systems for public safety. In this paper, we propose a local facial feature based framework for both still-image and video-based face recognition. The evaluation is performed on a still image dataset, LFW, and a video sequence dataset, MOBIO, to compare four methods operating on features: feature averaging (Avg-Feature), the Mutual Subspace Method (MSM), Manifold to Manifold Distance (MMD), and the Affine Hull Method (AHM), as well as four methods operating on distances, over three different features. The experimental results show that the Multi-region Histogram (MRH) feature is more discriminative for face recognition compared to Local Binary Patterns (LBP) and raw pixel intensity. Under the limitation of a small number of images available per person, feature averaging is more reliable than MSM, MMD, and AHM and is much faster. Thus, our proposed framework, averaging the MRH feature, is more suitable for CCTV surveillance systems with constraints on the number of images and the speed of processing.

Proceedings ArticleDOI
30 Aug 2011
TL;DR: The results indicate that there are significant differences between the evaluated descriptors, with GLOH and SIFT outperforming both Shape Context and SURF descriptors.
Abstract: In this paper we present a comparative study of local features for the task of person (re-)identification. A combination of state-of-the-art interest point detectors and descriptors is evaluated. The experiments are performed on a novel dataset which we make publicly available for future research in this area. The results indicate that there are significant differences between the evaluated descriptors, with GLOH and SIFT outperforming both the Shape Context and SURF descriptors. The evaluated interest point detectors perform equally well, with a slight advantage for the Hessian-Laplace detector. The Harris-Affine and Hessian-Affine affine-invariant region detectors do not provide any performance advantage and therefore do not justify their additional computational expense.


Proceedings ArticleDOI
Hong Zhang
09 May 2011
TL;DR: A novel method for visual loop-closure detection in autonomous robot navigation that uses scale-invariant visual features directly, rather than their vector-quantized representation or bag-of-words (BoW), which is popular in recent studies of the problem.
Abstract: In this paper, we present a novel method for visual loop-closure detection in autonomous robot navigation. Our method, which we refer to as bag-of-raw-features or BoRF, uses scale-invariant visual features (such as SIFT) directly, rather than their vector-quantized representation or bag-of-words (BoW), which is popular in recent studies of the problem. BoRF avoids the offline process of vocabulary construction, and does not suffer from the perceptual aliasing problem of BoW, thereby significantly improving the recall performance. To reduce the computational cost of direct feature matching, we exploit the fact that images in the case of robot navigation are acquired sequentially, and that feature matching repeatability with respect to scale can be learned and used to reduce the number of the features considered for matching. The proposed method is tested experimentally using indoor visual SLAM image sequences.

01 Jan 2011
TL;DR: This paper proposes an efficient computer-aided plant image retrieval method based on plant leaf images using shape, color, and texture features, intended mainly for the medical, botanical gardening, and cosmetic industries, which outperforms recently developed methods.
Abstract: This paper proposes an efficient computer-aided plant image retrieval method based on plant leaf images using shape, color, and texture features, intended mainly for the medical industry, botanical gardening, and the cosmetic industry. Here, we use the HSV color space to extract the various features of leaves. A Log-Gabor wavelet is applied to the input image for texture feature extraction. The scale invariant feature transform (SIFT) is incorporated to extract the feature points of the leaf image. SIFT transforms an image into a large collection of feature vectors, each of which is invariant to image translation, scaling, and rotation, partially invariant to illumination changes, and robust to local geometric distortion. SIFT has four stages, namely scale-space extrema detection, keypoint localization, orientation assignment, and keypoint descriptor generation. Results on a database of 500 plant images belonging to 45 different types of plants with different orientations, scales, and translations show that the proposed method outperforms recently developed methods, giving 97.9% retrieval efficiency for 20, 50, 80, and 100 retrievals.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed QP assignment outperforms the traditional nearest neighbor assignment, the multiple assignment, and the soft assignment, whereas the proposed boosting-based weighting strategy outperforms state-of-the-art weighting methods, such as the term frequency weights and the term frequency-inverse document frequency weights.
Abstract: Bag-of-features based approaches have become prominent for image retrieval and image classification tasks in the past decade. Such methods represent an image as a collection of local features, such as image patches and key points with scale invariant feature transform (SIFT) descriptors. To improve the bag-of-features methods, we first model the assignments of local descriptors as contribution functions, and then propose a novel multiple assignment strategy. Assuming the local features can be reconstructed by their neighboring visual words in a vocabulary, reconstruction weights can be solved by quadratic programming. The weights are then used to build contribution functions, resulting in a novel assignment method, called quadratic programming (QP) assignment. We further propose a novel visual word weighting method. The discriminative power of each visual word is analyzed by the sub-similarity function in the bin that corresponds to the visual word. Each sub-similarity function is then treated as a weak classifier. A strong classifier is learned by boosting methods that combine those weak classifiers. The weighting factors of the visual words are learned accordingly. We evaluate the proposed methods on medical image retrieval tasks. The methods are tested on three well-known data sets, i.e., the ImageCLEFmed data set, the 304 CT Set, and the basal-cell carcinoma image set. Experimental results demonstrate that the proposed QP assignment outperforms the traditional nearest neighbor assignment, the multiple assignment, and the soft assignment, whereas the proposed boosting based weighting strategy outperforms the state-of-the-art weighting methods, such as the term frequency weights and the term frequency-inverse document frequency weights.
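
For reference, here is the standard term frequency-inverse document frequency weighting the paper compares against, applied to a bag-of-features count matrix (a generic baseline sketch, not the paper's boosting-based method; H is a placeholder name):

```python
import numpy as np

def tf_idf(H):
    """tf-idf weighting of a bag-of-features count matrix.

    H is a placeholder: an (n_images x n_words) array of raw
    visual-word counts, one row per image."""
    tf = H / np.maximum(H.sum(axis=1, keepdims=True), 1)  # term frequency
    df = np.count_nonzero(H, axis=0)                      # document frequency
    idf = np.log(H.shape[0] / np.maximum(df, 1))          # inverse doc. freq.
    return tf * idf
```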

Book ChapterDOI
23 Nov 2011
TL;DR: A novel combination of local-local information for an efficient finger-knuckle-print (FKP) based recognition system which is robust to scale and rotation and evaluated against various scales and rotations of the query image.
Abstract: This paper presents a novel combination of local-local information for an efficient finger-knuckle-print (FKP) based recognition system which is robust to scale and rotation. The non-uniform brightness of the FKP due to its relatively curved surface is corrected and the texture is enhanced. The local features of the enhanced FKP are extracted using the scale invariant feature transform (SIFT) and the speeded up robust features (SURF). Corresponding features of the enrolled and query FKPs are matched using the nearest-neighbour-ratio method, and the derived SIFT and SURF matching scores are then fused using the weighted sum rule. The proposed system is evaluated using the PolyU FKP database of 7920 images in both identification and verification modes. It is observed that the system performs with a CRR of 100% and an EER of 0.215%. Further, it is evaluated against various scales and rotations of the query image and is found to be robust for query images downscaled up to 60% and for any orientation of the query image.
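
A minimal sketch of the weighted sum rule used to fuse the two matching scores, assuming the per-candidate SIFT and SURF scores have already been computed (all names are placeholders; the weight w would be tuned on validation data):

```python
import numpy as np

def min_max(scores):
    """Normalize a vector of matching scores to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / max(s.max() - s.min(), 1e-12)

def fuse(sift_scores, surf_scores, w=0.5):
    """Weighted sum rule over normalized SIFT and SURF match scores.

    The inputs are placeholder arrays of per-candidate matching scores;
    w balances the two descriptors."""
    return w * min_max(sift_scores) + (1 - w) * min_max(surf_scores)
```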

Proceedings ArticleDOI
29 Dec 2011
TL;DR: A system for retrieving photographs using free-hand sketched queries that enables localization of the sketched object within matching images, and significant performance improvements over the previous GF-HOG results reliant on single-scale Canny edge maps, and over leading descriptors for visual search.
Abstract: This paper presents a system for retrieving photographs using free-hand sketched queries. Regions are extracted from each image by gathering nodes of a hierarchical image segmentation into a bag-of-regions (BoR) representation. The BoR represents object shape at multiple scales, encoding shape even in the presence of adjacent clutter. We extract a shape representation from each region, using the Gradient Field HoG (GF-HOG) descriptor which enables direct comparison with the sketched query. The retrieval pipeline yields significant performance improvements over the previous GF-HOG results reliant on single-scale Canny edge maps, and over leading descriptors (SIFT, SSIM) for visual search. In addition, our system enables localization of the sketched object within matching images.

Proceedings ArticleDOI
16 May 2011
TL;DR: This paper evaluates the performance of some of the most popular descriptor and detector combinations on the DTU Robot dataset, a very large dataset with massive amounts of systematic data aimed at two-view matching, and concludes that the MSER and Difference of Gaussian detectors with a SIFT or DAISY descriptor are the top performers.
Abstract: Addressing the image correspondence problem by feature matching is a central part of computer vision and 3D inference from images. Consequently, there is a substantial amount of work on evaluating feature detection and feature description methodology. However, the performance of feature matching is an interplay of both detector and descriptor methodology. Our main contribution is to evaluate the performance of some of the most popular descriptor and detector combinations on the DTU Robot dataset, which is a very large dataset with massive amounts of systematic data aimed at two-view matching. The size of the dataset implies that we can also reasonably make deductions about the statistical significance of our results. We conclude that the MSER and Difference of Gaussian (DoG) detectors with a SIFT or DAISY descriptor are the top performers. This performance is, however, not statistically significantly better than some other methods. As a byproduct of this investigation, we have also tested various DAISY-type descriptors, and found that the difference among their performance is statistically insignificant on this dataset. Furthermore, we have not been able to produce results corroborating that using affine-invariant feature detectors carries a statistically significant advantage on general scene types.