
Showing papers on "Scale-invariant feature transform published in 2013"


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper introduces a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints, along with a recognition method that significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works, which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.

378 citations
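As a rough illustration of the bag-of-keypoints pipeline described above, the following Python sketch extracts SIFT descriptors on a dense grid and quantizes them against a k-means visual vocabulary. File names, grid step, and vocabulary size are illustrative assumptions, not the paper's settings:

```python
# Sketch of dense SIFT + visual vocabulary quantization (illustrative
# parameters and file names; not the authors' exact pipeline).
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dense_keypoints(img, step=8, size=8):
    """Keypoints on a regular grid, as in dense SIFT extraction."""
    h, w = img.shape[:2]
    return [cv2.KeyPoint(float(x), float(y), size)
            for y in range(step, h - step, step)
            for x in range(step, w - step, step)]

sift = cv2.SIFT_create()

def dense_sift(img):
    _, desc = sift.compute(img, dense_keypoints(img))
    return desc

# Train a visual vocabulary on descriptors pooled from training images.
train_imgs = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in ["a.png", "b.png"]]
all_desc = np.vstack([dense_sift(im) for im in train_imgs])
vocab = MiniBatchKMeans(n_clusters=256, random_state=0).fit(all_desc)

def bow_histogram(img):
    """Quantize each descriptor to its nearest visual word and histogram."""
    words = vocab.predict(dense_sift(img))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalize
```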


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy.
Abstract: Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part-based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Faces in the Wild (LFW) and the YouTube video face database by a significant margin.

232 citations
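A minimal sketch of the spatial-appearance GMM idea follows: each local descriptor is augmented with its normalized (x, y) location, and a GMM with spherical components is fit over the pooled training features. The random stand-in data, component count, and location weight are illustrative assumptions:

```python
# Sketch of the spatial-appearance GMM: descriptors augmented with location,
# modeled by spherical Gaussians (illustrative parameters, not the paper's).
import numpy as np
from sklearn.mixture import GaussianMixture

def augment_with_location(descriptors, locations, img_w, img_h, loc_weight=1.0):
    """Append normalized (x, y) to each descriptor so a single Gaussian can
    capture both appearance and position."""
    xy = locations / np.array([img_w, img_h], dtype=float)
    return np.hstack([descriptors, loc_weight * xy])

# descriptors: (N, 128) SIFT or LBP features; locations: (N, 2) pixel coords.
rng = np.random.default_rng(0)
descriptors = rng.random((5000, 128))   # stand-in for real local features
locations = rng.random((5000, 2)) * 100

X = augment_with_location(descriptors, locations, img_w=100, img_h=100)

# Spherical covariances balance the appearance and location terms,
# as the abstract describes.
gmm = GaussianMixture(n_components=64, covariance_type="spherical",
                      random_state=0).fit(X)

# For matching, the component with the highest responsibility for a feature
# defines its correspondence slot between two faces.
component_of = gmm.predict(X)
```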


Journal ArticleDOI
TL;DR: An automated species identification method for wildlife pictures captured by remote camera traps that uses improved sparse coding spatial pyramid matching (ScSPM), which extracts dense SIFT descriptors and cell-structured LBP as local features and generates a global feature via weighted sparse coding and max pooling using a multi-scale pyramid kernel.
Abstract: Image sensors are increasingly being used in biodiversity monitoring, with each study generating many thousands or millions of pictures. Efficiently identifying the species captured by each image is a critical challenge for the advancement of this field. Here, we present an automated species identification method for wildlife pictures captured by remote camera traps. Our process starts with images that are cropped out of the background. We then use improved sparse coding spatial pyramid matching (ScSPM), which extracts dense SIFT descriptors and cell-structured LBP (cLBP) as the local features, generates a global feature via weighted sparse coding and max pooling using a multi-scale pyramid kernel, and classifies the images with a linear support vector machine algorithm. Weighted sparse coding is used to enforce both sparsity and locality of encoding in feature space. We tested the method on a dataset with over 7,000 camera trap images of 18 species from two different field sites, and achieved an average classification accuracy of 82%. Our analysis demonstrates that the combination of SIFT and cLBP can serve as a useful technique for animal species recognition in real, complex scenarios.

184 citations


Journal Article
TL;DR: Two different methods for scale- and rotation-invariant interest point/feature detection and description are presented: the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF).
Abstract: Accurate, robust and automatic image registration is a critical task in many applications. Image registration/alignment requires the following steps: feature detection, feature matching, derivation of a transformation function based on corresponding features in the images, and reconstruction of the images based on the derived transformation function. The accuracy of the registered image depends on accurate feature detection and matching, so these two intermediate steps are very important in many image applications: image registration, computer vision, image mosaicking, etc. This paper presents two different methods for scale- and rotation-invariant interest point/feature detection and description: the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). It also presents a way to extract distinctive invariant features from images that can be used to perform reliable matching between different views of an object/scene.

166 citations
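The detection-and-matching steps the abstract lists map directly onto OpenCV. The sketch below uses SIFT with Lowe's ratio test; SURF is patented and only available in opencv-contrib builds (cv2.xfeatures2d.SURF_create), so it is omitted here. File names are hypothetical:

```python
# Minimal SIFT feature detection and ratio-test matching with OpenCV.
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test: keep a match only if the best candidate is clearly
# closer than the second best.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} tentative correspondences")
```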


Journal ArticleDOI
TL;DR: An improved version of the scale-invariant feature transform is first proposed to obtain initial matching features from optical and SAR images, and the initial matching features are then refined by exploring their spatial relationship.
Abstract: Although feature-based methods have been successfully developed in the past decades for the registration of optical images, the registration of optical and synthetic aperture radar (SAR) images is still a challenging problem in remote sensing. In this letter, an improved version of the scale-invariant feature transform is first proposed to obtain initial matching features from optical and SAR images. Then, the initial matching features are refined by exploring their spatial relationship. The refined feature matches are finally used for estimating registration parameters. Experimental results have shown the effectiveness of the proposed method.

163 citations
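The letter does not spell out its spatial-relationship refinement here, so the following is only a simplified stand-in for the idea: a match is kept when the distances it forms with the other matched points scale consistently between the two images. The tolerance and voting threshold are assumptions:

```python
# Simplified stand-in for refining matches by spatial relationship
# (an illustrative heuristic, not the letter's exact algorithm).
import numpy as np

def refine_by_spatial_consistency(pts1, pts2, tol=0.2, min_votes=0.5):
    """pts1, pts2: (N, 2) arrays of initially matched coordinates."""
    n = len(pts1)
    d1 = np.linalg.norm(pts1[:, None] - pts1[None, :], axis=2)
    d2 = np.linalg.norm(pts2[:, None] - pts2[None, :], axis=2)
    ratio = d2 / np.where(d1 > 0, d1, np.inf)
    scale = np.median(ratio[ratio > 0])      # robust global scale estimate
    consistent = np.abs(ratio - scale) < tol * scale
    votes = consistent.sum(axis=1) / float(n)
    return votes > min_votes                 # boolean mask of kept matches
```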


Journal ArticleDOI
TL;DR: This paper systematically analyzes SIFT and its variants and evaluates their performance in different situations: scale change, rotation change, blur change, illumination change, and affine change, showing that each has its own advantages.
Abstract: SIFT is an image local feature description algorithm based on scale-space. Due to its strong matching ability, SIFT has many applications in different fields, such as image retrieval, image stitching, and machine vision. Since SIFT was proposed, researchers have never stopped tuning it. The improved algorithms that have drawn a lot of attention are PCA-SIFT, GSIFT, CSIFT, SURF and ASIFT. In this paper, we first systematically analyze SIFT and its variants. Then, we evaluate their performance in different situations: scale change, rotation change, blur change, illumination change, and affine change. The experimental results show that each has its own advantages. SIFT and CSIFT perform the best under scale and rotation change. CSIFT improves SIFT under blur change and affine change, but not illumination change. GSIFT performs the best under blur change and illumination change. ASIFT performs the best under affine change. PCA-SIFT consistently ranks second across situations. SURF performs the worst overall, but runs the fastest.

159 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A method that produces tentative object segmentation masks to suppress background clutter in the features, significantly improving object detection, complemented by contextual features in the form of a full-image FV descriptor and an inter-category rescoring mechanism.
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.

140 citations
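For readers unfamiliar with the Fisher vector representation the detector builds on, here is a minimal sketch of FV encoding with respect to a diagonal-covariance GMM. Only the first-order (mean) gradients are shown for brevity; a full FV also includes second-order terms. The random stand-in descriptors and GMM size are assumptions:

```python
# Sketch of Fisher vector encoding (first-order statistics only).
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_mu(descriptors, gmm):
    """FV gradient w.r.t. the GMM means for one image's descriptors."""
    X = np.asarray(descriptors)              # (N, D)
    gamma = gmm.predict_proba(X)             # (N, K) soft assignments
    N = X.shape[0]
    sigma = np.sqrt(gmm.covariances_)        # (K, D) for 'diag' covariance
    parts = []
    for k in range(gmm.n_components):
        diff = (X - gmm.means_[k]) / sigma[k]
        g = (gamma[:, k:k + 1] * diff).sum(axis=0)
        parts.append(g / (N * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    # Power- and L2-normalization, as is standard for FVs.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / max(np.linalg.norm(fv), 1e-12)

# Fit the GMM on pooled training descriptors (stand-in random data here).
train_desc = np.random.default_rng(0).random((10000, 64))
gmm = GaussianMixture(n_components=32, covariance_type="diag",
                      random_state=0).fit(train_desc)
fv = fisher_vector_mu(train_desc[:500], gmm)   # one image's encoding
```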


Journal ArticleDOI
TL;DR: A new operator called the orthogonal combination of local binary patterns (OC-LBP), along with six new OC-LBP-based local descriptors enhanced with color information for image region description, is proposed to increase both the discriminative power and the photometric invariance of the original LBP operator while keeping its computational efficiency.

128 citations


Journal ArticleDOI
TL;DR: A new descriptor named flip-invariant SIFT (F-SIFT) is proposed, which preserves the original properties of SIFT while being tolerant to flips; in video copy detection it not only improves detection accuracy but also yields more than 50% savings in computational cost.
Abstract: The scale-invariant feature transform (SIFT) feature has been widely accepted as an effective local keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoint, or symmetric patterns of objects. This paper proposes a new descriptor, named flip-invariant SIFT (or F-SIFT), that preserves the original properties of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and detection. In copy detection, we propose a framework that smartly indexes the flip properties of F-SIFT for rapid filtering and weak geometric checking. F-SIFT not only significantly improves the detection accuracy of SIFT, but also leads to more than 50% savings in computational cost. In object recognition, we demonstrate the superiority of F-SIFT in dealing with flip transformation by comparing it to seven other descriptors. In object detection, we further show the ability of F-SIFT in describing symmetric objects. Consistent improvement across different kinds of keypoint detectors is observed for F-SIFT over the original SIFT.

111 citations


Journal ArticleDOI
TL;DR: A layer-parallel SIFT (LPSIFT) with integral image, and its parallel hardware design with an on-the-fly feature extraction flow for real-time applications, reducing the computational amount by 90% and memory usage by 95%.
Abstract: Visual feature extraction with scale invariant feature transform (SIFT) is widely used for object recognition. However, its real-time implementation suffers from long latency, heavy computation, and high memory storage because of its frame level computation with iterated Gaussian blur operations. Thus, this paper proposes a layer parallel SIFT (LPSIFT) with integral image, and its parallel hardware design with an on-the-fly feature extraction flow for real-time application needs. Compared with the original SIFT algorithm, the proposed approach reduces the computational amount by 90% and memory usage by 95%. The final implementation uses 580-K gate count with 90-nm CMOS technology, and offers 6000 feature points/frame for VGA images at 30 frames/s and ~ 2000 feature points/frame for 1920 × 1080 images at 30 frames/s at the clock rate of 100 MHz.

108 citations


Journal ArticleDOI
TL;DR: A novel integrated approach which exploits features of uniform robust scale invariant feature transform (UR-SIFT) and PIIFD and is robust against low content contrast of color images and large content, appearance, and scale changes between color and other retinal image modalities like the fluorescein angiography.
Abstract: Existing algorithms based on the scale invariant feature transform (SIFT) and Harris corners, such as edge-driven dual-bootstrap iterative closest point and Harris-partial intensity invariant feature descriptor (PIIFD) respectively, have been shown to be robust in registering multimodal retinal images. However, they fail to register color retinal images with other modalities in the presence of large content or scale changes. Moreover, these approaches need preprocessing operations such as image resizing to do well. This restricts the application of image registration for further analysis such as change detection and image fusion. Motivated by the need for efficient registration of multimodal retinal image pairs, this paper introduces a novel integrated approach which exploits features of the uniform robust scale invariant feature transform (UR-SIFT) and PIIFD. The approach is robust against the low content contrast of color images and large content, appearance, and scale changes between color and other retinal image modalities such as fluorescein angiography. Because the standard SIFT detector is inefficient for multimodal images, the UR-SIFT algorithm extracts highly stable and distinctive features across the full distribution of location and scale in the images, so that the feature points are adequate in number and repeatable. Moreover, the PIIFD descriptor is symmetric to contrast, which makes it suitable for robust multimodal image registration. After UR-SIFT feature extraction and PIIFD descriptor generation, an initial cross-matching process is performed, followed by a mismatch elimination algorithm. Our dataset consists of 120 pairs of multimodal retinal images. Experimental results show that UR-SIFT-PIIFD outperforms Harris-PIIFD and similar algorithms in terms of efficiency and positional accuracy.

Journal ArticleDOI
TL;DR: This study addresses the limitations of the existing comparative tools and delivers a generalized criterion to determine beforehand the level of efficiency expected from a matching algorithm given the type of images evaluated.

Journal ArticleDOI
TL;DR: It is shown that, in the complex CT imagery domain containing a high degree of noise and imaging artefacts, a specific instance object recognition system using simpler descriptors appears to outperform a more complex RIFT/SIFT solution.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A probabilistic parametric model that assigns confidence values to each matching correspondence and thereby accelerates the generation of hypothesis models for RANSAC at low inlier ratios, estimating accurate hypotheses significantly faster than previous state-of-the-art approaches.
Abstract: Algorithms based on RANSAC that estimate models using feature correspondences between images can slow down tremendously when the percentage of correct correspondences (inliers) is small. In this paper, we present a probabilistic parametric model that allows us to assign confidence values for each matching correspondence and therefore accelerates the generation of hypothesis models for RANSAC under these conditions. Our framework leverages Extreme Value Theory to accurately model the statistics of matching scores produced by a nearest-neighbor feature matcher. Using a new algorithm based on this model, we are able to estimate accurate hypotheses with RANSAC at low inlier ratios significantly faster than previous state-of-the-art approaches, while still performing comparably when the number of inliers is large. We present results of homography and fundamental matrix estimation experiments for both SIFT and SURF matches that demonstrate that our method leads to accurate and fast model estimations.
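A much-simplified sketch of confidence-guided hypothesis generation follows: correspondences with higher confidence are sampled more often, so good hypotheses tend to appear earlier when inliers are scarce. The paper derives the confidences from an Extreme Value Theory fit to matching scores; here they are simply taken as input, and the iteration count and threshold are assumptions:

```python
# Confidence-guided homography RANSAC (illustrative; the EVT confidence
# model from the paper is assumed to be computed elsewhere).
import numpy as np
import cv2

def guided_homography_ransac(pts1, pts2, confidence, iters=500, thresh=3.0):
    rng = np.random.default_rng(0)
    p = confidence / confidence.sum()
    best_H, best_inliers = None, 0
    for _ in range(iters):
        # Draw a minimal sample, biased toward confident correspondences.
        idx = rng.choice(len(pts1), size=4, replace=False, p=p)
        H = cv2.getPerspectiveTransform(
            pts1[idx].astype(np.float32), pts2[idx].astype(np.float32))
        proj = cv2.perspectiveTransform(
            pts1.reshape(-1, 1, 2).astype(np.float32), H).reshape(-1, 2)
        inliers = (np.linalg.norm(proj - pts2, axis=1) < thresh).sum()
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```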

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper uses a single-shot light field image as an input and proposes a new feature, called the light field distortion (LFD) feature, for identifying a transparent object, which is incorporated into the bag-of-features approach for recognizing transparent objects.
Abstract: Current object-recognition algorithms use local features, such as the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF), for visually learning to recognize objects. These approaches, however, cannot be applied to transparent objects made of glass or plastic, as such objects take on the visual features of background objects, and their appearance varies dramatically with changes in scene background. Indeed, in transmitting light, transparent objects have the unique characteristic of distorting the background by refraction. In this paper, we use a single-shot light field image as input and model the distortion of the light field caused by the refractive property of a transparent object. We propose a new feature, called the light field distortion (LFD) feature, for identifying a transparent object, and incorporate it into the bag-of-features approach for recognizing transparent objects. We evaluated its performance in laboratory and real settings.

Journal ArticleDOI
TL;DR: This article proposes a novel geometric coding algorithm, to encode the spatial context among local features for large-scale partial-duplicate Web image retrieval, which achieves comparable performance to other state-of-the-art global geometric verification methods, but is more computationally efficient.
Abstract: Most large-scale image retrieval systems are based on the bag-of-visual-words model. However, the traditional bag-of-visual-words model does not capture the geometric context among local features in images well, which plays an important role in image retrieval. In order to fully explore the geometric context of all visual words in images, efficient global geometric verification methods have been attracting lots of attention. Unfortunately, currently existing methods for global geometric verification are either too computationally expensive to ensure real-time response, or cannot handle rotation well. To solve the preceding problems, in this article, we propose a novel geometric coding algorithm to encode the spatial context among local features for large-scale partial-duplicate Web image retrieval. Our geometric coding consists of geometric square coding and geometric fan coding, which describe the spatial relationships of SIFT features in three geo-maps used for global verification to remove geometrically inconsistent SIFT matches. Our approach is not only computationally efficient, but also effective in detecting partial-duplicate images with rotation, scale changes, partial occlusion, and background clutter. Experiments in partial-duplicate Web image search, using two datasets with one million Web images as distractors, reveal that our approach outperforms, in mean average precision, the baseline bag-of-visual-words approach even when the baseline is followed by RANSAC verification. Moreover, our approach achieves comparable performance to other state-of-the-art global geometric verification methods, for example, the spatial coding scheme, while being more computationally efficient.
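To convey the flavor of spatial verification, here is a much-simplified stand-in: for every pair of matches, check whether their relative left/right and above/below ordering is preserved across the two images, and drop matches that disagree with the majority. The article's geo-maps additionally handle rotation, which this sketch does not; the agreement threshold is an assumption:

```python
# Simplified spatial-ordering verification in the spirit of geometric coding
# (illustrative only; not the article's square/fan coding).
import numpy as np

def spatial_ordering_filter(pts1, pts2, min_agree=0.6):
    """pts1, pts2: (N, 2) matched SIFT locations in the two images."""
    sx1 = np.sign(pts1[:, None, 0] - pts1[None, :, 0])  # left/right in img1
    sx2 = np.sign(pts2[:, None, 0] - pts2[None, :, 0])  # left/right in img2
    sy1 = np.sign(pts1[:, None, 1] - pts1[None, :, 1])  # above/below in img1
    sy2 = np.sign(pts2[:, None, 1] - pts2[None, :, 1])  # above/below in img2
    agree = ((sx1 == sx2) & (sy1 == sy2)).mean(axis=1)
    return agree > min_agree        # mask of spatially consistent matches
```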

Book ChapterDOI
02 Jun 2013
TL;DR: It is shown how a significant increase in matching performance can be obtained in relation to the underlying interest point detectors in the SIFT and the SURF operators.
Abstract: The performance of matching and object recognition methods based on interest points depends on both the properties of the underlying interest points and the associated image descriptors. This paper demonstrates the advantages of using generalized scale-space interest point detectors when computing image descriptors for image-based matching. These generalized scale-space interest points are based on linking of image features over scale and scale selection by weighted averaging along feature trajectories over scale and allow for a higher ratio of correct matches and a lower ratio of false matches compared to previously known interest point detectors within the same class. Specifically, it is shown how a significant increase in matching performance can be obtained in relation to the underlying interest point detectors in the SIFT and the SURF operators. We propose that these generalized scale-space interest points when accompanied by associated scale-invariant image descriptors should allow for better performance of interest point based methods for image-based matching, object recognition and related vision tasks.

Proceedings ArticleDOI
01 Jul 2013
TL;DR: An up-to-date, detailed, clear, and complete evaluation of local feature detectors and descriptors, focusing on methods designed with complexity constraints, providing a much-needed reference for researchers in this field.
Abstract: Several visual feature extraction algorithms have recently appeared in the literature, with the goal of reducing the computational complexity of state-of-the-art solutions (e.g., SIFT and SURF). Therefore, it is necessary to evaluate the performance of these emerging visual descriptors in terms of processing time, repeatability and matching accuracy, and whether they can obtain competitive performance in applications such as image retrieval. This paper aims to provide an up-to-date, detailed, clear, and complete evaluation of local feature detectors and descriptors, focusing on the methods that were designed with complexity constraints, providing a much-needed reference for researchers in this field. Our results demonstrate that recent feature extraction algorithms, e.g., BRISK and ORB, have competitive performance while requiring much lower complexity and can be efficiently used in low-power devices.
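A minimal timing harness in the spirit of this complexity-focused evaluation can be written with stock opencv-python, since SIFT, ORB, and BRISK all ship in the main module. The test image path is hypothetical:

```python
# Compare detector/descriptor extraction cost across feature types.
import time
import cv2

img = cv2.imread("test.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
features = {
    "SIFT": cv2.SIFT_create(),
    "ORB": cv2.ORB_create(),
    "BRISK": cv2.BRISK_create(),
}
for name, f in features.items():
    t0 = time.perf_counter()
    kp, desc = f.detectAndCompute(img, None)
    dt = (time.perf_counter() - t0) * 1000
    print(f"{name}: {len(kp)} keypoints in {dt:.1f} ms")
```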

Journal ArticleDOI
TL;DR: Experimental results show that the proposed Stereo Color Histogram Equalization (SCHE) method produces both accurate depth maps and color-consistent stereo images, even for stereo images with severe radiometric differences.
Abstract: In this paper, we propose a method that infers both accurate depth maps and color-consistent stereo images for radiometrically varying stereo images. In general, stereo matching and enforcing color consistency between stereo images is a chicken-and-egg problem, since it is not trivial to achieve both goals simultaneously. Hence, we have developed an iterative framework in which these two processes can boost each other. First, we transform the input color images to log-chromaticity color space, in which a linear relationship can be established when constructing a joint pdf of the transformed left and right color images. From this joint pdf, we can estimate a linear function that relates the corresponding pixels in the stereo images. Based on this linear property, we present a new stereo matching cost by combining Mutual Information (MI), the SIFT descriptor, and segment-based plane-fitting to robustly find correspondences for stereo image pairs that undergo radiometric variations. Meanwhile, we devise a Stereo Color Histogram Equalization (SCHE) method to produce color-consistent stereo image pairs, which conversely boosts the disparity map estimation. Experimental results show that our method produces both accurate depth maps and color-consistent stereo images, even for stereo images with severe radiometric differences.
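The log-chromaticity transform at the start of this pipeline is straightforward: taking logs of channel ratios turns multiplicative radiometric differences into additive (linear) ones. A minimal version, assuming the green channel as reference and a small epsilon to guard against log(0):

```python
# Log-chromaticity transform (G as the reference channel is an assumption).
import numpy as np

def log_chromaticity(img_bgr, eps=1.0):
    """Map a BGR image to 2-channel log-chromaticity space."""
    b, g, r = [img_bgr[..., i].astype(np.float64) + eps for i in range(3)]
    return np.dstack([np.log(r / g), np.log(b / g)])
```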

Book ChapterDOI
09 Sep 2013
TL;DR: A novel framework for the real-time detection and tracking of an unknown object in a video stream, using multiple keypoint-based methods inside a fallback model to correctly localize the object frame by frame, exploiting the strengths of each method.
Abstract: In this paper we propose a novel framework for the real-time detection and tracking of an unknown object in a video stream. We decompose the problem into two separate modules: detection and learning. The detection module can use multiple keypoint-based methods (ORB, FREAK, BRISK, SIFT, SURF and more) inside a fallback model, to correctly localize the object frame by frame, exploiting the strengths of each method. The learning module updates the object model with a growing and pruning approach, to account for changes in its appearance, and extracts negative samples to further improve the detector performance. To show the effectiveness of the proposed tracking-by-detection algorithm, we present quantitative results on a number of challenging sequences where the target object goes through changes of pose, scale and illumination.
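A sketch of the fallback idea: try fast keypoint methods first and fall back to slower, more robust ones when too few matches survive a ratio test. The method order, thresholds, and detector set are illustrative assumptions, not the paper's configuration:

```python
# Keypoint detection with a fallback cascade (illustrative configuration).
import cv2

def detect_with_fallback(img_obj, img_frame, min_matches=15):
    methods = [("ORB", cv2.ORB_create(), cv2.NORM_HAMMING),
               ("BRISK", cv2.BRISK_create(), cv2.NORM_HAMMING),
               ("SIFT", cv2.SIFT_create(), cv2.NORM_L2)]
    for name, feat, norm in methods:
        k1, d1 = feat.detectAndCompute(img_obj, None)
        k2, d2 = feat.detectAndCompute(img_frame, None)
        if d1 is None or d2 is None or len(d2) < 2:
            continue
        matcher = cv2.BFMatcher(norm)
        good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
                if m.distance < 0.75 * n.distance]
        if len(good) >= min_matches:
            return name, k1, k2, good   # first method that succeeds wins
    return None, [], [], []
```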

Journal ArticleDOI
TL;DR: Experimental results show that PSIFT outperforms significantly the state-of-the-art ASIFT, SIFT, Random Ferns, Harris-Affine, MSER and Hessian Affine, especially when images suffer severe perspective distortion.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: An automatic classification approach for the Nile Tilapia fish using support vector machines (SVMs) algorithm in conjunction with feature extraction techniques based on Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) algorithms is introduced.
Abstract: Commonly, aquatic experts use traditional methods such as casting nets or underwater human monitoring for detecting existence and quantities of different species of fish. However, the recent breakthrough in digital cameras and storage abilities, with consequent cost reduction, can be utilized for automatically observing different underwater species. This article introduces an automatic classification approach for the Nile Tilapia fish using support vector machines (SVMs) algorithm in conjunction with feature extraction techniques based on Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) algorithms. The core of this approach is to apply the feature extraction algorithms in order to describe local features extracted from a set of fish images. Then, the proposed approach classifies the fish images using a number of support vector machines classifiers to differentiate between fish species. Experimental results obtained show that the support vector machines algorithm outperformed other machine learning techniques, such as artificial neural networks (ANN) and k-nearest neighbor (k-NN) algorithms, in terms of the overall classification accuracy.

Journal ArticleDOI
TL;DR: A robust segmentation method and an adaptive SURF descriptor are proposed for iris recognition; the proposed approach achieves improved accuracy at reduced computational cost.

Journal ArticleDOI
TL;DR: The framework of the proposed descriptor consists of the following steps: normalizing the elliptical neighboring region, transforming it to affine scale-space, improving the SIFT descriptor with polar-histogram orientation bins, and integrating the mirror reflection invariant.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: Experiments show that the Co-HOG based technique clearly outperforms state-of-the-art techniques that use HOG, Scale Invariant Feature Transform (SIFT), and Maximally Stable Extremal Regions (MSER).
Abstract: Scene text recognition is a fundamental step in End-to-End applications where traditional optical character recognition (OCR) systems often fail to produce satisfactory results. This paper proposes a technique that uses co-occurrence histogram of oriented gradients (Co-HOG) to recognize the text in scenes. Compared with histogram of oriented gradients (HOG), Co-HOG is a more powerful tool that captures spatial distribution of neighboring orientation pairs instead of just a single gradient orientation. At the same time, it is more efficient compared with HOG and therefore more suitable for real-time applications. The proposed scene text recognition technique is evaluated on ICDAR2003 character dataset and Street View Text (SVT) dataset. Experiments show that the Co-HOG based technique clearly outperforms state-of-the-art techniques that use HOG, Scale Invariant Feature Transform (SIFT), and Maximally Stable Extremal Regions (MSER).
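The core of Co-HOG is to histogram co-occurring pairs of quantized gradient orientations at a fixed pixel offset, rather than single orientations as in HOG. A minimal sketch over a whole image follows; the paper works on character-level blocks with multiple offsets, so the single offset and bin count here are illustrative:

```python
# Sketch of a co-occurrence histogram of oriented gradients (Co-HOG).
import numpy as np
import cv2

def cohog(img_gray, n_bins=8, offset=(0, 1)):
    gx = cv2.Sobel(img_gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(img_gray, cv2.CV_64F, 0, 1)
    ang = np.arctan2(gy, gx)                          # range [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    dy, dx = offset
    # Pair each pixel's orientation bin with its neighbor's at the offset.
    a = bins[max(0, -dy):bins.shape[0] - max(0, dy),
             max(0, -dx):bins.shape[1] - max(0, dx)]
    b = bins[max(0, dy):, max(0, dx):][:a.shape[0], :a.shape[1]]
    hist = np.bincount((a * n_bins + b).ravel(), minlength=n_bins * n_bins)
    return hist.astype(float) / max(hist.sum(), 1.0)  # L1-normalize
```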

Proceedings ArticleDOI
01 Sep 2013
TL;DR: This paper compares the most popular feature descriptors in a typical graph-based VSLAM algorithm using two publicly available datasets, to determine the impact of the choice of feature descriptor on accuracy and speed in a realistic scenario.
Abstract: Feature detection and feature description play an important part in Visual Simultaneous Localization and Mapping (VSLAM). Visual features are commonly used to efficiently estimate the motion of the camera (visual odometry) and link the current image to previously visited parts of the environment (place recognition, loop closure). Gradient histogram-based feature descriptors, like SIFT and SURF, are frequently used for this task. Recently introduced binary descriptors, such as BRIEF or BRISK, claim to offer similar capabilities at lower computational cost. In this paper, we compare the most popular feature descriptors in a typical graph-based VSLAM algorithm using two publicly available datasets to determine the impact of the choice of feature descriptor in terms of accuracy and speed in a realistic scenario.

Journal ArticleDOI
05 Nov 2013-Sensors
TL;DR: A vein pattern extraction method to extract the finger-vein shape and orientation features is proposed and a region-based matching scheme is investigated by employing the Scale Invariant Feature Transform (SIFT) matching method to accommodate the potential local and global variations at the same time.
Abstract: This paper presents a new scheme to improve the performance of finger-vein identification systems. Firstly, a vein pattern extraction method to extract the finger-vein shape and orientation features is proposed. Secondly, to accommodate the potential local and global variations at the same time, a region-based matching scheme is investigated by employing the Scale Invariant Feature Transform (SIFT) matching method. Finally, the finger-vein shape, orientation and SIFT features are combined to further enhance the performance. The experimental results on databases of 426 and 170 fingers demonstrate the consistent superiority of the proposed approach.

Proceedings ArticleDOI
09 Jul 2013
TL;DR: This work presents an online terrain classification system which uses only a monocular camera with a feature-based terrain classification algorithm which is robust to changes in illumination and view points and is successfully applied to the small hexapod robot AMOS II.
Abstract: Legged robots need to be able to classify and recognize different terrains to adapt their gait accordingly. Recent works in terrain classification use different types of sensors (like stereovision, 3D laser range, and tactile sensors) and their combination. However, such sensor systems require more computing power, add extra load to legged robots, and/or might be difficult to install on a small legged robot. In this work, we present an online terrain classification system. It uses only a monocular camera with a feature-based terrain classification algorithm which is robust to changes in illumination and viewpoint. For this algorithm, we extract local features of terrains using either the Scale Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF). We encode the features using the Bag of Words (BoW) technique, and then classify the words using Support Vector Machines (SVMs) with a radial basis function kernel. We compare this feature-based approach with a color-based approach on the Caltech-256 benchmark as well as eight different terrain image sets (grass, gravel, pavement, sand, asphalt, floor, mud, and fine gravel). For terrain images, we observe up to 90% accuracy with the feature-based approach. Finally, this online terrain classification system is successfully applied to our small hexapod robot AMOS II. The output of the system providing terrain information is used as an input to its neural locomotion control to trigger an energy-efficient gait while traversing different terrains.

Posted Content
TL;DR: The goal of this survey is to give an overview of this model and introduce different strategies when building the system based on this model.
Abstract: This article gives a survey of the bag-of-words (BoW), or bag-of-features, model in image retrieval systems. In recent years, large-scale image retrieval has shown significant potential in both industry applications and research problems. As local descriptors like SIFT demonstrate great discriminative power in solving vision problems like object recognition, image classification and annotation, more and more state-of-the-art large-scale image retrieval systems rely on them. A common way to achieve this is to first quantize local descriptors into visual words and then apply scalable textual indexing and retrieval schemes. We call this the bag-of-words or bag-of-features model. The goal of this survey is to give an overview of this model and to introduce different strategies for building systems based on it.
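The "scalable textual indexing" step the survey refers to is typically an inverted file over visual words: each word maps to the images containing it, so a query only touches images sharing at least one word. A minimal sketch, with word IDs standing in for quantized descriptors:

```python
# Toy inverted index over visual words, as used in BoW image retrieval.
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)     # visual word -> image ids

    def add(self, image_id, words):
        for w in set(words):                 # words: quantized descriptors
            self.postings[w].add(image_id)

    def query(self, words):
        """Rank candidate images by how many query words they share."""
        scores = defaultdict(int)
        for w in set(words):
            for image_id in self.postings[w]:
                scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])

index = InvertedIndex()
index.add("img1", [3, 17, 17, 42])
index.add("img2", [3, 99])
print(index.query([3, 42]))   # [('img1', 2), ('img2', 1)]
```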

Patent
03 Apr 2013
TL;DR: In this paper, the authors proposed a remote sensing image registration method for a multi-source sensor, relating to image processing technology, which consists of the following steps: respectively carrying out scale-invariant feature transform (SIFT) on a reference image and a registration image and extracting feature points; calculating the nearest and second-nearest Euclidean distances of the feature points between the image to be registered and the reference image, and screening optimal matching point pairs according to their ratio; and rejecting erroneous registration points through a random sample consensus algorithm to screen the original registration point pairs.
Abstract: The invention provides a remote sensing image registration method for a multi-source sensor, relating to image processing technology. The method comprises the following steps: respectively carrying out scale-invariant feature transform (SIFT) on a reference image and a registration image and extracting feature points; calculating the nearest and second-nearest Euclidean distances of the feature points between the image to be registered and the reference image, and screening optimal matching point pairs according to their ratio; rejecting erroneous registration points through a random sample consensus algorithm and screening the original registration point pairs; calculating distribution quality parameters of the feature point pairs and selecting uniformly distributed, effective control points according to a feature point weight coefficient; searching for the optimal registration point among the control points of the image to be registered according to a mutual information similarity criterion, thus obtaining optimal registration point pairs of the control points; and acquiring the geometric deformation parameters of the image to be registered by polynomial parameter transformation, thus realizing accurate registration of the image to be registered with the reference image. The method offers high calculation speed and high registration precision, and can meet the registration requirements of multi-sensor, multi-temporal and multi-view remote sensing images.