Proceedings ArticleDOI

Object recognition from local scale-invariant features

20 Sep 1999-Vol. 2, pp 1150-1157
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
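The nearest neighbor indexing step described in the abstract can be sketched compactly in NumPy. This is an illustrative toy, not Lowe's actual key format or index structure: the descriptor arrays, the Euclidean metric, and the 0.8 distance-ratio acceptance threshold (a heuristic popularised by later SIFT work) are all assumptions made here for the example.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with a distance-ratio check.

    desc_a: (N, D) array of query descriptors.
    desc_b: (M, D) array of model descriptors.
    Returns (i, j) pairs where the best match in desc_b is sufficiently
    closer than the second-best, which filters ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

Candidate matches produced this way would then feed the paper's final verification stage, which fits the unknown model parameters by least squares and checks the residual.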


Citations
Proceedings ArticleDOI
11 Nov 2010
TL;DR: A SIFT algorithm adapted for 3D surfaces (called meshSIFT) and its applications to 3D face pose normalisation and recognition that outperform most other algorithms found in the literature.
Abstract: This paper presents a SIFT algorithm adapted for 3D surfaces (called meshSIFT) and its applications to 3D face pose normalisation and recognition. The algorithm allows reliable detection of scale space extrema as local feature locations. The scale space contains the mean curvature in each vertex on different smoothed versions of the input mesh. The meshSIFT algorithm then describes the neighbourhood of every scale space extremum in a feature vector consisting of concatenated histograms of shape indices and slant angles. The feature vectors are reliably matched by comparing the angle in feature space. Using RANSAC, the best rigid transformation can be estimated based on the matched features, leading to 84% correct pose normalisation of 3D faces from the Bosphorus database. Matches are mostly found between two face surfaces of the same person, allowing the algorithm to be used for 3D face recognition. Simply counting the number of matches allows 93.7% correct identification for face surfaces in the Bosphorus database and 97.7% when only frontal images are considered. In the verification scenario, we obtain an equal error rate between 5.1% and 15.0% (depending on the investigated face surfaces). These results outperform most other algorithms found in the literature.
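The matching criterion above, comparing the angle between descriptors in feature space, is simple to illustrate. In this sketch the function names and the 30-degree acceptance threshold are my own illustrative assumptions; the paper states only that angles are compared, not a specific cutoff.

```python
import numpy as np

def feature_angle(u, v):
    """Angle in degrees between two feature vectors.

    Smaller angles mean more similar descriptors; clipping guards
    against floating-point values marginally outside [-1, 1].
    """
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def is_match(u, v, threshold_deg=30.0):
    # Hypothetical acceptance rule: the threshold is an assumption
    # for this example, not a value from the meshSIFT paper.
    return feature_angle(u, v) < threshold_deg
```

Matched pairs produced this way would then go into the RANSAC stage described above to estimate the best rigid transformation between the two face surfaces.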

122 citations


Cites methods from "Object recognition from local scale..."

  • ...This ratio was determined empirically, and is the same as in the original SIFT algorithm [20]....

  • ...The use of meshSIFT for 3D face recognition is a natural way to compare faces based on characteristic features in the human face [20]....

Proceedings ArticleDOI
05 Jan 2015
TL;DR: This work presents an automated computer vision system for logging food and calorie intake using images and introduces a key insight that addresses this problem specifically: restaurant plates are often both nutritionally and visually consistent across many servings.
Abstract: Logging food and calorie intake has been shown to facilitate weight management. Unfortunately, current food logging methods are time-consuming and cumbersome, which limits their effectiveness. To address this limitation, we present an automated computer vision system for logging food and calorie intake using images. We focus on the "restaurant" scenario, which is often a challenging aspect of diet management. We introduce a key insight that addresses this problem specifically: restaurant plates are often both nutritionally and visually consistent across many servings. This insight provides a path to robust calorie estimation from a single RGB photograph: using a database of known food items together with restaurant-specific classifiers, calorie estimation can be achieved through identification followed by calorie lookup. As demonstrated on a challenging Menu-Match dataset and an existing third party dataset, our approach outperforms previous computer vision methods and a commercial calorie estimation app. Our Menu-Match dataset of realistic restaurant meals is made publicly available.

122 citations


Cites background from "Object recognition from local scale..."

  • ...The gradient-based HOG [11] and SIFT [24] that are widely used for object recognition are weaker, supporting the intuition that texture and color are the most useful features to describe food images....

  • ...For SIFT they use sparse coding, and mean pooling across the whole image plane....

  • ...The improved pooling and encoding scheme may explain why our SIFT descriptor is significantly stronger....

  • ...The SIFT base feature was extracted at patch sizes of 8, 16, and 24 pixels at each location in the image....

  • ...In the first step, five types of base features are extracted from the images: color [19], histogram of oriented gradients (HOG) [11], scale-invariant feature transforms (SIFT) [24], local binary patterns (LBP) [27], and filter responses from the MR8 filter bank [33]....

Proceedings ArticleDOI
04 Jun 2008
TL;DR: This work presents an approach that is able to distinguish between multiple weather situations based on the classification of single monocular color images, without any additional assumptions or prior knowledge.
Abstract: Present vision based driver assistance systems are designed to perform under good-natured weather conditions. However, limited visibility caused by heavy rain or fog strongly affects vision systems. To improve machine vision in bad weather situations, a reliable detection system is necessary as a ground base. We present an approach that is able to distinguish between multiple weather situations based on the classification of single monocular color images, without any additional assumptions or prior knowledge. The proposed image descriptor clearly outperforms existing descriptors for that task. Experimental results on real traffic images are characterized by high accuracy, efficiency, and versatility with respect to driver assistance systems.

122 citations


Cites background from "Object recognition from local scale..."

  • ...In order to benchmark its performance, we additionally extracted color wavelets as well as a combination of SIFT features and color histograms and compared the classification results....

  • ...Different kinds of local features have been proposed with histogram-based features like SIFT [12], HOG [3], and shape context [1] being among the most discriminant....

Proceedings ArticleDOI
01 Jan 2010
TL;DR: A head-mounted, stereo-vision based navigational assistance device for the visually impaired that enables subjects to stand and scan the scene for integrating wide-field information, compared to shoulder or waist-mounted designs in literature which require body rotations.
Abstract: We present a head-mounted, stereo-vision based navigational assistance device for the visually impaired. The head-mounted design enables our subjects to stand and scan the scene for integrating wide-field information, compared to shoulder or waist-mounted designs in literature which require body rotations. In order to extract and maintain orientation information for creating a sense of egocentricity in blind users, we incorporate visual odometry and feature based metric-topological SLAM into our system. Using camera pose estimates with dense 3D data obtained from stereo triangulation, we build a vicinity map of the user's environment. On this map, we perform 3D traversability analysis to steer subjects away from obstacles in the path. A tactile interface consisting of microvibration motors provides cues for taking evasive action, as determined by our vision processing algorithms. We report experimental results of our system (running at 10 Hz) and conduct mobility tests with blindfolded subjects to demonstrate the usefulness of our approach over conventional navigational aids like the white cane.

121 citations


Cites methods from "Object recognition from local scale..."

  • ...The local submap level estimates state information corresponding to the six dimensional camera trajectory st and sparse map mt, given feature observations (KLT/SIFT) zt and camera motion estimates ut collected until the current time t....

  • ...The SLAM implementation is a Rao-Blackwellised particle filter (RBPF) [25] in a FastSLAM [24, 23] framework using a combination of KLT [20] and SIFT [19] tracking to solve for data association....

Book ChapterDOI
01 Jan 2007
TL;DR: A hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle the degraded performance due to background noise in training images and the in-plane rotation variant detection.
Abstract: Hand posture understanding is essential to human robot interaction. The existing hand detection approaches using a Viola-Jones detector have two fundamental issues, the degraded performance due to background noise in training images and the in-plane rotation variant detection. In this paper, a hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle these issues simultaneously. In addition, we apply a sharing feature concept to increase the accuracy of multi-class hand posture recognition. The experimental results demonstrate that the proposed approach successfully recognizes three hand posture classes and can deal with the background noise issues. Our detector is in-plane rotation invariant, and achieves satisfactory multi-view hand detection.

121 citations


Cites methods from "Object recognition from local scale..."

  • ...Lowe also provided a matching algorithm for recognizing the same object in different images....

  • ...The Scale Invariant Feature Transform (SIFT) feature introduced by Lowe [7] consists of a histogram representing gradient orientation and magnitude information within a small image patch....

  • ...In this paper, a hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle these issues simultaneously....

  • ...In this paper, a discrete Adaboost learning algorithm with Lowe’s SIFT features [8] is proposed and applied to achieve in-plane rotation invariant hand detection....

References
Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.
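The Histogram Intersection technique described above reduces to an elementwise minimum between the model and image histograms, normalised by the model histogram's total mass. A minimal NumPy sketch (function name mine; the formula follows the standard Swain and Ballard definition):

```python
import numpy as np

def histogram_intersection(model_hist, image_hist):
    """Fraction of the model histogram's mass that overlaps the
    image histogram; 1.0 means the model is fully contained."""
    return np.minimum(model_hist, image_hist).sum() / model_hist.sum()
```

Indexing then amounts to ranking database models by this score against the image histogram; the score degrades gracefully under occlusion because only the occluded colour bins lose mass.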

5,672 citations

Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform which can be used to find arbitrarily complex shapes.

4,310 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
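The eigenspace construction above (compress a large pose/illumination image set to a low-dimensional subspace, then project unknown images into it for recognition) can be sketched with an SVD. Function names here are illustrative, and using an SVD of the centred data rather than an explicit covariance eigendecomposition is a convenience assumption; the two give the same subspace.

```python
import numpy as np

def build_eigenspace(images, k):
    """images: (n, d) array of flattened training images.
    Returns the mean image and the top-k principal directions."""
    mean = images.mean(axis=0)
    centred = images - mean
    # Rows of vt are orthonormal principal directions of the data,
    # ordered by decreasing singular value (variance captured).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Coordinates of a flattened image in the eigenspace."""
    return basis @ (image - mean)
```

In the recognition scheme described above, each object's training projections trace out a manifold in this space; an unknown image is classified by which manifold its projection lies nearest, and its position along that manifold estimates pose.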

2,037 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Object recognition from local scale..." refers background or methods in this paper

  • ...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....

  • ...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....

  • ...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....

Journal ArticleDOI
TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

1,574 citations


"Object recognition from local scale..." refers methods in this paper

  • ...[23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints....