Proceedings ArticleDOI

Object recognition from local scale-invariant features

20 Sep 1999 - Vol. 2, pp. 1150-1157
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
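
As a rough illustration of this pipeline, here is a minimal sketch using OpenCV's later SIFT implementation; the ratio-test threshold, the matcher choice, and the 2D similarity verification (rather than the paper's full model fit) are simplifying assumptions, not the paper's exact method.

```python
import cv2
import numpy as np

def recognize(model_img, scene_img, min_matches=10):
    """Sketch of the pipeline: scale-invariant keypoints, nearest-
    neighbor matching, then least-squares verification of the match."""
    sift = cv2.SIFT_create()
    kp_m, des_m = sift.detectAndCompute(model_img, None)
    kp_s, des_s = sift.detectAndCompute(scene_img, None)

    # Nearest-neighbor indexing with a ratio test to discard
    # ambiguous candidate matches (the 0.8 threshold is an assumption).
    pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_m, des_s, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
    if len(good) < min_matches:
        return None  # too little evidence for this model

    # Final verification: low-residual least-squares fit of a 2D
    # similarity transform, with RANSAC rejecting outlier matches.
    src = np.float32([kp_m[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M  # 2x3 transform, or None if the fit failed
```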

Citations
Book
13 Sep 2013
TL;DR: In this article, the authors provide a tutorial on the relevant physical phenomena governing the operation and design of microrobots, as well as a survey of existing approaches to microrobot design and control.
Abstract: The field of microrobotics has seen tremendous advances in recent years. The principles governing the design of such submillimeter scale robots rely on an understanding of microscale physics, fabrication, and novel control strategies. This monograph provides a tutorial on the relevant physical phenomena governing the operation and design of microrobots, as well as a survey of existing approaches to microrobot design and control. It also provides a detailed practical overview of actuation and control methods that are commonly used to remotely power these designs, as well as a discussion of possible future research directions. Potential high-impact applications of untethered microrobots such as minimally invasive diagnosis and treatment inside the human body, biological studies or bioengineering, microfluidics, desktop micromanufacturing, and mobile sensor networks for environmental and health monitoring are reported.

129 citations


Cites methods from "Object recognition from local scale..."

  • ...A scale invariant feature transform (SIFT) is then used to track the object under different image magnifications and rotations [126, 127]....

Proceedings ArticleDOI
10 Mar 2007
TL;DR: In this paper, the authors present a framework for interactive task training of a mobile robot where the robot learns how to do various tasks while observing a human, and the robot listens to the human's speech and interprets the speech as behaviors that are required to be executed.
Abstract: Effective human/robot interfaces which mimic how humans interact with one another could ultimately lead to robots being accepted in a wider domain of applications. We present a framework for interactive task training of a mobile robot where the robot learns how to do various tasks while observing a human. In addition to observation, the robot listens to the human's speech and interprets the speech as behaviors that are required to be executed. This is especially important where individual steps of a given task may have contingencies that have to be dealt with depending on the situation. Finally, the context of the location where the task takes place and the people present factor heavily into the robot's interpretation of how to execute the task. In this paper, we describe the task training framework, describe how environmental context and communicative dialog with the human help the robot learn the task, and illustrate the utility of this approach with several experimental case studies.

128 citations

Proceedings ArticleDOI
02 Nov 2015
TL;DR: A fully automated tunnel assessment approach is proposed: using the raw input from a single monocular camera, the authors hierarchically construct complex features, exploiting the advantages of deep learning architectures; the approach achieves very fast predictions due to the feedforward nature of Convolutional Neural Networks and Multi-Layer Perceptrons.
Abstract: The inspection, assessment, maintenance and safe operation of the existing civil infrastructure constitutes one of the major challenges facing engineers today. Such work requires either manual approaches, which are slow and yield subjective results, or automated approaches, which depend upon complex handcrafted features. Yet, for the latter case, it is rarely known in advance which features are important for the problem at hand. In this paper, we propose a fully automated tunnel assessment approach; using the raw input from a single monocular camera we hierarchically construct complex features, exploiting the advantages of deep learning architectures. Obtained features are used to train an appropriate defect detector. In particular, we exploit a Convolutional Neural Network to construct high-level features and as a detector we choose to use a Multi-Layer Perceptron due to its global function approximation properties. Such an approach achieves very fast predictions due to the feedforward nature of Convolutional Neural Networks and Multi-Layer Perceptrons.
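
A minimal PyTorch sketch of the described architecture, a convolutional feature extractor feeding a Multi-Layer Perceptron detector; every layer size, patch size, and class count here is an illustrative assumption, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TunnelDefectNet(nn.Module):
    """Illustrative CNN feature extractor + MLP detector head;
    all sizes are assumptions, not the authors' configuration."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(            # hierarchical features
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.detector = nn.Sequential(            # MLP detector
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):            # x: (N, 1, 64, 64) grayscale patches
        return self.detector(self.features(x))

# A single feedforward pass per patch is what makes prediction fast.
logits = TunnelDefectNet()(torch.randn(8, 1, 64, 64))
```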

128 citations


Cites methods from "Object recognition from local scale..."

  • ...Specifically, we use 12 filters with orientations 0°, 30°, 60° and 90° and frequencies 0.0, 0.1 and 0.4.... (see the filter-bank sketch below)

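The excerpt above specifies a 12-filter bank (4 orientations × 3 frequencies). A minimal numpy sketch of one common construction, a Gabor-style oriented cosine under a Gaussian envelope, which degenerates to a plain Gaussian at frequency 0.0; the kernel size and sigma are assumptions, and the cited paper's exact filter family is not stated in the excerpt.

```python
import numpy as np

def oriented_filter(theta_deg, freq, size=15, sigma=3.0):
    """Gabor-style filter: Gaussian envelope times an oriented cosine
    carrier. At freq=0.0 this reduces to a plain Gaussian (low-pass)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    theta = np.deg2rad(theta_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

# The 4 orientations x 3 frequencies from the excerpt give 12 filters.
bank = [oriented_filter(t, f)
        for t in (0, 30, 60, 90) for f in (0.0, 0.1, 0.4)]
```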

Journal ArticleDOI
TL;DR: Among the strategies for dense 3D reconstruction, combining the presented scale-recovery method with PMVS on images captured by two DSLR cameras produced a dense point cloud as accurate as the Nikon laser scanner dataset.
Abstract: Photogrammetric methods for dense 3D surface reconstruction are increasingly available to both professional and amateur users who have requirements that span a wide variety of applications. One of the key concerns in choosing an appropriate method is to understand the achievable accuracy and how choices made within the workflow can alter that outcome. In this paper we consider accuracy in two components: the ability to generate a correctly scaled 3D model; and the ability to automatically deliver a high quality data set that provides good agreement to a reference surface. The determination of scale information is particularly important, since a network of images usually only provides angle measurements and thus leads to unscaled geometry. A solution is the introduction of known distances in object space, such as base lines between camera stations or distances between control points. In order to avoid using known object distances, the method presented in this paper exploits a calibrated stereo camera utilizing the calibrated base line information from the camera pair as an observational based geometric constraint. The method provides distance information throughout the object volume by orbiting the object. In order to test the performance of this approach, four topical surface matching methods have been investigated to determine their ability to produce accurate, dense point clouds. The methods include two versions of Semi-Global Matching as well as MicMac and Patch-based Multi-View Stereo (PMVS). These methods are implemented on a set of stereo images captured from four carefully selected objects by using (1) an off-the-shelf low cost 3D camera and (2) a pair of Nikon D700 DSLR cameras rigidly mounted in close proximity to each other. Inter-comparisons demonstrate the subtle differences between each of these permutations. The point clouds are also compared to a dataset obtained with a Nikon MMD laser scanner. Finally, the established process of achieving accurate point clouds from images and known object space distances are compared with the presented strategies. Results from the matching demonstrate that if a good imaging network is provided, using a stereo camera and bundle adjustment with geometric constraints can effectively resolve the scale. Among the strategies for dense 3D reconstruction, using the presented method for solving the scale problem and PMVS on the images captured with two DSLR cameras resulted in a dense point cloud as accurate as the Nikon laser scanner dataset.
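
The scale-recovery idea reduces to a single ratio: image-only bundle adjustment returns geometry up to an unknown global scale, and the calibrated stereo baseline pins it down. A minimal numpy sketch, assuming the reconstructed camera centres are available; the names are illustrative.

```python
import numpy as np

def apply_metric_scale(points, cam_left, cam_right, baseline_m):
    """Rescale an up-to-scale reconstruction using the calibrated
    stereo baseline as the known object-space distance."""
    s = baseline_m / np.linalg.norm(cam_right - cam_left)
    return points * s

# e.g. a 0.20 m calibrated baseline reconstructed as 1.37 model units
# gives s ~= 0.146, applied uniformly to the whole point cloud.
```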

128 citations


Cites methods from "Object recognition from local scale..."

  • ...As the first step, the Scale Invariant Feature Transform (SIFT) detector and descriptor are used to find keypoint locations and provide a local descriptor for each keypoint (Lowe, 1999)....

Journal ArticleDOI
TL;DR: A new humanoid robot currently being developed for applications in human-centred environments is presented; its system comprises a motion planner for the generation of collision-free paths, a vision system for the recognition and localization of a subset of household objects, and a grasp analysis component which provides the most feasible grasp configurations for each object.

128 citations


Cites methods from "Object recognition from local scale..."

  • ...We have tested three different features and corresponding descriptors: Shi-Tomasi features with a patch represented by a view set [22,33], the Maximally Stable Extremal Regions (MSER) in combination with the Local Affine Frames (LAF) as presented in [25], and the SIFT features [18]....

  • ...In the following, we present our system for the recognition and localization of textured objects, which builds on top of the approach proposed in [18]....

References
Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.
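
Histogram Intersection itself is compact: the match score between an image histogram I and a model histogram M is sum_j min(I_j, M_j), normalized by the model's total count. A minimal numpy sketch of the matching and indexing steps.

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    """Swain & Ballard's Histogram Intersection: the fraction of the
    model's pixels that find same-colored counterparts in the image."""
    return np.minimum(image_hist, model_hist).sum() / model_hist.sum()

def index_into_database(image_hist, model_hists):
    """Indexing: rank stored models by their intersection score and
    return the best-matching model's index."""
    scores = [histogram_intersection(image_hist, m) for m in model_hists]
    return int(np.argmax(scores))
```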

5,672 citations

Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform that can be used to find arbitrarily complex shapes.
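
The mapping referred to here is Ballard's R-table: each boundary point stores its displacement to a reference point, indexed by local gradient orientation, and at detection time every edge point votes for compatible reference locations. A minimal numpy sketch of the translation-only case; handling scale and rotation would add dimensions to the accumulator.

```python
import numpy as np
from collections import defaultdict

def build_r_table(boundary_pts, orientations, ref_pt, n_bins=36):
    """R-table: gradient-orientation bin -> displacement vectors
    from boundary points to the shape's reference point."""
    table = defaultdict(list)
    for (r, c), theta in zip(boundary_pts, orientations):
        b = int(theta / (2 * np.pi) * n_bins) % n_bins
        table[b].append((ref_pt[0] - r, ref_pt[1] - c))
    return table

def ght_votes(edge_pts, edge_orients, table, accum_shape, n_bins=36):
    """Each edge point votes for the reference locations its
    orientation allows; accumulator peaks mark shape instances."""
    acc = np.zeros(accum_shape, dtype=int)
    for (r, c), theta in zip(edge_pts, edge_orients):
        b = int(theta / (2 * np.pi) * n_bins) % n_bins
        for dr, dc in table[b]:
            rr, cc = r + dr, c + dc
            if 0 <= rr < accum_shape[0] and 0 <= cc < accum_shape[1]:
                acc[rr, cc] += 1
    return acc
```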

4,310 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
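
A minimal numpy sketch of the eigenspace method as described: compress the training images with PCA, project an unknown input, and recognize by the nearest stored sample; for brevity the manifold is kept as discrete projected samples rather than the interpolated curve used in the paper.

```python
import numpy as np

def learn_eigenspace(images, k=20):
    """images: (n_samples, n_pixels), rows spanning pose/illumination.
    Returns the mean, top-k eigenvectors, and projected samples."""
    mean = images.mean(axis=0)
    # SVD of the centered image set yields the principal components.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    basis = vt[:k]                           # (k, n_pixels) eigenspace
    coords = (images - mean) @ basis.T       # manifold samples
    return mean, basis, coords

def recognize(image, mean, basis, coords, labels):
    """Project the unknown image into eigenspace and return the label
    (object identity and pose) of the closest manifold sample."""
    p = (image - mean) @ basis.T
    return labels[np.argmin(np.linalg.norm(coords - p, axis=1))]
```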

2,037 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants computed at automatically detected interest points, allowing efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.
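
A minimal sketch of such a voting scheme: each query descriptor retrieves its nearest database descriptor and casts a vote for the image it came from; the paper's semilocal geometric constraints are omitted, and the distance threshold is an arbitrary assumption.

```python
import numpy as np
from collections import Counter

def retrieve(query_desc, db_desc, db_image_ids, top=5, thresh=0.6):
    """query_desc: (q, d); db_desc: (n, d); db_image_ids: image label
    per database descriptor. Returns the top-voted database images."""
    votes = Counter()
    for q in query_desc:
        dists = np.linalg.norm(db_desc - q, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < thresh:        # accept only close matches
            votes[db_image_ids[j]] += 1
    return votes.most_common(top)
```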

1,756 citations


"Object recognition from local scale..." refers background or methods in this paper

  • ...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....

  • ...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....

  • ...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....

Journal ArticleDOI
TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.
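
A minimal OpenCV sketch of exploiting the epipolar constraint: fit the fundamental matrix robustly from candidate matches, then keep only correspondences consistent with x2^T F x1 = 0; RANSAC here stands in for the paper's own robust estimator, and its support/ambiguity match-updating strategy is not reproduced.

```python
import cv2
import numpy as np

def epipolar_filter(pts1, pts2):
    """pts1, pts2: (n, 2) candidate correspondences between two views.
    Robustly fits F, then keeps matches obeying the epipolar constraint."""
    pts1, pts2 = np.float32(pts1), np.float32(pts2)
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return pts1, pts2, None   # estimation failed; keep all matches
    keep = mask.ravel().astype(bool)
    return pts1[keep], pts2[keep], F
```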

1,574 citations


"Object recognition from local scale..." refers methods in this paper

  • ...[23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints....
