Proceedings ArticleDOI

Object recognition from local scale-invariant features

20 Sep 1999 - Vol. 2, pp. 1150-1157
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
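The staged filtering step described in the abstract — finding stable points as extrema in a difference-of-Gaussian scale space — can be illustrated in a few lines. This is a simplified, hypothetical sketch, not Lowe's implementation: it uses a fixed list of scales with no image pyramid, no orientation assignment, and no low-contrast or edge-response rejection.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Toy difference-of-Gaussian keypoint detector.

    Blur the image at successive scales, subtract adjacent levels,
    and keep pixels that are strict extrema over their 3x3x3
    (scale, y, x) neighborhood.
    """
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    keypoints = []
    # only interior scale levels have both a coarser and a finer neighbor
    for k in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                patch = dog[k - 1:k + 2, y - 1:y + 2, x - 1:x + 2]
                v = dog[k, y, x]
                # strict extremum: v is unique and is the patch max or min
                if (patch == v).sum() == 1 and (v == patch.max() or v == patch.min()):
                    keypoints.append((y, x, sigmas[k]))
    return keypoints
```

A bright Gaussian blob produces a detection near its center, at the scale level whose DoG response is strongest.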


Citations
Journal ArticleDOI
TL;DR: A new image retrieval technique using the local neighborhood difference pattern (LNDP) is proposed for local features; experiments show a significant improvement over existing methods.
Abstract: A new image retrieval technique using local neighborhood difference pattern (LNDP) has been proposed for local features. The conventional local binary pattern (LBP) transforms every pixel of an image into a binary pattern based on its relationship with neighboring pixels. The proposed feature descriptor differs from the local binary pattern as it transforms the mutual relationship of all neighboring pixels into a binary pattern. LBP and LNDP are complementary, as they extract different information from local pixel intensities. In the proposed work, the LBP and LNDP features are combined to extract most of the information that can be captured using local intensity differences. To demonstrate the effectiveness of the proposed method, experiments have been conducted on four different databases of texture images and natural images. The performance has been measured using the well-known evaluation measures precision and recall, and compared with state-of-the-art local patterns. The comparison shows a significant improvement of the proposed method over existing methods.
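The conventional LBP that this work builds on thresholds each of the eight neighbors against the center pixel and packs the results into one byte. A minimal sketch for a single 3x3 patch — the neighbor ordering and the `>=` tie rule are common implementation choices, not taken from the paper, and the LNDP variant itself (mutual neighbor-to-neighbor differences) is not reproduced here:

```python
import numpy as np

def lbp_code(patch3x3):
    """Classic 8-neighbor local binary pattern for one 3x3 patch.

    Each neighbor is thresholded against the center pixel and the
    resulting bits are packed, starting at the top-left corner and
    walking clockwise around the center.
    """
    center = patch3x3[1, 1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch3x3[y, x] >= center else 0 for y, x in order]
    # pack bit i into position i of the resulting code
    return sum(b << i for i, b in enumerate(bits))
```

Sliding this over every pixel and histogramming the codes yields the LBP feature vector of an image.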

123 citations


Cites background from "Object recognition from local scale..."

  • ...SIFT transforms image data into scale-invariant coordinates relative to local features [25]....

  • ..., human detection [7], person re-identification [49, 50], object recognition [4, 25], etc....

Proceedings Article
25 Jan 2012
TL;DR: A general Bag of Words model is used to compare two classification methods, K-Nearest-Neighbor and Support-Vector-Machine; the SVM classifier is observed to outperform the KNN classifier.
Abstract: In order for a robot or a computer to perform tasks, it must recognize what it is looking at. Given an image a computer must be able to classify what the image represents. While this is a fairly simple task for humans, it is not an easy task for computers. Computers must go through a series of steps in order to classify a single image. In this paper, we used a general Bag of Words model in order to compare two different classification methods. Both K-Nearest-Neighbor (KNN) and Support-Vector-Machine (SVM) classification are well known and widely used. We were able to observe that the SVM classifier outperformed the KNN classifier. For future work, we hope to use more categories for the objects and to use more sophisticated classifiers.
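The bag-of-words pipeline the abstract describes — quantize local descriptors against a codebook, represent each image as a normalized word histogram, then classify the histograms — can be sketched compactly. This is an illustrative NumPy sketch with a k-NN classifier standing in for both methods compared in the paper (its SVM would replace `knn_classify`), and the codebook is assumed given rather than learned by k-means:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors to their nearest codeword and
    return a normalized bag-of-words histogram."""
    # pairwise squared distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def knn_classify(query_hist, train_hists, train_labels, k=1):
    """k-nearest-neighbor majority vote over bag-of-words histograms."""
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]
```

In the paper's setting, the descriptors fed to `bow_histogram` would be SIFT features extracted from each image.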

122 citations


Additional excerpts

  • ...We extracted features using SIFT [8]....

Book ChapterDOI
08 Sep 2018
TL;DR: Experiments on the KITTI 2015 dataset show that the estimated geometry, 3D motion, and moving object masks not only are constrained to be consistent but also significantly outperform other SOTA algorithms, demonstrating the benefits of the approach.
Abstract: Learning to estimate 3D geometry in a single image by watching unlabeled videos via a deep convolutional network has made significant progress recently. Current state-of-the-art (SOTA) methods are based on the learning framework of rigid structure-from-motion, where only 3D camera ego-motion is modeled for geometry estimation. However, moving objects also exist in many videos, e.g., moving cars in a street scene. In this paper, we tackle such motion by additionally incorporating per-pixel 3D object motion into the learning framework, which provides holistic 3D scene flow understanding and helps single-image geometry estimation. Specifically, given two consecutive frames from a video, we adopt a motion network to predict their relative 3D camera pose and a segmentation mask distinguishing moving objects from the rigid background. An optical flow network is used to estimate dense 2D per-pixel correspondence. A single-image depth network predicts depth maps for both images. The four types of information, i.e., 2D flow, camera pose, segmentation mask, and depth maps, are integrated into a differentiable holistic 3D motion parser (HMP), where per-pixel 3D motion for the rigid background and moving objects is recovered. We design various losses w.r.t. the two types of 3D motion for training the depth and motion networks, yielding further error reduction for the estimated geometry. Finally, in order to resolve the 3D motion ambiguity of monocular videos, we incorporate stereo images into joint training. Experiments on the KITTI 2015 dataset show that our estimated geometry, 3D motion, and moving object masks not only are constrained to be consistent but also significantly outperform other SOTA algorithms, demonstrating the benefits of our approach.
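The rigid-background part of such a motion parser reduces to classical multi-view geometry: backproject each pixel with its depth, apply the relative camera pose, and reproject. A hedged sketch of that ego-motion-induced flow (names and the dense formulation are illustrative; the paper's HMP additionally recovers per-pixel object motion and masks):

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """2D flow induced by camera ego-motion alone.

    Backprojects every pixel using its depth and intrinsics K,
    applies the relative pose (R, t), reprojects, and returns the
    per-pixel displacement as an (h, w, 2) array.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[:h, :w]
    # homogeneous pixel coordinates, shape (3, N)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    cam = np.linalg.inv(K) @ pix * depth.reshape(-1)   # 3D points
    moved = R @ cam + t[:, None]                       # apply pose
    proj = K @ moved
    proj = proj[:2] / proj[2]                          # perspective divide
    flow = proj - pix[:2]
    return flow.T.reshape(h, w, 2)
```

With identity pose the flow is zero everywhere; a pure sideways translation of a fronto-parallel unit-depth plane shifts every pixel by the same amount.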

122 citations


Cites result from "Object recognition from local scale..."

  • ...58,1,40] yields more robust matching and shows additional improvement on depth estimation. Structural matching has long been a center area for computer vision or optical flow based on SIFT [59] or HOG [60] descriptors. Most recently, unsupervised learning of dense matching [8] using deep CNN which integrates local and global context achieves impressive results according to the KITTI benchmark 1. In our...

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper presents a comprehensive extension of the Scale Invariant Feature Transform (SIFT), originally introduced in 2D, to volumetric images, and achieves, for the first time, full 3D orientation invariance of the descriptors, which is essential for 3D feature matching.
Abstract: This paper presents a comprehensive extension of the Scale Invariant Feature Transform (SIFT), originally introduced in 2D, to volumetric images. While tackling the significant computational efforts required by such multiscale processing of large data volumes, our implementation addresses two important mathematical issues related to the 2D-to-3D extension. It includes efficient steps to filter out extracted point candidates that have low contrast or are poorly localized along edges or ridges. In addition, it achieves, for the first time, full 3D orientation invariance of the descriptors, which is essential for 3D feature matching. An application of this technique is demonstrated to the feature-based automated registration and segmentation of clinical datasets in the context of radiation therapy.

122 citations


Cites methods from "Object recognition from local scale..."

  • ...It can be used, for example, for feature-based image registration [1, 11], object recognition [9], image segmentation, atlas generation and variability analysis [14], and image retrieval in databases....

  • ...Extending from [9], it is efficient to detect stable feature point locations in the 4D scale space using extrema out of the convolution of the difference-of-Gaussian (DoG) function with the image, D(x, y, z, kσ)....

Journal ArticleDOI
TL;DR: A general and comprehensive overview of the state of the art in self-contained (i.e., GPS-denied) odometry systems is provided, and open challenges that demand further research are identified.
Abstract: The development of a navigation system is one of the major challenges in building a fully autonomous platform. Full autonomy requires a dependable navigation capability not only under ideal conditions with clear GPS signals but also in situations where GPS is unreliable. Therefore, self-contained odometry systems have attracted much attention recently. This paper provides a general and comprehensive overview of the state of the art in self-contained, i.e., GPS-denied, odometry systems, and identifies open challenges that demand further research. Self-contained odometry methods are categorized into five main types, i.e., wheel, inertial, laser, radar, and visual, where the categorization is based on the type of sensor data used for odometry. Most research in the field focuses on analyzing the sensor data, exhaustively or partially, to extract the vehicle pose. Different combinations and fusions of sensor data, coupled tightly or loosely and fused by filtering or optimization, have been investigated. We analyze the advantages and weaknesses of each approach in terms of different evaluation metrics, such as performance, response time, energy efficiency, and accuracy, which can serve as a useful guideline for researchers and engineers in the field. Finally, some future research challenges in the field are discussed.

122 citations


Cites methods from "Object recognition from local scale..."

  • ...[64] N. M. Suaib, M. H. Marhaban, M. I. Saripan, and S. A. Ahmad, ‘‘Performance evaluation of feature detection and feature matching for stereo visual odometry using SIFT and SURF,’’ in Proc....

  • ...[99] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, ‘‘Orb: An efficient alternative to SIFT or SURF,’’ in Proc....

  • ...Consequently, several proposed methods are based on corners, for instance, Harris detector [53], SIFT [96], SURF [97], FAST [98], and ORB [99])....

  • ...In [40], the amplitude gridmap accumulated from the radar scan is transformed into a grayscale image and then interesting points are detected using feature extraction techniques, e.g, SIFT....

  • ...In addition, the feature detection and description are based on ORB rather than using a more robust but slow descriptor such as SIFT and FAST....

References
Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.
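Histogram Intersection as introduced here is a one-line measure: sum the element-wise minimum of the image and model histograms and normalize by the model histogram, so a model fully contained in the image scores 1.0. A minimal sketch:

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    """Swain & Ballard's histogram intersection.

    Sums the bin-wise minimum of the two histograms and normalizes
    by the model histogram; 1.0 means every model pixel is matched,
    0.0 means the color distributions are disjoint.
    """
    return np.minimum(image_hist, model_hist).sum() / model_hist.sum()
```

Indexing into a model database then amounts to ranking stored model histograms by this score against the image histogram.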

5,672 citations

Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform that can be used to find arbitrarily complex shapes.

4,310 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
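The eigenspace recipe described here — center the flattened training images, compute a low-dimensional basis, and project new images into it — is a few lines of linear algebra. A simplified sketch via SVD; recognition by nearest manifold point and the pose interpolation are omitted, and the function names are illustrative:

```python
import numpy as np

def build_eigenspace(images, n_dims):
    """Learn a low-dimensional eigenspace from flattened images
    (one image per row), as in appearance-based recognition."""
    mean = images.mean(axis=0)
    centered = images - mean
    # rows of vt are the principal axes (eigenvectors of the
    # covariance matrix, ordered by decreasing singular value)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_dims]

def project(image, mean, basis):
    """Project a new flattened image into the learned eigenspace."""
    return basis @ (image - mean)
```

Projecting each training image yields the sampled appearance manifold; an unknown image is then recognized by the manifold its projection lies closest to.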

2,037 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Object recognition from local scale..." refers background or methods in this paper

  • ...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....

  • ...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....

  • ...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....

Journal ArticleDOI
TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

1,574 citations


"Object recognition from local scale..." refers methods in this paper

  • ...[23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints....