Proceedings ArticleDOI

Object recognition from local scale-invariant features

20 Sep 1999 · Vol. 2, pp. 1150–1157
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
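The matching-and-verification pipeline described in the abstract can be sketched in a few lines. The snippet below is an illustrative stand-in, not the paper's implementation: brute-force nearest-neighbor search with a distance-ratio check replaces the paper's indexing scheme, and a least-squares affine fit stands in for the final verification stage; all function names are hypothetical.

```python
import numpy as np

def match_keys(desc_img, desc_model, ratio=0.8):
    """Nearest-neighbor matching with a distance-ratio check:
    accept a match only if the best model descriptor is clearly
    closer than the second best. Returns (img_idx, model_idx) pairs."""
    matches = []
    for i, d in enumerate(desc_img):
        dists = np.linalg.norm(desc_model - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points to dst
    points; a low residual indicates a verified match."""
    A = np.hstack([src, np.ones((len(src), 1))])      # (n, 3)
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2)
    residual = np.linalg.norm(A @ params - dst)
    return params, residual
```

A model whose matched keypoints fit a low-residual transform is accepted; candidates whose residual stays high are rejected as accidental matches.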


Citations
Journal ArticleDOI
TL;DR: The authors propose an automatic detection and classification method for sewer defects, built on a two-level hierarchical deep convolutional neural network, which achieves high classification accuracy.
Abstract: Video and image sources are frequently applied to defect inspection in industry. For the recognition and classification of sewer defects, a significant number of sewer videos and images are collected. These data are then checked by humans and by traditional methods to recognize and classify the defects, which is inefficient and error-prone. Previously developed features such as SIFT are unable to comprehensively represent such defects, so feature representation is especially important for automatic defect classification. In this paper, we study the automatic extraction of feature representations for sewer defects via deep learning. Moreover, a complete automatic system for classifying sewer defects is proposed, built on a two-level hierarchical deep convolutional neural network, which shows high classification accuracy. The proposed network is trained on a novel data set of over 40,000 sewer images. The system has been successfully applied in practical production, confirming its robustness and feasibility for real-world applications. The source code and trained model are available at the project website. 1

Note to Practitioners: Automatic defect inspection has become a fundamental research topic in engineering applications. Specifically, sewer defect detection is an important measure for the maintenance, renewal, and rehabilitation of sewer infrastructure. In the current operating procedure, all captured videos must be inspected by experts frame by frame to recognize defects, yielding a very low inspection rate at a significant cost in time. Previous work has attempted to employ traditional image-processing methods for automated sewer defect classification; however, these methods generalize poorly because they rely on pre-engineered features. In most cases, sewerage inspection companies have to hire numerous professional inspectors for this job, consuming substantial human and material resources. To address this problem, the authors propose an automatic detection and classification method for sewer defects based on hierarchical deep learning. As demonstrated by various experiments, the designed framework achieves high defect classification accuracy and can easily be integrated into an automatic sewer defect inspection system. 1 https://github.com/NUAAXQ/SewerDefectDetection
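The two-level decision logic described above can be sketched independently of any particular network: a first stage separates defective from normal frames, and a second stage runs only on flagged frames. In this toy sketch the two levels are threshold stubs standing in for the paper's two trained CNNs; all names and thresholds are hypothetical.

```python
import numpy as np

def hierarchical_classify(image, level1, level2):
    """Two-level hierarchical classification: level1 decides
    defect vs. normal; level2 assigns a defect type only when
    level1 flags a defect."""
    if level1(image) == "normal":
        return "normal"
    return level2(image)

# Hypothetical threshold stubs standing in for the two CNNs:
def level1_stub(img):
    return "defect" if img.mean() > 0.5 else "normal"

def level2_stub(img):
    return "crack" if img.std() > 0.2 else "deposit"
```

The hierarchy is what yields the efficiency gain in inspection pipelines: the (cheap) first stage discards the large majority of normal frames before the finer-grained classifier runs.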

104 citations

Journal ArticleDOI
TL;DR: The results reinforce the remarkable diversity of the TcR repertoire, resulting in many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen.
Abstract: Motivation: The clonal theory of adaptive immunity proposes that immunological responses are encoded by increases in the frequency of lymphocytes carrying antigen-specific receptors. In this study, we measure the frequency of different T-cell receptors (TcR) in CD4+ T-cell populations of mice immunized with a complex antigen, killed Mycobacterium tuberculosis, using high-throughput parallel sequencing of the TcRβ chain. Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition. Results: To track the changes induced by immunization within this heterogeneous repertoire, the sequence data were classified by counting the frequency of different clusters of short (3 or 4) continuous stretches of amino acids within the antigen-binding complementarity-determining region 3 (CDR3) repertoire of different mice. Both unsupervised (hierarchical clustering) and supervised (support vector machine) analyses of these different distributions of sequence clusters differentiated between immunized and unimmunized mice with 100% efficiency. The CD4+ TcR repertoires of mice 5 and 14 days postimmunization were clearly different from those of unimmunized mice but were not distinguishable from each other. However, the repertoires of mice 60 days postimmunization were distinct from both naive mice and the day-5/14 animals. Our results reinforce the remarkable diversity of the TcR repertoire, with many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification.
Availability and implementation: The analysis was implemented in R and Python; source code can be found in the Supplementary Data. Contact: b.chain@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
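The feature construction described above — counting short continuous amino-acid stretches within CDR3 sequences — is a bag-of-words over k-mers. A minimal sketch (hypothetical function names; the supervised SVM step of the original analysis is omitted):

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count overlapping k-mers (short continuous stretches of
    amino acids) in a single CDR3 sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def repertoire_vector(seqs, vocab, k=3):
    """Bag-of-words frequency vector for a whole repertoire over a
    fixed k-mer vocabulary, suitable as input to a classifier
    (e.g. hierarchical clustering or an SVM)."""
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    n = sum(total.values()) or 1
    return [total[w] / n for w in vocab]
```

Vectors built this way put repertoires from different mice into a common feature space, which is what makes the clustering and SVM comparisons in the abstract possible.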

104 citations


Cites background or methods from "Object recognition from local scale..."

  • ...These can be individual words of text, image features or any other simple descriptive features [see e.g. (Csurka et al., 2004; Joachims, 1998; Lowe, 1999)]....

  • ...In this study, we develop an approach based on the well-studied bag-of-words (BOW) (Csurka et al., 2004; Joachims, 1998; Lowe, 1999) algorithm to categorize and classify sets of TcR sequences from immunized and unimmunized mice at different times postimmunization....

01 Sep 2006
TL;DR: The research begins by rigorously describing the imaging and navigation problem and developing practical models of the sensors, then presenting a transformation technique to detect features within an image, which utilizes inertial measurements to predict vectors in the feature space between images.
Abstract: The motivation of this research is to address the limitations of satellite-based navigation by fusing imaging and inertial systems. The research begins by rigorously describing the imaging and navigation problem and developing practical models of the sensors, then presents a transformation technique to detect features within an image. Given a set of features, a statistical feature projection technique is developed that utilizes inertial measurements to predict vectors in the feature space between images. This deep coupling of the imaging and inertial sensors is then used to aid the statistical feature matching function. The feature matches and inertial measurements are then used to estimate the navigation trajectory using an extended Kalman filter. After a proper calibration, the image-aided inertial navigation algorithm is tested using a combination of simulation and ground tests with both tactical- and consumer-grade inertial sensors. While limitations of the Kalman filter are identified, the experimental results demonstrate a navigation performance improvement of at least two orders of magnitude over the respective inertial-only solutions.
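The estimator in this work is an extended Kalman filter fusing inertial predictions with image feature matches. A generic EKF measurement-update step is sketched below as an illustration only; the thesis's actual state vector, measurement model, and imaging/inertial coupling are specific to that work.

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """One extended-Kalman-filter measurement update.
    x: state estimate, P: state covariance, z: measurement,
    h: measurement function, H: its Jacobian evaluated at x,
    R: measurement noise covariance."""
    y = z - h(x)                      # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x + K @ y                 # corrected state
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

In an image-aided configuration, z would be derived from matched image features while the prediction step (not shown) propagates the state with inertial measurements.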

103 citations


Cites methods from "Object recognition from local scale..."

  • ...An example is the scale-invariant feature tracker (SIFT) method developed by Lowe [33]....

  • ...The method presented by Lowe [33] builds a histogram of gradient orientations around the feature, then selects a primary gradient orientation which corresponds to the maximum histogram bin....

  • ...For this work, this is accomplished using a variant of the scale-invariant feature tracking (SIFT) algorithm developed by Lowe [33]....

  • ...For more information of the SIFT feature transformation algorithm, see [33], [34] and [27]....

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel, publicly available dataset acquired during actual driving contains drivers' gaze fixations and their temporal integration, providing task-specific saliency maps; it can foster new discussions on better understanding, exploiting, and reproducing the driver's attention process in the autonomous and assisted cars of future generations.
Abstract: Autonomous and assisted driving are undoubtedly hot topics in computer vision. However, the driving task is extremely complex and a deep understanding of drivers' behavior is still lacking. Several researchers are now investigating the attention mechanism in order to define computational models for detecting salient and interesting objects in the scene. Nevertheless, most of these models refer only to bottom-up visual saliency and are focused on still images. During actual driving, however, the temporal nature and peculiarity of the task influence the attention mechanisms, leading to the conclusion that real-life driving data are essential. In this paper we propose a novel and publicly available dataset acquired during actual driving. Our dataset, composed of more than 500,000 frames, contains drivers' gaze fixations and their temporal integration, providing task-specific saliency maps. Geo-referenced locations, driving speed, and course complete the set of released data. To the best of our knowledge, this is the first publicly available dataset of its kind, and it can foster new discussions on better understanding, exploiting, and reproducing the driver's attention process in the autonomous and assisted cars of future generations.

103 citations


Cites methods from "Object recognition from local scale..."

  • ...To estimate this transformation, Scale Invariant Feature Transform (SIFT) keypoints are extracted from the two frames [18] and a first, tentative nearest-neighbor matching is performed....

Journal ArticleDOI
TL;DR: A new method for rapid 3D object indexing is proposed that combines feature-based methods with coarse alignment-based matching techniques, achieving sublinear complexity in the number of models while maintaining a high degree of performance for real 3D sensed data acquired in largely uncontrolled settings.
Abstract: We propose a new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques. Our approach achieves sublinear complexity in the number of models, maintaining at the same time a high degree of performance for real 3D sensed data acquired in largely uncontrolled settings. The key component of our method is to first index surface descriptors, computed at salient locations in the scene, into the whole model database using locality-sensitive hashing (LSH), a probabilistic approximate nearest-neighbor method. Progressively complex geometric constraints are subsequently enforced to further prune the initial candidates and eliminate false correspondences due to inaccuracies in the surface descriptors and errors of the LSH algorithm. The indexed models are selected based on the MAP rule using the posterior probability of the models estimated in the joint 3D-signature space. Experiments with real 3D data, employing a large database of vehicles (most of them very similar in shape) containing 1,000,000 features from more than 365 models, demonstrate a high degree of performance in the presence of occlusion and obscuration, unmodeled vehicle interiors, and part articulations, with an average processing time between 50 and 100 seconds per query.
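The first stage — hashing surface descriptors so that a query touches only a small candidate set rather than the whole model database — can be illustrated with a single random-hyperplane LSH table. This is a generic sketch of locality-sensitive hashing, not the paper's exact scheme; function names are hypothetical.

```python
import numpy as np

def lsh_tables(vectors, n_planes=8, seed=0):
    """Index descriptor vectors into one random-hyperplane LSH table.
    The hash key is the pattern of signs of projections onto random
    hyperplanes, so nearby vectors tend to share a bucket."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, vectors.shape[1]))
    table = {}
    for i, v in enumerate(vectors):
        key = tuple((planes @ v > 0).astype(int))
        table.setdefault(key, []).append(i)
    return planes, table

def lsh_query(q, planes, table):
    """Return indices of stored vectors sharing the query's bucket."""
    key = tuple((planes @ q > 0).astype(int))
    return table.get(key, [])
```

In practice several tables are used to raise recall; the geometric-constraint pruning the abstract describes would then run only on the returned candidates.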

103 citations


Cites methods from "Object recognition from local scale..."

  • ...Since, in our case, the number of possible pose candidates can be O(Q²) for each pair of scene features, we employ importance sampling of matches based on a similarity measure in feature space and we progressively enforce geometric constraints between the descriptors' base points in order to retain…

References
Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.
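Histogram Intersection as described above reduces to a sum of bin-wise minima between the model and image histograms. A minimal sketch (the normalization by the model's total count is one common convention, not necessarily the dissertation's exact one):

```python
import numpy as np

def histogram_intersection(model, image):
    """Histogram intersection: sum over bins of min(model, image),
    normalized here by the model histogram's total count. Scores
    near 1 indicate a strong color match."""
    model = np.asarray(model, dtype=float)
    image = np.asarray(image, dtype=float)
    return np.minimum(model, image).sum() / model.sum()
```

Because the minimum ignores image pixels outside the model's colors, the score degrades gracefully under occlusion and clutter, which is the property the abstract emphasizes.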

5,672 citations

Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform which can be used to find arbitrarily complex shapes.
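The boundary-to-Hough-space mapping in this TL;DR is the R-table construction: for each quantized gradient angle, store the displacement from the boundary point to a chosen reference point; edge points in a new image then vote for the reference location. A toy sketch (hypothetical names, crude angle quantization, translation only):

```python
import numpy as np
from collections import defaultdict

def build_r_table(boundary_pts, gradient_angles, reference):
    """R-table: map each quantized gradient angle to the displacement
    vectors from boundary points to the shape's reference point."""
    table = defaultdict(list)
    for p, theta in zip(boundary_pts, gradient_angles):
        table[round(theta, 1)].append(reference - np.asarray(p))
    return table

def vote(edge_pts, edge_angles, table, shape):
    """Accumulate votes for the reference point; a peak in the
    accumulator marks a detected instance of the shape."""
    acc = np.zeros(shape, dtype=int)
    for p, theta in zip(edge_pts, edge_angles):
        for d in table.get(round(theta, 1), []):
            r = np.asarray(p) + d
            if 0 <= r[0] < shape[0] and 0 <= r[1] < shape[1]:
                acc[int(r[0]), int(r[1])] += 1
    return acc
```

Extending the accumulator with scale and rotation dimensions yields the full generalized transform.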

4,310 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
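The eigenspace construction and projection described above can be sketched with an SVD. In this sketch, recognition falls back to the nearest projected training sample rather than a true parametric appearance manifold, and all names are hypothetical.

```python
import numpy as np

def build_eigenspace(images, k):
    """Compress a set of training image vectors into a k-dimensional
    eigenspace via PCA (SVD of the mean-centered data matrix)."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]          # rows of Vt[:k] span the eigenspace

def project(img, mean, basis):
    """Project a (flattened) image into the eigenspace."""
    return basis @ (np.asarray(img, dtype=float) - mean)

def recognize(img, mean, basis, gallery_coords, labels):
    """Classify by the nearest projected training sample — a
    point-wise stand-in for locating the projection on the
    pose/illumination manifold."""
    q = project(img, mean, basis)
    d = np.linalg.norm(gallery_coords - q, axis=1)
    return labels[int(np.argmin(d))]
```

In the paper's formulation, the gallery points for each object are interpolated into a continuous manifold, so the position of the nearest manifold point also yields the pose estimate.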

2,037 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Object recognition from local scale..." refers background or methods in this paper

  • ...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....

  • ...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....

  • ...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....

Journal ArticleDOI
TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

1,574 citations


"Object recognition from local scale..." refers methods in this paper

  • ...[23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints....
