Showing papers on "3D single-object recognition published in 1999"


Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations
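
The pipeline described above (scale-space keypoints, nearest-neighbour matching, geometric verification) can be sketched with OpenCV's SIFT implementation, which descends from this work. This is an illustrative sketch, not the author's original code; the file names are placeholders, and a RANSAC homography fit stands in for the paper's least-squares pose verification.

    import cv2
    import numpy as np

    # Placeholder images: a model view of the object and a cluttered scene.
    model = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_m, des_m = sift.detectAndCompute(model, None)   # stable scale-space keypoints
    kp_s, des_s = sift.detectAndCompute(scene, None)

    # Nearest-neighbour matching with a ratio test to keep distinctive matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_m, des_s, k=2)
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

    # Final verification: fit a geometric model to the candidate correspondences.
    if len(good) >= 4:
        src = np.float32([kp_m[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        found = H is not None and int(mask.sum()) >= 10
        print("object detected" if found else "no reliable match")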


Journal ArticleDOI
TL;DR: In this paper, a 3D shape-based object recognition system for simultaneous recognition of multiple objects in scenes containing clutter and occlusion is presented, which is based on matching surfaces by matching points using the spin image representation.
Abstract: We present a 3D shape-based object recognition system for simultaneous recognition of multiple objects in scenes containing clutter and occlusion. Recognition is based on matching surfaces by matching points using the spin image representation. The spin image is a data level shape descriptor that is used to match surfaces represented as surface meshes. We present a compression scheme for spin images that results in efficient multiple object recognition which we verify with results showing the simultaneous recognition of multiple objects from a library of 20 models. Furthermore, we demonstrate the robust performance of recognition in the presence of clutter and occlusion through analysis of recognition trials on 100 scenes.

2,798 citations
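
A minimal sketch of the spin image computation described above: each neighbour of an oriented point is mapped to cylindrical (alpha, beta) coordinates and accumulated into a 2D histogram. The bin size and image width are illustrative choices, not the paper's values.

    import numpy as np

    def spin_image(p, n, points, bin_size=0.01, image_width=20):
        # p: oriented point on the surface, n: its surface normal, points: nearby mesh vertices.
        n = n / np.linalg.norm(n)
        d = points - p                       # vectors from the oriented point
        beta = d @ n                         # signed distance along the normal
        alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))  # radial distance
        half = image_width * bin_size / 2.0
        img = np.zeros((image_width, image_width))
        for a, b in zip(alpha, beta):
            i = int((half - b) / bin_size)   # row: position along the normal
            j = int(a / bin_size)            # column: radial distance from the normal
            if 0 <= i < image_width and 0 <= j < image_width:
                img[i, j] += 1
        return img

    # Matching two surfaces then reduces to comparing spin images of their points,
    # e.g. by normalized correlation between the histograms.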


Book
31 Aug 1999
TL;DR: Pattern Recognition, Cluster Analysis for Object Data, Classifier Design, and Image Processing and Computer Vision are studied.
Abstract: Pattern Recognition.- Cluster Analysis for Object Data.- Cluster Analysis for Relational Data.- Classifier Design.- Image Processing and Computer Vision.

1,133 citations


Book ChapterDOI
01 Jan 1999
TL;DR: This paper attempts to show that for recognizing simple objects with high shape variability such as handwritten characters, it is possible, and even advantageous, to feed the system directly with minimally processed images and to rely on learning to extract the right set of features.
Abstract: Finding an appropriate set of features is an essential problem in the design of shape recognition systems. This paper attempts to show that for recognizing simple objects with high shape variability such as handwritten characters, it is possible, and even advantageous, to feed the system directly with minimally processed images and to rely on learning to extract the right set of features. Convolutional Neural Networks are shown to be particularly well suited to this task. We also show that these networks can be used to recognize multiple objects without requiring explicit segmentation of the objects from their surroundings. The second part of the paper presents the Graph Transformer Network model, which extends the applicability of gradient-based learning to systems that use graphs to represent features, objects, and their combinations.

863 citations
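
A small convolutional network of the kind the chapter advocates, fed with minimally processed character images. This PyTorch sketch is generic; it is not the exact LeNet-5 or Graph Transformer Network architecture.

    import torch
    import torch.nn as nn

    class SmallConvNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 28x28 -> 12x12
                nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # 12x12 -> 4x4
            )
            self.classifier = nn.Sequential(
                nn.Flatten(), nn.Linear(16 * 4 * 4, 84), nn.Tanh(),
                nn.Linear(84, num_classes),
            )

        def forward(self, x):               # x: (batch, 1, 28, 28) raw pixel images
            return self.classifier(self.features(x))

    logits = SmallConvNet()(torch.randn(8, 1, 28, 28))
    print(logits.shape)                     # torch.Size([8, 10])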


Patent
12 Apr 1999
TL;DR: In this article, an image processing technique based on model graphs and bunch graphs that efficiently represent image features as jets is described. The jets are composed of wavelet transforms and are processed at nodes or landmark locations on an image corresponding to readily identifiable features.
Abstract: The present invention is embodied in an apparatus, and related method, for detecting and recognizing an object in an image frame. The object may be, for example, a head having particular facial characteristics. The object detection process uses robust and computationally efficient techniques. The object identification and recognition process uses an image processing technique based on model graphs and bunch graphs that efficiently represent image features as jets. The jets are composed of wavelet transforms and are processed at nodes or landmark locations on an image corresponding to readily identifiable features. The system of the invention is particularly advantageous for recognizing a person over a wide variety of pose angles.

379 citations
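
A rough illustration of a "jet": a vector of wavelet (here Gabor) responses sampled at a landmark location. The kernel parameters, scales, and orientations are assumptions for the sketch, not the patent's values.

    import numpy as np

    def gabor_kernel(size, wavelength, theta, sigma):
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
        return envelope * np.exp(2j * np.pi * xr / wavelength)   # complex Gabor wavelet

    def jet(image, row, col, wavelengths=(4, 8, 16), orientations=8, size=31):
        half = size // 2
        patch = image[row - half:row + half + 1, col - half:col + half + 1]
        responses = []
        for lam in wavelengths:
            for k in range(orientations):
                kern = gabor_kernel(size, lam, np.pi * k / orientations, lam / 2.0)
                responses.append(np.abs(np.sum(patch * kern)))   # magnitude of the filter response
        return np.array(responses)

    image = np.random.rand(128, 128)          # stand-in for a face image
    print(jet(image, 64, 64).shape)           # (24,) = 3 scales x 8 orientations at one landmark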


Book
03 Apr 1999
TL;DR: In this paper, the most viable model of object recognition may be one that incorporates the most appealing aspects of both image-based and structural-description theories, as well as some of their computational advantages and limitations.
Abstract: Theories of visual object recognition must solve the problem of recognizing 3D objects given that perceivers only receive 2D patterns of light on their retinae. Recent findings from human psychophysics, neurophysiology and machine vision provide converging evidence for ‘image-based’ models in which objects are represented as collections of viewpoint-specific local features. This approach is contrasted with ‘structural-description’ models in which objects are represented as configurations of 3D volumes or parts. We then review recent behavioral results that address the biological plausibility of both approaches, as well as some of their computational advantages and limitations. We conclude that, although the image-based approach holds great promise, it has potential pitfalls that may be best overcome by including structural information. Thus, the most viable model of object recognition may be one that incorporates the most appealing aspects of both image-based and structural-description theories. © 1998 Elsevier Science B.V. All rights reserved.

337 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: This work introduces a framework for recognizing actions and objects by measuring image-, object- and action-based information from video, which is appropriate for locating and classifying objects under a variety of conditions including full occlusion.
Abstract: Our goal is to exploit human motion and object context to perform action recognition and object classification. Towards this end, we introduce a framework for recognizing actions and objects by measuring image-, object- and action-based information from video. Hidden Markov models are combined with object context to classify hand actions, which are aggregated by a Bayesian classifier to summarize activities. We also use Bayesian methods to differentiate the class of unknown objects by evaluating detected actions along with low-level, extracted object features. Our approach is appropriate for locating and classifying objects under a variety of conditions including full occlusion. We show experiments where both familiar and previously unseen objects are recognized using action and context information.

319 citations


BookDOI
01 Jan 1999
TL;DR: An Empirical-Statistical Agenda for Recognition and a Cooperating Strategy for Objects Recognition are presented.
Abstract: An Empirical-Statistical Agenda for Recognition.- A Formal-Physical Agenda for Recognition.- Shape.- Shape Models and Object Recognition.- Order Structure, Correspondence, and Shape Based Categories.- Quasi-Invariant Parameterisations and Their Applications in Computer Vision.- Shading.- Representations for Recognition Under Variable Illumination.- Shadows, Shading, and Projective Ambiguity.- Grouping.- Grouping in the Normalized Cut Framework.- Geometric Grouping of Repeated Elements within Images.- Constrained Symmetry for Change Detection.- Grouping Based on Coupled Diffusion Maps.- Representation and Recognition.- Integrating Geometric and Photometric Information for Image Retrieval.- Towards the Integration of Geometric and Appearance-Based Object Recognition.- Recognizing Objects Using Color-Annotated Adjacency Graphs.- A Cooperating Strategy for Objects Recognition.- Statistics, Learning and Recognition.- Model Selection for Two View Geometry:A Review.- Finding Objects by Grouping Primitives.- Object Recognition with Gradient-Based Learning.

223 citations


Journal Article
Z.M. Hefed
TL;DR: Object tracking means tracing the progress of objects (or object features) as they move about in a visual scene, which involves processing spatial and temporal changes.
Abstract: Object tracking means tracing the progress of objects (or object features) as they move about in a visual scene. It involves processing spatial and temporal changes. Some approaches are discussed together with applications and challenges.

194 citations


Patent
30 Nov 1999
TL;DR: Watermarks and related machine-readable coding techniques are used to embed data within the information content on object surfaces as mentioned in this paper, which may be used as a substitute for (or in combination with) standard machine-readable coding methods such as bar codes, magnetic stripes, etc.
Abstract: Watermarks and related machine-readable coding techniques are used to embed data within the information content on object surfaces. These techniques may be used as a substitute for (or in combination with) standard machine-readable coding methods such as bar codes, magnetic stripes, etc. As such, the coding techniques extend to many applications, such as linking objects with network resources, retail point of sale applications, object tracking and counting, production control, object sorting, etc. Object message data, including information about the object, machine instructions, or an index, may be hidden in the surface media of the object. An object messaging system includes an embedder and reader. The embedder converts an object message to an object reference, and encodes this reference in a watermarked signal applied to the object. The reader detects the presence of a watermark and decodes the watermark signal to extract the object reference.

153 citations
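
The embedder/reader idea can be illustrated with a toy additive spread-spectrum watermark. This is not the patented scheme: it is non-blind (the reader needs the original image), and the payload, gain, and key are arbitrary choices for the sketch.

    import numpy as np

    def embed(image, bits, key=0, gain=2.0):
        rng = np.random.default_rng(key)
        marked = image.astype(float).copy()
        for bit in bits:
            carrier = rng.standard_normal(image.shape)       # pseudo-random carrier per bit
            marked += gain * carrier * (1 if bit else -1)    # add or subtract the carrier
        return marked

    def read(marked, original, n_bits, key=0):
        rng = np.random.default_rng(key)                     # regenerate the same carriers
        residual = marked.astype(float) - original.astype(float)
        bits = []
        for _ in range(n_bits):
            carrier = rng.standard_normal(marked.shape)
            bits.append(int(np.sum(residual * carrier) > 0)) # correlate with each carrier
        return bits

    img = np.random.randint(0, 256, (64, 64))
    payload = [1, 0, 1, 1, 0, 1, 0, 0]
    print(read(embed(img, payload), img, len(payload)) == payload)   # True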


Journal ArticleDOI
TL;DR: A system for real-time object recognition and tracking for remote video surveillance is presented; it relies on a unique feature, the statistical morphological skeleton, which achieves low computational complexity, accurate localization, and noise robustness.
Abstract: A system for real-time object recognition and tracking for remote video surveillance is presented. In order to meet real-time requirements, a unique feature, i.e., the statistical morphological skeleton, which achieves low computational complexity, accuracy of localization, and noise robustness, has been considered for both object recognition and tracking. Recognition is obtained by comparing an analytical approximation of the skeleton function extracted from the analyzed image with that obtained from model objects stored in a database. Tracking is performed by applying an extended Kalman filter to a set of observable quantities derived from the detected skeleton and other geometric characteristics of the moving object. Several experiments are shown to illustrate the validity of the proposed method and to demonstrate its usefulness in video-based applications.
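
The tracking stage can be illustrated with a constant-velocity Kalman filter on a tracked centroid, a simplified linear stand-in for the paper's extended Kalman filter on skeleton-derived observables; the noise levels and measurements below are invented.

    import numpy as np

    dt = 1.0
    F = np.array([[1, 0, dt, 0],            # state: [x, y, vx, vy], constant-velocity model
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    H = np.array([[1, 0, 0, 0],             # only the position is observed
                  [0, 1, 0, 0]], float)
    Q = 0.01 * np.eye(4)                    # process noise
    R = 1.0 * np.eye(2)                     # measurement noise

    x = np.zeros(4)
    P = 10.0 * np.eye(4)

    def kalman_step(x, P, z):
        x_pred = F @ x                      # predict
        P_pred = F @ P @ F.T + Q
        S = H @ P_pred @ H.T + R            # update with measurement z
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_new = x_pred + K @ (z - H @ x_pred)
        P_new = (np.eye(4) - K @ H) @ P_pred
        return x_new, P_new

    for t in range(5):                      # fake centroid measurements moving diagonally
        x, P = kalman_step(x, P, np.array([t + 0.3, t - 0.2]))
    print(x[:2])                            # filtered position estimate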

Patent
07 Sep 1999
TL;DR: In this article, a controller halts or reverses the movement of a power window to prevent trapping an object between the closing window and the window frame, and another field of view includes a vehicle entry point such as a vehicle door.
Abstract: An object detection system includes a sensor in communication with a controller to identify an object within a field of view. Pattern recognition algorithms are incorporated into the controller, and particular objects are predefined to minimize false detection and sift predefined objects such as vehicles from background clutter. Upon recognition of the object, an indicator in communication with the controller provides an alert to the operator who can then take corrective action. By defining another field of view the controller halts or reverses the movement of a power window to prevent trapping an object between the closing window and the window frame. Yet another field of view includes a vehicle entry point such as a vehicle door. Movement of the vehicle door will be identified by the controller and will provide an alert such as activation of the vehicle alarm system.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This work reports high correct classification rates on unseen views, especially considering that no domain knowledge is included in the proposed system, and suggests an active learning algorithm to further reduce the required number of training views.
Abstract: Support vector machines have demonstrated excellent results in pattern recognition tasks and 3D object recognition. We confirm some of the results in 3D object recognition and compare it to other object recognition systems. We use different pixel-level representations to perform the experiments, and we extend the setting to the more challenging and practical case in which only a limited number of views of the object are presented during training. We report high correct classification rates on unseen views, especially considering that no domain knowledge is included in the proposed system. Finally, we suggest an active learning algorithm to further reduce the required number of training views.
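
The experimental setting can be sketched with scikit-learn: an SVM trained on raw pixel vectors from a few views per object and evaluated on held-out views. The data here are random stand-ins, so the printed accuracy is only a placeholder for the real experiments.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_objects, views_per_object, side = 5, 36, 32
    X = rng.random((n_objects * views_per_object, side * side))   # flattened pixel views (synthetic)
    y = np.repeat(np.arange(n_objects), views_per_object)

    # Only a limited number of training views per object; the rest are "unseen".
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.2, stratify=y, random_state=0)

    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, y_train)
    print("accuracy on unseen views:", clf.score(X_test, y_test))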

Book
03 Apr 1999
TL;DR: The view-combination approach described in this paper combines a small number of object views to deal with the effects of viewing direction in visual object recognition, and it is shown that the approach can use views of different class members rather than multiple views of a single object to obtain class-based generalization.
Abstract: Visual object recognition is complicated by the fact that the same 3D object can give rise to a large variety of projected images that depend on the viewing conditions, such as viewing direction, distance, and illumination. This paper describes a computational approach that uses combinations of a small number of object views to deal with the effects of viewing direction. The first part of the paper is an overview of the approach based on previous work. It is then shown that, in agreement with psychophysical evidence, the view-combinations approach can use views of different class members rather than multiple views of a single object, to obtain class-based generalization. A number of extensions to the basic scheme are considered, including the use of non-linear combinations, using 3D versus 2D information, and the role of coarse classification on the way to precise identification. Finally, psychophysical and biological aspects of the view-combination approach are discussed. Compared with approaches that treat object recognition as a symbolic high-level activity, in the view-combination approach the emphasis is on processes that are simpler and pictorial in nature. © 1998 Elsevier Science B.V. All rights reserved
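
The core idea, that a novel orthographic view can be written as a linear combination of the coordinates in a few stored views, can be checked numerically with synthetic points: solve for the combination coefficients by least squares and inspect the residual.

    import numpy as np

    rng = np.random.default_rng(1)
    P = rng.random((15, 3))                          # 3D feature points of a model object

    def rotation(ax, ay):
        cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
        Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        return Rx @ Ry

    def view(points, R):                             # orthographic projection of the rotated object
        return (points @ R.T)[:, :2]

    v1 = view(P, rotation(0.0, 0.0))                 # two stored model views
    v2 = view(P, rotation(0.3, 0.5))
    novel = view(P, rotation(0.15, 0.25))            # a novel, unseen view

    # Basis: the x and y coordinates of the stored views plus a constant column.
    A = np.column_stack([v1[:, 0], v1[:, 1], v2[:, 0], v2[:, 1], np.ones(len(P))])
    coeff_x, *_ = np.linalg.lstsq(A, novel[:, 0], rcond=None)
    coeff_y, *_ = np.linalg.lstsq(A, novel[:, 1], rcond=None)
    err = max(np.abs(A @ coeff_x - novel[:, 0]).max(),
              np.abs(A @ coeff_y - novel[:, 1]).max())
    print("reconstruction error:", err)              # ~0: the novel view is consistent with the model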

Patent
23 Dec 1999
TL;DR: An apparatus and method for identifying and classifying objects represented in computed tomography (CT) data for a region using the shape of the objects are disclosed.
Abstract: An apparatus and method for identifying and classifying objects represented in computed tomography (CT) (400) data for a region using the shape (402) of the objects are disclosed. An object represented by CT data for a region is identified (404). Next, a two-dimensional projection of the object along a principal axis of the object is generated (415). The principal axis is identified by computing eigenvectors of a covariance matrix of spatial locations of voxels in the CT data that are associated with the object (408). The smallest eigenvector can be selected as the principal axis of the object along which the two-dimensional projection is generated (412). The identification of the object can be used in the classification of the object such as by altering one or more discrimination parameters (416).
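
A sketch of the projection step with synthetic voxel data: compute the eigenvectors of the covariance of the object's voxel coordinates, take the eigenvector with the smallest eigenvalue as the principal axis, and histogram the voxels in the orthogonal plane to obtain the 2D projection.

    import numpy as np

    rng = np.random.default_rng(0)
    # Fake elongated object: voxel coordinates (z, y, x) of a blob with unequal extents.
    voxels = rng.normal(size=(5000, 3)) * np.array([20.0, 5.0, 2.0])

    mean = voxels.mean(axis=0)
    cov = np.cov((voxels - mean).T)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    axis = eigvecs[:, 0]                             # eigenvector with the smallest eigenvalue

    # Project voxels onto the plane orthogonal to that axis and histogram them.
    basis = eigvecs[:, 1:]                           # the two remaining eigenvectors
    coords_2d = (voxels - mean) @ basis
    projection, _, _ = np.histogram2d(coords_2d[:, 0], coords_2d[:, 1], bins=64)
    print(projection.shape)                          # (64, 64) two-dimensional projection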

Journal ArticleDOI
TL;DR: In this article, the human visual system's ability to achieve object constancy across plane rotation and depth rotation was investigated, which suggests that multiple, view-specific, stored representations of familiar objects are accessed in everyday, entry-level visual recognition.

Book ChapterDOI
13 Jan 1999
TL;DR: A theoretically sound method for constructing object recognition strategies by casting object recognition as a Markov Decision Problem (MDP) is presented, resulting in a system called ADORE (Adaptive Object Recognition) that automatically learns object recognition control policies from training data.
Abstract: Many modern computer vision systems are built by chaining together standard vision procedures, often in graphical programming environments such as Khoros, CVIPtools or IUE. Typically, these procedures are selected and sequenced by an ad-hoc combination of programmer's intuition and trial-and-error. This paper presents a theoretically sound method for constructing object recognition strategies by casting object recognition as a Markov Decision Problem (MDP). The result is a system called ADORE (Adaptive Object Recognition) that automatically learns object recognition control policies from training data. Experimental results are presented in which ADORE is trained to recognize five types of houses in aerial images, and where its performance can be (and is) compared to optimal.

Journal ArticleDOI
TL;DR: An automatic method for three-dimensional (3-D) shape recognition that combines the Fourier transform profilometry technique with a real-time recognition setup such as the joint transform correlator (JTC).
Abstract: An automatic method for three-dimensional (3-D) shape recognition is proposed. It combines the Fourier transform profilometry technique with a real-time recognition setup such as the joint transform correlator (JTC). A grating is projected onto the object surface resulting in a distorted grating pattern. Since this pattern carries information about the depth and the shape of the object, their comparison provides a method for recognizing 3-D objects in real time. A two-cycle JTC is used for this purpose. Experimental results demonstrate the theory and show the utility of the new proposed method.
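
A numerical simulation of the joint transform correlator idea: reference and test patterns are placed side by side, and two Fourier transforms with an intensity detection in between produce cross-correlation peaks whose height indicates a match. The patterns below are random stand-ins for the distorted grating images.

    import numpy as np

    rng = np.random.default_rng(0)
    ref = rng.random((32, 32)) - 0.5               # reference pattern (mean removed to sharpen peaks)
    test = ref.copy()                              # test pattern; identical here, so the peaks are strong

    plane = np.zeros((128, 128))
    plane[48:80, 4:36] = ref                       # reference on the left (column centre 20)
    plane[48:80, 60:92] = test                     # test pattern on the right (column centre 76)

    joint_spectrum = np.abs(np.fft.fft2(plane)) ** 2      # intensity of the joint Fourier transform
    correlation = np.fft.fftshift(np.abs(np.fft.fft2(joint_spectrum)))

    # Cross-correlation peaks appear at +/- the 56-pixel horizontal separation of
    # the two patterns; their height relative to the background measures similarity.
    print("cross peak:", float(correlation[64, 64 + 56]),
          "background:", float(correlation.mean()))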

Patent
09 Aug 1999
TL;DR: In this article, an image comparison and retrieval system compares objects and object clusters, or images, and an object characterization parameter is computed based on certain object pixel values that characterize one or more aspects of the defined object.
Abstract: An image comparison and retrieval system compares objects and object clusters, or images. User controlled or automatic filtering to enhance object features is performed, and object pixels are defined. An object characterization parameter is computed based on certain object pixel values that characterize one or more aspects of the defined object. Object characterization parameters are compared for objects in different images to produce a measure of object similarity in the same or different images. The query image may be substantially continuously displayed during the image filtering and object definition processes.

Journal ArticleDOI
TL;DR: This work compares probabilistic, possibilistic and Dempster-Shafer frameworks for fusing information from actively planned views, using an appearance-based object representation, namely the parametric eigenspace, although the planning algorithm is independent of the details of the specific object recognition environment; as long as the rate of wrong object-pose classifications stays low, the probabilistic implementation outperforms the other approaches.
Abstract: One major goal of active object recognition systems is to extract useful information from multiple measurements. We compare three frameworks for information fusion and view-planning using different uncertainty calculi: probability theory, possibility theory and Dempster-Shafer theory of evidence. The system dynamically repositions the camera to capture additional views in order to improve the classification result obtained from a single view. The active recognition problem can be tackled successfully by all the considered approaches with sometimes only slight differences in performance. Extensive experiments confirm that recognition rates can be improved considerably by performing active steps. Random selection of the next action is much less efficient than planning, both in recognition rate and in the average number of steps required for recognition. As long as the rate of wrong object-pose classifications stays low the probabilistic implementation always outperforms the other approaches. If the outlier rate increases averaging fusion schemes outperform conjunctive approaches for information integration. We use an appearance based object representation, namely the parametric eigenspace, but the planning algorithm is actually independent of the details of the specific object recognition environment.
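
The probabilistic fusion step can be sketched as repeated Bayesian updating of a class posterior with per-view likelihoods, stopping once one class is confident enough. The likelihood values below are invented for illustration; the paper's system also plans which view to take next.

    import numpy as np

    def fuse(prior, likelihood):
        posterior = prior * likelihood           # Bayes' rule (up to normalisation)
        return posterior / posterior.sum()

    classes = ["cup", "stapler", "duck"]
    posterior = np.ones(3) / 3                   # uniform prior over the object classes

    # Classifier outputs for three successive active views (rows = views).
    view_likelihoods = np.array([[0.5, 0.3, 0.2],
                                 [0.6, 0.3, 0.1],
                                 [0.7, 0.2, 0.1]])

    for lik in view_likelihoods:
        posterior = fuse(posterior, lik)
        if posterior.max() > 0.9:                # stop once the classification is confident
            break

    print(classes[int(np.argmax(posterior))], posterior.round(3))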

Proceedings ArticleDOI
23 Jun 1999
TL;DR: The appearance-based recognition scheme is extended to handle range (shape) data, and the result of training is a set of 'eigensurfaces' that capture the gross shape of the objects.
Abstract: Much of the recent research in object recognition has adopted an appearance-based scheme, wherein objects to be recognized are represented as a collection of prototypes in a multidimensional space spanned by a number of characteristic vectors (eigen-images) obtained from training views. In this paper, we extend the appearance-based recognition scheme to handle range (shape) data. The result of training is a set of 'eigensurfaces' that capture the gross shape of the objects. These techniques are used to form a system that recognizes objects under an arbitrary rotational pose transformation. The system has been tested on a 20 object database including free-form objects and a 54 object database of manufactured parts. Experiments with the system point out advantages and also highlight challenges that must be studied in future research.

Proceedings ArticleDOI
04 Oct 1999
TL;DR: Extensions to the basic matching algorithm are described which will enable it to address several challenging and often overlooked problems encountered with real data, and which facilitate the use of 3D object recognition in cases in which the scene contains a large amount of clutter.
Abstract: We report on recent extensions to a surface matching algorithm based on local 3D signatures. This algorithm was previously shown to be effective in view registration of general surfaces and in object recognition from 3D model databases. We describe extensions to the basic matching algorithm which will enable it to address several challenging and often overlooked problems encountered with real data. First, we describe extensions that allow us to deal with data sets with large variations in resolution and with large data sets for which computational efficiency is a major issue. The applicability of the enhanced matching algorithm is illustrated by an example application: the construction of large terrain maps and the construction of accurate 3D models from unregistered views. Second, we describe extensions that facilitate the use of 3D object recognition in cases in which the scene contains a large amount of clutter (e.g., the object occupies 1% of the scene) and in which the scene presents a high degree of confusion (e.g., the model shape is close to other shapes in the scene). Those last two extensions involve learning recognition strategies from the description of the model and from the performance of the recognition algorithm using Bayesian and memory based learning techniques, respectively.

Proceedings ArticleDOI
Hiroshi Tanaka, K. Nakajima, K. Ishigaki, K. Akiyama, Masaki Nakagawa
20 Sep 1999
TL;DR: A hybrid handwritten character recognition system in which the recognition results of the offline and online recognizer are integrated to create an improved product.
Abstract: Describes a handwritten character recognition system that integrates offline recognition requiring a bitmap image and online recognition involving an input pattern as a sequence of x-y coordinates. Offline recognition performs well for painted or overwritten patterns (for which online recognition would not be suited), whereas online recognition is suitable for very deformed patterns (for which offline recognition is not suited). Because each method has different recognition capabilities, the methods complement each other when integrated together. We have implemented a hybrid handwritten character recognition system in which the recognition results of the offline and online recognizer are integrated to create an improved product. After testing several integration methods for a handwritten character database, we found that the best method increased the recognition rate from 73.8% (offline) and 84.8% (online) to 87.6% (integrated).

Proceedings ArticleDOI
24 Oct 1999
TL;DR: This paper presents an advanced face recognition system based on Pseudo 2-D HMMs with coefficients of the 2-D DCT as features, which works directly with JPEG-compressed face images, without any need to completely decompress the image before recognition.
Abstract: This paper presents an advanced face recognition system that is based on the use of Pseudo 2-D HMMs and coefficients of the 2-D DCT as features. A major advantage of our approach is the fact that our face recognition system works directly with JPEG-compressed face images, i.e. it uses directly the DCT-features provided by the JPEG standard, without any necessity of completely decompressing the image before recognition. The recognition rates on the Olivetti Research Laboratory (ORL) face database are 100% for the original images and 99.5% for JPEG compressed domain recognition. A comparison with other face recognition systems evaluated on the ORL database, shows that these are the best recognition results on this database.
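
The feature extraction can be sketched with SciPy: 2D DCT coefficients of small image blocks (the transform JPEG itself uses), keeping only the low-frequency coefficients as observation vectors. Block and window sizes below are illustrative, not necessarily those of the paper.

    import numpy as np
    from scipy.fft import dctn

    def dct_features(image, block=8, keep=3):
        feats = []
        for r in range(0, image.shape[0] - block + 1, block):
            for c in range(0, image.shape[1] - block + 1, block):
                coeffs = dctn(image[r:r + block, c:c + block], norm="ortho")
                feats.append(coeffs[:keep, :keep].ravel())     # low-frequency coefficients only
        return np.array(feats)

    face = np.random.rand(112, 92)                 # ORL-sized face image stand-in
    print(dct_features(face).shape)                # (154, 9): 14 x 11 blocks, 9 coefficients each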

Patent
28 Jan 1999
TL;DR: In this paper, the user can select one or more of a plurality of recognition constraints which temporarily modify the default recognition parameters to decode uncharacteristic and/or special data, enabling the recognition engine to utilize specific information to decode the special data.
Abstract: A data recognition system and method which allows a user to select between a “default recognition” mode and a “constrained recognition” mode via a user interface. In the default recognition mode, a recognition engine utilizes predetermined default recognition parameters to decode data (e.g., handwriting and speech). In the constrained recognition mode, the user can select one or more of a plurality of recognition constraints which temporarily modify the default recognition parameters to decode uncharacteristic and/or special data. The recognition parameters associated with the selected constraint enable the recognition engine to utilize specific information to decode the special data, thereby providing increased recognition accuracy.

Proceedings ArticleDOI
20 Sep 1999
TL;DR: This paper presents an approach to object detection which is based on recent work in statistical models for texture synthesis and recognition, and presents promising results in applying the technique to face detection and car detection.
Abstract: This paper presents an approach to object detection which is based on recent work in statistical models for texture synthesis and recognition. Our method follows the texture recognition work of De Bonet and Viola (1998). We use feature vectors which capture the joint occurrence of local features at multiple resolutions. The distribution of feature vectors for a set of training images of an object class is estimated by clustering the data and then forming a mixture of Gaussian models. The mixture model is further refined by determining which clusters are the most discriminative for the class and retaining only those clusters. After the model is learned, test images are classified by computing the likelihood of their feature vectors with respect to the model. We present promising results in applying our technique to face detection and car detection.
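
The detection-by-likelihood idea in outline: fit a Gaussian mixture to feature vectors from training images of the class, then accept test vectors whose likelihood under the model exceeds a threshold. The features below are random stand-ins for the paper's multi-resolution texture features, and the discriminative cluster-selection refinement is omitted.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    object_features = rng.normal(loc=2.0, size=(500, 16))      # training vectors for the object class
    background = rng.normal(loc=0.0, size=(500, 16))           # clutter vectors (unused for training)

    gmm = GaussianMixture(n_components=5, covariance_type="diag",
                          random_state=0).fit(object_features)

    threshold = np.percentile(gmm.score_samples(object_features), 5)
    test = np.vstack([rng.normal(loc=2.0, size=(5, 16)),        # object-like samples
                      rng.normal(loc=0.0, size=(5, 16))])       # clutter samples
    print(gmm.score_samples(test) > threshold)                  # detections (True) vs rejections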

Proceedings ArticleDOI
23 Jun 1999
TL;DR: An algorithm for constructing object representations suitable for recognition by automatically selecting a representative subset of the views of the object while constructing the eigenspace basis is presented.
Abstract: This paper presents an algorithm for constructing object representations suitable for recognition. The system automatically selects a representative subset of the views of the object while constructing the eigenspace basis. These views are actively located for object identification and pose determination. All processing is performed on-line. The camera is actively positioned during both representation and recognition. When tested with 240 views for each of seven objects, the system achieves 100% accurate object recognition and pose determination. These results are shown to degrade gracefully as conditions deteriorate.
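
A minimal appearance-based eigenspace sketch: training views are projected onto a PCA basis and a new view is recognised by its nearest neighbour in that subspace. The images are synthetic, and the paper's representative-view selection and active camera control are not modelled.

    import numpy as np

    rng = np.random.default_rng(0)
    n_objects, views, dim = 7, 24, 32 * 32
    # One noisy "appearance manifold" per object, simulated around a random mean image.
    means = rng.random((n_objects, dim))
    train = np.vstack([means[i] + 0.05 * rng.standard_normal((views, dim))
                       for i in range(n_objects)])
    labels = np.repeat(np.arange(n_objects), views)

    mean_image = train.mean(axis=0)
    U, S, Vt = np.linalg.svd(train - mean_image, full_matrices=False)
    basis = Vt[:20]                                           # 20-dimensional eigenspace
    coords = (train - mean_image) @ basis.T                   # training views in the eigenspace

    probe = means[3] + 0.05 * rng.standard_normal(dim)        # unseen view of object 3
    probe_coords = (probe - mean_image) @ basis.T
    nearest = labels[np.argmin(np.linalg.norm(coords - probe_coords, axis=1))]
    print("recognised as object", nearest)                    # expected: 3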

Journal ArticleDOI
TL;DR: A real-time 2D object recognition algorithm is proposed in which objects are decomposed in the Fourier domain as linear combinations of a set of representative objects identified by multilevel clustering.

Journal ArticleDOI
TL;DR: A system that represents input shapes by their similarities to several prototypical objects is described, and it is shown that it can recognize new views of the familiar objects, discriminate among views of previously unseen shapes, and attribute the latter to familiar categories.
Abstract: One of the difficulties of object recognition stems from the need to overcome the variability in object appearance caused by pose and other factors, such as illumination. The influence of these factors can be countered by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Difficulties of another kind arise in daily life situations that require categorization, rather than recognition, of objects. Although categorization cannot rely on interpolation between stored examples, we show that knowledge of several representative members, or prototypes, of each of the categories of interest can provide the necessary computational substrate for the categorization of new instances. We describe a system that represents input shapes by their similarities to several prototypical objects, and show that it can recognize new views of the familiar objects, discriminate among views of previously unseen shapes, and attribute the latter to familiar categories.
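
The representation can be sketched directly: encode each input as a vector of similarities to a handful of prototypes and categorise by the most similar prototype's class. The feature vectors and the Gaussian similarity kernel below are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    prototypes = rng.random((5, 50))                     # 5 prototype objects, 50-dim feature vectors
    proto_labels = np.array([0, 0, 1, 1, 2])             # prototypes grouped into 3 categories

    def similarity_vector(x, sigma=1.0):
        d = np.linalg.norm(prototypes - x, axis=1)
        return np.exp(-d ** 2 / (2 * sigma ** 2))        # similarity of x to every prototype

    def categorise(x):
        s = similarity_vector(x)
        return proto_labels[int(np.argmax(s))]           # category of the most similar prototype

    novel = prototypes[2] + 0.1 * rng.standard_normal(50)   # unseen view near prototype 2
    print("assigned category:", categorise(novel))           # expected: 1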

Patent
09 Mar 1999
TL;DR: In this article, a modeless large vocabulary continuous speech recognition system is provided that represents an input utterance as a sequence of input vectors, and the system includes a common library of acoustic model states for arrangement in sequences that form acoustic models.
Abstract: A modeless large vocabulary continuous speech recognition system is provided that represents an input utterance as a sequence of input vectors. The system includes a common library of acoustic model states for arrangement in sequences that form acoustic models. Each acoustic model is composed of a sequence of segment models and each segment model is composed of a sequence of model states. An input processor compares each vector in a sequence of input vectors to a set of model states in the common library to produce a match score for each model state in the set, reflecting the likelihood that a state is represented by a vector. The system also includes a plurality of recognition modules and associated recognition grammars. The recognition modules operate in parallel and use the match scores with the acoustic models to determine at least one recognition result in each of the recognition modules. The recognition modules includes a dictation module for producing at least one probable dictation recognition result, a select module for recognizing a portion of visually displayed text for processing with a command, and a command module for producing at least one probable command recognition result. An arbitrator uses an arbitration algorithm and a score ordered queue of recognition results, together with their associated recognition modules, to compare the recognition results of the recognition modules to select at least one system recognition result.