
Showing papers on "3D single-object recognition published in 2000"


Journal ArticleDOI
TL;DR: This article presents a technique in which the appearances of objects are represented by the joint statistics of local neighborhood operators such as Gaussian derivatives or Gabor filters, representing a new class of appearance-based techniques for computer vision.
Abstract: The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represented by the joint statistics of such local neighborhood operators. As such, this represents a new class of appearance based techniques for computer vision. Based on joint statistics, the paper develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that these techniques can identify over 100 objects in the presence of major occlusions. Most remarkably, the techniques have low complexity and therefore run in real-time.

480 citations
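The joint-statistics representation lends itself to a compact sketch: build a multidimensional histogram of local operator responses and compare histograms between model and scene. The sketch below assumes Gaussian first derivatives as the operators and a chi-square comparison; the paper's exact operator set, dimensionality, and matching rule may differ.

```python
# Sketch of the joint-statistics idea above (assumptions: Gaussian first
# derivatives as the local operators, chi-square histogram comparison).
import numpy as np
from scipy import ndimage

def joint_histogram(image, sigma=2.0, bins=16):
    """Joint histogram of two local operators: Gaussian dx and dy responses."""
    dx = ndimage.gaussian_filter(image, sigma, order=(0, 1))
    dy = ndimage.gaussian_filter(image, sigma, order=(1, 0))
    hist, _, _ = np.histogram2d(dx.ravel(), dy.ravel(), bins=bins)
    return hist / hist.sum()          # normalize to a probability table

def chi_square(p, q, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

# Recognition: the model histogram with the smallest distance wins, e.g.
# best = min(models, key=lambda m: chi_square(joint_histogram(test), m))
```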


Journal ArticleDOI
TL;DR: A technique for performing three-dimensional pattern recognition by use of in-line digital holography, where the complex amplitude distribution generated by a 3D object at an arbitrary plane located in the Fresnel diffraction region is recorded by phase-shifting interferometry.
Abstract: We present a technique for performing three-dimensional (3D) pattern recognition by use of in-line digital holography. The complex amplitude distribution generated by a 3D object at an arbitrary plane located in the Fresnel diffraction region is recorded by phase-shifting interferometry. The digital hologram contains information about the 3D object's shape, location, and orientation. This information allows us to perform 3D pattern-recognition techniques with high discrimination and to measure 3D orientation changes. Experimental results are presented.

394 citations
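For readers unfamiliar with phase-shifting interferometry, the following sketch shows how a complex amplitude can be recovered from four interferograms and matched by correlation. It assumes the standard four-step scheme with reference phases 0, pi/2, pi, and 3pi/2 and a plain matched filter; the paper's actual recording geometry and recognition filter may differ.

```python
# Minimal sketch of the recording/matching pipeline described above,
# assuming standard four-step phase shifting (an assumption, not the
# paper's stated configuration).
import numpy as np

def complex_amplitude(i0, i90, i180, i270):
    """Recover the object's complex amplitude (up to a constant) from four
    interferograms taken with phase-shifted reference beams."""
    return (i0 - i180) + 1j * (i90 - i270)

def correlation_peak(hologram, reference):
    """FFT-based complex matched-filter correlation; the peak height serves
    as a similarity score for 3D pattern recognition."""
    f = np.fft.fft2(hologram)
    g = np.fft.fft2(reference)
    corr = np.fft.ifft2(f * np.conj(g))
    return np.abs(corr).max()
```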


Patent
18 Oct 2000
TL;DR: In this article, an object recognition system including a position sensor, an image sensor and a controller is provided, where the position sensor determines the position of an object, and the image sensor captures an image of the object.
Abstract: An object recognition system including a position sensor, an image sensor and a controller is provided. The position sensor determines the position of an object, and the image sensor captures an image of the object. The controller sets a processing area within the image captured by the image sensor based on the position of the object determined by the position sensor and a predetermined size for the object to be recognized. The controller extracts horizontal edges from the processing area, and identifies horizontal edges belonging to the outline of the object from the extracted edges. Thus, the object can be recognized based on the horizontal edges only. Similarly, the controller can extract vertical edges from the processing area and identify vertical edges belonging to the outline of the object. Preferably, the controller selects upper, lower, left, and right horizontal and vertical candidate ends of the object from the identified horizontal and vertical edges respectively, from which upper, lower, left, and right ends of the object are determined. If either one or both of the left and right horizontal candidate ends cannot be selected, candidate ends can be estimated based on the position of the object recognized in a previous recognition cycle, and the estimated candidate ends can be used in lieu of the candidate ends selected from the horizontal edges.

213 citations
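A hedged sketch of the claimed flow: crop a processing area from the sensed position and an assumed object size, then extract strong horizontal edges inside it. All names and thresholds here are illustrative, not taken from the patent.

```python
# Illustrative sketch of the patent's processing-area plus edge step.
import numpy as np
from scipy import ndimage

def processing_area(image, center_xy, size_px):
    """Crop a window around the position reported by the position sensor."""
    x, y = center_xy
    h = size_px // 2
    return image[max(y - h, 0):y + h, max(x - h, 0):x + h]

def horizontal_edges(roi, thresh=50.0):
    """Horizontal edges respond strongly to a vertical intensity gradient;
    outline rows of the object appear as long horizontal runs."""
    gy = ndimage.sobel(roi.astype(float), axis=0)
    return np.abs(gy) > thresh
```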


Journal ArticleDOI
TL;DR: This paper defines the recognition process as a sequential decision problem whose objective is to disambiguate initial object hypotheses; reinforcement learning provides an efficient method to autonomously develop near-optimal decision strategies in terms of sensorimotor mappings.

155 citations


Proceedings ArticleDOI
01 Jan 2000
TL;DR: The SNoW-based method is shown to outperform other methods in terms of recognition rates; its performance degrades gracefully when the training data contains fewer views and in the presence of occlusion noise.
Abstract: A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. The proposed approach makes no assumptions on the distribution of the observed objects, but quantifies success relative to its past experience. Most importantly, the success of learning an object representation is naturally tied to the ability to represent it as a function of some intermediate representations extracted from the image. We evaluate this approach in a large-scale experimental study in which the SNoW learning architecture is used to learn representations for the 100 objects in the Columbia Object Image Database (COIL-100). The SNoW-based method is shown to outperform other methods in terms of recognition rates; its performance degrades gracefully when the training data contains fewer views and in the presence of occlusion noise.

140 citations
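SNoW (Sparse Network of Winnows) is built from Winnow-style linear units with multiplicative updates over sparse binary features. The toy unit below illustrates that update rule only; it is not the SNoW architecture itself, and the feature extraction from COIL-100 images is assumed to happen elsewhere.

```python
# Toy positive-Winnow unit, a sketch of the learning rule underlying SNoW.
import numpy as np

class WinnowUnit:
    def __init__(self, n_features, alpha=1.5, threshold=None):
        self.w = np.ones(n_features)
        self.alpha = alpha                        # promotion/demotion rate
        self.theta = threshold or n_features / 2  # decision threshold

    def predict(self, active):
        """`active` is the list of indices of features that fired."""
        return self.w[active].sum() >= self.theta

    def update(self, active, label):
        pred = self.predict(active)
        if pred and not label:
            self.w[active] /= self.alpha          # demote on false positive
        elif label and not pred:
            self.w[active] *= self.alpha          # promote on false negative
```

In a SNoW-like setup one would train one such unit per object class and report the class whose unit produces the highest activation.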


Journal ArticleDOI
01 Nov 2000
TL;DR: A general approach to image segmentation and object recognition that can adapt the image segmentation algorithm parameters to changing environmental conditions is presented, and the performance improvement over time is shown.
Abstract: The paper presents a general approach to image segmentation and object recognition that can adapt the image segmentation algorithm parameters to the changing environmental conditions. Segmentation parameters are represented by a team of generalized stochastic learning automata and learned using connectionist reinforcement learning techniques. The edge-border coincidence measure is first used as reinforcement for segmentation evaluation to reduce computational expenses associated with model matching during the early stage of adaptation. This measure alone, however, cannot reliably predict the outcome of object recognition. Therefore, it is used in conjunction with model matching where the matching confidence is used as a reinforcement signal to provide optimal segmentation evaluation in a closed-loop object recognition system. The adaptation alternates between global and local segmentation processes in order to achieve optimal recognition performance. Results are presented for both indoor and outdoor color images where the performance improvement over time is shown for both image segmentation and object recognition.

138 citations
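One way to picture the "team of generalized stochastic learning automata" is a probability vector over candidate parameter values that is nudged toward rewarded choices. The sketch below uses a linear reward-inaction update with the edge-border coincidence (or, later, the matching confidence) as the reward; the paper's exact automaton scheme is not reproduced here.

```python
# Sketch of one learning automaton choosing a segmentation parameter.
# The reward r in [0, 1] stands in for edge-border coincidence or
# matching confidence; the L_RI rule is an assumption.
import numpy as np

class LearningAutomaton:
    def __init__(self, actions, lr=0.1):
        self.actions = actions                     # candidate parameter values
        self.p = np.full(len(actions), 1.0 / len(actions))
        self.lr = lr

    def choose(self):
        self.last = np.random.choice(len(self.actions), p=self.p)
        return self.actions[self.last]

    def reinforce(self, r):
        """Linear reward-inaction: shift probability mass toward the chosen
        action in proportion to the reward r."""
        e = np.zeros_like(self.p)
        e[self.last] = 1.0
        self.p += self.lr * r * (e - self.p)       # stays a distribution
```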


Patent
16 Feb 2000
TL;DR: In this article, a system and method for referencing object instances of an application program and invoking methods on those object instances from within a recognition grammar is presented, where a mapping is maintained between at least one string formed using characters in the character set of the recognition grammar and instances of objects in the application program.
Abstract: A system and method for referencing object instances of an application program, and invoking methods on those object instances from within a recognition grammar. A mapping is maintained between at least one string formed using characters in the character set of the recognition grammar and instances of objects in the application program. During operation of the disclosed system, when either the application program or script within a recognition grammar creates an application object instance, a reference to the object instance is added to the mapping table, together with an associated unique string. The unique string may then be used within scripting language in tags of the rule grammar, in order to refer to the object instance that has been “registered” by the application program in this way. A tags parser program may be used to interpret such object instance names while interpreting the scripting language contained in tags included in a recognition result object. The tags parser program calls the methods on such object instances directly, eliminating the need for logic in the application program to make such calls in response to the result tag information.

134 citations
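The core of the patent is a mapping table between grammar strings and live object instances. The sketch below is an illustrative Python analogue; the class and method names are hypothetical.

```python
# Illustrative registry mirroring the patent's mapping between grammar
# strings and application object instances.
class ObjectRegistry:
    def __init__(self):
        self._table = {}

    def register(self, name, instance):
        """Called when the application (or grammar script) creates an object."""
        self._table[name] = instance

    def invoke(self, name, method, *args):
        """Called by the tags parser to run a method on a registered object,
        bypassing application-side dispatch logic."""
        return getattr(self._table[name], method)(*args)

# registry.register("cart", ShoppingCart())        # application side
# registry.invoke("cart", "add_item", "apple")     # from a rule-grammar tag
```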


Patent
02 Mar 2000
TL;DR: In this article, a method for obtaining an object image of at least one object (40) was proposed, whereby at least two partial images of the object were recorded under different conditions for each image.
Abstract: The invention relates to a method for obtaining an object image of at least one object (40), whereby at least two partial images of the object (40) are recorded under different conditions for each image. Said conditions take the form of spatial patterns on the object, whereby for each point on the object there is a non-linear dependency of the light detected from the direction of said object point on the conditions existing at that point, and whereby the partial images contain varying amounts of different spatial frequency components of the object structure. The desired object image is determined from the partial images by reconstructing the contributions of these spatial frequency components. The invention also describes optical systems (100) for implementing such a method.

126 citations


Patent
17 Mar 2000
TL;DR: In this paper, a 3D object recognition method based on a stereo image of the object is proposed. But the method is not suitable for the detection of objects in the 3D space.
Abstract: A 3-dimensional object recognition method, by use of which three-dimensional position and posture of an object can be accurately recognized at high speed, comprises the steps of (A) taking a pair of first and second images for making a stereo image of the object; (B) detecting a two-dimensional feature of the object on each of the first and second images; (C) evaluating a degree of reliability of the result of the step (B) by comparing with a model data of the object; (D) making a correspondence of the two-dimensional feature between the first and second images according to a stereoscopic measurement principle; (E) evaluating a degree of reliability of the result of the step (D) by comparing the two-dimensional feature detected on the first image with the corresponding two-dimensional feature detected on the second image; (F) recognizing the three-dimensional position and posture of the object according to information in three dimensions of the two-dimensional feature obtained by the correspondence; and (G) evaluating a degree of reliability of the recognized three-dimensional position and posture. It is preferred to use the 3-dimensional object recognition method to a bin-picking system for picking up an article from a bin, in which a plurality of articles are heaped up in confusion, and carrying the picked-up article to a required position.

109 citations
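Steps (D) through (F) rest on the stereoscopic measurement principle. A minimal triangulation sketch for a rectified camera pair follows; the patent's actual geometry, calibration, and reliability evaluations are considerably richer.

```python
# Triangulation sketch for a rectified stereo pair with focal length f
# (pixels) and baseline b (meters); x, y are pixel coordinates relative
# to the principal point.
def triangulate(x_left, x_right, y, f, b):
    """Depth and 3D position of one matched 2D feature."""
    d = x_left - x_right            # disparity in pixels (must be > 0)
    z = f * b / d                   # depth from the stereoscopic principle
    return (x_left * z / f, y * z / f, z)
```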


Journal ArticleDOI
TL;DR: This work describes how to model the appearance of a 3-D object using multiple views, learn such a model from training images, and use the model for object recognition, and demonstrates that OLIVER is capable of learning to recognize complex objects in cluttered images, while acquiring models that represent those objects using relatively few views.
Abstract: We describe how to model the appearance of a 3-D object using multiple views, learn such a model from training images, and use the model for object recognition. The model uses probability distributions to describe the range of possible variation in the object's appearance. These distributions are organized on two levels. Large variations are handled by partitioning training images into clusters corresponding to distinctly different views of the object. Within each cluster, smaller variations are represented by distributions characterizing uncertainty in the presence, position, and measurements of various discrete features of appearance. Many types of features are used, ranging in abstraction from edge segments to perceptual groupings and regions. A matching procedure uses the feature uncertainty information to guide the search for a match between model and image. Hypothesized feature pairings are used to estimate a viewpoint transformation taking account of feature uncertainty. These methods have been implemented in an object recognition system, OLIVER. Experiments show that OLIVER is capable of learning to recognize complex objects in cluttered images, while acquiring models that represent those objects using relatively few views.

108 citations


Patent
Toshihiko Suzuki, Takeo Kanade
22 Mar 2000
TL;DR: In this paper, an Extended Kalman Filter is used both for the determination of motion and pose and for object recognition with a single camera in motion, where the optical flow parameters obtained from the photographed images are converted into physical parameters in three-dimensional space.
Abstract: Images are captured using a single camera in motion. A recognition process section detects a possible object in a photographed image, tracks the possible object within the moving image, and generates object shape information from the trajectory information. A motion and pose determination section determines camera motion and pose using the photographed images for recognition processing. The determined data are used for object recognition along with the tracking information. The motion and pose determination section converts the optical flow parameters obtained from the photographed images into physical parameters in three-dimensional space. An Extended Kalman Filter is used for both the determination of motion and pose and for object recognition.

Journal ArticleDOI
TL;DR: The present study assessed recognition across depth rotations of a single general class of novel objects in three contexts that varied in difficulty, suggesting differences in the geometry of stimulus objects lie at the heart of previously discrepant findings.
Abstract: In an attempt to reconcile results of previous studies, several theorists have suggested that object recognition performance should range from viewpoint invariant to highly viewpoint dependent depending on how easy it is to differentiate the objects in a given recognition situation. The present study assessed recognition across depth rotations of a single general class of novel objects in three contexts that varied in difficulty. In an initial experiment, recognition in the context involving the most discriminable object differences was viewpoint invariant, but recognition in the least discriminable context and recognition in the intermediate context were equally viewpoint dependent. In a second experiment, utilizing gray-scale versions of the same stimuli, almost identical viewpoint-cost functions were obtained in all three contexts. These results suggest that differences in the geometry of stimulus objects, rather than task difficulty, lie at the heart of previously discrepant findings.

Book ChapterDOI
TL;DR: This verification procedure provides a model for the serial process of attention in human vision that integrates features belonging to a single object, and experiments show that this approach can achieve rapid and robust object recognition in cluttered, partially occluded images.
Abstract: There is considerable evidence that object recognition in primates is based on the detection of local image features of intermediate complexity that are largely invariant to imaging transformations. A computer vision system has been developed that performs object recognition using features with similar properties. Invariance to image translation, scale and rotation is achieved by first selecting stable key points in scale space and performing feature detection only at these locations. The features measure local image gradients in a manner modeled on the response of complex cells in primary visual cortex, and thereby obtain partial invariance to illumination, affine change, and other local distortions. The features are used as input to a nearest-neighbor indexing method and Hough transform that identify candidate object matches. Final verification of each match is achieved by finding a best-fit solution for the unknown model parameters and integrating the features consistent with these parameter values. This verification procedure provides a model for the serial process of attention in human vision that integrates features belonging to a single object. Experimental results show that this approach can achieve rapid and robust object recognition in cluttered partially-occluded images.
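The pipeline described here closely matches what was later published as SIFT. The sketch below uses OpenCV's SIFT implementation (an assumption; the chapter predates that library) with nearest-neighbor matching and a ratio test, omitting the Hough pose clustering and final model-fit verification stages.

```python
# Keypoint matching sketch in the spirit of the chapter, via OpenCV.
import cv2

def match_keypoints(img_model, img_scene, ratio=0.75):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img_model, None)
    k2, d2 = sift.detectAndCompute(img_scene, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Nearest-neighbor indexing with a ratio test to reject ambiguous
    # matches before pose clustering / verification.
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
            if m.distance < ratio * n.distance]
    return k1, k2, good
```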

Proceedings ArticleDOI
13 Jun 2000
TL;DR: A framework for tracking rigid objects based on an adaptive Bayesian recognition technique that incorporates dependencies between object features and forms a natural feedback loop between the recognition method and the filter that helps to explain robustness.
Abstract: We present a framework for tracking rigid objects based on an adaptive Bayesian recognition technique that incorporates dependencies between object features. At each frame we find a maximum a posteriori (MAP) estimate of the object parameters, which include the positioning and configuration of non-occluded features. This estimate may be rejected based on its quality. Our careful selection of data points in each frame allows temporal fusion via Kalman filtering. Despite the "unimodality" of our tracking scheme, we demonstrate fairly robust results in highly cluttered aerial scenes. Our technique forms a natural feedback loop between the recognition method and the filter that helps to explain such robustness. We study this loop and derive a number of interesting properties. First, the effective threshold for recognition in each frame is adaptive: it depends on the current level of noise in the system. This allows the system to identify partially occluded or distorted objects as long as the predicted locations are accurate, but requires a very good match if there is uncertainty as to the object location. Second, the search area for the recognition method is automatically pruned based on the current system uncertainty, yielding an efficient overall method.
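The adaptive threshold can be made concrete as a Mahalanobis validation gate: the acceptance region scales with the Kalman innovation covariance, so a confident filter tolerates weaker matches while an uncertain one demands strong ones. The sketch below uses standard Kalman notation and an assumed chi-square gate; it is not the authors' exact criterion.

```python
# Validation-gate sketch: accept a candidate measurement z only if its
# Mahalanobis distance under the innovation covariance is small enough.
import numpy as np

def gate(z, z_pred, H, P, R, chi2_thresh=9.21):   # 99% gate for 2 DOF
    S = H @ P @ H.T + R                           # innovation covariance
    nu = z - z_pred                               # innovation
    d2 = nu @ np.linalg.inv(S) @ nu               # squared Mahalanobis dist.
    return d2 <= chi2_thresh                      # accept/reject the match
```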

Proceedings ArticleDOI
01 Sep 2000
TL;DR: An extended tangent distance is incorporated in a kernel density based Bayesian classifier to compensate for affine image variations and an image distortion model for local variations is introduced.
Abstract: Invariance is an important aspect in image object recognition. We present results obtained with an extended tangent distance incorporated in a kernel density based Bayesian classifier to compensate for affine image variations. An image distortion model for local variations is introduced and its relationship to tangent distance is considered. The proposed classification algorithms are evaluated on databases of different domains. An excellent result of 2.2% error rate on the original USPS handwritten digits recognition task is obtained. On a database of radiographs from daily routine, best results are obtained by combining the tangent distance and the proposed distortion model.
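Tangent distance itself is easy to state: measure the distance from one pattern to another after optimally sliding the first along its tangent vectors (small rotations, translations, and so on). The one-sided sketch below is a simplification; the paper's extended variant, distortion model, and kernel densities are omitted.

```python
# One-sided tangent distance sketch: distance from pattern x to pattern y
# after the best move of x within its tangent subspace.
import numpy as np

def tangent_distance(x, y, T):
    """T: (d, k) matrix whose columns are tangent vectors at x."""
    a, *_ = np.linalg.lstsq(T, y - x, rcond=None)  # optimal coefficients
    return np.linalg.norm(x + T @ a - y)           # residual distance
```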

Journal ArticleDOI
TL;DR: Results suggest that face identification involves a coordinate shape representation in which the precise locations of visual primitives are specified, whereas basic-level object recognition uses categorically coded relations.
Abstract: The purpose of this investigation was to determine if the relations among the primitives used in face identification and in basic-level object recognition are represented using coordinate or categorical relations. In 2 experiments the authors used photographs of famous people's faces as stimuli in which each face had been altered to have either 1 of its eyes moved up from its normal position or both of its eyes moved up. Participants performed either a face identification task or a basic-level object recognition task with these stimuli. In the face identification task, 1-eye-moved faces were easier to recognize than 2-eyes-moved faces, whereas the basic-level object recognition task showed the opposite pattern of results. Results suggest that face identification involves a coordinate shape representation in which the precise locations of visual primitives are specified, whereas basic-level object recognition uses categorically coded relations.

Journal ArticleDOI
TL;DR: The object recognition system named RIO (relational indexing of objects), which contains a number of new techniques, is able to recognize 3D objects having planar, cylindrical, and threaded surfaces in complex, multiobject scenes.

Journal ArticleDOI
01 Feb 2000
TL;DR: A method for object recognition that is invariant under translation, rotation, and scaling is addressed, in which vectors obtained in a pre-processing step are used as inputs to a holographic nearest-neighbor (HNN) algorithm.
Abstract: A method for object recognition that is invariant under translation, rotation and scaling is addressed. The first step of the method (pre-processing) takes into account the invariant properties of the normalized moment of inertia and a novel coding that extracts topological object characteristics. The second step (recognition) is achieved by using a holographic nearest-neighbor (HNN) algorithm, in which vectors obtained in the pre-processing step are used as inputs. The algorithm is tested in character recognition, using the 26 upper-case letters of the alphabet. Only four different orientations and one size (for each letter) were used for training. Recognition was tested with 17 different sizes and 14 rotations. The results are encouraging, since we achieved 98% correct recognition. Tolerance to boundary deformations and random noise was tested. Results for character recognition in "real" images of car plates are presented as well.
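The normalized moment of inertia used in the pre-processing step can be computed directly from a binary object mask, as in the sketch below. The paper's topological coding and the HNN classifier are not reproduced; treat this as an illustration of the invariance argument only.

```python
# Normalized moment of inertia of a binary shape: unchanged by translation
# and rotation, and (approximately) by scale.
import numpy as np

def normalized_moment_of_inertia(mask):
    ys, xs = np.nonzero(mask)                 # object pixel coordinates
    cy, cx = ys.mean(), xs.mean()             # centroid -> translation inv.
    r2 = (ys - cy) ** 2 + (xs - cx) ** 2      # radial -> rotation inv.
    n = len(ys)
    return r2.sum() / n ** 2                  # /n^2 -> approx. scale inv.
```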

Journal ArticleDOI
TL;DR: The proposed method considers data distortion factors such as uncertainty, occlusion, and clutter, in addition to model similarity, unlike previous approaches, which consider only a subset of these factors.
Abstract: We present a method for predicting fundamental performance of object recognition. We assume that both scene data and model objects are represented by 2D point features and a data/model match is evaluated using a vote-based criterion. The proposed method considers data distortion factors such as uncertainty, occlusion, and clutter, in addition to model similarity. This is unlike previous approaches, which consider only a subset of these factors. Performance is predicted in two stages. In the first stage, the similarity between every pair of model objects is captured by comparing their structures as a function of the relative transformation between them. In the second stage, the similarity information is used along with statistical models of the data-distortion factors to determine an upper bound on the probability of recognition error. This bound is directly used to determine a lower bound on the probability of correct recognition. The validity of the method is experimentally demonstrated using real synthetic aperture radar (SAR) data.

Patent
31 Oct 2000
TL;DR: A gesture recognition process includes tracking an object in two frames of video and determining differences between the location of the object in one frame and its location in another frame, as discussed by the authors.
Abstract: A gesture recognition process includes tracking an object in two frames of video, determining differences between a location of the object in one frame of the video and a location of the object in another frame of the video, obtaining a direction of motion of the object based on the differences, and recognizing a gesture of the object based, at least in part, on the direction of motion of the object.
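Read literally, the claim reduces to differencing two tracked locations and quantizing the motion direction. The sketch below does exactly that with a four-way quantization; the labels, threshold, and quantization are illustrative rather than taken from the patent.

```python
# Direction-of-motion sketch for the tracked object (image coordinates:
# y grows downward, so angle 90 degrees means "down").
import math

def recognize_gesture(loc_prev, loc_curr, min_motion=5.0):
    dx = loc_curr[0] - loc_prev[0]
    dy = loc_curr[1] - loc_prev[1]
    if math.hypot(dx, dy) < min_motion:
        return "none"                          # too little motion to call
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return ["right", "down", "left", "up"][int((angle + 45) // 90) % 4]
```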

Journal ArticleDOI
TL;DR: A Bayesian realization of the proposed methodology for recognition of targets in second-generation forward-looking infrared (FLIR) images is presented, in which the expert modules represent the probability density functions of each part, modeled as a mixture of densities to incorporate different views (aspects) of each part.


Journal ArticleDOI
01 Jan 2000
TL;DR: A new online recognition scheme based on next-view planning is presented for the identification of an isolated 3D object using simple features, built on a probabilistic reasoning framework for recognition and planning.
Abstract: In many cases, a single view of an object may not contain sufficient features to recognize it unambiguously. This paper presents a new online recognition scheme based on next view planning for the identification of an isolated 3D object using simple features. The scheme uses a probabilistic reasoning framework for recognition and planning. Our knowledge representation scheme encodes feature based information about objects as well as the uncertainty in the recognition process. This is used both in the probability calculations as well as in planning the next view. Results clearly demonstrate the effectiveness of our strategy for a reasonably complex experimental set.

Proceedings ArticleDOI
01 Sep 2000
TL;DR: Experimental results show the effectiveness of the proposed method for object recognition using appearance models accumulated in an RFID (radio frequency identification) tag attached to the environment.
Abstract: Proposes a method of object recognition using appearance models accumulated in an RFID (radio frequency identification) tag attached to the environment. Robots recognize the object using appearance models accumulated in the tag on the object. If a robot fails in recognition, it acquires a model of the object and accumulates it in the tag. Since robots in the environment observe the object from different points of view at different times, various appearance models are accumulated as time passes. In order to accumulate many models, eigenspace analysis is applied; the eigenspace is reconstructed every time a robot acquires a new model. Experimental results on object recognition show the effectiveness of the proposed method.
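The accumulation step can be sketched as: append the newly acquired appearance vector to those read from the tag, then rebuild the eigenspace by PCA. The storage format and dimensionality below are assumptions.

```python
# Eigenspace reconstruction sketch for the tag-accumulated models.
import numpy as np

def rebuild_eigenspace(stored_views, new_view, k=8):
    """stored_views: (n, d) array read from the tag; new_view: (d,) vector."""
    X = np.vstack([stored_views, new_view])
    mean = X.mean(axis=0)
    # SVD of the centered data yields the top-k eigenvectors of the
    # covariance, i.e. the reconstructed eigenspace.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return X, mean, vt[:k]           # write X back to the tag; match new
                                     # observations in the span of vt[:k]
```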

Proceedings ArticleDOI
01 Sep 2000
TL;DR: A recognition system that classifies four kinds of human interactions: shaking hands, pointing at the opposite person, standing hand-in-hand, and an intermediate/transitional state between them with no parsing procedure for sequential data is presented.
Abstract: This paper presents a recognition system that classifies four kinds of human interactions: shaking hands, pointing at the opposite person, standing hand-in-hand, and an intermediate/transitional state between them. Our system achieves recognition by applying the K-nearest neighbor classifier to the parametric human-interaction model, which describes the interpersonal configuration with multiple features from gray scale images (i.e., binary blob, silhouette contour, and intensity distribution). Unlike the algorithms that use temporal information about motion, our system independently classifies each frame by estimating the relative poses of the interacting persons. The system provides a tool to detect the initiation and the termination of an interaction with no parsing procedure for sequential data. Experimental results are presented and illustrated.

Proceedings ArticleDOI
24 Apr 2000
TL;DR: A robot system for navigation in unknown environments is presented, in which the robot navigates itself to the room designated by a room number and utilizes an environment model for the efficient recognition of objects and the estimation of their positions.
Abstract: Navigation in unknown environments requires the robot to obtain the destination positions without a map. The utilization of model-based object recognition would be a solution, where the robot can estimate the destination positions from geometric relationships between the recognized objects and the robot. This paper presents a robot system for this kind of navigation, in which the robot navigates itself to the room designated by room number. The robot has an environment model including a corridor and a door with a room number plate, and utilizes the model for the efficient recognition of the objects and the estimation of their positions.

Journal ArticleDOI
TL;DR: A hybrid algorithm for coarse-to-fine matching of affine-invariant object features and B-spline object curves, and simultaneous estimation of transformation parameters is presented.

Book
01 Jan 2000
TL;DR: This book presents a selection of papers that summarise the main research activities of Spanish research centres in the fields of Pattern Recognition and Image Analysis, as well as in their applications.
Abstract: This book deals with novel scientific and technology research in Pattern Recognition and Applications. It presents a selection of papers that summarises the main research activities in these areas developed in Spanish research centres. It includes thirty-one works organized into four categories reflecting the present areas of interest in the Spanish Pattern Recognition Community:

Pattern Recognition: this section includes new approaches related to classical pattern classification problems and methodologies, such as the multi-edit algorithm, gradient-descent methods, hierarchical clustering, the nearest-neighbours rule, tree language compression, and function-described graphs.

Computer Vision: this section presents new methods in colour segmentation, visual tracking, alignment in 3D reconstruction, trademark search techniques, visual behaviours for binocular navigation, and active vision systems.

Speech Recognition and Translation: this section consists of five papers related to continuous speech recognition and statistical translation. They include new proposals in acoustic and language models, based on connectionist and syntactic pattern recognition approaches.

Applications in Computer Vision, Speech Recognition and Translation: this section deals with digital TV, biomedical images, mammography, trabecular bone patterns, and new calibration methods for large-surface topography.

These papers are a good summary of Spanish research in the fields of Pattern Recognition and Image Analysis, as well as in their applications.

Proceedings ArticleDOI
03 Sep 2000
TL;DR: This paper investigates how the recognition rate is affected by the eventual contractivity factor, which is an indicator of guaranteed convergence after more than one iteration of the fractal code, and presents a novel method for calculating the eventual contractivity factor for a general class of fractal codes.
Abstract: Fractal image coding has recently been used to perform object recognition, in particular human face recognition. It was shown that the transformations resulting from fractal image coding have invariant properties that can be exploited for recognition. Furthermore, the contractivity factor of a fractal code, which can be used to determine convergence using one code iteration, has a direct effect on the recognition rate. This paper investigates how this rate is affected by the eventual contractivity factor, which is an indicator of guaranteed convergence after more than one iteration of the fractal code. We demonstrate that, by ensuring eventual convergence while permitting the contractivity factor to take values larger than one, the recognition rates can be improved. Experiments were performed on the ORL face database and an improved error rate of 1.1% was obtained. We also present a novel method for calculating the eventual contractivity factor for a general class of fractal codes.
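Eventual contractivity is the observation that a transform whose single-step norm is at least one may still converge if some iterate is a contraction. The toy check below tests iterates of a linear map; the paper's method for general fractal codes is more involved.

```python
# Toy eventual-contractivity check: a map with linear part A and
# ||A|| >= 1 still converges if ||A^n|| < 1 for some iterate n.
import numpy as np

def eventual_contractivity(A, max_iter=20):
    """Return the smallest n with ||A^n|| < 1 and the per-step eventual
    factor ||A^n||^(1/n), or None if no such n is found."""
    An = np.eye(A.shape[0])
    for n in range(1, max_iter + 1):
        An = An @ A
        s = np.linalg.norm(An, 2)        # spectral norm of the n-th iterate
        if s < 1.0:
            return n, s ** (1.0 / n)
    return None
```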

Proceedings ArticleDOI
01 Jan 2000
TL;DR: A new object recognition method, the Invariant Pixel Set Signature (IPSS), is introduced; robustness to occlusion is shown using images with one half covered, finding that for a small change of viewpoint, recognition of the occluded object is perfect.
Abstract: A new object recognition method, the Invariant Pixel Set Signature (IPSS), is introduced. Objects are represented with a probability density on the space of invariants computed from measurements (pixel values) inside convex hulls of n-tuples of interest points. Experimentally, the method is tested on COIL-20, a publicly available database of 72 views each of 20 natural objects rotating on a turntable. With a model built from a single view, recognition performance measured by the average match percentile remains high over substantial ranges of viewpoint rotation. For some objects, 100% first-rank recognition is achieved for all 72 views. Robustness to occlusion is shown using images with one half covered; for a small change of viewpoint, recognition of the occluded object is perfect.