
Showing papers on "3D single-object recognition" published in 1998


Book ChapterDOI
25 Dec 1998
TL;DR: An eigenspace manifold for the representation and recognition of pose-varying faces is described and a framework is proposed which can be used for both familiar and unfamiliar face recognition.
Abstract: We describe an eigenspace manifold for the representation and recognition of pose-varying faces. The distribution of faces in this manifold allows us to determine theoretical recognition characteristics which are then verified experimentally. Using this manifold a framework is proposed which can be used for both familiar and unfamiliar face recognition. A simple implementation demonstrates the pose dependent nature of the system over the transition from unfamiliar to familiar face recognition. Furthermore we show that multiple test images, whether real or virtual, can be used to augment the recognition process. The results compare favourably with reported human face recognition experiments. Finally, we describe how this framework can be used as a mechanism for characterising faces from video for general purpose recognition.

637 citations


Proceedings ArticleDOI
23 Jun 1998
TL;DR: An algorithm for object recognition is described that explicitly models and estimates the posterior probability function, P(object/image), in closed form; the chosen functional form captures the joint statistics of local appearance and position on the object as well as the statistics of local appearance in the visual world at large.
Abstract: In this paper, we describe an algorithm for object recognition that explicitly models and estimates the posterior probability function, P(object/image). We have chosen a functional form of the posterior probability function that captures the joint statistics of local appearance and position on the object as well as the statistics of local appearance in the visual world at large. We use a discrete representation of local appearance consisting of approximately 10^6 patterns. We compute an estimate of P(object/image) in closed form by counting the frequency of occurrence of these patterns over various sets of training images. We have used this method for detecting human faces from frontal and profile views. The algorithm for frontal views has shown a detection rate of 93.0% with 88 false alarms on a set of 125 images containing 483 faces combining the MIT test set of Sung and Poggio with the CMU test sets of Rowley, Baluja, and Kanade. The algorithm for detection of profile views has also demonstrated promising results.

435 citations
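
The counting-based estimate described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name `log_ratio_score`, the additive smoothing, the toy counts, and the naive product over patches are all assumptions made for illustration. It scores a set of (pattern, position) observations by comparing smoothed frequency estimates from object training data against pattern frequencies from generic images.

```python
import math
from collections import Counter


def log_ratio_score(observations, object_counts, world_counts,
                    n_patterns, n_positions, smoothing=1.0):
    """Sum of log [P(pattern, position | object) / P(pattern | world)].

    Both probabilities are estimated by counting, with additive smoothing
    (an assumption of this sketch) so unseen patterns do not give log(0).
    """
    object_total = sum(object_counts.values())
    world_total = sum(world_counts.values())
    score = 0.0
    for pattern, position in observations:
        p_object = (object_counts[(pattern, position)] + smoothing) / (
            object_total + smoothing * n_patterns * n_positions)
        p_world = (world_counts[pattern] + smoothing) / (
            world_total + smoothing * n_patterns)
        score += math.log(p_object / p_world)
    return score


# Hypothetical toy counts: patterns 0 and 1 are typical of the object at
# positions 0 and 1; pattern 2 is common in the world at large.
object_counts = Counter({(0, 0): 8, (1, 1): 8, (2, 0): 1, (2, 1): 1})
world_counts = Counter({0: 5, 1: 5, 2: 40})

object_like = log_ratio_score([(0, 0), (1, 1)], object_counts, world_counts, 3, 2)
clutter_like = log_ratio_score([(2, 0), (2, 1)], object_counts, world_counts, 3, 2)
```

A positive score favours the object hypothesis and a negative score favours background; a detector built this way would compare the score against a threshold.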


Proceedings ArticleDOI
12 May 1998
TL;DR: A new method based on the extraction of 2D-DCT feature vectors is described, and the recognition results are compared with other face recognition approaches.
Abstract: The work presented in this paper focuses on the use of hidden Markov models for face recognition. A new method based on the extraction of 2D-DCT feature vectors is described, and the recognition results are compared with other face recognition approaches. The method introduced significantly reduces the computational complexity of previous HMM-based face recognition systems, while preserving the same recognition rate.

341 citations
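
As a rough illustration of what a 2D-DCT feature vector is, the sketch below computes an orthonormal 2D DCT-II of a small image block and keeps the low-frequency coefficients. The block size, the number of retained coefficients, and the names `dct2` and `low_frequency_features` are assumptions; the abstract does not specify how the paper samples blocks or feeds them to the HMM.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a rectangular block (list of rows)."""
    rows, cols = len(block), len(block[0])

    def scale(k, n):
        return math.sqrt((1 if k == 0 else 2) / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for y in range(rows):
                for x in range(cols):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * rows))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * cols)))
            out[u][v] = scale(u, rows) * scale(v, cols) * total
    return out


def low_frequency_features(block, size):
    """Keep the top-left size x size coefficients as a feature vector."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(size) for v in range(size)]


# A constant block has all of its energy in the DC coefficient.
flat_block = [[1.0] * 4 for _ in range(4)]
features = low_frequency_features(flat_block, 2)
```

Keeping only a few low-frequency coefficients per block shortens each observation vector, which is one plausible route to the reduced computational cost the abstract mentions.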


Journal ArticleDOI
TL;DR: The most viable model of object recognition may be one that incorporates the most appealing aspects of both image-based and structural description theories, and this approach holds great promise, but has potential pitfalls that may be best overcome by including structural information.

267 citations


Journal ArticleDOI
TL;DR: Experimental evidence is presented that observers' recognition of familiar dynamic three-dimensional objects is unaffected even when the objects' depth structure is scrambled, as long as their two-dimensional projections are unchanged.
Abstract: The interaction between depth perception and object recognition has important implications for the nature of mental object representations and models of hierarchical organization of visual processing. It is often believed that the computation of depth influences subsequent high-level object recognition processes, and that depth processing is an early vision task that is largely immune to 'top-down' object-specific influences, such as object recognition. Here we present experimental evidence that challenges both these assumptions in the specific context of stereoscopic depth-perception. We have found that observers' recognition of familiar dynamic three-dimensional (3D) objects is unaffected even when the objects' depth structure is scrambled, as long as their two-dimensional (2D) projections are unchanged. Furthermore, the observers seem perceptually unaware of the depth anomalies introduced by scrambling. We attribute the latter result to a top-down recognition-based influence whereby expectations about a familiar object's 3D structure override the true stereoscopic information.

175 citations


Proceedings ArticleDOI
23 Jun 1998
TL;DR: In this article, a 3D shape-based object recognition system for simultaneous recognition of multiple objects in scenes containing clutter and occlusion is presented, which is based on matching surfaces by matching points using the spin-image representation.
Abstract: We present a 3-D shape-based object recognition system for simultaneous recognition of multiple objects in scenes containing clutter and occlusion. Recognition is based on matching surfaces by matching points using the spin-image representation. The spin-image is a data level shape descriptor that is used to match surfaces represented as surface meshes. We present a compression scheme for spin-images that results in efficient multiple object recognition which we verify with results showing the simultaneous recognition of multiple objects from a library of 20 models. Furthermore, we demonstrate the robust performance of recognition in the presence of clutter and occlusion through analysis of recognition trials on 100 scenes.

142 citations


Proceedings ArticleDOI
04 Jan 1998
TL;DR: An appearance-based object recognition system using a keyed, multi-level context representation reminiscent of certain aspects of cubist art demonstrates good recognition of a variety of 3-D shapes, ranging from sports cars and fighter planes to snakes and lizards, with full orthographic invariance.
Abstract: We describe an appearance-based object recognition system using a keyed, multi-level context representation reminiscent of certain aspects of cubist art. Specifically, we utilize distinctive intermediate-level features, in this case automatically extracted 2-D boundary fragments, as keys, which are then verified within a local context, and assembled within a loose global context to evoke an overall percept. This system demonstrates good recognition of a variety of 3-D shapes, ranging from sports cars and fighter planes to snakes and lizards, with full orthographic invariance. We report the results of large-scale tests, involving over 2000 separate test images, that evaluate performance with increasing number of items in the database, in the presence of clutter, background change, and occlusion, and also the results of some generic classification experiments where the system is tested on objects never previously seen or modeled. To our knowledge, the results we report are the best in the literature for full-sphere tests of general shapes with occlusion and clutter resistance.

117 citations


Journal ArticleDOI
TL;DR: In this article, the confidence level of model matching is used as a reinforcement signal for a team of learning automata to search for segmentation parameters during training, which gives rise to significant improvement of the system performance by automatic generation of recognition strategies.
Abstract: Current computer vision systems whose basic methodology is open-loop or filter type typically use image segmentation followed by object recognition algorithms. These systems are not robust for most real-world applications. In contrast, the system presented here achieves robust performance by using reinforcement learning to induce a mapping from input images to corresponding segmentation parameters. This is accomplished by using the confidence level of model matching as a reinforcement signal for a team of learning automata to search for segmentation parameters during training. The use of the recognition algorithm as part of the evaluation function for image segmentation gives rise to significant improvement of the system performance by automatic generation of recognition strategies. The system is verified through experiments on sequences of indoor and outdoor color images with varying external conditions.

113 citations


Journal ArticleDOI
TL;DR: It is shown that, in agreement with psychophysical evidence, the view-combination approach can use views of different class members rather than multiple views of a single object, to obtain class-based generalization.

106 citations


Journal ArticleDOI
TL;DR: Evidence is presented that such spatiotemporal signatures are used in object recognition, and it is suggested that novel, three-dimensional, rotating objects are learned from image sequences in a continuous recognition task.

98 citations


Proceedings ArticleDOI
04 Jan 1998
TL;DR: An active object recognition algorithm is developed which is able to resolve ambiguities inherent in a single-view recognition algorithm and provides a means to quantitatively evaluate the contribution of individual receptive field vectors.
Abstract: This article develops an analogy between object recognition and the transmission of information through a channel based on the statistical representation of the appearances of 3D objects. This analogy provides a means to quantitatively evaluate the contribution of individual receptive field vectors, and to predict the performance of the object recognition process. Transinformation also provides a quantitative measure of the discrimination provided by each viewpoint, thus permitting the determination of the most discriminant viewpoints. As an application, the article develops an active object recognition algorithm which is able to resolve ambiguities inherent in a single-view recognition algorithm.
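
To make the transinformation idea concrete, the sketch below computes the mutual information between object identity and a discretised observation for each viewpoint and picks the most discriminant one. The joint probability tables, the viewpoint labels, and the function name `transinformation` are hypothetical; the abstract does not describe how the paper estimates these distributions from receptive field vectors.

```python
import math


def transinformation(joint):
    """Mutual information in bits; joint[o][f] = P(object=o, feature=f)."""
    p_object = [sum(row) for row in joint]
    p_feature = [sum(col) for col in zip(*joint)]
    total = 0.0
    for o, row in enumerate(joint):
        for f, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (p_object[o] * p_feature[f]))
    return total


# Hypothetical viewpoints for two equally likely objects.
views = {
    "front": [[0.25, 0.25], [0.25, 0.25]],  # both objects look the same
    "side": [[0.5, 0.0], [0.0, 0.5]],       # observation identifies the object
}
scores = {name: transinformation(joint) for name, joint in views.items()}
best_view = max(scores, key=scores.get)
```

An active recognition loop could move the sensor to `best_view` whenever the current view leaves the object identity ambiguous.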

Patent
06 Feb 1998
TL;DR: In this paper, a method of recognizing graphical objects by subjecting the graphical information gathered through "spying" to a series of rules by which the object becomes understood or recognized as an instance of a standard logical object is presented.
Abstract: A method of recognizing graphical objects by subjecting the graphical information gathered through "spying" to a series of rules by which the object becomes understood or recognized as an instance of a standard logical object. Before the rules are applied, graphical objects are first interpreted as primitives including groups of text, lines and images. In order to recognize a graphical object as a logical object, the graphical information is subjected to the rules in an iterative process whereby an understanding of the object is continually refined. As the rules are applied, the results are evaluated to determine whether the graphical object can be "mapped" to a standard logical object such as a textfield or listbox. Once the object is understood as a logical element with which the user is accustomed, it is possible to interact with the object and obtain data from the object as if it were a standard object with a published interface. By subjecting the graphical data to a series of rules designed specifically to recognize tables, the boundaries and the internal structure of rows and columns will be understood. Once the graphical data is recognized as a table, the data which it contains in rows can then be accessed. By classifying an object as an instance of a known object, assumptions can be made about the object so that it can be navigated or validated by sending events or messages.

Proceedings ArticleDOI
01 Jan 1998
TL;DR: This paper presents a novel approach to selecting a minimised number of views that allow each object face to be adequately viewed according to specified constraints on viewpoints and other features.
Abstract: Many machine vision tasks, e.g. object recognition and object inspection, cannot be performed robustly from a single image. For certain tasks (e.g. 3D object recognition and automated inspection) the availability of multiple views of an object is a requirement. This paper presents a novel approach to selecting a minimised number of views that allow each object face to be adequately viewed according to specified constraints on viewpoints and other features. The planner is generic and can be employed for a wide range of multiple view acquisition systems, ranging from camera systems mounted on the end of a robot arm, i.e. an eye-in-hand camera setup, to a turntable and fixed stereo cameras to allow different views of an object to be obtained. The results (both simulated and real) given focus on planning with a fixed camera and turntable.
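
Selecting a small set of views that together show every object face can be framed as a set-cover problem. The sketch below uses the standard greedy set-cover heuristic, which is a swapped-in technique and not necessarily the planner the paper proposes; the turntable angles, face identifiers, and the name `greedy_view_selection` are hypothetical.

```python
def greedy_view_selection(visible_faces):
    """Pick views until every face seen by at least one view is covered.

    visible_faces maps a view label to the set of faces adequately
    visible from that view. Ties are broken by sorted label order.
    """
    remaining = set().union(*visible_faces.values())
    chosen = []
    while remaining:
        best = max(sorted(visible_faces),
                   key=lambda view: len(visible_faces[view] & remaining))
        chosen.append(best)
        remaining -= visible_faces[best]
    return chosen


# Hypothetical turntable positions and the faces each one shows.
visibility = {
    "v0": {1, 2, 3},
    "v90": {3, 4},
    "v180": {4, 5, 6},
    "v270": {1, 6},
}
plan = greedy_view_selection(visibility)
```

The greedy heuristic does not guarantee a minimum-size plan in general; viewpoint constraints such as viewing angle or resolution would be applied when building the `visible_faces` sets.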

Journal ArticleDOI
TL;DR: An efficient approach to pose-invariant pictorial object recognition is described that employs spectral signatures of image patches corresponding to roughly planar object surfaces, with the affine transform decomposed using singular value decomposition (SVD).
Abstract: Describes an efficient approach to pose invariant pictorial object recognition employing spectral signatures of image patches that correspond to object surfaces which are roughly planar. Based on singular value decomposition (SVD), the affine transform is decomposed into slant, tilt, swing, scale, and 2D translation. Unlike previous log-polar representations which were not invariant to slant, our log-log sampling configuration in the frequency domain yields complete affine invariance. The images are preprocessed by a novel model-based segmentation scheme that detects and segments objects that are affine-similar to members of a model set of basic geometric shapes. The segmented objects are then recognized by their signatures using multidimensional indexing in a pictorial dataset represented in the frequency domain. Experimental results with a dataset of 26 models show 100 percent recognition rates in a wide range of 3D pose parameters and imaging degradations: 0-360° swing and tilt, 0-82° of slant, more than three octaves in scale change, window-limited translation, high noise levels (0 dB), and significantly reduced resolution (1:5).

Proceedings ArticleDOI
23 Jun 1998
TL;DR: Under the assumption that the feature positions of a planar object can be modeled using a jointly Gaussian density, the joint density over the corresponding set of affine coordinates is derived.
Abstract: Under a weak perspective camera model, the image plane coordinates in different views of a planar object are related by an affine transformation. Because of this property, researchers have attempted to use affine invariants for recognition. However, there are two problems with this approach: (1) objects or object classes with inherent variability cannot be adequately treated using invariants; and (2) in practice the calculated affine invariants can be quite sensitive to errors in the image plane measurements. In this paper we use probability distributions to address both of these difficulties. Under the assumption that the feature positions of a planar object can be modeled using a jointly Gaussian density, we have derived the joint density over the corresponding set of affine coordinates. Even when the assumptions of a planar object and a weak perspective camera model do not strictly hold, the results are useful because deviations from the ideal can be treated as deformability in the underlying object model.
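
The affine coordinates that the abstract refers to can be illustrated with a short sketch: a point is expressed relative to a basis of three non-collinear feature points, and the resulting coordinates are unchanged by any affine map of the plane. The sketch only demonstrates this noise-free invariance; it does not reproduce the paper's derived density. The names `affine_coordinates` and `apply_affine` and the example numbers are assumptions.

```python
def affine_coordinates(basis, point):
    """Return (a, b) with point = p0 + a*(p1 - p0) + b*(p2 - p0)."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    ux, uy = x1 - x0, y1 - y0
    vx, vy = x2 - x0, y2 - y0
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("basis points are collinear")
    px, py = point[0] - x0, point[1] - y0
    # Cramer's rule for the 2x2 system a*u + b*v = p.
    a = (px * vy - py * vx) / det
    b = (ux * py - uy * px) / det
    return a, b


def apply_affine(matrix, offset, p):
    """Apply the affine map p -> matrix @ p + offset to a 2D point."""
    (m00, m01), (m10, m11) = matrix
    return (m00 * p[0] + m01 * p[1] + offset[0],
            m10 * p[0] + m11 * p[1] + offset[1])


basis = [(0, 0), (2, 0), (0, 2)]
point = (1, 3)
matrix, offset = ((2, 1), (0, 3)), (5, -1)  # an arbitrary second "view"

before = affine_coordinates(basis, point)
after = affine_coordinates([apply_affine(matrix, offset, p) for p in basis],
                           apply_affine(matrix, offset, point))
```

With measurement noise on the feature positions the two coordinate pairs would no longer agree exactly, which is the sensitivity the abstract addresses by working with probability densities over these coordinates.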

Journal ArticleDOI
TL;DR: A novel method for representing 3D objects that unifies viewer and model centered object representations is presented, which encapsulates both the spatial structure of the object and a continuum of its views in the same data structure.
Abstract: A novel method for representing 3D objects that unifies viewer and model centered object representations is presented. A unified 3D frequency-domain representation, called volumetric frequency representation (VFR), encapsulates both the spatial structure of the object and a continuum of its views in the same data structure. The frequency-domain image of an object viewed from any direction can be directly extracted employing an extension of the projection slice theorem, where each Fourier-transformed view is a planar slice of the volumetric frequency representation. The VFR is employed for pose-invariant recognition of complex objects, such as faces. The recognition and pose estimation is based on an efficient matching algorithm in a four-dimensional Fourier space. Experimental examples of pose estimation and recognition of faces in various poses are also presented.
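
The projection-slice theorem that underlies this representation can be checked numerically in its simplest form. The sketch below covers only the 2D, axis-aligned case (the abstract describes a 3D extension): the 1D DFT of an image's projection onto the horizontal axis equals the zero-vertical-frequency row of the image's 2D DFT. The naive `dft` and `dft2` helpers and the toy image are assumptions for illustration.

```python
import cmath


def dft(signal):
    """Naive 1D discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def dft2(image):
    """Naive 2D DFT; the result is indexed as [vertical freq][horizontal freq]."""
    row_transformed = [dft(row) for row in image]
    height, width = len(image), len(image[0])
    columns = [dft([row_transformed[r][c] for r in range(height)])
               for c in range(width)]
    return [[columns[c][k] for c in range(width)] for k in range(height)]


image = [
    [1, 2, 0, 1],
    [0, 3, 1, 0],
    [2, 0, 1, 1],
    [1, 1, 0, 2],
]

# Project the image along the vertical axis (sum each column).
projection = [sum(column) for column in zip(*image)]
projection_spectrum = dft(projection)
central_slice = dft2(image)[0]

max_error = max(abs(a - b) for a, b in zip(projection_spectrum, central_slice))
```

In the volumetric setting the same relation is applied one dimension up: each Fourier-transformed 2D view corresponds to a planar slice through the 3D frequency representation.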

01 Jan 1998
TL;DR: The effectiveness of the object representation comes from its ability to combine the descriptive nature of global object properties with the robustness to partial views and clutter of local shape descriptions.
Abstract: We present an approach to recognition of complex objects in cluttered 3-D scenes that does not require feature extraction or segmentation. Our object representation comprises descriptive images associated with oriented points on the surface of an object. Using a single point basis constructed from an oriented point, the position of other points on the surface of the object can be described by two parameters. The accumulation of these parameters for many points on the surface of the object results in an image at each oriented point. These images, localized descriptions of the global shape of the object, are invariant to rigid transformations. Through correlation of images, point correspondences between a model and scene data are established. Geometric consistency is used to group the correspondences from which plausible rigid transformations that align the model with the scene are calculated. The transformations are then refined and verified using a modified iterative closest point algorithm. The effectiveness of our representation comes from its ability to combine the descriptive nature of global object properties with the robustness to partial views and clutter of local shape descriptions. The wide applicability of our algorithm is demonstrated with results showing recognition of complex objects in cluttered scenes with occlusion.
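
The two-parameter description of surface points relative to an oriented point can be sketched as follows. For an oriented point (position p, unit normal n), each other point x is mapped to its distance from the normal line and its signed height along the normal, and these pairs are accumulated into a 2D histogram. The bin layout, the names `spin_image`, `alpha`, and `beta`, and the toy point cloud are assumptions of this sketch; correlation-based matching, geometric consistency grouping, and the iterative closest point refinement mentioned in the abstract are not shown.

```python
import math


def spin_image(point, normal, cloud, bin_size, n_bins):
    """Accumulate (alpha, beta) pairs into an n_bins x n_bins histogram.

    beta  = signed distance along the unit normal,
    alpha = perpendicular distance from the line through point along normal.
    """
    half_height = n_bins * bin_size / 2.0
    image = [[0] * n_bins for _ in range(n_bins)]
    for x in cloud:
        d = [x[i] - point[i] for i in range(3)]
        beta = sum(normal[i] * d[i] for i in range(3))
        alpha = math.sqrt(max(sum(c * c for c in d) - beta * beta, 0.0))
        col = int(alpha // bin_size)
        row = int((beta + half_height) // bin_size)
        if 0 <= row < n_bins and 0 <= col < n_bins:
            image[row][col] += 1
    return image


def rigid(p):
    """A hypothetical rigid motion: rotate 90 degrees about x, then translate."""
    return (p[0] + 1.0, -p[2] + 2.0, p[1] + 3.0)


cloud = [(0.5, 0.0, 0.25), (0.0, 0.5, 0.25), (1.5, 0.0, -0.75), (0.0, 2.5, 1.5)]
original = spin_image((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), cloud, 1.0, 4)

# The same surface after the rigid motion; the normal is rotated only.
moved = spin_image(rigid((0.0, 0.0, 0.0)), (0.0, -1.0, 0.0),
                   [rigid(p) for p in cloud], 1.0, 4)
```

Because `alpha` and `beta` depend only on positions relative to the oriented point, the two histograms agree, which is the rigid-transformation invariance the abstract relies on when correlating model and scene images.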

Journal ArticleDOI
Todd A. Cass
TL;DR: A formal method which guarantees finding all feasible matchings in polynomial time is presented, and more computationally feasible algorithms are developed based on conservative approximations of the formal method.
Abstract: We consider model-based object localization based on local geometric feature matching between the model and the image data. The method is based on geometric constraint analysis, working in transformation space. We present a formal method which guarantees finding all feasible matchings in polynomial time. From there we develop more computationally feasible algorithms based on conservative approximations of the formal method. Additionally, our formalism relates object localization, affine model indexing, and structure from multiple views to one another.

Journal ArticleDOI
TL;DR: This research features the rapid recognition of three-dimensional objects, focusing on efficient indexing of model objects using a Bayesian framework, and implements a working prototype vision system using a feature structure called an LSG (local surface group) for generating object hypotheses.

Book ChapterDOI
01 Jan 1998
TL;DR: An automatic, real-time face recognition system based on a visual learning technique and its application to face detection in complex background, and accurate facial feature detection/tracking is reported.
Abstract: Two of the most important aspects in the general research framework of face recognition by computer are addressed here: face and facial feature detection, and face recognition — or rather face comparison. The best reported results of the mug-shot face recognition problem are obtained with elastic matching using jets. In this approach, the overall face detection, facial feature localization, and face comparison is carried out in a single step. This paper describes our research progress towards a different approach for face recognition. On the one hand, we describe a visual learning technique and its application to face detection in complex background, and accurate facial feature detection/tracking. On the other hand, a fast algorithm for 2D-template matching is presented as well as its application to face recognition. Finally, we report an automatic, real-time face recognition system.

Journal ArticleDOI
TL;DR: The GRUFF-I (Generic Recognition Using Form, Function and Interaction) system reasons about and generates plans for interaction with 3-D shapes for the purpose of generic object recognition; metrically accurate representations of the world can be built and used for higher-level reasoning.

Journal ArticleDOI
01 Aug 1998
TL;DR: A robust closed-loop system based on "delayed" reinforcement learning that systematically controls feedback in a multilevel vision system and shows promise in approaching a long-standing problem in the field of computer vision and pattern recognition.
Abstract: Object recognition is a multilevel process requiring a sequence of algorithms at low, intermediate, and high levels. Generally, such systems are open loop with no feedback between levels and assuring their robustness is a key challenge in computer vision and pattern recognition research. A robust closed-loop system based on "delayed" reinforcement learning is introduced. The parameters of a multilevel system employed for model-based object recognition are learned. The method improves recognition results over time by using the output at the highest level as feedback for the learning system. It has been experimentally validated by learning the parameters of image segmentation and feature extraction and thereby recognizing 2D objects. The approach systematically controls feedback in a multilevel vision system and shows promise in approaching a long-standing problem in the field of computer vision and pattern recognition.

Patent
15 Jan 1998
TL;DR: In this paper, a system is presented for enhancing the television presentation of an object; it can display the object even if the object is not visible to a camera.
Abstract: A system for enhancing the television presentation of an object (804) that can display the object (804) even if the object (804) is not visible to a camera (280). The system determines whether the object (804) is visible to the camera (280) broadcasting the event. If the object (804) is not visible to the camera (280), the video image captured by the camera (280) is edited to show the object (804), not show the object (804) or enhance the video in a different manner. The object (804) is placed in the captured video image at the position the object (804) would be in the camera's field of view if there was no barrier (796) between the object (804) and the camera (280).

Proceedings ArticleDOI
01 Jan 1998
TL;DR: It is shown that scale-trees, obtained from greyscale images, approximate such a tree, and it is shown how they may be modified using other attributes to more closely become object trees.
Abstract: A useful representation of an image would be an object tree in which nodes represent objects, or parts of objects, and which includes at least one node that, together with its children, represents each object: a grandmother node. It is shown that scale-trees, obtained from greyscale images, approximate such a tree. It is then shown how they may be modified using other attributes to more closely become object trees. The result is a data structure that provides “handles” for every element of the image that can be used for manipulating the image. This segmentation has potential for object recognition.

Journal ArticleDOI
TL;DR: Topics covered: the knowledge primitives; from knowledge primitives to functional properties; from functional properties to category definitions; flow of control in reasoning about object shape; recognition results from completely known 3-D shapes; recognition results from partial 3-D shape information.
Abstract: The Knowledge Primitives; From Knowledge Primitives to Functional Properties; From Functional Properties to Category Definitions; Flow of Control in Reasoning About Object Shape; Recognition Results from Completely Known 3-D Shapes; Recognition Results from Partial 3-D Shape Information; Discussion.

Journal ArticleDOI
TL;DR: Experimental results showing the qualitative recognition of aircraft in perspective, aerial images are presented.

Proceedings ArticleDOI
04 Jan 1998
TL;DR: Experiments on a database of 500 images show that object recognition based on composite color and shape invariant features provides excellent recognition accuracy and that recognition based on color invariants provides very high accuracy, whereas object recognition based entirely on shape invariants yields very poor discriminative power.
Abstract: New sets of color models are proposed for object recognition invariant to a change in view point, object geometry and illumination. Further, computational methods are presented to combine color and shape invariants to produce a high-dimensional invariant feature set for discriminatory object recognition. Experiments on a database of 500 images show that object recognition based on composite color and shape invariant features provides excellent recognition accuracy. Furthermore, object recognition based on color invariants provides very high recognition accuracy whereas object recognition based entirely on shape invariants yields very poor discriminative power. The image database and the performance of the recognition scheme can be experienced within PicToSeek: on-line as part of the ZOMAX system at: http://www.wins.uva.nl/research/isis/zomax/.

Proceedings ArticleDOI
04 Jan 1998
TL;DR: A new method to recognize 3D free-form objects from their apparent contours is proposed, extending the authors' established method for recognizing objects with fixed edges; experimental results show the effectiveness of the method.
Abstract: We propose a new method to recognize 3D free-form objects from their apparent contours. It is the extension of our established method to recognize objects with fixed edges. Object models are compared with 3D boundaries which are extracted by segment-based stereo vision. Based on the local shapes of the boundaries, candidate transformations are generated. The candidates are verified and adjusted based on the whole shapes of the boundaries. The models are built from all-around range data of the objects. Experimental results show the effectiveness of the method.

01 Jan 1998
TL;DR: The aim of this paper is to formulate recognition algorithms explicitly in terms of uncertain geometric features (such as points, lines, oriented points or frames) in 3D object recognition.
Abstract: The recognition problem is probably one of the most studied in computer vision. However, most techniques were developed on point features and were not designed to cope explicitly with uncertainty in measurements. The aim of this paper is to formulate recognition algorithms explicitly in terms of uncertain geometric features (such as points, lines, oriented points or frames). In the first part we review the principal matching algorithms and adapt them to work with generic geometric features. Then we analyze how to handle uncertainty on geometric features and the influence it has on the matching algorithms. Last but not least, we analyse four key problems for the implementation of these generic algorithms. Key Words: 3D Object Recognition, Invariants of 3D objects.

Proceedings ArticleDOI
23 Jun 1998
TL;DR: The discriminatory power of previously proposed appearance-based parts and relations is introduced, together with a description of how to use it to organize large databases of objects.
Abstract: Previously a new object representation using appearance-based parts and relations to recognize 3D objects from 2D images, in the presence of occlusion and background clutter, was introduced. Appearance-based parts and relations are defined in terms of closed regions and the union of these regions, respectively. The regions are segmented using the MDL principle, and their appearance is obtained from a collection of images and compactly represented by parametric manifolds in the eigenspaces spanned by the parts and the relations. In this paper we introduce the discriminatory power of the proposed features and describe how to use it to organize large databases of objects.