
Showing papers on "3D single-object recognition published in 1995"


Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.

2,037 citations
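The eigenspace pipeline described above (project an input image into a low-dimensional subspace, then find the closest point on any object's manifold) can be sketched as follows. The orthonormal basis, zero mean, and manifold samples below are toy stand-ins, not values from the paper; the real system learns them from large image sets.

```python
import math

def project(image, basis, mean):
    """Project a mean-centred image vector onto an orthonormal eigenspace basis."""
    centred = [p - m for p, m in zip(image, mean)]
    return [sum(b_i * c_i for b_i, c_i in zip(b, centred)) for b in basis]

def recognise(image, manifolds, basis, mean):
    """Return (object_id, pose) of the closest manifold sample in eigenspace.

    manifolds: {object_id: [(pose, eigenspace_point), ...]}
    """
    q = project(image, basis, mean)
    obj, pose, _ = min(((o, p, math.dist(q, pt))
                        for o, samples in manifolds.items()
                        for p, pt in samples),
                       key=lambda t: t[2])
    return obj, pose
```

The manifold containing the nearest point identifies the object; the position along that manifold gives the pose, which is how the paper reports pose errors of about one degree.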


Journal ArticleDOI
TL;DR: A review of recent developments in the computer vision aspect of motion-based recognition, reporting several methods for the recognition of objects and motions, including cyclic motion detection and recognition, lipreading, hand gesture interpretation, motion verb recognition, and temporal texture classification.

489 citations


Journal ArticleDOI
Michael J. Tarr1
TL;DR: Findings reveal a prominent role for viewpoint-dependent mechanisms and provide support for the multiple-views approach, in which objects are encoded as a set of view-specific representations that are matched to percepts using normalization procedures.
Abstract: Successful object recognition is essential for finding food, identifying kin, and avoiding danger, as well as many other adaptive behaviors. To accomplish this feat, the visual system must reconstruct 3-D interpretations from 2-D “snapshots” falling on the retina. Theories of recognition address this process by focusing on the question of how object representations are encoded with respect to viewpoint. Although empirical evidence has been equivocal on this question, a growing body of surprising results, including those obtained in the experiments presented in this case study, indicates that recognition is often viewpoint dependent. Such findings reveal a prominent role for viewpoint-dependent mechanisms and provide support for the multiple-views approach, in which objects are encoded as a set of view-specific representations that are matched to percepts using normalization procedures.

446 citations


Patent
20 Apr 1995
TL;DR: In this paper, a method is presented for detecting instances of a selected object or object feature in a digitally represented scene, using analysis of probability densities to determine whether an input image (or a portion thereof) represents such an instance.
Abstract: Methods and apparatus for detecting instances of a selected object or object feature in a digitally represented scene utilize analysis of probability densities to determine whether an input image (or portion thereof) represents such an instance. The invention filters images of objects that, although in some ways similar to the object under study, fail to qualify as typical instances of that object. The invention is useful in the detection and recognition of virtually any multifeatured entity such as human faces, features thereof (e.g., eyes), as well as non-rigid and articulated objects such as human hands.

174 citations
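The density-based acceptance test at the core of the patent can be illustrated with a diagonal-covariance Gaussian; the mean, variances, and threshold below are hypothetical stand-ins, and the actual invention estimates richer densities over multifeatured objects such as faces.

```python
import math

def gaussian_log_density(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian fitted to typical instances."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def is_instance(x, mean, var, threshold):
    """Accept x as an instance of the object class if its density is high enough.

    Atypical inputs (low density) are filtered out, even if superficially similar.
    """
    return gaussian_log_density(x, mean, var) >= threshold
```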


Journal ArticleDOI
TL;DR: It is shown how these shape primitives and relations can be easily recovered from superquadric ellipsoids which, in turn, can be recovered from either range or intensity images of occluded scenes.

150 citations


Journal ArticleDOI
TL;DR: This work describes a model based recognition system, called LEWIS, for the identification of planar objects based on a projectively invariant representation of shape, and provides an analysis of the combinatorial advantages of using index functions.
Abstract: We describe a model based recognition system, called LEWIS, for the identification of planar objects based on a projectively invariant representation of shape. The advantages of this shape description include simple model acquisition (direct from images), no need for camera calibration or object pose computation, and the use of index functions. We describe the feature construction and recognition algorithms in detail and provide an analysis of the combinatorial advantages of using index functions. Index functions are used to select models from a model base and are constructed from projective invariants based on algebraic curves and a canonical projective coordinate frame. Examples are given of object recognition from images of real scenes, with extensive object libraries. Successful recognition is demonstrated despite partial occlusion by unmodelled objects, and realistic lighting conditions.

146 citations
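LEWIS's index functions are built from projective invariants. The simplest example of such an invariant (not the algebraic-curve constructions LEWIS actually uses) is the cross-ratio of four collinear points, which survives any projective map and can therefore serve as a key into a model base; the quantisation bin width below is an arbitrary illustration.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by 1-D coordinates;
    unchanged by any projective map x -> (p*x + q) / (r*x + s)."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def index_key(inv, bin_width=0.05):
    """Quantised invariant used as a hash key into the model base (illustrative)."""
    return round(inv / bin_width)
```

Because the key is computed directly from image measurements, candidate models can be selected without camera calibration or pose computation, which is the combinatorial advantage the paper analyses.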


Journal ArticleDOI
TL;DR: The systems and concepts described in this paper document the evolution of the geometric invariance approach to object recognition over the last five years and provide a principled basis for the other stages of the recognition process such as feature grouping and hypothesis verification.

122 citations


01 Feb 1995
TL;DR: A method is developed for directly learning and combining algorithms in a new way that imposes little burden on, and little bias from, the humans involved; this learning architecture and the new results it brings to the problem of natural image object recognition are the focus of this report.
Abstract: Most artificial intelligence systems today work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge, and machine learning stands in a position to catapult object recognition into real-world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this report.

105 citations


Patent
02 Jun 1995
TL;DR: In this paper, an image signal provided by an image pickup device, such as a CCD, is converted into a video signal by a signal processor; an extraction means extracts a candidate object region from the current video signal, and an object region determined a certain time earlier by a decision means is read from a memory.
Abstract: An image signal provided by an image pickup device, such as a CCD, is converted into a video signal by a signal processor. An extraction means extracts a candidate object region from the current video signal, and an object region determined a certain time earlier by a decision means is read from a memory. For example, the decision means determines the overlapping region in which the candidate object region and the object region read from the memory overlap each other, provides a new object region a size larger than the overlapping region, and replaces the stored object region with the new one to update the contents of the memory. Meanwhile, a calculating means calculates the features of the object region, including the position of its centroid on the screen. A control means, such as a microcomputer, controls the image pickup device on the basis of this feature information so that a region including the target object substantially in its central part is cut out as a video signal. Consequently, the target object can be reliably tracked regardless of objects other than the target object included in the scene.

98 citations
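The region-update rule in the abstract (intersect the candidate region with the remembered one, then grow the overlap slightly) can be sketched with axis-aligned rectangles; the margin value below is an arbitrary illustration.

```python
def intersect(r1, r2):
    """Overlap of two (x0, y0, x1, y1) rectangles, or None if disjoint."""
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def update_region(candidate, previous, margin=2):
    """New object region: the overlap grown by a margin, per the patent's scheme."""
    ov = intersect(candidate, previous)
    if ov is None:
        return previous            # no overlap: keep the remembered region
    return (ov[0] - margin, ov[1] - margin, ov[2] + margin, ov[3] + margin)
```

Requiring overlap with the previously stored region is what lets the tracker ignore other objects entering the scene.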


Journal ArticleDOI
TL;DR: A general approach is presented for deriving 3D invariants that can be used as input to a statistical classifier, such as a k-nearest-neighbours algorithm or a neural network.

81 citations


Book
06 Apr 1995
TL;DR: In this article, the recognition architecture is based on algebraic invariants and projective invariants of smooth plane curves, which are then used for 3D object segmentation and grouping.
Abstract: Introduction 1. Object recognition 2. Plane algebraic projective invariants 3. The recognition architecture: algebraic invariants: LEWIS1 4. Projective invariants of smooth plane curves: LEWIS2 5. Segmentation and grouping 6. Invariants for 3D objects Conclusions References Index

Patent
12 Sep 1995
TL;DR: In this paper, an object recognition apparatus and method for real-time training and recognition/inspection of test objects are presented. However, the method is not suited to unconstrained real-world environments.
Abstract: An object recognition apparatus and method for real-time training and recognition/inspection of test objects. To train the system, digital features of an object are captured as sub-frames extracted from a data stream. The data is thresholded and digitized and used to produce an address representing the digital feature. The address is used to write a value into a memory. During recognition or inspection, extracting digital features from a test object, converting the digital features extracted from the test object into addresses, and using the addresses developed from the test object to address the memory to correlate whether the same memory locations are addressed determines whether the test object matches the reference object.

Journal ArticleDOI
01 Jun 1995-Cortex
TL;DR: An opportunity to compare the merits of theories of object recognition has arisen in a patient with a rare neuropsychological sign in which knowledge of the canonical upright orientation of object drawings was profoundly disrupted; the patient's drawings from memory and to copy, and performance in an orientation-matching task, are discussed.

Proceedings ArticleDOI
25 Sep 1995
TL;DR: This paper describes a method for the interpretation of traffic scenes based on the detection and recognition of those objects, or classes of objects which are typically found in an urban scene, which provided a general-purpose reconstruction of the whole traffic scene as viewed by the driver.
Abstract: This paper describes a method for the interpretation of traffic scenes based on the detection and recognition of those objects, or classes of objects, which are typically found in an urban scene. Since generic model-based recognition schemes are unsuitable for the analysis of traffic scenes and result in very poor performance, each of the different classes of objects we expect to find in a typical scene is identified according to some selected features. After identifying an object, its main parameters are computed and, when needed, the object is further classified. The classes of objects considered include the roadbed, vehicles, buildings, trees, crosswalks and road signs. The method described here has been successfully tested on a wide set of images of traffic scenes and provides a general-purpose reconstruction of the whole traffic scene as viewed by the driver.

Dissertation
01 Jan 1995
TL;DR: The CAG is shown to preserve and combine the best features of these two approaches while avoiding their drawbacks, and is tested on a range of difficult object recognition and localisation problems involving complex imagery of non-rigid 3D objects under varied viewing conditions, with excellent results.
Abstract: This thesis studies the use of colour information for object recognition. A new representation for objects with multiple colours, the colour adjacency graph (CAG), is proposed. Each node of the CAG represents a single chromatic component of the image, defined as a set of pixels forming a unimodal cluster in the chromatic scattergram. Edges encode information about the adjacency of colour components and their reflectance ratio. The CAG is related to both the histogram and region adjacency graph representations. It is shown to preserve and combine the best features of these two approaches while avoiding their drawbacks. The proposed approach is tested on a range of difficult object recognition and localisation problems involving complex imagery of non-rigid 3D objects under varied viewing conditions, with excellent results. Acknowledgements: I would like to thank my supervisor Josef Kittler for his user-friendly guidance during the course of this work. "Very spatial thanks" go to Radek Mařík. It would have been impossible to carry out all the experiments reported in this thesis without the software libraries he developed and the programs he implemented. Lastly and most importantly I would like to thank my wife Romana for her support and understanding.
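A minimal CAG construction from a label image of chromatic components might look like the sketch below; it records only 4-adjacency between components, omitting the unimodal-cluster extraction and the reflectance ratios the thesis stores on edges.

```python
def colour_adjacency_graph(labels):
    """Build (nodes, edges) from a 2-D grid of colour-component labels.

    Nodes are component labels; an edge joins two labels wherever their
    pixels are 4-adjacent in the image.
    """
    edges = set()
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):       # right and down neighbours
                if y + dy < h and x + dx < w:
                    a, b = labels[y][x], labels[y + dy][x + dx]
                    if a != b:
                        edges.add(frozenset((a, b)))
    nodes = {lab for row in labels for lab in row}
    return nodes, edges
```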

Proceedings ArticleDOI
23 Oct 1995
TL;DR: The results show that this approach is a viable method for successfully combining the image segmentation and object recognition steps for a computer vision module.
Abstract: A real-world computer vision module must deal with a wide variety of environmental parameters. Object recognition, one of the major tasks of this vision module, typically requires a preprocessing step to locate the objects in a scene that ought to be recognized. Genetic algorithms are a search technique for dealing with very large search spaces, such as those encountered in image segmentation or object recognition. The article describes a technique for using genetic algorithms to combine the image segmentation and object recognition steps for a complex scene. The results show that this approach is a viable method for combining the image segmentation and object recognition steps of a computer vision module.
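A minimal genetic algorithm of the kind used for such a search might look like the following sketch; the bitstring encoding and the stand-in fitness (counting 1-bits) are placeholders for a real segmentation/recognition fitness function.

```python
import random

def genetic_search(fitness, length=16, pop_size=30, generations=60, seed=0):
    """Minimal GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            a, b = rng.sample(pop, 2)              # size-2 tournament
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p, q = pick(), pick()
            cut = rng.randrange(1, length)         # one-point crossover
            child = p[:cut] + q[cut:]
            if rng.random() < 0.1:                 # occasional bit-flip mutation
                i = rng.randrange(length)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_search(sum)   # stand-in fitness: number of 1-bits in the string
```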


Proceedings ArticleDOI
20 Jun 1995
TL;DR: The key idea here is that a geometric class defined in 3D induces relationships in the image which must hold between points on the image outline (the perspective projection of the object) to enable both identification and grouping of image features belonging to objects of that class.
Abstract: In any object recognition system a major and primary task is to associate those image features, within an image of a complex scene, that arise from an individual object. The key idea here is that a geometric class defined in 3D induces relationships in the image which must hold between points on the image outline (the perspective projection of the object). The resulting image constraints enable both identification and grouping of image features belonging to objects of that class. The classes include surfaces of revolution, canal surfaces (pipes) and polyhedra. Recognition proceeds by first recognising an object as belonging to one of the classes (for example a surface of revolution) and subsequently identifying the object (for example as a particular vase). This differs from conventional object recognition systems where recognition is generally targeted at particular objects. These classes also support the computation of 3D invariant descriptions including symmetry axes, canonical coordinate frames and projective signatures. The constraints and grouping methods are viewpoint invariant, and proceed with no information on object pose. We demonstrate the effectiveness of this class-based grouping on real, cluttered scenes using grouping algorithms developed for rotationally symmetric surfaces, canal surfaces and polyhedra.

Patent
Allen Gee1, David M. Doria1
13 Jan 1995
TL;DR: In this paper, a method is presented for determining the pose (translation, rotation, and scale) of a model object that best matches a target object located in image data; small adjustments are made to the original position and orientation of the model object until it converges to a state that best matches the target object contained in the image data.
Abstract: Disclosed are a system and method for determining the pose (translation, rotation, and scale), or position and orientation, of a model object that best matches a target object located in image data. Through an iterative process small adjustments are made to the original position and orientation of the model object until it converges to a state that best matches the target object contained in the image data. Edge data representative of edges of the target object and edge data representative of the model object are processed for each data point in the model object relative to each point in the target object to produce a set of minimum distance vectors between the model object and the target object. A neural network estimates translation, rotation, and scaling adjustments that are to be made to the model object. Pose of the model object is adjusted relative to the target object based upon the estimated translation, rotation, and scaling adjustments provided by the neural network. Iterative calculation of the minimum distance vectors, estimation of the translation, rotation, and scaling adjustments, and adjustment of the position and orientation of the model object is adapted to reposition the model object until it substantially overlays the target object. Final position of the model object provides an estimate of the position and orientation of the target object in the digitized image.
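The minimum-distance-vector loop can be sketched for the translation-only case; here the mean of the vectors replaces the patent's neural-network estimator, and rotation and scale adjustments are omitted.

```python
import math

def nearest_vector(p, target):
    """Minimum-distance vector from model point p to the target point set."""
    q = min(target, key=lambda t: math.dist(p, t))
    return (q[0] - p[0], q[1] - p[1])

def align_translation(model, target, iters=20):
    """Iteratively shift the model by the mean minimum-distance vector
    until it substantially overlays the target; returns (dx, dy)."""
    dx = dy = 0.0
    for _ in range(iters):
        moved = [(x + dx, y + dy) for x, y in model]
        vs = [nearest_vector(p, target) for p in moved]
        dx += sum(v[0] for v in vs) / len(vs)
        dy += sum(v[1] for v in vs) / len(vs)
    return dx, dy
```

The final offset estimates the target's position, just as the patent's final model pose estimates the target's pose in the digitized image.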

Journal ArticleDOI
TL;DR: Several aspects related to the application of active vision techniques to object recognition are discussed and the face recognition problem based on the face-space approach is considered to demonstrate the advantage of adopting an active retina in recognition tasks.

Dissertation
01 Jan 1995
TL;DR: The essential scale-space causality property for local extrema of a signal under this operation is proved and it is shown that structuring functions from the "elliptic poweroids" lead to favourable dimensionality and semi-group properties.
Abstract: This thesis develops and demonstrates an original approach to scale-space theory. A new scale-space theory based on a unified multiscale morphological dilation-erosion smoothing operator is presented. The essential scale-space causality property for local extrema of a signal under this operation is proved. This result holds for signals on $\mathbb{R}^2$ and higher dimensions and for negative as well as positive scales. When applied to grayscale images we show that structuring functions from the "elliptic poweroids" lead to favourable dimensionality and semi-group properties. Paraboloids, in particular, allow efficient computation of the scale-space, and such an algorithm is presented. The generalised frequency response of this signal smoother, which is similar to that of a Butterworth filter (with an amplitude dependent corner frequency), is obtained. The filter is statistically characterised by obtaining second-order statistical properties of the output signal with independent and identically distributed uniform noise input. Similar scale-space results are obtained for the multiscale morphological closing-opening operator, and we show that the resulting scale-space fingerprints are identical to those of the dilation-erosion. To demonstrate the utility of the new theory, we present an approach for the recognition of multiple 3-D objects in range data via the local matching of surfaces. In this approach the reduced morphological scale-space fingerprint is used as the primitive for matching. The resulting recognition process is invariant to translation, rotation, limited scaling, and partial occlusion. The results of the proposed object recognition method, showing the recognition of a scene containing nine faces at various positions, angles and scales, are presented. In a second demonstration we show the recognition of eight mountains in a digital elevation map.
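The unified dilation-erosion operator with a parabolic structuring function can be sketched in 1-D; positive scales dilate and negative scales erode, matching the thesis's signed-scale convention (the discrete sampling here is only an illustration).

```python
def dilation_erosion(f, scale):
    """1-D multiscale dilation-erosion with the parabolic structuring function
    g(t) = -t**2 / |scale|: positive scale dilates, negative scale erodes."""
    if scale == 0:
        return list(f)             # zero scale: identity
    s = abs(scale)
    n = len(f)
    if scale > 0:                  # dilation: max of f shifted down by the parabola
        return [max(f[y] - (x - y) ** 2 / s for y in range(n)) for x in range(n)]
    return [min(f[y] + (x - y) ** 2 / s for y in range(n)) for x in range(n)]
```

Tracking local extrema of the output across scales yields the scale-space fingerprint the thesis uses as a matching primitive.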

Proceedings ArticleDOI
23 Oct 1995
TL;DR: Why the complete object geometry can be captured by the geometry of pairs of patches, how to design mutual invariants, and how to match patches in the data with those in the database at a low computational cost are discussed.
Abstract: An effective approach has appeared in the literature for recognizing 2D curve or 3D surface objects of modest complexity based on representing an object by a single implicit polynomial of 3rd or 4th degree, computing a vector of Euclidean or affine invariants which are functions of the polynomial coefficients, followed by Bayesian object recognition of the invariants, thus producing robust recognition at low computational cost. This paper extends the approach, as well as initial work on mutual-invariant recognizers, to the recognition of objects too complicated to be represented by a single polynomial. Hence, an object to be recognized is partitioned into patches, each patch is represented by a single implicit polynomial, mutual invariants are computed for pairs of polynomials for pairs of patches, and object recognition proceeds via Bayesian recognition of vectors of self and mutual invariants. We discuss why the complete object geometry can be captured by the geometry of pairs of patches, how to design mutual invariants, and how to match patches in the data with those in the database at a low computational cost. The approach provides low-computational-cost recognition of partially occluded articulated objects in arbitrary position and in noise, by recognizing the self or joint geometry of one or more patches.
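The flavour of invariants computed from implicit-polynomial coefficients can be shown in the simplest, degree-2 case: the trace and determinant of the quadratic part are unchanged by rotation. This is only an illustration; the paper's 3rd- and 4th-degree Euclidean and affine invariants are constructed in the same spirit.

```python
import math

def conic_invariants(a, b, c):
    """Rotation invariants of the quadratic form a*x^2 + b*x*y + c*y^2:
    the trace and determinant of its coefficient matrix [[a, b/2], [b/2, c]]."""
    return (a + c, a * c - (b / 2) ** 2)

def rotate_form(a, b, c, t):
    """Coefficients of the same quadratic form after rotating the axes by angle t."""
    ct, st = math.cos(t), math.sin(t)
    a2 = a * ct * ct + b * ct * st + c * st * st
    c2 = a * st * st - b * ct * st + c * ct * ct
    b2 = 2 * (c - a) * ct * st + b * (ct * ct - st * st)
    return a2, b2, c2
```

Because the invariants depend only on the coefficients, they can be compared directly between a data patch and a database patch, with no pose computation.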

Proceedings ArticleDOI
M. Bichsel1
23 Oct 1995
TL;DR: Theoretical results, based on the law of incoherent light superposition, provide the solid ground on which a new illumination-invariant recognition algorithm is derived; a face recognition experiment demonstrates that the algorithm shows improved recognition performance.
Abstract: Varying illumination is a severe problem for existing face recognition algorithms. Altering the light direction from left to right, for example, causes a change of contrast in large face regions and causes most face recognition algorithms to fail. Theoretical results, based on the law of incoherent light superposition, provide the solid ground on which a new illumination-invariant recognition algorithm is derived. A face recognition experiment demonstrates that this algorithm indeed shows improved recognition performance even if the conditions for which the theoretical results were derived do not hold exactly.
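The law of incoherent light superposition says that the image under several light sources is the weighted sum of images taken under each source alone; this linearity is what the algorithm exploits. A sketch with flattened grayscale images:

```python
def combine(images, weights):
    """Incoherent superposition: weighted sum of flattened grayscale images,
    each taken under a single light source with non-negative weight."""
    return [sum(w * img[i] for w, img in zip(weights, images))
            for i in range(len(images[0]))]
```

Given a few single-source images of a face, combinations like this span the face's appearance under arbitrary mixtures of those sources, which is the basis for comparing probes independently of illumination.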

Proceedings ArticleDOI
12 Sep 1995
TL;DR: A novel technique for matching images of object shapes which have been subject to affine transformation caused by variations in the camera position is reported, based on the genetic algorithm, and is more efficient and reliable than conventional approaches that rely on dominant points to determine the best alignment between object boundaries.
Abstract: In this paper, a novel technique for matching images of object shapes which have been subject to affine transformation caused by variations in the camera position is reported. The method is based on the genetic algorithm, and is more efficient and reliable than conventional approaches that rely on dominant points to determine the best alignment between object boundaries. Experimental results are reported which demonstrate the feasibility of the approach and its potential in practical applications.

Patent
22 Aug 1995
TL;DR: In this article, an object comparison part 3d judges whether or not an object number received from the object recognition part 3c matches the object number at contacting time which is stored in a RAM 4 and outputs an event processing signal to corresponding to the object numbers to an event process part 3g when they match each other.
Abstract: PROBLEM TO BE SOLVED: To prevent operator misoperation by performing event processing only when the object at contact time is the same as the object at release time. SOLUTION: A signal judgement part 3a outputs coordinates C1 to a 1st object recognition part 3b when receiving coordinates C1 and a contact signal from a communication I/F 5, and coordinates C2 to an object recognition part 3c when receiving coordinates C2 and a release signal. An object comparison part 3d then judges whether or not an object number received from the object recognition part 3c matches the object number at contact time stored in a RAM 4, and outputs an event processing signal corresponding to the object number to an event processing part 3g when they match. Consequently, even if the operator unintentionally touches the screen with a finger by mistake, or selects a wrong object by touching a wrong screen position during operation, the choice can easily be canceled.

Journal ArticleDOI
TL;DR: Two approaches for utilizing the information in multiple entity groups and multiple views to reduce the number of hypotheses passed to the verification stage in a model-based object recognition system employing invariant feature indexing are proposed.


Journal ArticleDOI
TL;DR: A method of applying n-tuple recognition techniques to handwritten OCR, which involves scanning an n-tuple classifier over a chain-code of the image, is described, offering superior recognition accuracy, as demonstrated by results on three widely used data sets.
Abstract: A method of applying n-tuple recognition techniques to handwritten OCR, which involves scanning an n-tuple classifier over a chain-code of the image, is described. The traditional advantages of n-tuple recognition, i.e. training and recognition speed, are retained, while offering superior recognition accuracy, as demonstrated by results on three widely used data sets.
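A scanned n-tuple classifier over chain codes can be sketched as follows. Storing n-grams in plain sets rather than the usual addressed tuple memories is a simplification, and the training strings are invented examples, not data from the paper.

```python
def ngrams(chain, n):
    """All length-n windows of a chain-code string."""
    return {chain[i:i + n] for i in range(len(chain) - n + 1)}

class NTupleClassifier:
    """Simplified scanned n-tuple classifier over chain-coded contours."""
    def __init__(self, n=3):
        self.n, self.memory = n, {}

    def train(self, label, chain):
        """Remember which n-grams occur in training chains of this class."""
        self.memory.setdefault(label, set()).update(ngrams(chain, self.n))

    def classify(self, chain):
        """Score each class by how many of the probe's n-grams it has seen."""
        t = ngrams(chain, self.n)
        return max(self.memory, key=lambda lab: len(t & self.memory[lab]))
```

Both training and classification are simple set operations over short windows, which reflects the speed advantage the paper highlights.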

Proceedings ArticleDOI
04 Jul 1995
TL;DR: The present authors demonstrate the application of PGHs to recognition tasks involving very large model training sets and conclude that it is suitable for the recognition of very large numbers of objects.
Abstract: Pairwise geometric histogram (PGH) based algorithms have previously been shown to be a robust solution for the recognition of arbitrary 2D shapes in the presence of occlusion and scene clutter (Evans et al., 1993). The method is both statistically founded and complete in the sense that a shape may be reconstructed from its PGH representation (Riocreuz et al., 1994). The generality of this method has been further reinforced by an analysis of its scalability, which concludes that, if used appropriately, it is suitable for the recognition of very large numbers of objects (Ashbrook et al., 1995). The present authors demonstrate the application of PGHs to recognition tasks involving very large model training sets.
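A pairwise geometric histogram can be sketched over line segments by binning, for each pair, the relative orientation together with a distance measure; here midpoint distance stands in for the perpendicular-distance entries of the published PGH, and the bin counts are illustrative.

```python
import math

def pgh(segments, angle_bins=4, dist_bins=4, max_dist=10.0):
    """Toy pairwise geometric histogram over segments ((x0,y0),(x1,y1)):
    each pair votes in a (relative-angle, midpoint-distance) bin."""
    hist = [[0] * dist_bins for _ in range(angle_bins)]

    def angle(s):            # undirected orientation in [0, pi)
        (x0, y0), (x1, y1) = s
        return math.atan2(y1 - y0, x1 - x0) % math.pi

    def mid(s):
        (x0, y0), (x1, y1) = s
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            rel = abs(angle(a) - angle(b)) % math.pi
            d = min(math.dist(mid(a), mid(b)), max_dist - 1e-9)
            hist[int(rel / math.pi * angle_bins) % angle_bins][
                int(d / max_dist * dist_bins)] += 1
    return hist
```

Because the histogram depends only on pairwise relations, it is unchanged by translation and rotation, which is what makes it robust to clutter and partial occlusion.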

Journal ArticleDOI
TL;DR: An approach to the integration of off-line and on-line recognition of unconstrained handwritten characters is presented, adapting an on-line recognition algorithm to off-line recognition based on high-quality thinning algorithms.