Proceedings ArticleDOI

An active, appearance-based approach to the pose estimation of complex objects

J.L. Edwards
04 Nov 1996-Vol. 3, pp 1458-1465
TL;DR: This work addresses the pose estimation problem for complex objects by investigating the feasibility of combining an active camera with a purely appearance-based approach to this problem.
Abstract: Addresses the pose estimation problem for complex objects. Specifically, we investigate the feasibility of combining an active camera with a purely appearance-based approach to this problem. Experimental results are presented for the case of a single non-occluded object against a uniform background.
Citations
Journal ArticleDOI
01 Feb 2003
TL;DR: A fast and precise range sensor based on active depth from defocus (DFD) is developed in conjunction with a three-component vision system, which is able to recognize and evaluate the attitude of 3-D objects.
Abstract: It is generally accepted that to develop versatile bin-picking systems capable of grasping and manipulation operations, accurate 3-D information is required. To accomplish this goal, we have developed a fast and precise range sensor based on active depth from defocus (DFD). This sensor is used in conjunction with a three-component vision system, which is able to recognize and evaluate the attitude of 3-D objects. The first component performs scene segmentation using an edge-based approach. Since edges are used to detect the object boundaries, a key issue consists of improving the quality of edge detection. The second component attempts to recognize the object placed on the top of the object pile using a model-driven approach in which the segmented surfaces are compared with those stored in the model database. Finally, the attitude of the recognized object is evaluated using an eigenimage approach augmented with range data analysis. The full bin-picking system will be outlined, and a number of experimental results will be examined.

56 citations


Cites background from "An active, appearance-based approac..."

  • ...For instance, (Edwards 1996) pointed out that sampling the object pose at a rate of 10 samples/DOF requires 10^6 images....

    [...]

  • ...For instance, Edwards (1996) points out that sampling the object pose at a rate of 10 samples/DOF requires 10^6 images....

    [...]

Proceedings ArticleDOI
07 Jun 2004
TL;DR: Since appearance-based methods do not require customized feature extractions, the new methods present a more flexible alternative, especially in situations where extracting features is not simple due to cluttered background, complex and irregular features, etc.
Abstract: This paper presents two different algorithms for object tracking and pose estimation. Both methods are based on an appearance model technique called Active Appearance Model (AAM). The key idea of the first method is to utilize two instances of the AAM to track landmark points in a stereo pair of images and perform 3D reconstruction of the landmarks followed by 3D pose estimation. The second method, the AAM matching algorithm, is an extension of the original AAM that incorporates the full 6 DOF pose parameters as part of the minimization parameters. This extension allows for the estimation of the 3D pose of any object, without any restriction on its geometry. We compare both algorithms with a previously developed algorithm using a geometric-based approach [14]. The results show that the accuracy in pose estimation of our new appearance-based methods is better than using the geometric-based approach. Moreover, since appearance-based methods do not require customized feature extractions, the new methods present a more flexible alternative, especially in situations where extracting features is not simple due to cluttered background, complex and irregular features, etc.

49 citations


Cites background from "An active, appearance-based approac..."

  • ...Some authors [4, 5] have proposed a simple normalization and centering of the image samples before the appearance matching can be performed....

    [...]

  • ...Later, [5] extended this idea to the full 6-DOF pose of an object but with limited accuracy....

    [...]

Patent
Rui Ishiyama
02 Aug 2002
TL;DR: 3D image data is formulated and saved in a memory to indicate the three-dimensional shape of an object and the reflectivity or color at every point of the object.
Abstract: Three-dimensional image data is formulated and saved in a memory for indicating a three-dimensional shape of an object and reflectivity or color at every point of the object. For each of multiple pose candidates, an image space is created for representing brightness values of a set of two-dimensional images of the object which is placed in the same position and orientation as each pose candidate. The brightness values are those which would be obtained if the object is illuminated under varying lighting conditions. For each pose candidate, an image candidate is detected within the image space using the 3D model data and a distance from the image candidate to an input image is determined. Corresponding to the image candidate whose distance is smallest, one of the pose candidates is selected. The image space is preferably created from each of a set of pose variants of each pose candidate.

30 citations

Book ChapterDOI
01 Nov 2008
TL;DR: Visual learning methods based on eigenimage analysis have been proposed as an alternative solution to address the object recognition and pose estimation for objects with complex appearances in highly adaptive manufacturing environments.
Abstract: In recent times the presence of vision and robotic systems in industry has become commonplace, but in spite of many achievements a large range of industrial tasks still remain unsolved due to the lack of flexibility of the vision systems when dealing with highly adaptive manufacturing environments. An important task found across a broad range of modern flexible manufacturing environments is the need to present parts to automated machinery from a supply bin. In order to carry out grasping and manipulation operations safely and efficiently we need to know the identity, location and spatial orientation of the objects that lie in an unstructured heap in a bin. Historically, the bin picking problem was tackled using mechanical vibratory feeders where the vision feedback was unavailable. This solution is prone to parts jamming and, more importantly, such feeders are highly dedicated. In this regard if a change in the manufacturing process is required, the changeover may include an extensive re-tooling and a total revision of the system control strategy (Kelley et al., 1982). Due to these disadvantages modern bin picking systems perform grasping and manipulation operations using vision feedback (Yoshimi & Allen, 1994). Vision based robotic bin picking has been the subject of research since the introduction of the automated vision controlled processes in industry and a review of existing systems indicates that none of the proposed solutions were able to solve this classic vision problem in its generality. One of the main challenges facing such a bin picking system is its ability to deal with overlapping objects. The object recognition in cluttered scenes is the main objective of these systems and early approaches attempted to perform bin picking operations for similar objects that are jumbled together in an unstructured heap using no knowledge about the pose or geometry of the parts (Birk et al., 1981).
While these assumptions may be acceptable for a restricted number of applications, in most practical cases a flexible system must deal with more than one type of object with a wide scale of shapes. A flexible bin picking system has to address three difficult problems: scene interpretation, object recognition and pose estimation. Initial approaches to these tasks were based on modeling parts using the 2D surface representations. Typical 2D representations include invariant shape descriptors (Zisserman et al., 1994), algebraic curves (Tarel & Cooper, 2000), conics (Bolles & Horaud, 1986; Forsyth et al., 1991) and appearance based models (Murase & Nayar, 1995; Ohba & Ikeuchi, 1997). These systems are generally better suited to planar object recognition and they are not able to deal with severe viewpoint distortions or objects with complex shapes/textures. Also the spatial orientation cannot be robustly estimated for objects with free-form contours. To address this limitation most bin picking systems attempt to recognize the scene objects and estimate their spatial orientation using the 3D information (Fan et al., 1989; Faugeras & Hebert, 1986). Notable approaches include the use of 3D local descriptors (Ansar & Daniilidis, 2003; Campbell & Flynn, 2001; Kim & Kak, 1991), polyhedra (Rothwell & Stern, 1996), generalized cylinders (Ponce et al., 1989; Zerroug & Nevatia, 1996), super-quadrics (Blane et al., 2000) and visual learning methods (Johnson & Hebert, 1999; Mittrapiyanuruk et al., 2004). The most difficult problem for 3D bin picking systems that are based on a structural description of the objects (local descriptors or 3D primitives) is the complex procedure required to perform the scene to model feature matching.
This procedure is usually based on complex graph-searching techniques and is increasingly more difficult when dealing with object occlusions, a situation when the structural description of the scene objects is incomplete. Visual learning methods based on eigenimage analysis have been proposed as an alternative solution to address the object recognition and pose estimation for objects with complex appearances. In this regard, Johnson and Hebert (Johnson & Hebert, 1999) developed an object recognition scheme that is able to identify multiple 3D objects in scenes affected by clutter and occlusion. They proposed an eigenimage analysis approach that is applied to match surface points using the spin image representation. The main attraction of this approach resides in the use of spin images that are local surface descriptors; hence they can be easily identified in real scenes that contain clutter and occlusions. This approach returns accurate results but the pose estimation cannot be inferred, as the spin images are local descriptors and they are not robust enough to capture the object orientation. In general the pose sampling for visual learning methods is a difficult problem to solve, as the number of views required to sample the full 6 degrees of freedom of the object pose is prohibitive. This issue was addressed in the paper by Edwards (Edwards, 1996) when he applied eigenimage analysis to a one-object scene and his approach was able to estimate the pose only in cases where the tilt angle was limited to 30 degrees with respect to the optical axis of the sensor. In this chapter we describe the implementation of a vision sensor for robotic bin picking where we attempt to eliminate the main problem faced by the visual learning methods, namely the pose sampling problem. This chapter is organized as follows. Section 2 outlines the overall system. Section 3 describes the implementation of the range sensor while Section 4 details the edge-based segmentation algorithm.
Section 5 presents the viewpoint correction algorithm that is applied to align the detected object surfaces perpendicular to the optical axis of the sensor. Section 6 describes the object recognition algorithm. This is followed in Section 7 by an outline of the pose estimation algorithm. Section 8 presents a number of experimental results illustrating the benefits of the approach outlined in this chapter.

9 citations


Cites methods from "An active, appearance-based approac..."

  • ...Edwards, J. (1996)....

    [...]

  • ...This issue was addressed in the paper by Edwards (Edwards, 1996) when he applied eigenimage analysis to a one-object scene and his approach was able to estimate the pose only in cases where the tilt angle was limited to 30 degrees with respect to the optical axis of the sensor....

    [...]

Journal ArticleDOI
14 Nov 2007
TL;DR: The feasibility of estimating the object pose using an approach that combines the standard eigenspace analysis technique with range data analysis is investigated and the proposed pose estimation scheme has been successfully applied to scenes defined by polyhedral objects.
Abstract: In this paper we present a novel method for estimating the object pose for 3D objects with well-defined planar surfaces. Specifically, we investigate the feasibility of estimating the object pose using an approach that combines the standard eigenspace analysis technique with range data analysis. In this sense, eigenspace analysis was employed to constrain one object rotation and reject surfaces that are not compatible with a model object. The remaining two object rotations are estimated by computing the normal to the surface from the range data. The proposed pose estimation scheme has been successfully applied to scenes defined by polyhedral objects and experimental results are reported.
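The range-data step described in this abstract can be illustrated with a minimal sketch. It assumes the planar surface is represented by just three 3-D points and that the sensor's optical axis is the z axis; the function names, the angle conventions and the three-point simplification are illustrative assumptions, not the authors' implementation (a practical system would more likely fit a plane to many range samples).

```python
import math

def surface_normal(p0, p1, p2):
    """Unit normal of the plane through three 3-D points (hypothetical helper)."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    # Cross product u x v is perpendicular to both in-plane vectors.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    n = [c / length for c in n]
    # Assumed convention: the normal points along +z (the optical axis).
    if n[2] < 0:
        n = [-c for c in n]
    return n

def tilt_angles(normal):
    """Two rotations (degrees) of the normal relative to the z axis.

    The sign conventions here are assumptions chosen for illustration.
    """
    rot_x = math.degrees(math.atan2(normal[1], normal[2]))
    rot_y = math.degrees(math.atan2(normal[0], normal[2]))
    return rot_x, rot_y

# A surface lying in the plane z = x, i.e. tilted by 45 degrees about the y axis.
normal = surface_normal((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0))
rot_x, rot_y = tilt_angles(normal)
print(rot_x, rot_y)
```

Under these assumptions the normal fixes two of the three rotations, which leaves only the rotation about the normal to be constrained by appearance matching.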

5 citations


Cites background or methods from "An active, appearance-based approac..."

  • ...Visual learning methods based on eigenimage analysis [5,11,20,22,26] have been also proposed to esti-...

    [...]

  • ...object’s pose by matching its appearance [5,11,20,24, 34]....

    [...]

  • ...This problem was specifically addressed in the paper by Edwards [11] where an active pre-normalisation scheme was applied to reduce the object space from 6 DOF to 3 DOF....

    [...]

  • ...For example to sample the object pose at a rate of 10 samples/DOF requires 10^6 images [11]....

    [...]
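
The image count quoted in the excerpts above follows from simple arithmetic if the pose space is sampled on a uniform grid: the number of views is the per-axis sample count raised to the number of degrees of freedom. The short sketch below works through the 6 DOF case and the 3 DOF case mentioned in the excerpts; the function name and the uniform-grid assumption are illustrative only.

```python
def views_required(samples_per_dof: int, dof: int) -> int:
    """Views needed to cover the pose space on a uniform grid (assumed scheme)."""
    return samples_per_dof ** dof

full = views_required(10, 6)     # full 6 DOF pose
reduced = views_required(10, 3)  # after reducing the object space to 3 DOF
print(full, reduced)             # prints "1000000 1000"
```

On this reading, reducing the object space from 6 DOF to 3 DOF cuts the number of training views by a factor of 1000.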

References
Book
01 Jan 1988
TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
Abstract: From the Publisher: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty. The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic. The author distinguishes syntactic and semantic approaches to uncertainty, and offers techniques, based on belief networks, that provide a mechanism for making semantics-based systems operational. Specifically, network-propagation techniques serve as a mechanism for combining the theoretical coherence of probability theory with modern demands of reasoning-systems technology: modular declarative inputs, conceptually meaningful inferences, and parallel distributed computation. Application areas include diagnosis, forecasting, image interpretation, multi-sensor fusion, decision support systems, plan recognition, planning, speech recognition; in short, almost every task requiring that conclusions be drawn from uncertain clues and incomplete information. Probabilistic Reasoning in Intelligent Systems will be of special interest to scholars and researchers in AI, decision theory, statistics, logic, philosophy, cognitive psychology, and the management sciences. Professionals in the areas of knowledge-based systems, operations research, engineering, and statistics will find theoretical and computational tools of immediate practical use. The book can also be used as an excellent text for graduate-level courses in AI, operations research, or applied probability.

15,671 citations

Proceedings ArticleDOI
03 Jun 1991
TL;DR: An approach to the detection and identification of human faces is presented, and a working, near-real-time face recognition system which tracks a subject's head and then recognizes the person by comparing characteristics of the face to those of known individuals is described.
Abstract: An approach to the detection and identification of human faces is presented, and a working, near-real-time face recognition system which tracks a subject's head and then recognizes the person by comparing characteristics of the face to those of known individuals is described. This approach treats face recognition as a two-dimensional recognition problem, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. Face images are projected onto a feature space ('face space') that best encodes the variation among known face images. The face space is defined by the 'eigenfaces', which are the eigenvectors of the set of faces; they do not necessarily correspond to isolated features such as eyes, ears, and noses. The framework provides the ability to learn to recognize new faces in an unsupervised manner.

5,489 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
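The pipeline described in this abstract (compress a set of training images to a low-dimensional eigenspace, project an unknown image, read the pose from its position relative to the training samples) can be sketched in a few lines. Everything below is a toy illustration under stated assumptions: `render` is a hypothetical stand-in for image acquisition, the eigenvectors are found by plain power iteration with deflation, and the pose is taken from the nearest training sample rather than from an interpolated manifold as the abstract describes.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def eigenimages(centered, k, iters=50, seed=0):
    """Top-k eigenvectors of X^T X (X = mean-centred images, one per row),
    found by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(centered[0])
    basis = []
    for _component in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _step in range(iters):
            coeffs = [dot(row, v) for row in centered]           # X v
            w = [sum(c * row[i] for c, row in zip(coeffs, centered))
                 for i in range(n)]                               # X^T (X v)
            for b in basis:                                       # remove found directions
                d = dot(w, b)
                w = [wi - d * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis

def project(image, mean, basis):
    """Coordinates of an image in the eigenspace."""
    centered = [p - m for p, m in zip(image, mean)]
    return [dot(centered, b) for b in basis]

def estimate_pose(image, mean, basis, train_coords, poses):
    """Pose of the training sample closest to the projected image."""
    q = project(image, mean, basis)
    dists = [math.dist(q, c) for c in train_coords]
    return poses[dists.index(min(dists))]

def render(angle_deg, n_pixels=16):
    """Hypothetical 16-pixel 'image' of an object rotated by angle_deg."""
    t = math.radians(angle_deg)
    return [math.cos(t + 2 * math.pi * j / n_pixels) for j in range(n_pixels)]

poses = list(range(0, 360, 10))                      # training poses, 10-degree steps
train = [render(a) for a in poses]
mean = [sum(col) / len(train) for col in zip(*train)]
centered = [[p - m for p, m in zip(img, mean)] for img in train]
basis = eigenimages(centered, k=2)
train_coords = [project(img, mean, basis) for img in train]

estimate = estimate_pose(render(42), mean, basis, train_coords, poses)
print(estimate)
```

With 10-degree training steps the estimate for a 42-degree view snaps to the nearest sampled pose; finer accuracy would need denser sampling or interpolation along the manifold.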

2,037 citations

Book
01 Dec 1993

696 citations

Journal ArticleDOI
TL;DR: The approach uses two different types of primitives for matching: small surface patches, where differential properties can be reliably computed, and lines corresponding to depth or orientation discontinuities, which are represented by splashes and 3-D curves, respectively.
Abstract: The authors present an approach for the recognition of multiple 3-D object models from 3-D scene data. The approach uses two different types of primitives for matching: small surface patches, where differential properties can be reliably computed, and lines corresponding to depth or orientation discontinuities. These are represented by splashes and 3-D curves, respectively. It is shown how both of these primitives can be encoded by a set of super segments, consisting of connected linear segments. These super segments are entered into a table and provide the essential mechanism for fast retrieval and matching. The issues of robustness and stability of the features are addressed in detail. The acquisition of the 3-D models is performed automatically by computing splashes in highly structured areas of the objects and by using boundary and surface edges for the generation of 3-D curves. The authors present results with the current system (3-D object recognition based on super segments) and discuss further extensions.

577 citations