
Showing papers on "Pose published in 2002"


Journal ArticleDOI
TL;DR: In this article, the authors categorize and evaluate face detection algorithms and discuss relevant issues such as data collection, evaluation metrics and benchmarking, and conclude with several promising directions for future research.
Abstract: Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of its 3D position, orientation and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

3,894 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: Control methods for an autonomous four-rotor helicopter, called a quadrotor, using visual feedback as the primary sensor are presented, and initial flight experiments are presented where the helicopter is restricted to vertical and yaw motions.
Abstract: We present control methods for an autonomous four-rotor helicopter, called a quadrotor, using visual feedback as the primary sensor. The vision system uses a ground camera to estimate the pose (position and orientation) of the helicopter. Two methods of control are studied - one using a series of mode-based, feedback linearizing controllers, and the other using a backstepping-like control law. Various simulations of the model demonstrate the implementation of the feedback linearization and backstepping controllers. Finally, we present initial flight experiments where the helicopter is restricted to vertical and yaw motions.

513 citations


Proceedings ArticleDOI
11 Aug 2002
TL;DR: An interactive vision system for a robot that finds an object specified by a user and brings it to the user and the user may provide additional information via speech such as pointing out mistakes and choosing the correct object from multiple candidates.
Abstract: This paper describes an interactive vision system for a robot that finds an object specified by a user and brings it to the user. The system first registers object models automatically. When the user specifies an object, the system tries to recognize the object automatically. When the recognition result is shown to the user, the user may provide additional information via speech such as pointing out mistakes, choosing the correct object from multiple candidates, or giving the relative position of the object. Based on the advice, the system tries again to recognize the object. Experiments are described using real-world refrigerator scenes.

241 citations


Book ChapterDOI
28 May 2002
TL;DR: A general framework is presented which allows for a novel set of linear solutions to the pose estimation problem for both n points and n lines and an analysis of the sensitivity of the algorithms to image noise is presented.
Abstract: Estimation of camera pose from an image of n points or lines with known correspondence is a thoroughly studied problem in computer vision. Most solutions are iterative and depend on nonlinear optimization of some geometric constraint, either on the world coordinates or on the projections to the image plane. For real-time applications we are interested in linear or closed-form solutions free of initialization. We present a general framework which allows for a novel set of linear solutions to the pose estimation problem for both n points and n lines. We present a number of simulations which compare our results to two other recent linear algorithms as well as to iterative approaches. We conclude with tests on real imagery in an augmented reality setup. We also present an analysis of the sensitivity of our algorithms to image noise.

180 citations
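The paper's specific linear formulations are not reproduced here, but the classical Direct Linear Transform (DLT) conveys the flavor of a linear, initialization-free pose solution from point correspondences. The sketch below is illustrative only (the function name and synthetic setup are not from the paper): it recovers a 3×4 projection matrix from eight noiseless 3D-2D correspondences via SVD.

```python
import numpy as np

def dlt_pose(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P from n >= 6 point
    correspondences by the Direct Linear Transform (DLT).
    Each correspondence contributes two linear equations in the
    12 entries of P; the solution is the right singular vector
    associated with the smallest singular value."""
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    P = Vt[-1].reshape(3, 4)
    return P / np.linalg.norm(P[2, :3])  # fix scale: unit third row

# Synthetic check: project known points with a known P, then recover it.
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1, 1, (8, 3)) + [0, 0, 5]   # points in front of camera
P_true = np.hstack([np.eye(3), [[0.1], [0.2], [0.0]]])
proj = (P_true @ np.hstack([pts3d, np.ones((8, 1))]).T).T
pts2d = proj[:, :2] / proj[:, 2:]
P_est = dlt_pose(pts3d, pts2d)
```

With noise-free data the reprojection error of `P_est` is at machine precision; the paper's contribution lies in more constrained linear solutions that also handle lines.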


Book ChapterDOI
28 May 2002
TL;DR: A new algorithm, called SoftPOSIT, for determining the pose of a 3D object from a single 2D image in the case that correspondences between model points and image points are unknown, which has a run-time complexity that is better than previous methods by a factor equal to the number of image points.
Abstract: The problem of pose estimation arises in many areas of computer vision, including object recognition, object tracking, site inspection and updating, and autonomous navigation using scene models. We present a new algorithm, called SoftPOSIT, for determining the pose of a 3D object from a single 2D image in the case that correspondences between model points and image points are unknown. The algorithm combines Gold's iterative SoftAssign algorithm [19, 20] for computing correspondences and DeMenthon's iterative POSIT algorithm [13] for computing object pose under a full-perspective camera model. Our algorithm, unlike most previous algorithms for this problem, does not have to hypothesize small sets of matches and then verify the remaining image points. Instead, all possible matches are treated identically throughout the search for an optimal pose. The performance of the algorithm is extensively evaluated in Monte Carlo simulations on synthetic data under a variety of levels of clutter, occlusion, and image noise. These tests show that the algorithm performs well in a variety of difficult scenarios, and empirical evidence suggests that the algorithm has a run-time complexity that is better than previous methods by a factor equal to the number of image points. The algorithm is being applied to the practical problem of autonomous vehicle navigation in a city through registration of 3D architectural models of buildings to images obtained from an on-board camera.

166 citations
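SoftPOSIT's correspondence step builds on Gold's SoftAssign. As a rough illustration only (this is not the authors' exact formulation, and the parameter choices are made up), the sketch below turns a distance matrix between image points and projected model points into a soft assignment matrix, using an exponential score, a slack row/column for unmatched points, and Sinkhorn row/column normalization:

```python
import numpy as np

def softassign(dist, beta, iters=30):
    """One SoftAssign correspondence update: turn a distance matrix
    between image points (rows) and projected model points (columns)
    into an approximately doubly stochastic assignment matrix.  An
    extra 'slack' row and column absorb points with no match
    (clutter / occlusion)."""
    m, n = dist.shape
    M = np.exp(-beta * dist)
    M = np.pad(M, ((0, 1), (0, 1)), constant_values=1e-3)  # slack row/col
    for _ in range(iters):  # Sinkhorn: alternate row/column normalization
        M = M / M.sum(axis=1, keepdims=True)
        M = M / M.sum(axis=0, keepdims=True)
    return M[:m, :n]

# Toy example: image point i should match model point i
# (small distance on the diagonal, large elsewhere).
dist = np.full((3, 3), 5.0)
np.fill_diagonal(dist, 0.1)
M = softassign(dist, beta=2.0)
```

In the full algorithm this soft correspondence matrix and the POSIT pose update are alternated while `beta` is annealed, so the assignments harden as the pose converges.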


Patent
Jiebo Luo1
13 Feb 2002
TL;DR: In this paper, a method for determining the orientation of a digital image is presented, including the steps of: employing a semantic object detection method to detect the presence and orientation of an object, employing a scene layout detection method to detect the orientation of a scene layout, and employing an arbitration method to produce an estimate of the image orientation from the orientation of the detected object and the detected orientation of the scene layout.
Abstract: A method for determining the orientation of a digital image, includes the steps of: employing a semantic object detection method to detect the presence and orientation of a semantic object; employing a scene layout detection method to detect the orientation of a scene layout; and employing an arbitration method to produce an estimate of the image orientation from the orientation of the detected semantic object and the detected orientation of the scene layout.

126 citations


Proceedings ArticleDOI
30 Sep 2002
TL;DR: A general system that tracks the position and orientation of a camera observing a scene without visual markers and can employ any available feature tracking and pose estimation system for learning and tracking is described.
Abstract: Estimating the pose of a camera (virtual or real) in which some augmentation takes place is one of the most important parts of an augmented reality (AR) system. Availability of powerful processors and fast frame grabbers have made vision-based trackers commonly used due to their accuracy as well as flexibility and ease of use. Current vision-based trackers are based on tracking of markers. The use of markers increases robustness and reduces computational requirements. However, their use can be very complicated, as they require certain maintenance. Direct use of scene features for tracking, therefore, is desirable. To this end, we describe a general system that tracks the position and orientation of a camera observing a scene without any visual markers. Our method is based on a two-stage process. In the first stage, a set of features is learned with the help of an external tracking system while in action. The second stage uses these learned features for camera tracking when the system in the first stage decides that it is possible to do so. The system is very general so that it can employ any available feature tracking and pose estimation system for learning and tracking. We experimentally demonstrate the viability of the method in real-life examples.

126 citations


Patent
24 May 2002
TL;DR: In this article, a method of three-dimensional handling of an object by a robot using a tool and one camera mounted on the robot is disclosed in which at least six target features which are normal features of the object are selected on the object.
Abstract: A method of three-dimensional handling of an object by a robot using a tool and one camera mounted on the robot is disclosed in which at least six target features which are normal features of the object are selected on the object.

122 citations


Proceedings Article
04 Feb 2002
TL;DR: This work introduces an entirely continuous formulation which enforces model estimation consistency by means of an attraction/explanation silhouette-based term pair and presents a novel similarity measure (likelihood) for estimating three-dimensional human pose from image silhouettes in model-based vision applications.
Abstract: We present a novel similarity measure (likelihood) for estimating three-dimensional human pose from image silhouettes in model-based vision applications. One of the challenges in such approaches is the construction of a model-to-image likelihood that truly reflects the good configurations of the problem. This is hard, commonly due to violation of the consistency principle, which introduces spurious, unrelated peaks/minima that make the search for model localization difficult. We introduce an entirely continuous formulation which enforces model estimation consistency by means of an attraction/explanation silhouette-based term pair. We subsequently show how the proposed method provides significant consolidation and improved attraction zone around the desired likelihood configurations and elimination of some of the spurious ones. Finally, we present a skeleton-based smoothing method for the image silhouettes that stabilizes and accelerates the search process.

115 citations


Journal ArticleDOI
TL;DR: A new approach for estimating and tracking three-dimensional pose of a human face from the face images obtained from a single monocular view with full perspective projection, which is more robust than the existing feature-based approaches for face pose estimation.

111 citations


Proceedings Article
01 Jan 2002
TL;DR: This paper proposes a new technique for simultaneous registration of several images and a geometric model, based on 2D-3D edge correspondence and the epipolar constraint between images.
Abstract: Texture mapping on scanned objects, that is, the method to map current color images on a 3D geometric model measured by a range sensor, is a key technique of photometric modeling for virtual reality. Usually range and color images are obtained from different viewing positions, through two independent range and color sensors. Thus, in order to map those color images on the geometric model, it is necessary to determine relative relations between these two viewpoints. In this paper, we propose a new calibration method for the texture mapping; the method utilizes reflectance images and iterative pose estimation based on a robust M-estimator. Moreover, since a 2D texture image taken from one viewing point is a partial view of an object, several images must be mapped onto the object in order to cover the entire 3D geometric model. In this paper, we propose a new technique for simultaneous registration of several images and the geometric model based on 2D-3D edge correspondence and the epipolar constraint between images.

Journal ArticleDOI
TL;DR: A new scheme that enables us to apply a filter mask (or a convolution filter) to orientation data to give time-domain filters for orientation data that are computationally efficient and satisfy such important properties as coordinate invariance, time invariance and symmetry.
Abstract: Capturing live motion has gained considerable attention in computer animation as an important motion generation technique. Canned motion data are comprised of both position and orientation components. Although a great number of signal processing methods are available for manipulating position data, the majority of these methods cannot be generalized easily to orientation data due to the inherent nonlinearity of the orientation space. In this paper, we present a new scheme that enables us to apply a filter mask (or a convolution filter) to orientation data. The key idea is to transform the orientation data into their analogues in a vector space, to apply a filter mask on them, and then to transform the results back to the orientation space. This scheme gives time-domain filters for orientation data that are computationally efficient and satisfy such important properties as coordinate invariance, time invariance and symmetry. Experimental results indicate that our scheme is useful for various purposes, including smoothing and sharpening.
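The transform/filter/transform-back idea can be sketched with unit quaternions and the log/exp maps. The version below is a deliberate simplification (it filters absolute orientations in a single tangent space, whereas the paper works with relative displacements to obtain coordinate invariance; all names are illustrative):

```python
import numpy as np

def quat_log(q):
    """Logarithm of a unit quaternion (w, x, y, z) -> 3-vector
    (rotation axis scaled by half-angle)."""
    w, v = q[0], q[1:]
    s = np.linalg.norm(v)
    if s < 1e-12:
        return np.zeros(3)
    return v / s * np.arctan2(s, w)

def quat_exp(r):
    """Inverse of quat_log: 3-vector -> unit quaternion."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(theta)], np.sin(theta) * r / theta])

def filter_orientations(quats, mask):
    """Apply a filter mask to a sequence of unit quaternions:
    map each into the tangent (vector) space via the log map,
    convolve the resulting 3-vectors with the mask, and map the
    filtered vectors back with the exp map."""
    vecs = np.array([quat_log(q) for q in quats])
    k = len(mask) // 2
    padded = np.pad(vecs, ((k, k), (0, 0)), mode='edge')
    out = np.array([np.tensordot(mask, padded[i:i + len(mask)], axes=(0, 0))
                    for i in range(len(quats))])
    return np.array([quat_exp(r) for r in out])
```

By construction every output quaternion has unit norm, so the filtered sequence remains a valid orientation trajectory; a mask like `[0.25, 0.5, 0.25]` smooths, while an unsharp mask sharpens.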

Journal ArticleDOI
TL;DR: This work extends SVMs to model the appearance of human faces which undergo non-linear change across multiple views and uses inherent factors in the nature of the input images and the SVM classification algorithm to perform both multi-view face detection and pose estimation.

Proceedings ArticleDOI
11 Aug 2002
TL;DR: This work demonstrates their use in head pose estimation from head and shoulders video sequences and demonstrates that very few eigenspaces are necessary for a rough estimate of head pose.
Abstract: View based eigenspaces can improve the performance of face recognition algorithms. In this work, we demonstrate their use in head pose estimation from head and shoulders video sequences. Our method compares the projected energies of the test image in multiple eigenspaces. We also demonstrate that very few eigenspaces are necessary for a rough estimate of head pose. The method is robust and computationally inexpensive.
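The compare-projected-energies idea can be sketched with per-pose PCA subspaces: the estimated pose is the class whose eigenspace leaves the smallest residual (equivalently, retains the most projected energy). A minimal sketch, with hypothetical data shapes and function names:

```python
import numpy as np

def build_eigenspace(images, n_components=2):
    """PCA eigenspace for one pose class: mean plus top eigenvectors
    of the centered training images (one flattened image per row)."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def estimate_pose(test_image, eigenspaces):
    """Pick the pose whose eigenspace reconstructs the test image best,
    i.e. leaves the smallest residual energy outside the subspace."""
    best, best_err = None, np.inf
    for pose, (mean, basis) in eigenspaces.items():
        d = test_image - mean
        coeffs = basis @ d                       # project into subspace
        err = np.linalg.norm(d - basis.T @ coeffs)  # residual energy
        if err < best_err:
            best, best_err = pose, err
    return best
```

The same structure scales to several coarse pose bins; as the abstract notes, few eigenvectors per bin suffice for a rough estimate.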

Journal ArticleDOI
TL;DR: A new method for 3D object recognition which uses segment-based stereo vision to identify objects in a cluttered environment and its position and orientation are determined accurately enabling a robot to pick up the object and manipulate it.
Abstract: We propose a new method for 3D object recognition which uses segment-based stereo vision. An object is identified in a cluttered environment and its position and orientation (6 dof) are determined accurately enabling a robot to pick up the object and manipulate it. The object can be of any shape (planar figures, polyhedra, free-form objects) and partially occluded by other objects. Segment-based stereo vision is employed for 3D sensing. Both CAD-based and sensor-based object modeling subsystems are available. Matching is performed by calculating candidates for the object position and orientation using local features, verifying each candidate, and improving the accuracy of the position and orientation by an iteration method. Several experimental results are presented to demonstrate the usefulness of the proposed method.

Proceedings ArticleDOI
Lisa M. Brown1, Yingli Tian1
05 Dec 2002
TL;DR: This work evaluates two coarse pose estimation schemes, based on (1) a probabilistic model approach and (2) a neural network approach, and compares the results of the two techniques for varying resolution, head localization accuracy and required pose accuracy.
Abstract: For many practical applications, it is sufficient to estimate a coarse head pose to infer gaze direction. Indeed, for any application in which the camera is situated unobtrusively in an overhead corner, the only possible inference is coarse pose because of the limitations of the quality and resolution of the incoming data. However, the vast majority of research in head pose estimation deals with tracking full rigid body motion (6 degrees of freedom) for a limited range of motion (typically ±45 degrees out-of-plane) and relatively high resolution data (usually 64×64 or more). We review the smaller body of research on coarse pose estimation. This work involves image-based learning, estimation of a wide range of pose, and is capable of real-time performance for low-resolution imagery. We evaluate two coarse pose estimation schemes, based on (1) a probabilistic model approach and (2) a neural network approach. We compare the results of the two techniques for varying resolution, head localization accuracy and required pose accuracy. We conclude with implementation specifications for resolution and localization accuracy depending on system accuracy requirements.

Journal ArticleDOI
TL;DR: This paper first introduces the Gabor wavelet network (GWN) as a model-based approach for effective and efficient object representation and presents an approach for the estimation of head pose based on the GWNs.

Journal ArticleDOI
TL;DR: The registration problem for interactive AR applications is addressed, and an efficient solution to real-time camera tracking for scenes that contain planar structures is proposed; the method handles many types of scenes.
Abstract: We address the registration problem for interactive AR applications. Such applications require a real-time registration process. Although the registration problem has received a lot of attention in the computer vision community, it's far from being solved. Ideally, an AR system should work in all environments without the need to prepare the scene ahead of time, and users should be able to walk anywhere they want. In the past, several AR systems have achieved accurate and fast tracking and registration, putting dots over objects and tracking the dots with a camera. We can also achieve registration by identifying features in the scene that we can carefully measure for real-world coordinates. However, such methods restrict the system's flexibility. Hence, we need to investigate registration methods that work in unprepared environments and reduce the need to know the objects' geometry in the scene. We propose an efficient solution to real-time camera tracking for scenes that contain planar structures. Our method handles many types of scenes. We show that our system is reliable and we can use it for real-time applications. We also present results demonstrating real-time camera tracking on indoor and outdoor scenes.

Proceedings ArticleDOI
07 Nov 2002
TL;DR: The main novelty of the proposed system, with respect to other 3D gesture recognition techniques, is the capability for robust recognition of complex hand postures such as those encountered in sign language alphabets.
Abstract: In this paper a gesture recognition system using 3D data is described. The system relies on a novel 3D sensor that generates a dense range image of the scene. The main novelty of the proposed system, with respect to other 3D gesture recognition techniques, is the capability for robust recognition of complex hand postures such as those encountered in sign language alphabets. This is achieved by explicitly employing 3D hand features. Moreover, the proposed approach does not rely on colour information, and guarantees robust segmentation of the hand under various illumination conditions and scene content. Several novel 3D image analysis algorithms are presented covering the complete processing chain: 3D image acquisition, arm segmentation, hand-forearm segmentation, hand pose estimation, 3D feature extraction, and gesture classification. The proposed system is tested in an application scenario involving the recognition of sign-language postures.

Journal ArticleDOI
TL;DR: A unified approach to optimal pose trajectory planning for robot manipulators in Cartesian space through a genetic algorithm (GA) enhanced optimization of the pose ruled surface is presented.

Proceedings ArticleDOI
TL;DR: A vision based parking assistant system for autonomous parking which detects parking lots and precisely determines their pose using a stereo camera system is described.
Abstract: Parking is a challenging task for many drivers. A vision based parking assistant system for autonomous parking which detects parking lots and precisely determines their pose using a stereo camera system is described. Careful calibration of the cameras and the fitting of vehicle models to the stereo data in 3D space enable a high precision estimation of the parameters which are necessary for autonomous parking.

Patent
02 Jul 2002
TL;DR: In this article, a method and system for tracking a position and orientation (pose) of a camera using real scene features is presented, which includes the steps of capturing a video sequence by the camera, extracting features from the video sequence, estimating a first pose of the camera by an external tracking system, constructing a model of the features from first pose, and estimating a second pose by tracking the model of features.
Abstract: A method and system for tracking a position and orientation (pose) of a camera using real scene features is provided. The method includes the steps of capturing a video sequence by the camera; extracting features from the video sequence; estimating a first pose of the camera by an external tracking system; constructing a model of the features from the first pose; and estimating a second pose by tracking the model of the features, wherein after the second pose is estimated, the external tracking system is eliminated. The system includes an external tracker for estimating a reference pose; a camera for capturing a video sequence; a feature extractor for extracting features from the video sequence; a model builder for constructing a model of the features from the reference pose; and a pose estimator for estimating a pose of the camera by tracking the model of the features.

Journal ArticleDOI
TL;DR: A novel approach for creating a three-dimensional (3-D) face structure from multiple image views of a human face taken at a priori unknown poses by appropriately morphing a generic 3-D face into the specific face structure is described.
Abstract: We describe a novel approach for creating a three-dimensional (3-D) face structure from multiple image views of a human face taken at a priori unknown poses by appropriately morphing a generic 3-D face. A cubic explicit polynomial in 3-D is used to morph a generic face into the specific face structure. The 3-D face structure allows for accurate pose estimation as well as the synthesis of virtual images to be matched with a test image for face identification. Estimation of a person's 3-D face structure and pose is achieved through the use of a distance map metric. This distance map residual error (geometric-based face classifier) and the image intensity residual error are fused in identifying a person in the database from one or more arbitrary image view(s). Experimental results are shown on simulated data in the presence of noise, as well as for images of real faces, and promising results are obtained.

Journal ArticleDOI
TL;DR: This work model the ground plane based on disparity and analyze its uncertainty in order to develop a stereo-based mobility aid for the partially sighted, and shows that obstacles and curbs are detected.

Journal ArticleDOI
07 Aug 2002
TL;DR: The proposed WLS approach yields optimal estimates in the least-squares sense, is applicable to heterogeneous geometrical features decomposed into points and lines, and has an O(N) computation time.
Abstract: We propose a weighted least-squares (WLS) algorithm for optimal pose estimation of mobile robots using geometrical maps as environment models. Pose estimation is achieved from feature correspondences in a nonlinear framework without linearization. The proposed WLS approach yields optimal estimates in the least-squares sense, is applicable to heterogeneous geometrical features decomposed into points and lines, and has an O(N) computation time.
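For the planar, point-feature special case, a weighted least-squares pose has a well-known closed form that needs no linearization: weighted centroids give the translation, and an atan2 of weighted cross/dot correlation sums gives the rotation. The sketch below is a generic illustration of that closed form, not the paper's full point-and-line algorithm:

```python
import numpy as np

def wls_pose_2d(model_pts, observed_pts, weights):
    """Weighted least-squares 2D pose (theta, t) aligning model points
    to observed points, minimizing sum_i w_i ||R m_i + t - o_i||^2.
    Closed form: center both point sets at their weighted centroids,
    then theta = atan2(sum w (m x o), sum w (m . o))."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mc = (w[:, None] * model_pts).sum(axis=0)     # weighted centroids
    oc = (w[:, None] * observed_pts).sum(axis=0)
    m = model_pts - mc
    o = observed_pts - oc
    s = (w * (m[:, 0] * o[:, 1] - m[:, 1] * o[:, 0])).sum()  # cross terms
    c = (w * (m * o).sum(axis=1)).sum()                      # dot terms
    theta = np.arctan2(s, c)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = oc - R @ mc
    return theta, t
```

Down-weighting unreliable correspondences simply shrinks their `w_i`; with noise-free data the true pose is recovered exactly.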

Journal ArticleDOI
TL;DR: A survey of classical and modern methods for the determination of the exterior parameters in photogrammetry, available for some of them as software packages with practical examples on the Internet can be found in this article.
Abstract: The determination of the attitude, the position and the intrinsic geometric characteristics of the camera is known as the fundamental photogrammetric problem. It can be summarised in the determination of camera interior and exterior orientation parameters, as well as the determination of 3D coordinates of object points. The term "exterior orientation" of an image refers to its position and orientation related to an exterior coordinate system. Several methods can be applied to determine the parameters of the orientation of one, two or more photos. The orientation can be processed in steps (as relative and absolute orientation) but simultaneous methods (as bundle adjustments) are now available in a majority of software packages. Several methods have also been developed for the orientation of single images. They are based in general on geometric and topologic characteristics of imaged objects. In this paper we present a survey of classical and modern methods for the determination of the exterior parameters in photogrammetry, available for some of them as software packages with practical examples on the Internet. The presented methods are classified in three principal groups. In the first one, a selection of approximate methods for applications that do not require high accuracy is presented. They are also used to calculate values required for iterative methods. In the second group, standard point-based methods derived from collinearity, coplanarity or coangularity conditions are briefly reviewed, with an extension to line-based approaches. The third group represents orientation methods based on constraints and projective geometry concepts, which are of increasing interest to photogrammetrists. In the last section, the paper gives a summary of existing strategies for automatic exterior orientation in aerial photogrammetry.

Patent
30 Aug 2002
TL;DR: In this article, the authors proposed a hierarchical model for the recognition of objects in an image, where the objects may consist of an arbitrary number of parts that are allowed to move with respect to each other.
Abstract: The present invention provides a method for the recognition of objects in an image, where the objects may consist of an arbitrary number of parts that are allowed to move with respect to each other. In the offline phase the invention automatically learns the relative movements of the single object parts from a sequence of example images and builds a hierarchical model that incorporates a description of the single object parts, the relations between the parts, and an efficient search strategy. This is done by analyzing the pose variations (e.g., variations in position, orientation, and scale) of the single object parts in the example images. The poses can be obtained by an arbitrary similarity measure for object recognition, e.g., normalized cross correlation, Hausdorff distance, the generalized Hough transform, or a modification thereof. In the online phase the invention uses the hierarchical model to efficiently find the entire object in the search image. During the online phase only valid instances of the object are found, i.e., the object parts are not searched for in the entire image but only in a restricted portion of parameter space that is defined by the relations between the object parts within the hierarchical model, which facilitates an efficient search and makes a subsequent validation step unnecessary.

Journal ArticleDOI
TL;DR: This paper describes and analyzes techniques which have been developed for object representation and recognition and proposes a set of specifications, which all object recognition systems should strive to meet.

Journal ArticleDOI
TL;DR: A new approach to parts orienting through the manipulation of pose distributions is presented and simulation and experimental results that show how dynamic simulation can be used to find optimal shapes and drop heights for a given part are presented.
Abstract: For assembly tasks parts often have to be oriented before they can be put in an assembly. The results presented in this paper are a component of the automated design of parts orienting devices. The focus is on orienting parts with minimal sensing and manipulation. We present a new approach to parts orienting through the manipulation of pose distributions. Through dynamic simulation we can determine the pose distribution for an object being dropped from an arbitrary height on an arbitrary surface. By varying the drop height and the shape of the support surface we can find the initial conditions that will result in a pose distribution with minimal entropy. We are trying to uniquely orient a part with high probability just by varying the initial conditions. We will derive a condition on the pose and velocity of a simple planar object in contact with a sloped surface that will allow us to quickly determine the final resting configuration of the object. This condition can then be used to quickly compute the pose distribution. We also present simulation and experimental results that show how dynamic simulation can be used to find optimal shapes and drop heights for a given part. KEY WORDS: parts orienting, feeder design, pose statistics
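The minimal-entropy objective over drop conditions can be illustrated with a small helper that scores a set of simulated resting poses: zero entropy means every drop ends in the same pose, which is the goal of the search over drop heights and surface shapes. The function name and outcome labels below are hypothetical:

```python
import math
from collections import Counter

def pose_entropy(outcomes):
    """Shannon entropy (in bits) of a discrete pose distribution,
    estimated from a list of observed resting poses produced by
    repeated simulated drops.  Lower is better for a parts feeder:
    zero means the part always comes to rest in one pose."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A designer would evaluate `pose_entropy` on the simulated outcomes for each candidate (drop height, surface shape) pair and pick the minimizer.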

Proceedings ArticleDOI
30 Sep 2002
TL;DR: This work proposes a semiautomatic image-based calibration method requiring only minimal interaction within the workflow, and shows reasonably good accuracy and convergence with workflow interruption of less than one second per incremental step.
Abstract: Industrial augmented reality (AR) applications require fast, robust, and precise tracking. In environments where conventional high-end tracking systems cannot be applied for certain reasons, marker-based tracking can be used with success as a substitute if care is taken about (1) calibration and (2) run-time tracking fidelity. In out-of-the-laboratory environments multi-marker tracking is needed because the pose estimated from a single marker is not stable enough. The overall pose estimation can be dramatically improved by fusing information from several markers fixed relative to each other compared to a single marker only. To achieve results applicable in an industrial context relative marker poses need to be properly calibrated. We propose a semiautomatic image-based calibration method requiring only minimal interaction within the workflow. Our method can be used off-line, or preferably incrementally online. When used online, our method shows reasonably good accuracy and convergence with workflow interruption of less than one second per incremental step. Thus, it can be interactively used. We illustrate our method with an industrial application scenario.