
Showing papers on "Pose" published in 2007


Proceedings ArticleDOI
10 Apr 2007
TL;DR: The primary contribution of this work is the derivation of a measurement model that is able to express the geometric constraints that arise when a static feature is observed from multiple camera poses, and is optimal, up to linearization errors.
Abstract: In this paper, we present an extended Kalman filter (EKF)-based algorithm for real-time vision-aided inertial navigation. The primary contribution of this work is the derivation of a measurement model that is able to express the geometric constraints that arise when a static feature is observed from multiple camera poses. This measurement model does not require including the 3D feature position in the state vector of the EKF and is optimal, up to linearization errors. The vision-aided inertial navigation algorithm we propose has computational complexity only linear in the number of features, and is capable of high-precision pose estimation in large-scale real-world environments. The performance of the algorithm is demonstrated in extensive experimental results, involving a camera/IMU system localizing within an urban area.

1,435 citations
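The key step of this measurement model is well suited to a short illustration: the feature's position error is eliminated from the stacked, linearized reprojection equations by projecting onto the left null space of the feature Jacobian, so only constraints on the camera poses enter the EKF update. Below is a minimal numpy sketch of that projection (not the authors' code; the shapes, the 15-dimensional pose state and the random inputs are illustrative assumptions):

```python
import numpy as np

def nullspace_project(r, H_x, H_f):
    """Marginalize a feature out of a stacked, linearized measurement model.

    r   : (2M,)   stacked reprojection residuals over M camera poses
    H_x : (2M, N) Jacobian with respect to the camera-pose state
    H_f : (2M, 3) Jacobian with respect to the feature's 3D position
    """
    # Columns of U beyond rank(H_f) = 3 span the left null space of H_f.
    U, _, _ = np.linalg.svd(H_f)
    A = U[:, 3:]                  # (2M, 2M-3), satisfies A.T @ H_f == 0
    r0 = A.T @ r                  # residual independent of the feature error
    H0 = A.T @ H_x                # measurement Jacobian for the EKF update
    return r0, H0

# Example with M = 4 camera poses and an assumed 15-dimensional pose state.
M, N = 4, 15
rng = np.random.default_rng(0)
r0, H0 = nullspace_project(rng.normal(size=2 * M),
                           rng.normal(size=(2 * M, N)),
                           rng.normal(size=(2 * M, 3)))
print(r0.shape, H0.shape)         # (5,) (5, 15)
```

Each feature observed from M poses thus contributes 2M - 3 constraints to the filter without its coordinates ever being added to the state vector.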


Journal ArticleDOI
TL;DR: The characteristics of human motion analysis are discussed to highlight trends in the domain and to point out limitations of the current state of the art.

908 citations


Journal ArticleDOI
TL;DR: A literature review of the second research direction, which aims to capture the real 3D motion of the hand, a very challenging problem in the context of HCI.

901 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: Each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints, and the constraints on transitions between the synthetic poses are represented by a graph model called Action Net.
Abstract: 3D human pose recovery is considered a fundamental step in view-invariant human action recognition. However, inferring 3D poses from a single view is usually slow due to the large number of parameters that need to be estimated, and recovered poses are often ambiguous due to the perspective projection. We present an approach that does not explicitly infer 3D pose at each frame. Instead, from existing action models we search for a series of actions that best match the input sequence. In our approach, each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints. The constraints on transitions between the synthetic poses are represented by a graph model called Action Net. Given the input, silhouette matching between the input frames and the key poses is performed first using an enhanced Pyramid Match Kernel algorithm. The best matched sequence of actions is then tracked using the Viterbi algorithm. We demonstrate this approach on a challenging video set consisting of 15 complex action classes.

484 citations
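The decoding stage described in the abstract is a standard dynamic-programming computation. The toy sketch below runs Viterbi over an assumed stay-or-advance transition graph standing in for the Action Net, with random per-frame log scores in place of the Pyramid Match Kernel silhouette matches:

```python
import numpy as np

def viterbi(log_score, allowed):
    """log_score: (T, K) frame-versus-key-pose log scores.
    allowed: (K, K) boolean matrix, allowed[i, j] = transition i -> j."""
    T, K = log_score.shape
    trans = np.where(allowed, 0.0, -np.inf)     # hard transition constraints
    dp = log_score[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = dp[:, None] + trans              # (K, K): previous -> current
        back[t] = np.argmax(cand, axis=0)
        dp = cand[back[t], np.arange(K)] + log_score[t]
    path = [int(np.argmax(dp))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

K = 5
allowed = np.eye(K, dtype=bool) | np.eye(K, k=1, dtype=bool)  # stay or advance
scores = np.random.default_rng(1).normal(size=(12, K))
print(viterbi(scores, allowed))                 # monotone key-pose sequence
```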


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work proposes a novel and robust model to represent and learn generic 3D object categories, together with a framework in which learning requires minimal supervision compared to previous work.
Abstract: We propose a novel and robust model to represent and learn generic 3D object categories. We aim to solve the problem of true 3D object categorization for handling arbitrary rotations and scale changes. Our approach is to capture a compact model of an object category by linking together diagnostic parts of the objects from different viewing points. We emphasize that our "parts" are large and discriminative regions of the objects that are composed of many local invariant features. Instead of recovering a full 3D geometry, we connect these parts through their mutual homographic transformation. The resulting model is a compact summarization of both the appearance and geometry information of the object class. We propose a framework in which learning is done via minimal supervision compared to previous works. Our results on categorization show superior performance to state-of-the-art algorithms such as (Thomas et al., 2006). Furthermore, we have compiled a new 3D object dataset that consists of 10 different object categories. We have tested our algorithm on this dataset and have obtained highly promising results.

476 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: The alignment method improves performance on a face recognition task, both over unaligned images and over images aligned with a face alignment algorithm specifically developed for and trained on hand-labeled face images.
Abstract: Many recognition algorithms depend on careful positioning of an object into a canonical pose, so the position of features relative to a fixed coordinate system can be examined. Currently, this positioning is done either manually or by training a class-specialized learning algorithm with samples of the class that have been hand-labeled with parts or poses. In this paper, we describe a novel method to achieve this positioning using poorly aligned examples of a class with no additional labeling. Given a set of unaligned exemplars of a class, such as faces, we automatically build an alignment mechanism, without any additional labeling of parts or poses in the data set. Using this alignment mechanism, new members of the class, such as faces resulting from a face detector, can be precisely aligned for the recognition process. Our alignment method improves performance on a face recognition task, both over unaligned images and over images aligned with a face alignment algorithm specifically developed for and trained on hand-labeled face images. We also demonstrate its use on an entirely different class of objects (cars), again without providing any information about parts or pose to the learning algorithm.

375 citations
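The abstract does not spell out the alignment machinery, so the sketch below only illustrates the general flavor of unsupervised joint alignment: a congealing-style loop that repeatedly re-shifts each image to reduce the entropy of the binarized image stack. The data, the entropy criterion and the restriction to integer shifts are all illustrative assumptions:

```python
import numpy as np

def stack_entropy(imgs):
    """Mean per-pixel binary entropy of a stack of binarized images."""
    p = np.clip(np.mean(imgs, axis=0), 1e-6, 1 - 1e-6)
    return float(np.mean(-p * np.log(p) - (1 - p) * np.log(1 - p)))

def congeal(imgs, iters=5, shifts=(-1, 0, 1)):
    imgs = [img.copy() for img in imgs]
    for _ in range(iters):
        for i, img in enumerate(imgs):
            best = (stack_entropy(imgs), img)
            for dy in shifts:
                for dx in shifts:
                    cand = np.roll(np.roll(img, dy, 0), dx, 1)
                    imgs[i] = cand              # try the shifted image
                    e = stack_entropy(imgs)
                    if e < best[0]:
                        best = (e, cand)
            imgs[i] = best[1]                   # keep the best shift
    return imgs

rng = np.random.default_rng(2)
base = np.zeros((16, 16)); base[5:11, 5:11] = 1.0          # a square "object"
noisy = [np.roll(base, rng.integers(-2, 3), axis=rng.integers(0, 2))
         for _ in range(8)]
aligned = congeal(noisy)
print(round(stack_entropy(noisy), 3), "->", round(stack_entropy(aligned), 3))
```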


Journal ArticleDOI
TL;DR: In this article, a convolutional network is used to map images of faces to points on a low-dimensional manifold parametrized by pose, and images of non-faces to points far away from that manifold.
Abstract: We describe a novel method for simultaneously detecting faces and estimating their pose in real time. The method employs a convolutional network to map images of faces to points on a low-dimensional manifold parametrized by pose, and images of non-faces to points far away from that manifold. Given an image, detecting a face and estimating its pose is viewed as minimizing an energy function with respect to the face/non-face binary variable and the continuous pose parameters. The system is trained to minimize a loss function that drives correct combinations of labels and pose to be associated with lower energy values than incorrect ones. The system is designed to handle a very large range of poses without retraining. The performance of the system, tested on three standard data sets covering frontal views, rotated faces, and profiles, is comparable to previous systems that are designed to handle a single one of these data sets. We show that a system trained simultaneously for detection and pose estimation is more accurate on both tasks than similar systems trained for each task separately.

316 citations
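Inference in this formulation reduces to comparing an embedded point against a pose-parametrized manifold. The sketch below mimics that decision rule with stand-ins: the trained convolutional network is replaced by an assumed 3-D embedding, and the learned manifold by a circle parametrized by yaw:

```python
import numpy as np

def manifold_point(yaw):
    """Assumed embedding of pose: a circle in 3-D parametrized by yaw."""
    return np.array([np.cos(yaw), np.sin(yaw), 0.0])

def detect_and_estimate(z, threshold=0.5):
    """z: embedded image patch. Face if its energy (distance) is small."""
    yaws = np.linspace(-np.pi / 2, np.pi / 2, 181)
    dists = [np.linalg.norm(z - manifold_point(y)) for y in yaws]
    i = int(np.argmin(dists))
    return dists[i] < threshold, yaws[i], dists[i]

print(detect_and_estimate(manifold_point(0.3) + 0.05))   # near the manifold
print(detect_and_estimate(np.array([5.0, 5.0, 5.0])))    # far from it
```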


Proceedings ArticleDOI
29 Jul 2007
TL;DR: A system for inserting new objects into existing photographs by querying a vast image-based object library, pre-computed using a publicly available Internet object database, to shield the user from all of the arduous tasks typically involved in image compositing.
Abstract: We present a system for inserting new objects into existing photographs by querying a vast image-based object library, pre-computed using a publicly available Internet object database. The central goal is to shield the user from all of the arduous tasks typically involved in image compositing. The user is only asked to do two simple things: 1) pick a 3D location in the scene to place a new object; 2) select an object to insert using a hierarchical menu. We pose the problem of object insertion as a data-driven, 3D-based, context-sensitive object retrieval task. Instead of trying to manipulate the object to change its orientation, color distribution, etc. to fit the new image, we simply retrieve an object of a specified class that has all the required properties (camera pose, lighting, resolution, etc.) from our large object library. We present new automatic algorithms for improving object segmentation and blending, estimating true 3D object size and orientation, and estimating scene lighting conditions. We also present an intuitive user interface that makes object insertion fast and simple even for the artistically challenged.

287 citations


Proceedings ArticleDOI
11 Oct 2007
TL;DR: A robust real-time algorithm is presented that recognizes fingertips to reconstruct the six-degree-of-freedom camera pose relative to the user's outstretched hand; a hand pose model is constructed in a one-time calibration step by measuring the fingertip positions in the presence of ground-truth scale information.
Abstract: We present markerless camera tracking and user interface methodology for readily inspecting augmented reality (AR) objects in wearable computing applications. Instead of a marker, we use the human hand as a distinctive pattern that almost all wearable computer users have readily available. We present a robust real-time algorithm that recognizes fingertips to reconstruct the six-degree-of-freedom camera pose relative to the user's outstretched hand. A hand pose model is constructed in a one-time calibration step by measuring the fingertip positions in the presence of ground-truth scale information. Through frame-by-frame reconstruction of the camera pose relative to the hand, we can stabilize 3D graphics annotations on top of the hand, allowing the user to inspect such virtual objects conveniently from different viewing angles in AR. We evaluate our approach with regard to speed and accuracy, and compare it to state-of-the-art marker-based AR systems. We demonstrate the robustness and usefulness of our approach in an example AR application for selecting and inspecting world-stabilized virtual objects.

265 citations
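Once the hand model has been calibrated, the frame-by-frame pose computation is an instance of the perspective-n-point problem. A minimal OpenCV sketch with made-up fingertip coordinates and camera intrinsics (the paper's pipeline additionally detects and tracks the fingertips):

```python
import numpy as np
import cv2

# Assumed hand model: five fingertip positions (metres) in a hand-fixed frame.
tips_3d = np.array([[0.00, 0.09, 0.0], [0.03, 0.10, 0.0], [0.06, 0.09, 0.0],
                    [0.08, 0.07, 0.0], [-0.04, 0.05, 0.01]], dtype=np.float64)
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])

# Ground-truth pose, used here only to synthesize fingertip detections.
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.02, -0.01, 0.40])
tips_2d, _ = cv2.projectPoints(tips_3d, rvec_true, tvec_true, K, None)

# Recover the 6-DOF camera pose from the 3D-2D fingertip correspondences.
ok, rvec, tvec = cv2.solvePnP(tips_3d, tips_2d, K, None)
print(ok, np.round(rvec.ravel(), 3), np.round(tvec.ravel(), 3))
```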


Proceedings ArticleDOI
22 Oct 2007
TL;DR: This work presents an identity- and lighting-invariant system to estimate a driver's head pose; the system is fully autonomous and operates online in daytime and nighttime driving conditions, using a monocular video camera sensitive to visible and near-infrared light.
Abstract: Recognizing driver awareness is an important prerequisite for the design of advanced automotive safety systems. Since visual attention is constrained to a driver's field of view, knowing where a driver is looking provides useful cues about his activity and awareness of the environment. This work presents an identity- and lighting-invariant system to estimate a driver's head pose. The system is fully autonomous and operates online in daytime and nighttime driving conditions, using a monocular video camera sensitive to visible and near-infrared light. We investigate the limitations of alternative systems when operated in a moving vehicle and compare them with our approach, which integrates Localized Gradient Orientation histograms with support vector machines for regression. We estimate the orientation of the driver's head in two degrees-of-freedom and evaluate the accuracy of our method in a vehicular testbed equipped with a cinematic motion capture system.

235 citations
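The regression stage lends itself to a compact sketch: a gradient-orientation descriptor of the face region is mapped to yaw and pitch by support vector regression. Random vectors below stand in for the Localized Gradient Orientation histograms, and scikit-learn's SVR plays the role of the paper's regressors, one per angle:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 64))            # stand-in gradient histograms
yaw = X @ rng.normal(size=64) * 0.1       # synthetic pose targets (degrees)
pitch = X @ rng.normal(size=64) * 0.05

yaw_model = SVR(kernel="rbf", C=10.0).fit(X[:150], yaw[:150])
pitch_model = SVR(kernel="rbf", C=10.0).fit(X[:150], pitch[:150])

yaw_err = np.abs(yaw_model.predict(X[150:]) - yaw[150:])
print("yaw MAE (deg):", round(float(yaw_err.mean()), 3))
```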


Journal ArticleDOI
TL;DR: Wang et al. propose a locality preserving CCA (LPCCA) to discover the local manifold structure of the data and further apply it to data visualization and pose estimation.

Proceedings ArticleDOI
10 Apr 2007
TL;DR: This paper describes a new image-based approach to tracking the 6DOF trajectory of a stereo camera pair using corresponding reference image pairs instead of explicit 3D feature reconstruction of the scene.
Abstract: This paper describes a new image-based approach to tracking the 6DOF trajectory of a stereo camera pair using corresponding reference image pairs instead of explicit 3D feature reconstruction of the scene. A dense minimisation approach is employed which directly uses all grey-scale information available within the stereo pair (or stereo region), leading to very robust and precise results. Metric 3D structure constraints are imposed by consistently warping corresponding stereo images to generate novel viewpoints at each stereo acquisition. An iterative non-linear trajectory estimation approach is formulated based on a quadrifocal relationship between the image intensities within adjacent views of the stereo pair. A robust M-estimation technique is used to reject outliers corresponding to moving objects within the scene or other outliers such as occlusions and illumination changes. The technique is applied to recovering the trajectory of a moving vehicle in long and difficult sequences of images.
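The paper's dense minimisation spans a full 6DOF pose through a quadrifocal warp; the sketch below demonstrates only the underlying principle on the simplest possible instance, Gauss-Newton on the sum of squared intensity differences for a pure 2-D translation between synthetic images:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift as im_shift

rng = np.random.default_rng(9)
ref = gaussian_filter(rng.normal(size=(64, 64)), 3.0)   # smooth test image
true_t = np.array([1.3, -0.7])
cur = im_shift(ref, true_t, order=3)        # current frame: shifted reference

gy, gx = np.gradient(ref)
inner = (slice(8, -8), slice(8, -8))        # ignore border interpolation
J = np.stack([gy[inner].ravel(), gx[inner].ravel()], axis=1)

t = np.zeros(2)
for _ in range(10):
    # Residual: current image warped back by the estimate, minus reference.
    r = (im_shift(cur, -t, order=3) - ref)[inner].ravel()
    t += np.linalg.lstsq(J, -r, rcond=None)[0]   # Gauss-Newton step
print(np.round(t, 3), "vs true", true_t)
```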

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel supervised approach to manifold-based non-linear dimensionality reduction for head pose estimation using the Biased Manifold Embedding framework, which shows substantial reduction in the error of pose angle estimation, and robustness to variations in feature spaces, dimensionality of embedding and other parameters.
Abstract: The estimation of head pose angle from face images is an integral component of face recognition systems, human-computer interfaces and other human-centered computing applications. To determine the head pose, face images with varying pose angles can be considered to be lying on a smooth low-dimensional manifold in high-dimensional feature space. While manifold learning techniques capture the geometrical relationship between data points in the high-dimensional image feature space, the pose label information of the training data samples is neglected in the computation of these embeddings. In this paper, we propose a novel supervised approach to manifold-based non-linear dimensionality reduction for head pose estimation. The Biased Manifold Embedding (BME) framework is pivoted on the ideology of using the pose angle information of the face images to compute a biased neighborhood of each point in the feature space, before determining the low-dimensional embedding. The proposed BME approach is formulated as an extensible framework, and validated with the Isomap, Locally Linear Embedding (LLE) and Laplacian Eigenmaps techniques. A Generalized Regression Neural Network (GRNN) is used to learn the non-linear mapping, and linear multi-variate regression is finally applied on the low-dimensional space to obtain the pose angle. We tested this approach on face images of 24 individuals with pose angles varying from -90 to +90 degrees with a granularity of 2 degrees. The results showed substantial reduction in the error of pose angle estimation, and robustness to variations in feature spaces, dimensionality of embedding and other parameters.
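The biasing idea can be sketched as a modification of the pairwise distances fed to a standard manifold learner: distances between samples with different pose labels are inflated, so the neighborhood graph prefers same-pose neighbors. The biasing function below is an assumption for illustration, not the paper's exact formula:

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(4)
poses = rng.uniform(-90, 90, size=120)                  # pose labels (degrees)
X = np.stack([np.cos(np.radians(poses)), np.sin(np.radians(poses))], axis=1)
X = np.hstack([X, 0.05 * rng.normal(size=(120, 30))])   # noisy high-dim features

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
pose_gap = np.abs(poses[:, None] - poses[None, :])
D_biased = D * (1.0 + pose_gap / 10.0)                  # assumed biasing function

emb = Isomap(n_neighbors=8, n_components=1, metric="precomputed")
z = emb.fit_transform(D_biased).ravel()
print("corr(embedding, pose):", round(abs(np.corrcoef(z, poses)[0, 1]), 3))
```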

Book ChapterDOI
28 Jun 2007
TL;DR: A dynamical model over the latent space is learned, which allows ambiguous silhouettes to be disambiguated by temporal consistency; the model is easily extended to multiple observation spaces without constraints on type.
Abstract: We describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM) [1] encapsulating both pose and silhouette features. Our method is generative; this allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper, the suggested model is easily extended to multiple observation spaces without constraints on type.

Proceedings ArticleDOI
05 Dec 2007
TL;DR: This work addresses the problem of estimating human pose in video sequences, where rough location has been determined, by defining suitable features of an image and its temporal neighbors, and learning a regression map to the parameters of a model of the human body using boosting techniques.
Abstract: We address the problem of estimating human pose in video sequences, where rough location has been determined. We exploit both appearance and motion information by defining suitable features of an image and its temporal neighbors, and learning a regression map to the parameters of a model of the human body using boosting techniques. Our algorithm can be viewed as a fast initialization step for human body trackers, or as a tracker itself. We extend gradient boosting techniques to learn a multi-dimensional map from (rotated and scaled) Haar features to the entire set of joint angles representing the full body pose. We test our approach by learning a map from image patches to body joint angles from synchronized video and motion capture walking data. We show how our technique enables learning an efficient real-time pose estimator, validated on publicly available datasets.
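A rough stand-in for the learned regression map, with random features in place of the (rotated and scaled) Haar responses and scikit-learn's gradient boosting, one model per output dimension via MultiOutputRegressor, in place of the paper's multidimensional extension:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 40))                 # stand-in Haar feature responses
W = rng.normal(size=(40, 12))
Y = np.tanh(X @ W)                             # synthetic 12-dim joint angles

model = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=100, max_depth=3))
model.fit(X[:250], Y[:250])
err = np.abs(model.predict(X[250:]) - Y[250:])
print("mean |error| per joint angle:", round(float(err.mean()), 3))
```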

Journal ArticleDOI
TL;DR: This article presents the integration of 3-D shape knowledge into a variational model for level set based image segmentation and contour-based 3-D pose tracking, and shows that for each view the model can fit the data in the image very well.
Abstract: In this article we present the integration of 3-D shape knowledge into a variational model for level set based image segmentation and contour-based 3-D pose tracking. Given the surface model of an object that is visible in the image of one or multiple cameras calibrated to the same world coordinate system, the object contour extracted by the segmentation method is applied to estimate the 3-D pose parameters of the object. Vice versa, the surface model projected to the image plane helps in a top-down manner to improve the extraction of the contour. While common alternative segmentation approaches, which integrate 2-D shape knowledge, face the problem that an object can look very different from various viewpoints, a 3-D free form model ensures that for each view the model can fit the data in the image very well. Moreover, one additionally solves the problem of determining the object's pose in 3-D space. The performance is demonstrated by numerous experiments with a monocular and a stereo camera system.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work represents surfaces as triangulated meshes and, assuming the pose in the first frame to be known, disallows large changes of edge orientation between consecutive frames, which is a generally applicable constraint when tracking surfaces in a 25 frames-per-second video sequence.
Abstract: 3-D shape recovery of non-rigid surfaces from 3-D to 2-D correspondences is an under-constrained problem that requires prior knowledge of the possible deformations. State-of-the-art solutions involve enforcing smoothness constraints that limit their applicability and prevent the recovery of sharply folding and creasing surfaces. Here, we propose a method that does not require such smoothness constraints. Instead, we represent surfaces as triangulated meshes and, assuming the pose in the first frame to be known, disallow large changes of edge orientation between consecutive frames, which is a generally applicable constraint when tracking surfaces in a 25 frames-per-second video sequence. We will show that tracking under these constraints can be formulated as a Second Order Cone Programming feasibility problem. This yields a convex optimization problem with stable solutions for a wide range of surfaces with very different physical properties.
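A toy instance of the feasibility formulation, written with cvxpy: the unknowns are the current-frame mesh vertices, reprojection errors are bounded by second-order-cone constraints, and each edge vector must stay close to its previous-frame value (a simplified stand-in for the paper's bound on edge orientation change):

```python
import numpy as np
import cvxpy as cp

prev = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 1.0, 2.2]])
edges = [(0, 1), (1, 2), (2, 0)]
true = prev + np.array([[0.05, 0, 0], [0.05, 0.02, 0], [0.0, 0.05, -0.1]])
uv = true[:, :2] / true[:, 2:3]          # observed projections, current frame

V = cp.Variable((3, 3))                  # current-frame vertex positions
cons = []
for i in range(3):
    # ||(x - u z, y - v z)|| <= eps * z : cone form of the reprojection error
    cons.append(cp.norm(V[i, :2] - V[i, 2] * uv[i]) <= 1e-3 * V[i, 2])
for a, b in edges:
    e_prev = prev[b] - prev[a]           # bound the change of each edge vector
    cons.append(cp.norm(V[b] - V[a] - e_prev) <= 0.2 * np.linalg.norm(e_prev))

prob = cp.Problem(cp.Minimize(0), cons)  # pure feasibility problem
prob.solve()
print(prob.status, np.round(V.value, 3))
```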

Patent
19 Mar 2007
TL;DR: In this article, the path and/or position of an object are tracked using two or more cameras which run asynchronously, so there is no need to provide a common timing signal to each camera.
Abstract: The path and/or position of an object is tracked using two or more cameras which run asynchronously, so there is no need to provide a common timing signal to each camera. Captured images are analyzed to detect a position of the object in the image. Equations of motion for the object are then solved based on the detected positions and a transformation which relates the detected positions to a desired coordinate system in which the path is to be described. The position of an object can also be determined from a position which meets a distance metric relative to lines of position from three or more images. The images can be enhanced to depict the path and/or position of the object as a graphical element. Further, statistics such as maximum object speed and distance traveled can be obtained. Applications include tracking the position of a game object at a sports event.
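The "distance metric relative to lines of position" step has a closed-form least-squares solution: the point minimizing the summed squared distance to a set of 3-D lines. A small sketch with synthetic rays (names and data are illustrative):

```python
import numpy as np

def point_nearest_lines(origins, directions):
    """origins: (N, 3) camera centres; directions: (N, 3) unit ray directions."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        P = np.eye(3) - np.outer(d, d)   # projector onto plane normal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)         # normal equations of the least squares

target = np.array([2.0, 1.0, 5.0])
origins = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 1]], dtype=float)
dirs = target - origins
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(np.round(point_nearest_lines(origins, dirs), 3))   # ~ [2. 1. 5.]
```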

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper proposes an approach for object class localization which goes beyond bounding boxes, as it also determines the outline of the object, and directly generates, evaluates and clusters shape masks.
Abstract: This paper proposes an approach for object class localization which goes beyond bounding boxes, as it also determines the outline of the object. Unlike most current localization methods, our approach does not require any hypothesis parameter space to be defined. Instead, it directly generates, evaluates and clusters shape masks. Thus, the presented framework produces more informative results for object class localization. For example, it easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset. The results demonstrate the extended localization capabilities of our method.

Journal ArticleDOI
TL;DR: A new approach is proposed for estimating 3D head pose from a monocular image that employs general prior knowledge of face structure and the corresponding geometrical constraints provided by the location of a certain vanishing point to determine the pose of human faces.

Journal ArticleDOI
TL;DR: A 32-state extended Kalman filter is developed that processes angular measurements from an optical navigation camera along with gyro and star tracker data to estimate the inertial position, velocity, attitude, and angular rates of both a target and chaser vehicle.
Abstract: This paper explores the potential of using angles-only navigation to perform various autonomous orbital rendezvous and close proximity operations. A 32-state extended Kalman filter is developed that processes angular measurements from an optical navigation camera along with gyro and star tracker data to estimate the inertial position, velocity, attitude, and angular rates of both a target and chaser vehicle. The target satellite is assumed to be passive while the chaser may perform a variety of autonomous rendezvous and close proximity maneuvers. The navigation filter’s performance is evaluated and tested by running a coded prototype in a closed loop 6 degree-of-freedom simulation tool containing the various sensors, actuators, GN&C flight algorithms, and dynamics associated with a simple rendezvous scenario. The analysis performed for this study uses standard Monte-Carlo techniques. These results not only include the navigation errors associated with implementing an angles-only navigation scheme, but they also reveal the dispersions from the nominal trajectory associated with this particular technique in a closed-loop GN&C setting. The rendezvous scenario duplicates a similar close proximity scenario analyzed using linear covariance analysis. Both methods are compared to add validity to the results and highlight the advantages and potential of each approach for autonomous orbital rendezvous and close proximity operation analysis.

Journal ArticleDOI
TL;DR: This paper proposes novel ways to deal with pose variations in a 2-D face recognition scenario and shows that if only pose parameters are modified, client specific information remains in the warped image and discrimination between subjects is more reliable.
Abstract: This paper proposes novel ways to deal with pose variations in a 2-D face recognition scenario. Using a training set of sparse face meshes, we built a point distribution model and identified the parameters which are responsible for controlling the apparent changes in shape due to turning and nodding the head, namely the pose parameters. Based on them, we propose two approaches for pose correction: 1) a method in which the pose parameters from both meshes are set to typical values of frontal faces, and 2) a method in which one mesh adopts the pose parameters of the other one. Finally, we obtain pose corrected meshes and, taking advantage of facial symmetry, virtual views are synthesized via Thin Plate Splines-based warping. Given that the corrected images are not embedded into a constant reference frame, holistic methods are not suitable for feature extraction. Instead, the virtual faces are fed into a system that makes use of Gabor filtering for recognition. Unlike other approaches that warp faces onto a mean shape, we show that if only pose parameters are modified, client specific information remains in the warped image and discrimination between subjects is more reliable. Statistical analysis of the authentication results obtained on the XM2VTS database confirm the hypothesis. Also, the CMU PIE database is used to assess the performance of the proposed methods in an identification scenario where large pose variations are present, achieving state-of-the-art results and outperforming both research and commercial techniques.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper presents a robust method that addresses challenges with localization in GPS-denied environments using a human wearable system with two pairs of backward and forward looking stereo cameras together with an inertial measurement unit (IMU).
Abstract: Over the past decade, a tremendous amount of research activity has focused on the problem of localization in GPS-denied environments. Challenges with localization are highlighted in human wearable systems where the operator can freely move through both indoors and outdoors. In this paper, we present a robust method that addresses these challenges using a human wearable system with two pairs of backward and forward looking stereo cameras together with an inertial measurement unit (IMU). This algorithm can run in real-time with a 15 Hz update rate on a dual-core 2 GHz laptop PC and it is designed to be a highly accurate local (relative) pose estimation mechanism acting as the front-end to a simultaneous localization and mapping (SLAM) type method capable of global corrections through landmark matching. Extensive tests of our prototype system so far reveal that, without any global landmark matching, we achieve between 0.5% and 1% accuracy in localizing a person over 500 meters of travel indoors and outdoors. To our knowledge, such performance results with a real-time system have not been reported before.

Proceedings ArticleDOI
01 Nov 2007
TL;DR: This paper presents a novel approach to estimating the head pose from monocular images that uses a compact representation of the face based on only a few distinctive features, making it computationally highly efficient.
Abstract: Estimating the head pose is an important capability of a robot when interacting with humans, since the head pose usually indicates the focus of attention. In this paper, we present a novel approach to estimate the head pose from monocular images. Our approach proceeds in three stages. First, a face detector roughly classifies the pose as frontal, left, or right profile. Then, classifiers trained with AdaBoost using Haar-like features detect distinctive facial features such as the nose tip and the eyes. Based on the positions of these features, a neural network finally estimates the three continuous rotation angles we use to model the head pose. Since we have a compact representation of the face using only a few distinctive features, our approach is computationally highly efficient. As we show in experiments with standard databases as well as with real-time image data, our system locates the distinctive features with a high accuracy and provides robust estimates of the head pose.
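The final stage is a plain regression from feature locations to rotation angles. A sketch with synthetic data, using scikit-learn's MLPRegressor in place of the paper's network; the linear model generating feature positions from angles is an assumption made for the example:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
angles = rng.uniform(-30, 30, size=(400, 3))           # yaw, pitch, roll (deg)
W = rng.normal(size=(3, 6)) * 0.02
feats = angles @ W + 0.01 * rng.normal(size=(400, 6))  # x, y of 3 face features

net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=3000, random_state=0)
net.fit(feats[:300], angles[:300])
err = np.abs(net.predict(feats[300:]) - angles[300:])
print("mean abs error (deg):", np.round(err.mean(axis=0), 2))
```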

Journal ArticleDOI
TL;DR: 2D and 3D face models are compared along three axes: representational power, construction, and real-time fitting, outlining for each axis in turn the differences that result from using a 2D or a 3D face model.
Abstract: Model-based face analysis is a general paradigm with applications that include face recognition, expression recognition, lip-reading, head pose estimation, and gaze estimation. A face model is first constructed from a collection of training data, either 2D images or 3D range scans. The face model is then fit to the input image(s) and the model parameters used in whatever the application is. Most existing face models can be classified as either 2D (e.g. Active Appearance Models) or 3D (e.g. Morphable Models). In this paper we compare 2D and 3D face models along three axes: (1) representational power, (2) construction, and (3) real-time fitting. For each axis in turn, we outline the differences that result from using a 2D or a 3D face model.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper combines multilevel encodings having improved stability to geometric transformations with metric learning and semi-supervised manifold regularization methods, in order to further profile them for task invariance: resistance to background clutter and to differences within the same human pose class.
Abstract: Recent research in visual inference from monocular images has shown that discriminatively trained image-based predictors can provide fast, automatic qualitative 3D reconstructions of human body pose or scene structure in real-world environments. However, the stability of existing image representations tends to be perturbed by deformations and misalignments in the training set, which, in turn, degrade the quality of learning and generalization. In this paper we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings with improved stability to geometric transformations, with metric learning and semi-supervised manifold regularization methods, in order to further profile them for task invariance: resistance to background clutter and to differences within the same human pose class. We quantitatively analyze the effectiveness of both descriptors and learning methods and show that each one can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images.

Patent
02 Aug 2007
TL;DR: In this article, a system and method for identifying objects using a machine-vision based system is described, one embodiment is a method that captures a first image of at least one object with an image capture device, processes the first captured image to find an object region based on a reference two-dimensional model and determines a three-dimensional pose estimation.
Abstract: A system and method for identifying objects using a machine-vision based system are disclosed. Briefly described, one embodiment is a method that captures a first image of at least one object with an image capture device, processes the first captured image to find an object region based on a reference two-dimensional model and determines a three-dimensional pose estimation based on a reference three-dimensional model that corresponds to the reference two-dimensional model and a runtime three-dimensional representation of the object region where a point-to-point relationship between the reference three-dimensional models of the object and the runtime three-dimensional representation of the object region is not necessarily previously known. Thus, two-dimensional information or data is used to segment an image and three-dimensional information or data used to perform three-dimensional pose estimation on a segment of the image.

Journal ArticleDOI
TL;DR: A new robot motion model utilizing structure from motion (SFM) methods and a novel mixture proposal distribution that combines local and global pose estimation are introduced and compared under a wide variety of operating modalities.
Abstract: With recent advances in real-time implementations of filters for solving the simultaneous localization and mapping (SLAM) problem in the range-sensing domain, attention has shifted to implementing SLAM solutions using vision-based sensing. This paper presents and analyses different models of the Rao-Blackwellised particle filter (RBPF) for vision-based SLAM within a comprehensive application architecture. The main contributions of our work are the introduction of a new robot motion model utilizing structure from motion (SFM) methods and a novel mixture proposal distribution that combines local and global pose estimation. In addition, we compare these under a wide variety of operating modalities, including monocular sensing and the standard odometry-based methods. We also present a detailed study of the RBPF for SLAM, addressing issues in achieving real-time, robust and numerically reliable filter behavior. Finally, we present experimental results illustrating the improved accuracy of our proposed models and the efficiency and scalability of our implementation.
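The mixture proposal can be pictured in one dimension: each particle is propagated either by the local motion model or by sampling around a global pose estimate. Everything below, the 1-D state, the noise levels and the mixing weight phi, is an illustrative stand-in for the paper's full vision-based RBPF:

```python
import numpy as np

rng = np.random.default_rng(7)

def propose(x_prev, odom, global_est, phi=0.2):
    """Mixture proposal: local motion model with probability 1 - phi."""
    if rng.uniform() < phi:
        return rng.normal(global_est, 0.5)     # global pose-estimate component
    return rng.normal(x_prev + odom, 0.1)      # local motion-model component

particles = np.zeros(100)
for step in range(1, 21):
    global_est = float(step)                   # pretend global localization
    particles = np.array([propose(x, 1.0, global_est) for x in particles])
    # (weighting by the observation likelihood and resampling omitted)
print("mean pose:", round(float(particles.mean()), 2), "after 20 steps")
```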

Proceedings ArticleDOI
02 Jul 2007
TL;DR: In this paper, a non-linear complementary filter is proposed that evolves on the Special Euclidean Group SE(3) to obtain high quality pose estimation (position and orientation) from a combination of low cost sensors, such as an inertial measurement unit and vision sensor.
Abstract: This paper considers the problem of obtaining high quality pose estimation (position and orientation) from a combination of low cost sensors, such as an inertial measurement unit and vision sensor. A non-linear complementary filter is proposed that evolves on the Special Euclidean Group SE(3). Exponential stability of the filter is proved. Simulation results are presented to illustrate simplicity and demonstrate the performance of the proposed approach. Experimental results reinforce the convergence of the filter.
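A conceptual sketch of a complementary filter evolving on SE(3), not the authors' exact filter: the estimate integrates the (here perfectly known) body twist and is pulled toward the measured pose by a fraction of the error twist, with scipy's matrix exponential and logarithm providing the group operations:

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(w, v):
    """Twist (angular w, linear v) -> 4x4 matrix in se(3)."""
    W = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    X = np.zeros((4, 4)); X[:3, :3] = W; X[:3, 3] = v
    return X

twist = hat([0.0, 0.0, 0.3], [0.2, 0.0, 0.0])   # constant body twist (IMU)
T_true = np.eye(4)                              # true pose, from "vision"
T = expm(hat([0, 0, 0.5], [0.3, -0.2, 0.1]))    # wrong initial estimate
dt, k = 0.05, 1.5                               # time step, correction gain
for _ in range(200):
    T_true = T_true @ expm(twist * dt)
    T = T @ expm(twist * dt)                    # predict with the IMU twist
    err = logm(np.linalg.inv(T) @ T_true).real  # error twist in se(3)
    T = T @ expm(k * dt * err)                  # complementary correction
print(np.round(np.linalg.inv(T) @ T_true, 3))   # ~ identity: converged
```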

Proceedings ArticleDOI
10 Dec 2007
TL;DR: Results are presented illustrating that the method presented can closely track torso and arm movements even with noisy and incomplete sensor data, and examples of body language primitives that can be observed from this orientation and positioning information.
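The interleaving of particle filtering and model adaptation can be shown with a 1-D toy: orientation particles are resampled against the observed apparent extent of the torso, and the width parameter of the shape model is then re-fit and reused in the next likelihood evaluation. All quantities below are simulated stand-ins for the laser scanner data:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 500
theta = 0.3 + 0.3 * rng.normal(size=N)   # particles: torso orientation (rad)
width = 0.4                              # adaptive shape parameter (metres)

true_theta, true_width = 0.5, 0.55
for step in range(30):
    true_theta += 0.02                   # the subject slowly turns
    # Two scan directions see the torso's apparent extent along x and y.
    obs = true_width * np.array([np.cos(true_theta), np.sin(true_theta)])
    obs += 0.005 * rng.normal(size=2)

    theta += 0.05 * rng.normal(size=N)   # motion model: random walk
    pred = width * np.stack([np.cos(theta), np.sin(theta)])    # (2, N)
    w = np.exp(-0.5 * np.sum(((obs[:, None] - pred) / 0.05) ** 2, axis=0))
    w += 1e-12
    theta = theta[rng.choice(N, N, p=w / w.sum())]             # resample

    # Adaptive modelling: re-fit the width to the observation and blend it
    # into the shape model used by the next likelihood evaluation.
    width = 0.9 * width + 0.1 * float(np.hypot(obs[0], obs[1]))

print("theta:", round(float(np.median(theta)), 2), "true:", round(true_theta, 2))
print("width:", round(width, 2), "true:", true_width)
```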
Abstract: In this paper we present a method for determining body orientation and pose information from laser scanner data using particle filtering with an adaptive modeling algorithm. A parametric human shape model is recursively updated to fit observed data after each resampling step of the particle filter. This updated model is then used in the likelihood estimation step for the following iteration. This method has been implemented and tested by using a network of laser range finders to observe human subjects in a variety of interactions. We present results illustrating that our method can closely track torso and arm movements even with noisy and incomplete sensor data, and we show examples of body language primitives that can be observed from this orientation and positioning information.