
Showing papers on "Motion estimation published in 2007"


Journal ArticleDOI
TL;DR: The first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera is presented, achieving real-time but drift-free performance inaccessible to structure-from-motion approaches.
Abstract: We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real-time but drift-free performance inaccessible to structure-from-motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera.
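
The "general motion model for smooth camera movement" referred to above is a constant-velocity model inside an extended Kalman filter: unknown accelerations are treated as process noise, so the filter expects smooth motion without forbidding rapid movement. The sketch below illustrates only the prediction step of such a model; function and variable names are ours, not the paper's.

```python
import numpy as np

def predict_camera_state(r, q, v, w, a_noise, alpha_noise, dt):
    """One EKF prediction step under a constant-velocity motion model.

    r: camera position (3,), q: orientation quaternion (w, x, y, z),
    v: linear velocity, w: angular velocity. The zero-mean noise terms
    a_noise/alpha_noise model unknown accelerations, which is what makes
    the model "smooth" rather than rigid.
    """
    v_new = v + a_noise * dt                           # velocity random walk
    w_new = w + alpha_noise * dt
    r_new = r + v_new * dt                             # integrate position
    q_new = quat_mul(q, quat_from_rotvec(w_new * dt))  # integrate orientation
    return r_new, q_new, v_new, w_new

def quat_from_rotvec(theta):
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = theta / angle
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])
```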

3,772 citations


Journal ArticleDOI
TL;DR: The characteristics of human motion analysis are discussed to highlight trends in the domain and to point out limitations of the current state of the art.

908 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper compares four 3D motion segmentation algorithms for affine cameras on a benchmark of 155 motion sequences of checkerboard, traffic, and articulated scenes.
Abstract: Over the past few years, several methods for segmenting a scene containing multiple rigidly moving objects have been proposed. However, most existing methods have been tested on a handful of sequences only, and each method has been often tested on a different set of sequences. Therefore, the comparison of different methods has been fairly limited. In this paper, we compare four 3D motion segmentation algorithms for affine cameras on a benchmark of 155 motion sequences of checkerboard, traffic, and articulated scenes.

757 citations


Journal ArticleDOI
TL;DR: A novel discriminative learning method over sets is proposed for set classification that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets.
Abstract: We address the problem of comparing sets of images for object recognition, where the sets may represent variations in an object's appearance due to changing camera pose and lighting conditions. Canonical correlations (also known as principal or canonical angles), which can be thought of as the angles between two d-dimensional subspaces, have recently attracted attention for image set matching. Canonical correlations offer many benefits in accuracy, efficiency, and robustness compared to the two main classical methods: parametric distribution-based and nonparametric sample-based matching of sets. Here, this is first demonstrated experimentally for reasonably sized data sets using existing methods exploiting canonical correlations. Motivated by their proven effectiveness, a novel discriminative learning method over sets is proposed for set classification. Specifically, inspired by classical linear discriminant analysis (LDA), we develop a linear discriminant function that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets. Image sets transformed by the discriminant function are then compared by the canonical correlations. The classical orthogonal subspace method (OSM) is also investigated for the same purpose and compared with the proposed method. The proposed method is evaluated on various object recognition problems using face image sets with arbitrary motion captured under different illuminations and image sets of 500 general objects taken at different views. The method is also applied to object category recognition using the ETH-80 database. The proposed method is shown to outperform the state-of-the-art methods in terms of accuracy and efficiency.
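
For reference, canonical correlations are inexpensive to compute: orthonormalize each image set and take the singular values of the product of the two bases (the Björck-Golub procedure). A minimal sketch, not the authors' code; names are illustrative.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Cosines of the principal angles between span(X) and span(Y).

    X, Y: (n_pixels, n_images) matrices whose columns are vectorized
    images from the two sets. The singular values of Qx^T Qy, with Qx
    and Qy orthonormal bases, are the canonical correlations in
    decreasing order; their sum is a common set-similarity score.
    """
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```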

626 citations


Proceedings ArticleDOI
03 Sep 2007
TL;DR: An effective video denoising method based on highly sparse signal representation in the local 3D transform domain is proposed that achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
Abstract: We propose an effective video denoising method based on highly sparse signal representation in the local 3D transform domain. A noisy video is processed in a blockwise manner, and for each processed block we form a 3D data array, which we call a "group", by stacking together blocks found to be similar to the currently processed one. This grouping is realized by spatio-temporal predictive-search block-matching, similar to techniques used for motion estimation. Each formed 3D group is filtered by 3D transform-domain shrinkage (hard-thresholding and Wiener filtering), the results of which are estimates of all grouped blocks. This filtering, which we term "collaborative filtering", exploits the correlation between grouped blocks and the corresponding highly sparse representation of the true signal in the transform domain. Since, in general, the obtained block estimates are mutually overlapping, we aggregate them by a weighted average in order to form a non-redundant estimate of the video. A significant improvement of this approach is achieved by using a two-step algorithm where an intermediate estimate is produced by grouping and collaborative hard-thresholding and then used both for improving the grouping and for applying collaborative empirical Wiener filtering. We develop an efficient realization of this video denoising algorithm. The experimental results show that, at reasonable computational cost, it achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
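
The collaborative hard-thresholding step can be sketched compactly. The fragment below uses a separable 3D DCT as the local transform and a fixed threshold; the paper's actual transforms, threshold, and aggregation weights may differ, so treat this as an illustration of the shrinkage-plus-weighting idea only.

```python
import numpy as np
from scipy.fft import dctn, idctn

def collaborative_hard_threshold(group, sigma, lam=2.7):
    """Filter one 3D group of mutually similar blocks.

    group: (K, B, B) stack of K similar BxB blocks gathered by
    spatio-temporal block matching; sigma: noise standard deviation.
    Returns the filtered blocks plus a weight for the later
    overlap-averaging (aggregation) step.
    """
    coeffs = dctn(group, norm='ortho')      # 3D transform of the group
    mask = np.abs(coeffs) > lam * sigma     # keep only significant coefficients
    coeffs *= mask
    n_kept = max(int(mask.sum()), 1)
    weight = 1.0 / (sigma**2 * n_kept)      # sparser estimates weigh more
    return idctn(coeffs, norm='ortho'), weight
```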

496 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work represents the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations that is learned from a database of range scans of human bodies.
Abstract: Much of the research on video-based human motion capture assumes the body shape is known a priori and is represented coarsely (e.g. using cylinders or superquadrics to model limbs). These body models stand in sharp contrast to the richly detailed 3D body models used by the graphics community. Here we propose a method for recovering such models directly from images. Specifically, we represent the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations that is learned from a database of range scans of human bodies. Previous work showed that the parameters of the SCAPE model could be estimated from marker-based motion capture data. Here we go further to estimate the parameters directly from image data. We define a cost function between image observations and a hypothesized mesh and formulate the problem as optimization over the body shape and pose parameters using stochastic search. Our results show that such rich generative models enable the automatic recovery of detailed human shape and pose from images.

378 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper separates the image deblurring into filter estimation and image deconvolution processes, and proposes a novel algorithm to estimate the motion blur filter from a perspective of alpha values.
Abstract: One of the key problems in restoring a degraded image from motion blur is the estimation of the unknown shift-invariant linear blur filter. Several algorithms have been proposed using image intensity or gradient information. In this paper, we separate image deblurring into filter estimation and image deconvolution processes, and propose a novel algorithm to estimate the motion blur filter from the perspective of alpha values. The relationship between object boundary transparency and image motion blur is investigated. We formulate the filter estimation as solving a maximum a posteriori (MAP) problem with the defined likelihood and prior on transparency. Our unified approach can be applied to handle both camera motion blur and object motion blur.
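
In equation form, the filter estimation described here has the usual MAP structure; the symbols below are illustrative rather than the paper's notation, with f the unknown blur filter and alpha the matte (transparency) values along the blurred object boundary:

```latex
\hat{f} \;=\; \arg\max_{f}\; p(f \mid \alpha)
        \;=\; \arg\max_{f}\; p(\alpha \mid f)\, p(f)
```

The likelihood measures how well a candidate filter reproduces the observed transparency profile, and the prior encodes assumptions about how alpha values behave under motion blur.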

365 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A system that integrates fully automatic scene geometry estimation, 2D object detection, 3D localization, trajectory estimation, and tracking for dynamic scene interpretation from a moving vehicle and demonstrates the performance of this integrated system on challenging real-world data showing car passages through crowded city areas.
Abstract: In this paper, we present a system that integrates fully automatic scene geometry estimation, 2D object detection, 3D localization, trajectory estimation, and tracking for dynamic scene interpretation from a moving vehicle. Our sole input is a pair of video streams from a calibrated stereo rig on top of a car. From these streams, we estimate structure-from-motion (SfM) and scene geometry in real-time. In parallel, we perform multi-view/multi-category object recognition to detect cars and pedestrians in both camera images. Using the SfM self-localization, 2D object detections are converted to 3D observations, which are accumulated in a world coordinate frame. A subsequent tracking module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Finally, a global optimization criterion takes object-object interactions into account to arrive at accurate 3D localization and trajectory estimates for both cars and pedestrians. We demonstrate the performance of our integrated system on challenging real-world data showing car passages through crowded city areas.
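
Converting a 2D detection into a 3D observation typically amounts to intersecting the viewing ray through the detection's foot point with the estimated ground plane. A hedged geometric sketch of that single step, under our own simplifying assumptions (known intrinsics K, SfM pose (R, t), and ground plane); the full system additionally fuses such observations over time:

```python
import numpy as np

def detection_to_3d(u, v, K, R, t, n, d):
    """Back-project the foot point (u, v) of a 2D detection onto the
    ground plane {X : n.X + d = 0}, given intrinsics K and a camera
    pose with X_cam = R @ X_world + t. Returns a 3D world point.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray, camera frame
    ray_w = R.T @ ray_cam                               # same ray in world frame
    origin = -R.T @ t                                   # camera center in world frame
    s = -(n @ origin + d) / (n @ ray_w)                 # ray/plane intersection
    return origin + s * ray_w
```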

353 citations


Journal ArticleDOI
TL;DR: A new motion-compensated (MC) interpolation algorithm enhances the temporal resolution of video sequences and overcomes the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking.
Abstract: In this work, we develop a new motion-compensated (MC) interpolation algorithm to enhance the temporal resolution of video sequences. First, we propose the bilateral motion estimation scheme to obtain the motion field of an interpolated frame without yielding the hole and overlapping problems. Then, we partition a frame into several object regions by clustering motion vectors. We apply the variable-size block MC (VS-BMC) algorithm to object boundaries in order to reconstruct edge information with a higher quality. Finally, we use the adaptive overlapped block MC (OBMC), which adjusts the coefficients of overlapped windows based on the reliabilities of neighboring motion vectors. The adaptive OBMC (AOBMC) can overcome the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking. Experimental results show that the proposed algorithm provides better image quality than conventional methods, both objectively and subjectively.
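
The point of bilateral motion estimation is that each vector is anchored at a block of the frame being synthesized, so the interpolated frame is covered exactly once and neither holes nor overlaps can occur. A minimal sketch under assumptions we introduce (integer-pel search, SAD cost); the paper's scheme is more elaborate:

```python
import numpy as np

def bilateral_motion(prev, nxt, block=16, search=8):
    """One symmetric motion vector per block of the interpolated frame.

    A candidate v is scored by the SAD between the block displaced by -v
    in the previous frame and by +v in the next frame; the interpolated
    block then lies halfway along the matched trajectory.
    """
    h, w = prev.shape
    field = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = y - dy, x - dx        # backward end, in prev
                    y1, x1 = y + dy, x + dx        # forward end, in nxt
                    if (min(y0, x0, y1, x1) < 0 or max(y0, y1) + block > h
                            or max(x0, x1) + block > w):
                        continue
                    sad = np.abs(prev[y0:y0+block, x0:x0+block].astype(int)
                                 - nxt[y1:y1+block, x1:x1+block].astype(int)).sum()
                    if sad < best:
                        best, best_v = sad, (dy, dx)
            field[by, bx] = best_v
    return field
```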

348 citations


Journal ArticleDOI
TL;DR: Encouraging preliminary results on real-world video sequences are presented, particularly in the realm of transmission losses, where PRISM exhibits the characteristic of rapid recovery in contrast to contemporary codecs, making PRISM an attractive candidate for wireless video applications.
Abstract: We describe PRISM, a video coding paradigm based on the principles of lossy distributed compression (also called source coding with side information or Wyner-Ziv coding) from multiuser information theory. PRISM represents a major departure from conventional video coding architectures (e.g., the MPEGx, H.26x families) that are based on motion-compensated predictive coding, with the goal of addressing some of their architectural limitations. PRISM allows for two key architectural enhancements: (1) inbuilt robustness to "drift" between encoder and decoder and (2) the feasibility of a flexible distribution of computational complexity between encoder and decoder. Specifically, PRISM enables transfer of the computationally expensive video encoder motion-search module to the video decoder. Based on this capability, we consider an instance of PRISM corresponding to a near reversal in codec complexities with respect to today's codecs (leading to a novel light encoder and heavy decoder paradigm), in this paper. We present encouraging preliminary results on real-world video sequences, particularly in the realm of transmission losses, where PRISM exhibits the characteristic of rapid recovery, in contrast to contemporary codecs. This renders PRISM an attractive candidate for wireless video applications.

338 citations


Journal ArticleDOI
TL;DR: The findings show that the results have higher precision after m-D extraction than before it, since only the vibrational/rotational components are employed.
Abstract: This paper highlights the extraction of micro-Doppler (m-D) features from radar signal returns of helicopter and human targets using the wavelet transform method incorporated with time-frequency analysis. For the extraction of m-D features to be realised, the time domain radar signal is decomposed into a set of components that are represented at different wavelet scales. The components are then reconstructed by applying the inverse wavelet transform. After the separation of m-D features from the target's original radar return, time-frequency analysis is used to estimate the target's motion parameters. The autocorrelation of the time sequence data is also used to measure motion parameters such as the vibration/rotation rate. The findings show that the results have higher precision after m-D extraction than before it, since only the vibrational/rotational components are employed. This proposed method of m-D extraction has been successfully applied to helicopter and human data.
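
A rough sketch of the wavelet-based separation idea using PyWavelets; the wavelet family, decomposition level, and the decision that all detail scales count as micro-Doppler are our illustrative choices, not the paper's:

```python
import numpy as np
import pywt

def separate_micro_doppler(signal, wavelet='db4', level=5):
    """Split a radar time series into a bulk (body) component and a
    micro-Doppler component via the discrete wavelet transform.

    The coarse approximation at the deepest level is taken as the body
    return; reconstructing from the detail coefficients alone keeps the
    faster vibrational/rotational modulations for time-frequency analysis.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    body = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    micro = [np.zeros_like(coeffs[0])] + list(coeffs[1:])
    n = len(signal)
    return pywt.waverec(body, wavelet)[:n], pywt.waverec(micro, wavelet)[:n]
```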

Journal ArticleDOI
TL;DR: A new variational method for multi-view stereovision and non-rigid three-dimensional motion estimation from multiple video sequences that minimizes the prediction error of the shape and motion estimates and results in a simpler, more flexible, and more efficient implementation than in existing methods.
Abstract: We present a new variational method for multi-view stereovision and non-rigid three-dimensional motion estimation from multiple video sequences. Our method minimizes the prediction error of the shape and motion estimates. Both problems then translate into a generic image registration task. The latter is entrusted to a global measure of image similarity, chosen depending on imaging conditions and scene properties. Rather than integrating a matching measure computed independently at each surface point, our approach computes a global image-based matching score between the input images and the predicted images. The matching process fully handles projective distortion and partial occlusions. Neighborhood as well as global intensity information can be exploited to improve the robustness to appearance changes due to non-Lambertian materials and illumination changes, without any approximation of shape, motion or visibility. Moreover, our approach results in a simpler, more flexible, and more efficient implementation than in existing methods. The computation time on large datasets does not exceed thirty minutes on a standard workstation. Finally, our method is amenable to hardware implementation on graphics processing units. Our stereovision algorithm yields very good results on a variety of datasets including specularities and translucency. We have successfully tested our motion estimation algorithm on a very challenging multi-view video sequence of a non-rigid scene.

Proceedings ArticleDOI
29 Jul 2007
TL;DR: This work represents the desired motion as an interpolation of two time-scaled paths through a motion graph and uses an anytime version of A* search to find a globally optimal solution in this graph that satisfies the user's specification.
Abstract: Many compelling applications would become feasible if novice users had the ability to synthesize high quality human motion based only on a simple sketch and a few easily specified constraints. We approach this problem by representing the desired motion as an interpolation of two time-scaled paths through a motion graph. The graph is constructed to support interpolation and pruned for efficient search. We use an anytime version of A* search to find a globally optimal solution in this graph that satisfies the user's specification. Our approach retains the natural transitions of motion graphs and the ability to synthesize physically realistic variations provided by interpolation. We demonstrate the power of this approach by synthesizing optimal or near optimal motions that include a variety of behaviors in a single motion.
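
The search component can be pictured as A* over motion-graph nodes; the paper uses an anytime variant (which returns a usable path quickly and keeps tightening it toward the global optimum), but the core expansion loop is the same. A plain A* sketch with hypothetical graph/cost/heuristic interfaces:

```python
import heapq
from itertools import count

def a_star(graph, start, is_goal, cost, heuristic):
    """graph[node] yields successor nodes; cost(a, b) is the edge cost;
    heuristic(n) must not overestimate the remaining cost for the result
    to be globally optimal. Returns (path, total_cost).
    """
    tie = count()  # tie-breaker so the heap never compares node objects
    open_set = [(heuristic(start), next(tie), 0.0, start, [start])]
    closed = set()
    while open_set:
        _, _, g, node, path = heapq.heappop(open_set)
        if is_goal(node):
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for succ in graph[node]:
            if succ not in closed:
                g2 = g + cost(node, succ)
                heapq.heappush(open_set,
                               (g2 + heuristic(succ), next(tie), g2, succ, path + [succ]))
    return None, float('inf')
```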

Journal ArticleDOI
TL;DR: This paper presents a joint formulation for a complex super-resolution problem in which the scenes contain multiple independently moving objects, built upon the maximum a posteriori (MAP) framework, which judiciously combines motion estimation, segmentation, and super resolution together.
Abstract: Super-resolution image reconstruction allows the recovery of a high-resolution (HR) image from several low-resolution images that are noisy, blurred, and downsampled. In this paper, we present a joint formulation for a complex super-resolution problem in which the scenes contain multiple independently moving objects. This formulation is built upon the maximum a posteriori (MAP) framework, which judiciously combines motion estimation, segmentation, and super resolution together. A cyclic coordinate descent optimization procedure is used to solve the MAP formulation, in which the motion fields, segmentation fields, and HR image are estimated alternately, each given the other two. Specifically, gradient-based methods are employed to solve for the HR image and motion fields, and an iterated conditional modes optimization method is used to obtain the segmentation fields. The proposed algorithm has been tested using a synthetic image sequence, the "Mobile and Calendar" sequence, and the original "Motorcycle and Car" sequence. The experimental results and error analyses verify the efficacy of this algorithm.
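
The joint MAP estimate described above can be written as follows (our symbols, not necessarily the paper's): y denotes the observed low-resolution frames, z the HR image, m the motion fields, and s the segmentation fields.

```latex
(\hat{z}, \hat{m}, \hat{s})
  \;=\; \arg\max_{z,\,m,\,s}\; p(z, m, s \mid y)
  \;=\; \arg\max_{z,\,m,\,s}\; p(y \mid z, m, s)\, p(m)\, p(s)\, p(z)
```

Cyclic coordinate descent then maximizes over each unknown in turn while holding the other two fixed, which is exactly the alternation the abstract describes.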

Proceedings ArticleDOI
26 Dec 2007
TL;DR: A novel extraction method which utilises global information from each video input so that moving parts, such as a moving hand, can be identified and used to select relevant interest points for a condensed representation.
Abstract: Local spatiotemporal features or interest points provide compact but descriptive representations for efficient video analysis and motion recognition. Current local feature extraction approaches involve either local filtering or entropy computation which ignore global information (e.g. large blobs of moving pixels) in video inputs. This paper presents a novel extraction method which utilises global information from each video input so that moving parts such as a moving hand can be identified and are used to select relevant interest points for a condensed representation. The proposed method involves obtaining a small set of subspace images, which can synthesise frames in the video input from their corresponding coefficient vectors, and then detecting interest points from the subspaces and the coefficient vectors. Experimental results indicate that the proposed method can yield a sparser set of interest points for motion recognition than existing methods.

Proceedings ArticleDOI
30 Apr 2007
TL;DR: This paper presents an example-based motion synthesis technique that generates continuous streams of high-fidelity, controllable motion for interactive applications, such as video games, through a new data structure called a parametric motion graph.
Abstract: In this paper, we present an example-based motion synthesis technique that generates continuous streams of high-fidelity, controllable motion for interactive applications, such as video games. Our method uses a new data structure called a parametric motion graph to describe valid ways of generating linear blend transitions between motion clips dynamically generated through parametric synthesis in real time. Our system specifically uses blending-based parametric synthesis to accurately generate any motion clip from an entire space of motions by blending together examples from that space. The key to our technique is using sampling methods to identify and represent good transitions between these spaces of motion parameterized by a continuously valued parameter. This approach allows parametric motion graphs to be constructed with little user effort. Because parametric motion graphs organize all motions of a particular type, such as reaching to different locations on a shelf, using a single, parameterized graph node, they are highly structured, facilitating fast decision-making for interactive character control. We have successfully created interactive characters that perform sequences of requested actions, such as cartwheeling or punching.

Proceedings ArticleDOI
10 Sep 2007
TL;DR: This paper presents a video stabilization algorithm based on the extraction and tracking of scale invariant feature transform (SIFT) features through video frames; results confirm the effectiveness of this feature-based motion estimation algorithm.
Abstract: This paper presents a video stabilization algorithm based on the extraction and tracking of scale invariant feature transform (SIFT) features through video frames. The implementation of the SIFT operator is analyzed and adapted for use in a feature-based motion estimation algorithm. SIFT features are extracted from video frames and their trajectories are evaluated to estimate interframe motion. A modified version of the iterative least squares method is adopted to avoid estimation errors, and features are tracked as they appear in nearby frames to improve video stability. Finally, intentional camera motion is filtered with adaptive motion vector integration. Results confirm the effectiveness of the method.
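
The per-frame motion estimation step can be sketched with OpenCV. Here RANSAC stands in for the paper's modified iterative least squares (both serve to reject outlier matches), and all names are ours:

```python
import cv2
import numpy as np

def interframe_motion(frame_a, frame_b):
    """Estimate a 2x3 similarity transform between consecutive frames
    from matched SIFT features."""
    sift = cv2.SIFT_create()
    ka, da = sift.detectAndCompute(frame_a, None)
    kb, db = sift.detectAndCompute(frame_b, None)
    matches = cv2.BFMatcher().knnMatch(da, db, k=2)
    good = []
    for pair in matches:                              # Lowe ratio test
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    src = np.float32([ka[m.queryIdx].pt for m in good])
    dst = np.float32([kb[m.trainIdx].pt for m in good])
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M

# Accumulating these transforms gives the camera trajectory; low-pass
# filtering it (the paper uses adaptive motion vector integration)
# separates intentional motion from the jitter to be removed.
```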

Journal ArticleDOI
TL;DR: This paper uses the fixed-point iteration method and preconditioning techniques to efficiently solve the associated nonlinear Euler-Lagrange equations of the corresponding variational problem in SR.
Abstract: Super-resolution (SR) reconstruction is capable of producing a high-resolution image from a sequence of low-resolution images. In this paper, we study an efficient SR algorithm for digital video. To effectively deal with the intractable problems in SR video reconstruction, such as inevitable motion estimation errors, noise, blurring, missing regions, and compression artifacts, total variation (TV) regularization is employed in the reconstruction model. We use the fixed-point iteration method and preconditioning techniques to efficiently solve the associated nonlinear Euler-Lagrange equations of the corresponding variational problem in SR. The proposed algorithm has been tested in several cases of motion and degradation. It is also compared with the Laplacian regularization-based SR algorithm and other TV-based SR algorithms. Experimental results are presented to illustrate the effectiveness of the proposed algorithm.
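
The TV-regularized reconstruction model has the generic form below (our notation, not necessarily the paper's), where u is the HR frame, y_k the observed LR frames, F_k warps by the estimated motion, H models blur, and D downsamples:

```latex
\hat{u} \;=\; \arg\min_{u}\; \sum_{k} \big\| D H F_k u - y_k \big\|_2^2
          \;+\; \lambda \int_{\Omega} |\nabla u| \, dx
```

The TV term preserves edges while suppressing noise; the fixed-point (lagged diffusivity) iteration linearizes the nonlinear Euler-Lagrange equation arising from it, and preconditioning accelerates the resulting linear solves.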

Journal ArticleDOI
TL;DR: This is the first attempt to implement an approximate particle filtering algorithm in the geometric active contour framework that can be used for tracking moving and deforming objects on a (theoretically) infinite dimensional state space.
Abstract: Tracking deforming objects involves estimating the global motion of the object and its local deformations as a function of time. Tracking algorithms using Kalman filters or particle filters have been proposed for finite dimensional representations of shape, but these are dependent on the chosen parametrization and cannot handle changes in curve topology. Geometric active contours provide a framework that is parametrization-independent and allows for changes in topology. In the present work, we formulate a particle filtering algorithm in the geometric active contour framework that can be used for tracking moving and deforming objects. To the best of our knowledge, this is the first attempt to implement an approximate particle filtering algorithm for tracking on a (theoretically) infinite dimensional state space.

Journal ArticleDOI
TL;DR: A new method to animate photos of 2D characters using 3D motion capture data that correctly handles projective shape distortion, works for images from arbitrary views, and requires only a small amount of user interaction.
Abstract: This article presents a new method to animate photos of 2D characters using 3D motion capture data. Given a single image of a person or essentially human-like subject, our method transfers the motion of a 3D skeleton onto the subject's 2D shape in image space, generating the impression of a realistic movement. We present robust solutions to reconstruct a projective camera model and a 3D model pose which matches best to the given 2D image. Depending on the reconstructed view, a 2D shape template is selected which enables the proper handling of occlusions. After fitting the template to the character in the input image, it is deformed as-rigid-as-possible by taking the projected 3D motion data into account. Unlike previous work, our method thereby correctly handles projective shape distortion. It works for images from arbitrary views and requires only a small amount of user interaction. We present animations of a diverse set of human (and nonhuman) characters with different types of motions, such as walking, jumping, or dancing.

Journal ArticleDOI
TL;DR: A novel approach is proposed for ground moving target imaging and motion parameter estimation using single-channel SAR, based on a second-order generalised keystone formatting method and Doppler parameters of moving targets obtained via spectral analysis.
Abstract: In recent years, ground moving target imaging in synthetic aperture radar (SAR) has attracted the attention of many researchers all over the world. A novel approach is proposed for the ground moving target imaging and motion parameter estimation using single channel SAR. First, a second-order generalised keystone formatting method is used to compensate for the range curvature. Secondly, the estimated slope of the target echo's envelope is used for the range walk compensation. Thirdly, Doppler parameters of moving targets obtained via spectral analysis are used for the imaging and positioning of ground moving targets. Finally, motion parameters of moving targets can be estimated on the basis of the relationship between Doppler and motion parameters. Both numerical and experimental results are provided to demonstrate the performance of the proposed approach.

Journal ArticleDOI
TL;DR: A systematic experimental evaluation on a large video database with human actions demonstrates that local spatio-temporal image descriptors can be defined to carry important information about space-time events for subsequent recognition, and that local velocity adaptation is an important mechanism in situations when the relative motion between the camera and the interesting events in the scene is unknown.

Journal ArticleDOI
TL;DR: This work develops a completely automatic system that works in two stages; it first builds a model of appearance of each person in a video and then it tracks by detecting those models in each frame ("tracking by model-building and detection").
Abstract: An open vision problem is to automatically track the articulations of people from a video sequence. This problem is difficult because one needs to determine both the number of people in each frame and estimate their configurations. But finding people and localizing their limbs is hard because people can move fast and unpredictably, can appear in a variety of poses and clothes, and are often surrounded by limb-like clutter. We develop a completely automatic system that works in two stages; it first builds a model of appearance of each person in a video and then it tracks by detecting those models in each frame ("tracking by model-building and detection"). We develop two algorithms that build models; one bottom-up approach groups together candidate body parts found throughout a sequence. We also describe a top-down approach that automatically builds people-models by detecting convenient key poses within a sequence. We finally show that building a discriminative model of appearance is quite helpful since it exploits structure in a background (without background-subtraction). We demonstrate the resulting tracker on hundreds of thousands of frames of unscripted indoor and outdoor activity, a feature-length film ("Run Lola Run"), and legacy sports footage (from the 2002 World Series and 1998 Winter Olympics). Experiments suggest that our system 1) can count distinct individuals, 2) can identify and track them, 3) can recover when it loses track, for example, if individuals are occluded or briefly leave the view, 4) can identify body configuration accurately, and 5) is not dependent on particular models of human motion.

Journal ArticleDOI
01 Feb 2007
TL;DR: A new method based on a simple recursive nonlinear operator, the Σ-Δ filter, which is used along with a spatiotemporal regularization algorithm to deal with complex scenes containing a wide range of motion models with very different time constants.
Abstract: Motion detection using a stationary camera can be done by estimating the static scene (background). For that purpose, we propose a new method based on a simple recursive nonlinear operator, the Σ-Δ filter. Used along with a spatiotemporal regularization algorithm, it allows robust, computationally efficient and accurate motion detection. To deal with complex scenes containing a wide range of motion models with very different time constants, we propose a generalization of the basic model to multiple Σ-Δ estimation.
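
The Σ-Δ filter itself is only a few lines: each per-pixel estimate moves by at most ±1 per frame toward its target, which is what makes it cheap and robust to transient motion. A minimal sketch of one update (the amplification factor N and the initialization are illustrative):

```python
import numpy as np

def sigma_delta_step(frame, M, V, N=4):
    """One Σ-Δ background estimation update.

    frame: current grayscale frame; M: background estimate; V: temporal
    activity estimate (both int arrays of the frame's shape, e.g.
    initialized to the first frame and to 1). Returns updated M, V and
    a boolean motion mask.
    """
    M = M + np.sign(frame.astype(int) - M)        # background creeps toward the scene
    diff = np.abs(frame.astype(int) - M)
    V = np.maximum(V + np.sign(N * diff - V), 1)  # activity follows N*|difference|
    motion_mask = diff > V                        # pixels deviating beyond activity
    return M, V, motion_mask
```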

Proceedings ArticleDOI
17 Jun 2007
TL;DR: The necessary steps to compute the Martin distance between kernel dynamic textures are derived, and the resulting kernel dynamic texture is capable of modeling a wider range of video motion, such as chaotic motion or camera motion.
Abstract: The dynamic texture is a stochastic video model that treats the video as a sample from a linear dynamical system. The simple model has been shown to be surprisingly useful in domains such as video synthesis, video segmentation, and video classification. However, one major disadvantage of the dynamic texture is that it can only model video where the motion is smooth, i.e. video textures where the pixel values change smoothly. In this work, we propose an extension of the dynamic texture to address this issue. Instead of learning a linear observation function with PCA, we learn a non-linear observation function using kernel-PCA. The resulting kernel dynamic texture is capable of modeling a wider range of video motion, such as chaotic motion (e.g. turbulent water) or camera motion (e.g. panning). We derive the necessary steps to compute the Martin distance between kernel dynamic textures, and then validate the new model through classification experiments on video containing camera motion.
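
For context, the underlying linear dynamic texture is fit by PCA plus least squares; the paper's contribution is to swap the PCA step for kernel-PCA so the observation function becomes non-linear. A sketch of the linear baseline, with our variable names:

```python
import numpy as np

def learn_dynamic_texture(Y, n_states=10):
    """Fit x_{t+1} = A x_t, y_t = C x_t to a video.

    Y: (n_pixels, n_frames) matrix of vectorized frames. C comes from
    PCA (left singular vectors of the mean-subtracted data); the state
    dynamics A are the least-squares map between consecutive states.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                         # observation matrix
    X = C.T @ Y                                 # state trajectory
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])    # least-squares dynamics
    return A, C
```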

Patent
21 May 2007
TL;DR: One or more techniques are provided for adapting a reconstruction process to account for the motion of an imaged object or organ, such as the heart, using projection data acquired with a slowly rotating CT gantry.
Abstract: One or more techniques are provided for adapting a reconstruction process to account for the motion of an imaged object or organ, such as the heart. In particular, projection data of the moving object or organ is acquired using a slowly rotating CT gantry. Motion data may be determined from the projection data or from images reconstructed from the projection data. The motion data may be used to reconstruct motion-corrected images from the projection data. The motion-corrected images may be associated to form motion-corrected volume renderings.

Journal ArticleDOI
TL;DR: A behavior-based similarity measure is introduced that tells us whether two different space-time intensity patterns of two different video segments could have resulted from a similar underlying motion field, thus allowing us to correlate dynamic behaviors and actions.
Abstract: We introduce a behavior-based similarity measure that tells us whether two different space-time intensity patterns of two different video segments could have resulted from a similar underlying motion field. This is done directly from the intensity information, without explicitly computing the underlying motions. Such a measure allows us to detect similarity between video segments of differently dressed people performing the same type of activity. It requires no foreground/background segmentation, no prior learning of activities, and no motion estimation or tracking. Using this behavior-based similarity measure, we extend the notion of two-dimensional image correlation into the three-dimensional space-time volume, thus allowing us to correlate dynamic behaviors and actions. Small space-time video segments (small video clips) are "correlated" against entire video sequences in all three dimensions (x, y, and t). Peak correlation values correspond to video locations with similar dynamic behaviors. Our approach can detect very complex behaviors in video sequences (for example, ballet movements, pool dives, and running water), even when multiple complex activities occur simultaneously within the field of view of the camera. We further show its robustness to small changes in scale and orientation of the correlated behavior.

Proceedings ArticleDOI
13 Jun 2007
TL;DR: In this article, the authors proposed a control method for lower-limb assist that produces a virtual modification of the mechanical impedance of the human limbs by making the exoskeleton display active impedance properties.
Abstract: We propose a novel control method for lower-limb assist that produces a virtual modification of the mechanical impedance of the human limbs. This effect is accomplished by making the exoskeleton display active impedance properties. Active impedance control emphasizes control of the exoskeleton's dynamics and regulation of the transfer of energy between the exoskeleton and the user. Its goal is improving the dynamic response of the human limbs without sacrificing the user's control authority. The proposed method is an alternative to myoelectrical exoskeleton control, which is based on estimating muscle torques from electromyographical (EMG) activity. Implementation of an EMG-based controller is a complex task that involves modeling the user's musculoskeletal system and requires recalibration. In contrast, active impedance control is less dependent on estimation of the user's attempted motion, thereby avoiding conflicts resulting from inaccurate estimation. In this paper we also introduce a new form of human assist based on improving the kinematic response of the limbs. Reduction of average muscle torques is a common goal of research in human assist. However, less emphasis has been placed so far on improving the user's agility of motion. We aim to use active impedance control to attain such effects as increasing the user's average speed of motion, and improving their acceleration capabilities in order to compensate for perturbations from the environment.
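
At its simplest, displaying an active impedance means commanding exoskeleton torques from the measured joint state alone, with no EMG or intent estimation in the loop. A deliberately minimal sketch; the gains, sign conventions, and the spring-damper form are our illustration, not the controller developed in the paper:

```python
import numpy as np

def assist_torque(q, qdot, K_virtual, B_virtual, q_eq):
    """Virtual spring-damper rendered by the exoskeleton at the joints.

    q, qdot: measured joint positions and velocities; K_virtual and
    B_virtual: stiffness and damping matrices the user should feel
    added to their limb dynamics (suitable negative choices make the
    limb feel lighter or faster); q_eq: equilibrium posture.
    """
    return -K_virtual @ (q - q_eq) - B_virtual @ qdot
```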

Journal ArticleDOI
TL;DR: A novel space-time patch-based method based on the local analysis of the bias-variance trade-off that can be combined with motion estimation to cope with very large displacements due to camera motion.
Abstract: We present a novel space-time patch-based method for image sequence restoration. We propose an adaptive statistical estimation framework based on the local analysis of the bias-variance trade-off. At each pixel, the space-time neighborhood is adapted to improve the performance of the proposed patch-based estimator. The proposed method is unsupervised and requires no motion estimation. Nevertheless, it can also be combined with motion estimation to cope with very large displacements due to camera motion. Experiments show that this method is able to drastically improve the quality of highly corrupted image sequences. Quantitative evaluations on standard artificially noise-corrupted image sequences demonstrate that our method outperforms other recent competitive methods. We also report convincing results on real noisy image sequences.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel algorithm to jointly capture the motion and the dynamic shape of humans from multiple video streams without using optical markers using a deformable high-quality mesh of a human as scene representation and an image-based 3D correspondence estimation algorithm and a fast Laplacian mesh deformation scheme.
Abstract: We present a novel algorithm to jointly capture the motion and the dynamic shape of humans from multiple video streams without using optical markers. Instead of relying on kinematic skeletons, as traditional motion capture methods, our approach uses a deformable high-quality mesh of a human as scene representation. It jointly uses an image-based 3D correspondence estimation algorithm and a fast Laplacian mesh deformation scheme to capture both motion and surface deformation of the actor from the input video footage. As opposed to many related methods, our algorithm can track people wearing wide apparel, it can straightforwardly be applied to any type of subject, e.g. animals, and it preserves the connectivity of the mesh over time. We demonstrate the performance of our approach using synthetic and captured real-world video sequences and validate its accuracy by comparison to the ground truth.
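
The Laplacian mesh deformation step solves a sparse least-squares system that preserves the mesh's differential coordinates (local shape detail) while pulling constrained vertices toward target positions. A minimal sketch with a uniform Laplacian and soft constraints; the paper's scheme is richer and is driven by image-based 3D correspondences:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def laplacian_deform(verts, edges, handle_ids, handle_pos, w=10.0):
    """verts: (n, 3) rest positions; edges: list of (i, j) index pairs
    of a connected mesh; handle_ids/handle_pos: constrained vertices and
    their targets; w: constraint weight. Returns deformed positions.
    """
    n = len(verts)
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    rows, cols, vals = [], [], []
    for i, j in edges:                      # off-diagonals of L = I - D^-1 A
        rows += [i, j]
        cols += [j, i]
        vals += [-1.0 / deg[i], -1.0 / deg[j]]
    rows += list(range(n))
    cols += list(range(n))
    vals += [1.0] * n
    L = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    delta = L @ verts                       # differential coordinates to preserve
    k = len(handle_ids)
    C = sp.csr_matrix((np.full(k, w), (range(k), handle_ids)), shape=(k, n))
    A = sp.vstack([L, C])                   # Laplacian rows + weighted constraints
    return np.column_stack([
        lsqr(A, np.concatenate([delta[:, c], w * handle_pos[:, c]]))[0]
        for c in range(3)
    ])
```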