
Showing papers on "Motion estimation published in 2010"


Proceedings ArticleDOI
12 Jun 2010
TL;DR: In this paper, a three-stage process is proposed to recover 3D human pose from monocular image sequences in real-world scenarios, such as crowded street scenes, based on tracking-by-detection.
Abstract: Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (= tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.

632 citations


Journal ArticleDOI
13 Jun 2010
TL;DR: In this article, a novel optical flow estimation method is proposed, which reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details in each scale.
Abstract: We discuss the cause of a severe optical flow estimation problem: fine motion structures cannot always be correctly reconstructed in the commonly employed multi-scale variational framework. Our major finding is that significant and abrupt displacement transitions wreck small-scale motion structures in the coarse-to-fine refinement. A novel optical flow estimation method is proposed in this paper to address this issue; it reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details at each scale. The contribution of this paper also includes adaptation of the objective function and development of a new optimization procedure. The effectiveness of our method is borne out by experiments on both large- and small-displacement optical flow estimation.

559 citations
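
The failure mode analyzed above is easiest to see in the baseline it improves on: classic coarse-to-fine estimation propagates each coarse flow estimate down as the initialization of the next finer level, so whatever the coarse level destroys never comes back. A minimal sketch of that conventional loop (not the paper's method), using OpenCV's Farneback solver as a stand-in for a variational solver and assuming grayscale uint8 inputs:

```python
import cv2
import numpy as np

def coarse_to_fine_flow(img1, img2, levels=4):
    """Conventional coarse-to-fine flow: estimate at the coarsest level,
    then upsample each estimate to initialize the next finer level. Fine
    motion structures can be lost because every level inherits (and
    trusts) the coarser initialization -- the problem the paper targets."""
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):            # Gaussian pyramids, coarsest last
        pyr1.append(cv2.pyrDown(pyr1[-1]))
        pyr2.append(cv2.pyrDown(pyr2[-1]))
    flow = None
    for i1, i2 in zip(reversed(pyr1), reversed(pyr2)):
        if flow is None:
            flow = np.zeros((*i1.shape[:2], 2), np.float32)
            flags = 0
        else:
            # Upsample the coarse flow and scale the vectors accordingly.
            flow = 2.0 * cv2.resize(flow, (i1.shape[1], i1.shape[0]))
            flags = cv2.OPTFLOW_USE_INITIAL_FLOW
        flow = cv2.calcOpticalFlowFarneback(
            i1, i2, flow, pyr_scale=0.5, levels=1, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=flags)
    return flow
```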


Proceedings ArticleDOI
21 Jun 2010
TL;DR: This paper proposes a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images which is directly based on the trifocal geometry between image triples, so no time-expensive recovery of the 3-dimensional scene structure is needed.

Abstract: A common prerequisite for many vision-based driver assistance systems is the knowledge of the vehicle's own movement. In this paper we propose a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images. Our method is directly based on the trifocal geometry between image triples, so no time-expensive recovery of the 3-dimensional scene structure is needed. The only assumption we make is a known camera geometry, where the calibration may vary over time. We employ an Iterated Sigma Point Kalman Filter in combination with a RANSAC-based outlier rejection scheme, which yields robust frame-to-frame motion estimation even in dynamic environments. A high-accuracy inertial navigation system is used to evaluate our results on challenging real-world video sequences. Experiments show that our approach is clearly superior to other filtering techniques in terms of both accuracy and run-time.

456 citations


Book ChapterDOI
01 Jan 2010
TL;DR: Using data with ground truth from an RTK GPS system, it is shown experimentally that the algorithms can track motion, in off-road terrain, over distances of 10 km, with an error of less than 10 m.
Abstract: Motion estimation from stereo imagery, sometimes called visual odometry, is a well-known process. However, it is difficult to achieve good performance using standard techniques. We present the results of several years of work on an integrated system to localize a mobile robot in rough outdoor terrain using visual odometry, with an increasing degree of precision. We discuss issues that are important for real-time, high-precision performance: choice of features, matching strategies, incremental bundle adjustment, and filtering with inertial measurement sensors. Using data with ground truth from an RTK GPS system, we show experimentally that our algorithms can track motion, in off-road terrain, over distances of 10 km, with an error of less than 10 m (0.1%).

413 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper derives an efficient filtering algorithm for tracking human pose using a stream of monocular depth images and describes a novel algorithm for propagating noisy evidence about body part locations up the kinematic chain using the unscented transform.
Abstract: Markerless tracking of human pose is a hard yet relevant problem. In this paper, we derive an efficient filtering algorithm for tracking human pose using a stream of monocular depth images. The key idea is to combine an accurate generative model (achievable in this setting using programmable graphics hardware) with a discriminative model that provides data-driven evidence about body part locations. In each filter iteration, we apply a form of local model-based search that exploits the nature of the kinematic chain. As fast movements and occlusion can disrupt the local search, we utilize a set of discriminatively trained patch classifiers to detect body parts. We describe a novel algorithm for propagating this noisy evidence about body part locations up the kinematic chain using the unscented transform. The resulting distribution of body configurations allows us to reinitialize the model-based search. We provide extensive experimental results on 28 real-world sequences using automatic ground-truth annotations from a commercial motion capture system.

406 citations
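
Since the abstract leans on the unscented transform, a self-contained sketch of the transform in its standard scaled form may help; the paper wraps this kind of propagation around each link of the kinematic chain, which is not reproduced here:

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    via deterministic sigma points instead of linearization. f maps an
    n-vector to an m-vector."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # 2n+1 sigma points from a matrix square root of the scaled covariance.
    S = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])   # (2n+1, n)
    # Standard scaled-UT weights for mean and covariance.
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    # Push sigma points through the nonlinearity and recombine.
    Y = np.array([f(s) for s in sigma])                 # (2n+1, m)
    y_mean = wm @ Y
    d = Y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```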


Book ChapterDOI
05 Sep 2010
TL;DR: A novel single image deblurring method to estimate spatially non-uniform blur that results from camera shake that outperforms current approaches which make the assumption of spatially invariant blur.
Abstract: We present a novel single image deblurring method to estimate spatially non-uniform blur that results from camera shake. We use existing spatially invariant deconvolution methods in a local and robust way to compute initial estimates of the latent image. The camera motion is represented as a Motion Density Function (MDF) which records the fraction of time spent in each discretized portion of the space of all possible camera poses. Spatially varying blur kernels are derived directly from the MDF. We show that 6D camera motion is well approximated by 3 degrees of motion (in-plane translation and rotation) and analyze the scope of this approximation. We present results on both synthetic and captured data. Our system outperforms current approaches which make the assumption of spatially invariant blur.

388 citations
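
The Motion Density Function reduces to a simple per-pixel recipe: accumulate, over camera poses weighted by exposure-time fraction, where that pixel lands. A hedged sketch with an illustrative parameterization (in-plane translation in pixels plus rotation about the image origin); the paper's 3-DOF pose space, discretization, and calibration differ in detail:

```python
import numpy as np

def kernel_from_mdf(poses, weights, x, y, ksize=31):
    """Rasterize a per-pixel blur kernel from a Motion Density Function.
    `poses` is an (N, 3) array of in-plane camera poses (tx, ty, theta),
    `weights` the fraction of exposure time spent at each pose (sums to
    1), and (x, y) the pixel whose kernel we want."""
    k = np.zeros((ksize, ksize))
    c = ksize // 2
    for (tx, ty, th), w in zip(poses, weights):
        # Displacement of (x, y) under rotation about the image origin
        # plus translation -- the 3-DOF approximation named in the paper.
        dx = np.cos(th) * x - np.sin(th) * y - x + tx
        dy = np.sin(th) * x + np.cos(th) * y - y + ty
        u, v = int(round(c + dx)), int(round(c + dy))
        if 0 <= u < ksize and 0 <= v < ksize:
            k[v, u] += w            # exposure-time mass at this offset
    return k / max(k.sum(), 1e-12)
```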


Journal ArticleDOI
TL;DR: A robust subspace separation scheme is developed that deals with practical issues in a unified mathematical framework and gives surprisingly good performance in the presence of grossly mistracked features, missing entries, and corrupted entries.
Abstract: In this paper, we study the problem of segmenting tracked feature point trajectories of multiple moving objects in an image sequence. Using the affine camera model, this problem can be cast as the problem of segmenting samples drawn from multiple linear subspaces. In practice, due to limitations of the tracker, occlusions, and the presence of nonrigid objects in the scene, the obtained motion trajectories may contain grossly mistracked features, missing entries, or corrupted entries. In this paper, we develop a robust subspace separation scheme that deals with these practical issues in a unified mathematical framework. Our methods draw strong connections between lossy compression, rank minimization, and sparse representation. We test our methods extensively on the Hopkins155 motion segmentation database and other motion sequences with outliers and missing data. We compare the performance of our methods to state-of-the-art motion segmentation methods based on expectation-maximization and spectral clustering. For data without outliers or missing information, the results of our methods are on par with the state-of-the-art results and, in many cases, exceed them. In addition, our methods give surprisingly good performance in the presence of the three types of pathological trajectories mentioned above. All code and results are publicly available at http://perception.csl.uiuc.edu/coding/motion/.

348 citations


Journal ArticleDOI
TL;DR: This paper treats tracking as a learning problem of estimating the location and the scale of an object given its previous location and scale as well as the current and previous image frames, and introduces multiple pathways in the CNN to better fuse local and global information.
Abstract: In this paper, we treat tracking as a learning problem of estimating the location and the scale of an object given its previous location and scale as well as the current and previous image frames. Given a set of examples, we train convolutional neural networks (CNNs) to perform this estimation task. Different from other learning methods, the CNNs learn both spatial and temporal features jointly from image pairs of two adjacent frames. We introduce multiple pathways in the CNN to better fuse local and global information. A shift-variant CNN architecture is designed to alleviate the drift problem when distracting objects similar to the target appear in cluttered environments. Furthermore, we employ CNNs to estimate the scale through the accurate localization of some key points. These techniques are object-independent, so the proposed method can be applied to track other types of objects. The capability of the tracker to handle complex situations is demonstrated in many testing sequences.

346 citations


Journal ArticleDOI
TL;DR: High quality, high spatial resolution, and high calculation speed can all be obtained simultaneously using the proposed methodology, which could prove very useful and flexible for real-time motion estimation as well as in other fields such as optical flow and image registration.

Abstract: High-precision motion estimation has become essential in ultrasound-based techniques such as time-domain Doppler and elastography. Normalized cross-correlation (NCC) has been shown as one of the best motion estimators. However, a significant drawback is its associated computational cost, especially when RF signals are used. In this paper, a method based on sum tables developed elsewhere is adapted for fast NCC calculation in ultrasound-based motion estimation, and is tested with respect to the speed enhancement it brings to this application. Both the numerator and denominator in the NCC definition are obtained through pre-calculated sum tables to eliminate redundancy of repeated NCC calculations. Unlike a previously reported method, a search region following the principle of motion estimation is applied in the construction of sum tables. Because an exhaustive search and high window overlap are typically used for highest quality imaging, the computational cost of the proposed method is significantly lower than that of the direct method using the NCC definition, without increasing bias and variance characteristics of the motion estimation or sacrificing the spatial resolution. Therefore, high quality, high spatial resolution, and high calculation speed can all be obtained simultaneously using the proposed methodology. The high efficiency of this method was verified using RF signals from a human abdominal aorta in vivo. For the parameters typically used, a real-time, very high frame rate of 310 frames/s was achieved for the motion estimation. The proposed method was also extended to 2-D NCC motion estimation and motion estimation with other algorithms. The technique could thus prove very useful and flexible for real-time motion estimation as well as in other fields such as optical flow and image registration.

325 citations
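
The speedup comes from the classic sum-table (integral image) trick: the per-window means and energies in the NCC denominator become O(1) lookups after a single pass over the data. A 2-D sketch of the principle; note the cross term is computed here by FFT, whereas the paper also builds sum tables over the search region for the numerator:

```python
import numpy as np
from scipy.signal import fftconvolve

def window_sums(a, h, w):
    """Sum table (integral image): O(1) window sums after O(N) setup."""
    S = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    S[1:, 1:] = a.cumsum(0).cumsum(1)
    return S[h:, w:] - S[:-h, w:] - S[h:, :-w] + S[:-h, :-w]

def fast_ncc(f, t):
    """NCC of template t against every window of image f. The window
    sums and energies in the denominator come from sum tables, as in
    the paper; the remaining cross term is done by FFT. 2-D sketch of
    the principle (the paper targets RF ultrasound data)."""
    h, w = t.shape
    n = t.size
    t0 = t - t.mean()
    cross = fftconvolve(f, t0[::-1, ::-1], mode="valid")
    s1 = window_sums(f, h, w)            # per-window sum of f
    s2 = window_sums(f ** 2, h, w)       # per-window sum of f^2
    denom = np.sqrt(np.maximum(s2 - s1 ** 2 / n, 1e-12) * (t0 ** 2).sum())
    return cross / denom
```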


Journal ArticleDOI
TL;DR: A nonparametric regression method for denoising 3-D image sequences acquired via fluorescence microscopy and an original statistical patch-based framework for noise reduction and preservation of space-time discontinuities are presented.
Abstract: We present a nonparametric regression method for denoising 3-D image sequences acquired via fluorescence microscopy. The proposed method exploits the redundancy of the 3-D+time information to improve the signal-to-noise ratio of images corrupted by Poisson-Gaussian noise. A variance stabilization transform is first applied to the image-data to remove the dependence between the mean and variance of intensity values. This preprocessing requires the knowledge of parameters related to the acquisition system, also estimated in our approach. In a second step, we propose an original statistical patch-based framework for noise reduction and preservation of space-time discontinuities. In our study, discontinuities are related to small moving spots with high velocity observed in fluorescence video-microscopy. The idea is to minimize an objective nonlocal energy functional involving spatio-temporal image patches. The minimizer has a simple form and is defined as the weighted average of input data taken in spatially-varying neighborhoods. The size of each neighborhood is optimized to improve the performance of the pointwise estimator. The performance of the algorithm (which requires no motion estimation) is then evaluated on both synthetic and real image sequences using qualitative and quantitative criteria.

299 citations
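
The variance stabilization step for Poisson-Gaussian data is typically the generalized Anscombe transform; a sketch under that assumption (the paper estimates the acquisition parameters, here alpha, sigma, and mu, itself):

```python
import numpy as np

def generalized_anscombe(z, alpha, sigma, mu=0.0):
    """Variance-stabilizing transform for Poisson-Gaussian data
    z = alpha * Poisson + N(mu, sigma^2): after the transform the noise
    is approximately Gaussian with unit variance, so a Gaussian-noise
    patch-based estimator can be applied to the result."""
    arg = alpha * z + 0.375 * alpha ** 2 + sigma ** 2 - alpha * mu
    return (2.0 / alpha) * np.sqrt(np.maximum(arg, 0.0))
```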


Journal ArticleDOI
TL;DR: A new procedure for static head-pose estimation and a new algorithm for visual 3-D tracking are presented and integrated into a novel real-time system for measuring the position and orientation of a driver's head.

Abstract: Driver distraction and inattention are prominent causes of automotive collisions. To enable driver-assistance systems to address these problems, we require new sensing approaches to infer a driver's focus of attention. In this paper, we present a new procedure for static head-pose estimation and a new algorithm for visual 3-D tracking. They are integrated into a novel real-time (30 fps) system for measuring the position and orientation of a driver's head. This system consists of three interconnected modules that detect the driver's head, provide initial estimates of the head's pose, and continuously track its position and orientation in six degrees of freedom. The head-detection module consists of an array of Haar-wavelet Adaboost cascades. The initial pose estimation module employs localized gradient orientation (LGO) histograms as input to support vector regressors (SVRs). The tracking module provides a fine estimate of the 3-D motion of the head using a new appearance-based particle filter for 3-D model tracking in an augmented reality environment. We describe our implementation that utilizes OpenGL-optimized graphics hardware to efficiently compute particle samples in real time. To demonstrate the suitability of this system for real driving situations, we provide a comprehensive evaluation with drivers of varying age, race, and sex, spanning daytime and nighttime conditions. To quantitatively measure the accuracy of the system, we compare its estimation results to a marker-based cinematic motion-capture system installed in the automotive testbed.

Journal ArticleDOI
TL;DR: This work develops strategies for multiple sensor platforms to explore a noisy scalar field in the plane using provably convergent cooperative Kalman filters that apply to general cooperative exploration missions and presents a novel method to determine the shape of the platform formation to minimize error in the estimates.
Abstract: Autonomous mobile sensor networks are employed to measure large-scale environmental fields. Yet an optimal strategy for mission design addressing both the cooperative motion control and the cooperative sensing is still an open problem. We develop strategies for multiple sensor platforms to explore a noisy scalar field in the plane. Our method consists of three parts. First, we design provably convergent cooperative Kalman filters that apply to general cooperative exploration missions. Second, we present a novel method to determine the shape of the platform formation to minimize error in the estimates and design a cooperative formation control law to asymptotically achieve the optimal formation shape. Third, we use the cooperative filter estimates in a provably convergent motion control law that drives the center of the platform formation to move along level curves of the field. This control law can be replaced by control laws enabling other cooperative exploration motion, such as gradient climbing, without changing the cooperative filters and the cooperative formation control laws. Performance is demonstrated on simulated underwater platforms in simulated ocean fields.

01 Jan 2010
TL;DR: In this paper, the nonholonomic constraints of wheeled vehicles (e.g., cars, bikes, mobile robots) are exploited to parameterize the motion with a single feature correspondence, yielding a 1-point RANSAC algorithm for outlier removal, and to estimate the absolute scale whenever the vehicle turns.
Abstract: The first big problem in visual motion estimation is data association: matched points contain many outliers that must be detected and removed for the motion to be accurately estimated. In the last few years, a very established method for removing outliers has been the "5-point RANSAC" algorithm, which needs a minimum of 5 point correspondences to estimate the model hypotheses. Because of this, however, it can require up to a thousand iterations to find a set of points free of outliers. In this talk, I will show that by exploiting the non-holonomic constraints of wheeled vehicles (e.g., cars, bikes, mobile robots) it is possible to use a restrictive motion model which allows us to parameterize the motion with only 1 point correspondence. Using a single feature correspondence for motion estimation is the lowest model parameterization possible and results in the most efficient algorithm for removing outliers: 1-point RANSAC. The second problem in monocular visual odometry is the estimation of the absolute scale. I will show that vehicle non-holonomic constraints also make it possible to estimate the absolute scale completely automatically whenever the vehicle turns. In this talk, I will give a mathematical derivation and provide experimental results on both simulated and real data over a large image dataset collected during a 25 Km path.
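
The core of the argument fits in a few lines: under a planar circular-motion model, a single normalized correspondence already determines the rotation hypothesis, so RANSAC degenerates to scoring one-point hypotheses. A hedged sketch; both the closed form for theta and the simplified inlier test (agreement in theta, in the spirit of the histogram-voting variant) are illustrative, and the paper should be consulted for the exact derivation and geometric residual:

```python
import numpy as np

def one_point_ransac(p1, p2, iters=50, thresh=0.01):
    """1-point RANSAC under a circular-motion model. p1, p2 are (N, 2)
    normalized image coordinates of matched points in two frames; each
    single match hypothesizes the yaw angle theta, and inliers are the
    matches whose own theta agrees with the sampled hypothesis."""
    # Illustrative closed form for the circular-motion hypothesis.
    thetas = -2.0 * np.arctan2(p2[:, 1] - p1[:, 1], p2[:, 0] + p1[:, 0])
    best = np.zeros(len(p1), bool)
    for i in np.random.randint(len(p1), size=iters):
        # Wrap angle differences to (-pi, pi] before thresholding.
        diff = np.angle(np.exp(1j * (thetas - thetas[i])))
        inliers = np.abs(diff) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best, float(np.median(thetas[best]))
```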

Journal ArticleDOI
TL;DR: A dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos is proposed.
Abstract: In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the system achieved an average event recognition accuracy of 89.2 percent with the MHI representation and 94.3 percent with the FFD representation. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
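
For reference, the Motion History Image half of the comparison rests on a one-line recurrence; a minimal sketch (the paper's extended MHI adds to this core):

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30.0):
    """One step of a Motion History Image: moving pixels are stamped
    with tau, the rest decay by 1 per frame, so pixel intensity encodes
    the recency of motion."""
    return np.where(motion_mask > 0, tau, np.maximum(mhi - 1.0, 0.0))

# Usage sketch: masks from thresholded frame differences.
# mhi = np.zeros(frames[0].shape)
# for prev, cur in zip(frames, frames[1:]):
#     mask = np.abs(cur.astype(float) - prev.astype(float)) > 15
#     mhi = update_mhi(mhi, mask)
```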

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new motion estimation algorithm based on non-local total variation regularization integrates a low-level image segmentation process into a unified variational framework and is demonstrated to cope with poorly textured regions, occlusions, and small-scale image structures.
Abstract: State-of-the-art motion estimation algorithms suffer from three major problems: poorly textured regions, occlusions, and small-scale image structures. Based on the Gestalt principles of grouping, we propose to incorporate a low-level image segmentation process in order to tackle these problems. Our new motion estimation algorithm is based on non-local total variation regularization, which allows us to integrate the low-level image segmentation process into a unified variational framework. Numerical results on the Middlebury optical flow benchmark data set demonstrate that we can cope with the aforementioned problems.

Journal ArticleDOI
TL;DR: In this paper, a generalized transmission index that can evaluate the motion/force transmissibility of fully parallel manipulators is proposed; it is based on the virtual coefficient and is related to singularity.

Journal ArticleDOI
01 Jan 2010
TL;DR: Recent developments in three major issues involved in a general human motion analysis system are presented, namely, human detection, view-invariant pose representation and estimation, and behavior understanding.
Abstract: As the viewpoint issue is becoming a bottleneck for human motion analysis and its applications, researchers have in recent years devoted themselves to view-invariant human motion analysis and have achieved inspiring progress. The challenge is to find a methodology that can recognize human motion patterns at increasingly sophisticated levels of human behavior description. This paper provides a comprehensive survey of this significant research, with emphasis on view-invariant representation and recognition of poses and actions. To help readers understand the integrated process of visual analysis of human motion, the paper presents recent developments in three major issues involved in a general human motion analysis system, namely, human detection, view-invariant pose representation and estimation, and behavior understanding. Publicly available standard datasets are recommended. The concluding discussion assesses the progress so far and outlines some research challenges, future directions, and what is essential to achieve the goals of human motion analysis.

Journal ArticleDOI
TL;DR: A patient-specific model of the aortic and mitral valves is automatically estimated from volumetric sequences for the first time; the system operates on cardiac computed tomography (CT) and transesophageal echocardiogram (TEE) data.
Abstract: As decisions in cardiology increasingly rely on noninvasive methods, fast and precise image processing tools have become a crucial component of the analysis workflow. To the best of our knowledge, we propose the first automatic system for patient-specific modeling and quantification of the left heart valves, which operates on cardiac computed tomography (CT) and transesophageal echocardiogram (TEE) data. Robust algorithms, based on recent advances in discriminative learning, are used to estimate patient-specific parameters from sequences of volumes covering an entire cardiac cycle. A novel physiological model of the aortic and mitral valves is introduced, which captures complex morphologic, dynamic, and pathologic variations. This holistic representation is hierarchically defined on three abstraction levels: global location and rigid motion model, nonrigid landmark motion model, and comprehensive aortic-mitral model. First we compute the rough location and cardiac motion applying marginal space learning. The rapid and complex motion of the valves, represented by anatomical landmarks, is estimated using a novel trajectory spectrum learning algorithm. The obtained landmark model guides the fitting of the full physiological valve model, which is locally refined through learned boundary detectors. Measurements efficiently computed from the aortic-mitral representation support an effective morphological and functional clinical evaluation. Extensive experiments on a heterogeneous data set, cumulated to 1516 TEE volumes from 65 4-D TEE sequences and 690 cardiac CT volumes from 69 4-D CT sequences, demonstrated a speed of 4.8 seconds per volume and average accuracy of 1.45 mm with respect to expert defined ground-truth. Additional clinical validations prove the quantification precision to be in the range of inter-user variability. To the best of our knowledge this is the first time a patient-specific model of the aortic and mitral valves is automatically estimated from volumetric sequences.

Journal ArticleDOI
TL;DR: An approach for accurately measuring human motion through Markerless Motion Capture (MMC) is presented that uses multiple color cameras and combines an accurate and anatomically consistent tracking algorithm with a method for automatically generating subject-specific models.
Abstract: An approach for accurately measuring human motion through Markerless Motion Capture (MMC) is presented. The method uses multiple color cameras and combines an accurate and anatomically consistent tracking algorithm with a method for automatically generating subject-specific models. The tracking approach employed a Levenberg-Marquardt minimization scheme over an iterative closest point algorithm with six degrees of freedom for each body segment. Anatomical consistency was maintained by enforcing rotational and translational joint range of motion constraints for each specific joint. A subject-specific model was obtained through an automatic model generation algorithm (Corazza et al. in IEEE Trans. Biomed. Eng., 2009) which combines a space of human shapes (Anguelov et al. in Proceedings SIGGRAPH, 2005) with biomechanically consistent kinematic models and a pose-shape matching algorithm. There were 15 anatomical body segments and 14 joints, each with six degrees of freedom (13 and 12, respectively, for the HumanEva II dataset). The overall method is an improvement over (Mundermann et al. in Proceedings of CVPR, 2007) in terms of both accuracy and robustness. Since the method was originally developed for ≥8 cameras, its performance was tested both (i) on the HumanEva II dataset (Sigal and Black, Technical Report CS-06-08, 2006) in a 4-camera configuration, and (ii) on a series of motions including walking trials, a very challenging gymnastic motion, and a dataset with motions similar to HumanEva II but with a variable number of cameras.

Book ChapterDOI
05 Sep 2010
TL;DR: An adaptive video denoising framework is proposed that integrates robust optical flow into a nonlocal means (NLM) framework with noise level estimation, and introduces approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods.
Abstract: Although recent advances in the sparse representations of images have achieved outstanding denoising results, removing real, structured noise in digital videos remains a challenging problem. We show the utility of reliable motion estimation to establish temporal correspondence across frames in order to achieve high-quality video denoising. In this paper, we propose an adaptive video denoising framework that integrates robust optical flow into a nonlocal means (NLM) framework with noise level estimation. The spatial regularization in optical flow is the key to ensuring temporal coherence in removing structured noise. Furthermore, we introduce approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods. Experimental results show that our system is comparable with the state of the art in removing AWGN, and significantly outperforms the state of the art in removing real, structured noise.
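
A brute-force miniature of the NLM core may clarify what the approximate K-NN matching replaces: for each pixel, the K most similar spatio-temporal patches are averaged with similarity weights. Everything here is a stand-in; the search window is exhaustive rather than approximate K-NN, and the flow-guided window placement and noise-level estimation are omitted:

```python
import numpy as np

def nlm_pixel(frames, t, y, x, patch=3, search=5, K=10, h=10.0):
    """Nonlocal-means estimate of one interior pixel from a small
    spatio-temporal search window, keeping only the K nearest patches.
    frames is a (T, H, W) float array; (y, x) must be far enough from
    the border for full patches."""
    r, s = patch // 2, search // 2
    ref = frames[t, y - r:y + r + 1, x - r:x + r + 1]
    cands = []
    for tt in range(max(t - 1, 0), min(t + 2, len(frames))):
        for yy in range(y - s, y + s + 1):
            for xx in range(x - s, x + s + 1):
                if yy - r < 0 or xx - r < 0:
                    continue
                p = frames[tt, yy - r:yy + r + 1, xx - r:xx + r + 1]
                if p.shape == ref.shape:        # skips border overruns
                    cands.append((((p - ref) ** 2).mean(),
                                  frames[tt, yy, xx]))
    cands.sort(key=lambda c: c[0])
    d2 = np.array([c[0] for c in cands[:K]])
    vals = np.array([c[1] for c in cands[:K]])
    w = np.exp(-d2 / (h * h))                   # similarity weights
    return float((w * vals).sum() / w.sum())
```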

Journal IssueDOI
TL;DR: A novel combination of RANSAC plus extended Kalman filter (EKF) that uses the available prior probabilistic information from the EKF in the RANSAC model hypothesis stage, allowing the minimal sample size to be reduced to one and resulting in large computational savings without loss of discriminative power.
Abstract: Random sample consensus (RANSAC) has become one of the most successful techniques for robust estimation from a data set that may contain outliers. It works by constructing model hypotheses from random minimal data subsets and evaluating their validity from the support of the whole data. In this paper we present a novel combination of RANSAC plus extended Kalman filter (EKF) that uses the available prior probabilistic information from the EKF in the RANSAC model hypothesis stage. This allows the minimal sample size to be reduced to one, resulting in large computational savings without loss of discriminative power. 1-point RANSAC is shown to outperform the joint compatibility branch and bound (JCBB) algorithm, a gold-standard technique for spurious rejection within the EKF framework, in both accuracy and computational cost. Two visual estimation scenarios are used in the experiments: first, six-degree-of-freedom (DOF) motion estimation from a monocular sequence (structure from motion). Here, a new method for benchmarking six-DOF visual estimation algorithms based on the use of high-resolution images is presented, validated, and used to show the superiority of 1-point RANSAC. Second, we demonstrate long-term robot trajectory estimation combining monocular vision and wheel odometry (visual odometry). Here, a comparison against a global positioning system shows an accuracy comparable to state-of-the-art visual odometry methods.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem, and reports studies on a state-of-the-art Random-forest-based video segmentation scheme trained both on full ground truth data and on data obtained from label propagation.
Abstract: This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem. Given a limited amount of hand labelled pixels, typically the start and end frames of a chunk of video, an EM based algorithm propagates labels through the rest of the frames of the video sequence. As a result, the user obtains pixelwise labelled video sequences along with the class probabilities at each pixel. Our novel algorithm provides an essential tool to reduce tedious hand labelling of video sequences, thus producing copious amounts of usable ground truth data. A novel application of this algorithm is in semi-supervised learning of discriminative classifiers for video segmentation and scene parsing. The label propagation scheme can be based on pixel-wise correspondences obtained from motion estimation, image patch based similarities as seen in epitomic models, or even the more recent, semantically consistent hierarchical regions. We compare the abilities of each of these variants, both via quantitative and qualitative studies against ground truth data. We then report studies on a state-of-the-art Random-forest-classifier-based video segmentation scheme, trained using full ground truth data and with data obtained from label propagation. The results of this study strongly support and encourage the use of the proposed label propagation algorithm.
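
Of the correspondence types the paper plugs into its EM framework, flow-based propagation is the simplest to sketch: push each pixel's label along its forward flow vector. A minimal, hedged version (nearest-neighbor rounding, last-writer-wins at collisions; the paper's probabilistic model does considerably more):

```python
import numpy as np

def propagate_labels(labels, flow):
    """Push pixel labels from frame t to frame t+1 along forward
    optical-flow correspondences. labels is (H, W) int, flow is
    (H, W, 2) in (dx, dy) order; unmatched target pixels get -1."""
    H, W = labels.shape
    out = np.full((H, W), -1, labels.dtype)
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    out[yt, xt] = labels[ys, xs]      # scatter labels to flow targets
    return out
```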

Journal ArticleDOI
TL;DR: Spatiotemporal registration can provide accurate motion estimation for 4D CT, improves the robustness to artifacts, and is found most suitable to account for sudden changes of motion within the breathing cycle.
Abstract: Purpose: Four-dimensional computed tomography (4D CT) can provide patient-specific motion information for radiotherapy planning and delivery. Motion estimation in 4D CT is challenging due to the reduced image quality and the presence of artifacts. We aim to improve the robustness of deformable registration applied to respiratory-correlated imaging of the lungs by using a global problem formulation and pursuing a restrictive parametrization for the spatiotemporal deformation model.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new framework is proposed that incorporates radiative transfer theory to estimate object reflectance and the mean shift algorithm to simultaneously track the object based on its reflectance spectra; the combination of spectral detection and motion prediction makes the tracker robust against abrupt motions and facilitates fast convergence of the mean shift tracker.
Abstract: Recent advances in electronics and sensor design have enabled the development of a hyperspectral video camera that can capture hyperspectral datacubes at near video rates. The sensor offers the potential for novel and robust methods for surveillance by combining methods from computer vision and hyperspectral image analysis. Here, we focus on the problem of tracking objects through challenging conditions, such as rapid illumination and pose changes, occlusions, and the presence of confusers. A new framework that incorporates radiative transfer theory to estimate object reflectance and the mean shift algorithm to simultaneously track the object based on its reflectance spectra is proposed. The combination of spectral detection and motion prediction enables the tracker to be robust against abrupt motions and facilitates fast convergence of the mean shift tracker. In addition, the system achieves good computational efficiency by using random projection to reduce the spectral dimension. The tracker has been evaluated on real hyperspectral video data.

Journal ArticleDOI
TL;DR: This paper shows that a sensible combination of complementary concepts for 3D tracking, region fitting on one side and dense optical flow together with tracked SIFT features on the other, yields a general tracking system that can be applied in a large variety of scenarios without the need to manually adjust weighting parameters.
Abstract: In this paper, we propose the combined use of complementary concepts for 3D tracking: region fitting on one side and dense optical flow as well as tracked SIFT features on the other. Both concepts are chosen such that they can compensate for the shortcomings of each other. While tracking by the object region can prevent the accumulation of errors, optical flow and SIFT can handle larger transformations. Whereas segmentation works best in the case of homogeneous objects, optical flow computation and SIFT tracking rely on sufficiently structured objects. We show that a sensible combination yields a general tracking system that can be applied in a large variety of scenarios without the need to manually adjust weighting parameters.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: An algorithm is presented to remove wobble artifacts from video captured with a rolling shutter camera undergoing large accelerations or jitter; estimating the rapid motion of the camera is posed as a temporal super-resolution problem.
Abstract: We present an algorithm to remove wobble artifacts from a video captured with a rolling shutter camera undergoing large accelerations or jitter. We show how estimating the rapid motion of the camera can be posed as a temporal super-resolution problem. The low-frequency measurements are the motions of pixels from one frame to the next. These measurements are modeled as temporal integrals of the underlying high-frequency jitter of the camera. The estimated high-frequency motion of the camera is then used to re-render the sequence as though all the pixels in each frame were imaged at the same time. We also present an auto-calibration algorithm that can estimate the time between the capture of subsequent rows in the camera.
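
The temporal super-resolution view can be made concrete with a toy 1-D model: each measured frame-to-frame displacement is the sum (integral) of several unknown high-rate jitter samples, and a smoothness prior resolves the resulting underdetermined system. A sketch under those assumptions, not the paper's full formulation:

```python
import numpy as np

def recover_jitter(d, upsample=8, lam=0.1):
    """Temporal super-resolution of camera motion: each measured
    frame-to-frame displacement d[k] is modeled as the sum of
    `upsample` high-rate jitter samples; solve the underdetermined
    system with a first-difference smoothness prior by regularized
    least squares."""
    d = np.asarray(d, float)
    K, N = len(d), len(d) * upsample
    A = np.zeros((K, N))
    for k in range(K):
        A[k, k * upsample:(k + 1) * upsample] = 1.0   # temporal integral
    D = np.diff(np.eye(N), axis=0)                     # finite differences
    return np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)
```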

Journal ArticleDOI
TL;DR: A robust FFT-based approach to scale-invariant image registration is presented; it introduces the normalized gradient correlation and shows that, when image gradients are used to perform correlation, the errors induced by outliers are mapped to a uniform distribution against which the normalized gradient correlation is robust.
Abstract: We present a robust FFT-based approach to scale-invariant image registration. Our method relies on FFT-based correlation twice: once in the log-polar Fourier domain to estimate the scaling and rotation and once in the spatial domain to recover the residual translation. Previous methods based on the same principles are not robust. To equip our scheme with robustness and accuracy, we introduce modifications which tailor the method to the nature of images. First, we derive efficient log-polar Fourier representations by replacing image functions with complex gray-level edge maps. We show that this representation both captures the structure of salient image features and circumvents problems related to the low-pass nature of images, interpolation errors, border effects, and aliasing. Second, to recover the unknown parameters, we introduce the normalized gradient correlation. We show that, using image gradients to perform correlation, the errors induced by outliers are mapped to a uniform distribution for which our normalized gradient correlation features robust performance. Exhaustive experimentation with real images showed that, unlike any other Fourier-based correlation techniques, the proposed method was able to estimate translations, arbitrary rotations, and scale factors up to 6.
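
The building block used twice in this pipeline is FFT-based correlation; a minimal phase-correlation sketch for the translation stage (the rotation/scale stage runs the same machinery on log-polar resamplings of the Fourier magnitudes, and the paper replaces plain intensities with complex gray-level edge maps and normalized gradient correlation):

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer translation between same-sized images a and
    b from the peak of the normalized cross-power spectrum."""
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    r = np.fft.ifft2(F / np.maximum(np.abs(F), 1e-12))
    dy, dx = np.unravel_index(np.argmax(np.abs(r)), r.shape)
    # Peaks past the midpoint correspond to negative shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx
```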

Proceedings ArticleDOI
David Pfeiffer, Uwe Franke
21 Jun 2010
TL;DR: The new dynamic Stixel World has proven to be well suited as a common basis for the scene understanding tasks of driver assistance and autonomous systems.
Abstract: Correlation based stereo vision has proven its power in commercially available driver assistance systems. Recently, real-time dense stereo vision has become available on inexpensive FPGA hardware. In order to manage the huge amount of data, a medium-level representation named “Stixel World” has been proposed for further analysis. In this representation the free space in front of the vehicle is limited by adjacent rectangular sticks of a certain width. Distance and height of each so-called stixel are determined by those parts of the obstacle it represents. This Stixel World is a compact but flexible representation of the three-dimensional traffic situation. The underlying model assumption is that objects stand on the ground and have approximately vertical pose with a flat surface. So far, this representation is static since it is computed for each frame independently. Driver assistance, however, is most interested in the pose and motion of moving obstacles. For this reason, we introduce tracking of stixels in this paper. Using the 6D-Vision Kalman filter framework, lateral as well as longitudinal motion is estimated for each stixel. In this way, both the grouping of stixels based on similar motion and the detection of moving obstacles are significantly simplified. The new dynamic Stixel World has proven to be well suited as a common basis for the scene understanding tasks of driver assistance and autonomous systems.
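
Per-stixel tracking boils down to a small Kalman filter per stick; a reduced constant-velocity sketch (the state layout and noise settings are illustrative; the paper uses the 6D-Vision framework):

```python
import numpy as np

def kalman_cv(z_seq, dt=0.04, q=1.0, r=0.25):
    """Constant-velocity Kalman filter for one stixel: state is
    (x, z, vx, vz); measurements are the stixel's lateral and
    longitudinal position from stereo."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt          # state transition
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0  # observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = np.zeros(4), 10.0 * np.eye(4)
    out = []
    for z in z_seq:
        x, P = F @ x, F @ P @ F.T + Q              # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)        # update
        P = (np.eye(4) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)
```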

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work proposes a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type, obtaining drift-free and accurate position information from video data, and accurate limb orientations and good performance under fast motions from inertial sensors.
Abstract: In this work, we present an approach to fuse video with orientation data obtained from extended inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for drift-free estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.

Journal ArticleDOI
TL;DR: A class of SR algorithms based on the maximum a posteriori (MAP) framework is proposed, which utilize a new multichannel image prior model, along with the state-of-the-art single channel image prior and observation models.
Abstract: Super-resolution (SR) is the term used to define the process of estimating a high-resolution (HR) image or a set of HR images from a set of low-resolution (LR) observations. In this paper we propose a class of SR algorithms based on the maximum a posteriori (MAP) framework. These algorithms utilize a new multichannel image prior model, along with the state-of-the-art single channel image prior and observation models. A hierarchical (two-level) Gaussian nonstationary version of the multichannel prior is also defined and utilized within the same framework. Numerical experiments comparing the proposed algorithms among themselves and with other algorithms in the literature, demonstrate the advantages of the adopted multichannel approach.
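
A MAP super-resolution estimator of the kind described reduces, for quadratic priors, to regularized least squares; a toy gradient-descent sketch with a blur-free block-average decimation model and a single-channel smoothness prior (the paper's multichannel prior and observation models are richer):

```python
import numpy as np

def map_sr(y_list, scale=2, lam=0.01, iters=200, step=0.5):
    """Estimate an HR image x minimizing
    sum_k ||D(x) - y_k||^2 + lam * ||grad x||^2, where D is block-average
    decimation. y_list holds registered LR observations of equal size."""
    H, W = y_list[0].shape
    x = np.kron(np.mean(y_list, axis=0), np.ones((scale, scale)))  # init
    for _ in range(iters):
        g = np.zeros_like(x)
        # Data term gradient: upsample the LR residual for each channel.
        for y in y_list:
            r = x.reshape(H, scale, W, scale).mean((1, 3)) - y
            g += np.kron(r, np.ones((scale, scale))) / scale ** 2
        # Smoothness-prior gradient: discrete Laplacian of x.
        lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
               np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
        g -= lam * lap
        x -= step * g
    return x
```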