
Showing papers on "Motion estimation published in 2010"


Proceedings ArticleDOI
12 Jun 2010
TL;DR: In this paper, a three-stage process is proposed to recover 3D human pose from monocular image sequences in real-world scenarios, such as crowded street scenes, based on tracking-by-detection.
Abstract: Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (= tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.

632 citations


Journal ArticleDOI
13 Jun 2010
TL;DR: In this article, a novel optical flow estimation method is proposed, which reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details in each scale.
Abstract: We discuss the cause of a severe optical flow estimation problem: fine motion structures cannot always be correctly reconstructed in the commonly employed multi-scale variational framework. Our major finding is that significant and abrupt displacement transitions wreck small-scale motion structures in the coarse-to-fine refinement. A novel optical flow estimation method is proposed in this paper to address this issue; it reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details at each scale. The contribution of this paper also includes adaptation of the objective function and development of a new optimization procedure. The effectiveness of our method is borne out by experiments on both large- and small-displacement optical flow estimation.

559 citations
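
The failure mode analyzed above is easiest to see in the baseline it improves on: classic coarse-to-fine estimation propagates each coarse flow estimate down as the initialization of the next finer level, so whatever the coarse level destroys never comes back. A minimal sketch of that conventional loop (not the paper's method), using OpenCV's Farneback solver as a stand-in for a variational solver and assuming grayscale uint8 inputs:

```python
import cv2
import numpy as np

def coarse_to_fine_flow(img1, img2, levels=4):
    """Conventional coarse-to-fine flow: estimate at the coarsest level,
    then upsample each estimate to initialize the next finer level. Fine
    motion structures can be lost because every level inherits (and
    trusts) the coarser initialization -- the problem the paper targets."""
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):            # Gaussian pyramids, coarsest last
        pyr1.append(cv2.pyrDown(pyr1[-1]))
        pyr2.append(cv2.pyrDown(pyr2[-1]))
    flow = None
    for i1, i2 in zip(reversed(pyr1), reversed(pyr2)):
        if flow is None:
            flow = np.zeros((*i1.shape[:2], 2), np.float32)
            flags = 0
        else:
            # Upsample the coarse flow and scale the vectors accordingly.
            flow = 2.0 * cv2.resize(flow, (i1.shape[1], i1.shape[0]))
            flags = cv2.OPTFLOW_USE_INITIAL_FLOW
        flow = cv2.calcOpticalFlowFarneback(
            i1, i2, flow, pyr_scale=0.5, levels=1, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=flags)
    return flow
```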


Proceedings ArticleDOI
21 Jun 2010
TL;DR: This paper proposes a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images which is directly based on the trifocal geometry between image triples, so no time-expensive recovery of the 3-dimensional scene structure is needed.

Abstract: A common prerequisite for many vision-based driver assistance systems is the knowledge of the vehicle's own movement. In this paper we propose a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images. Our method is directly based on the trifocal geometry between image triples, so no time-expensive recovery of the 3-dimensional scene structure is needed. The only assumption we make is a known camera geometry, where the calibration may vary over time. We employ an Iterated Sigma Point Kalman Filter in combination with a RANSAC-based outlier rejection scheme, which yields robust frame-to-frame motion estimation even in dynamic environments. A high-accuracy inertial navigation system is used to evaluate our results on challenging real-world video sequences. Experiments show that our approach is clearly superior to other filtering techniques in terms of both accuracy and run-time.

456 citations


Book ChapterDOI
01 Jan 2010
TL;DR: Using data with ground truth from an RTK GPS system, it is shown experimentally that the algorithms can track motion, in off-road terrain, over distances of 10 km, with an error of less than 10 m.
Abstract: Motion estimation from stereo imagery, sometimes called visual odometry, is a well-known process. However, it is difficult to achieve good performance using standard techniques. We present the results of several years of work on an integrated system to localize a mobile robot in rough outdoor terrain using visual odometry, with an increasing degree of precision. We discuss issues that are important for real-time, high-precision performance: choice of features, matching strategies, incremental bundle adjustment, and filtering with inertial measurement sensors. Using data with ground truth from an RTK GPS system, we show experimentally that our algorithms can track motion, in off-road terrain, over distances of 10 km, with an error of less than 10 m (0.1%).

413 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper derives an efficient filtering algorithm for tracking human pose using a stream of monocular depth images and describes a novel algorithm for propagating noisy evidence about body part locations up the kinematic chain using the unscented transform.
Abstract: Markerless tracking of human pose is a hard yet relevant problem. In this paper, we derive an efficient filtering algorithm for tracking human pose using a stream of monocular depth images. The key idea is to combine an accurate generative model (achievable in this setting using programmable graphics hardware) with a discriminative model that provides data-driven evidence about body part locations. In each filter iteration, we apply a form of local model-based search that exploits the nature of the kinematic chain. As fast movements and occlusion can disrupt the local search, we utilize a set of discriminatively trained patch classifiers to detect body parts. We describe a novel algorithm for propagating this noisy evidence about body part locations up the kinematic chain using the unscented transform. The resulting distribution of body configurations allows us to reinitialize the model-based search. We provide extensive experimental results on 28 real-world sequences using automatic ground-truth annotations from a commercial motion capture system.

406 citations
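
Since the abstract leans on the unscented transform, a self-contained sketch of the transform in its standard scaled form may help; the paper wraps this kind of propagation around each link of the kinematic chain, which is not reproduced here:

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    via deterministic sigma points instead of linearization. f maps an
    n-vector to an m-vector."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # 2n+1 sigma points from a matrix square root of the scaled covariance.
    S = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])   # (2n+1, n)
    # Standard scaled-UT weights for mean and covariance.
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    # Push sigma points through the nonlinearity and recombine.
    Y = np.array([f(s) for s in sigma])                 # (2n+1, m)
    y_mean = wm @ Y
    d = Y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```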


Book ChapterDOI
05 Sep 2010
TL;DR: A novel single image deblurring method to estimate spatially non-uniform blur that results from camera shake that outperforms current approaches which make the assumption of spatially invariant blur.
Abstract: We present a novel single image deblurring method to estimate spatially non-uniform blur that results from camera shake. We use existing spatially invariant deconvolution methods in a local and robust way to compute initial estimates of the latent image. The camera motion is represented as a Motion Density Function (MDF) which records the fraction of time spent in each discretized portion of the space of all possible camera poses. Spatially varying blur kernels are derived directly from the MDF. We show that 6D camera motion is well approximated by 3 degrees of motion (in-plane translation and rotation) and analyze the scope of this approximation. We present results on both synthetic and captured data. Our system outperforms current approaches which make the assumption of spatially invariant blur.

388 citations
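
The Motion Density Function reduces to a simple per-pixel recipe: accumulate, over camera poses weighted by exposure-time fraction, where that pixel lands. A hedged sketch with an illustrative parameterization (in-plane translation in pixels plus rotation about the image origin); the paper's 3-DOF pose space, discretization, and calibration differ in detail:

```python
import numpy as np

def kernel_from_mdf(poses, weights, x, y, ksize=31):
    """Rasterize a per-pixel blur kernel from a Motion Density Function.
    `poses` is an (N, 3) array of in-plane camera poses (tx, ty, theta),
    `weights` the fraction of exposure time spent at each pose (sums to
    1), and (x, y) the pixel whose kernel we want."""
    k = np.zeros((ksize, ksize))
    c = ksize // 2
    for (tx, ty, th), w in zip(poses, weights):
        # Displacement of (x, y) under rotation about the image origin
        # plus translation -- the 3-DOF approximation named in the paper.
        dx = np.cos(th) * x - np.sin(th) * y - x + tx
        dy = np.sin(th) * x + np.cos(th) * y - y + ty
        u, v = int(round(c + dx)), int(round(c + dy))
        if 0 <= u < ksize and 0 <= v < ksize:
            k[v, u] += w            # exposure-time mass at this offset
    return k / max(k.sum(), 1e-12)
```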


Journal ArticleDOI
TL;DR: A robust subspace separation scheme is developed that deals with practical issues in a unified mathematical framework and gives surprisingly good performance in the presence of grossly mistracked features, missing entries, and corrupted entries.
Abstract: In this paper, we study the problem of segmenting tracked feature point trajectories of multiple moving objects in an image sequence. Using the affine camera model, this problem can be cast as the problem of segmenting samples drawn from multiple linear subspaces. In practice, due to limitations of the tracker, occlusions, and the presence of nonrigid objects in the scene, the obtained motion trajectories may contain grossly mistracked features, missing entries, or corrupted entries. In this paper, we develop a robust subspace separation scheme that deals with these practical issues in a unified mathematical framework. Our methods draw strong connections between lossy compression, rank minimization, and sparse representation. We test our methods extensively on the Hopkins155 motion segmentation database and other motion sequences with outliers and missing data. We compare the performance of our methods to state-of-the-art motion segmentation methods based on expectation-maximization and spectral clustering. For data without outliers or missing information, the results of our methods are on par with the state-of-the-art results and, in many cases, exceed them. In addition, our methods give surprisingly good performance in the presence of the three types of pathological trajectories mentioned above. All code and results are publicly available at http://perception.csl.uiuc.edu/coding/motion/.

348 citations


Journal ArticleDOI
TL;DR: This paper treats tracking as a learning problem of estimating the location and the scale of an object given its previous location and scale as well as the current and previous image frames, and introduces multiple pathways in the CNN to better fuse local and global information.
Abstract: In this paper, we treat tracking as a learning problem of estimating the location and the scale of an object given its previous location and scale as well as the current and previous image frames. Given a set of examples, we train convolutional neural networks (CNNs) to perform this estimation task. Different from other learning methods, the CNNs learn both spatial and temporal features jointly from image pairs of two adjacent frames. We introduce multiple pathways in the CNN to better fuse local and global information. A shift-variant CNN architecture is designed to alleviate the drift problem when distracting objects similar to the target appear in cluttered environments. Furthermore, we employ CNNs to estimate the scale through the accurate localization of some key points. These techniques are object-independent, so the proposed method can be applied to track other types of objects. The capability of the tracker to handle complex situations is demonstrated in many testing sequences.

346 citations


Journal ArticleDOI
TL;DR: High quality, high spatial resolution, and high calculation speed can all be obtained simultaneously using the proposed methodology, which could prove very useful and flexible for real-time motion estimation as well as in other fields such as optical flow and image registration.

Abstract: High-precision motion estimation has become essential in ultrasound-based techniques such as time-domain Doppler and elastography. Normalized cross-correlation (NCC) has been shown as one of the best motion estimators. However, a significant drawback is its associated computational cost, especially when RF signals are used. In this paper, a method based on sum tables developed elsewhere is adapted for fast NCC calculation in ultrasound-based motion estimation, and is tested with respect to the speed enhancement it brings to this application. Both the numerator and denominator in the NCC definition are obtained through pre-calculated sum tables to eliminate redundancy of repeated NCC calculations. Unlike a previously reported method, a search region following the principle of motion estimation is applied in the construction of sum tables. Because an exhaustive search and high window overlap are typically used for highest quality imaging, the computational cost of the proposed method is significantly lower than that of the direct method using the NCC definition, without increasing bias and variance characteristics of the motion estimation or sacrificing the spatial resolution. Therefore, high quality, high spatial resolution, and high calculation speed can all be obtained simultaneously using the proposed methodology. The high efficiency of this method was verified using RF signals from a human abdominal aorta in vivo. For the parameters typically used, a real-time, very high frame rate of 310 frames/s was achieved for the motion estimation. The proposed method was also extended to 2-D NCC motion estimation and motion estimation with other algorithms. The technique could thus prove very useful and flexible for real-time motion estimation as well as in other fields such as optical flow and image registration.

325 citations
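
The speedup comes from the classic sum-table (integral image) trick: the per-window means and energies in the NCC denominator become O(1) lookups after a single pass over the data. A 2-D sketch of the principle; note the cross term is computed here by FFT, whereas the paper also builds sum tables over the search region for the numerator:

```python
import numpy as np
from scipy.signal import fftconvolve

def window_sums(a, h, w):
    """Sum table (integral image): O(1) window sums after O(N) setup."""
    S = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    S[1:, 1:] = a.cumsum(0).cumsum(1)
    return S[h:, w:] - S[:-h, w:] - S[h:, :-w] + S[:-h, :-w]

def fast_ncc(f, t):
    """NCC of template t against every window of image f. The window
    sums and energies in the denominator come from sum tables, as in
    the paper; the remaining cross term is done by FFT. 2-D sketch of
    the principle (the paper targets RF ultrasound data)."""
    h, w = t.shape
    n = t.size
    t0 = t - t.mean()
    cross = fftconvolve(f, t0[::-1, ::-1], mode="valid")
    s1 = window_sums(f, h, w)            # per-window sum of f
    s2 = window_sums(f ** 2, h, w)       # per-window sum of f^2
    denom = np.sqrt(np.maximum(s2 - s1 ** 2 / n, 1e-12) * (t0 ** 2).sum())
    return cross / denom
```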


Journal ArticleDOI
TL;DR: A nonparametric regression method for denoising 3-D image sequences acquired via fluorescence microscopy and an original statistical patch-based framework for noise reduction and preservation of space-time discontinuities are presented.
Abstract: We present a nonparametric regression method for denoising 3-D image sequences acquired via fluorescence microscopy. The proposed method exploits the redundancy of the 3-D+time information to improve the signal-to-noise ratio of images corrupted by Poisson-Gaussian noise. A variance stabilization transform is first applied to the image-data to remove the dependence between the mean and variance of intensity values. This preprocessing requires the knowledge of parameters related to the acquisition system, also estimated in our approach. In a second step, we propose an original statistical patch-based framework for noise reduction and preservation of space-time discontinuities. In our study, discontinuities are related to small moving spots with high velocity observed in fluorescence video-microscopy. The idea is to minimize an objective nonlocal energy functional involving spatio-temporal image patches. The minimizer has a simple form and is defined as the weighted average of input data taken in spatially-varying neighborhoods. The size of each neighborhood is optimized to improve the performance of the pointwise estimator. The performance of the algorithm (which requires no motion estimation) is then evaluated on both synthetic and real image sequences using qualitative and quantitative criteria.

299 citations
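
The variance stabilization step for Poisson-Gaussian data is typically the generalized Anscombe transform; a sketch under that assumption (the paper estimates the acquisition parameters, here alpha, sigma, and mu, itself):

```python
import numpy as np

def generalized_anscombe(z, alpha, sigma, mu=0.0):
    """Variance-stabilizing transform for Poisson-Gaussian data
    z = alpha * Poisson + N(mu, sigma^2): after the transform the noise
    is approximately Gaussian with unit variance, so a Gaussian-noise
    patch-based estimator can be applied to the result."""
    arg = alpha * z + 0.375 * alpha ** 2 + sigma ** 2 - alpha * mu
    return (2.0 / alpha) * np.sqrt(np.maximum(arg, 0.0))
```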


Journal ArticleDOI
TL;DR: A new procedure for static head-pose estimation and a new algorithm for visual 3-D tracking are presented and integrated into a novel real-time system for measuring the position and orientation of a driver's head.

Abstract: Driver distraction and inattention are prominent causes of automotive collisions. To enable driver-assistance systems to address these problems, we require new sensing approaches to infer a driver's focus of attention. In this paper, we present a new procedure for static head-pose estimation and a new algorithm for visual 3-D tracking. They are integrated into a novel real-time (30 fps) system for measuring the position and orientation of a driver's head. This system consists of three interconnected modules that detect the driver's head, provide initial estimates of the head's pose, and continuously track its position and orientation in six degrees of freedom. The head-detection module consists of an array of Haar-wavelet Adaboost cascades. The initial pose estimation module employs localized gradient orientation (LGO) histograms as input to support vector regressors (SVRs). The tracking module provides a fine estimate of the 3-D motion of the head using a new appearance-based particle filter for 3-D model tracking in an augmented reality environment. We describe our implementation that utilizes OpenGL-optimized graphics hardware to efficiently compute particle samples in real time. To demonstrate the suitability of this system for real driving situations, we provide a comprehensive evaluation with drivers of varying age, race, and sex, spanning daytime and nighttime conditions. To quantitatively measure the accuracy of the system, we compare its estimation results to a marker-based cinematic motion-capture system installed in the automotive testbed.

Journal ArticleDOI
TL;DR: This work develops strategies for multiple sensor platforms to explore a noisy scalar field in the plane using provably convergent cooperative Kalman filters that apply to general cooperative exploration missions and presents a novel method to determine the shape of the platform formation to minimize error in the estimates.
Abstract: Autonomous mobile sensor networks are employed to measure large-scale environmental fields. Yet an optimal strategy for mission design addressing both the cooperative motion control and the cooperative sensing is still an open problem. We develop strategies for multiple sensor platforms to explore a noisy scalar field in the plane. Our method consists of three parts. First, we design provably convergent cooperative Kalman filters that apply to general cooperative exploration missions. Second, we present a novel method to determine the shape of the platform formation to minimize error in the estimates and design a cooperative formation control law to asymptotically achieve the optimal formation shape. Third, we use the cooperative filter estimates in a provably convergent motion control law that drives the center of the platform formation to move along level curves of the field. This control law can be replaced by control laws enabling other cooperative exploration motion, such as gradient climbing, without changing the cooperative filters and the cooperative formation control laws. Performance is demonstrated on simulated underwater platforms in simulated ocean fields.

01 Jan 2010
TL;DR: In this paper, the nonholonomic constraints of wheeled vehicles (e.g., cars, bikes, mobile robots) are exploited to parameterize the motion with a single feature correspondence, yielding a 1-point RANSAC algorithm for outlier removal, and to estimate the absolute scale whenever the vehicle turns.
Abstract: The first big problem in visual motion estimation is data association: matched points contain many outliers that must be detected and removed for the motion to be accurately estimated. In the last few years, a very established method for removing outliers has been the "5-point RANSAC" algorithm, which needs a minimum of 5 point correspondences to estimate the model hypotheses. Because of this, however, it can require up to a thousand iterations to find a set of points free of outliers. In this talk, I will show that by exploiting the non-holonomic constraints of wheeled vehicles (e.g., cars, bikes, mobile robots) it is possible to use a restrictive motion model which allows us to parameterize the motion with only 1 point correspondence. Using a single feature correspondence for motion estimation is the lowest model parameterization possible and results in the most efficient algorithm for removing outliers: 1-point RANSAC. The second problem in monocular visual odometry is the estimation of the absolute scale. I will show that vehicle non-holonomic constraints also make it possible to estimate the absolute scale completely automatically whenever the vehicle turns. In this talk, I will give a mathematical derivation and provide experimental results on both simulated and real data over a large image dataset collected during a 25 Km path.
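
The core of the argument fits in a few lines: under a planar circular-motion model, a single normalized correspondence already determines the rotation hypothesis, so RANSAC degenerates to scoring one-point hypotheses. A hedged sketch; both the closed form for theta and the simplified inlier test (agreement in theta, in the spirit of the histogram-voting variant) are illustrative, and the paper should be consulted for the exact derivation and geometric residual:

```python
import numpy as np

def one_point_ransac(p1, p2, iters=50, thresh=0.01):
    """1-point RANSAC under a circular-motion model. p1, p2 are (N, 2)
    normalized image coordinates of matched points in two frames; each
    single match hypothesizes the yaw angle theta, and inliers are the
    matches whose own theta agrees with the sampled hypothesis."""
    # Illustrative closed form for the circular-motion hypothesis.
    thetas = -2.0 * np.arctan2(p2[:, 1] - p1[:, 1], p2[:, 0] + p1[:, 0])
    best = np.zeros(len(p1), bool)
    for i in np.random.randint(len(p1), size=iters):
        # Wrap angle differences to (-pi, pi] before thresholding.
        diff = np.angle(np.exp(1j * (thetas - thetas[i])))
        inliers = np.abs(diff) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best, float(np.median(thetas[best]))
```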

Journal ArticleDOI
TL;DR: A dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos is proposed.
Abstract: In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the system achieved an average event recognition accuracy of 89.2 percent with the MHI representation and 94.3 percent with the FFD representation. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
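
For reference, the Motion History Image half of the comparison rests on a one-line recurrence; a minimal sketch (the paper's extended MHI adds to this core):

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30.0):
    """One step of a Motion History Image: moving pixels are stamped
    with tau, the rest decay by 1 per frame, so pixel intensity encodes
    the recency of motion."""
    return np.where(motion_mask > 0, tau, np.maximum(mhi - 1.0, 0.0))

# Usage sketch: masks from thresholded frame differences.
# mhi = np.zeros(frames[0].shape)
# for prev, cur in zip(frames, frames[1:]):
#     mask = np.abs(cur.astype(float) - prev.astype(float)) > 15
#     mhi = update_mhi(mhi, mask)
```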

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new motion estimation algorithm based on non-local total variation regularization integrates a low-level image segmentation process into a unified variational framework and is demonstrated to cope with poorly textured regions, occlusions, and small-scale image structures.
Abstract: State-of-the-art motion estimation algorithms suffer from three major problems: poorly textured regions, occlusions, and small-scale image structures. Based on the Gestalt principles of grouping, we propose to incorporate a low-level image segmentation process in order to tackle these problems. Our new motion estimation algorithm is based on non-local total variation regularization, which allows us to integrate the low-level image segmentation process into a unified variational framework. Numerical results on the Middlebury optical flow benchmark data set demonstrate that we can cope with the aforementioned problems.

Journal ArticleDOI
TL;DR: In this paper, a generalized transmission index that can evaluate the motion/force transmissibility of fully parallel manipulators is proposed; it is based on the virtual coefficient and is related to singularity.

Journal ArticleDOI
01 Jan 2010
TL;DR: Recent developments in three major issues involved in a general human motion analysis system are presented, namely, human detection, view-invariant pose representation and estimation, and behavior understanding.
Abstract: As the viewpoint issue is becoming a bottleneck for human motion analysis and its applications, researchers have in recent years devoted themselves to view-invariant human motion analysis and have achieved inspiring progress. The challenge is to find a methodology that can recognize human motion patterns at increasingly sophisticated levels of human behavior description. This paper provides a comprehensive survey of this significant research, with emphasis on view-invariant representation and recognition of poses and actions. To help readers understand the integrated process of visual analysis of human motion, the paper presents recent developments in three major issues involved in a general human motion analysis system, namely, human detection, view-invariant pose representation and estimation, and behavior understanding. Publicly available standard datasets are recommended. The concluding discussion assesses the progress so far and outlines some research challenges, future directions, and what is essential to achieve the goals of human motion analysis.

Journal ArticleDOI
TL;DR: A patient-specific model of the aortic and mitral valves is automatically estimated from volumetric sequences for the first time; the system operates on cardiac computed tomography (CT) and transesophageal echocardiogram (TEE) data.
Abstract: As decisions in cardiology increasingly rely on noninvasive methods, fast and precise image processing tools have become a crucial component of the analysis workflow. To the best of our knowledge, we propose the first automatic system for patient-specific modeling and quantification of the left heart valves, which operates on cardiac computed tomography (CT) and transesophageal echocardiogram (TEE) data. Robust algorithms, based on recent advances in discriminative learning, are used to estimate patient-specific parameters from sequences of volumes covering an entire cardiac cycle. A novel physiological model of the aortic and mitral valves is introduced, which captures complex morphologic, dynamic, and pathologic variations. This holistic representation is hierarchically defined on three abstraction levels: global location and rigid motion model, nonrigid landmark motion model, and comprehensive aortic-mitral model. First we compute the rough location and cardiac motion applying marginal space learning. The rapid and complex motion of the valves, represented by anatomical landmarks, is estimated using a novel trajectory spectrum learning algorithm. The obtained landmark model guides the fitting of the full physiological valve model, which is locally refined through learned boundary detectors. Measurements efficiently computed from the aortic-mitral representation support an effective morphological and functional clinical evaluation. Extensive experiments on a heterogeneous data set, cumulated to 1516 TEE volumes from 65 4-D TEE sequences and 690 cardiac CT volumes from 69 4-D CT sequences, demonstrated a speed of 4.8 seconds per volume and average accuracy of 1.45 mm with respect to expert defined ground-truth. Additional clinical validations prove the quantification precision to be in the range of inter-user variability. To the best of our knowledge this is the first time a patient-specific model of the aortic and mitral valves is automatically estimated from volumetric sequences.

Journal ArticleDOI
TL;DR: An approach for accurately measuring human motion through Markerless Motion Capture (MMC) is presented that uses multiple color cameras and combines an accurate and anatomically consistent tracking algorithm with a method for automatically generating subject-specific models.
Abstract: An approach for accurately measuring human motion through Markerless Motion Capture (MMC) is presented. The method uses multiple color cameras and combines an accurate and anatomically consistent tracking algorithm with a method for automatically generating subject-specific models. The tracking approach employed a Levenberg-Marquardt minimization scheme over an iterative closest point algorithm with six degrees of freedom for each body segment. Anatomical consistency was maintained by enforcing rotational and translational joint range of motion constraints for each specific joint. A subject-specific model was obtained through an automatic model generation algorithm (Corazza et al. in IEEE Trans. Biomed. Eng., 2009) which combines a space of human shapes (Anguelov et al. in Proceedings SIGGRAPH, 2005) with biomechanically consistent kinematic models and a pose-shape matching algorithm. There were 15 anatomical body segments and 14 joints, each with six degrees of freedom (13 and 12, respectively, for the HumanEva II dataset). The overall method is an improvement over (Mundermann et al. in Proceedings of CVPR, 2007) in terms of both accuracy and robustness. Since the method was originally developed for ≥8 cameras, its performance was tested both (i) on the HumanEva II dataset (Sigal and Black, Technical Report CS-06-08, 2006) in a 4-camera configuration, and (ii) on a series of motions including walking trials, a very challenging gymnastic motion, and a dataset with motions similar to HumanEva II but with a variable number of cameras.

Book ChapterDOI
05 Sep 2010
TL;DR: An adaptive video denoising framework is proposed that integrates robust optical flow into a nonlocal means (NLM) framework with noise level estimation, and introduces approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods.
Abstract: Although recent advances in the sparse representations of images have achieved outstanding denoising results, removing real, structured noise in digital videos remains a challenging problem. We show the utility of reliable motion estimation to establish temporal correspondence across frames in order to achieve high-quality video denoising. In this paper, we propose an adaptive video denoising framework that integrates robust optical flow into a nonlocal means (NLM) framework with noise level estimation. The spatial regularization in optical flow is the key to ensuring temporal coherence in removing structured noise. Furthermore, we introduce approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods. Experimental results show that our system is comparable with the state of the art in removing AWGN, and significantly outperforms the state of the art in removing real, structured noise.
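
A brute-force miniature of the NLM core may clarify what the approximate K-NN matching replaces: for each pixel, the K most similar spatio-temporal patches are averaged with similarity weights. Everything here is a stand-in; the search window is exhaustive rather than approximate K-NN, and the flow-guided window placement and noise-level estimation are omitted:

```python
import numpy as np

def nlm_pixel(frames, t, y, x, patch=3, search=5, K=10, h=10.0):
    """Nonlocal-means estimate of one interior pixel from a small
    spatio-temporal search window, keeping only the K nearest patches.
    frames is a (T, H, W) float array; (y, x) must be far enough from
    the border for full patches."""
    r, s = patch // 2, search // 2
    ref = frames[t, y - r:y + r + 1, x - r:x + r + 1]
    cands = []
    for tt in range(max(t - 1, 0), min(t + 2, len(frames))):
        for yy in range(y - s, y + s + 1):
            for xx in range(x - s, x + s + 1):
                if yy - r < 0 or xx - r < 0:
                    continue
                p = frames[tt, yy - r:yy + r + 1, xx - r:xx + r + 1]
                if p.shape == ref.shape:        # skips border overruns
                    cands.append((((p - ref) ** 2).mean(),
                                  frames[tt, yy, xx]))
    cands.sort(key=lambda c: c[0])
    d2 = np.array([c[0] for c in cands[:K]])
    vals = np.array([c[1] for c in cands[:K]])
    w = np.exp(-d2 / (h * h))                   # similarity weights
    return float((w * vals).sum() / w.sum())
```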

Journal IssueDOI
TL;DR: A novel combination of RANSAC plus extended Kalman filter (EKF) that uses the available prior probabilistic information from the EKF in the RANSAC model hypothesis stage, allowing the minimal sample size to be reduced to one and resulting in large computational savings without loss of discriminative power.
Abstract: Random sample consensus (RANSAC) has become one of the most successful techniques for robust estimation from a data set that may contain outliers. It works by constructing model hypotheses from random minimal data subsets and evaluating their validity from the support of the whole data. In this paper we present a novel combination of RANSAC plus extended Kalman filter (EKF) that uses the available prior probabilistic information from the EKF in the RANSAC model hypothesis stage. This allows the minimal sample size to be reduced to one, resulting in large computational savings without loss of discriminative power. 1-point RANSAC is shown to outperform the joint compatibility branch and bound (JCBB) algorithm, a gold-standard technique for spurious rejection within the EKF framework, in both accuracy and computational cost. Two visual estimation scenarios are used in the experiments: first, six-degree-of-freedom (DOF) motion estimation from a monocular sequence (structure from motion). Here, a new method for benchmarking six-DOF visual estimation algorithms based on the use of high-resolution images is presented, validated, and used to show the superiority of 1-point RANSAC. Second, we demonstrate long-term robot trajectory estimation combining monocular vision and wheel odometry (visual odometry). Here, a comparison against a global positioning system shows an accuracy comparable to state-of-the-art visual odometry methods.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem, and reports studies on a state-of-the-art Random-forest-based video segmentation scheme trained both on full ground truth data and on data obtained from label propagation.
Abstract: This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem. Given a limited amount of hand labelled pixels, typically the start and end frames of a chunk of video, an EM based algorithm propagates labels through the rest of the frames of the video sequence. As a result, the user obtains pixelwise labelled video sequences along with the class probabilities at each pixel. Our novel algorithm provides an essential tool to reduce tedious hand labelling of video sequences, thus producing copious amounts of usable ground truth data. A novel application of this algorithm is in semi-supervised learning of discriminative classifiers for video segmentation and scene parsing. The label propagation scheme can be based on pixel-wise correspondences obtained from motion estimation, image patch based similarities as seen in epitomic models, or even the more recent, semantically consistent hierarchical regions. We compare the abilities of each of these variants, both via quantitative and qualitative studies against ground truth data. We then report studies on a state-of-the-art Random-forest-classifier-based video segmentation scheme, trained using full ground truth data and with data obtained from label propagation. The results of this study strongly support and encourage the use of the proposed label propagation algorithm.
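
Of the correspondence types the paper plugs into its EM framework, flow-based propagation is the simplest to sketch: push each pixel's label along its forward flow vector. A minimal, hedged version (nearest-neighbor rounding, last-writer-wins at collisions; the paper's probabilistic model does considerably more):

```python
import numpy as np

def propagate_labels(labels, flow):
    """Push pixel labels from frame t to frame t+1 along forward
    optical-flow correspondences. labels is (H, W) int, flow is
    (H, W, 2) in (dx, dy) order; unmatched target pixels get -1."""
    H, W = labels.shape
    out = np.full((H, W), -1, labels.dtype)
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    out[yt, xt] = labels[ys, xs]      # scatter labels to flow targets
    return out
```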

Journal ArticleDOI
TL;DR: Spatiotemporal registration can provide accurate motion estimation for 4D CT, improves the robustness to artifacts, and is found most suitable to account for sudden changes of motion within the breathing cycle.
Abstract: Purpose: Four-dimensional computed tomography (4D CT) can provide patient-specific motion information for radiotherapy planning and delivery. Motion estimation in 4D CT is challenging due to the reduced image quality and the presence of artifacts. We aim to improve the robustness of deformable registration applied to respiratory-correlated imaging of the lungs by using a global problem formulation and pursuing a restrictive parametrization for the spatiotemporal deformation model.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new framework is proposed that incorporates radiative transfer theory to estimate object reflectance and the mean shift algorithm to simultaneously track the object based on its reflectance spectra; the combination of spectral detection and motion prediction makes the tracker robust against abrupt motions and facilitates fast convergence of the mean shift tracker.
Abstract: Recent advances in electronics and sensor design have enabled the development of a hyperspectral video camera that can capture hyperspectral datacubes at near video rates. The sensor offers the potential for novel and robust methods for surveillance by combining methods from computer vision and hyperspectral image analysis. Here, we focus on the problem of tracking objects through challenging conditions, such as rapid illumination and pose changes, occlusions, and the presence of confusers. A new framework that incorporates radiative transfer theory to estimate object reflectance and the mean shift algorithm to simultaneously track the object based on its reflectance spectra is proposed. The combination of spectral detection and motion prediction enables the tracker to be robust against abrupt motions and facilitates fast convergence of the mean shift tracker. In addition, the system achieves good computational efficiency by using random projection to reduce the spectral dimension. The tracker has been evaluated on real hyperspectral video data.

Journal ArticleDOI
TL;DR: This paper shows that a sensible combination of complementary concepts for 3D tracking, region fitting on one side and dense optical flow together with tracked SIFT features on the other, yields a general tracking system that can be applied in a large variety of scenarios without the need to manually adjust weighting parameters.
Abstract: In this paper, we propose the combined use of complementary concepts for 3D tracking: region fitting on one side and dense optical flow as well as tracked SIFT features on the other. Both concepts are chosen such that they can compensate for the shortcomings of each other. While tracking by the object region can prevent the accumulation of errors, optical flow and SIFT can handle larger transformations. Whereas segmentation works best in the case of homogeneous objects, optical flow computation and SIFT tracking rely on sufficiently structured objects. We show that a sensible combination yields a general tracking system that can be applied in a large variety of scenarios without the need to manually adjust weighting parameters.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: An algorithm is presented to remove wobble artifacts from video captured with a rolling shutter camera undergoing large accelerations or jitter; estimating the rapid motion of the camera is posed as a temporal super-resolution problem.
Abstract: We present an algorithm to remove wobble artifacts from a video captured with a rolling shutter camera undergoing large accelerations or jitter. We show how estimating the rapid motion of the camera can be posed as a temporal super-resolution problem. The low-frequency measurements are the motions of pixels from one frame to the next. These measurements are modeled as temporal integrals of the underlying high-frequency jitter of the camera. The estimated high-frequency motion of the camera is then used to re-render the sequence as though all the pixels in each frame were imaged at the same time. We also present an auto-calibration algorithm that can estimate the time between the capture of subsequent rows in the camera.
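
The temporal super-resolution view can be made concrete with a toy 1-D model: each measured frame-to-frame displacement is the sum (integral) of several unknown high-rate jitter samples, and a smoothness prior resolves the resulting underdetermined system. A sketch under those assumptions, not the paper's full formulation:

```python
import numpy as np

def recover_jitter(d, upsample=8, lam=0.1):
    """Temporal super-resolution of camera motion: each measured
    frame-to-frame displacement d[k] is modeled as the sum of
    `upsample` high-rate jitter samples; solve the underdetermined
    system with a first-difference smoothness prior by regularized
    least squares."""
    d = np.asarray(d, float)
    K, N = len(d), len(d) * upsample
    A = np.zeros((K, N))
    for k in range(K):
        A[k, k * upsample:(k + 1) * upsample] = 1.0   # temporal integral
    D = np.diff(np.eye(N), axis=0)                     # finite differences
    return np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)
```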

Journal ArticleDOI
TL;DR: A robust FFT-based approach to scale-invariant image registration is presented; it introduces the normalized gradient correlation and shows that, when image gradients are used to perform correlation, the errors induced by outliers are mapped to a uniform distribution against which the normalized gradient correlation is robust.
Abstract: We present a robust FFT-based approach to scale-invariant image registration. Our method relies on FFT-based correlation twice: once in the log-polar Fourier domain to estimate the scaling and rotation and once in the spatial domain to recover the residual translation. Previous methods based on the same principles are not robust. To equip our scheme with robustness and accuracy, we introduce modifications which tailor the method to the nature of images. First, we derive efficient log-polar Fourier representations by replacing image functions with complex gray-level edge maps. We show that this representation both captures the structure of salient image features and circumvents problems related to the low-pass nature of images, interpolation errors, border effects, and aliasing. Second, to recover the unknown parameters, we introduce the normalized gradient correlation. We show that, using image gradients to perform correlation, the errors induced by outliers are mapped to a uniform distribution for which our normalized gradient correlation features robust performance. Exhaustive experimentation with real images showed that, unlike any other Fourier-based correlation techniques, the proposed method was able to estimate translations, arbitrary rotations, and scale factors up to 6.
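
The building block used twice in this pipeline is FFT-based correlation; a minimal phase-correlation sketch for the translation stage (the rotation/scale stage runs the same machinery on log-polar resamplings of the Fourier magnitudes, and the paper replaces plain intensities with complex gray-level edge maps and normalized gradient correlation):

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer translation between same-sized images a and
    b from the peak of the normalized cross-power spectrum."""
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    r = np.fft.ifft2(F / np.maximum(np.abs(F), 1e-12))
    dy, dx = np.unravel_index(np.argmax(np.abs(r)), r.shape)
    # Peaks past the midpoint correspond to negative shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx
```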

Proceedings ArticleDOI
David Pfeiffer, Uwe Franke
21 Jun 2010
TL;DR: The new dynamic Stixel World has proven to be well suited as a common basis for the scene understanding tasks of driver assistance and autonomous systems.
Abstract: Correlation based stereo vision has proven its power in commercially available driver assistance systems. Recently, real-time dense stereo vision has become available on inexpensive FPGA hardware. In order to manage the huge amount of data, a medium-level representation named “Stixel World” has been proposed for further analysis. In this representation the free space in front of the vehicle is limited by adjacent rectangular sticks of a certain width. Distance and height of each so-called stixel are determined by those parts of the obstacle it represents. This Stixel World is a compact but flexible representation of the three-dimensional traffic situation. The underlying model assumption is that objects stand on the ground and have approximately vertical pose with a flat surface. So far, this representation is static since it is computed for each frame independently. Driver assistance, however, is most interested in the pose and motion of moving obstacles. For this reason, we introduce tracking of stixels in this paper. Using the 6D-Vision Kalman filter framework, lateral as well as longitudinal motion is estimated for each stixel. In this way, both the grouping of stixels based on similar motion and the detection of moving obstacles are significantly simplified. The new dynamic Stixel World has proven to be well suited as a common basis for the scene understanding tasks of driver assistance and autonomous systems.
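
Per-stixel tracking boils down to a small Kalman filter per stick; a reduced constant-velocity sketch (the state layout and noise settings are illustrative; the paper uses the 6D-Vision framework):

```python
import numpy as np

def kalman_cv(z_seq, dt=0.04, q=1.0, r=0.25):
    """Constant-velocity Kalman filter for one stixel: state is
    (x, z, vx, vz); measurements are the stixel's lateral and
    longitudinal position from stereo."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt          # state transition
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0  # observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = np.zeros(4), 10.0 * np.eye(4)
    out = []
    for z in z_seq:
        x, P = F @ x, F @ P @ F.T + Q              # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)        # update
        P = (np.eye(4) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)
```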

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work proposes a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type, obtaining drift-free and accurate position information from video data, and accurate limb orientations and good performance under fast motions from inertial sensors.
Abstract: In this work, we present an approach to fuse video with orientation data obtained from extended inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for drift-free estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.

Journal ArticleDOI
TL;DR: A class of SR algorithms based on the maximum a posteriori (MAP) framework is proposed, which utilize a new multichannel image prior model, along with the state-of-the-art single channel image prior and observation models.
Abstract: Super-resolution (SR) is the term used to define the process of estimating a high-resolution (HR) image or a set of HR images from a set of low-resolution (LR) observations. In this paper we propose a class of SR algorithms based on the maximum a posteriori (MAP) framework. These algorithms utilize a new multichannel image prior model, along with the state-of-the-art single channel image prior and observation models. A hierarchical (two-level) Gaussian nonstationary version of the multichannel prior is also defined and utilized within the same framework. Numerical experiments comparing the proposed algorithms among themselves and with other algorithms in the literature, demonstrate the advantages of the adopted multichannel approach.
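
A MAP super-resolution estimator of the kind described reduces, for quadratic priors, to regularized least squares; a toy gradient-descent sketch with a blur-free block-average decimation model and a single-channel smoothness prior (the paper's multichannel prior and observation models are richer):

```python
import numpy as np

def map_sr(y_list, scale=2, lam=0.01, iters=200, step=0.5):
    """Estimate an HR image x minimizing
    sum_k ||D(x) - y_k||^2 + lam * ||grad x||^2, where D is block-average
    decimation. y_list holds registered LR observations of equal size."""
    H, W = y_list[0].shape
    x = np.kron(np.mean(y_list, axis=0), np.ones((scale, scale)))  # init
    for _ in range(iters):
        g = np.zeros_like(x)
        # Data term gradient: upsample the LR residual for each channel.
        for y in y_list:
            r = x.reshape(H, scale, W, scale).mean((1, 3)) - y
            g += np.kron(r, np.ones((scale, scale))) / scale ** 2
        # Smoothness-prior gradient: discrete Laplacian of x.
        lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
               np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
        g -= lam * lap
        x -= step * g
    return x
```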