scispace - formally typeset

Showing papers by Thomas Brox published in 2008


Journal Article · DOI
TL;DR: An iterative version of the nonlocal means filter, derived from a variational principle and designed to yield nontrivial steady states, is suggested; it proves particularly useful for restoring regular, textured patterns.
Abstract: This paper contributes two novel techniques in the context of image restoration by nonlocal filtering. First, we introduce an efficient implementation of the nonlocal means filter based on arranging the data in a cluster tree. The structuring of data allows for a fast and accurate preselection of similar patches. In contrast to previous approaches, the preselection is based on the same distance measure as used by the filter itself. It allows for large speedups, especially when the search for similar patches covers the whole image domain, i.e., when the filter is truly nonlocal. Even in the windowed version of the filter, however, the cluster tree approach compares favorably with previous techniques in terms of quality versus computational cost. Second, we suggest an iterative version of the filter that is derived from a variational principle and is designed to yield nontrivial steady states. It proves particularly useful for restoring regular, textured patterns.

261 citations
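The cluster-tree preselection described in the abstract above can be illustrated with a small sketch. It is our own simplified illustration, not the authors' implementation: patches are taken from a 1D signal, the tree is built by recursive 2-means splitting (hypothetical helpers `two_means` and `build_leaves`), a leaf holds at most `max_leaf` patches, and each sample is averaged only over the patches in its own leaf. The point it shows is that the tree and the filter use the same squared patch distance. The weight exp(-d/(2h^2)) is also an assumption.

```python
import math
import random


def extract_patches(signal, radius):
    """One patch per sample of a 1D signal; indices are clamped at the borders."""
    n = len(signal)
    return [[signal[min(max(i + k, 0), n - 1)] for k in range(-radius, radius + 1)]
            for i in range(n)]


def sq_dist(p, q):
    """Squared patch distance; both the tree and the filter use this measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_patch(group):
    return [sum(col) / len(group) for col in zip(*group)]


def two_means(indices, patches, rng, iters=10):
    """Split patch indices into two non-empty clusters with a few 2-means steps."""
    c0, c1 = (patches[i] for i in rng.sample(indices, 2))
    left, right = [], []
    for _ in range(iters):
        left, right = [], []
        for i in indices:
            if sq_dist(patches[i], c0) <= sq_dist(patches[i], c1):
                left.append(i)
            else:
                right.append(i)
        if not left or not right:
            break
        c0 = mean_patch([patches[i] for i in left])
        c1 = mean_patch([patches[i] for i in right])
    if not left or not right:  # degenerate case, e.g. many identical patches
        half = len(indices) // 2
        left, right = indices[:half], indices[half:]
    return left, right


def build_leaves(indices, patches, max_leaf, rng):
    """Recursively split until every leaf holds at most max_leaf patches."""
    if len(indices) <= max_leaf:
        return [indices]
    left, right = two_means(indices, patches, rng)
    return (build_leaves(left, patches, max_leaf, rng)
            + build_leaves(right, patches, max_leaf, rng))


def nl_means_cluster_tree(signal, radius=2, h=0.5, max_leaf=16, seed=0):
    """Nonlocal means in which each sample averages only over its own leaf."""
    patches = extract_patches(signal, radius)
    leaves = build_leaves(list(range(len(signal))), patches, max(max_leaf, 2),
                          random.Random(seed))
    out = [0.0] * len(signal)
    for leaf in leaves:
        for x in leaf:
            # Exact patch distances, but only against the preselected (same-leaf) patches.
            w = [math.exp(-sq_dist(patches[x], patches[y]) / (2 * h * h)) for y in leaf]
            out[x] = sum(wi * signal[y] for wi, y in zip(w, leaf)) / sum(w)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [1.0 if (i // 4) % 2 else 0.0 for i in range(64)]
    noisy = [v + rng.gauss(0.0, 0.1) for v in clean]
    print(nl_means_cluster_tree(noisy)[:8])
```

A leaf bounds the number of exact distance evaluations per sample by `max_leaf`. The price is that a truly similar patch that ended up in a different leaf is ignored, which is the kind of approximation a cluster tree trades for speed.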


Book Chapter · DOI
20 Oct 2008
TL;DR: The main contribution is to decouple the position and velocity estimation steps, and to estimate dense velocities using a variational approach, which provides dense velocity estimates with accurate results at distances up to 50 meters.
Abstract: This paper presents a technique for estimating the three-dimensional velocity vector field that describes the motion of each visible scene point (scene flow). The technique uses two consecutive image pairs from a stereo sequence. The main contribution is to decouple the position and velocity estimation steps, and to estimate dense velocities using a variational approach. We enforce the scene flow to yield consistent displacement vectors in the left and right images. The decoupling strategy has two main advantages: first, we are free to choose any disparity estimation technique, which may yield either sparse or dense correspondences; second, we can achieve frame rates of 5 fps on standard consumer hardware. The approach provides dense velocity estimates with accurate results at distances up to 50 meters.

218 citations
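The separation of position and velocity in the scene flow paper above can be illustrated with standard rectified-stereo triangulation. This sketch is not the paper's variational method. It assumes a rectified pinhole stereo rig with focal length `f` (pixels), baseline `b` (meters), principal point `(cx, cy)` and frame interval `dt`. Given a pixel's disparity `d` at time t, its optical flow `(u, v)` and its disparity change `dd`, it reconstructs the 3D point at both times and returns the 3D velocity. All function and parameter names are hypothetical.

```python
def triangulate(x, y, d, f, b, cx, cy):
    """Back-project pixel (x, y) with disparity d (pixels) in a rectified stereo rig."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = f * b / d                                  # depth from disparity
    return ((x - cx) * z / f, (y - cy) * z / f, z)


def point_velocity(x, y, d, u, v, dd, f, b, cx, cy, dt):
    """3D velocity of the point seen at (x, y) with disparity d at time t.

    (u, v) is its optical flow in the left image and dd its disparity change,
    so at time t + dt it is seen at (x + u, y + v) with disparity d + dd.
    """
    p0 = triangulate(x, y, d, f, b, cx, cy)
    p1 = triangulate(x + u, y + v, d + dd, f, b, cx, cy)
    return tuple((a1 - a0) / dt for a0, a1 in zip(p0, p1))


if __name__ == "__main__":
    # A point at the principal point whose disparity grows from 20 to 25 pixels
    # moves from 20 m to 16 m depth, i.e. it approaches the camera.
    print(point_velocity(320, 240, 20, 0, 0, 5, f=800, b=0.5, cx=320, cy=240, dt=0.1))
```

Because Z = f·b/d, a fixed disparity error produces a depth error that grows roughly with Z². This is a generic property of stereo triangulation, and it makes a stated accuracy range, such as the 50 meters quoted above, unsurprising.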


Proceedings Article · DOI
23 Jun 2008
TL;DR: A markerless motion capture system is presented that takes the lower-dimensional pose manifold into account by modeling the motion restrictions via soft constraints during pose optimization; results are shown for challenging outdoor scenes including shadows and strong illumination changes.
Abstract: This work deals with modeling and markerless tracking of athletes interacting with sports gear. In contrast to classical markerless tracking, interaction with sports gear comes with joint movement restrictions due to additional constraints: while humans can generally use all their joints, interaction with the equipment imposes a coupling between certain joints. A cyclist performing a cycling pattern is one example: the feet are supposed to stay on the pedals, which in turn are restricted to move along a circular trajectory in 3D space. In this paper, we present a markerless motion capture system that takes the lower-dimensional pose manifold into account by modeling the motion restrictions via soft constraints during pose optimization. Experiments with two different models, a cyclist and a snowboarder, demonstrate the applicability of the method. Moreover, we present motion capture results for challenging outdoor scenes including shadows and strong illumination changes.

63 citations
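To show what a soft constraint means in the cyclist example above, the following toy adds a quadratic penalty to a fitting energy. It is our own assumption-laden illustration, not the paper's pose optimization. A single 3D foot position `p` is fitted to an observation `obs`, while a penalty weighted by `lam` pulls it toward a pedal circle with center `c`, unit normal `n` and radius `r`. The energy |p - obs|² + lam·dist(p, circle)² is minimized by gradient descent. Its gradient is 2(p - obs) + 2·lam·(p - s), where s is the closest point on the circle.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def closest_on_circle(p, c, n, r):
    """Closest point to p on the circle (center c, unit normal n, radius r)."""
    q = sub(p, c)
    q_perp = sub(q, scale(n, dot(q, n)))    # component of q in the circle's plane
    rho = math.sqrt(dot(q_perp, q_perp))
    if rho < 1e-12:
        raise ValueError("point lies on the circle axis; closest point is not unique")
    return tuple(ci + r * qi / rho for ci, qi in zip(c, q_perp))


def fit_foot(obs, c, n, r, lam, iters=100):
    """Minimise |p - obs|^2 + lam * dist(p, circle)^2 by gradient descent.

    With step size 1 / (2 * (1 + lam)), each gradient step becomes
    p <- (obs + lam * s) / (1 + lam), where s is the closest circle point.
    """
    p = tuple(obs)
    for _ in range(iters):
        s = closest_on_circle(p, c, n, r)
        p = tuple((o + lam * si) / (1 + lam) for o, si in zip(obs, s))
    return p


if __name__ == "__main__":
    circle = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0)
    for lam in (0.0, 1.0, 1e4):
        print(lam, fit_foot((1.2, 0.0, 0.1), *circle, lam=lam))
```

With `lam = 0` the observation is returned unchanged. As `lam` grows, the foot is pulled onto the circle, but for any finite weight the constraint holds only approximately. That is the difference between a soft and a hard constraint.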


01 Jan 2008
TL;DR: This work presents efficient implementations of nonlocal filtering using highly adapted data structures such as cluster trees, spill trees and cluster forests, and introduces several extensions of the original nonlocal means filter that provide invariance with respect to variations in brightness, scale, and rotation.
Abstract: In this work, we propose a study of efficient and optimal texture denoising methods based on the nonlocal means filter. In particular, we present efficient implementations of nonlocal filtering using highly adapted data structures such as cluster trees, spill trees and cluster forests. A comparative study of computational speed indicates that cluster forests are superior to alternative methods. Moreover, we introduce several extensions of the original nonlocal means filter that provide invariance with respect to variations in brightness, scale, and rotation.

1. NONLOCAL SMOOTHING

Image enhancement and noise removal are classical tasks in image processing with a long history and many methodologies. In particular, discontinuity-preserving filters, such as nonlinear diffusion filters [1], the ROF filter [2], and certain types of wavelet shrinkage [3], have been very successful in removing noise from images while preserving their most relevant structures. However, despite their success in a wide range of applications, they have a significant shortcoming when it comes to highly oscillatory structures, such as those that naturally appear in textural patterns: such structures are confused with noise and erroneously removed.

When considering denoising of an image I : Ω → R, nonlinear diffusion filters and related methods are spatially local in the sense that at each location x ∈ Ω, the update of the evolving image u : Ω × [0, T] → R, u(x, 0) = I(x), is determined only by derivatives of u at that same location x. A class of image filters that adaptively take into account intensity information from more distant locations are the Yaroslavsky neighbourhood filters [4]:

$$u(x) = \frac{\int K(x, y)\, I(y)\, dy}{\int K(x, y)\, dy} \qquad (1)$$

Here the smoothed image u(x) is stated as a weighted average of pixels of the original image I. The weights are determined by a nonnegative kernel function K, which decays with the distance d(x, y) = γ|x − y| + |I(x) − I(y)|.
A typical choice is the Gaussian kernel

$$K(x, y) = \frac{1}{(2\pi h^2)^{D/2}} \exp\left(-\frac{d^2(x, y)}{2h^2}\right)$$

with kernel width h and data dimensionality D. This filter assigns large weights to pixels y whose intensities I(y) are similar in the sense that (y, I(y)) is close to (x, I(x)) in space and in intensity. The parameter γ adjusts the relative importance of spatial and tonal similarity. Neighbourhood filters are also known as local M-smoothers [5, 6]. These filters can also be iterated, which results in the bilateral filter [7, 8]. Relations between such neighbourhood filters and nonlinear diffusion filters have been investigated in [9, 10, 11]. It turns out that even though these semi-local filters¹ substantially increase the number of candidate pixels for averaging compared to diffusion filters, their denoising behaviour is qualitatively similar to that of nonlinear diffusion: while they preserve large-scale structures, small-scale structures are regarded as noise and removed.

For a better preservation of small-scale textural patterns, a small but decisive extension of the neighbourhood filters is necessary. Rather than considering only the centre pixel when measuring the similarity of two points, we can compare local balls (patches) around these points. These patches capture the dependencies of neighbouring pixels and can thus distinguish textural patterns. The idea is inspired by work on texture synthesis [12, 13, 14] and was proposed simultaneously in the nonlocal means filter [15] and the UINTA filter [16]. Both filters use a distance that considers not only the similarity of the central pixel but also the similarity of its neighbourhood:

$$d(x, y) = \int G_\rho(x') \big(I(x - x') - I(y - x')\big)^2 \, dx' \qquad (2)$$

The Gaussian kernel G_ρ, which is not to be confused with the kernel K, acts as a weighted neighbourhood of size ρ. A uniformly weighted box can be chosen as well, which illustrates the basic concept of comparing patches.
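Equations (1) and (2) can be compared directly on a 1D signal. The sketch below is a brute-force illustration under our own assumptions. The signal is a Python list, integrals become sums over samples, and G_ρ is replaced by a uniformly weighted box of half-width `radius` (which the text notes is admissible). Borders are handled by clamping, and the normalization constant of K is dropped because it cancels in the ratio of Eq. (1). For the patch-based filter, the weight is taken as exp(-d/(2h²)) with d the squared patch distance of Eq. (2); this choice of exponent is also an assumption.

```python
import math


def yaroslavsky(signal, h, gamma):
    """Eq. (1) with a Gaussian kernel of d(x, y) = gamma*|x - y| + |I(x) - I(y)|."""
    n = len(signal)
    out = []
    for x in range(n):
        num = den = 0.0
        for y in range(n):
            d = gamma * abs(x - y) + abs(signal[x] - signal[y])
            w = math.exp(-d * d / (2 * h * h))
            num += w * signal[y]
            den += w
        out.append(num / den)
    return out


def patch_distance(signal, x, y, radius):
    """Discrete Eq. (2) with a uniform box in place of G_rho; indices are clamped."""
    n = len(signal)

    def clamp(i):
        return min(max(i, 0), n - 1)

    return sum((signal[clamp(x - k)] - signal[clamp(y - k)]) ** 2
               for k in range(-radius, radius + 1))


def nonlocal_means(signal, h, radius):
    """Eq. (1) with weights exp(-d/(2h^2)), d from Eq. (2); the search covers all samples."""
    out = []
    for x in range(len(signal)):
        weights = [math.exp(-patch_distance(signal, x, y, radius) / (2 * h * h))
                   for y in range(len(signal))]
        out.append(sum(w * v for w, v in zip(weights, signal)) / sum(weights))
    return out


if __name__ == "__main__":
    texture = [float(i % 2) for i in range(20)]   # a fine, regular 1D "texture"
    print(nonlocal_means(texture, h=0.5, radius=2)[:6])
    print(yaroslavsky(texture, h=0.5, gamma=0.1)[:6])
```

On the alternating signal, two patches of the same phase have distance zero in the interior, while patches of opposite phase differ at every position. The patch-based weights therefore separate the two phases far more sharply than a comparison of centre pixels alone.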
Since the above similarity measure takes into account complete patches instead of single pixel intensities, only similar textures play a role in the averaging. This removes noise, while the fine repetitive structures that are due to the texture are preserved by the filter. A variety of applications exist for this filter. Apart from denoising images, the concept can, for instance, be transferred to video enhancement [17] and the smoothing of 3D surfaces [18].

Apart from the patch size ρ, the filter contains another important parameter, namely the width h of the kernel K. It quantifies how fast the weights decay with increasing dissimilarity of the respective patches. Statistical reasoning as in [19, 20] allows h to be determined automatically via cross-validation or by estimating the noise variance. Ideas for adapting the patch size ρ locally to the data have been presented in [21]. Other improvements of the basic filter concern its iteration. The bilateral filter is an iterative version of the Yaroslavsky filter, and there is no reason not to iterate a neighbourhood filter whose similarities are defined on patches rather than single pixel intensities as well. The entropy minimisation framework of the UINTA filter [16] leads to such a patch-based variant of the bilateral filter. Other iterative versions have been proposed in [22, 23, 24, 25]. Some of these filters, particularly [23, 24], are designed to be closer to local filters: they run many iterations with weights K(x, y) computed on the initial image I and emphasise the spatial distance of the pixels rather than the similarity of the patches.

In the present paper, we are concerned with two other important issues in nonlocal filtering: computational complexity and invariance of the patch distance with respect to certain transformations, such as rotation, scaling, and illumination changes.

¹Semi-local due to the spatial distance that plays a role in the similarity.
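The iteration strategies mentioned above differ in where the weights come from. The following sketch is illustrative and does not reproduce any of the cited filters [22, 23, 24, 25]. With `fixed_weights=True`, the kernel weights K(x, y) are computed once on the initial signal I and reused in every step, which is the strategy the text attributes to [23, 24]. Otherwise, they are recomputed from the current iterate. The uniform box patch, the clamped borders and the weight exp(-d/(2h²)) are assumptions, and the emphasis on spatial distance used in [23, 24] is omitted.

```python
import math


def patch_weights(signal, h, radius):
    """Full matrix of patch-based weights exp(-d/(2h^2)) for a 1D signal."""
    n = len(signal)

    def clamp(i):
        return min(max(i, 0), n - 1)

    def dist(x, y):
        return sum((signal[clamp(x - k)] - signal[clamp(y - k)]) ** 2
                   for k in range(-radius, radius + 1))

    return [[math.exp(-dist(x, y) / (2 * h * h)) for y in range(n)] for x in range(n)]


def apply_weights(weights, signal):
    """One averaging step, Eq. (1), with a precomputed weight matrix."""
    return [sum(w * v for w, v in zip(row, signal)) / sum(row) for row in weights]


def iterate_nl_means(signal, h, radius, steps, fixed_weights):
    u = list(signal)
    w0 = patch_weights(signal, h, radius) if fixed_weights else None
    for _ in range(steps):
        # Fixed: weights from the initial signal I. Otherwise: from the current iterate u.
        w = w0 if fixed_weights else patch_weights(u, h, radius)
        u = apply_weights(w, u)
    return u


if __name__ == "__main__":
    sig = [0.0, 1.0, 0.0, 1.0, 0.2, 0.9, 0.1, 1.0, 0.0, 1.0]
    print(iterate_nl_means(sig, 0.5, 1, 5, fixed_weights=True))
    print(iterate_nl_means(sig, 0.5, 1, 5, fixed_weights=False))
```

Both variants perform the same first step. They diverge afterwards because recomputed weights adapt to the already smoothed signal, whereas fixed weights keep encoding the patch similarities of the noisy input.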
The next section deals with computational complexity and presents a novel indexing structure, the cluster forest. Invariant patch comparisons are the subject of Section 3.

2. FAST NONLOCAL FILTERING

A look at the computational complexity of nonlocal filters reveals that a price must be paid for their excellent results. At each pixel, weights to all other pixels have to be computed. This yields a computational complexity of O(DN²), where N is the number of pixels in the image and D is the patch size. For larger images, this complexity is quite a burden, and several approximations have therefore been suggested. The most popular is to restrict the search to patches in a local neighbourhood [15], which turns the initially nonlocal filter into a semi-local one. For a fixed window size, this reduces the computational complexity to O(DN). Similarly, one can apply random sampling in which samples from the vicinity of the reference patch are preferred [16]. Both strategies assume that the most similar patches lie in the vicinity of the reference patch. It has been shown that in many denoising tasks, semi-local filtering not only reduces the computational load but even improves the denoising quality significantly. However, resolving the complexity problem by retreating to a semi-local variant abandons the initial idea and properties of nonlocal filtering. It is definitely not the right strategy for applications where true nonlocal filtering is beneficial.

Figure 1. Schematic illustration of a cluster tree. Leaves contain a relatively small set of similar patches.

Speedups without necessarily abandoning the idea of nonlocal filtering have been achieved in [26, 17, 27, 23, 20]. In [26], patch comparison is performed only for a subset of reference patches lying on a coarser grid of the image. The computed weights are then used to restore a whole block of pixels at once.
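The cost argument in this section can be made concrete with a back-of-the-envelope count. The image, patch and window sizes below are assumed example values, not figures from the paper. The count covers the squared differences needed for the patch distances. The fully nonlocal filter compares each of the N patches of D pixels with all N patches, whereas a search window of W pixels requires only W comparisons per pixel.

```python
def distance_ops(n_pixels, patch_size, window=None):
    """Squared differences needed to weight all pixel pairs considered by the filter.

    window=None models the fully nonlocal filter (every pixel against every pixel);
    otherwise each pixel is compared with `window` candidates in its neighbourhood.
    """
    candidates = n_pixels if window is None else window
    return patch_size * n_pixels * candidates


N = 512 * 512   # assumed image size
D = 7 * 7       # assumed patch size
W = 21 * 21     # assumed search window
full = distance_ops(N, D)
windowed = distance_ops(N, D, W)
print(f"full: {full:.2e}, windowed: {windowed:.2e}, ratio: {full / windowed:.0f}")
# full: 3.37e+12, windowed: 5.66e+09, ratio: 594
```

The ratio is simply N/W. It grows linearly with the image size, which is why the quadratic term dominates for larger images.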
The block-wise approach of [26] clearly gains a significant constant factor in computational cost, yet for the sake of quality the grid cannot be made arbitrarily coarse. For nonlocal filtering, we are hence left with the quadratic time complexity of the original filter.

2.1. Acceleration by preselecting patches

In patch-based nonlocal filtering, almost all computation time is spent on computing distances between patches. However, only a relatively small fraction of all patches is sufficiently similar for their kernel weights K(x, y) to play a role in the averaging. Hence, to speed up the filter, the basic idea of the approaches in [17, 27] has been to compute distances only for a reduced set of patches. Preselection of patches is performed with alternative distances that can be computed very quickly, such as the difference of the patches' means or variances. Indeed, this strategy leads to a significant speedup, particularly for large patches. Its disadvantage is that the preselection criterion is only weakly related to the distance between patches: two patches with the same means and variances often comprise vastly different textural structures. Since the exact distance is computed for each patch in the preselected set, this hardly harms the filtering outcome, but it reduces the efficiency of the method, as the preselected set still contains a large number of dissimilar patches.

18 citations


Book Chapter · DOI
10 Jun 2008
TL;DR: In this paper, geometric prior information about the floor location is integrated into the pose tracking process, and poses in which body parts intersect the ground plane are penalized by employing soft constraints.
Abstract: To overcome typical problems in markerless motion capture from video, such as ambiguities, noise, and occlusions, many techniques reduce the high-dimensional search space by integrating prior information about the movement pattern or scene. In this work, we present an approach in which geometric prior information about the floor location is integrated into the pose tracking process. We penalize poses in which body parts intersect the ground plane by employing soft constraints in the pose estimation framework. Experiments with rigid objects and the HumanEVA-II benchmark show that tracking is remarkably stabilized.

14 citations
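A minimal form of the ground-plane penalty from the floor-constraint paper above can be sketched as follows. It is a toy under our own assumptions rather than the paper's formulation. The floor is the plane n·p = offset with an upward unit normal `n`, and body parts are represented by sample points. Each point below the floor contributes its squared penetration depth, scaled by a weight `lam`, and points on or above the floor cost nothing. Added to a tracking energy, such a term penalizes intersections with the ground without strictly forbidding them.

```python
def floor_penalty(points, n, offset, lam=1.0):
    """Soft constraint: lam * sum of squared penetration depths below the plane n.p = offset."""
    total = 0.0
    for p in points:
        height = sum(a * b for a, b in zip(n, p)) - offset   # signed distance to the floor
        if height < 0.0:
            total += height * height
    return lam * total


def floor_penalty_grad(points, n, offset, lam=1.0):
    """Gradient w.r.t. each point: 2*lam*height*n below the floor, zero otherwise."""
    grads = []
    for p in points:
        height = sum(a * b for a, b in zip(n, p)) - offset
        g = 2.0 * lam * height if height < 0.0 else 0.0
        grads.append(tuple(g * a for a in n))
    return grads


if __name__ == "__main__":
    foot_points = [(0.0, 0.0, 0.05), (0.1, 0.0, -0.02)]   # second point penetrates the floor
    print(floor_penalty(foot_points, (0.0, 0.0, 1.0), 0.0, lam=10.0))
```

Following the negative gradient moves a penetrating point along +n, i.e. back above the floor.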


Book Chapter · DOI
01 Jan 2008
TL;DR: This chapter surveys a high-quality generative method, which employs the person’s silhouette extracted from one or multiple camera views for fitting an a priori given 3D body surface model.
Abstract: Human 3D motion tracking from video is an emerging research field with many applications demanding highly detailed results. This chapter surveys a high-quality generative method, which employs the person’s silhouette extracted from one or multiple camera views for fitting an a priori given 3D body surface model. A coupling between pose estimation and contour extraction allows for reliable tracking in cluttered scenes without the need for a static background. The optic flow computed between two successive frames is used for pose prediction. It improves the quality of tracking in the case of fast motion and/or low frame rates. To cope with unreliable or insufficient data, the framework is further extended by the use of prior knowledge on static joint angle configurations.

13 citations


Book Chapter · DOI
09 Jul 2008
TL;DR: Self-occlusion, a common problem in silhouette-based motion capture that often results in ambiguous pose configurations, is addressed by splitting the surface model of the object and tracking the silhouette of each part rather than the whole object.
Abstract: Self-occlusion is a common problem in silhouette-based motion capture, which often results in ambiguous pose configurations. In most works, this is compensated for by a priori knowledge about the motion or the scene, or by the use of multiple cameras. Here we propose to overcome this problem by splitting the surface model of the object and tracking the silhouette of each part rather than the whole object. The splitting can be done automatically by comparing the appearance of the different parts with the Jensen-Shannon divergence. Tracking is then achieved by maximizing the appearance differences of all involved parts and the background simultaneously via gradient descent. We demonstrate the improvements with tracking results from simulated and real-world scenes.

10 citations
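The Jensen-Shannon divergence used in the self-occlusion paper above to compare the appearance of parts can be computed from two histograms. The sketch assumes that appearance is summarized as histograms over the same bins, which is our assumption, since the appearance model is not specified here. It uses the natural logarithm, so the value lies between 0 (identical distributions) and ln 2 (disjoint support).

```python
import math


def kl(p, q):
    # Kullback-Leibler divergence; terms with p_i = 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalize(hist):
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def jensen_shannon(hist_a, hist_b):
    """JS(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) with M = (P + Q) / 2."""
    p, q = normalize(hist_a), normalize(hist_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    torso = [10, 30, 50, 10]   # hypothetical intensity histograms of two body parts
    arm = [40, 40, 10, 10]
    print(jensen_shannon(torso, arm))
```

Unlike the plain KL divergence, the measure is symmetric and always finite, because M is nonzero wherever P or Q is. A larger value indicates parts whose appearance is easier to tell apart.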


Book Chapter · DOI
01 Jan 2008

5 citations


Book Chapter · DOI
18 Feb 2008
TL;DR: This work deals with modeling and tracking of mechanical systems that are given as kinematic chains with restricted degrees of freedom, and proposes to model the restrictions numerically via soft constraints.
Abstract: This work deals with modeling and tracking of mechanical systems that are given as kinematic chains with restricted degrees of freedom. Such systems may involve many joints, but due to additional restrictions or mechanical properties, the joints depend on each other. So-called closed-chain or parallel manipulators are examples of kinematic chains with additional constraints. Though the degrees of freedom are limited, the complexity of the dynamic equations increases rapidly when they are studied analytically. In this work, we propose to avoid this kind of analytic integration of interconnection constraints and instead to model them numerically via soft constraints.

4 citations
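The idea in the last abstract above, replacing analytic closure equations with a numerical penalty, can be sketched on a planar two-link chain whose end point must stay at a fixed anchor, so that the chain is closed. Everything here is a hypothetical toy. The link lengths `l1` and `l2`, the anchor `target`, the weight `lam` and the finite-difference gradient are our choices. The energy keeps the joint angles close to observed angles while penalizing the squared distance between the chain's end point and the anchor.

```python
import math


def end_point(theta, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link chain rooted at the origin."""
    a, b = theta
    return (l1 * math.cos(a) + l2 * math.cos(a + b),
            l1 * math.sin(a) + l2 * math.sin(a + b))


def energy(theta, observed, target, lam):
    data = sum((t - o) ** 2 for t, o in zip(theta, observed))
    ex, ey = end_point(theta)
    closure = (ex - target[0]) ** 2 + (ey - target[1]) ** 2   # soft loop-closure term
    return data + lam * closure


def fit_angles(observed, target, lam=100.0, step=1e-3, iters=5000, eps=1e-6):
    """Gradient descent with central finite differences on the penalized energy."""
    theta = list(observed)
    for _ in range(iters):
        grad = []
        for i in range(len(theta)):
            tp = list(theta)
            tp[i] += eps
            tm = list(theta)
            tm[i] -= eps
            grad.append((energy(tp, observed, target, lam)
                         - energy(tm, observed, target, lam)) / (2 * eps))
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    anchor = end_point((0.0, math.pi / 2))   # a reachable closure point, (1, 1)
    fitted = fit_angles((0.1, 1.4), anchor)
    print(fitted, end_point(fitted))
```

The closure equation is never solved analytically. Instead, the penalty weight `lam` controls how strictly the loop is closed, and the step size must be kept small enough for the stiff penalty term.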