
Showing papers by "Christian Theobalt published in 2009"


01 Jan 2009
TL;DR: This paper proposes a method for capturing the performance of a human or an animal from a multi-view video sequence and proposes a novel optimization scheme for skeleton-based pose estimation that exploits the skeleton's tree structure to split the optimization problem into a local one and a lower dimensional global one.
Abstract: This paper proposes a method for capturing the performance of a human or an animal from a multi-view video sequence. Given an articulated template model and silhouettes from a multi-view image sequence, our approach recovers not only the movement of the skeleton, but also the possibly non-rigid temporal deformation of the 3D surface. While large scale deformations or fast movements are captured by the skeleton pose and approximate surface skinning, true small scale deformations or non-rigid garment motion are captured by fitting the surface to the silhouette. We further propose a novel optimization scheme for skeleton-based pose estimation that exploits the skeleton's tree structure to split the optimization problem into a local one and a lower dimensional global one. We show on various sequences that our approach can capture the 3D motion of animals and humans accurately even in the case of rapid movements and wide apparel like skirts.
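To make the local/global split concrete, here is a toy coordinate-descent sketch (the 2-link planar skeleton, cost function, and step schedule are invented stand-ins, not the paper's silhouette-based objective): joint angles along the kinematic chain are refined by cheap 1-D local searches in chain order, while the root parameter is handled as a separate, lower-dimensional global refinement.

```python
import math

# Toy 2-link planar "skeleton": pose = [root_x, theta1, theta2].
def end_effector(pose, l1=1.0, l2=1.0):
    root_x, t1, t2 = pose
    x = root_x + l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
    return x, y

def cost(pose, target):
    # Stand-in for the paper's silhouette fitting error.
    x, y = end_effector(pose)
    return (x - target[0]) ** 2 + (y - target[1]) ** 2

def refine_1d(pose, i, target, step=0.2):
    # 1-D coordinate descent on parameter i with a shrinking step size.
    pose = list(pose)
    while step > 1e-5:
        moved = False
        for d in (-step, step):
            trial = list(pose)
            trial[i] += d
            if cost(trial, target) < cost(pose, target):
                pose, moved = trial, True
        if not moved:
            step *= 0.5
    return pose

def estimate_pose(target, sweeps=30):
    pose = [0.0, 0.0, 0.0]
    for _ in range(sweeps):
        for joint in (1, 2):                # local: joints in chain order
            pose = refine_1d(pose, joint, target)
        pose = refine_1d(pose, 0, target)   # global: low-dimensional root
    return pose
```

Splitting the problem this way keeps each sub-optimization low-dimensional, which is the structural idea the paper exploits for its tree-structured skeleton.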

349 citations


Proceedings ArticleDOI
20 Jun 2009

333 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: LidarBoost is presented, a 3D depth superresolution method that combines several low resolution noisy depth images of a static scene from slightly displaced viewpoints, and merges them into a high-resolution depth image.
Abstract: Depth maps captured with time-of-flight cameras have very low data quality: the image resolution is rather limited and the level of random noise contained in the depth maps is very high. Therefore, such flash lidars cannot be used out of the box for high-quality 3D object scanning. To solve this problem, we present LidarBoost, a 3D depth superresolution method that combines several low resolution noisy depth images of a static scene from slightly displaced viewpoints, and merges them into a high-resolution depth image. We have developed an optimization framework that uses a data fidelity term and a geometry prior term that is tailored to the specific characteristics of flash lidars. We demonstrate both visually and quantitatively that LidarBoost produces better results than previous methods from the literature.
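The optimization described above — a data fidelity term plus a geometry prior — can be sketched in one dimension (the block-average downsampling, quadratic smoothness prior, and all parameters below are simplifications invented for illustration; LidarBoost's actual prior is tailored to flash lidar characteristics):

```python
import numpy as np

def fuse_depth_1d(lowres_maps, scale=2, lam=0.1, iters=500, lr=0.1):
    """Fuse several noisy low-res depth signals into one high-res signal by
    gradient descent on: sum_k ||downsample(x) - y_k||^2 + lam * ||dx||^2."""
    lowres_maps = np.asarray(lowres_maps, dtype=float)
    n_low = lowres_maps.shape[1]
    x = np.repeat(lowres_maps.mean(axis=0), scale)   # init: upsampled mean
    for _ in range(iters):
        down = x.reshape(n_low, scale).mean(axis=1)  # block-average downsample
        # Data-fidelity gradient: spread each residual back over its block.
        g = np.zeros_like(x)
        for y in lowres_maps:
            g += np.repeat(down - y, scale) / scale
        # Geometry-prior gradient: discrete Laplacian (interior points only).
        g[1:-1] += lam * (2 * x[1:-1] - x[:-2] - x[2:])
        x -= lr * g
    return x
```

On synthetic data, fusing several displaced noisy measurements this way yields a lower reconstruction error than upsampling any single noisy map — the effect the paper demonstrates at full scale on real flash lidar data.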

212 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work proposes an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors to obtain high quality dense and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing complex noise of ToF sensors.
Abstract: Multi-view stereo methods frequently fail to properly reconstruct 3D scene geometry if visible texture is sparse or the scene exhibits difficult self-occlusions. Time-of-Flight (ToF) depth sensors can provide 3D information regardless of texture, but with only limited resolution and accuracy. To find an optimal reconstruction, we propose an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors. First, multi-view ToF sensor measurements are combined to obtain a coarse but complete model. Then, the initial model is refined by means of a probabilistic multi-view fusion framework, optimizing over an energy function that aggregates ToF depth sensor information with multi-view stereo and silhouette constraints. We obtain high-quality dense and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing complex noise of ToF sensors.
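A drastically simplified, per-pixel illustration of the kind of energy aggregation described (the weights, candidate depths, and cost names are invented; the paper optimizes a probabilistic multi-view energy, not an independent per-pixel minimum):

```python
def fuse_depth(d_tof, photo_cost, inside_silhouette, candidates,
               w_tof=1.0, w_photo=1.0):
    """Per pixel, pick the depth candidate minimizing a weighted sum of
    ToF agreement and stereo photo-consistency; pixels outside the
    silhouette are forced to 'no surface' (None)."""
    fused = []
    for i, inside in enumerate(inside_silhouette):
        if not inside:
            fused.append(None)          # hard silhouette constraint
            continue
        best = min(candidates,
                   key=lambda d: w_tof * (d - d_tof[i]) ** 2
                                 + w_photo * photo_cost(i, d))
        fused.append(best)
    return fused
```

The weights let one term dominate where the other is unreliable — e.g. photo-consistency in textureless regions, where the ToF term carries the reconstruction.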

174 citations


Journal ArticleDOI
01 Dec 2009
TL;DR: This work presents a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input, suitable for animating characters from live human speech.
Abstract: Human communication involves not only speech, but also a wide variety of gestures and body motions. Interactions in virtual environments often lack this multi-modal aspect of communication. We present a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input. Our system generates appropriate body language animations by selecting segments from motion capture data of real people in conversation. The synthesis can be performed progressively, with no advance knowledge of the utterance, making the system suitable for animating characters from live human speech. The selection is driven by a hidden Markov model and uses prosody-based features extracted from speech. The training phase is fully automatic and does not require hand-labeling of input data, and the synthesis phase is efficient enough to run in real time on live microphone input. User studies confirm that our method is able to produce realistic and compelling body language.
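The hidden-Markov-model-driven selection can be illustrated with a standard Viterbi decode (the two states, the prosody "energy" feature, and all scores below are invented toy values, not the paper's learned model):

```python
def viterbi(obs, states, log_init, log_trans, log_emit):
    """Standard Viterbi decode: most probable state (motion segment)
    sequence for a sequence of prosody observations."""
    scores = {s: log_init[s] + log_emit[s](obs[0]) for s in states}
    back = []
    for o in obs[1:]:
        new, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new[s] = scores[prev] + log_trans[prev][s] + log_emit[s](o)
            ptr[s] = prev
        scores = new
        back.append(ptr)
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy usage: two motion states, unnormalized log-scores, observations are
# a made-up pitch-energy feature in [0, 1].
states = ("rest", "beat")
flat = {s: {t: 0.0 for t in states} for s in states}      # uniform transitions
emit = {"rest": lambda e: -e, "beat": lambda e: e - 1.0}  # beat favors high energy
path = viterbi([0.1, 0.9, 0.8, 0.2], states, {s: 0.0 for s in states}, flat, emit)
# path == ['rest', 'beat', 'beat', 'rest']
```

In the paper's setting the transition scores would encode motion continuity between capture segments, which is what makes the decode produce smooth, connectable animation rather than a per-frame best match.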

140 citations


Patent
12 May 2009
TL;DR: In this paper, a volumetric deformation is applied to the digital representation of a figure as a function of the reference points and the correlation of the volumetric representation of the figure.
Abstract: A variety of methods, devices and storage mediums are implemented for creating digital representations of figures. According to one such computer implemented method, a volumetric representation of a figure is correlated with an image of the figure. Reference points are found that are common to each of two temporally distinct images of the figure, the reference points representing movement of the figure between the two images. A volumetric deformation is applied to the digital representation of the figure as a function of the reference points and the correlation of the volumetric representation of the figure. A fine deformation is applied as a function of the coarse/volumetric deformation. Responsive to the applied deformations, an updated digital representation of the figure is generated.
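A toy, point-based sketch of the coarse-then-fine pipeline the patent describes (all function names and the 2-D point representation are invented for illustration; the patent's volumetric deformation is far more general):

```python
def displacements(refs_prev, refs_next):
    # Reference points common to two temporally distinct frames, paired;
    # their offsets represent the figure's movement between the frames.
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(refs_prev, refs_next)]

def coarse_deform(mesh, disps):
    # Coarse/volumetric step stand-in: move every vertex by the mean motion.
    mx = sum(d[0] for d in disps) / len(disps)
    my = sum(d[1] for d in disps) / len(disps)
    return [(x + mx, y + my) for (x, y) in mesh]

def fine_deform(mesh, refs_next, blend=0.5):
    # Fine step stand-in: pull each vertex toward its nearest reference point.
    out = []
    for (x, y) in mesh:
        nx, ny = min(refs_next, key=lambda r: (r[0] - x) ** 2 + (r[1] - y) ** 2)
        out.append((x + blend * (nx - x), y + blend * (ny - y)))
    return out

def update_model(mesh, refs_prev, refs_next):
    # Coarse deformation first, then fine refinement, yielding the
    # updated digital representation.
    disps = displacements(refs_prev, refs_next)
    return fine_deform(coarse_deform(mesh, disps), refs_next)
```

The point of the two-stage design is that the fine deformation only has to correct small residuals left over after the coarse/volumetric step.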

45 citations


Proceedings Article
01 Jan 2009
TL;DR: An intuitive interface is designed for a user to sketch, in a few seconds, additional hints to the algorithm, to obtain 3D reconstructions of much higher quality than previous fully automatic methods.
Abstract: We present i23, an algorithm to reconstruct a 3D model from a single image taken with a normal photo camera. It is based on an automatic machine learning approach that casts 3D reconstruction as a probabilistic inference problem using a Markov Random Field trained on ground truth data. Since it is difficult to learn the statistical relations for all possible images, the quality of the automatic reconstruction is sometimes unsatisfying. We therefore designed an intuitive interface for a user to sketch, in a few seconds, additional hints to the algorithm. We have developed a way to incorporate these constraints into the probabilistic reconstruction framework in order to obtain 3D reconstructions of much higher quality than previous fully automatic methods. Our system also represents an exciting new computational photography tool, enabling new ways of rendering and editing photos.
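The probabilistic inference with user hints can be illustrated on a 1-D chain MRF (the label set, costs, and hint format below are invented; the paper uses a learned 2-D Markov Random Field): user strokes clamp a node's label, and MAP inference propagates their influence to neighboring nodes.

```python
def map_depth_chain(unary, user_hint, smooth=1.0, labels=(0, 1, 2)):
    """MAP inference on a chain MRF. Unary costs stand in for a learned
    model; a user hint clamps a node by making other labels infinitely
    costly. The pairwise term penalizes depth-label jumps."""
    INF = float("inf")
    n = len(unary)
    cost = [[unary[i][l] if user_hint[i] in (None, l) else INF
             for l in labels] for i in range(n)]
    # Forward dynamic programming over the chain.
    dp = [cost[0][:]]
    ptr = []
    for i in range(1, n):
        row, back = [], []
        for l in labels:
            best = min(labels, key=lambda m: dp[-1][m] + smooth * abs(m - l))
            row.append(dp[-1][best] + smooth * abs(best - l) + cost[i][l])
            back.append(best)
        dp.append(row)
        ptr.append(back)
    # Backtrack the minimum-cost labeling.
    last = min(labels, key=lambda l: dp[-1][l])
    out = [last]
    for back in reversed(ptr):
        out.append(back[out[-1]])
    return out[::-1]
```

Clamping the middle node to a different label forces the inference to trade the neighbors' unary preferences against the smoothness term — the same mechanism, at toy scale, by which a user's sketch overrides an unsatisfying automatic reconstruction.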

11 citations