
Showing papers by "Luc Van Gool published in 2002"


Book ChapterDOI
16 Sep 2002
TL;DR: This paper presents the integration of color distributions into particle filtering and shows how these distributions can be adapted over time.
Abstract: Color can provide an efficient visual feature for tracking non-rigid objects in real-time. However, the color of an object can vary over time dependent on the illumination, the visual angle and the camera parameters. To handle these appearance changes a color-based target model must be adapted during temporally stable image observations. This paper presents the integration of color distributions into particle filtering and shows how these distributions can be adapted over time. A particle filter tracks several hypotheses simultaneously and weights them according to their similarity to the target model. As similarity measure between two color distributions the popular Bhattacharyya coefficient is applied. In order to update the target model to slowly varying image conditions, frames where the object is occluded or too noisy must be discarded.
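As a rough sketch of the similarity measure described above, the Bhattacharyya coefficient between two normalized color histograms can be computed and turned into particle weights. The Gaussian-on-distance weighting and the `sigma` value here are illustrative assumptions, not the paper's exact parameters:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized color histograms.
    Returns 1.0 for identical distributions and approaches 0.0 for
    disjoint ones."""
    return float(np.sum(np.sqrt(p * q)))

def particle_weights(histograms, target, sigma=0.1):
    """Weight each particle's candidate histogram by its similarity to the
    target model, using a Gaussian on the Bhattacharyya distance
    (sigma is an assumed, illustrative bandwidth)."""
    d2 = [1.0 - bhattacharyya(h, target) for h in histograms]
    w = np.exp(-np.array(d2) / (2 * sigma ** 2))
    return w / w.sum()  # normalize so the weights sum to 1
```

An identical candidate histogram yields a coefficient of 1 and therefore receives the largest weight.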

288 citations


01 Jan 2002
TL;DR: In this article, the authors present a system that allows for an evolutionary introduction of depth perception into the existing 2D digital TV framework, where all parts of the 3D processing chain are optimized to one another.
Abstract: In this paper we will present the concept of a system that allows for an evolutionary introduction of depth perception into the existing 2D digital TV framework. The work is part of the European Information Society Technologies (IST) project “Advanced Three-Dimensional Television System Technologies” (ATTEST), an activity where industries, research centers and universities have joined forces to design a backwards-compatible, flexible and modular broadcast 3D-TV system, where all parts of the 3D processing chain are optimised to one another. This includes content creation, coding and transmission, display and research in human 3D perception, which will be used to guide the development process. The goals of the project comprise the development of a novel broadcast 3D camera, algorithms to convert existing 2D-video material into 3D, a 2D-compatible coding and transmission scheme for 3D-video using MPEG-2/4/7 technologies and the design of two new autostereoscopic displays.

155 citations


Book ChapterDOI
28 May 2002
TL;DR: A complete approach is proposed that detects the problem and defers the computation of parameters that are ambiguous in projective space (i.e. the registration between partial reconstructions only sharing a common plane and poses of cameras only seeing planar features) till after self-calibration.
Abstract: In this paper we address the problem of uncalibrated structure and motion recovery from image sequences that contain dominant planes in some of the views. Traditional approaches fail when the features common to three consecutive views are all located on a plane. This happens because in the uncalibrated case there is a fundamental ambiguity in relating the structure before and after the plane. This is, however, a situation that is often hard to avoid in man-made environments. We propose a complete approach that detects the problem and defers the computation of parameters that are ambiguous in projective space (i.e. the registration between partial reconstructions only sharing a common plane and poses of cameras only seeing planar features) till after self-calibration. Also a new linear self-calibration algorithm is proposed that couples the intrinsics between multiple subsequences. The final result is a complete metric 3D reconstruction of both structure and motion for the whole sequence. Experimental results on real image sequences show that the approach yields very good results.

128 citations


Journal ArticleDOI
TL;DR: How computers can automatically build realistic 3D models from 2D images acquired with a handheld camera is shown.
Abstract: How computers can automatically build realistic 3D models from 2D images acquired with a handheld camera.

113 citations


Journal ArticleDOI
TL;DR: The development of this automatic crude registration has allowed us to create a system that can generate complex 3D models from a set of partial reconstructions without any user intervention or prior knowledge of relative positions.

85 citations


Book ChapterDOI
16 Sep 2002
TL;DR: The paper discusses a structured light setup for fast, one-shot 3D acquisition that consists of equidistant, vertical stripes that are extracted with sub-pixel accuracy by a pattern-specific snake.
Abstract: The paper discusses a structured light setup for fast, one-shot 3D acquisition. The projection pattern consists of equidistant, vertical stripes. The main problem is the determination of the stripe-boundaries. They are extracted with sub-pixel accuracy by a pattern-specific snake. An initialization procedure yields the rough contours. Subpixel accuracy is reached through an iterative relaxation process. The labeling problem is automatically solved if all boundaries are located. Interpolation guarantees that the correct number of boundaries is initialized.

55 citations


Book ChapterDOI
16 Sep 2002
TL;DR: This paper presents a multicamera Visual Room (ViRoom), constructed from low-cost digital cameras and standard computers running Linux, together with a fully automatic self-calibration method for multiple cameras that requires no known calibration object.
Abstract: This paper presents a multicamera Visual Room (ViRoom). It is constructed from low-cost digital cameras and standard computers running Linux. Software-based synchronized image capture is introduced. A fully automatic self-calibration method for multiple cameras, requiring no known calibration object, is proposed and verified by 3D reconstruction experiments. This handy calibration allows an easy reconfiguration of the setup. Aside from the computers, which are usually already available, such a synchronized multicamera setup with six or seven cameras costs less than 1000 USD.

53 citations


Journal ArticleDOI
TL;DR: The paper, which focuses on speech, follows a kind of bootstrap procedure: 3D shape statistics are learned from a talking face with a relatively small number of markers, after which the face can be animated and face dynamics can be tracked in 3D without markers.
Abstract: Realistic face animation is especially hard as we are all experts in the perception and interpretation of face dynamics. One approach is to simulate facial anatomy. Alternatively, animation can be based on first observing the visible 3D dynamics, extracting the basic modes, and putting these together according to the required performance. This is the strategy followed by the paper, which focuses on speech. The approach follows a kind of bootstrap procedure. First, 3D shape statistics are learned from a talking face with a relatively small number of markers. A 3D reconstruction is produced at temporal intervals of 1/25 seconds. A topological mask of the lower half of the face is fitted to the motion. Principal component analysis (PCA) of the mask shapes reduces the dimension of the mask shape space. The result is twofold. On the one hand, the face can be animated; in our case it can be made to speak new sentences. On the other hand, face dynamics can be tracked in 3D without markers for performance capture. Copyright © 2002 John Wiley & Sons, Ltd.
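The PCA step on the mask shapes can be sketched with a plain SVD. The array layout and the reconstruction helper below are generic assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def pca_shape_basis(shapes, k):
    """Reduce a set of face-mask shapes (each a flattened vector of 3D
    vertex coordinates) to a k-dimensional linear basis via PCA.
    shapes: (n_frames, 3 * n_vertices) array.
    Returns (mean shape, (k, dim) basis, per-frame coefficients)."""
    mean = shapes.mean(axis=0)
    X = shapes - mean
    # SVD of the centered data; rows of Vt are the principal modes
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]
    coeffs = X @ basis.T  # coordinates of every frame in mode space
    return mean, basis, coeffs

def reconstruct(mean, basis, coeffs):
    """Rebuild shapes from their low-dimensional coefficients."""
    return mean + coeffs @ basis
```

Animating the face then amounts to moving through the low-dimensional coefficient space and mapping back to vertex positions.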

44 citations


Book ChapterDOI
16 Sep 2002
TL;DR: Complex polyhedral building roofs are reconstructed with a model-based approach combining a geometric and a semantic regime; missing parts of the roof model are inferred by invoking the geometric regime once more, and several successfully reconstructed complex roof structures corroborate the potential of the approach.
Abstract: This paper investigates the model-based reconstruction of complex polyhedral building roofs. A roof is modelled as a structured ensemble of planar polygonal faces. The modelling is done in two different regimes. One focuses on geometry, whereas the other is ruled by semantics. Inside the geometric regime, 3D line segments are grouped into planes and further into faces using a Bayesian analysis. In the second regime, the preliminary geometric models are subject to a semantic interpretation. The knowledge gained in this step is used to infer missing parts of the roof model (by invoking the geometric regime once more) and to adjust the overall roof topology. Several successfully reconstructed complex roof structures corroborate the potential of the approach.

41 citations


Book ChapterDOI
28 May 2002
TL;DR: Depth extraction with a mobile stereo system is described, with emphasis on the integration of the motion and stereo cues, and the resulting system can handle large displacements, depth discontinuities and occlusions.
Abstract: Depth extraction with a mobile stereo system is described. The stereo setup is precalibrated, but the system extracts its own motion. Emphasis lies on the integration of the motion and stereo cues. It is guided by the relative confidence that the system has in these cues. This weighing is fine-grained in that it is determined for every pixel at every iteration. Reliable information spreads fast at the expense of less reliable data, both in terms of spatial communication and in terms of exchange between cues. The resulting system can handle large displacements, depth discontinuities and occlusions. Experimental results corroborate the viability of the approach.

34 citations



Book ChapterDOI
01 Jan 2002
TL;DR: A self-learning prototype system for the real-time detection of unusual motion patterns and motion recognition based on the same method, the extended Condensation algorithm, used for the object tracking.
Abstract: This paper describes a self-learning prototype system for the real-time detection of unusual motion patterns. The proposed surveillance system uses a three-step approach consisting of a tracking, a learning and a recognition part. In the first step, an arbitrary, changing number of objects are tracked with an extension of the Condensation algorithm. From the history of the tracked object states, temporal trajectories are formed which describe the motion paths of these objects. Secondly, characteristic motion patterns are learned by clustering these trajectories into prototype curves. In the final step, motion recognition is then tackled by tracking the position within these prototype curves based on the same method, the extended Condensation algorithm, used for the object tracking.
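The clustering of trajectories into prototype curves might be sketched as follows. The fixed-length resampling, the plain k-means and the deterministic farthest-point initialization are assumptions for illustration, not necessarily the method used in the paper:

```python
import numpy as np

def resample(traj, n=16):
    """Resample a 2D trajectory (m, 2) to n points by linear interpolation
    along its normalized index, so trajectories become comparable vectors."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, traj[:, d]) for d in (0, 1)], axis=1)

def prototype_curves(trajs, k=2, iters=20):
    """Cluster resampled trajectories with plain k-means; each cluster mean
    is one prototype motion path. Initialized with the first sample and the
    sample farthest from it (deterministic, for illustration only)."""
    X = np.array([resample(t).ravel() for t in trajs])
    far = int(np.argmax(((X - X[0]) ** 2).sum(axis=1)))
    centers = X[[0, far]] if k == 2 else X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers.reshape(len(centers), -1, 2), labels
```

Recognition would then amount to tracking an object's position along the nearest prototype curve.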


01 Jan 2002
TL;DR: This paper investigates the model-based reconstruction of complex polyhedral building roofs, using a set of 3D line segments obtained from multiview correspondence analysis of high-resolution colour imagery, and chooses the optimal patch and plane configuration non-deterministically.
Abstract: This paper investigates the model-based reconstruction of complex polyhedral building roofs. A set of 3D line segments, obtained from multiview correspondence analysis of high resolution colour imagery, is used as input data. The 3D line segments are grouped into planes by means of a Bayesian model selection procedure. In the resulting planes, models for polygonal patches are then instantiated. Driven by the Maximum Expected Utility principle, the algorithm chooses the optimal patch and plane configuration non-deterministically. Roof reconstruction is completed by further reasoning steps which are guided by the semantic interpretation of the intermediate patch configuration. Several successfully reconstructed complex roof structures corroborate the potential of the approach.

Proceedings ArticleDOI
01 Nov 2002
TL;DR: The goals of the project towards the optimized 3D broadcast chain comprise the development of a novel broadcast 3D camera, algorithms to convert existing 2D-video material into 3D, a2D-compatible coding and transmission scheme for 3D- video using MPEG-2/4/7 technologies and the design of two new autostereoscopic displays.
Abstract: In this paper we will present the concept of a modular three dimensional broadcast chain, that allows for an evolutionary introduction of depth perception into the context of 2D digital TV. The work is performed within the framework of the European Information Society Technologies (IST) project "Advanced Three-dimensional Television System Technologies" (ATTEST), bringing together the expertise of industries, research centers and universities to design a backwards-compatible, flexible and modular broadcast 3D-TV system. This three dimensional broadcast chain includes content creation, coding, transmission and display. Research in human 3D perception will be used to guide the development process. The goals of the project towards the optimized 3D broadcast chain comprise the development of a novel broadcast 3D camera, algorithms to convert existing 2D-video material into 3D, a 2D-compatible coding and transmission scheme for 3D-video using MPEG-2/4/7 technologies and the design of two new autostereoscopic displays.

Journal ArticleDOI
TL;DR: An automatic processing pipeline is presented that analyses an image sequence and automatically extracts camera motion, calibration and scene geometry and a dense estimate of the surface geometry of the observed scene is computed using stereo matching.
Abstract: This paper contains two parts. In the first part an automatic processing pipeline is presented that analyses an image sequence and automatically extracts camera motion, calibration and scene geometry. The system combines state-of-the-art algorithms developed in computer vision, computer graphics and photogrammetry. The approach consists of two stages. Salient features are extracted and tracked throughout the sequence to compute the camera motion and calibration and the 3D structure of the observed features. Then a dense estimate of the surface geometry of the observed scene is computed using stereo matching. The second part of the paper discusses how this information can be used for visualization. Traditionally, a textured 3D model is constructed from the computed information and used to render new images. Alternatively, it is also possible to avoid the need for an explicit 3D model and to obtain new views directly by combining the appropriate pixels from recorded views. It is interesting to note that even when there is an ambiguity on the reconstructed geometry, correct new images can often still be generated. Copyright © 2002 John Wiley & Sons, Ltd.

Proceedings Article
01 Jan 2002
TL;DR: A method for realistic face animation, focused on speech animation, is proposed: it replicates the 3D 'visemes' that it has learned from talking actors and adds the necessary coarticulation effects.
Abstract: A method for realistic face animation is proposed. In particular it focuses on speech animation. When asked to animate a face it replicates the 3D 'visemes' that it has learned from talking actors, and adds the necessary coarticulation effects. The speech animation could be based on as few as 16 modes, extracted through Independent Component Analysis from different face dynamics. The exact deformation fields that come with the different visemes are adapted by the system to take the shape of the given face into account. By localising the face to be animated in a face space, where also the locations of the neutral example faces are known, visemes are adapted automatically according to the relative distance with respect to these examples.


Book ChapterDOI
28 May 2002
TL;DR: In this article, linear least squares algorithms are proposed to recover lens distortions together with structure and motion from video sequences, without tedious pre-calibration and while allowing lens distortions to change over time.
Abstract: Lens distortions in off-the-shelf or wide-angle cameras block the road to high accuracy Structure and Motion Recovery (SMR) from video sequences. Neglecting lens distortions introduces a systematic error buildup which causes recovered structure and motion to bend and inhibits turntable or other loop sequences to close perfectly. Locking back onto previously reconstructed structure can become impossible due to the large drift caused by the error buildup. Bundle adjustments are widely used to perform an ultimate post-minimization of the total reprojection error. However, the initial recovered structure and motion needs to be close to optimal to avoid local minima. We found that bundle adjustments cannot remedy the error buildup caused by ignoring lens distortions. The classical approach to distortion removal involves a preliminary distortion estimation using a calibration pattern, known geometric properties of perspective projections or only 2D feature correspondences. Often the distortion is assumed constant during camera usage and removed from the images before applying SMR algorithms. However, lens distortions can change by zooming, focusing and temperature variations. Moreover, when only the video sequence is available preliminary calibration is often not an option. This paper addresses all aforementioned problems by sequentially recovering lens distortions together with structure and motion from video sequences without tedious pre-calibrations and allowing lens distortions to change over time. The devised algorithms are fairly simple as they only use linear least squares techniques. The unprocessed video sequence forms the only input and no severe restrictions are placed on viewed scene geometry. Therefore, the accurate recovery of structure and motion is fully automated and widely applicable. The experiments demonstrate the necessity of modeling lens distortions to achieve high accuracy in recovered structure and motion.
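A minimal one-parameter radial model illustrates the kind of lens distortion being recovered here. The single-coefficient polynomial and the fixed-point inversion below are a common textbook simplification, not the paper's algorithm:

```python
import numpy as np

def distort(xy, k1):
    """Apply a one-parameter radial distortion to normalized image
    coordinates: x_d = x * (1 + k1 * r^2), with r^2 = x^2 + y^2."""
    r2 = (xy ** 2).sum(axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2)

def undistort(xy_d, k1, iters=10):
    """Invert the radial model by fixed-point iteration: start from the
    distorted point and repeatedly divide out the radial factor.
    Converges quickly for moderate k1 * r^2."""
    xy = xy_d.copy()
    for _ in range(iters):
        r2 = (xy ** 2).sum(axis=-1, keepdims=True)
        xy = xy_d / (1.0 + k1 * r2)
    return xy
```

Ignoring even a small `k1` biases every reprojection slightly in the same direction, which is exactly the systematic error buildup the abstract describes.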

Book ChapterDOI
28 May 2002
TL;DR: In this paper, the Gaussian image is used to detect planar, cylindrical and conical regions and then the rigid motion between the patches is computed to put the patches in the same coordinates system.
Abstract: Most 3D recording methods generate multiple partial reconstructions that must be integrated to form a complete model. The coarse registration step roughly aligns the parts with each other. Several methods for coarse registration have been developed that are based on matching points between different parts. These methods look for interest points and use a point signature that encodes the local surface geometry to find corresponding points. We developed a technique that is complementary to these methods. Local descriptions can fail or can be highly inefficient when the surfaces contain local symmetries. Instead of discarding these regions, we introduce a method that first uses the Gaussian image to detect planar, cylindrical and conical regions and then uses this information to compute the rigid motion between the patches. For combining the information from multiple regions into a single solution, we use a Hough space that accumulates votes for candidate transformations. Due to their symmetry, the regions update a subspace of parameter space instead of a single bin. Experiments on real range data from different views of the same object show that the method can find the rigid motion that puts the patches in the same coordinate system.
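A crude sketch of reading surface type off the Gaussian image: a plane's normals collapse to a single point on the unit sphere, while a cylinder's normals lie on a great circle. The eigenvalue test below is an illustrative simplification (it ignores cones and noise handling), not the paper's detection method:

```python
import numpy as np

def triangle_normals(verts, faces):
    """Unit normals of a triangle mesh (verts: (n, 3), faces: (m, 3) indices);
    these normals, placed on the unit sphere, form the Gaussian image."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def classify_gaussian_image(normals, tol=1e-3):
    """Classify a patch from the eigenvalues of the normal scatter matrix:
    all normals (anti)parallel -> plane; normals confined to a plane
    through the origin (a great circle) -> cylinder-like."""
    evals = np.sort(np.linalg.eigvalsh(normals.T @ normals / len(normals)))
    if evals[2] > 1 - tol:
        return "plane"
    if evals[0] < tol:
        return "cylinder-like"
    return "general"
```

For such symmetric regions only some transformation parameters are constrained, which is why each region votes for a whole subspace of the Hough parameter space rather than a single bin.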

01 Jan 2002
TL;DR: Murale as discussed by the authors is a European IST project that will develop 3D capture and visualisation technology for archaeology, which will put special emphasis on the usability on the site, by the archaeologists themselves.
Abstract: Murale is a European IST project that will develop 3D capture and visualisation technology for archaeology. The project will put special emphasis on the usability on the site, by the archaeologists themselves. The paper describes techniques that are being developed by three of the Murale partners in particular. These comprise two methods to generate 3D models of objects, and approaches to deal with the textures of materials and terrain. Put together with the database and visualisation expertise brought in by the other partners, Murale will not only contribute to the enhanced visualisation of archaeological sites and finds, but also to a faster and more complete documentation of the progress of excavations. The ancient city of Sagalassos, one of the major excavation sites in the eastern part of the Mediterranean, will be used as the primary test site.

Journal Article
TL;DR: A method is introduced that first uses the Gaussian image to detect planar, cylindrical and conical regions, and then uses this information to compute the rigid motion that puts the patches in the same coordinate system.

01 Jan 2002
TL;DR: It is shown, how the knowledge gained in this step can be used to infer missing parts of the roof model (by invoking the geometric regime once more) and to adjust the overall roof topology.
Abstract: This paper investigates the model-based reconstruction of complex polyhedral building roofs. A roof is modelled as a structured ensemble of planar polygonal faces. The modelling is done in two different regimes. One focuses on geometry, whereas the other is ruled by semantics. Inside the geometric regime, which is the primary topic of this paper, 3D line segments are grouped into planes and further into faces using a Bayesian analysis. In the second regime, the preliminary geometric models are subject to a semantic interpretation. It is shown how the knowledge gained in this step can be used to infer missing parts of the roof model (by invoking the geometric regime once more) and to adjust the overall roof topology. Several successfully reconstructed complex roof structures corroborate the potential of the approach.


Book ChapterDOI
28 May 2002
TL;DR: In this article, composite textures are modelled and synthesized by treating the layout of the different subtextures as a texture in its own right: a layout texture is generated automatically and then filled in with the appropriate subtextures.
Abstract: Textures can often more easily be described as a composition of subtextures than as a single texture. The paper proposes a way to model and synthesize such "composite textures", where the layout of the different subtextures is itself modeled as a texture, which can be generated automatically. Examples are shown for building materials with an intricate structure and for the automatic creation of landscape textures. First, a model of the composite texture is generated. This procedure comprises manual or unsupervised texture segmentation to learn the spatial layout of the composite texture and the extraction of models for each of the subtextures. Synthesis of a composite texture includes the generation of a layout texture, which is subsequently filled in with the appropriate subtextures. This scheme is refined further by also including interactions between neighboring subtextures.
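The fill-in step can be caricatured as follows. Sampling pixels i.i.d. from each subtexture exemplar is a deliberate oversimplification (a real subtexture synthesizer preserves spatial statistics), and the label-map interface is an assumption:

```python
import numpy as np

def synthesize_composite(layout, subtextures, rng=None):
    """Fill a label-map layout with pixels drawn from per-label subtexture
    exemplars. layout: (H, W) int labels; subtextures: {label: (h, w, 3)}.
    Pixels are sampled independently here, which ignores the spatial
    statistics a real synthesizer would reproduce."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = np.zeros(layout.shape + (3,))
    for label, tex in subtextures.items():
        mask = layout == label
        flat = tex.reshape(-1, 3)
        idx = rng.integers(0, len(flat), size=int(mask.sum()))
        out[mask] = flat[idx]
    return out
```

In the paper's scheme the `layout` itself would be synthesized as a texture, and neighboring subtextures would additionally interact along their boundaries.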

01 May 2002
TL;DR: A new method to animate the face of a speaking avatar such that it realistically pronounces any given text, based on the audio only, is presented, which requires minimal bandwidth and relatively low computational effort.
Abstract: In this paper we present a new method to animate the face of a speaking avatar —i.e., a synthetic 3D human face— such that it realistically pronounces any given text, based on the audio only. Especially the lip movements must be rendered carefully, and perfectly synchronised with the audio, in order to have a realistic looking result, from which it should in principle be possible to understand the pronounced sentence by lip reading. Since such a system requires minimal bandwidth and relatively low computational effort, it could e.g. be used to transmit video conferencing data over a very low bandwidth channel, where the lip motion rendering is done at the receiving end, by only transmitting the audio channel, or in extremis even only an orthographic or phonetic transcription of the text together with precise phoneme timing information.

Book ChapterDOI
28 May 2002
TL;DR: In this article, different ways of representing the photometric changes in image intensities caused by changes in illumination and viewpoint, aiming at a balance between goodness-of-fit and low complexity.
Abstract: In this paper we compare different ways of representing the photometric changes in image intensities caused by changes in illumination and viewpoint, aiming at a balance between goodness-of-fit and low complexity. We derive invariant features based on generalized color moment invariants - that can deal with geometric and photometric changes of a planar pattern - corresponding to the chosen photometric models. The geometric changes correspond to a perspective skew. We compare the photometric models also in terms of the invariants' discriminative power and classification performance in a pattern recognition system.
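The building blocks of these invariants are generalized color moments of the form M_pq^{abc} = Σ x^p y^q R^a G^b B^c over the pattern. The discrete sum and the coordinate normalization below are illustrative choices, not the paper's exact conventions:

```python
import numpy as np

def generalized_color_moment(img, p, q, a, b, c):
    """Generalized color moment M_pq^{abc}: the sum over all pixels of
    x^p * y^q * R^a * G^b * B^c, where img is an (H, W, 3) float array
    and pixel coordinates (x, y) are normalized to [0, 1]."""
    H, W, _ = img.shape
    y, x = np.mgrid[0:H, 0:W].astype(float)
    x = x / max(W - 1, 1)
    y = y / max(H - 1, 1)
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    return float(np.sum(x ** p * y ** q * R ** a * G ** b * B ** c))
```

Invariants are then built as polynomial combinations of such moments in which the geometric and photometric transformation parameters cancel out.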

Journal Article
TL;DR: A way to model and synthesize such "composite textures", where the layout of the different subtextures is itself modeled as a texture, which can be generated automatically, is proposed.
Abstract: Textures can often more easily be described as a composition of subtextures than as a single texture. The paper proposes a way to model and synthesize such "composite textures", where the layout of the different subtextures is itself modeled as a texture, which can be generated automatically. Examples are shown for building materials with an intricate structure and for the automatic creation of landscape textures. First, a model of the composite texture is generated. This procedure comprises manual or unsupervised texture segmentation to learn the spatial layout of the composite texture and the extraction of models for each of the subtextures. Synthesis of a composite texture includes the generation of a layout texture, which is subsequently filled in with the appropriate subtextures. This scheme is refined further by also including interactions between neighboring subtextures.

Journal Article
TL;DR: This work hypothesized that the Kanizsa figure illusion is due to the modification of the image according to the higher level depth interpretation, and implemented a feedback model based on a surface completion scheme that created a central surface and extended the contours from the inducers.

Journal Article
TL;DR: This paper derives invariant features, based on generalized color moments, that can deal with geometric and photometric changes of a planar pattern under the chosen photometric models; the geometric changes correspond to a perspective skew.
Abstract: In this paper we compare different ways of representing the photometric changes in image intensities caused by changes in illumination and viewpoint, aiming at a balance between goodness-of-fit and low complexity. We derive invariant features based on generalized color moment invariants - that can deal with geometric and photometric changes of a planar pattern - corresponding to the chosen photometric models. The geometric changes correspond to a perspective skew. We compare the photometric models also in terms of the invariants' discriminative power and classification performance in a pattern recognition system.