
Showing papers by "Luc Van Gool published in 2005"


Book ChapterDOI
11 Apr 2005
TL;DR: The PASCAL Visual Object Classes (VOC) Challenge as mentioned in this paper ran from February to March 2005, with the goal of recognizing objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects).
Abstract: The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.
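As a rough illustration of the ranked-retrieval style of evaluation used in the challenge, the sketch below computes average precision from a scored list of test examples. This is a minimal sketch only; the exact VOC 2005 protocols (ROC analysis for classification, precision-recall for detection) differed per competition and are not reproduced here.

```python
import numpy as np

def average_precision(scores, labels):
    """Average precision of a ranked list.

    scores: confidence per example (higher = more confident positive).
    labels: 1 for a true class member, 0 otherwise (at least one positive).
    Illustrative only -- not the exact PASCAL VOC 2005 protocol.
    """
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    cum_tp = np.cumsum(labels)            # true positives up to each rank
    ranks = np.arange(1, len(labels) + 1)
    precision = cum_tp / ranks            # precision at each rank
    # average precision = mean precision at the ranks of the true positives
    return precision[labels == 1].mean()

# toy usage
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]))
```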

381 citations


01 Jan 2005
TL;DR: A novel system for autonomous mobile robot navigation is presented that, with only an omnidirectional camera as sensor, automatically and robustly builds accurate, topologically organised maps of a complex, natural environment.
Abstract: In this work we present a novel system for autonomous mobile robot navigation. With only an omnidirectional camera as sensor, this system is able to build, automatically and robustly, accurate topologically organised maps of a complex, natural environment. It can localise itself within such a map at any moment, both at startup (the kidnapped-robot problem) and using knowledge of former localisations. The topological nature of the map is similar to the intuitive maps humans use, is memory-efficient, and enables fast and simple path planning towards a specified goal. We developed a real-time visual servoing technique to steer the system along the computed path. A key technology making all this possible is the novel fast wide baseline feature matching, which yields an efficient description of the scene, with a focus on man-made environments.
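Because the map is topological, path planning reduces to graph search over place nodes. A minimal sketch assuming an unweighted adjacency structure (the node names and the breadth-first-search choice are illustrative; the paper's planner is not specified in this abstract):

```python
from collections import deque

def plan_path(adjacency, start, goal):
    """Breadth-first shortest path in a topological map.

    adjacency: dict mapping a place node to its neighbouring nodes.
    Returns the node sequence from start to goal, or None if unreachable.
    Illustrative sketch, not the paper's exact planning algorithm.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:    # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# toy map: corridor -> hall -> office
print(plan_path({'corridor': ['hall'], 'hall': ['office']}, 'corridor', 'office'))
```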

198 citations


Proceedings Article
01 Jan 2005
TL;DR: A GPU-based foreground-background segmentation that processes image sequences in less than 4 ms per frame is presented, exploiting parallelism on modern graphics hardware and extending the colinearity criterion with compensation for dark foreground and background areas, thus improving overall performance.
Abstract: We present a GPU-based foreground-background segmentation that processes image sequences in less than 4 ms per frame. Change detection w.r.t. the background is based on a color similarity test in a small pixel neighbourhood, and is integrated into a Bayesian estimation framework. An iterative MRF-based model is applied, exploiting parallelism on modern graphics hardware. The resulting segmentation exhibits compactness and smoothness in foreground areas as well as inter-frame temporal contiguity. Further refinements extend the colinearity criterion with compensation for dark foreground and background areas, thus improving overall performance.
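A CPU sketch of the colinearity-style color test at the heart of the change detection may help. The GPU mapping, small-neighbourhood voting, Bayesian integration and MRF smoothing from the paper are all omitted, and the threshold value is an assumption:

```python
import numpy as np

def change_mask(frame, background, threshold=0.05):
    """Per-pixel color-similarity (colinearity) test against a background.

    frame, background: float RGB arrays of shape (H, W, 3) in [0, 1].
    Returns a boolean foreground mask. Sketch only: the paper's
    neighbourhood voting, Bayesian framework and MRF smoothing are
    omitted, and the threshold is an assumed value.
    """
    dot = np.sum(frame * background, axis=2)
    norms = np.linalg.norm(frame, axis=2) * np.linalg.norm(background, axis=2)
    colinearity = dot / np.maximum(norms, 1e-6)   # 1.0 = same color direction
    # note: purely directional tests degrade for dark pixels, which the
    # paper's refinements compensate for
    return colinearity < (1.0 - threshold)
```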

65 citations


01 Jan 2005
TL;DR: A new fully automated cuneiform tablet digitizing solution is presented which is relatively inexpensive and easily field-deployable, and allows for photorealistic virtual re-lighting and non-photorealistic rendering of the tablets in real-time through the use of programmable graphics hardware.
Abstract: Modern researchers in the field of ancient Mesopotamian studies read their primary sources, cuneiform tablets, either manually, by moving lights around the tablet to maximize readability, or by studying photographs (or drawn copies) when the actual tablet is not at hand. Although the latter method only holds partial information, and is therefore less desirable, it is often the only available resource due to the inaccessibility of tablet collections. Recently, several digitizing projects have been proposed to provide accurate 2D+ and 3D models of these tablets for digital preservation. However, these methods require manual interaction or are not available to many research groups due to their cost. Furthermore, the digitizing device should be quickly deployable on-site, have an easy calibration procedure and should not have any moving parts, which could be problematic in difficult circumstances. We therefore present a new fully automated cuneiform tablet digitizing solution that is relatively inexpensive and easily field-deployable. The solution consists of a small, light-weight dome of light sources and a digital camera. 2D+ representations of the tablets are created by use of photometric stereo. The obtained information allows for photorealistic virtual re-lighting and non-photorealistic rendering of the tablets in real-time through the use of programmable graphics hardware.
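The 2D+ representation rests on classical Lambertian photometric stereo: given K images under known light directions, per-pixel normals and albedo follow from a least-squares solve. A minimal sketch of that classical method, not the authors' exact pipeline (attached shadows are ignored for brevity):

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover surface normals and albedo by photometric stereo.

    images: array (K, H, W) of intensities under K known lights.
    light_dirs: array (K, 3) of unit light direction vectors.
    Assumes a Lambertian surface, I = albedo * (n . l); shadowed
    pixels are not treated specially in this sketch.
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                           # (K, H*W)
    # per-pixel least-squares solve of L g = I for g = albedo * normal
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)
    return normals.reshape(3, H, W), albedo.reshape(H, W)
```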

42 citations


01 Jan 2005
TL;DR: This paper proposes a method for re-executing sparsely recorded visual paths by a series of visual homing operations, which yields a navigation method with unique properties: it is accurate, robust, fast, and without odometry error build-up.
Abstract: Omnidirectional vision sensors are very attractive for autonomous robots because they offer a rich source of environment information. The main challenge in using these for mobile robots is managing this wealth of information. A relatively recent approach is the use of fast wide baseline local features, which we developed and use in the novel sparse visual path following method described in this paper. These local features have the great advantage that they can be recognized even if the viewpoint differs significantly. This opens the door to a memory-efficient description of a path by sparsely sampling it with images. We propose a method for re-execution of these paths by a series of visual homing operations. Motion estimation is done by simultaneously tracking the set of features, with recovery of lost features by backprojecting them from a local sparse 3D feature map. This yields a navigation method with unique properties: it is accurate, robust, fast, and without odometry error build-up.
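As a simplified stand-in for visual homing between two omnidirectional views, the sketch below uses an average-landmark-vector style estimate over matched feature bearings. This is illustrative only: the paper's homing step tracks features and exploits a sparse 3D map, and the compass-alignment assumption here is mine:

```python
import numpy as np

def homing_direction(bearings_current, bearings_goal):
    """Average-landmark-vector style homing estimate.

    bearings_*: matched feature bearings (radians) in the current and
    goal omnidirectional images, assumed compass-aligned.
    Returns a 2D unit vector pointing roughly from the current position
    towards the goal. Simplified stand-in, not the paper's method.
    """
    def alv(bearings):
        # average of unit vectors towards the landmarks
        return np.stack([np.cos(bearings), np.sin(bearings)], axis=1).mean(axis=0)

    # ALV difference (Lambrinos et al.) approximates the home direction
    v = alv(bearings_current) - alv(bearings_goal)
    n = np.linalg.norm(v)
    return v / n if n > 1e-12 else v
```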

38 citations


Book ChapterDOI
16 Oct 2005
TL;DR: A new method for face modeling and face recognition from a pair of calibrated stereo cameras is presented, which builds a stereo reconstruction of the face by adjusting the global transformation parameters and the shape parameters of a 3D morphable face model.
Abstract: This paper presents a new method for face modeling and face recognition from a pair of calibrated stereo cameras. In a first step, the algorithm builds a stereo reconstruction of the face by adjusting the global transformation parameters and the shape parameters of a 3D morphable face model. The adjustment of the parameters is such that stereo correspondence between both images is established, i.e. such that the 3D vertices of the model project onto similarly colored pixels in both images. In a second step, the texture information is extracted from the image pair and represented in the texture space of the morphable face model. The resulting shape and texture coefficients form a person-specific feature vector, and face recognition is performed by comparing query vectors with stored vectors. To validate our algorithm, an extensive image database was built. It consists of stereo pairs of 70 subjects. For recognition testing, the subjects were recorded under 6 different head directions, ranging from a frontal to a profile view. The face recognition results are very good, with 100% recognition on frontal views and 97% recognition on half-profile views.
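The recognition step reduces to comparing person-specific coefficient vectors. A minimal sketch of nearest-neighbour matching; the cosine similarity is an assumption, the paper's actual comparison measure may differ:

```python
import numpy as np

def identify(query, gallery):
    """Nearest-neighbour identification on model coefficient vectors.

    query: 1D array of concatenated shape and texture coefficients.
    gallery: dict mapping person id -> stored coefficient vector.
    Sketch of the comparison step only; cosine similarity is assumed.
    """
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    return max(gallery, key=lambda pid: cosine(query, gallery[pid]))
```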

32 citations


Proceedings Article
01 Jan 2005
TL;DR: The prototype of an interactive museum guide that runs on a tablet PC featuring a touchscreen, a webcam and a Bluetooth receiver, and recognises objects on display in museums from images taken directly by the visitor.
Abstract: In this paper, we describe the prototype of an interactive museum guide. It runs on a tablet PC that features a touchscreen, a webcam and a Bluetooth receiver. This guide recognises objects on display in museums from images taken directly by the visitor. Furthermore, the computer can determine the visitor's location by receiving signals emitted from Bluetooth senders in the museum, so-called BTnodes. This information is used to reduce the search space for the extraction of relevant objects. Hence, the recognition accuracy is increased and the search time reduced. Moreover, this information can be used to indicate the user's current location in the museum. The prototype has been demonstrated to visitors of the Swiss National Museum in Zurich.

29 citations


01 Jan 2005
TL;DR: A key element of this approach is the viewpoint robustness of reflectance features; several such features are proposed and compared for 3D reconstruction.
Abstract: Textures often result from complicated surface geometries. We propose a method that extracts such geometry from raw BTF data. Exploiting the huge amount of input data, only a few and rather weak assumptions about reflectance and geometry suffice. A key element of our approach is the viewpoint robustness of reflectance features. We propose a few such features and compare them for 3D reconstruction.

23 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: A monocular object tracker, able to detect and track multiple objects in non-controlled environments, that adapts to changing lighting conditions, handles occlusions, and works in real time.
Abstract: We present a monocular object tracker, able to detect and track multiple objects in non-controlled environments. Bayesian per-pixel classification is used to build a tracking framework that segments an image into foreground and background objects, based on observations of object appearances and motions. Gaussian mixtures are used to build the color appearance models. The system adapts to changing lighting conditions, handles occlusions, and works in real time.
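A minimal sketch of the Bayesian per-pixel decision with Gaussian-mixture color models; the motion observations, model adaptation and multi-object bookkeeping from the paper are omitted, and the two-class setup with a flat prior is a simplification:

```python
import numpy as np

def posterior_foreground(pixel, fg_gmm, bg_gmm, prior_fg=0.5):
    """Bayesian per-pixel foreground probability from two color GMMs.

    pixel: RGB color as a length-3 numpy array.
    fg_gmm, bg_gmm: lists of (weight, mean, covariance) triples.
    Sketch of the classification rule only; priors are assumed flat.
    """
    def gmm_likelihood(x, gmm):
        total = 0.0
        for w, mu, cov in gmm:
            d = x - mu
            inv = np.linalg.inv(cov)
            norm = np.sqrt(((2 * np.pi) ** 3) * np.linalg.det(cov))  # 3D color
            total += w * np.exp(-0.5 * d @ inv @ d) / norm
        return total

    pf = gmm_likelihood(pixel, fg_gmm) * prior_fg
    pb = gmm_likelihood(pixel, bg_gmm) * (1.0 - prior_fg)
    return pf / (pf + pb)
```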

18 citations


Proceedings ArticleDOI
TL;DR: An algorithm for efficient image synthesis that generates realistic virtual views of a dynamic scene from a new camera viewpoint, aimed at video-conferencing applications and using a combined approach of CPU and GPU processing.
Abstract: This paper presents an algorithm for efficient image synthesis. The main goal is to generate realistic virtual views of a dynamic scene from a new camera viewpoint. The algorithm works online on two or more incoming video streams from calibrated cameras. A reasonably large distance between the cameras is allowed. The main focus is on video-conferencing applications. The background is assumed to be static, as is often the case in such environments. By performing a foreground segmentation, the foreground and the background can be handled separately. For the background a slower, more accurate algorithm can be used. Reaching a high throughput is most crucial for the foreground, as this is the dynamic part of the scene. We use a combined approach of CPU and GPU processing. Performing depth calculations on the GPU is very efficient, thanks to the capabilities of the latest graphics boards. However, the result tends to be rather noisy, so we apply a regularisation algorithm on the CPU to improve it. The final interpolation is again provided by rendering on the graphics board. The big advantage of using both CPU and GPU is that they can run completely in parallel, which is realised by an implementation using multiple threads. As such, different algorithms can be applied to two frames simultaneously and the total throughput is increased.

11 citations


Book ChapterDOI
21 Oct 2005
TL;DR: The paper describes how the user is observed, the 3D geometry involved, and the calibration steps necessary to set up the system; the simple calibration routines make the system adaptive and accessible to non-expert users.
Abstract: This paper describes the development of a real-time perceptive user interface. Two cameras are used to detect a user's head, eyes, hand, fingers and gestures. These cues are interpreted to control a user interface on a large screen. The result is a fully functional integrated system that processes roughly 7.5 frames per second on a Pentium IV system. The calibration of this setup is carried out through a few simple and intuitive routines, making the system adaptive and accessible to non-expert users. The minimal hardware requirements are two web-cams and a computer. The paper will describe how the user is observed (head, eye, hand and finger detection, gesture recognition), the 3D geometry involved, and the calibration steps necessary to set up the system.

01 Jan 2005
TL;DR: This work proposes a novel markerless solution to full body pose tracking by integrating multiple cues such as edges, color information and volumetric reconstruction by using Stochastic Meta Descent while taking advantage of the color information to overcome ambiguities caused by limbs touching each other.
Abstract: This work proposes a novel markerless solution to full body pose tracking by integrating multiple cues such as edges, color information and volumetric reconstruction. A model built from superellipsoids is fitted to a colored volumetric reconstruction using Stochastic Meta Descent (SMD), taking advantage of the color information to overcome ambiguities caused by limbs touching each other. As the volumetric reconstruction is inaccurate at times, the tracking is refined by matching model contours against image edges. For the matching, we introduce a way to efficiently render the contours of superellipsoids, including self-occlusions. The integration of edge information is demonstrated to increase the robustness and accuracy of the tracking. Several challenging body tracking sequences, showing complex movements and full articulation, illustrate this.
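The superellipsoid primitives are governed by the standard inside-outside function, which underlies both the fitting and the contour rendering. A minimal sketch in the primitive's local frame:

```python
import numpy as np

def superellipsoid_f(p, scale, e1, e2):
    """Inside-outside function of a superellipsoid in its local frame.

    p: point (x, y, z); scale: semi-axes (a1, a2, a3);
    e1, e2: shape exponents (e1 = e2 = 1 gives an ordinary ellipsoid).
    F < 1 inside, F = 1 on the surface, F > 1 outside.
    Standard formulation; the paper's contour renderer builds on it.
    """
    x, y, z = np.abs(np.asarray(p, dtype=float) / np.asarray(scale, dtype=float))
    return (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)
```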

Proceedings ArticleDOI
05 Jan 2005
TL;DR: This article shows how a learned statistical model can be used to predict unknown (occluded) features and how the technique can be applied to the analysis of human locomotion when body parts are occluded.
Abstract: A lot of computer vision applications have to deal with occlusions. In such settings only a subset of the features of interest can be observed, i.e. only incomplete or partial measurements are available. In this article we show how a learned statistical model can be used to predict the unknown (occluded) features. The probabilistic nature of the framework also allows the remaining uncertainty to be computed given an incomplete observation. The resulting posterior probability distribution can then be used for inference. Additional unknowns such as alignment or scale are easily incorporated into the framework. Instead of computing the alignment in a preprocessing step, it is left as an additional uncertainty, similar to the uncertainty introduced by the missing values of the measurement. It is shown how the technique can be applied to the analysis of human locomotion when body parts are occluded. Experiments show how the unobserved body locations are predicted and how it can be inferred whether the measurements come from a running or walking sequence.
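If the learned model is taken to be Gaussian, predicting occluded features together with their remaining uncertainty is textbook Gaussian conditioning. A minimal sketch under that assumption (the Gaussian form and variable names are mine; the paper additionally treats alignment and scale as extra unknowns):

```python
import numpy as np

def predict_missing(mu, cov, observed_idx, observed_vals):
    """Condition a Gaussian feature model on a partial observation.

    mu, cov: numpy mean vector and covariance of the learned model.
    observed_idx: integer indices of the visible features.
    Returns the conditional mean and covariance of the hidden features.
    Textbook Gaussian conditioning; a sketch, not the paper's full model.
    """
    n = len(mu)
    hidden_idx = np.setdiff1d(np.arange(n), observed_idx)
    S_oo = cov[np.ix_(observed_idx, observed_idx)]
    S_ho = cov[np.ix_(hidden_idx, observed_idx)]
    S_hh = cov[np.ix_(hidden_idx, hidden_idx)]
    gain = S_ho @ np.linalg.inv(S_oo)
    mu_h = mu[hidden_idx] + gain @ (observed_vals - mu[observed_idx])
    cov_h = S_hh - gain @ S_ho.T   # posterior uncertainty of hidden features
    return mu_h, cov_h
```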

Journal ArticleDOI
TL;DR: This special issue is organised into three sections; among the opening papers, two propose and experimentally validate different models of texture, while the third exploits shading and shadowing in surface reconstruction.
Abstract: The motivation for this special issue is twofold. Firstly we wished to provide an issue that combines papers on texture analysis with papers on texture synthesis. By analysis we mean the processing of images of texture to produce a set of measurements. These measurements could concern: statistics such as average surface roughness or directionality; the class of texture to which the surface belongs; or the mapping of the surface into different regions using segmentation algorithms. By synthesis we mean the generation of descriptions or images of surface texture for computer graphics applications. Typical examples concern the generation of large areas of surface texture that are perceptually identical to the small samples from which they have been generated. Secondly we wanted to provide a showcase for recent research on three-dimensional surface texture. Critically the images of such surfaces are not purely a function of the colour albedo pattern. Rather, spatial variation of relief and reflectance mean that shadows, highlights and occlusion all affect the resulting imagery. Thus images are a function of viewpoint and illumination condition as well as surface texture—and so texture synthesis and analysis must take these effects into account. These issues are timely also when considered in the wider context of image-based rendering, of which texture mapping in a way was a precursor. This active field is a good example of the confluence of computer vision and graphics. This issue is organised into three sections. The first set of papers does not specifically address either analysis or synthesis as defined above but may be used for either. The first two of these papers propose and experimentally validate two different models of texture, while the third paper exploits shading and shadowing in surface reconstruction. The next three papers are all concerned with the classification of three-dimensional surface textures while the last four address texture synthesis issues.

Proceedings ArticleDOI
TL;DR: Experiments demonstrate the feasibility of the proposed approach to 3D human body tracking and show that the high speeds that are required due to the closed feedback loop can be achieved.
Abstract: In this paper a new approach to 3D human body tracking is proposed. A sparse 3D reconstruction of the subject to be tracked is made using a structured light system consisting of a precalibrated LCD projector and a camera. At a number of points of interest, easily detectable features are projected. The resulting sparse 3D reconstruction is used to estimate the body pose of the tracked person. This new estimate of the body pose is then fed back to the structured light system and makes it possible to adapt the projected patterns, i.e. to decide where to project features. Given the observations, a physical simulation is used to estimate the body pose by attaching forces to the limbs of the body model. The sparse 3D observations are augmented by denser silhouette information, in order to make the tracking more robust. Experiments demonstrate the feasibility of the proposed approach and show that the high speeds required by the closed feedback loop can be achieved.

01 Jan 2005
TL;DR: A new system for recognition, tracking and pose estimation of people in video sequences, based on the wavelet transform of the upper body part and using Support Vector Machines (SVM) for classification.
Abstract: This paper presents a new system for recognition, tracking and pose estimation of people in video sequences. It is based on the wavelet transform of the upper body part and uses Support Vector Machines (SVM) for classification. Recognition is carried out hierarchically by first recognizing people and then individual characters. The characteristic features that best discriminate one person from another are learned automatically. Tracking is solved via a particle filter that utilizes the SVM output and a first-order kinematic model to obtain a robust scheme that successfully handles occlusion, different poses and camera zooms. For pose estimation a collection of SVM classifiers is evaluated to detect specific, learned poses.
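A condensation-style sketch of one particle-filter cycle with the classifier response used as the likelihood. The zero-velocity Gaussian diffusion here stands in for the paper's first-order kinematic model, and the mapping from SVM output to a likelihood is an assumption:

```python
import numpy as np

def particle_filter_step(particles, weights, svm_score, motion_noise=5.0):
    """One predict/update cycle of a particle-filter tracker.

    particles: (N, 2) image positions; weights: (N,) summing to 1.
    svm_score: callable mapping a position to a non-negative classifier
    confidence. Generic sketch; the paper's exact likelihood and
    first-order motion model are simplified here.
    """
    N = len(particles)
    # resample proportionally to the weights
    idx = np.random.choice(N, size=N, p=weights)
    particles = particles[idx]
    # predict: random-walk diffusion (stand-in for the kinematic model)
    particles = particles + np.random.normal(0.0, motion_noise, particles.shape)
    # update: re-weight with the classifier response as likelihood
    weights = np.array([max(svm_score(p), 1e-6) for p in particles])
    return particles, weights / weights.sum()
```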

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This work combines the requirements of low-latency, low-cost synchronisation within a multi-camera-projector system for rapid 3D scanning of human bodies in motion.
Abstract: Cluster-based architectures are very popular in the construction of versatile computer vision and graphics applications. Hereby, computers are connected over a network to perform collaborative processing. Systems which include cameras demand accurate synchronisation as well as low latencies for short-message data transfers. We present work which meets the requirements of low-latency, low-cost synchronisation within a multi-camera-projector system for rapid 3D scanning of human bodies in motion. Key to the setup is a flexible interface controller connected to each network computer. Data packages as well as trigger stimuli and graphics synchronisation signals are quickly distributed to the addressed client machines. A real-time software framework allows for the very low latencies that are needed for high-speed parallel processing.


01 Jan 2005
TL;DR: It is concluded that 2D abstract models and 3D views provide different, but complementary, benefits in the analysis of an archaeological excavation, and it is believed that linked 2D and 3D views should be available.
Abstract: This paper will present the first results of a new approach to recording and visualising archaeological excavations using integrated 3D and Harris Matrix data entry, query and visualisation tools. Accurate records of stratigraphic sequences in an archaeological excavation are crucial for post-excavation analysis. Traditional recording techniques capture 2D or 2.5D surface plans of stratigraphic units. Relationships between units are recorded and the sequence is visualised as a 2D abstract model, the Harris Matrix. Several software tools have been developed to assist in this task, replacing earlier time-consuming and error-prone paper-based methods. Recent progress in photogrammetry and other 3D recording techniques has also made it possible to visualise excavated layers in a 3D space. Computer technology has thus developed to incorporate photogrammetric models, enabling archaeologists to view and analyse excavations within the 3D world in which they work. A review of existing tools has shown that whilst each approach to visualising excavated layers has particular strengths, individually they do not provide the level of understanding that is required for a 'complete picture'. A computer generated Harris Matrix diagram is essential for understanding stratigraphic relationships, whilst a 3D model is extremely effective for the visual comparison of the form and structural relationships of these layers. We conclude that 2D abstract models and 3D views provide different, but complementary, benefits in the analysis of an archaeological excavation. For archaeologists to significantly benefit from both of these tools, we believe that linked 2D and 3D views should be available. This paper describes a first attempt to provide such linking. Two tools providing suitable visualisations are the jnet graph tool and the Stratigraphic Visualisation Tool (STRAT). Jnet is a 2D Harris Matrix tool that allows the user to analyse stratigraphic relationships between layers and manipulate data. The STRAT tool is a 3D world in which archaeologists can navigate and explore in detail the layers of an excavation. The integration tool uses XML to communicate between jnet and STRAT, providing a standard description method to facilitate the data exchange. XML is the native data format for jnet, so it provides seamless software mapping between the two tools. Import and export software incorporated within both STRAT and jnet transforms and stores this data in a structure suitable for exchange between the 2D (jnet) and 3D (STRAT) applications. The result of this software solution is a flexible composite software tool allowing two different views of a site, which archaeologists can use to model, view and analyse their excavations more effectively. A test excavation was carried out in Sagalassos (Turkey) in the summer of 2004. After documenting and registering the stratigraphic data on site, it was entered into the new tool. Sections of a Harris Matrix, such as a particular trench, can be viewed to establish relationships between strata. Navigation in 3D within a trench permits viewing from all angles and replaying through the stratigraphic sequence. The results presented in this paper show the high potential of this approach for future archaeological research.


Book ChapterDOI
21 Oct 2005
TL;DR: This paper presents a new interactive teleconferencing system which adds a 'virtual' camera to the scene that can move freely in between multiple real cameras, producing a clearer and more engaging view for the remote audience without the need for a human editor.
Abstract: This paper presents a new interactive teleconferencing system. It adds a 'virtual' camera to the scene which can move freely in between multiple real cameras. The viewpoint can be selected automatically using basic cinematographic rules, based on the position and the actions of the instructor. This produces a clearer and more engaging view for the remote audience, without the need for a human editor. For the creation of the novel views generated by such a 'virtual' camera, segmentation and depth calculations are required. The system is semi-automatic, in that the user is asked to indicate a few corresponding points or edges for generating an initial rough background model. In addition to the static background and moving foreground, multiple independently moving objects are also catered for. The initial foreground contour is tracked over time, using a new active contour. If a second object appears, the contour prediction allows this situation to be recognised and appropriate measures to be taken. The 3D models are continuously validated based on a Birchfield dissimilarity measure. The foreground model is updated every frame; the background is refined if necessary. The current implementation reaches approximately 4 fps on a single desktop.
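The Birchfield-Tomasi sampling-insensitive dissimilarity mentioned for model validation can be written compactly for a single pixel pair. The sketch below follows the original definition on 1D intensity rows; how the paper aggregates it over the model is not shown here:

```python
def bt_dissimilarity(left, right, x, y):
    """Birchfield-Tomasi sampling-insensitive pixel dissimilarity.

    left, right: 1D intensity rows; x, y: candidate matching positions
    with 1 <= x <= len(left) - 2 and 1 <= y <= len(right) - 2.
    Follows the original definition; aggregation over a model or
    window, as the paper presumably does, is out of scope here.
    """
    def one_sided(a, i, b, j):
        # compare a[i] against the half-pixel interpolated range around b[j]
        lo = min(b[j], 0.5 * (b[j] + b[j - 1]), 0.5 * (b[j] + b[j + 1]))
        hi = max(b[j], 0.5 * (b[j] + b[j - 1]), 0.5 * (b[j] + b[j + 1]))
        return max(0.0, a[i] - hi, lo - a[i])

    return min(one_sided(left, x, right, y), one_sided(right, y, left, x))
```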

01 Jan 2005
TL;DR: An Imaging and Localization Package (ILP) is described which is capable of performing the computer vision processing described above and makes use of these algorithms to compute both the calibration of the camera and the 3D reconstruction of the terrain.
Abstract: Aerial platforms can become an integral part of surface exploration missions on planets or moons with an atmosphere, like Venus, Mars or Titan. One of the most immediate applications for aerobots is ultra-high resolution imaging over extensive areas of the planet. Planetary Aerobot missions could prove very useful in the automatic detection of geological features and the selection of possible landing sites for a planetary exploration mission. The Aerobot system can travel across many areas on the planet and send the appropriate data back to Earth. Unfortunately, the bandwidth available for data transmission from the planet to Earth is very limited. If the Aerobot system were to transmit all collected imagery of the planet, these bandwidth restrictions would require heavy compression of the images. This compression would hamper the ability of scientists on Earth to determine the interesting areas, and would disadvantage the computer vision algorithms used to reconstruct the terrain in 3D as well. It is therefore imperative that the Aerobot system has some degree of autonomy and can perform computer vision operations on its own, making it possible to detect interesting geological features on the spacecraft. During the execution of its mission, the Aerobot system has access to the uncompressed imagery taken by its camera(s). It is therefore recommended to perform all critical computer vision processing on the system itself, before the images are polluted by compression. The generation of Digital Elevation Maps is clearly a computer vision process that suffers from compression. Generating these maps on the Aerobot and sending them, together with the compressed reconstructed parts of the images, will lead to a much better understanding of the observed areas for the same amount of expended bandwidth. In this paper, an Imaging and Localization Package (ILP) is described which is capable of performing the computer vision processing described above. All data collected by the Aerobot needs to be correlated with the position in which the measurement was acquired. On long duration missions, the Aerobot cannot rely on localization performed by an orbiter or from the ground; it must have its own means. During the last decade the computer vision community has made tremendous progress in acquiring 3D information from images taken by uncalibrated cameras, while at the same time self-calibrating the camera. The ILP makes use of these algorithms to compute both the calibration of the camera and the 3D reconstruction of the terrain. The specifics of an Aerobot mission, like almost linear motion of the camera, almost planar terrains, etc., require changes and new techniques in the reconstruction pipeline. Once calibration and reconstruction have been computed, scoring techniques can be applied to the data, which now comprises not only the images but 3D information as well. The scoring algorithms should be written such that high scores for an image correspond to a large chance of interesting geological features being found in that specific image. These scores can

Journal Article
TL;DR: In this article, the authors propose a novel affine invariant region type, built up from a combination of fitted superellipses, which offers a much wider range of shapes through the addition of a very limited number of shape parameters, with the traditional ellipses and parallelograms as subsets.
Abstract: Affine invariant regions have proved a powerful feature for object recognition and categorization. These features heavily rely on object textures rather than shapes, however. Typically, their shapes have been fixed to ellipses or parallelograms. The paper proposes a novel affine invariant region type that is built up from a combination of fitted superellipses. These novel features have the advantage of offering a much wider range of shapes through the addition of a very limited number of shape parameters, with the traditional ellipses and parallelograms as subsets. The paper offers a solution for the robust fitting of superellipses to partial contours, which is a crucial step towards the implementation of the novel features.
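A bare-bones sketch of fitting a centred, axis-aligned superellipse to contour samples by least squares on its implicit equation (using scipy). The paper's actual contribution, robust fitting to partial contours with full pose parameters, is considerably more involved than this:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_superellipse(points, x0):
    """Fit an axis-aligned, centred superellipse to 2D contour points.

    points: (N, 2) contour samples; x0: initial guess (a, b, epsilon).
    Implicit form: |x/a|^(2/eps) + |y/b|^(2/eps) = 1
    (eps = 1 gives an ordinary ellipse). Bare-bones sketch only:
    no pose parameters, no robustness to partial contours.
    """
    def residuals(params):
        a, b, eps = np.abs(params) + 1e-9   # keep parameters positive
        x, y = points[:, 0], points[:, 1]
        # algebraic residual of the implicit superellipse equation
        return np.abs(x / a) ** (2 / eps) + np.abs(y / b) ** (2 / eps) - 1.0

    return least_squares(residuals, x0).x

# toy usage: recover an ellipse (a=3, b=2, eps=1) from dense samples
t = np.linspace(0, 2 * np.pi, 100)
pts = np.stack([3 * np.cos(t), 2 * np.sin(t)], axis=1)
print(fit_superellipse(pts, x0=[1.0, 1.0, 1.0]))
```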