
Showing papers by "Luc Van Gool", published in 2006


Book ChapterDOI
07 May 2006
TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.
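SURF's speed claim rests on integral images: once computed, the sum of any rectangular region, and hence any box-filter response, costs at most four array lookups regardless of filter size. A minimal sketch of that building block (illustrative, not the authors' implementation):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of all pixels above and to the left of (y, x), inclusive
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    # sum of img[y0:y1+1, x0:x1+1] from at most four lookups,
    # independent of the box size
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

img = np.random.rand(480, 640)
ii = integral_image(img)
assert np.isclose(box_sum(ii, 10, 20, 40, 60), img[10:41, 20:61].sum())
```

This constant-time box sum is what lets the detector approximate the Gaussian second derivatives in the Hessian with box filters at any scale, without building an image pyramid.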

13,011 citations


Journal ArticleDOI
01 Jul 2006
TL;DR: CGA shape is shown to efficiently generate massive urban models with an unprecedented level of detail, with the virtual rebuilding of the archaeological site of Pompeii as a case in point.
Abstract: CGA shape, a novel shape grammar for the procedural modeling of CG architecture, produces building shells with high visual quality and geometric detail. It produces extensive architectural models for computer games and movies at low cost. Context-sensitive shape rules allow the user to specify interactions between the entities of the hierarchical shape descriptions. Selected examples demonstrate solutions to previously unsolved modeling problems, especially to consistent mass modeling with volumetric shapes of arbitrary orientation. CGA shape is shown to efficiently generate massive urban models with an unprecedented level of detail, with the virtual rebuilding of the archaeological site of Pompeii as a case in point.
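The core mechanism of such a grammar is the recursive splitting of shapes into labelled sub-shapes until only terminals remain. A toy sketch of that derivation process (the rule names and the simple (label, width, height) shape representation are illustrative, not actual CGA shape notation):

```python
# Toy shape-grammar derivation: a facade is recursively split into floors
# and window tiles until only terminal shapes remain.

def split_vertical(shape, n, label):
    # split a shape into n equal horizontal slices (floors)
    name, w, h = shape
    return [(label, w, h / n) for _ in range(n)]

def split_horizontal(shape, n, label):
    name, w, h = shape
    return [(label, w / n, h) for _ in range(n)]

RULES = {
    "facade": lambda s: split_vertical(s, 4, "floor"),
    "floor":  lambda s: split_horizontal(s, 6, "tile"),
    "tile":   lambda s: [("window", s[1] * 0.6, s[2] * 0.5)],
}

def derive(shape):
    # apply rules until only terminal shapes remain
    label = shape[0]
    if label not in RULES:
        return [shape]
    result = []
    for child in RULES[label](shape):
        result.extend(derive(child))
    return result

terminals = derive(("facade", 20.0, 12.0))
print(len(terminals), terminals[0])   # 24 windows, each ~2.0 x 1.5
```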

1,073 citations


Book ChapterDOI
07 May 2006
TL;DR: An extensive experimental evaluation on detecting five diverse object classes over hundreds of images demonstrates that the proposed method works in very cluttered images, allows for scale changes and considerable intra-class shape variation, is robust to interrupted contours, and is computationally efficient.
Abstract: We propose a method for object detection in cluttered real images, given a single hand-drawn example as model. The image edges are partitioned into contour segments and organized in an image representation which encodes their interconnections: the Contour Segment Network. The object detection problem is formulated as finding paths through the network resembling the model outlines, and a computationally efficient detection technique is presented. An extensive experimental evaluation on detecting five diverse object classes over hundreds of images demonstrates that our method works in very cluttered images, allows for scale changes and considerable intra-class shape variation, is robust to interrupted contours, and is computationally efficient.
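The formulation of detection as path search can be pictured with a toy network: contour segments are nodes, their interconnections are edges, and candidate detections are paths whose shape resembles the model chain. A hedged sketch (the orientation-based cost is a stand-in for the paper's matching cost):

```python
# Toy contour segment network: nodes are contour segments (here reduced to
# an orientation each), edges link segments whose endpoints connect in the
# image. Detection = enumerating paths and scoring them against the model.

def paths(adjacency, length):
    # all simple paths visiting `length` nodes
    result = [[n] for n in adjacency]
    for _ in range(length - 1):
        result = [p + [n] for p in result
                  for n in adjacency[p[-1]] if n not in p]
    return result

def cost(path, segments, model):
    # mean orientation difference to the model chain (toy matching cost)
    return sum(abs(segments[n] - m) for n, m in zip(path, model)) / len(model)

segments = {0: 0.0, 1: 1.5, 2: 0.1, 3: 1.6}       # id -> orientation (rad)
adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}     # endpoint connectivity
model = [0.0, 1.5]                                 # hand-drawn model chain

detections = [p for p in paths(adjacency, len(model))
              if cost(p, segments, model) < 0.2]
print(detections)                                  # [[0, 1], [2, 3]]
```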

317 citations



Journal ArticleDOI
30 Oct 2006
TL;DR: A web-based 3D reconstruction service, developed to meet the needs of the cultural heritage field, consisting of a pipeline that starts with the user uploading images of the object or scene they want to reconstruct in 3D.
Abstract: The use of 3D information in the field of cultural heritage is increasing year by year. From this field comes a large demand for cheaper and more flexible ways of 3D reconstruction. This paper describes a web-based 3D reconstruction service, developed to meet those needs of the cultural heritage field. This service consists of a pipeline that starts with the user uploading images of the object or scene they want to reconstruct in 3D. The automatic reconstruction process, running on a server connected to a cluster of computers, computes the camera calibration, as well as dense depth (or range) maps for the images. This result can be downloaded from an ftp server and visualized with a specific tool running on the user’s PC.

254 citations


Proceedings Article
01 Jan 2006
TL;DR: In this paper, a novel object recognition approach based on affine invariant regions is presented, which actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions.
Abstract: We present a novel object recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting these for integrating the contributions of the views at recognition time. This is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the power of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor.

190 citations


Journal ArticleDOI
TL;DR: In this paper, a novel object recognition approach based on affine invariant regions is presented, which actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions.
Abstract: We present a novel object recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting these for integrating the contributions of the views at recognition time. This is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the power of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor.

186 citations


Book ChapterDOI
TL;DR: The approach covers the object with matches while simultaneously separating the correct matches from the wrong ones, produces approximate contours of the object, and can extend any viewpoint-invariant feature extractor.
Abstract: Methods based on local, viewpoint-invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, due to the limited repeatability and discriminative power of the features. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present an object recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Only very few correct initial matches suffice for reliable recognition. Experimental results on still images and television news broadcasts demonstrate the power of the presented method in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Moreover, non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. The approach can extend any viewpoint-invariant feature extractor.
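The grow-and-filter idea at the heart of this approach can be pictured as region growing over correspondences: accept a new match only if it is close to, and geometrically consistent with, an already accepted one. A toy illustration; the simple translation-consistency test below stands in for the paper's affine region propagation:

```python
# Sketch of grow-and-filter matching: start from seed correspondences and
# repeatedly accept candidates that are spatially near an accepted match
# AND imply a similar local displacement. Names and the consistency test
# are illustrative, not the paper's actual propagation mechanism.
import numpy as np

def expand_matches(seeds, candidates, radius=50.0, tol=15.0):
    accepted = list(seeds)
    changed = True
    while changed:
        changed = False
        for cand in candidates:
            if any(np.allclose(cand, a) for a in accepted):
                continue
            (mx, my), (sx, sy) = cand              # model point, scene point
            for (amx, amy), (asx, asy) in accepted:
                near = np.hypot(mx - amx, my - amy) < radius
                # consistent: both matches imply a similar local translation
                consistent = np.hypot((sx - mx) - (asx - amx),
                                      (sy - my) - (asy - amy)) < tol
                if near and consistent:
                    accepted.append(cand)
                    changed = True
                    break
    return accepted

seeds = [((10, 10), (110, 12))]
cands = [((30, 12), (131, 15)), ((400, 300), (90, 80))]
print(expand_matches(seeds, cands))   # second candidate is rejected
```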

159 citations


Journal ArticleDOI
TL;DR: It is shown that a combination of multiple image cues helps the tracker to overcome ambiguous situations such as limbs touching or strong occlusions of body parts, and that stochastic sampling makes SMD (stochastic meta-descent) robust against local minima and lowers the computational cost, as a small set of predicted image features is sufficient for optimization.

130 citations


01 Jan 2006
TL;DR: It is demonstrated that both the object recognition performance and the speed of the SURF algorithm surpass the results obtained with SIFT, its main contender.
Abstract: In this paper, we describe the application of the novel SURF (Speeded Up Robust Features) algorithm [1] for the recognition of objects of art. For this purpose, we developed a prototype of a mobile interactive museum guide consisting of a tablet PC that features a touchscreen and a webcam. This guide recognises objects in museums based on images taken by the visitor. Using different image sets of real museum objects, we demonstrate that both the object recognition performance and the speed of the SURF algorithm surpass the results obtained with SIFT, its main contender.
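A matching pipeline in this spirit can be reproduced with OpenCV, assuming an opencv-contrib build with the nonfree module enabled; file names and thresholds below are placeholders:

```python
# Minimal image-matching sketch in the spirit of the museum guide.
# SURF requires an opencv-contrib build with the nonfree module enabled.
import cv2

query = cv2.imread("visitor_photo.jpg", cv2.IMREAD_GRAYSCALE)
model = cv2.imread("museum_object.jpg", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kq, dq = surf.detectAndCompute(query, None)
km, dm = surf.detectAndCompute(model, None)

# Lowe-style ratio test on the two nearest neighbours
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(dq, dm, k=2)
        if m.distance < 0.7 * n.distance]

# decide recognition by the number of surviving matches (ad hoc threshold)
print("match" if len(good) > 20 else "no match", len(good))
```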

123 citations


Journal Article
TL;DR: In this article, the image edges are partitioned into contour segments and organized in an image representation which encodes their interconnections, and the object detection problem is formulated as finding paths through the network resembling the model outlines, and a computationally efficient detection technique is presented.
Abstract: We propose a method for object detection in cluttered real images, given a single hand-drawn example as model. The image edges are partitioned into contour segments and organized in an image representation which encodes their interconnections: the Contour Segment Network. The object detection problem is formulated as finding paths through the network resembling the model outlines, and a computationally efficient detection technique is presented. An extensive experimental evaluation on detecting five diverse object classes over hundreds of images demonstrates that our method works in very cluttered images, allows for scale changes and considerable intra-class shape variation, is robust to interrupted contours, and is computationally efficient.

Book ChapterDOI
01 Oct 2006
TL;DR: This paper presents a core image processing element of a fully automatic laser photocoagulation system, namely a novel approach for retina mosaicing, which relies on recent developments in region detection and feature description to automatically fuse retina images.
Abstract: Laser photocoagulation is a proven procedure to treat various pathologies of the retina. Challenges such as motion compensation, correct energy dosage, and avoiding incidental damage are responsible for the still low success rate. They can be overcome with improved instrumentation, such as a fully automatic laser photocoagulation system. In this paper, we present a core image processing element of such a system, namely a novel approach for retina mosaicing. Our method relies on recent developments in region detection and feature description to automatically fuse retina images. In contrast to the state of the art, the proposed approach works even for retina images with no discernible vascularity. Moreover, an efficient scheme to determine the blending masks of arbitrarily overlapping images for multi-band blending is presented.
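The geometric half of such a mosaicing step, in generic form (not the paper's specific detector, descriptor, or blending scheme), is feature matching followed by RANSAC homography estimation and warping:

```python
# Generic feature-based mosaicing step: match local features between two
# retina images, estimate a homography with RANSAC, and warp one image
# into the other's frame. File names are placeholders.
import cv2
import numpy as np

a = cv2.imread("retina_a.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("retina_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
ka, da = sift.detectAndCompute(a, None)
kb, db = sift.detectAndCompute(b, None)

matches = [m for m, n in cv2.BFMatcher().knnMatch(da, db, k=2)
           if m.distance < 0.75 * n.distance]

src = np.float32([ka[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kb[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# bring image a into b's coordinate frame; blending would follow here
warped = cv2.warpPerspective(a, H, (b.shape[1], b.shape[0]))
```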

Proceedings ArticleDOI
30 Oct 2006
TL;DR: This paper examines how architectural shape grammars can be used to procedurally generate 3D reconstructions of an archaeological site, using the Puuc-style buildings found in Xkipche, Mexico, as a test-case.
Abstract: This paper examines how architectural shape grammars can be used to procedurally generate 3D reconstructions of an archaeological site. The Puuc-style buildings found in Xkipche, Mexico, were used as a test case. First, we introduce the ancient Mayan site of Xkipche and give an overview of the building types as distinguished by the archaeologists, based on excavations and surveys of the building remains at the surface. Second, we outline the elements of the building design that are characteristic of Puuc architecture. For the creation of the actual building geometries, we then determine the shape grammar rules for the different architectural parts. The modeling system can be used to reconstruct the whole site based on various GIS (Geographical Information Systems) data given as input, such as building footprints, architectural information, and elevation. The results demonstrate that our modeling system is, in contrast to traditional 3D modeling, able to efficiently construct a large number of high-quality geometric models at low cost.

Book
01 Jan 2006
TL;DR: Thematic projects integrating different approaches for 3D modeling, spatial information systems, reconstruction and visualization, and animation.
Abstract: Preface; Information management and networks; Multidisciplinary projects integrating different approaches; Remote sensing; Laser scanning; New and integrated technologies for 3D modeling; Data acquisition for documentation; Mobile mapping and site modeling; Spatial information systems; Theory and methods of 3D modeling; Archaeological landscapes; Reconstruction and visualization; Visualization and animation; Virtual and augmented reality; Short contributions; Color plates; List of participants; Author index.

Book ChapterDOI
13 Jul 2006
TL;DR: In this article, a method for mining frequently occurring objects and scenes from videos is presented, which is based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains, but have not been applied to video mining before.
Abstract: We present a method for mining frequently occurring objects and scenes from videos. Object candidates are detected by finding recurring spatial arrangements of affine covariant regions. Our mining method is based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains, but have not been applied to video mining before. In this work we show how to express vector-quantized features and their spatial relations as itemsets. Furthermore, a fast motion segmentation method is introduced as an attention filter for the mining algorithm. Results are shown on real world data consisting of music video clips.
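The key encoding step, expressing vector-quantized features and their spatial relations as itemsets, can be sketched as follows; the naive pair counting stands in for a real frequent-itemset miner such as Apriori or FP-growth:

```python
# Sketch of the encoding: each detected region becomes a transaction holding
# its own visual-word id plus the ids of words in its spatial neighbourhood.
# Frequent itemsets over these transactions then correspond to recurring
# spatial arrangements. Values below are illustrative.
from itertools import combinations
from collections import Counter

# (visual_word, x, y) per region in one frame
regions = [(3, 10, 10), (7, 15, 12), (3, 200, 40), (7, 204, 44), (9, 90, 90)]

def transactions(regions, radius=20):
    for w, x, y in regions:
        items = {w2 for w2, x2, y2 in regions
                 if (x - x2) ** 2 + (y - y2) ** 2 <= radius ** 2}
        yield frozenset(items)

def frequent_pairs(trans, min_support=2):
    # count co-occurring word pairs; a stand-in for Apriori/FP-growth
    counts = Counter(pair for t in trans
                     for pair in combinations(sorted(t), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}

print(frequent_pairs(list(transactions(regions))))   # {(3, 7): 4}
```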

Book ChapterDOI
Beat Fasel, Luc Van Gool
27 Jul 2006
TL;DR: The proposed interactive museum guide achieves object recognition via image matching, which allows the use of model sets that do not need to be segmented; a postprocessing strategy improves object recognition rates by suppressing multiple matches.
Abstract: In this paper we describe an interactive guide that is able to automatically retrieve information about objects on display in museums. A visitor can point this mobile device at exhibits and automatically retrieve descriptions about objects of interest in a non-distractive way. We investigate Gaussian image intensity attenuation and a foveation-based preprocessing approach, both of which focus interest point extraction towards the center of an image. Furthermore, we describe a postprocessing strategy that improves object recognition rates by suppressing multiple matches. The proposed interactive museum guide achieves object recognition via image matching and thus allows the use of model sets that do not need to be segmented.
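Gaussian image intensity attenuation can be sketched in a few lines: the image is multiplied by a Gaussian window centred on the image so that subsequent interest point extraction favours the centre, where the photographed exhibit usually is. The sigma choice below is illustrative:

```python
import numpy as np

def attenuate(img, sigma_frac=0.35):
    # multiply a grayscale image by a centred Gaussian window; pixels far
    # from the centre are darkened, so fewer interest points fire there
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma2 = (sigma_frac * min(h, w)) ** 2
    weight = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma2))
    return (img.astype(np.float64) * weight).astype(img.dtype)
```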

Proceedings ArticleDOI
14 Jun 2006
TL;DR: This paper addresses the problem of camera self-calibration, bundle adjustment and 3D reconstruction from line segments in two images of poorly-textured indoor scenes with a new method to identify polyhedral junctions resulting from the intersections of the line segments.
Abstract: This paper addresses the problem of camera self-calibration, bundle adjustment and 3D reconstruction from line segments in two images of poorly-textured indoor scenes. First, we generate line segment correspondences, using an extended version of our previously proposed matching scheme. The first main contribution is a new method to identify polyhedral junctions resulting from the intersections of the line segments. At the same time, the images are segmented into planar polygons. This is done using an algorithm based on a Binary Space Partitioning (BSP) tree. The junctions are matched endpoints of the detected line segments and hence can be used to obtain the epipolar geometry. The essential matrix is considered for metric camera calibration. For better stability, the second main contribution consists in a bundle adjustment on the line segments and the camera parameters that reduces the number of unknowns by a maximum flow algorithm. Finally, a piecewise-planar 3D reconstruction is computed based on the segmentation of the BSP tree. The system's performance is tested on some challenging examples.
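The epipolar-geometry step, in generic form rather than the paper's full pipeline, looks like this: matched junction points in two views give the fundamental matrix via RANSAC, and with assumed intrinsics K the essential matrix follows as E = KᵀFK:

```python
import cv2
import numpy as np

# synthetic 3D points and two views, purely to make the sketch runnable;
# in the paper these correspondences come from matched junctions
X = (np.random.rand(50, 3) * 4 + [0, 0, 6]).astype(np.float32)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])

rvec1, tvec1 = np.zeros(3), np.zeros(3)           # first camera at origin
rvec2, tvec2 = np.float32([0, 0.1, 0]), np.float32([0.5, 0, 0])

pts1 = cv2.projectPoints(X, rvec1, tvec1, K, None)[0].reshape(-1, 2)
pts2 = cv2.projectPoints(X, rvec2, tvec2, K, None)[0].reshape(-1, 2)

# fundamental matrix from matched points, robust to outliers
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)

# with known intrinsics, the essential matrix used for metric calibration
E = K.T @ F @ K
```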

Journal ArticleDOI
30 Oct 2006
TL;DR: In this first special issue of Machine Vision and Applications devoted to Cultural Heritage (CH), the reader finds papers describing several state-of-the-art 3D acquisition techniques, designed or at least selected for CH applications.
Abstract: Computer Graphics has been used to aid in the study and presentation of Cultural Heritage for a long time now. Increasingly, Computer Vision is also starting to play its role, often in combination with graphics. This has led the International Association for Pattern Recognition (IAPR) to create its Technical Committee 19 ‘Computer Vision for Cultural Heritage Applications’ (http://iapr-tc19.prip.tuwien.ac.at/), co-chaired by the editors of this special issue. At about the same time, the creation of the European Network of Excellence EPOCH (Excellence in the Processing of Open Cultural Heritage – http://www.epoch-net.org) gave this specific domain of research an additional impetus in Europe, but also beyond. It therefore seemed timely to devote one or more special issues of Machine Vision and Applications to Cultural Heritage (CH). In this first issue, the reader finds papers describing several state-of-the-art 3D acquisition techniques, designed or at least selected for CH applications.

Proceedings Article
01 Jan 2006
Abstract: We propose a method for computing the absolute distances to static obstacles using a single omnidirectional camera. The method is applied to mobile robots. We achieve this without restricting the application to predetermined translations or the use of artificial markers. In contrast to prior work, our method is able to build absolute-scale 3D without the need for a known baseline length, traditionally acquired by odometry. Instead, we use the ground plane assumption together with the camera system's height to determine the scale factor. Using only one omnidirectional camera, our method is cheaper, more informative and more compact than traditional methods for distance determination, especially when a robot is already equipped with a camera, e.g. for navigation. It also provides more information since it determines distances in 3D space instead of in one plane. The experiments show promising results: the algorithm is indeed capable of determining the distances in meters to features and obstacles, and is able to locate all major obstacles in the scene.
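The ground-plane trick reduces to one line of trigonometry: if a feature is assumed to lie on the floor and the camera height h is known, the depression angle of its viewing ray fixes the metric distance. A worked example with illustrative values:

```python
# Ground-plane assumption as a worked example: a feature assumed to lie on
# the floor is seen at depression angle theta below the horizon; with the
# camera mounted at height h, the horizontal distance follows directly.
import math

h = 0.55                       # camera height above the floor, in metres
theta = math.radians(12.0)     # depression angle of the feature's viewing ray

distance = h / math.tan(theta)
print(f"{distance:.2f} m")     # ~2.59 m
```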

Book ChapterDOI
12 Sep 2006
TL;DR: A practical system for vision-based traffic scene analysis from a moving vehicle based on a cognitive feedback loop which integrates real-time geometry estimation with appearance-based object detection and enables the construction of novel capabilities such as the accurate 3D estimation of object locations and orientations and their temporal integration in a world coordinate frame.
Abstract: This paper presents a practical system for vision-based traffic scene analysis from a moving vehicle, based on a cognitive feedback loop which integrates real-time geometry estimation with appearance-based object detection. We demonstrate how those two components can benefit from each other's continuous input and how the transferred knowledge can be used to improve scene analysis. Thus, scene interpretation is not left as a matter of logical reasoning, but is instead addressed by the repeated interaction and consistency checks between different levels and modes of visual processing. As our results show, the proposed tight integration significantly increases recognition performance, as well as overall system robustness. In addition, it enables the construction of novel capabilities such as the accurate 3D estimation of object locations and orientations and their temporal integration in a world coordinate frame. The system is evaluated on a challenging real-world car detection task in an urban scenario.

Proceedings ArticleDOI
30 Oct 2006
TL;DR: A system prototype for self-determination and privacy enhancement in video surveilled areas by integrating computer vision and cryptographic techniques into networked building automation systems is presented.
Abstract: We present a system prototype for self-determination and privacy enhancement in video surveilled areas by integrating computer vision and cryptographic techniques into networked building automation systems. This paper describes research work that has been done within the first half of the collaborative blue-c-II project and is conducted by an interdisciplinary team of researchers. Persons in a video stream control their visibility on a per-viewer base and can choose to allow either the real view or an obscured image to be seen. The parts of the video stream that show a person are protected by an AES cipher and can be sent over untrusted networks. Experimental results are presented by the example of a meeting room scenario. The paper concludes with remarks on the usability and encountered problems.
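The per-region protection idea can be sketched with pycryptodome: the pixels inside a person's bounding box are encrypted with AES so that only key holders can restore the real view. Key handling and the fixed box below are illustrative, not the project's actual protocol:

```python
# Sketch of per-region video protection: the bytes inside a person's
# bounding box are AES-encrypted (CTR mode here); everyone else sees the
# scrambled block. Uses pycryptodome; the box and key are illustrative.
import numpy as np
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
y0, y1, x0, x1 = 100, 300, 200, 360          # person bounding box

key = get_random_bytes(16)
cipher = AES.new(key, AES.MODE_CTR)
region = frame[y0:y1, x0:x1].tobytes()
frame[y0:y1, x0:x1] = np.frombuffer(
    cipher.encrypt(region), dtype=np.uint8).reshape(y1 - y0, x1 - x0, 3)

# an authorised viewer decrypts with the same key and the transmitted nonce
plain = AES.new(key, AES.MODE_CTR, nonce=cipher.nonce).decrypt(
    frame[y0:y1, x0:x1].tobytes())
```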

Book ChapterDOI
11 Jul 2006
TL;DR: For monocular human body tracking, the joint probability distribution of appearance and body pose is learned using a mixture of view-dependent models, which captures multimodal and nonlinear relationships reliably.
Abstract: This paper considers the problem of monocular human body tracking using learned models. We propose to learn the joint probability distribution of appearance and body pose using a mixture of view-dependent models. In such a way the multimodal and nonlinear relationships can be captured reliably. We formulate inference algorithms that are based on generative models while exploiting the advantages of a learned model when compared to the traditionally used geometric body models. Given static images or sequences, body poses and bounding box locations are inferred using silhouette-based image descriptors. Prior information about likely body poses and a motion model are taken into account. We consider analytical computations and Monte-Carlo techniques, as well as a combination of both. In a Rao-Blackwellised particle filter, the tracking problem is partitioned into a part that is solved analytically, and a part that is solved with particle filtering. Tracking results are reported for human locomotion.

01 Jan 2006
TL;DR: A new vision-based method for robot localization using an omnidirectional camera that achieves accurate metric localization based on a minimal reference image set, using 1D three-view geometry.
Abstract: In this paper we propose a new vision-based method for robot localization using an omnidirectional camera. The method has three steps efficiently combined to deal with large reference image sets; each step evaluates fewer images than the previous one but is more complex and accurate. Given the current uncalibrated image seen by the robot, the hierarchical algorithm can provide both appearance-based (topological) and metric localization. Compared to other similar vision-based localization methods, the one proposed here has the advantage that it achieves accurate metric localization based on a minimal reference image set, using 1D three-view geometry. Moreover, thanks to the linear wide-baseline features used, the method is insensitive to illumination changes and occlusions, while keeping the computational load small. The simplicity of the radial line feature used speeds up the process while keeping acceptable accuracy. We show experiments with two omnidirectional image datasets to evaluate the performance of the method.

Journal Article
TL;DR: This paper proposes to learn the joint probability distribution of appearance and body pose using a mixture of view-dependent models, capturing the multimodal and nonlinear relationships in monocular human body tracking.
Abstract: This paper considers the problem of monocular human body tracking using learned models. We propose to learn the joint probability distribution of appearance and body pose using a mixture of view-dependent models. In such a way the multimodal and nonlinear relationships can be captured reliably. We formulate inference algorithms that are based on generative models while exploiting the advantages of a learned model when compared to the traditionally used geometric body models. Given static images or sequences, body poses and bounding box locations are inferred using silhouette-based image descriptors. Prior information about likely body poses and a motion model are taken into account. We consider analytical computations and Monte-Carlo techniques, as well as a combination of both. In a Rao-Blackwellised particle filter, the tracking problem is partitioned into a part that is solved analytically, and a part that is solved with particle filtering. Tracking results are reported for human locomotion.


Journal ArticleDOI
TL;DR: An on-line, interactive system for enhanced tele-teaching starts from a network of fixed cameras placed around an instructor, and allows the system to place the instructor in a ‘virtual or mixed environment’.
Abstract: Our on-line, interactive system for enhanced tele-teaching starts from a network of fixed cameras, placed around an instructor. A ‘virtual’ camera can move freely between these ‘real’ cameras. The creation of a virtual view requires on-line foreground/background segmentation and 3D reconstruction of the foreground. An off-line, user-interactive method is used for the initial 3D reconstruction of the background. An on-line update of the background model is done automatically. This on-line analysis also allows the system to place the instructor in a ‘virtual or mixed environment’. The surroundings can be replaced by a more suitable background, like an outdoor scene or large-sized presentation slides. Given a desktop and a set of consumer-grade cameras, a lecture can be given from an arbitrary place, e.g. an office. Virtual objects, in particular ‘virtual post-its’ or labels, can be used to augment the scene. They can be on-line selected slide cut-outs or predefined elements. The former are based on simple gestures with a laser pointer.

Book ChapterDOI
12 Sep 2006
TL;DR: It is proved that the varying zoom and rotation of two pan-tilt units can be extracted solely from the planar homography which exists between both cameras.
Abstract: In many sports and surveillance scenarios the action is dynamic and takes place on a planar surface, while being recorded by two or more zoom-pan-tilt cameras. Although their position is fixed, these cameras can typically rotate and zoom independently from each other. When the rotation and zoom of each camera are known, one could reconstruct the dynamic event in 3D and generate different views of the action. Sensors exist which report zoom and orientation changes of pan-tilt units. In the absence of such sensors, however, we prove that the varying zoom and rotation of two pan-tilt units can be extracted solely from the planar homography which exists between both cameras.

01 Jan 2006
TL;DR: A system is presented which localizes solely based on naturally occurring landmarks, using a panoramic camera that provides omnidirectional images of the environment; a Bayesian framework makes it possible to track the system's position in quasi real time.
Abstract: This work was motivated by the goal of building a wearable virtual tourist system that could guide people around in large, complex environments. It must yield a precise localization, even in situations in which Global Positioning Systems (GPS) cannot provide navigational information, including indoor locations and narrow streets where there is no line of sight to the GPS satellites. Because installing active badges or beacon systems involves substantial effort and expense, we have developed a system which localizes solely based on naturally occurring landmarks. As sensory input, we only use a panoramic camera system which provides omnidirectional images of the environment. During the training stage, the system is led around in the environment while recording images at regular time intervals. Offline, these images are automatically archived in a world model. Unlike traditional approaches, we don’t build a Euclidean metric map. The world model used is a graph reflecting the topological structure of the environment: e.g. for indoor environments, rooms are nodes and corridors are edges of the graph. Image comparison is done using both global color measures and matching of specially developed local features. These measures are designed to be robust, respectively invariant, to image distortions caused by viewpoint changes, illumination changes and occlusions. This leads to a system that can recognize a certain place even if its location is not exactly the same as the location from where the reference image was taken, even if the illumination is substantially different, and even if there are large occluded parts. Using this world model, localization can be done by comparing a new query image, taken at the current position of the mobile system, with the images in the model. A Bayesian framework makes it possible to track the system's position in quasi real time. When the present location is known, context-sensitive tourist information can be given to the tourist, or a path to a target location can be computed using the topological map.
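The Bayesian tracking over such a topological map can be sketched as a discrete Bayes filter: the belief is a distribution over graph nodes, propagated with a stay-or-move-to-neighbour transition model and reweighted by an image-similarity likelihood. The graph, probabilities, and likelihood values below are illustrative:

```python
# Discrete Bayes filter on a topological map: nodes are places (rooms),
# edges are corridors. All numbers are illustrative.
import numpy as np

neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # rooms on a corridor
belief = np.full(4, 0.25)                              # uniform prior

def update(belief, likelihood, p_stay=0.6):
    # predict: stay put with p_stay, otherwise move to a random neighbour
    predicted = np.zeros_like(belief)
    for node, p in enumerate(belief):
        predicted[node] += p_stay * p
        for n in neighbours[node]:
            predicted[n] += (1 - p_stay) * p / len(neighbours[node])
    # correct: weight by how well the query image matches each node's images
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# likelihood of the current query image against each node's reference images
belief = update(belief, np.array([0.05, 0.1, 0.7, 0.15]))
print(belief.round(3))     # mass concentrates on node 2
```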


Book ChapterDOI
07 May 2006
TL;DR: The method is based on uncalibrated Structure-from-Motion (SfM) to extract 3D models for the foreground object and the background, as well as for their relative motion, and fixes the relative scales between the scene parts within and between the videos.
Abstract: The paper presents a method for multi-dimensional registration of two video streams. The sequences are captured by two hand-held cameras moving independently with respect to each other, both observing one object rigidly moving apart from the background. The method is based on uncalibrated Structure-from-Motion (SfM) to extract 3D models for the foreground object and the background, as well as for their relative motion. It fixes the relative scales between the scene parts within and between the videos. It also provides the registration between all partial 3D models, and the temporal synchronization between the videos. The crux is that not a single point on the foreground or background needs to be in common between both video streams. Extensions to more than two cameras and multiple foreground objects are possible.