
Showing papers by "Luc Van Gool published in 2009"


Proceedings ArticleDOI
01 Jan 2009
TL;DR: A novel approach for multi-person tracking-by-detection in a particle filtering framework that uses the continuous confidence of pedestrian detectors and online trained, instance-specific classifiers as a graded observation model, which relies only on information from the past and is suitable for online applications.
Abstract: We propose a novel approach for multi-person tracking-by-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online trained, instance-specific classifiers as a graded observation model. Thus, generic object category knowledge is complemented by instance-specific information. A main contribution of this paper is the exploration of how these unreliable information sources can be used for multi-person tracking. The resulting algorithm robustly tracks a large number of dynamically moving persons in complex scenes with occlusions, does not rely on background modeling, and operates entirely in 2D (requiring no camera or ground plane calibration). Our Markovian approach relies only on information from the past and is suitable for online applications. We evaluate the performance on a variety of datasets and show that it improves upon state-of-the-art methods.
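The graded observation model described above can be illustrated with a minimal, hypothetical particle-filter step in one dimension; the function name, the Gaussian motion model, and the confidence function are illustrative assumptions, not the authors' implementation:

```python
import random

def particle_filter_step(particles, weights, detector_conf, motion_std=1.0):
    """One step of a 1-D particle filter whose observation model is a
    continuous detector confidence, not just thresholded detections."""
    # Predict: constant-position motion model with Gaussian noise.
    particles = [p + random.gauss(0.0, motion_std) for p in particles]
    # Update: weight each particle by the detector confidence at its state.
    weights = [detector_conf(p) for p in particles]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Resample to concentrate particles in high-confidence regions.
    particles = random.choices(particles, weights=weights, k=len(particles))
    return particles, [1.0 / len(particles)] * len(particles)
```

Tracking a target then amounts to iterating this step once per frame; the mean (or mode) of the particle set serves as the state estimate.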

633 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: A multiple classifier system for model-free tracking that outperforms other on-line tracking methods especially in case of occlusions and presence of similar objects.
Abstract: We present a multiple classifier system for model-free tracking. The tasks of detection (finding the object of interest), recognition (distinguishing similar objects in a scene), and tracking (retrieving the object to be tracked) are split into separate classifiers in the spirit of simplifying each classification task. The supervised and semi-supervised classifiers are carefully trained on-line in order to increase adaptivity while limiting accumulation of errors, i.e. drifting. In the experiments, we demonstrate real-time tracking on several challenging sequences, including multi-object tracking of faces, humans, and other objects. We outperform other on-line tracking methods especially in case of occlusions and presence of similar objects.

246 citations


Proceedings ArticleDOI
01 Aug 2009
TL;DR: A complete integrated system for live facial puppetry is presented that enables high-resolution real-time facial expression tracking with transfer to another person's face; the actor becomes a puppeteer with complete and accurate control over a digital face.
Abstract: We present a complete integrated system for live facial puppetry that enables high-resolution real-time facial expression tracking with transfer to another person's face. The system utilizes a real-time structured light scanner that provides dense 3D data and texture. A generic template mesh, fitted to a rigid reconstruction of the actor's face, is tracked offline in a training stage through a set of expression sequences. These sequences are used to build a person-specific linear face model that is subsequently used for online face tracking and expression transfer. Even with just a single rigid pose of the target face, convincing real-time facial animations are achievable. The actor becomes a puppeteer with complete and accurate control over a digital face.

239 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: To achieve robustness to partial occlusions, this work uses an individual local tracker for each segment of the articulated structure, which enforces the anatomical hand structure through soft constraints on the joints between adjacent segments.
Abstract: We present a method for tracking a hand while it is interacting with an object. This setting is arguably the one where hand-tracking has most practical relevance, but poses significant additional challenges: strong occlusions by the object as well as self-occlusions are the norm, and classical anatomical constraints need to be softened due to the external forces between hand and object. To achieve robustness to partial occlusions, we use an individual local tracker for each segment of the articulated structure. The segments are connected in a pairwise Markov random field, which enforces the anatomical hand structure through soft constraints on the joints between adjacent segments. The most likely hand configuration is found with belief propagation. Both range and color data are used as input. Experiments are presented for synthetic data with ground truth and for real data of people manipulating objects.
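The paper's pairwise Markov random field is a tree over hand segments; as an illustrative simplification, the max-product message passing it relies on can be sketched for a chain of segments (the Viterbi special case). Function name, toy unary scores, and pairwise potentials below are hypothetical:

```python
def chain_map_config(unary, pairwise):
    """Max-product belief propagation on a chain of segments.
    unary[i][s]        : local-tracker score of state s for segment i
    pairwise[i][s][t]  : soft joint constraint between segment i (state s)
                         and segment i+1 (state t)
    Returns the most likely joint configuration."""
    n = len(unary)
    msg = [list(unary[0])]   # msg[i][t]: best score of segments 0..i ending in t
    back = []                # backpointers for the traceback
    for i in range(1, n):
        scores, ptr = [], []
        for t in range(len(unary[i])):
            best_s = max(range(len(unary[i - 1])),
                         key=lambda s: msg[-1][s] * pairwise[i - 1][s][t])
            scores.append(msg[-1][best_s] * pairwise[i - 1][best_s][t] * unary[i][t])
            ptr.append(best_s)
        msg.append(scores)
        back.append(ptr)
    # Trace the best configuration backwards.
    state = max(range(len(msg[-1])), key=lambda t: msg[-1][t])
    config = [state]
    for ptr in reversed(back):
        state = ptr[state]
        config.append(state)
    return list(reversed(config))
```

On a tree, the same two-pass scheme runs from the leaves to a root and back; on graphs with loops, belief propagation becomes iterative and approximate.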

181 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: The experiments show that while a state-of-the-art scene classifier can keep global classes such as road types similarly well apart, a manually crafted feature set based on a segmentation clearly outperforms it on object classes.
Abstract: Recognizing the traffic scene in front of a car is an important asset for autonomous driving, as well as for safety systems. While GPS-based maps abound and have reached an incredible level of accuracy, they can still profit from additional, image-based information. Especially in urban scenarios, GPS reception can be shaky, or the map might not contain the latest detours due to constructions, demonstrations, etc. Furthermore, such maps are static and cannot account for other dynamic traffic agents, such as cars or pedestrians. In this paper, we therefore propose an image-based system that is able to recognize both the road type (straight, left/right curve, crossing, ...) as well as a set of often encountered objects (car, pedestrian, pedestrian crossing). The obtained information could then be fused with existing maps and either assist the driver directly (e.g., a pedestrian crossing is ahead: slow down) or help in improving object tracking (e.g., where are possible entrance points for pedestrians or cars?). Starting from a video sequence obtained from a car driving through urban areas, we employ a two-stage architecture termed Segmentation-Based Urban Traffic Scene Understanding (SUTSU) that first builds an intermediate representation of the image based on a patch-wise image classification. The patch-wise segmentation is inspired by recent work [3, 4, 5] and assigns class probabilities to every 8×8 image patch. As a feature set, we use the coefficients of the Walsh-Hadamard transform (a decomposition of the image into square waves) and, if available, additional information from the depth map. These are then used in a one-versus-all training using AdaBoost for feature selection, where we choose 13 texture classes that we found to be representative of typical urban scenes. This yields a meta representation of the scene that is more suitable for further processing (Fig. 1 (b, c)).
In recent publications, such a segmentation was used for a variety of purposes, such as improvement of object detection [1, 5], analysis of occlusion boundaries, or 3D reconstruction. In this paper, we will investigate the use of a segmentation for urban scene analysis. We infer another set of features from the segmentation’s probability maps, analyzing repetitiveness, curvature, and rough structure. This set is then again used with a one-versus-all training to infer both the type of road segment ahead, as well as the presence of pedestrians, cars, or pedestrian crossings. A Hidden Markov Model is used for temporally smoothing the result. SUTSU is tested on two challenging sequences, spanning over 50 minutes of video of driving through Zurich. The experiments show that while a state-of-the-art scene classifier [2] can keep global classes such as road types similarly well apart, a manually crafted feature set based on a segmentation clearly outperforms it on object classes. Example images are shown in Fig. 2. The main contribution of this paper is the application of recent scene categorization research to do vision “in the wild”, driving through urban scenarios. We furthermore show the advantage of a segmentation-based approach over a global descriptor, as the intermediate representation can easily be adapted to other underlying image data (e.g. dusk, rain, ...), without having to change the high-level classifier.
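The Walsh-Hadamard feature set mentioned above decomposes each 8×8 patch into square waves. A minimal sketch of the 2-D transform, using Sylvester's construction (which yields natural Hadamard ordering rather than sequency ordering); this is illustrative only, not the authors' code:

```python
def hadamard(n):
    """Hadamard matrix of size n (a power of two) via Sylvester's
    construction; its rows are the square-wave basis functions."""
    H = [[1]]
    while len(H) < n:
        # [[H, H], [H, -H]] doubling step
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def wht_patch(patch):
    """2-D Walsh-Hadamard transform of a square patch: H * P * H
    (H is symmetric, so no transpose is needed)."""
    n = len(patch)
    H = hadamard(n)
    # Rows: H * P
    hp = [[sum(H[i][k] * patch[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # Columns: (H * P) * H
    return [[sum(hp[i][k] * H[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

The resulting coefficients (low-order ones capture coarse texture, high-order ones fine detail) would then feed the one-versus-all AdaBoost training described above.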

179 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: A pipeline for the efficient detection and recognition of traffic signs is proposed and 2D and 3D techniques are combined to improve results beyond the state-of-the-art, which is still very much preoccupied with single view analysis.
Abstract: Several applications require information about street furniture. Part of the task is to survey all traffic signs. This has to be done for millions of km of road, and the exercise needs to be repeated every so often. A van with 8 roof-mounted cameras drove through the streets and took images every meter. The paper proposes a pipeline for the efficient detection and recognition of traffic signs. The task is challenging, as illumination conditions change regularly, occlusions are frequent, 3D positions and orientations vary substantially, and the actual signs are far less similar among equal types than one might expect. We combine 2D and 3D techniques to improve results beyond the state-of-the-art, which is still very much preoccupied with single view analysis.

139 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: A complete 3D in-hand scanning system that allows users to scan objects by simply turning them freely in front of a real-time 3D range scanner and the online model is of sufficiently high quality to serve as the final model.
Abstract: We present a complete 3D in-hand scanning system that allows users to scan objects by simply turning them freely in front of a real-time 3D range scanner. The 3D object model is reconstructed online as a point cloud by registering and integrating the incoming 3D patches with the online 3D model. The accumulation of registration errors leads to the well-known loop closure problem. We address this issue already during the scanning session by distorting the object as rigidly as possible. Scanning errors are removed by explicitly handling outliers. As a result of our proposed online modeling and error handling procedure, the online model is of sufficiently high quality to serve as the final model. Thus, no additional post-processing is required which might lead to artifacts in the model reconstruction. We demonstrate our approach on several difficult real-world objects and quantitatively evaluate the resulting modeling accuracy.

127 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: The efficiency of the retrieval process is optimized by creating more compact and precise indices for visual vocabularies using background information obtained in the crawling stage of the system.
Abstract: The state-of-the-art in visual object retrieval from large databases makes it possible to search millions of images on the object level. Recently, complementary works have proposed systems to crawl large object databases from community photo collections on the Internet. We combine these two lines of work into a large-scale system for auto-annotation of holiday snaps. The resulting method allows for the fully automatic labeling of objects such as landmark buildings, scenes, and pieces of art at the object level. The labeling is multi-modal and consists of textual tags, geographic location, and related content on the Internet. Furthermore, the efficiency of the retrieval process is optimized by creating more compact and precise indices for visual vocabularies using background information obtained in the crawling stage of the system. We demonstrate the scalability and precision of the proposed method by conducting experiments on millions of images downloaded from community photo collections on the Internet.
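The visual-vocabulary indices mentioned above follow the standard inverted-index retrieval pattern; a toy sketch (identifiers are hypothetical, and real systems add tf-idf weighting, quantisation, and geometric verification):

```python
from collections import defaultdict

def build_index(db):
    """Inverted index over quantised visual words: each word maps to the
    set of images containing it, so a query only touches images that
    share at least one word with it."""
    index = defaultdict(set)
    for img_id, words in db.items():
        for w in words:
            index[w].add(img_id)
    return index

def query(index, words):
    """Rank database images by the number of distinct visual words they
    share with the query (a stand-in for tf-idf scoring)."""
    votes = defaultdict(int)
    for w in set(words):
        for img_id in index.get(w, ()):
            votes[img_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```

Making the index "more compact and precise", as the abstract puts it, would amount to pruning uninformative words before `build_index` using the statistics gathered while crawling.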

108 citations



Proceedings ArticleDOI
01 Jan 2009
TL;DR: This work presents a data-driven, unsupervised method for unusual scene detection from static webcams; based on simple image features, it detects plausible unusual scenes that have not been observed in the data-stream before.
Abstract: We present a data-driven, unsupervised method for unusual scene detection from static webcams. Such time-lapse data is usually captured with very low or varying framerate. This precludes the use of tools typically used in surveillance (e.g., object tracking). Hence, our algorithm is based on simple image features. We define usual scenes based on the concept of meaningful nearest neighbours instead of building explicit models. To effectively compare the observations, our algorithm adapts the data representation. Furthermore, we use incremental learning techniques to adapt to changes in the data-stream. Experiments on several months of webcam data show that our approach detects plausible unusual scenes, which have not been observed in the data-stream before.
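The "meaningful nearest neighbours" idea can be sketched as a k-th-nearest-neighbour novelty score over past feature vectors: usual scenes have close neighbours in the history, unusual ones do not. This is a toy version (the function name is invented; the paper additionally adapts the data representation and learns incrementally):

```python
def unusualness(sample, history, k=3):
    """Distance from a new observation to its k-th nearest neighbour in
    the history of past feature vectors; large values flag unusual scenes."""
    def dist(a, b):
        # Euclidean distance between two equal-length feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    dists = sorted(dist(sample, h) for h in history)
    return dists[min(k, len(dists)) - 1]
```

Thresholding this score (or ranking frames by it) yields candidate unusual scenes without ever building an explicit model of "usual".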

75 citations


Journal ArticleDOI
TL;DR: It is argued that procedural modeling technology based on shape grammars provides an interesting alternative to transparency- or decoloration-based uncertainty visualization, as such measures tend to spoil the experience for the observer.
Abstract: The rapid development of computer graphics and imaging provides the modern archeologist with several tools to realistically model and visualize archeological sites in 3D. This, however, creates a tension between veridical and realistic modeling. Visually compelling models may lead people to falsely believe that there exists very precise knowledge about the past appearance of a site. In order to make the underlying uncertainty visible, it has been proposed to encode this uncertainty with different levels of transparency in the rendering, or of decoloration of the textures. We argue that procedural modeling technology based on shape grammars provides an interesting alternative to such measures, as they tend to spoil the experience for the observer. Both its efficiency and compactness make procedural modeling a tool to produce multiple models, which together sample the space of possibilities. Variations between the different models express levels of uncertainty implicitly, while letting each individual model keep its realistic appearance. The underlying, structural description makes the uncertainty explicit. Additionally, procedural modeling also yields the flexibility to incorporate changes as knowledge of an archeological site gets refined. Annotations explaining modeling decisions can be included. We demonstrate our procedural modeling implementation with several recent examples.
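Sampling the space of possibilities with a shape grammar can be sketched as a stochastic derivation: each run randomly picks among production rules, so repeated runs yield model variants whose differences express uncertainty implicitly. The rules and symbols below are invented for illustration and bear no relation to the authors' grammar:

```python
import random

def derive(symbol, rules, rng, depth=3):
    """Expand a shape-grammar symbol by randomly choosing among its
    production rules; symbols without rules (or at max depth) are
    treated as terminals."""
    if depth == 0 or symbol not in rules:
        return [symbol]
    production = rng.choice(rules[symbol])  # sample one possible expansion
    out = []
    for s in production:
        out.extend(derive(s, rules, rng, depth - 1))
    return out
```

Running `derive` several times with different seeds produces the "multiple models" the abstract describes; in a real procedural system each terminal would carry geometry rather than a label.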

Proceedings ArticleDOI
01 Jan 2009
TL;DR: Based on a regularized face model, unsupervised face alignment is framed within the Lucas-Kanade image registration approach, and a robust optimization scheme to handle appearance variations is proposed.
Abstract: We propose a novel approach to unsupervised facial image alignment. Differently from previous approaches, that are confined to affine transformations on either the entire face or separate patches, we extract a nonrigid mapping between facial images. Based on a regularized face model, we frame unsupervised face alignment into the Lucas-Kanade image registration approach. We propose a robust optimization scheme to handle appearance variations. The method is fully automatic and can cope with pose variations and expressions, all in an unsupervised manner. Experiments on a large set of images showed that the approach is effective.

Proceedings ArticleDOI
01 Jan 2009
TL;DR: This work is the first to extend the exemplar-based approach using local features into the spatio-temporal domain, which makes it possible to avoid the problems that typically plague sliding window-based approaches, in particular the exhaustive search over spatial coordinates, time, and spatial as well as temporal scales.
Abstract: In this work, we present a method for action localization and recognition using an exemplar-based approach. It starts from local dense yet scale-invariant spatio-temporal features. The most discriminative visual words are selected and used to cast bounding box hypotheses, which are then verified and further grouped into the final detections. To the best of our knowledge, we are the first to extend the exemplar-based approach using local features into the spatio-temporal domain. This allows us to avoid the problems that typically plague sliding window-based approaches - in particular the exhaustive search over spatial coordinates, time, and spatial as well as temporal scales. We report state-of-the-art results on challenging datasets, extracted from real movies, for both classification and localization.

Proceedings ArticleDOI
01 Sep 2009
TL;DR: A GPU-oriented bitwise fast voting method is proposed to effectively improve the matching accuracy, which is enormously faster than the histogram-based approach, efficiently exploiting the computing resources of GPUs.
Abstract: This paper proposes a real-time design for accurate stereo matching on Compute Unified Device Architecture (CUDA). We adopt a leading local algorithm for its high data parallelism. A GPU-oriented bitwise fast voting method is proposed to effectively improve the matching accuracy, which is enormously faster than the histogram-based approach. The whole algorithm is parallelized on CUDA at a fine granularity, efficiently exploiting the computing resources of GPUs. On-chip shared memory is utilized to alleviate the latency of memory accesses. Compared to the CPU counterpart, our design attains a speedup factor of 52. With high matching accuracy, the proposed design is still among the fastest stereo methods on GPUs. These advantages in speed and accuracy make our design well suited for practical applications such as robotics systems and multiview teleconferencing.

Proceedings ArticleDOI
20 Oct 2009
TL;DR: An architecture for a multi-camera, multi-resolution surveillance system is described, supporting a set of distributed static and pan-tilt-zoom cameras and visual tracking algorithms, together with a central supervisor unit.
Abstract: We describe an architecture for a multi-camera, multi-resolution surveillance system. The aim is to support a set of distributed static and pan-tilt-zoom (PTZ) cameras and visual tracking algorithms, together with a central supervisor unit. Each camera (and possibly pan-tilt device) has a dedicated process and processor. Asynchronous interprocess communications and archiving of data are achieved in a simple and effective way via a central repository, implemented using an SQL database.

Journal ArticleDOI
TL;DR: In this paper, a generative model of the relationship of body pose and image appearance using a sparse kernel regressor is proposed to track through poorly segmented low-resolution image sequences where tracking otherwise fails.
Abstract: We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a generative model of the relationship of body pose and image appearance using a sparse kernel regressor. Body poses are modelled on a low-dimensional manifold obtained by Locally Linear Embedding dimensionality reduction. In addition, we learn a prior model of likely body poses and a dynamical model in this pose manifold. Sparse kernel regressors capture the nonlinearities of this mapping efficiently. Within a Recursive Bayesian Sampling framework, the potentially multimodal posterior probability distributions can then be inferred. An activity-switching mechanism based on learned transfer functions allows for inference of the performed activity class, along with the estimation of body pose and 2D image location of the subject. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects that are alternating between running and walking movements. Our experiments show how the dynamical model helps to track through poorly segmented low-resolution image sequences where tracking otherwise fails, while at the same time reliably classifying the activity type.

Proceedings ArticleDOI
01 Jan 2009
TL;DR: This paper makes the connection between sliding-window and Hough-based object detection explicit and shows that the feature-centric view of the latter also nicely fits with the branch and bound paradigm, while it avoids the ESS memory tradeoff.
Abstract: Many object detection systems rely on linear classifiers embedded in a sliding-window scheme. Such exhaustive search involves massive computation. Efficient Subwindow Search (ESS) [11] avoids this by means of branch and bound. However, ESS makes an unfavourable memory tradeoff. Memory usage scales with both image size and overall object model size. This risks becoming prohibitive in a multiclass system. In this paper, we make the connection between sliding-window and Hough-based object detection explicit. Then, we show that the feature-centric view of the latter also nicely fits with the branch and bound paradigm, while it avoids the ESS memory tradeoff. Moreover, on-line integral image calculations are not needed. Both theoretical and quantitative comparisons with the ESS bound are provided, showing that none of this comes at the expense of performance.
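The branch-and-bound idea behind ESS-style detection can be illustrated in one dimension: candidate sets of intervals are split and pruned with an optimistic bound instead of being enumerated exhaustively. This is a toy analogue for intuition only, not the paper's feature-centric algorithm:

```python
import heapq

def bb_best_interval(scores):
    """Best-first branch-and-bound for the contiguous interval [l, r] of
    a 1-D score array with maximal total score.  A candidate set is a
    range (a..b) for l and (c..d) for r; it is bounded by the sum of
    positive scores any of its intervals could contain."""
    n = len(scores)
    ppos, psum = [0.0], [0.0]
    for s in scores:
        ppos.append(ppos[-1] + max(s, 0.0))  # prefix sums of positive part
        psum.append(psum[-1] + s)            # plain prefix sums

    def key(a, b, c, d):
        if a == b and c == d:                # single interval: exact score
            return psum[d + 1] - psum[a]
        return ppos[d + 1] - ppos[a]         # optimistic upper bound

    heap = [(-key(0, n - 1, 0, n - 1), 0, n - 1, 0, n - 1)]
    while heap:
        negk, a, b, c, d = heapq.heappop(heap)
        if a == b and c == d:                # first singleton popped is optimal
            return a, c, -negk
        if b - a >= d - c:                   # split the larger parameter range
            m = (a + b) // 2
            parts = [(a, m, c, d), (m + 1, b, c, d)]
        else:
            m = (c + d) // 2
            parts = [(a, b, c, m), (a, b, m + 1, d)]
        for p in parts:
            if p[0] <= p[3]:                 # some valid l <= r remains
                heapq.heappush(heap, (-key(*p),) + p)
```

ESS applies the same scheme to 2-D rectangles (four interval parameters instead of two) with bounds derived from the classifier score.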

01 Jan 2009
TL;DR: This work proposes a method using different types of context in order to collect scene-specific samples from both the background and the object class over time, which can robustly adapt to the scene without drifting.
Abstract: Generic person detection is an ill-posed problem as context is widely ignored. Local context can be used to split the generic detection task into easier sub-problems, which was recently explored by classifier grids. The detection problem gets simplified spatially by training separate classifiers for each possible location in the image. So far, adaptive grid-based approaches only focused on exploring the specific background class. In contrast, we propose a method using different types of context in order to collect scene-specific samples from both the background and the object class over time. These samples are used to update the specific object detectors. By limiting label noise and avoiding direct feedback loops, our system can robustly adapt to the scene without drifting. Results on the PETS 2009 dataset show significantly improved person detections, especially during static and dynamic occlusions (e.g., lamp poles and crowded scenes).

Proceedings Article
01 Jan 2009
TL;DR: In this article, the authors present a complete 3D in-hand scanning system that allows users to scan objects by simply turning them freely in front of a real-time 3D range scanner.
Abstract: We present a complete 3D in-hand scanning system that allows users to scan objects by simply turning them freely in front of a real-time 3D range scanner. The 3D object model is reconstructed online as a point cloud by registering and integrating the incoming 3D patches with the online 3D model. The accumulation of registration errors leads to the well-known loop closure problem. We address this issue already during the scanning session by distorting the object as rigidly as possible. Scanning errors are removed by explicitly handling outliers. As a result of our proposed online modeling and error handling procedure, the online model is of sufficiently high quality to serve as the final model. Thus, no additional post-processing is required which might lead to artifacts in the model reconstruction. We demonstrate our approach on several difficult real-world objects and quantitatively evaluate the resulting modeling accuracy.

Proceedings ArticleDOI
01 Jan 2009
TL;DR: A framework is proposed which overcomes the probabilistic shortcomings of the Implicit Shape Model and gives a sound justification to the voting procedure; it is shown that it is sufficient to use soft-matching during learning only and to perform fast nearest neighbour matching at recognition time (where speed is of prime importance).
Abstract: This paper addresses the problem of object detection by means of the Generalised Hough transform paradigm. The Implicit Shape Model (ISM) is a well-known approach based on this idea. It made this paradigm popular and has been adopted many times. Although the algorithm exhibits robust detection performance, its description, i.e. its probabilistic model, involves arguments which are unsatisfactory from a probabilistic standpoint. We propose a framework which overcomes these problems and gives a sound justification to the voting procedure. Furthermore, our framework allows for a formal understanding of the heuristic of soft-matching commonly used in visual vocabulary systems. We show that it is sufficient to use soft-matching during learning only and to perform fast nearest neighbour matching at recognition time (where speed is of prime importance). Our implementation is based on Gaussian Mixture Models (instead of kernel density estimators as with ISM) which lead to a fast gradient-based object detector.
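The Generalised Hough voting at the core of ISM-style detection can be sketched as follows: each local feature matched to a visual word casts weighted votes for the object centre, and the accumulator maximum is taken as the hypothesis. This toy version uses hard matching and a discrete accumulator; the codebook and identifiers are hypothetical:

```python
from collections import defaultdict

def hough_detect(features, codebook, cell=1):
    """features : list of (x, y, word) local features
    codebook    : word -> list of (dx, dy, weight) centre offsets learned
                  from training examples
    Returns the accumulator cell with the most votes and its score."""
    acc = defaultdict(float)
    for x, y, word in features:
        for dx, dy, w in codebook.get(word, []):
            # Each stored offset votes for the object centre it implies.
            cx, cy = (x + dx) // cell, (y + dy) // cell
            acc[(cx, cy)] += w
    if not acc:
        return None
    centre = max(acc, key=acc.get)
    return centre, acc[centre]
```

The paper's contribution is, roughly, to place such votes on a sound probabilistic footing (via Gaussian Mixture Models rather than kernel density estimates), which this sketch does not attempt.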

Journal ArticleDOI
TL;DR: A novel approach to markerless real-time pose recognition in a multicamera setup is presented and Average Neighborhood Margin Maximization (ANMM) is introduced as a powerful new technique to train Haar-like features.
Abstract: This article presents a novel approach to markerless real-time pose recognition in a multicamera setup. Body pose is retrieved using example-based classification based on Haar wavelet-like features to allow for real-time pose recognition. Average Neighborhood Margin Maximization (ANMM) is introduced as a powerful new technique to train Haar-like features. The rotation invariant approach is implemented for both 2D classification based on silhouettes, and 3D classification based on visual hulls.


Proceedings ArticleDOI
01 Jan 2009
TL;DR: The proposed method computes the convolution for every vertex in a model, so it incorporates dense feature matching as opposed to sparse matching based on certain feature descriptors; the approach is analogous to window matching in 2D image registration.
Abstract: This paper describes a method for registering deformable 3D objects. When an object such as a hand deforms, the deformation of the local shape is small, whereas the global shape deforms to a greater extent in many cases. Therefore, the local shape can be used as a feature for matching corresponding points. Instead of using a descriptor of the local shape, we introduce the convolution of the error between corresponding points for each vertex of a 3D mesh model. This approach is analogous to window matching in 2D image registration. Since the proposed method computes the convolution for every vertex in a model, it incorporates dense feature matching as opposed to sparse matching based on certain feature descriptors. Through experiments, we show that the convolution is useful for finding corresponding points and evaluate the accuracy of the registration.

01 Jan 2009
TL;DR: An algorithm for multi-person tracking-by-detection in a particle filtering framework that tightly couples object detection, classification, and tracking components and robustly tracks a variable number of dynamically moving persons in complex scenes with occlusions is presented.
Abstract: We present an algorithm for multi-person tracking-by-detection in a particle filtering framework. To address the unreliability of current state-of-the-art object detectors, our algorithm tightly couples object detection, classification, and tracking components. Instead of relying only on the final, sparse output from a detector, we additionally employ its continuous intermediate output to impart our approach with more flexibility to handle difficult situations. The resulting algorithm robustly tracks a variable number of dynamically moving persons in complex scenes with occlusions. The approach does not rely on background modeling and is based only on 2D information from a single camera, not requiring any camera or ground plane calibration. We evaluate the algorithm on the PETS’09 tracking dataset and discuss the importance of the different algorithm components to robustly handle difficult situations.

Proceedings ArticleDOI
07 Nov 2009
TL;DR: An efficient stereo algorithm with NCC over shape-adaptive matching regions is proposed, producing depth-discontinuity preserving disparity maps while retaining the advantage of robustness to radiometric differences.
Abstract: Normalized Cross-Correlation (NCC) is a common matching technique to tolerate radiometric differences between stereo images. However, traditional rectangle-based NCC tends to blur the depth discontinuities. This paper proposes an efficient stereo algorithm with NCC over shape-adaptive matching regions, producing depth-discontinuity preserving disparity maps while retaining the advantage of robustness to radiometric differences. To alleviate the computational intensity, we propose an acceleration algorithm using an orthogonal integral image technique, achieving a speedup factor of 10 to 27. In addition, a voting scheme on reliable estimates is applied to refine the initial estimates. Experiments show that, besides the robustness, the proposed method obtains accurate disparity maps at fast speed. Our method ranks highly among the local approaches in the Middlebury stereo benchmark.
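NCC itself is gain- and offset-invariant, which is what makes it robust to radiometric differences. A minimal sketch over flattened pixel windows (illustrative only, without the paper's shape-adaptive regions or integral-image acceleration):

```python
def ncc(left, right):
    """Normalised cross-correlation of two equally-sized pixel windows,
    given as flat lists; invariant to affine intensity changes."""
    n = len(left)
    ml = sum(left) / n
    mr = sum(right) / n
    num = sum((a - ml) * (b - mr) for a, b in zip(left, right))
    dl = sum((a - ml) ** 2 for a in left) ** 0.5
    dr = sum((b - mr) ** 2 for b in right) ** 0.5
    return num / (dl * dr) if dl and dr else 0.0

def best_disparity(left_win, right_wins):
    """Winner-takes-all: pick the disparity whose right-image window
    maximises NCC against the left-image window."""
    return max(range(len(right_wins)), key=lambda d: ncc(left_win, right_wins[d]))
```

A full stereo matcher would run `best_disparity` per pixel over windows extracted at each candidate disparity; the paper's contribution is making those windows shape-adaptive while keeping the computation fast.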

Proceedings ArticleDOI
01 Jan 2009
TL;DR: A novel method for mouth localization is presented in the context of multimodal speech recognition, where audio and visual cues are fused to improve the speech recognition accuracy; its superior accuracy and quantitative improvements for audio-visual speech recognition over monomodal approaches are demonstrated.
Abstract: We present a novel method for mouth localization in the context of multimodal speech recognition where audio and visual cues are fused to improve the speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least scale, position, and orientation of the mouth, we propose a Hough transform-based method. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth in order to localize the mouth more efficiently. Using the rotation invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The superior accuracy of our method and quantitative improvements for audio-visual speech recognition over monomodal approaches are demonstrated on two datasets.

Journal ArticleDOI
TL;DR: This paper studies one particular form of cognitive feedback, where the ability to recognize objects of a given category is exploited to infer different kinds of meta-data annotations for images of previously unseen object instances, in particular information on 3D shape.

Journal ArticleDOI
TL;DR: This work presents a system that is able to recognize objects of a certain class in an image and to identify their parts for potential interactions and presents experimental results on wheelchairs, cars, and motorbikes.
Abstract: In the transition from industrial to service robotics, robots will have to deal with increasingly unpredictable and variable environments. We present a system that is able to recognize objects of a certain class in an image and to identify their parts for potential interactions. The method can recognize objects from arbitrary viewpoints and generalizes to instances that have never been observed during training, even if they are partially occluded and appear against cluttered backgrounds. Our approach builds on the implicit shape model of Leibe et al. We extend it to couple recognition to the provision of meta-data useful for a task and to the case of multiple viewpoints by integrating it with the dense multi-view correspondence finder of Ferrari et al. Meta-data can be part labels but also depth estimates, information on material types, or any other pixelwise annotation. We present experimental results on wheelchairs, cars, and motorbikes.

Proceedings ArticleDOI
01 Jan 2009
TL;DR: An approach for unusual event detection is presented, based on a tree of trackers: a better informed tracker normally performs more robustly, but when unusual events occur and the normal assumptions about the world no longer hold, a less informed tracker has a good chance of performing better, and this performance inversion signals the event.
Abstract: We present an approach for unusual event detection, based on a tree of trackers. At lower levels, the trackers are trained on broad classes of targets. At higher levels, they aim at more specific targets. For instance, at the root, a general blob tracker could operate which may track any object. The next level could already use information about human appearance to better track people. A further level could go after specific types of actions like walking, running, or sitting. Yet another level up, several walking trackers can be tuned to the gait of a particular person each. Thus, at each layer, one or more families of more specific trackers are available. As long as the target behaves according to expectations, a member of a higher up such family will be better tuned to the data than its parent tracker at a lower level. Typically, a better informed tracker performs more robustly. But in cases where unusual events occur and the normal assumptions about the world no longer hold, they lose their reliability. In such cases, a less informed tracker, not relying on what has now become false information, has a good chance of performing better. Such a performance inversion signals an unusual event. Inversions between levels higher up represent deviations that are semantically more subtle than inversions lower down: for instance, an unknown intruder entering a house rather than a non-human target.

Proceedings ArticleDOI
01 Jan 2009
TL;DR: This work addresses the problem of tracking humans with skeleton-based shape models, where video footage is acquired by multiple cameras and the shape deformations are parameterized by the skeleton, and provides guidance on algorithm design for different applications related to human motion capture.
Abstract: This work addresses the problem of tracking humans with skeleton-based shape models where video footage is acquired by multiple cameras. Since the shape deformations are parameterized by the skeleton, the position, orientation, and configuration of the human skeleton are estimated such that the deformed shape model is best explained by the image data. To solve this problem, several algorithms have been proposed over the last years. The approaches usually rely on filtering, local optimization, or global optimization. The global optimization algorithms can be further divided into single hypothesis (SHO) and multiple hypothesis optimization (MHO). We briefly compare the underlying mathematical models and evaluate the performance of one representative algorithm for each class. Furthermore, we compare several likelihoods and parameter settings with respect to accuracy and computation cost. A thorough evaluation is performed on two sequences with uncontrolled lighting conditions and non-static background. In addition, we demonstrate the impact of the likelihood on the HumanEva benchmark. Our results provide guidance on algorithm design for different applications related to human motion capture.