Showing papers by "Luc Van Gool" published in 2014


Book ChapterDOI
01 Nov 2014
TL;DR: This work proposes A+, an improved variant of Anchored Neighborhood Regression that combines the best qualities of ANR and SF: it builds on the features and anchored regressors of ANR but, instead of learning the regressors on the dictionary, learns them from the full training material, similar to SF.
Abstract: We address the problem of image upscaling in the form of single image super-resolution based on a dictionary of low- and high-resolution exemplars. Two recently proposed methods, Anchored Neighborhood Regression (ANR) and Simple Functions (SF), provide state-of-the-art quality performance. Moreover, ANR is among the fastest known super-resolution methods. ANR learns sparse dictionaries and regressors anchored to the dictionary atoms. SF relies on clusters and corresponding learned functions. We propose A+, an improved variant of ANR, which combines the best qualities of ANR and SF. A+ builds on the features and anchored regressors from ANR but instead of learning the regressors on the dictionary it uses the full training material, similar to SF. We validate our method on standard images and compare with state-of-the-art methods. We obtain improved quality (i.e. 0.2–0.7 dB PSNR better than ANR) and excellent time complexity, rendering A+ the most efficient dictionary-based super-resolution method to date.
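
A minimal sketch of the anchored-regression idea in Python, assuming precomputed low-resolution features, high-resolution patches, and dictionary anchors (all l2-normalized); the neighborhood size k and ridge weight lam are illustrative, not the paper's tuned values:

```python
import numpy as np

def train_anchored_regressors(lo_feats, hi_patches, anchors, k=2048, lam=0.1):
    """Offline: for each dictionary anchor, fit a ridge regressor from
    low-res features to high-res patches on the anchor's k most
    correlated training samples (the A+ idea: regress on the full
    training material, not on dictionary atoms as in plain ANR)."""
    regressors = []
    for a in anchors:
        idx = np.argsort(-(lo_feats @ a))[:k]   # anchor's neighborhood
        L, H = lo_feats[idx], hi_patches[idx]
        # closed-form ridge regression: W = H^T L (L^T L + lam I)^-1
        W = H.T @ L @ np.linalg.inv(L.T @ L + lam * np.eye(L.shape[1]))
        regressors.append(W)
    return regressors

def upscale_patch(feat, anchors, regressors):
    """Online: route the feature to its most correlated anchor and
    apply that anchor's precomputed linear map."""
    j = int(np.argmax(anchors @ feat))
    return regressors[j] @ feat
```

The online cost is one nearest-anchor search plus one matrix-vector product per patch, which is why this family of methods is fast.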

1,418 citations


Book ChapterDOI
06 Sep 2014
TL;DR: A novel method to mine discriminative parts using Random Forests (RF), which allows parts to be mined simultaneously for all classes and knowledge to be shared among them; the method also compares favourably to other state-of-the-art component-based classification methods.
Abstract: In this paper we address the problem of automatically recognizing pictured dishes. To this end, we introduce a novel method to mine discriminative parts using Random Forests (RF), which allows us to mine for parts simultaneously for all classes and to share knowledge among them. To improve efficiency of mining and classification, we only consider patches that are aligned with image superpixels, which we call components. To measure the performance of our RF component mining for food recognition, we introduce a novel and challenging dataset of 101 food categories, with 101,000 images. With an average accuracy of 50.76%, our model outperforms alternative classification methods except for CNN, including SVM classification on Improved Fisher Vectors and existing discriminative part-mining algorithms by 11.88% and 8.13%, respectively. On the challenging MIT-Indoor dataset, our method compares nicely to other state-of-the-art component-based classification methods.

1,216 citations


Book ChapterDOI
06 Sep 2014
TL;DR: This paper proposes a novel approach and a new benchmark for video summarization. It focuses on user videos, i.e. raw videos containing a set of interesting events, and generates high-quality results comparable to manual, human-created summaries.
Abstract: This paper proposes a novel approach and a new benchmark for video summarization. Thereby we focus on user videos, which are raw videos containing a set of interesting events. Our method starts by segmenting the video by using a novel “superframe” segmentation, tailored to raw videos. Then, we estimate visual interestingness per superframe using a set of low-, mid- and high-level features. Based on this scoring, we select an optimal subset of superframes to create an informative and interesting summary. The introduced benchmark comes with multiple human-created summaries, which were acquired in a controlled psychological experiment. This data paves the way to evaluate summarization methods objectively and to gain new insights into video summarization. When evaluating our method, we find that it generates high-quality results, comparable to manual, human-created summaries.
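
The final selection step, choosing superframes that maximize total interestingness under a summary-length budget, is essentially a 0/1 knapsack problem; a minimal dynamic-programming sketch under that assumption (the paper's exact objective may include further terms):

```python
def select_superframes(scores, lengths, budget):
    """0/1 knapsack: maximize total interestingness subject to a total
    summary length <= budget (lengths given as integer frame counts)."""
    n = len(scores)
    # best[b] = (total score, chosen indices) for length budget <= b
    best = [(0.0, [])] * (budget + 1)
    for i in range(n):
        for b in range(budget, lengths[i] - 1, -1):
            cand = best[b - lengths[i]][0] + scores[i]
            if cand > best[b][0]:
                best[b] = (cand, best[b - lengths[i]][1] + [i])
    return best[budget][1]

# e.g. select_superframes([0.9, 0.4, 0.7], [30, 20, 25], budget=50) -> [0, 1]
```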

592 citations


Book ChapterDOI
06 Sep 2014
TL;DR: It is shown that a properly trained vanilla DPM reaches top performance, improving over commercial and research systems, and that a detector based on rigid templates, similar in structure to the Viola&Jones detector, can reach similar top performance on this task.
Abstract: Face detection is a mature problem in computer vision. While diverse high-performing face detectors have been proposed in the past, we present two surprising new top-performance results. First, we show that a properly trained vanilla DPM reaches top performance, improving over commercial and research systems. Second, we show that a detector based on rigid templates, similar in structure to the Viola&Jones detector, can reach similar top performance on this task. Importantly, we discuss issues with the existing evaluation benchmark and propose an improved procedure.

588 citations


Journal ArticleDOI
01 Apr 2014
TL;DR: The paper proposes a pipeline for the efficient detection and recognition of traffic signs from such images, and combines 2D and 3D techniques to improve results beyond the state-of-the-art, which is still very much preoccupied with single view analysis.
Abstract: Several applications require information about street furniture. Part of the task is to survey all traffic signs. This has to be done for millions of km of road, and the exercise needs to be repeated every so often. We used a van with eight roof-mounted cameras to drive through the streets and took images every meter. The paper proposes a pipeline for the efficient detection and recognition of traffic signs from such images. The task is challenging, as illumination conditions change regularly, occlusions are frequent, sign positions and orientations vary substantially, and the actual signs are far less similar among equal types than one might expect. We combine 2D and 3D techniques to improve results beyond the state-of-the-art, which is still very much preoccupied with single view analysis. For the initial detection in single frames, we use a set of colour- and shape-based criteria. They yield a set of candidate sign patterns. The selection of such candidates allows for a significant speed-up over a sliding window approach while keeping similar performance. A speed-up is also achieved through a proposed efficient bounded evaluation of AdaBoost detectors. The 2D detections in multiple views are subsequently combined to generate 3D hypotheses. A Minimum Description Length formulation yields the set of 3D traffic signs that best explains the 2D detections. The paper comes with a publicly available database, with more than 13,000 traffic sign annotations.
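
The bounded evaluation of an AdaBoost detector can be illustrated as early termination: stop accumulating weak-learner responses once even the best possible remaining contribution cannot lift the score above the detection threshold. A hedged sketch, assuming weak learners with responses in {-1, +1}; the paper's actual bounding scheme is not reproduced here:

```python
def bounded_adaboost_score(x, weak_learners, alphas, threshold):
    """Evaluate sum_t alpha_t * h_t(x) with early rejection: if the
    running score plus the maximum attainable remainder falls below
    the threshold, this window cannot become a detection."""
    # remaining[t] = sum of alphas from t onward = best case for the rest
    remaining = [0.0] * (len(alphas) + 1)
    for t in range(len(alphas) - 1, -1, -1):
        remaining[t] = remaining[t + 1] + alphas[t]
    score = 0.0
    for t, (h, a) in enumerate(zip(weak_learners, alphas)):
        score += a * h(x)              # h(x) assumed in {-1, +1}
        if score + remaining[t + 1] < threshold:
            return None                # early reject: bound cannot be met
    return score
```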

309 citations


Book ChapterDOI
01 Nov 2014
TL;DR: This paper builds on the recent Affinity Propagation Clustering algorithm, which passes messages between data points to identify cluster exemplars, and shows that it provides a promising solution to the shortcomings of greedy NMS.
Abstract: Non-maximum suppression (NMS) is a key post-processing step in many computer vision applications. In the context of object detection, it is used to transform a smooth response map that triggers many imprecise object window hypotheses into, ideally, a single bounding box for each detected object. The most common approach for NMS for object detection is a greedy, locally optimal strategy with several hand-designed components (e.g., thresholds). Such a strategy inherently suffers from several shortcomings, such as the inability to detect nearby objects. In this paper, we try to alleviate these problems and explore a novel formulation of NMS as a well-defined clustering problem. Our method builds on the recent Affinity Propagation Clustering algorithm, which passes messages between data points to identify cluster exemplars. Contrary to the greedy approach, our method is solved globally and its parameters can be automatically learned from training data. In experiments, we show in two contexts – object class and generic object detection – that it provides a promising solution to the shortcomings of the greedy NMS.
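
A minimal sketch of the clustering view of NMS, using scikit-learn's AffinityPropagation on an IoU similarity with detection scores (assumed in [0, 1]) as exemplar preferences; unlike the paper, nothing is learned here:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def cluster_nms(boxes, scores):
    """Group overlapping detections by affinity propagation and keep
    the highest-scoring box of each cluster as its output."""
    n = len(boxes)
    S = np.array([[iou(boxes[i], boxes[j]) for j in range(n)] for i in range(n)])
    # detection scores act as exemplar preferences (same [0, 1] scale as IoU)
    ap = AffinityPropagation(affinity='precomputed', preference=scores,
                             random_state=0)
    labels = ap.fit_predict(S)
    return [int(max(np.flatnonzero(labels == c), key=lambda i: scores[i]))
            for c in np.unique(labels)]
```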

186 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A latent representation model is introduced, in which the discrimination of the learned dictionary is exploited by minimizing the within-class scatter of coding coefficients and the latent-value weighted dictionary coherence; a latent sparse representation-based classifier is also presented.
Abstract: Dictionary learning (DL) for sparse coding has shown promising results in classification tasks, but how to adaptively build the relationship between dictionary atoms and class labels remains an important open question. The existing dictionary learning approaches simply fix a dictionary atom to be either class-specific or shared by all classes beforehand, ignoring that the relationship needs to be updated during DL. To address this issue, in this paper we propose a novel latent dictionary learning (LDL) method to learn a discriminative dictionary and build its relationship to class labels adaptively. Each dictionary atom is jointly learned with a latent vector, which associates this atom to the representation of different classes. More specifically, we introduce a latent representation model, in which discrimination of the learned dictionary is exploited via minimizing the within-class scatter of coding coefficients and the latent-value weighted dictionary coherence. The optimal solution is efficiently obtained by the proposed solving algorithm. Correspondingly, a latent sparse representation based classifier is also presented. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse representation and dictionary learning approaches for action, gender and face recognition.
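
Schematically, and only as a hedged reading of the abstract, the objective couples reconstruction with the two discrimination terms; the symbols (codes x_i, class means m_c, atoms d_i, latent values v_i) and the trade-off weights are illustrative, not the paper's exact formulation:

```latex
\min_{D,\,V,\,X}\ \|Y - DX\|_F^2
  + \lambda_1 \sum_{c}\sum_{i \in c} \|x_i - m_c\|_2^2   % within-class scatter of codes
  + \lambda_2 \sum_{i \neq j} v_i v_j\, |d_i^\top d_j|   % latent-value weighted coherence
  \quad \text{subject to sparsity on } X
```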

128 citations


Proceedings ArticleDOI
29 Sep 2014
TL;DR: A novel method is described for creating 3D models of persons freely moving in front of a consumer depth sensor, and it is shown how these models can be used for long-term person re-identification.
Abstract: In this work, we describe a novel method for creating 3D models of persons freely moving in front of a consumer depth sensor and we show how they can be used for long-term person re-identification. For overcoming the problem of the different poses a person can assume, we exploit the information provided by skeletal tracking algorithms for warping every point cloud frame to a standard pose in real time. Then, the warped point clouds are merged together to compose the model. Re-identification is performed by matching body shapes in terms of whole point clouds warped to a standard pose with the described method. We compare this technique with a classification method based on a descriptor of skeleton features and with a mixed approach which exploits both skeleton and shape features. We report experiments on two datasets we acquired for RGB-D re-identification which use different skeletal tracking algorithms and which are made publicly available to foster research in this new research branch.

109 citations


Book ChapterDOI
06 Sep 2014
TL;DR: In this paper, the geometry of a 3D mesh model obtained from multi-view reconstruction is exploited to predict the best view before the actual labeling, which leads to a further reduction of computation time and a gain in accuracy.
Abstract: There is an increasing interest in semantically annotated 3D models, e.g. of cities. The typical approaches start with the semantic labelling of all the images used for the 3D model. Such labelling tends to be very time consuming though. The inherent redundancy among the overlapping images calls for more efficient solutions. This paper proposes an alternative approach that exploits the geometry of a 3D mesh model obtained from multi-view reconstruction. Instead of clustering similar views, we predict the best view before the actual labelling. For this we find the single image part that best supports the correct semantic labelling of each face of the underlying 3D mesh. Moreover, our single-image approach may surprise because it tends to increase the accuracy of the model labelling when compared to approaches that fuse the labels from multiple images. As a matter of fact, we even go a step further, and only explicitly label a subset of faces (e.g. 10%), to subsequently fill in the labels of the remaining faces. This leads to a further reduction of computation time, again combined with a gain in accuracy. Compared to a process that starts from the semantic labelling of the images, our method to semantically label 3D models yields accelerations of about 2 orders of magnitude. We tested our multi-view semantic labelling on a variety of street scenes.
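
Best-view prediction per mesh face can be sketched as scoring every unoccluded camera by how frontally and how closely it sees the face; this proxy criterion and all names are assumptions, not the paper's exact ranking:

```python
import numpy as np

def best_view_per_face(face_normals, face_centers, cam_centers, visible):
    """Pick, for every mesh face, the camera with the most frontal,
    closest, unoccluded view. visible[f, c] is a precomputed boolean
    occlusion test (e.g. by depth-buffer rendering)."""
    n_faces, n_cams = visible.shape
    best = np.full(n_faces, -1)
    for f in range(n_faces):
        score_best = -np.inf
        for c in range(n_cams):
            if not visible[f, c]:
                continue
            ray = cam_centers[c] - face_centers[f]
            dist = np.linalg.norm(ray)
            # frontal views score high, distant cameras are penalized
            score = np.dot(face_normals[f], ray / dist) / dist
            if score > score_best:
                score_best, best[f] = score, c
    return best  # best[f] = camera index to label face f from, or -1
```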

100 citations


Book ChapterDOI
01 Jan 2014
TL;DR: A comparison between two techniques for one-shot person re-identification from soft biometric cues: one is based upon a descriptor composed of features provided by a skeleton estimation algorithm; the other compares body shapes as whole point clouds, warped to a standard pose by a novel technique.
Abstract: In this chapter, we propose a comparison between two techniques for one-shot person re-identification from soft biometric cues. One is based upon a descriptor composed of features provided by a skeleton estimation algorithm; the other compares body shapes in terms of whole point clouds. This second approach relies on a novel technique we propose to warp the subject’s point cloud to a standard pose, which allows us to disregard the problem of the different poses a person can assume. This technique is also used for composing 3D models which are then used at testing time for matching unseen point clouds. We test the proposed approaches on an existing RGB-D re-identification dataset and on the newly built BIWI RGBD-ID dataset. This dataset provides sequences of RGB, depth, and skeleton data for 50 people in two different scenarios and it has been made publicly available to foster advancement in this new research branch.

98 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work introduces Nearest Class Mean Forests (NCMF), a variant of Random Forests where the decision nodes are based on nearest class mean (NCM) classification, and demonstrates that NCMFs not only outperform conventional random forests, but are also well suited for integrating new classes.
Abstract: In recent years, large image data sets such as "ImageNet", "TinyImages" or ever-growing social networks like "Flickr" have emerged, posing new challenges to image classification that were not apparent in smaller image sets. In particular, the efficient handling of dynamically growing data sets, where not only the amount of training images, but also the number of classes increases over time, is a relatively unexplored problem. To remedy this, we introduce Nearest Class Mean Forests (NCMF), a variant of Random Forests where the decision nodes are based on nearest class mean (NCM) classification. NCMFs not only outperform conventional random forests, but are also well suited for integrating new classes. To this end, we propose and compare several approaches to incorporate data from new classes, so as to seamlessly extend the previously trained forest instead of re-training it from scratch. In our experiments, we show that NCMFs trained on small data sets with 10 classes can be extended to large data sets with 1000 classes without significant loss of accuracy compared to training from scratch on the full data.
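
A minimal sketch of the NCM idea inside a forest node: samples are routed toward the branch owning the nearest class centroid, so integrating a new class amounts to inserting one centroid. The structure and names are illustrative, assuming centroids are estimated elsewhere:

```python
import numpy as np

class NCMNode:
    """Decision node of a Nearest Class Mean Forest: the node stores
    class centroids and a left/right assignment of classes; a sample
    follows the branch of its nearest centroid's class."""
    def __init__(self, means, left_classes, left, right):
        self.means = means                  # dict: class label -> centroid
        self.left_classes = set(left_classes)
        self.left, self.right = left, right  # child nodes (or leaves)

    def route(self, x):
        nearest = min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))
        return self.left if nearest in self.left_classes else self.right

    def add_class(self, label, centroid, go_left):
        # integrating a new class = inserting one centroid, no retraining
        self.means[label] = centroid
        if go_left:
            self.left_classes.add(label)
```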

Journal ArticleDOI
TL;DR: This work introduces parts-dependent body joint regressors, random forests that operate over two layers, which outperform independent classifiers or regressors and perform better than or similarly to the state-of-the-art in terms of accuracy, while running at a couple of frames per second.
Abstract: In this work, we address the problem of estimating 2D human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts-dependent body joint regressors which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel data set termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts-dependent joint regressors outperform independent classifiers or regressors. The method also performs better or similar to the state-of-the-art in terms of accuracy, while running at a couple of frames per second.

Journal ArticleDOI
TL;DR: The Weighted Collaborative Representation Classifier (WCRC) improves the classification performance over that of the original formulation, while keeping the simplicity and the speed of the original CRC-RLS formulation.

Journal ArticleDOI
TL;DR: Current state-of-the-art visualization technologies are mainly fully virtual, while AR has the potential to enhance those visualizations by observing proposed designs directly within the real environment.
Abstract: Augmented Reality (AR) is a rapidly developing field with numerous potential applications. For example, building developers, public authorities, and other construction industry stakeholders need to visually assess potential new developments with regard to aesthetics, health and safety, and other criteria. Current state-of-the-art visualization technologies are mainly fully virtual, while AR has the potential to enhance those visualizations by observing proposed designs directly within the real environment. A novel AR system is presented that is most appropriate for urban applications. It is based on monocular vision, is markerless, and does not rely on beacon-based localization technologies (like GPS) or inertial sensors. Additionally, the system automatically calculates occlusions of the built environment on the augmenting virtual objects. Three datasets from real environments presenting different levels of complexity (geometrical complexity, textures, occlusions) are used to demonstrate the performance of the proposed system. Videos augmented with our system are shown to provide realistic and valuable visualizations of proposed changes of the urban environment.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work presents a novel approach for producing dense reconstructions from multiple images and from the underlying sparse Structure-from-Motion (SfM) data in an efficient way; it assumes piecewise planarity of man-made scenes and exploits both sparse visibility and a fast over-segmentation of the images.
Abstract: State-of-the-art Multi-View Stereo (MVS) algorithms deliver dense depth maps or complex meshes with very high detail, and redundancy over regular surfaces. In turn, our interest lies in an approximate but light-weight method that is better suited to large-scale applications, such as urban scene reconstruction from ground-based images. We present a novel approach for producing dense reconstructions from multiple images and from the underlying sparse Structure-from-Motion (SfM) data in an efficient way. To overcome the problem of SfM sparsity and textureless areas, we assume piecewise planarity of man-made scenes and exploit both sparse visibility and a fast over-segmentation of the images. Reconstruction is formulated as an energy-driven, multi-view plane assignment problem, which we solve jointly over superpixels from all views while avoiding expensive photoconsistency computations. The resulting planar primitives, defined by detailed superpixel boundaries, are computed in about 10 seconds per image.

Proceedings ArticleDOI
24 Mar 2014
TL;DR: Adding scale-invariance to line descriptors increases the accuracy when confronted with big scale changes and increases the number of inliers in the general case, both resulting in smaller calibration errors by means of RANSAC-like techniques and epipolar estimations.
Abstract: In this paper we propose a method to add scale-invariance to line descriptors for wide baseline matching purposes. While finding point correspondences among different views is a well-studied problem, there still remain difficult cases where it performs poorly, such as textureless scenes, ambiguities and extreme transformations. For these cases using line segment correspondences is a valuable addition for finding sufficient matches. Our general method for adding scale-invariance to line segment descriptors consists of 5 basic rules. We apply these rules to enhance both the line descriptor described by Bay et al. [1] and the mean-standard deviation line descriptor (MSLD) proposed by Wang et al. [14]. Moreover, we examine the effect of the line descriptors when combined with the topological filtering method proposed by Bay et al. and the recently proposed graph matching strategy from K-VLD [6]. We validate the method using standard point correspondence benchmarks and more challenging new ones. Adding scale-invariance increases the accuracy when confronted with big scale changes and increases the number of inliers in the general case, both resulting in smaller calibration errors by means of RANSAC-like techniques and epipolar estimations.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work is the first attempt to quantify this image property, and it is found that texture synthesizability can be learned and predicted and used to trim images to parts that are more synthesizable.
Abstract: Example-based texture synthesis (ETS) has been widely used to generate high quality textures of desired sizes from a small example. However, not all textures are equally well reproducible that way. We predict how synthesizable a particular texture is by ETS. We introduce a dataset (21,302 textures) of which all images have been annotated in terms of their synthesizability. We design a set of texture features, such as 'textureness', homogeneity, repetitiveness, and irregularity, and train a predictor using these features on the data collection. This work is the first attempt to quantify this image property, and we find that texture synthesizability can be learned and predicted. We use this insight to trim images to parts that are more synthesizable. Also we suggest which texture synthesis method is best suited to synthesise a given texture. Our approach can be seen as 'winner-uses-all': picking one method among several alternatives, ending up with an overall superior ETS method. Such a strategy could also be considered for other vision tasks: rather than building an even stronger method, choose from existing methods based on some simple preprocessing.
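
Since synthesizability is annotated per texture, learning it reduces to ordinary supervised regression from texture features to scores; a minimal scikit-learn sketch in which the feature extraction (describe()) is a hypothetical placeholder for the paper's 'textureness', homogeneity, repetitiveness and irregularity measures:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_synthesizability_predictor(features, scores):
    """features: (n_textures, n_features) texture descriptors,
    scores: human-annotated synthesizability values in [0, 1]."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(features, scores)
    return model

# usage: rank candidate crops of an image and keep the most synthesizable
# crop_feats = np.stack([describe(c) for c in crops])  # hypothetical describe()
# best_crop = crops[int(np.argmax(model.predict(crop_feats)))]
```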

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper addresses the problem of personalization in the context of gesture recognition and proposes a novel and extremely efficient approach: a set of classifiers is learned during training, one of which is selected for each test subject based on the personalization data.
Abstract: Human gestures, similar to speech and handwriting, are often unique to the individual. Training a generic classifier applicable to everyone can be very difficult and as such, it has become a standard to use personalized classifiers in speech and handwriting recognition. In this paper, we address the problem of personalization in the context of gesture recognition, and propose a novel and extremely efficient way of doing personalization. Unlike conventional personalization methods which learn a single classifier that later gets adapted, our approach learns a set (portfolio) of classifiers during training, one of which is selected for each test subject based on the personalization data. We formulate classifier personalization as a selection problem and propose several algorithms to compute the set of candidate classifiers. Our experiments show that such an approach is much more efficient than adapting the classifier parameters but can still achieve comparable or better results.
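
The personalization step is then a pure selection problem; a one-line sketch assuming scikit-learn-style classifiers with a .score() accuracy method:

```python
def personalize(portfolio, X_personal, y_personal):
    """Pick the classifier from the pre-trained portfolio that scores
    highest on the subject's personalization data; no classifier
    parameters are adapted, which makes this step very cheap."""
    return max(portfolio, key=lambda clf: clf.score(X_personal, y_personal))
```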

Proceedings ArticleDOI
01 Jan 2014
Authors: Massimo Mauro (1, m.mauro001@unibs.it), Hayko Riemenschneider (2, http://www.vision.ee.ethz.ch/~rhayko/), Alberto Signoroni (1, http://www.ing.unibs.it/~signoron/), Riccardo Leonardi (1, http://www.ing.unibs.it/~leon/), Luc Van Gool (2, http://www.vision.ee.ethz.ch/~vangool/); (1) Department of Information Engineering, University of Brescia, Brescia, Italy; (2) Computer Vision Lab, Swiss Federal Institute of Technology, Zurich, Switzerland

Proceedings ArticleDOI
08 Dec 2014
TL;DR: This work introduces a way of capturing the semantic scene context of a key point into a compact description and proposes to learn the correct matchability of descriptors from these semantic contexts.
Abstract: Image-to-image feature matching is the single most restrictive time bottleneck in any matching pipeline. We propose two methods for improving the speed and quality by employing semantic scene segmentation. First, we introduce a way of capturing the semantic scene context of a key point into a compact description. Second, we propose to learn the correct matchability of descriptors from these semantic contexts. Finally, we further reduce the complexity of matching to only a pre-computed set of semantically close key points. All methods can be used independently, and in the evaluation we show combinations for maximum speed benefits. Overall, our proposed methods outperform all baselines and provide significant improvements in accuracy and an order of magnitude faster key point matching.

Journal ArticleDOI
TL;DR: A novel scale-invariant image feature detection algorithm (D-SIFER) using a newly proposed scale-space optimal 10th-order Gaussian derivative (GDO-10) filter, which reaches the jointly optimal Heisenberg's uncertainty of its impulse response in scale and space simultaneously.
Abstract: We present a novel scale-invariant image feature detection algorithm (D-SIFER) using a newly proposed scale-space optimal 10th-order Gaussian derivative (GDO-10) filter, which reaches the jointly optimal Heisenberg's uncertainty of its impulse response in scale and space simultaneously (i.e., we minimize the maximum of the two moments). The D-SIFER algorithm using this filter leads to an outstanding quality of image feature detection, with a factor of three quality improvement over state-of-the-art scale-invariant feature transform (SIFT) and speeded up robust features (SURF) methods that use the second-order Gaussian derivative filters. To reach low computational complexity, we also present a technique approximating the GDO-10 filters with a fixed-length implementation, which is independent of the scale. The final approximation error remains far below the noise margin, providing constant time, low cost, but nevertheless high-quality feature detection and registration capabilities. D-SIFER is validated on a real-life hyperspectral image registration application, precisely aligning up to hundreds of successive narrowband color images, despite their strong artifacts (blurring, low-light noise) typically occurring in such delicate optical system setups.

Proceedings ArticleDOI
Ralf Dragon, Luc Van Gool
23 Jun 2014
TL;DR: The problem of estimating the ground plane orientation and location in monocular video sequences from a moving observer is formulated as a state-continuous Hidden Markov Model (HMM) where the hidden state contains t and n and may be estimated by sampling and decomposing homographies.
Abstract: We focus on the problem of estimating the ground plane orientation and location in monocular video sequences from a moving observer. Our only assumptions are that the 3D ego motion t and the ground plane normal n are orthogonal, and that n and t are smooth over time. We formulate the problem as a state-continuous Hidden Markov Model (HMM) where the hidden state contains t and n and may be estimated by sampling and decomposing homographies. We show that using blocked Gibbs sampling, we can infer the hidden state with high robustness towards outliers, drifting trajectories, rolling shutter and an imprecise intrinsic calibration. Since our approach does not need any initial orientation prior, it works for arbitrary camera orientations in which the ground is visible.

Book ChapterDOI
06 Sep 2014
TL;DR: A novel tracking algorithm is proposed that can track highly non-rigid targets accurately and robustly and outperforms state-of-the-art methods, using a new bounding box representation called the Double Bounding Box (DBB).
Abstract: A novel tracking algorithm that can track a highly non-rigid target robustly is proposed using a new bounding box representation called the Double Bounding Box (DBB). In the DBB, a target is described by the combination of the Inner Bounding Box (IBB) and the Outer Bounding Box (OBB). Then our objective of visual tracking is changed to finding the IBB and OBB instead of a single bounding box, where the IBB and OBB can be easily obtained by the Dempster-Shafer (DS) theory. If the target is highly non-rigid, no single bounding box can include all foreground regions while excluding all background regions. Using the DBB, our method does not directly handle the ambiguous regions, which include both the foreground and background regions. Hence, it can solve the inherent ambiguity of the single bounding box representation and thus can track highly non-rigid targets robustly. Our method finally finds the best state of the target using a new Constrained Markov Chain Monte Carlo (CMCMC)-based sampling method with the constraint that the OBB should include the IBB. Experimental results show that our method tracks non-rigid targets accurately and robustly, and outperforms state-of-the-art methods.

Book ChapterDOI
06 Sep 2014
TL;DR: In this paper, the authors incorporate temporal constraints into the image-based registration setting and solve the problem by pose regularization with model fitting and smoothing methods, which leads to accurate, gap-free and smooth poses for all frames.
Abstract: Registering image data to Structure from Motion (SfM) point clouds is widely used to find precise camera location and orientation with respect to a world model. In case of videos one constraint has previously been unexploited: temporal smoothness. Without temporal smoothness the magnitude of the pose error in each frame of a video will often dominate the magnitude of frame-to-frame pose change. This hinders application of methods requiring stable pose estimates (e.g. tracking, augmented reality). We incorporate temporal constraints into the image-based registration setting and solve the problem by pose regularization with model fitting and smoothing methods. This leads to accurate, gap-free and smooth poses for all frames. We evaluate different methods on challenging synthetic and real street-view SfM data for varying scenarios of motion speed, outlier contamination, pose estimation failures and 2D-3D correspondence noise. For all test cases a 2- to 60-fold reduction in root mean squared (RMS) positional error is observed, depending on pose estimation difficulty. For varying scenarios, different methods perform best. We give guidance on which methods should be preferred depending on circumstances and requirements.
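
One of the simpler smoothing regularizers can be sketched as Savitzky-Golay filtering of the per-frame camera centers; the window and polynomial order below are illustrative, and the paper compares several model-fitting and smoothing variants beyond this:

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_camera_positions(positions, window=11, order=3):
    """positions: (n_frames, 3) camera centers from per-frame
    registration; returns a temporally smooth trajectory. The window
    must be odd and at most n_frames; frames with failed registration
    should be interpolated beforehand to keep the result gap-free."""
    return savgol_filter(positions, window_length=window,
                         polyorder=order, axis=0)
```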

BookDOI
TL;DR: In this paper, a tracking-by-detection algorithm for multiple targets from multiple dynamic, unlocalized and unconstrained cameras is proposed, which can effectively deal with independently moving cameras and camera registration noise.
Abstract: We propose a new tracking-by-detection algorithm for multiple targets from multiple dynamic, unlocalized and unconstrained cameras. In the past, tracking has been done either with multiple static cameras or with single and stereo dynamic cameras. We register several moving cameras using a given 3D model from Structure from Motion (SfM), and initialize the tracking given the registration. The camera uncertainty estimate can be efficiently incorporated into a flow-network formulation for tracking. As this is a novel task in the tracking domain, we evaluate our method on a new challenging dataset for tracking with multiple moving cameras and show that our tracking method can effectively deal with independently moving cameras and camera registration noise.

Book ChapterDOI
02 Sep 2014
TL;DR: This work proposes a solution based on a single video camera that is not only far less intrusive but also a lot cheaper, and that outperforms current motion segmentation and tracking approaches for Cerebral Palsy detection.
Abstract: Motions of organs or extremities are important features for clinical diagnosis. However, tracking and segmentation of complex, quickly changing motion patterns is challenging, certainly in the presence of occlusions. Neither state-of-the-art tracking nor motion segmentation approaches are able to deal with such cases. Thus far, motion capture systems or the like were needed, which are complicated to handle and which influence the movements. We propose a solution based on a single video camera, that is not only far less intrusive, but also a lot cheaper. The limitations of tracking and motion segmentation are overcome by a new approach that integrates prior knowledge in the form of weak labeling into motion segmentation. Using the example of Cerebral Palsy detection, we segment motion patterns of infants into the different body parts by analyzing body movements. Our experimental results show that our approach outperforms current motion segmentation and tracking approaches.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work extends the mixtures from trees to more general loopy graphs and can localize facial points with an accuracy similar to fully supervised approaches without any facial point annotation at the level of individual training images.
Abstract: Face detection and facial points localization are interconnected tasks. Recently it has been shown that solving these two tasks jointly with a mixture of trees of parts (MTP) leads to state-of-the-art results. However, MTP, as most other methods for facial point localization proposed so far, requires a complete annotation of the training data at facial point level. This is used to predefine the structure of the trees and to place the parts correctly. In this work we extend the mixtures from trees to more general loopy graphs. In this way we can learn in a weakly supervised manner (using only the face location and orientation) a powerful deformable detector that implicitly aligns its parts to the detected face in the image. By attaching some reference points to the correct parts of our detector we can then localize the facial points. In terms of detection our method clearly outperforms the state-of-the-art, even when competing with methods that use facial point annotations during training. Additionally, without any facial point annotation at the level of individual training images, our method can localize facial points with an accuracy similar to fully supervised approaches.

Proceedings ArticleDOI
08 Dec 2014
TL;DR: A novel formulation for view selection is proposed, where cameras are modeled with binary variables, while the linear constraints enforce the completeness of the 3D reconstruction, and the solution of the ILP leads to an optimal subset of selected cameras.
Abstract: Multi-View Stereo (MVS) algorithms scale poorly on large image sets, and quickly become unfeasible to run on a single machine with limited memory. Typical solutions to lower the complexity include reducing the redundancy of the image set (view selection), and dividing the image set into groups to be processed independently (view clustering). A novel formulation for view selection is proposed here. We express the problem with an Integer Linear Programming (ILP) model, where cameras are modeled with binary variables, while the linear constraints enforce the completeness of the 3D reconstruction. The solution of the ILP leads to an optimal subset of selected cameras. As a second contribution, we integrate ILP camera selection with a view clustering approach which exploits Leveraged Affinity Propagation (LAP). LAP clustering can efficiently deal with large camera sets. We adapt the original algorithm so that it provides a set of overlapping clusters where the minimum and maximum sizes and the number of overlapping cameras can be specified. Evaluations on four different datasets show our solution provides significant complexity reductions and guarantees near-perfect coverage, making large reconstructions feasible even on a single machine.
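
The view-selection ILP can be sketched as a set-cover program, with one binary variable per camera and one coverage constraint per sparse 3D point; a minimal version with SciPy's milp (the paper's completeness constraints are richer than plain coverage counting):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def select_cameras(visibility, min_cover=1):
    """visibility: (n_points, n_cams) boolean matrix, True where a
    camera observes a sparse 3D point. Minimizes the number of kept
    cameras while every point stays covered min_cover times."""
    n_cams = visibility.shape[1]
    res = milp(
        c=np.ones(n_cams),                       # minimize sum of x_j
        constraints=LinearConstraint(visibility.astype(float),
                                     lb=min_cover, ub=np.inf),
        integrality=np.ones(n_cams),             # x_j integer
        bounds=Bounds(0, 1),                     # x_j binary
    )
    # res.x is None when the program is infeasible
    return np.flatnonzero(res.x > 0.5)           # indices of kept cameras
```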

Proceedings Article
08 Dec 2014
TL;DR: A simple and flexible family of non-linear kernels which are arbitrary kernels in the index space of a data quantizer, i.e., piecewise constant similarities in the original feature space that grant access to Euclidean geometry for uncompressed features are introduced.
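
A minimal sketch of a quantized kernel: each feature dimension is quantized into a few bins and the kernel value is a sum of per-dimension lookup-table entries, i.e., an arbitrary kernel in index space that is piecewise constant in feature space. The table values would be learned; here they are a free input, and the shared bin edges are a simplification:

```python
import numpy as np

def quantize(x, edges):
    """Map each feature dimension to a bin index (same edges for all
    dimensions, for brevity); n_bins = len(edges) + 1."""
    return np.searchsorted(edges, x)

def quantized_kernel(x, y, edges, table):
    """K(x, y) = sum_d table[d, q(x_d), q(y_d)]: an arbitrary kernel in
    the quantizer's index space, hence piecewise constant in feature
    space. table has shape (n_dims, n_bins, n_bins), learned elsewhere."""
    qx, qy = quantize(x, edges), quantize(y, edges)
    d = np.arange(len(x))
    return table[d, qx, qy].sum()
```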
Abstract: Matching local visual features is a crucial problem in computer vision and its accuracy greatly depends on the choice of similarity measure. As it is generally very difficult to design by hand a similarity or a kernel perfectly adapted to the data of interest, learning it automatically with as few assumptions as possible is preferable. However, available techniques for kernel learning suffer from several limitations, such as restrictive parametrization or scalability. In this paper, we introduce a simple and flexible family of non-linear kernels which we refer to as Quantized Kernels (QK). QKs are arbitrary kernels in the index space of a data quantizer, i.e., piecewise constant similarities in the original feature space. Quantization allows to compress features and keep the learning tractable. As a result, we obtain state-of-the-art matching performance on a standard benchmark dataset with just a few bits to represent each feature dimension. QKs also have explicit non-linear, low-dimensional feature mappings that grant access to Euclidean geometry for uncompressed features.