Papers by Luc Van Gool published in 2015


Journal ArticleDOI
TL;DR: A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
Abstract: The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008–2012. The paper is intended for two audiences: algorithm designers, researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms; and, challenge designers, who want to see what we as organisers have learnt from the process and our recommendations for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant or not; a normalised average precision so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms so that the hard and easy images can be identified; and the use of a joint classifier over the submitted algorithms in order to measure their complementarity and combined performance. We also analyse the community's progress through time using the methods of Hoiem et al. (Proceedings of European Conference on Computer Vision, 2012) to identify the types of occurring errors. We conclude the paper with an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
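
The bootstrap test mentioned above is easy to sketch. Below is a minimal paired bootstrap over per-image scores (a simplification: the actual VOC analysis resamples test images and recomputes average precision for both methods on each resample; all names are illustrative):

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_boot=10_000, seed=0):
    """Test whether method A beats method B by resampling test images.
    scores_a/scores_b: numpy arrays of per-image performance values."""
    rng = np.random.default_rng(seed)
    n = len(scores_a)
    diffs = np.empty(n_boot)
    for t in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample images with replacement
        diffs[t] = scores_a[idx].mean() - scores_b[idx].mean()
    lo, hi = np.percentile(diffs, [2.5, 97.5])  # 95% confidence interval
    return lo, hi, (lo > 0) or (hi < 0)         # significant if CI excludes 0
```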

6,061 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: The proposed method, Deep EXpectation (DEX) of apparent age, first detects the face in the test image and then extracts the CNN predictions from an ensemble of 20 networks on the cropped face, significantly outperforming the human reference.
Abstract: In this paper we tackle the estimation of apparent age in still face images with deep learning. Our convolutional neural networks (CNNs) use the VGG-16 architecture [13] and are pretrained on ImageNet for image classification. In addition, due to the limited number of apparent age annotated images, we explore the benefit of finetuning over crawled Internet face images with available age. We crawled 0.5 million images of celebrities from IMDB and Wikipedia that we make public. This is the largest public dataset for age prediction to date. We pose the age regression problem as a deep classification problem followed by a softmax expected value refinement and show improvements over direct regression training of CNNs. Our proposed method, Deep EXpectation (DEX) of apparent age, first detects the face in the test image and then extracts the CNN predictions from an ensemble of 20 networks on the cropped face. The CNNs of DEX were finetuned on the crawled images and then on the provided images with apparent age annotations. DEX does not use explicit facial landmarks. Our DEX is the winner (1st place) of the ChaLearn LAP 2015 challenge on apparent age estimation with 115 registered teams, significantly outperforming the human reference.
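
The softmax expected value refinement is compact enough to show directly. A minimal sketch, assuming the network outputs one logit per discrete age bin (the paper's apparent-age setup covers ages 0-100):

```python
import numpy as np

def dex_expected_age(logits, ages=np.arange(0, 101)):
    """Deep EXpectation: treat age estimation as classification over discrete
    age bins, then refine by taking the softmax-weighted expected value."""
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return float((p * ages).sum())      # E[age] under the predicted distribution
```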

603 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A new method is introduced that uses a supervised approach in order to learn the importance of global characteristics of a summary and jointly optimizes for multiple objectives, thus creating summaries that possess multiple properties of a good summary.
Abstract: We present a novel method for summarizing raw, casually captured videos. The objective is to create a short summary that still conveys the story. It should thus be both interesting and representative of the input video. Previous methods often used simplified assumptions and only optimized for one of these goals. Alternatively, they used hand-defined objectives that were optimized sequentially by making consecutive hard decisions. This limits their use to a particular setting. Instead, we introduce a new method that (i) uses a supervised approach in order to learn the importance of global characteristics of a summary and (ii) jointly optimizes for multiple objectives and thus creates summaries that possess multiple properties of a good summary. Experiments on two challenging and very diverse datasets demonstrate the effectiveness of our method, where we outperform or match the current state of the art.

452 citations


Posted Content
TL;DR: The Improved A+ (IA) method sets new state-of-the-art results, outperforming A+ by up to 0.9 dB in average PSNR whilst maintaining a low time complexity.
Abstract: In this paper we present seven techniques that everybody should know to improve example-based single image super resolution (SR): 1) augmentation of data, 2) use of large dictionaries with efficient search structures, 3) cascading, 4) image self-similarities, 5) back projection refinement, 6) enhanced prediction by consistency check, and 7) context reasoning. We validate our seven techniques on standard SR benchmarks (i.e. Set5, Set14, B100) and methods (i.e. A+, SRCNN, ANR, Zeyde, Yang) and achieve substantial improvements. The techniques are widely applicable and require no changes or only minor adjustments of the SR methods. Moreover, our Improved A+ (IA) method sets new state-of-the-art results, outperforming A+ by up to 0.9 dB in average PSNR whilst maintaining a low time complexity.
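
Technique 6, enhanced prediction by consistency check, averages the super-resolved outputs over the eight dihedral transforms of the input. A minimal sketch, where `upscale` stands for any base SR method (a hypothetical callable, not a specific implementation from the paper):

```python
import numpy as np

def enhanced_prediction(lr_img, upscale):
    """Average SR outputs over the 8 dihedral transforms (4 rotations x flip),
    undoing each transform on the output before averaging."""
    outputs = []
    for k in range(4):
        for flip in (False, True):
            t = np.rot90(lr_img, k)
            if flip:
                t = np.fliplr(t)
            sr = upscale(t)                      # apply the base SR method
            if flip:
                sr = np.fliplr(sr)               # undo the flip first
            outputs.append(np.rot90(sr, -k))     # then undo the rotation
    return np.mean(outputs, axis=0)
```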

290 citations


Journal ArticleDOI
TL;DR: A robust and fast-to-evaluate energy function is defined, based on enforcing color similarity between the boundaries and the superpixel color histogram, which is able to achieve performance comparable to the state-of-the-art, but in real time on a single Intel i7 CPU at 2.8 GHz.
Abstract: Superpixel algorithms aim to over-segment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions to enforce color homogeneity. The optimization is accomplished by sophisticated methods that progressively build the superpixels, typically by adding cuts or growing superpixels. As a result, they are computationally too expensive for real-time applications. We introduce a new approach based on a simple hill-climbing optimization. Starting from an initial superpixel partitioning, it continuously refines the superpixels by modifying the boundaries. We define a robust and fast-to-evaluate energy function, based on enforcing color similarity between the boundaries and the superpixel color histogram. In a series of experiments, we show that we achieve an excellent compromise between accuracy and efficiency. We are able to achieve performance comparable to the state-of-the-art, but in real time on a single Intel i7 CPU at 2.8 GHz.
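
A much-simplified sketch of the boundary hill-climbing idea: move a boundary pixel to an adjacent superpixel when its quantized color is better represented in that superpixel's color histogram. The actual method also enforces superpixel connectivity and uses coarser block-level moves, both omitted here:

```python
import numpy as np

def hill_climb_superpixels(labels, bins, n_sp, n_bins, n_iter=4):
    """Refine an initial partitioning by histogram-driven boundary moves.
    labels: (h, w) initial superpixel ids in [0, n_sp), modified in place
    bins:   (h, w) per-pixel quantized color index in [0, n_bins)"""
    h, w = labels.shape
    hist = np.zeros((n_sp, n_bins))                       # per-superpixel color histograms
    np.add.at(hist, (labels.ravel(), bins.ravel()), 1)
    size = hist.sum(axis=1)
    for _ in range(n_iter):
        for y in range(h):
            for x in range(w):
                s, b = labels[y, x], bins[y, x]
                best, best_score = s, hist[s, b] / size[s]
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] != s:
                        t = labels[ny, nx]
                        score = hist[t, b] / size[t]      # how well t's histogram explains this color
                        if score > best_score:
                            best, best_score = t, score
                if best != s and size[s] > 1:             # keep superpixels non-empty
                    hist[s, b] -= 1; size[s] -= 1
                    hist[best, b] += 1; size[best] += 1
                    labels[y, x] = best
    return labels
```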

168 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: It is shown that a properly trained pure-3D approach produces high-quality labelings, with significant speed benefits allowing entire streets to be analyzed in a matter of minutes, and a novel facade separation based on semantic nuances between facades is proposed.
Abstract: We propose a new approach for semantic segmentation of 3D city models. Starting from an SfM reconstruction of a street-side scene, we perform classification and facade splitting purely in 3D, obviating the need for slow image-based semantic segmentation methods. We show that a properly trained pure-3D approach produces high quality labelings, with significant speed benefits (20x faster) allowing us to analyze entire streets in a matter of minutes. Additionally, if speed is not of the essence, the 3D labeling can be combined with the results of a state-of-the-art 2D classifier, further boosting the performance. Further, we propose a novel facade separation based on semantic nuances between facades. Finally, inspired by the use of architectural principles for 2D facade labeling, we propose new 3D-specific principles and an efficient optimization scheme based on an integer quadratic programming formulation.

148 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this article, an inverse coarse-to-fine cascade is proposed to select the most promising object locations and refine their boxes in a coarse-to-fine manner, which combines the best of both worlds.
Abstract: In this paper we evaluate the quality of the activation layers of a convolutional neural network (CNN) for the generation of object proposals. We generate hypotheses in a sliding-window fashion over different activation layers and show that the final convolutional layers can find the object of interest with high recall but poor localization due to the coarseness of the feature maps. Instead, the first layers of the network can better localize the object of interest but with a reduced recall. Based on this observation we design a method for proposing object locations that is based on CNN features and that combines the best of both worlds. We build an inverse cascade that, going from the final to the initial convolutional layers of the CNN, selects the most promising object locations and refines their boxes in a coarse-to-fine manner. The method is efficient, because i) it uses the same features extracted for detection, ii) it aggregates features using integral images, and iii) it avoids a dense evaluation of the proposals due to the inverse coarse-to-fine cascade. The method is also accurate, it outperforms most of the previously proposed object proposals approaches and when plugged into a CNN-based detector produces state-of-the-art detection performance.
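
The integral-image aggregation in point ii) is the standard constant-time box-sum trick; a minimal single-channel sketch:

```python
import numpy as np

def integral_image(feat):
    """Cumulative sums with a zero row/column so box sums need 4 lookups."""
    ii = np.zeros((feat.shape[0] + 1, feat.shape[1] + 1))
    ii[1:, 1:] = feat.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of feat[y0:y1, x0:x1] in O(1), independent of box size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```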

111 citations


Posted Content
TL;DR: The first comprehensive study and analysis of the usefulness of image super-resolution for other vision applications is presented, covering edge detection, semantic image segmentation, digit recognition, and scene recognition.
Abstract: Despite the great advances made in the field of image super-resolution (ISR) in recent years, the performance has merely been evaluated perceptually. Thus, it is still unclear whether ISR is helpful for other vision tasks. In this paper, we present the first comprehensive study and analysis of the usefulness of ISR for other vision applications. In particular, six ISR methods are evaluated on four popular vision tasks, namely edge detection, semantic image segmentation, digit recognition, and scene recognition. We show that applying ISR to input images of other vision systems does improve their performance when the input images are of low resolution. We also study the correlation between four standard perceptual evaluation criteria (namely PSNR, SSIM, IFC, and NQM) and the usefulness of ISR to the vision tasks. Experiments show that they correlate well with each other in general, but perceptual criteria are still not accurate enough to be used as full proxies for usefulness. We hope this work will inspire the community to evaluate ISR methods also in real vision applications, and to adopt ISR as a pre-processing step of other vision tasks if the resolution of their input images is low.
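
Of the four perceptual criteria, PSNR is the simplest to restate; a minimal sketch for 8-bit images:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```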

74 citations


Proceedings ArticleDOI
05 Jan 2015
TL;DR: This work first extracts sparse pixel correspondences by means of a matching procedure and then applies a variational approach to obtain a refined optical flow, coined 'SparseFlow', which is competitive on standard optical flow benchmarks with large displacements, while showing excellent performance for small and medium displacements.
Abstract: Despite recent advances, the extraction of optical flow with large displacements is still challenging for state-of-the-art methods. The approaches that are the most successful at handling large displacements blend sparse correspondences from a matching algorithm with an optimization that refines the optical flow. We follow the scheme of DeepFlow [33]. We first extract sparse pixel correspondences by means of a matching procedure and then apply a variational approach to obtain a refined optical flow. In our approach, coined 'SparseFlow', the novelty lies in the matching. This uses an efficient sparse decomposition of a pixel's surrounding patch as a linear sum of those found around candidate corresponding pixels. The pixel dominating the decomposition is chosen as the match. The pixel pairs matching in both directions, i.e. in a forward-backward fashion, are used as guiding points in the variational approach. SparseFlow is competitive on standard optical flow benchmarks with large displacements, while showing excellent performance for small and medium displacements. Moreover, it is fast in comparison to methods with a similar performance.
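
A toy version of the matching step: decompose the query patch over the candidate patches and keep the candidate that dominates the decomposition. Ridge regression stands in for the paper's sparse decomposition here, so this is an illustrative simplification rather than the actual SparseFlow matcher:

```python
import numpy as np

def match_patch(query, candidates, lam=0.1):
    """Return the index of the candidate patch dominating the decomposition.
    query: (d,) flattened patch; candidates: (n, d) flattened patches."""
    A = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + 1e-12)
    q = query / (np.linalg.norm(query) + 1e-12)
    # min ||q - A^T c||^2 + lam ||c||^2  ->  (A A^T + lam I) c = A q
    c = np.linalg.solve(A @ A.T + lam * np.eye(len(A)), A @ q)
    return int(np.argmax(np.abs(c)))     # dominating candidate wins
```

Only pairs that also match in the backward direction would then be kept as guiding points for the variational refinement.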

73 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work proposes a novel surface reconstruction method based on image edges, superpixels and second-order smoothness constraints, producing meshes comparable to classic MVS surfaces in quality but orders of magnitude faster.
Abstract: Multi-View-Stereo (MVS) methods aim for the highest detail possible, however, such detail is often not required. In this work, we propose a novel surface reconstruction method based on image edges, superpixels and second-order smoothness constraints, producing meshes comparable to classic MVS surfaces in quality but orders of magnitude faster. Our method performs per-view dense depth optimization directly over sparse 3D Ground Control Points (GCPs), hence removing the need for view pairing, image rectification, and stereo depth estimation, and allowing for full per-image parallelization. We use Structure-from-Motion (SfM) points as GCPs, but the method is not specific to these, e.g. LiDAR or RGB-D data can also be used. The resulting meshes are compact and inherently edge-aligned with image gradients, enabling good-quality lightweight per-face flat renderings. Our experiments demonstrate superior speed and competitive surface quality on a variety of 3D datasets.
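
The core step, per-view dense depth from sparse GCPs under a second-order smoothness prior, can be sketched as a sparse least-squares problem. This is a toy stand-in for the paper's edge- and superpixel-aware formulation; all names are hypothetical:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def densify_depth(h, w, gcp_rc, gcp_depth, lam=1.0):
    """Fit a dense (h, w) depth map to sparse GCP depths under a
    second-order (Laplacian) smoothness prior.
    gcp_rc: (m, 2) int array of GCP pixel rows/cols; gcp_depth: (m,)."""
    gcp_rc = np.asarray(gcp_rc)
    n = h * w
    idx = np.arange(n).reshape(h, w)
    # Data term: depth at GCP pixels should match the GCP depths
    gi = idx[gcp_rc[:, 0], gcp_rc[:, 1]]
    D = sp.coo_matrix((np.ones(len(gi)), (np.arange(len(gi)), gi)), shape=(len(gi), n))
    # Smoothness term: discrete Laplacian should be near zero at interior pixels
    rows, cols, vals, r = [], [], [], 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            rows += [r] * 5
            cols += [idx[y, x], idx[y - 1, x], idx[y + 1, x], idx[y, x - 1], idx[y, x + 1]]
            vals += [4.0, -1.0, -1.0, -1.0, -1.0]
            r += 1
    Lap = sp.coo_matrix((vals, (rows, cols)), shape=(r, n))
    A = sp.vstack([D, lam * Lap]).tocsr()
    b = np.r_[gcp_depth, np.zeros(r)]
    z = spla.lsqr(A, b)[0]               # least-squares dense depth
    return z.reshape(h, w)
```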

Posted Content
TL;DR: This work improves the state-of-the-art, but also predicts - based on someone's known preferences - how much that particular person is attracted to a novel face, and validates the collaborative filtering solution on the standard MovieLens rating dataset.
Abstract: For people, first impressions of someone are of determining importance. They are hard to alter through further information. This begs the question of whether a computer can reach the same judgement. Earlier research has already pointed out that age, gender, and average attractiveness can be estimated with reasonable precision. We improve the state-of-the-art, but also predict - based on someone's known preferences - how much that particular person is attracted to a novel face. Our computational pipeline comprises a face detector, convolutional neural networks for the extraction of deep features, standard support vector regression for gender, age and facial beauty, and - as the main novelties - visual regularized collaborative filtering to infer inter-person preferences as well as a novel regression technique for handling visual queries without rating history. We validate the method using a very large dataset from a dating site as well as images from celebrities. Our experiments yield convincing results, i.e. we predict 76% of the ratings correctly solely based on an image, and reveal some sociologically relevant conclusions. We also validate our collaborative filtering solution on the standard MovieLens rating dataset, augmented with movie posters, to predict an individual's movie rating. We demonstrate our algorithms on this http URL, which went viral around the Internet with more than 50 million pictures evaluated in the first month.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper studies the transition from the Pascal Visual Object Challenge dataset to the updated, bigger, and more challenging Microsoft Common Objects in Context, and proposes various lines of research to take advantage of the new benchmark and improve the techniques.
Abstract: Computer vision in general, and object proposals in particular, are nowadays strongly influenced by the databases on which researchers evaluate the performance of their algorithms. This paper studies the transition from the Pascal Visual Object Challenge dataset, which has been the benchmark of reference in recent years, to the updated, bigger, and more challenging Microsoft Common Objects in Context. We first review and deeply analyze the new challenges, and opportunities, that this database presents. We then survey the current state of the art in object proposals and evaluate it focusing on how it generalizes to the new dataset. In light of these results, we propose various lines of research to take advantage of the new benchmark and improve the techniques. We explore one of these lines, which leads to an improvement over the state of the art of +5.2%.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work investigates how coarse category labels can be used to improve the classification of subcategories, adopting the framework of Random Forests and proposing a regularized objective function that takes into account relations between categories and subcategories.
Abstract: The number of digital images is growing extremely rapidly, and so is the need for their classification. But, as more images of pre-defined categories become available, they also become more diverse and cover finer semantic differences. Ultimately, the categories themselves need to be divided into subcategories to account for that semantic refinement. Image classification in general has improved significantly over the last few years, but it still requires a massive amount of manually annotated data. Subdividing categories into subcategories multiplies the number of labels, aggravating the annotation problem. Hence, we can expect the annotations to be refined only for a subset of the already labeled data, and exploit coarser labeled data to improve classification. In this work, we investigate how coarse category labels can be used to improve the classification of subcategories. To this end, we adopt the framework of Random Forests and propose a regularized objective function that takes into account relations between categories and subcategories. Compared to approaches that disregard the extra coarse labeled data, we achieve a relative improvement in subcategory classification accuracy of up to 22% in our large-scale image classification experiments.
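
A toy version of such a regularized split objective: score a candidate Random Forest split by the information gain on fine labels plus a weighted gain on coarse labels, so that coarsely labeled data still shapes the trees. The paper's exact objective may differ; this only illustrates the idea:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def regularized_gain(fine_l, fine_r, coarse_l, coarse_r, lam=0.5):
    """Split score = gain on fine (subcategory) labels
                   + lam * gain on coarse (category) labels."""
    def gain(left, right):
        n = len(left) + len(right)
        parent = np.concatenate([left, right])
        return entropy(parent) - (len(left) / n) * entropy(left) \
                               - (len(right) / n) * entropy(right)
    return gain(fine_l, fine_r) + lam * gain(coarse_l, coarse_r)
```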

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Experiments show that MI is able to provide good metrics while avoiding expensive data labeling efforts and that it achieves state-of-the-art performance for image super-resolution.
Abstract: Metric learning has proved very successful. However, human annotations are necessary. In this paper, we propose an unsupervised method, dubbed Metric Imitation (MI), where metrics over cheap features (target features, TFs) are learned by imitating the standard metrics over more sophisticated, off-the-shelf features (source features, SFs) by transferring view-independent property manifold structures. In particular, MI consists of: 1) quantifying the properties of source metrics as manifold geometry, 2) transferring the manifold from the source domain to the target domain, and 3) learning a mapping of TFs so that the manifold is approximated as well as possible in the mapped feature domain. MI is useful in at least two scenarios: 1) where TFs are more efficient computationally and in terms of memory than SFs; and 2) where SFs contain privileged information, but are not available during testing. For the former, MI is evaluated on image clustering, category-based image retrieval, and instance-based object retrieval, with three SFs and three TFs. For the latter, MI is tested on the task of example-based image super-resolution, where high-resolution patches are taken as SFs and low-resolution patches as TFs. Experiments show that MI is able to provide good metrics while avoiding expensive data labeling efforts and that it achieves state-of-the-art performance for image super-resolution. In addition, manifold transfer is an interesting direction of transfer learning.
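
A simplified sketch of steps 1)-3) in the spirit of a locality-preserving projection: build a kNN affinity graph on the source features, then learn a linear map of the target features that preserves that manifold. This is an illustrative flavor, not the paper's exact formulation:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def metric_imitation_map(sf, tf, k=5, dim=16):
    """Learn a linear map W for cheap target features (tf) that preserves the
    kNN manifold of expensive source features (sf).
    sf: (n, d_s), tf: (n, d_t); returns W of shape (d_t, dim)."""
    n = len(sf)
    d = cdist(sf, sf)                        # source-domain distances
    A = np.zeros((n, n))
    nn = np.argsort(d, axis=1)[:, 1:k + 1]   # k nearest neighbors (skip self)
    for i in range(n):
        A[i, nn[i]] = np.exp(-d[i, nn[i]] ** 2 / (d[i, nn[i]].mean() ** 2 + 1e-12))
    A = np.maximum(A, A.T)                   # symmetrize the affinity
    D = np.diag(A.sum(axis=1))
    L = D - A                                # graph Laplacian of the SF manifold
    # min sum_ij A_ij ||W^T x_i - W^T x_j||^2  s.t.  W^T X^T D X W = I
    M1, M2 = tf.T @ L @ tf, tf.T @ D @ tf
    vals, vecs = eigh(M1, M2 + 1e-8 * np.eye(tf.shape[1]))
    return vecs[:, :dim]                     # smallest eigenvectors span the map
```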

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A novel vanishing point (VP) detection and tracking algorithm for calibrated monocular image sequences by combining VP extraction on a Gaussian sphere with recent advances in multi-target tracking on probabilistic occupancy fields is presented.
Abstract: We present a novel vanishing point (VP) detection and tracking algorithm for calibrated monocular image sequences. Previous VP detection and tracking methods usually assume known camera poses for all frames or detect and track separately. We advance the state-of-the-art by combining VP extraction on a Gaussian sphere with recent advances in multi-target tracking on probabilistic occupancy fields. The solution is obtained by solving a Linear Program (LP). This enables the joint detection and tracking of multiple VPs over sequences. Unlike existing works we do not need known camera poses, and at the same time avoid detecting and tracking in separate steps. We also propose an extension to enforce VP orthogonality. We augment an existing video dataset consisting of 48 monocular videos with multiple annotated VPs in 14448 frames for evaluation. Although the method is designed for unknown camera poses, it is also helpful in scenarios with known poses, since a multi-frame approach in VP detection helps to regularize in frames with weak VP line support.
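
The Gaussian-sphere representation used here has a compact geometric core: a calibrated line segment maps to the great circle whose normal is the cross product of its endpoint rays, and two segments hypothesize a VP at the intersection of their great circles. A minimal sketch, assuming the inverse intrinsics K_inv are known:

```python
import numpy as np

def to_ray(pt, K_inv):
    """Back-project an image point to a unit ray on the Gaussian sphere."""
    r = K_inv @ np.array([pt[0], pt[1], 1.0])
    return r / np.linalg.norm(r)

def vp_from_two_segments(seg1, seg2, K_inv):
    """Each segment (pair of pixel endpoints) maps to a great circle with
    normal n = ray(p) x ray(q); two great circles intersect at the VP."""
    n1 = np.cross(to_ray(seg1[0], K_inv), to_ray(seg1[1], K_inv))
    n2 = np.cross(to_ray(seg2[0], K_inv), to_ray(seg2[1], K_inv))
    v = np.cross(n1, n2)
    return v / np.linalg.norm(v)             # VP as a unit direction (up to sign)
```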

Journal ArticleDOI
TL;DR: A particle matching technique is used to reduce the dependency on prior knowledge in a semi-supervised motion segmentation algorithm by automatically matching particles across frames in which fast motion or occlusion occurs.

Journal ArticleDOI
Radu Timofte, Luc Van Gool
TL;DR: A novel sparse representation, the Iterative Nearest Neighbors (INN), that combines the power of SR and LLE with the computational simplicity of kNN is proposed, achieving performance on par with or better than MP and OMP on the sparse signal recovery task.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The algorithm that is proposed - coined 'Make My Day' or MMD for short - is akin to the previously published BM3D denoising algorithm and outperforms other state-of-the-art denoising methods in terms of PSNR, texture quality, and color fidelity.
Abstract: We address the task of restoring RGB images taken under low illumination (e.g. night time), when an aligned near-infrared (NIR or simply N) image taken under stronger NIR illumination is available. Such restoration holds the promise that algorithms designed to work under daylight conditions could be used around the clock. Increasingly, RGBN cameras are becoming available, as car cameras tend to include a near-infrared (N) band next to the R, G, and B bands, and NIR artificial lighting is applied. Under low lighting conditions, the NIR band is less noisy than the others, and this is all the more the case if stronger illumination is only available in the NIR band. We address the task of restoring the R, G, and B bands on the basis of the NIR band in such cases. Even if the NIR band is less strongly correlated with the R, G, and B bands than these bands are mutually, there is sufficient correlation to pick up important textural and gradient information in the NIR band and inject it into the others. The algorithm that we propose - coined 'Make My Day' or MMD for short - is akin to the previously published BM3D denoising algorithm. MMD denoises the three (visible − NIR) differential images and then adds back the original NIR image. It not only effectively reduces the noise but also injects the texture and edge information in the high spatial frequency range. MMD outperforms other state-of-the-art denoising methods in terms of PSNR, texture quality, and color fidelity. We publish our code and images.
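
The decompose-denoise-recompose structure of MMD fits in a few lines. In this sketch a Gaussian filter stands in for the BM3D-style filtering the paper actually uses, so treat it as an illustration of the pipeline, not of the denoiser:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mmd_restore(rgb, nir, sigma=2.0):
    """Denoise each (visible - NIR) differential image, then add the clean
    NIR band back to re-inject high-frequency texture and edges.
    rgb: (h, w, 3) noisy low-light image; nir: (h, w) cleaner NIR band."""
    out = np.empty_like(rgb, dtype=np.float64)
    for c in range(3):
        diff = rgb[..., c].astype(np.float64) - nir       # differential image
        out[..., c] = gaussian_filter(diff, sigma) + nir  # denoise, add NIR back
    return out
```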

Journal ArticleDOI
TL;DR: Experiments show that the deformation field can better approximate real object deformations and therefore, for certain classes, produces even better detection accuracy than state-of-the-art DPM.
Abstract: Deformable Parts Models (DPM) are the current state-of-the-art for object detection. Nevertheless, they seem sub-optimal in the representation of deformations. Object deformations are often continuous and not confined to big parts. Therefore we propose to replace the DPM star model based on big parts by a deformation field. This consists of a grid of small parts connected with pairwise constraints which can better handle continuous deformations. The naive application of this model for object detection would consist of a bounded sliding window approach: for each possible location of the image the best part configuration within a limited bound around this location is found. This is computationally very expensive. Instead, we propose a different inference procedure, where an iterative image-level search finds the best object hypothesis. We show that this approach is faster than bounded sliding windows yet produces comparable accuracy. Experiments further show that the deformation field can better approximate real object deformations and therefore, for certain classes, produces even better detection accuracy than state-of-the-art DPM. Finally, the same approach is adapted to model-free tracking, showing improved accuracy also in this case.

Proceedings ArticleDOI
26 May 2015
TL;DR: An efficient method to detect lens flares within aerial images based on the position of the sun with respect to the observer is presented, and this approach is able to compensate for errors in the parameters influencing the calculation of the lens flare direction.
Abstract: The goal of integrating drones into the civil airspace requires a technical system which robustly detects, tracks and finally avoids aerial objects. Electro-optical cameras have proven to be an adequate sensor to detect traffic, especially for smaller aircraft, gliders or paragliders. However, the very challenging environmental conditions and image artifacts such as lens flares often result in a high number of false detections. Depending on the solar radiation, lens flares are very common in aerial images and hard to distinguish from aerial objects on a collision course due to their similar size, shape, brightness and trajectories. In this paper we present an efficient method to detect lens flares within aerial images based on the position of the sun with respect to the observer. Using the date, time, position and attitude of the observer, we predict the lens flare direction within the image. Once the direction is known, the position, size and shape of the lens flares are extracted. Experiments show that our approach is able to compensate for errors in the parameters influencing the calculation of the lens flare direction. We further integrate the lens flare detection into an aerial object tracking framework. A detailed evaluation of the framework with and without the lens flare filter shows that false tracks due to lens flares are successfully suppressed without degrading the overall tracking system performance.
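
The geometric core of the prediction: flare ghosts lie on the image line through the projected sun position and the principal point. A minimal sketch, assuming the sun direction (from an ephemeris given date, time and position) and the camera attitude are already available:

```python
import numpy as np

def flare_line(sun_dir_world, R_wc, K):
    """Predict the image line along which lens-flare ghosts appear.
    sun_dir_world: unit vector toward the sun (assumed given by an ephemeris);
    R_wc: world-to-camera rotation; K: 3x3 camera intrinsics."""
    s = R_wc @ sun_dir_world
    if s[2] <= 0:
        return None                      # sun behind the camera: no prediction
    p = K @ (s / s[2])                   # sun's image point (may be off-frame)
    c = np.array([K[0, 2], K[1, 2]])     # principal point
    d = p[:2] - c
    return c, d / np.linalg.norm(d)      # anchor point and flare direction
```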

Proceedings ArticleDOI
05 Jan 2015
TL;DR: This paper proposes a learned collaborative representation based classifier (LCRC), based on the fixed point theorem with a weights formulation similar to WCRC as the starting point, and shows that the learning procedure is stable and convergent, and that LCRC is able to improve in performance over CRC and WCRC, while keeping the same computational efficiency at test time.
Abstract: The collaborative representation-based classifier (CRC) has been proposed as an alternative to the sparse representation-based classifier (SRC) for image face recognition. CRC solves an l2-regularized least squares formulation, with an algebraic solution, while SRC optimizes over an l1-regularized least squares problem. As an extension of CRC, the weighted collaborative representation-based classifier (WCRC) has further been proposed. The weights in WCRC are picked intuitively; it remains unclear why such a choice of weights works and how those weights could be optimized. In this paper, we propose a learned collaborative representation based classifier (LCRC) and attempt to answer the above questions. Our learning technique is based on the fixed point theorem and we use a weights formulation similar to WCRC as the starting point. Through extensive experiments on face datasets we show that the learning procedure is stable and convergent, and that LCRC is able to improve in performance over CRC and WCRC, while keeping the same computational efficiency at test time.
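
CRC's closed-form core is short enough to show: code the query over all training samples with an l2 penalty, then classify by the class-wise reconstruction residual. A minimal sketch of plain CRC (the learned weights of LCRC are omitted):

```python
import numpy as np

def crc_classify(X, y, query, lam=0.01):
    """Collaborative representation classifier.
    X: (d, n) column-stacked training samples; y: (n,) labels; query: (d,)."""
    # l2-regularized coding over ALL training samples, in closed form
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ query)
    best, best_res = None, np.inf
    for c in np.unique(y):
        mask = (y == c)
        res = np.linalg.norm(query - X[:, mask] @ alpha[mask])  # class residual
        if res < best_res:
            best, best_res = c, res
    return best
```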

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper proposes a novel method to infer the higher dimensional properties of the material's BRDF, based on the statistical distribution of known material characteristics observed in real-life samples, and evaluates the method based on a large set of experiments generated from real-world BRDFs and newly measured materials.
Abstract: The problem of estimating a full BRDF from partial observations has already been studied using either parametric or non-parametric approaches. The goal in each case is to best match this sparse set of input measurements. In this paper we address the problem of inferring higher order reflectance information starting from the minimal input of a single BRDF slice. We begin from the prototypical case of a homogeneous sphere, lit by a head-on light source, which only holds information about less than 0.001% of the whole BRDF domain. We propose a novel method to infer the higher dimensional properties of the material's BRDF, based on the statistical distribution of known material characteristics observed in real-life samples. We evaluated our method based on a large set of experiments generated from real-world BRDFs and newly measured materials. Although inferring higher dimensional BRDFs from such modest training is not a trivial problem, our method performs better than state-of-the-art parametric, semi-parametric and non-parametric approaches. Finally, we discuss interesting applications on material re-lighting, and flash-based photography.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper uses convolutional neural networks with VGG-16 architecture, pretrained on ImageNet or the Places205 dataset for image classification, and fine-tuned on cultural events data to solve the classification of cultural events from a single image with a deep learning based method.
Abstract: In this paper we tackle the classification of cultural events from a single image with a deep learning based method. We use convolutional neural networks (CNNs) with the VGG-16 architecture [17], pretrained on ImageNet or the Places205 dataset for image classification, and fine-tuned on cultural events data. CNN features are robustly extracted at 4 different layers in each image. At each layer, Linear Discriminant Analysis (LDA) is employed for discriminative dimensionality reduction. An image is represented by the concatenated LDA-projected features from all layers or by the concatenation of CNN pooled features at each layer. The classification is then performed through the Iterative Nearest Neighbors-based Classifier (INNC) [20]. Classification scores are obtained for different image representation setups at training and test time. The average of the scores is the output of our deep linear discriminative retrieval (DLDR) system. With 0.80 mean average precision (mAP), DLDR is a top entry for the ChaLearn LAP 2015 cultural event recognition challenge.
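
The per-layer LDA projection plus concatenation can be sketched with scikit-learn, assuming the per-layer CNN features have already been extracted:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def dldr_features(layer_feats, labels):
    """Project each layer's CNN features with LDA (at most n_classes-1 dims),
    then concatenate the projections into one descriptor per image.
    layer_feats: list of (n_images, d_layer) arrays from different layers."""
    projected = []
    for F in layer_feats:
        lda = LinearDiscriminantAnalysis()
        projected.append(lda.fit_transform(F, labels))  # discriminative reduction
    return np.hstack(projected)
```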

Book ChapterDOI
13 Jan 2015
TL;DR: This paper unifies the proposed subRW and other popular random walk algorithms, and designs a subRW algorithm with label prior to solve the segmentation problem of objects with thin and elongated parts.
Abstract: In this paper, we propose a subMarkov random walk (subRW) with label prior, realized with added auxiliary nodes, for seeded image segmentation. We unify the proposed subRW and other popular random walk algorithms. This unifying view can transfer the intrinsic findings between different random walk algorithms, and offers new ideas for designing novel random walk algorithms by changing the auxiliary nodes. Building on the second benefit, we design a subRW algorithm with label prior to solve the segmentation problem of objects with thin and elongated parts. The experimental results on natural images with twigs demonstrate that our algorithm achieves better performance than previous random walk algorithms.
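
For orientation, the classic two-label seeded random walker that this family of algorithms builds on can be sketched as a sparse linear (Dirichlet) problem; subRW's auxiliary label-prior nodes are omitted here:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def random_walker(img, fg, bg, beta=90.0):
    """Classic two-label seeded random walker on a 4-connected grid.
    img: (h, w) grayscale in [0, 1]; fg, bg: flat indices of seed pixels."""
    h, w = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    g = img.ravel()
    # 4-connected edges with Gaussian weights on intensity differences
    a = np.r_[idx[:, :-1].ravel(), idx[:-1, :].ravel()]
    b = np.r_[idx[:, 1:].ravel(), idx[1:, :].ravel()]
    wgt = np.exp(-beta * (g[a] - g[b]) ** 2)
    W = sp.coo_matrix((np.r_[wgt, wgt], (np.r_[a, b], np.r_[b, a])), shape=(n, n))
    L = (sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()
    seeds = np.r_[fg, bg]
    x_s = np.r_[np.ones(len(fg)), np.zeros(len(bg))]   # P(label = fg) at seeds
    free = np.setdiff1d(np.arange(n), seeds)
    # Dirichlet problem: L_uu x_u = -L_us x_s
    x_u = spla.spsolve(L[free][:, free].tocsc(), -L[free][:, seeds] @ x_s)
    prob = np.zeros(n)
    prob[seeds], prob[free] = x_s, x_u
    return prob.reshape(h, w)              # threshold at 0.5 to segment
```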

Proceedings Article
01 Jan 2015
TL;DR: A novel automatic recognition framework for hand-written mensural music, which takes a scanned manuscript as input and yields modern music scores as output, working as a complete pipeline that integrates both recognition and transcription.
Abstract: This paper presents a novel automatic recognition framework for hand-written mensural music. It takes a scanned manuscript as input and yields modern music scores as output. Compared to previous mensural Optical Music Recognition (OMR) systems, ours shows not only promising performance in music recognition, but also works as a complete pipeline which integrates both recognition and transcription. There are three main parts in this pipeline: i) region-of-interest detection, ii) music symbol detection and classification, and iii) transcription to modern music. In addition to the output in modern notation, our system can generate a MIDI file as well. It provides an easy platform for musicologists to analyze old manuscripts. Moreover, it renders these valuable cultural heritage resources available to non-specialists as well, as they can now access such ancient music in a more understandable form.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: This paper proposes an efficient novel post-processing algorithm based on the adjusted anchored neighborhood regression (A+) method from the image super-resolution literature that greatly improves the results of demosaicing methods, achieving image quality competitive with SAPCA but orders of magnitude faster.
Abstract: Color demosaicing is the process of reconstructing missing pixel values in an incompletely sampled color image. By exploiting spatial-spectral correlations of the RGB channels, various interpolation methods with low computational complexity have been proposed. Meanwhile, optimization strategies such as the sparsity and adaptive PCA based algorithm (SAPCA) were developed. SAPCA outperforms many interpolation techniques by impressive margins, at the cost of dramatically increased computational time. In this paper we propose an efficient novel post-processing algorithm based on the adjusted anchored neighborhood regression (A+) method from the image super-resolution literature. We greatly improve the results of demosaicing methods, and achieve image quality competitive with SAPCA but orders of magnitude faster.
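
The A+-style inference the post-processing builds on is simple at test time: match a patch feature to its most correlated dictionary anchor and apply that anchor's offline-trained linear regressor. A minimal sketch with hypothetical precomputed anchors and projections:

```python
import numpy as np

def aplus_apply(feat, anchors, projections):
    """A+-style anchored regression at test time.
    feat: (d,) patch feature; anchors: (m, d) l2-normalized dictionary atoms;
    projections: (m, out_d, d) per-anchor regressors trained offline."""
    f = feat / (np.linalg.norm(feat) + 1e-12)
    k = int(np.argmax(anchors @ f))        # most correlated anchor
    return projections[k] @ feat           # linear mapping to the output patch
```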

Proceedings ArticleDOI
06 Jan 2015
TL;DR: This paper proposes a novel method for detecting and tracking groups of mutually orthogonal vanishing points (MOVP), also known as Manhattan frames, jointly from monocular videos, and shows that the method outperforms a greedy MOVP tracking method considerably.
Abstract: While vanishing point (VP) estimation has received extensive attention, most approaches focus on static images or perform detection and tracking separately. In this paper, we focus on man-made environments and propose a novel method for detecting and tracking groups of mutually orthogonal vanishing points (MOVP), also known as Manhattan frames, jointly from monocular videos. The method is unique in that it is designed to enforce orthogonality in groups of VPs, temporal consistency of each individual MOVP, and orientation consistency of all putative MOVP. To this end, the method consists of three steps: 1) proposal of MOVP candidates by directly incorporating mutual orthogonality; 2) extraction of consistent tracks of MOVPs by minimizing the flow cost over a network where nodes are putative MOVPs and edges are putative links across time; and 3) refinement of all MOVPs by enforcing consistency between lines and their identified vanishing directions, as well as consistency of the global camera orientation. The method is evaluated on six newly collected and annotated videos of urban scenes. Extensive experiments show that the method outperforms a greedy MOVP tracking method considerably. In addition, we also test the method for camera orientation estimation and show that it obtains very promising results on a challenging street-view dataset.

Proceedings ArticleDOI
18 May 2015
TL;DR: This work proposes to learn discriminative features based on a small set of annotated images to organize product databases by image classification, which allows for fast feature extraction and training, is easy to implement and does not require powerful dedicated hardware.
Abstract: Fashion is a major segment in e-commerce with growing importance and a steadily increasing number of products. Since manual annotation of apparel items is very tedious, product databases need to be organized automatically, e.g. by image classification. Common image classification approaches are based on features engineered for general purposes, which perform poorly on specific images of apparel. We therefore propose to learn discriminative features based on a small set of annotated images. We experimentally evaluate our method on a dataset with 30,000 images containing apparel items, and compare it to other engineered and learned sets of features. The classification accuracy of our features is significantly superior to that of designed HOG and SIFT features (43.7% and 16.1% relative improvement, respectively). Our method allows for fast feature extraction and training, is easy to implement and, unlike deep convolutional networks, does not require powerful dedicated hardware.

25 Sep 2015
TL;DR: A novel GPU-accelerated implementation that calculates the shape normals, as well as the albedo and ambient lighting, through the Photometric Stereo technique, providing users with real-time feedback on the recording process and thereby altering the way in which dome-shaped devices can be used.
Abstract: Dome-shaped devices consisting of a single digital camera and multiple light sources have been used in the past for the 3D scanning of objects. They leverage Photometric Stereo techniques in order to build detailed 3D models of these objects. Their advantage is that they can pick up even subtle details of the shape. Yet, these systems typically suffer from high recording and processing times. This paper introduces a novel GPU-accelerated implementation that calculates the shape normals, as well as the albedo and ambient lighting, through the Photometric Stereo technique, providing users with real-time feedback on the recording process. An originally serial algorithm was mapped to the architecture of an NVIDIA GPU and the CUDA programming platform. To maximize performance, various optimizations were applied, such as reducing the total number of memory accesses, coalescing the memory accesses into the minimal number of transactions, reducing register usage to avoid spilling, hiding latency, and maximizing thread occupancy. Our method reduces the processing time, accelerating the original implementation by a factor of 950, thereby altering the way in which such devices can be used.
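
The per-pixel computation that a GPU parallelizes in this setting is classic Lambertian photometric stereo: solve I = L g with g = albedo * n by least squares for every pixel. A minimal CPU sketch of that core (ambient-light estimation omitted):

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Per-pixel Lambertian photometric stereo.
    images: (k, h, w) intensities under k known lights; light_dirs: (k, 3)."""
    k, h, w = images.shape
    I = images.reshape(k, -1)                            # one column per pixel
    # Least squares for the scaled normals g = albedo * n, all pixels at once
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)   # (3, h*w)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / (albedo + 1e-12)                       # unit surface normals
    return normals.reshape(3, h, w), albedo.reshape(h, w)
```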