Author

Luc Van Gool

Other affiliations: Microsoft, ETH Zurich, Politehnica University of Timișoara
Bio: Luc Van Gool is an academic researcher from Katholieke Universiteit Leuven. The author has contributed to research in topics including computer science and object detection, has an h-index of 133, and has co-authored 1,307 publications receiving 107,743 citations. Previous affiliations of Luc Van Gool include Microsoft and ETH Zurich.


Papers
Journal ArticleDOI
28 Jun 2010
TL;DR: A new lightweight grammar representation, called F-shade, compactly encodes facade structures and allows fast per-pixel access; a prototype rendering system renders an urban model from this compact representation directly on the GPU.
Abstract: In this paper we propose a real-time rendering approach for procedural cities. Our first contribution is a new lightweight grammar representation that compactly encodes facade structures and allows fast per-pixel access. We call this grammar F-shade. Our second contribution is a prototype rendering system that renders an urban model from the compact representation directly on the GPU. Our suggested approach explores an interesting connection from procedural modeling to real-time rendering. Evaluating procedural descriptions at render time uses less memory than the generation of intermediate geometry. This enables us to render large urban models directly from GPU memory.
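
As a minimal CPU-side sketch of that idea (the toy grammar and all names below are ours, not from the paper), the per-pixel query "which terminal element covers facade coordinate (u, v)?" can be answered by walking a split grammar on demand, with no intermediate geometry ever generated:

from dataclasses import dataclass

@dataclass
class Split:
    axis: int       # 0: split along u (horizontal), 1: split along v (vertical)
    sizes: list     # relative sizes of the children
    children: list  # child Split nodes or terminal labels (str)

# A toy facade: three identical floors, each split into wall / window / wall.
floor = Split(axis=0, sizes=[0.25, 0.5, 0.25],
              children=["wall", "window", "wall"])
facade = Split(axis=1, sizes=[1, 1, 1], children=[floor, floor, floor])

def lookup(node, u, v):
    """Walk the grammar until a terminal covers facade coordinate (u, v)."""
    if isinstance(node, str):             # terminal reached, e.g. "window"
        return node
    t = u if node.axis == 0 else v
    total = sum(node.sizes)
    acc = 0.0
    last = len(node.children) - 1
    for i, (size, child) in enumerate(zip(node.sizes, node.children)):
        frac = size / total
        if t < acc + frac or i == last:   # last child absorbs t == 1.0
            local = (t - acc) / frac      # rescale into the child's frame
            nu, nv = (local, v) if node.axis == 0 else (u, local)
            return lookup(child, nu, nv)
        acc += frac

print(lookup(facade, 0.50, 0.20))   # -> "window"
print(lookup(facade, 0.10, 0.20))   # -> "wall"

The actual F-shade representation is a compact buffer evaluated inside a GPU shader; the sketch only illustrates the evaluate-at-lookup-time trade-off that keeps memory use low.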

41 citations

Book ChapterDOI
16 Sep 2002
TL;DR: Complex polyhedral building roofs are reconstructed by grouping 3D line segments into planar faces, with a semantic interpretation used to infer missing parts of the roof model by invoking the geometric regime once more; several successfully reconstructed complex roof structures corroborate the potential of the approach.
Abstract: This paper investigates model-based reconstruction of complex polyhedral building roofs. A roof is modelled as a structured ensemble of planar polygonal faces. The modelling is done in two different regimes. One focuses on geometry, whereas the other is ruled by semantics. Inside the geometric regime, 3D line segments are grouped into planes and further into faces using a Bayesian analysis. In the second regime, the preliminary geometric models are subject to a semantic interpretation. The knowledge gained in this step is used to infer missing parts of the roof model (by invoking the geometric regime once more) and to adjust the overall roof topology. Several successfully reconstructed complex roof structures corroborate the potential of the approach.
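
The paper's grouping step relies on a Bayesian analysis; as a rough stand-in, the sketch below (Python, with a plain least-squares fit and an illustrative residual threshold in place of the Bayesian machinery) shows the skeleton of grouping 3D line segments into candidate roof planes:

import numpy as np

def fit_plane(points):
    """Least-squares plane through points; returns (unit normal n, offset d)
    for the plane n . x = d."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                 # direction of least variance
    return normal, normal @ centroid

def group_segments(segments, tol=0.05):
    """Greedily grow coplanar groups. segments: list of (p0, p1) endpoint
    pairs, each endpoint a length-3 numpy array; tol is illustrative."""
    groups = []
    for seg in segments:
        placed = False
        for group in groups:
            pts = np.vstack([p for s in group for p in s] + list(seg))
            n, d = fit_plane(pts)
            if np.max(np.abs(pts @ n - d)) < tol:   # all endpoints near plane
                group.append(seg)
                placed = True
                break
        if not placed:
            groups.append([seg])
    return groups

In the paper, the grouping decisions and the subsequent cutting of planes into polygonal faces are governed by the Bayesian framework rather than a fixed tolerance; the sketch covers only the grouping skeleton.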

41 citations

Book ChapterDOI
Gabriele Fanelli, Angela Yao, Pierre-Luc Noel, Juergen Gall, Luc Van Gool
10 Sep 2010
TL;DR: A user-independent approach recognizes facial expressions from image sequences: faces are normalized in scale and rotation based on the eye centers' locations and assembled into tracks from which features representing shape and motion are extracted.
Abstract: Automatic recognition of facial expression is a necessary step toward the design of more natural human-computer interaction systems. This work presents a user-independent approach for the recognition of facial expressions from image sequences. The faces are normalized in scale and rotation based on the eye centers' locations and assembled into tracks from which we extract features representing shape and motion. Classification and localization of the center of the expression in the video sequences are performed using a Hough transform voting method based on randomized forests. We tested our approach on two publicly available databases and achieved encouraging results comparable to the state of the art.
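
A hedged sketch of the voting step described above: each feature sampled at frame t casts weighted temporal-offset votes for the expression's center, and the accumulator peak localizes it. The cast_votes callable is a hypothetical stand-in for reading the leaves of the trained randomized forest:

import numpy as np

def hough_localize(features, cast_votes, n_frames):
    """features: list of (frame_index, descriptor) pairs.
    cast_votes(descriptor) -> list of (offset, weight) pairs, as would be
    read from the leaves of a trained randomized forest."""
    accumulator = np.zeros(n_frames)
    for t, desc in features:
        for offset, weight in cast_votes(desc):
            center = t + offset
            if 0 <= center < n_frames:
                accumulator[int(center)] += weight
    return int(np.argmax(accumulator)), accumulator

# Toy usage: two features whose votes overlap at frame 14.
center, _ = hough_localize([(10, None), (12, None)],
                           lambda d: [(2, 1.0), (4, 1.0)], n_frames=20)
print(center)   # -> 14

In Hough forests generally, the same leaves also store class distributions, which is how one model can couple expression classification with localization.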

41 citations

Journal ArticleDOI
TL;DR: A nonmathematical account explains the basic philosophy and trade-offs underlying invariance-based research; the principles are illustrated for the relatively simple case of planar-object recognition under arbitrary viewpoints.
Abstract: It is remarkable how well the human visual system can cope with changing viewpoints when it comes to recognising shapes. The state of the art in machine vision is still quite remote from solving such tasks. Nevertheless, a surge in invariance-based research has led to the development of methods for solving recognition problems still considered hard until recently. A nonmathematical account explains the basic philosophy and trade-offs underlying this strand of research. The principles are explained for the relatively simple case of planar-object recognition under arbitrary viewpoints. Well-known Euclidean concepts form the basis of invariance in this case. Introducing constraints in addition to that of planarity may further simplify the invariants. On the other hand, there are problems for which no invariants exist.
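
The article deliberately stays nonmathematical; as one concrete classical example of such a quantity (ours, not the article's), the cross-ratio of four collinear points A, B, C, D on a planar object survives any change of viewpoint:

\[
  \mathrm{Cr}(A,B;C,D) \;=\; \frac{|AC|\,|BD|}{|BC|\,|AD|}
\]

Any projective transformation of the plane, including the mapping between two images of the same planar object taken from different viewpoints, sends the four points to collinear points with the same cross-ratio, so the value can serve directly as a viewpoint-independent recognition feature.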

41 citations

Proceedings ArticleDOI
26 Feb 2022
TL;DR: A pipeline generates Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image; it builds on $\pi$-GAN, a generative model for unconditional 3D-aware image synthesis that maps random latent codes to radiance fields of a class of objects.
Abstract: We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image. This is a challenging task, as training NeRF requires multiple views of the same scene, coupled with corresponding poses, which are hard to obtain. Our method is based on $\pi$-GAN, a generative model for unconditional 3D-aware image synthesis, which maps random latent codes to radiance fields of a class of objects. We jointly optimize (1) the $\pi$-GAN objective to utilize its high-fidelity 3D-aware generation and (2) a carefully designed reconstruction objective. The latter includes an encoder coupled with the $\pi$-GAN generator to form an autoencoder. Unlike previous few-shot NeRF approaches, our pipeline is unsupervised, capable of being trained with independent images without 3D, multi-view, or pose supervision. Applications of our pipeline include 3D avatar generation, object-centric novel view synthesis with a single input image, and 3D-aware super-resolution, to name a few.
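
A schematic PyTorch sketch of the joint objective (all modules below are toy placeholders; in the paper the generator is $\pi$-GAN's implicit radiance field rendered by volume rendering, not a linear layer, and the hyperparameters are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, img_dim = 64, 32 * 32 * 3

E = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, latent_dim))  # encoder
G = nn.Linear(latent_dim, img_dim)        # stand-in for the pi-GAN generator
D = nn.Linear(img_dim, 1)                 # stand-in discriminator

opt = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=1e-4)
lam = 10.0   # weight of the reconstruction term (illustrative value)

def training_step(real_images):
    x = real_images.flatten(1)
    # (1) adversarial term on images generated from random latent codes
    z = torch.randn(x.size(0), latent_dim)
    adv = F.softplus(-D(G(z))).mean()     # non-saturating GAN loss
    # (2) reconstruction term: encoder + generator form an autoencoder
    rec = F.mse_loss(G(E(real_images)), x)
    loss = adv + lam * rec                # joint objective
    opt.zero_grad(); loss.backward(); opt.step()
    # (the discriminator's own update is omitted for brevity)
    return loss.item()

print(training_step(torch.rand(4, 3, 32, 32)))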

41 citations


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual nets won first place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
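
The reformulation is compact enough to show directly; a minimal PyTorch sketch of the paper's basic residual block, for the case where input and output dimensions match (strided and projection-shortcut variants omitted):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)    # the layers learn F(x); the block outputs F(x) + x

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)    # torch.Size([1, 64, 56, 56])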

123,388 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
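
The parameter arithmetic behind the very small filters is easy to check: two stacked 3x3 convolutions cover a 5x5 receptive field (and add an extra non-linearity) while using fewer parameters, e.g. for C input and output channels:

# Parameter count of one 5x5 conv layer vs. two stacked 3x3 conv layers,
# C input and output channels, biases ignored.
C = 256
one_5x5 = 5 * 5 * C * C            # single 5x5 layer
two_3x3 = 2 * (3 * 3 * C * C)      # two stacked 3x3 layers
print(one_5x5, two_3x3)            # 1638400 1179648
print(round(1 - two_3x3 / one_5x5, 2))   # 0.28 -> 28% fewer parameters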

55,235 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
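
A PyTorch sketch of the Inception module's branch structure: parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch, with 1x1 reductions, concatenated along the channel dimension. Channel counts follow the commonly cited "3a" configuration of GoogLeNet but should be treated as illustrative, and most activations are omitted for brevity:

import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, 1)                          # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1)) # 1x1 -> 3x3
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2)) # 1x1 -> 5x5
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1))           # pool -> 1x1

    def forward(self, x):
        # concatenate all branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

m = Inception(192, 64, 96, 128, 16, 32, 32)    # "3a": 192 -> 256 channels
print(m(torch.randn(1, 192, 28, 28)).shape)    # torch.Size([1, 256, 28, 28])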

40,257 citations