Author

Luc Van Gool

Other affiliations: Microsoft, ETH Zurich, Politehnica University of Timișoara
Bio: Luc Van Gool is an academic researcher from Katholieke Universiteit Leuven. He has contributed to research in topics including computer science and object detection. He has an h-index of 133 and has co-authored 1,307 publications receiving 107,743 citations. Previous affiliations of Luc Van Gool include Microsoft and ETH Zurich.


Papers
Journal ArticleDOI
TL;DR: This work presents a 3-D measurement technique capable of optically measuring microchip devices using a camera-projector system and improves the dynamic range of the imaging system through the use of a set of gray-code and phase-shift measures with different CCD integration times.
Abstract: The industry dealing with microchip inspection requires fast, flexible, repeatable, and stable 3-D measuring systems. The typical devices used for this purpose are coordinate measurement machines (CMMs). These systems have limitations such as high cost, low measurement speed, and a small number of measured 3-D points. Optical techniques are now beginning to replace the typical touch probes because of their noncontact nature, their full-field measurement capability, their high measurement density, as well as their low cost and high measurement speed. However, typical properties of microchip devices, which include a strongly spatially varying reflectance, make the direct use of classical optical 3-D measurement techniques impossible. We present a 3-D measurement technique capable of optically measuring these devices using a camera-projector system. The proposed method improves the dynamic range of the imaging system through the use of a set of gray-code (GC) and phase-shift (PS) measurements with different CCD integration times. A set of extended-range GC and PS images is obtained and used to acquire a dense 3-D measurement of the object. We measured the 3-D shape of an integrated circuit and obtained satisfactory results.
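As a rough illustration of the extended-dynamic-range idea, the sketch below selects, per pixel, the longest unsaturated CCD integration time from a stack of phase-shift captures, computes the wrapped phase with a standard 4-step formula, and unwraps it with a gray-code fringe index. This is not the authors' implementation; the 4-step shift, the exposure-selection rule, the thresholds, and the array shapes are all assumptions made for the example.

```python
"""Minimal sketch of extended-dynamic-range phase-shift decoding (assumptions noted above)."""
import numpy as np

def wrapped_phase(i0, i1, i2, i3):
    # Standard 4-step phase-shift formula: patterns shifted by 0, 90, 180, 270 degrees.
    return np.arctan2(i3 - i1, i0 - i2)

def select_exposure(stack, saturation=250, darkness=10):
    # stack: (n_exposures, 4, H, W) phase-shift images at increasing integration times (8-bit assumed).
    # For each pixel, pick the longest integration time that is neither saturated nor too dark.
    n_exp = stack.shape[0]
    max_per_exp = stack.max(axis=1)                         # (n_exposures, H, W)
    valid = (max_per_exp < saturation) & (max_per_exp > darkness)
    # Index of the last (longest) valid exposure; fall back to 0 when none is valid.
    return np.where(valid.any(axis=0),
                    n_exp - 1 - np.argmax(valid[::-1], axis=0),
                    0)                                      # (H, W)

def extended_range_phase(stack):
    idx = select_exposure(stack)
    h, w = idx.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    chosen = stack[idx, :, rows, cols].astype(np.float32)   # (H, W, 4), best exposure per pixel
    return wrapped_phase(chosen[..., 0], chosen[..., 1],
                         chosen[..., 2], chosen[..., 3])

def absolute_phase(wrapped, gray_code_period):
    # gray_code_period: integer fringe index per pixel decoded from the
    # gray-code sequence (the gray-code decoding itself is omitted here).
    return wrapped + 2.0 * np.pi * gray_code_period
```

The absolute phase would then be converted to projector coordinates and triangulated against the calibrated camera-projector pair to obtain 3-D points.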

17 citations

Posted Content
TL;DR: This work develops a sensor setup that provides data for a 360-degree view of the area surrounding the vehicle, the driving route to the destination, and the low-level driving maneuvers by human drivers, and learns a novel driving model by integrating information from the surround-view cameras and the route planner.
Abstract: For people, having a rear-view mirror and side-view mirrors is vital for safe driving. They deliver a better view of what happens around the car. Human drivers also heavily exploit their mental map for navigation. Nonetheless, several methods have been published that learn driving models with only a front-facing camera and without a route planner. This lack of information renders the self-driving task quite intractable. Hence, we investigate the problem with a more realistic setting, which consists of a surround-view camera system with eight cameras, a route planner, and a CAN bus reader. In particular, we develop a sensor setup that provides data for a 360-degree view of the area surrounding the vehicle, the driving route to the destination, and the low-level driving maneuvers (e.g. steering angle and speed) by human drivers. With such a sensor setup, we collect a new driving dataset covering diverse driving scenarios and varying weather/illumination conditions. Finally, we learn a novel driving model by integrating information from the surround-view cameras and the route planner. Two route planners are exploited: one based on OpenStreetMap and the other on TomTom Maps. The route planners are exploited in two ways: 1) by representing the planned routes as a stack of GPS coordinates, and 2) by rendering the planned routes on a map and recording the progression into a video. Our experiments show that: 1) 360-degree surround-view cameras help avoid failures made with a single front-view camera for the driving task; and 2) a route planner helps the driving task significantly. We acknowledge that our method is not the best-ever driving model, but that is not our focus. Rather, it provides a strong basis for further academic research, especially on driving-relevant tasks that integrate information from street-view images and the planned driving routes. Code and data will be made available.
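The sketch below illustrates, in broad strokes, how surround-view camera features and a route given as a stack of GPS coordinates might be fused to predict steering angle and speed. It is not the authors' released model; the ResNet-18 backbone, the layer sizes, and the flat route encoding are illustrative assumptions.

```python
"""Illustrative fusion of surround-view images and a planned route (not the paper's exact model)."""
import torch
import torch.nn as nn
import torchvision.models as models

class SurroundViewDriver(nn.Module):
    def __init__(self, num_cameras=8, route_points=64, feat_dim=128):
        super().__init__()
        # Shared CNN backbone applied to each surround-view camera (torchvision >= 0.13).
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.backbone = backbone
        # Encoder for the planned route given as a stack of (lat, lon) points.
        self.route_encoder = nn.Sequential(
            nn.Linear(route_points * 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        # Fusion head predicting the low-level maneuvers.
        self.head = nn.Sequential(
            nn.Linear(num_cameras * feat_dim + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2),                       # [steering_angle, speed]
        )

    def forward(self, images, route):
        # images: (B, num_cameras, 3, H, W); route: (B, route_points, 2)
        b, n, c, h, w = images.shape
        cam_feats = self.backbone(images.view(b * n, c, h, w)).view(b, -1)
        route_feats = self.route_encoder(route.view(b, -1))
        return self.head(torch.cat([cam_feats, route_feats], dim=1))
```

The alternative route representation described in the abstract (rendered route videos) would replace the flat GPS encoder with a small image or video encoder feeding the same fusion head.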

17 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The algorithm that is proposed - coined 'Make My Day', or MMD for short - is akin to the previously published BM3D denoising algorithm and outperforms other state-of-the-art denoising methods in terms of PSNR, texture quality, and color fidelity.
Abstract: We address the task of restoring RGB images taken under low illumination (e.g. night time), when an aligned near-infrared (NIR or simply N) image taken under stronger NIR illumination is available. Such restoration holds the promise that algorithms designed to work under daylight conditions could be used around the clock. RGBN cameras are increasingly becoming available, as car cameras tend to include a near-infrared (N) band next to the R, G, and B bands, and NIR artificial lighting is applied. Under low lighting conditions, the NIR band is less noisy than the others, all the more so if stronger illumination is only available in the NIR band. We address the task of restoring the R, G, and B bands on the basis of the NIR band in such cases. Even if the NIR band is less strongly correlated with the R, G, and B bands than these bands are mutually, there is sufficient correlation to pick up important textural and gradient information in the NIR band and inject it into the others. The algorithm that we propose - coined 'Make My Day', or MMD for short - is akin to the previously published BM3D denoising algorithm. MMD denoises the three (visible - NIR) differential images and then adds the original NIR image back. It not only effectively reduces the noise but also recovers texture and edge information in the high spatial-frequency range. MMD outperforms other state-of-the-art denoising methods in terms of PSNR, texture quality, and color fidelity. We publish our code and images.
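The differential-image idea lends itself to a short sketch: subtract the NIR band from each visible band, denoise the difference, and add the NIR band back. The paper builds on a BM3D-style denoiser; here a plain Gaussian filter stands in for it, so this only illustrates the structure of the pipeline, not its actual denoising quality.

```python
"""Sketch of the differential-image pipeline with a placeholder denoiser."""
import numpy as np
from scipy.ndimage import gaussian_filter

def mmd_like_restore(rgb, nir, sigma=1.5):
    # rgb: (H, W, 3) noisy visible image; nir: (H, W) cleaner NIR band;
    # both as floats in [0, 1] and spatially aligned.
    restored = np.empty_like(rgb)
    for c in range(3):
        diff = rgb[..., c] - nir                     # (visible - NIR) differential image
        diff_dn = gaussian_filter(diff, sigma)       # placeholder for a BM3D-style denoiser
        restored[..., c] = np.clip(diff_dn + nir, 0.0, 1.0)   # add the NIR band back
    return restored
```

Because the high-frequency texture and edges live mostly in the clean NIR band, adding it back after denoising the smoother differential images preserves detail that a direct denoiser on the RGB bands would blur away.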

17 citations

Posted Content
TL;DR: The proposed Efficient Video Segmentation (EVS) pipeline achieves accuracy levels competitive with existing real-time methods for semantic image segmentation (mIoU above 60%), while achieving much higher frame rates.
Abstract: This paper tackles the problem of real-time semantic segmentation of high-definition videos using a hybrid GPU/CPU approach. We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next; it runs in parallel with the GPU; (ii) on the GPU, two convolutional neural networks: a main segmentation network that is used to predict dense semantic labels from scratch, and a Refiner that is designed to improve predictions from previous frames with the help of a fast Inconsistencies Attention Module (IAM). The latter can identify regions that cannot be propagated accurately. We suggest several operating points depending on the desired frame rate and accuracy. Our pipeline achieves accuracy levels competitive with existing real-time methods for semantic image segmentation (mIoU above 60%), while achieving much higher frame rates. On the popular Cityscapes dataset with high-resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
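A simplified sketch of the propagation side of such a pipeline: a (GPU) segmentation network is run only on keyframes, and on the CPU the previous label map is warped to the current frame with dense optical flow. The Farneback flow, the fixed keyframe interval, and the `segment` placeholder are assumptions for the example; the Refiner and the Inconsistencies Attention Module are omitted.

```python
"""Flow-based label propagation between keyframes (illustrative, not the paper's exact components)."""
import cv2
import numpy as np

def propagate_labels(prev_labels, prev_gray, cur_gray):
    # Flow from the current frame back to the previous one, so every current
    # pixel knows where to fetch its label from (backward warping).
    flow = cv2.calcOpticalFlowFarneback(cur_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = cur_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_labels, map_x, map_y, cv2.INTER_NEAREST)

def run_pipeline(frames, segment, keyframe_interval=5):
    # frames: list of uint8 grayscale frames; segment: callable running the full
    # (GPU) segmentation network and returning a uint8 label map.
    labels, prev_gray, prev_labels = [], None, None
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0 or prev_labels is None:
            prev_labels = segment(frame)              # expensive network, keyframes only
        else:
            prev_labels = propagate_labels(prev_labels, prev_gray, frame)
        prev_gray = frame
        labels.append(prev_labels)
    return labels
```

In the full pipeline the propagated labels would additionally be passed to the Refiner, which corrects the regions the IAM marks as unreliable, rather than being used directly.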

17 citations

Journal ArticleDOI
TL;DR: In this paper, the authors define a formal framework for the representation and processing of incongruent events and derive algorithms to detect these events from different types of hierarchies, different applications and a variety of data types.
Abstract: Unexpected stimuli are a challenge to any machine learning algorithm. Here, we identify distinct types of unexpected events when general-level and specific-level classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: Starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level is much smaller than the probability computed based on some more general level, leading to conflicting predictions. Algorithms are derived to detect incongruent events from different types of hierarchies, different applications, and a variety of data types. We present promising results for the detection of novel visual and audio objects, and new patterns of motion in video. We also discuss the detection of Out-Of-Vocabulary words in speech recognition, and the detection of incongruent events in a multimodal audiovisual scenario.
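A toy version of the incongruence criterion can be written down directly: flag an event when the probability assigned at a more general level of the label hierarchy is much larger than the best probability assigned at the specific level beneath it. The two-level hierarchy and the ratio threshold below are illustrative assumptions, not the paper's exact formulation.

```python
"""Toy incongruence test over a two-level label hierarchy (illustrative assumptions)."""

def is_incongruent(p_general, p_specific_children, ratio_threshold=10.0):
    # p_general: probability of the event under the general-level model
    # (e.g. "a dog is present").
    # p_specific_children: probabilities under each specific-level model
    # consistent with that general label (e.g. individual dog breeds).
    p_specific = max(p_specific_children) if p_specific_children else 0.0
    # Incongruent: the general level accepts the event, but no specific-level
    # model explains it nearly as well.
    return p_specific == 0.0 or (p_general / max(p_specific, 1e-12)) > ratio_threshold

# Example: the general "dog" detector is confident, but every known breed model
# assigns a low probability, suggesting a novel (incongruent) dog.
print(is_incongruent(0.9, [0.02, 0.05, 0.01]))   # True
```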

17 citations


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8x deeper than VGG nets [40]) but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
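The residual reformulation described here amounts to letting a stack of layers learn F(x) and outputting F(x) + x through an identity shortcut. A minimal sketch of such a basic block, with illustrative 3x3 layer sizes and no downsampling, is given below.

```python
"""Minimal residual (basic) block sketch: output = F(x) + x with an identity shortcut."""
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The stacked layers learn the residual F(x) ...
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        # ... and the identity shortcut adds the input back: F(x) + x.
        return self.relu(residual + x)

y = BasicResidualBlock(64)(torch.randn(2, 64, 32, 32))   # output keeps the input shape
```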

123,388 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
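The design principle described here, increasing depth by stacking very small 3x3 convolutions with pooling between stages, can be sketched as follows; the channel progression mirrors the VGG-style pattern but is truncated and otherwise illustrative.

```python
"""Sketch of VGG-style stages built from stacked 3x3 convolutions (truncated, illustrative)."""
import torch
import torch.nn as nn

def vgg_style_stage(in_ch, out_ch, num_convs):
    # A stage is num_convs 3x3 conv+ReLU layers followed by 2x2 max-pooling.
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

features = nn.Sequential(
    vgg_style_stage(3, 64, 2),      # two 3x3 convs, then pool
    vgg_style_stage(64, 128, 2),
    vgg_style_stage(128, 256, 3),   # deeper stages use three 3x3 convs
)
print(features(torch.randn(1, 3, 224, 224)).shape)   # -> (1, 256, 28, 28)
```

Two stacked 3x3 convolutions cover the same receptive field as a single 5x5 one with fewer parameters and an extra non-linearity, which is what makes pushing the depth to 16-19 weight layers affordable.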

55,235 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Posted Content
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes Inception, a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
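A compact sketch of a GoogLeNet-style Inception module as described here: parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch, with 1x1 convolutions reducing channels to keep the computational budget in check. The specific channel counts below are illustrative.

```python
"""Sketch of an Inception-style module with parallel branches (illustrative channel counts)."""
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, 1)                                       # 1x1
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
                                     nn.Conv2d(c3_red, c3, 3, padding=1))            # 1x1 reduce -> 3x3
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
                                     nn.Conv2d(c5_red, c5, 5, padding=2))            # 1x1 reduce -> 5x5
        self.branch_pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                         nn.Conv2d(in_ch, pool_proj, 1))             # pool -> 1x1 projection

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
print(m(torch.randn(1, 192, 28, 28)).shape)   # -> (1, 256, 28, 28)
```

The 1x1 reductions before the larger filters are what let the network grow in both depth and width while the per-layer compute stays roughly constant, which is the multi-scale, budget-constrained design the abstract describes.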

40,257 citations