Proceedings ArticleDOI

3D Scene priors for road detection

TL;DR: Low-level, contextual, and temporal cues are combined in a Bayesian framework to classify road sequences; the proposed method provides the highest road detection accuracy when compared to state-of-the-art methods.
Abstract: Vision-based road detection is important in different areas of computer vision such as autonomous driving, car collision warning, and pedestrian crossing detection. However, current vision-based road detection methods are usually based on low-level features and assume structured roads, road homogeneity, and uniform lighting conditions. Therefore, in this paper, contextual 3D information is used in addition to low-level cues. Low-level photometric invariant cues are derived from the appearance of roads. Contextual cues include horizon lines, vanishing points, 3D scene layout, and 3D road stages. Moreover, temporal road cues are included. All these cues are sensitive to different imaging conditions and hence are considered weak cues. Therefore, they are combined to improve the overall performance of the algorithm. To this end, the low-level, contextual, and temporal cues are combined in a Bayesian framework to classify road sequences. Large-scale experiments on road sequences show that the road detection method is robust to varying imaging conditions, road types, and scenarios (tunnels, urban, and highway). Further, using the combined cues outperforms each individual cue. Finally, the proposed method provides the highest road detection accuracy when compared to state-of-the-art methods.
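The paper's central mechanism is the fusion of several weak per-pixel cues. As a rough illustration of how such a fusion can work, the sketch below combines per-pixel road probabilities from independent cues in log-odds space under a naive-Bayes independence assumption; the function and variable names are illustrative, not the paper's implementation.

```python
import numpy as np

def fuse_cues(cue_posteriors):
    """Naive-Bayes fusion of per-pixel road probabilities from weak cues.

    cue_posteriors: list of HxW arrays, each an estimate of P(road | cue)
    computed with a uniform class prior. Under the (strong) assumption
    that cues are conditionally independent given the class, summing
    log-odds yields the combined posterior.
    """
    log_odds = np.zeros_like(cue_posteriors[0], dtype=np.float64)
    for p in cue_posteriors:
        p = np.clip(p, 1e-6, 1.0 - 1e-6)       # avoid log(0)
        log_odds += np.log(p) - np.log1p(-p)   # logit of each cue
    return 1.0 / (1.0 + np.exp(-log_odds))     # sigmoid back to probability

# Example: fuse three synthetic cue maps and threshold at 0.5
h, w = 4, 6
cues = [np.random.rand(h, w) for _ in range(3)]
road_mask = fuse_cues(cues) > 0.5
```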


Citations
Proceedings ArticleDOI
01 Oct 2013
TL;DR: A novel, behavior-based metric which judges the utility of the extracted ego-lane area for driver assistance applications by fitting a driving corridor to the road detection results in the BEV is proposed.
Abstract: Detecting the road area and ego-lane ahead of a vehicle is central to modern driver assistance systems. While lane-detection on well-marked roads is already available in modern vehicles, finding the boundaries of unmarked or weakly marked roads and lanes as they appear in inner-city and rural environments remains an unsolved problem due to the high variability in scene layout and illumination conditions, amongst others. While recent years have witnessed great interest in this subject, to date no commonly agreed upon benchmark exists, rendering a fair comparison amongst methods difficult. In this paper, we introduce a novel open-access dataset and benchmark for road area and ego-lane detection. Our dataset comprises 600 annotated training and test images of high variability from the KITTI autonomous driving project, capturing a broad spectrum of urban road scenes. For evaluation, we propose to use the 2D Bird's Eye View (BEV) space as vehicle control usually happens in this 2D world, requiring detection results to be represented in this very same space. Furthermore, we propose a novel, behavior-based metric which judges the utility of the extracted ego-lane area for driver assistance applications by fitting a driving corridor to the road detection results in the BEV. We believe this to be important for a meaningful evaluation as pixel-level performance is of limited value for vehicle control. State-of-the-art road detection algorithms are used to demonstrate results using classical pixel-level metrics in perspective and BEV space as well as the novel behavior-based performance measure. All data and annotations are made publicly available on the KITTI online evaluation website in order to serve as a common benchmark for road terrain detection algorithms.
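Several of the metrics named here are easy to make concrete. A minimal sketch of pixel-level precision, recall, and the derived F-measure for binary road masks follows; it assumes boolean numpy arrays and mirrors the standard definitions rather than the benchmark's exact evaluation code (which also operates in BEV space and sweeps decision thresholds).

```python
import numpy as np

def f_measure(pred, gt, beta=1.0):
    """Pixel-level precision, recall, and F-measure for binary road masks.

    pred, gt: boolean HxW arrays (True = road). beta=1 gives the usual F1.
    """
    tp = np.logical_and(pred, gt).sum()    # road predicted and present
    fp = np.logical_and(pred, ~gt).sum()   # road predicted, not present
    fn = np.logical_and(~pred, gt).sum()   # road missed
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f = (1 + beta**2) * precision * recall / max(beta**2 * precision + recall, 1e-9)
    return precision, recall, f
```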

608 citations


Cites background or methods from "3D Scene priors for road detection"

  • ...Consequently, many approaches put higher emphasis on appearance cues such as the color and texture of the road area [5], [6], [7], [8], [9], [10], [11], [12]....

  • ...These visual properties of the road area have been used for estimating the overall road shape [12] or for segmenting the complete road area [5], [6], [8], [10]....

  • ...Metrics include the classical true positive (TP) and false positive (FP) rates on the pixel/patch level [20], [21], [22], the accuracy [6] as well as precision/recall and the derived F-measure [7], [10], [18]....

  • ...Similar to [7], [10], [18], we employ the F-measure derived from the precision and recall values (Eq....

  • ...These baselines can be viewed as scene priors similar to the one used as input to [10]....

Book
03 Jul 2020
TL;DR: This survey covers both the historically most relevant literature and the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving.
Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This monograph attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

579 citations


Additional excerpts

  • ...Alvarez et al. (2010) propose a Bayesian framework to classify road sequences....

Journal ArticleDOI
TL;DR: A novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene is presented.
Abstract: In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar, or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow, and occupancy grids. For each of these cues, we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, experiments using different feature combinations are conducted. Furthermore, we show how by employing context derived from the proposed method we are able to improve over the state-of-the-art in terms of object detection and object orientation estimation in challenging and cluttered urban environments.

453 citations

Posted Content
TL;DR: In this paper, the authors provide a survey on the state-of-the-art datasets and techniques for autonomous driving, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning.
Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This book attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

114 citations

Posted Content
TL;DR: In this article, the authors present a generative probabilistic graphics program for reading sequences of degraded and adversarially obscured alphanumeric characters and inferring 3D road models from vehicle-mounted camera images.
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer's output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood model. Representations and algorithms from computer graphics, originally designed to produce high-quality images, are instead used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and supports accurate, approximately Bayesian inferences about ambiguous real-world images.

83 citations

References
Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.
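OpenCV ships a descendant of this adaptive mixture-of-Gaussians background model (Zivkovic's MOG2, a later refinement of the approach described above). A minimal usage sketch, with an assumed input video file name:

```python
import cv2

# MOG2 models each pixel with a small Gaussian mixture updated online,
# in the spirit of the adaptive mixture model described above.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,         # number of frames used to estimate the background
    varThreshold=16,     # squared Mahalanobis threshold for foreground
    detectShadows=True)  # shadows marked separately (value 127 in the mask)

cap = cv2.VideoCapture("road_sequence.mp4")  # hypothetical input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)  # 0 = background, 255 = foreground
cap.release()
```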

7,660 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...To this end, an exponentially weighted moving average (EWMA) [20, 21] is used to express the dynamic structure of the data (previously detected road)....

  • ...EWMA uses a decay factor that weighs the influence of each past result....

  • ...Further, EWMA assumes that the road detected in the current frame is correlated (similar) to the road detected in previous frames....

  • ...Using EWMA, the weights are computed as follows: $E[p(x_i = R)_t] = \frac{1}{\sum_{j=1}^{T} \lambda^{j-1}} \sum_{j=1}^{T} \lambda^{j-1}\, p(x_i = R)_{t-j}$ (1), where $E[p(x_i = R)_t]$ is the expected probability of a pixel being road at discrete time instant $t$ (the current frame) and $p(x_i = R)_{t-j}$ is the probability of a pixel being road $j$ frames before the frame being analyzed....
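Eq. (1) transcribes directly into a few lines of numpy. The sketch below is an illustrative re-implementation, not the authors' code; variable names are mine.

```python
import numpy as np

def ewma_road_prior(past_probs, lam=0.9):
    """Temporal road prior via an exponentially weighted moving average,
    following Eq. (1): the road-probability map j frames back is weighted
    by lam**(j-1), and the weights are normalised to sum to one.

    past_probs: list of HxW arrays, most recent frame first
                (past_probs[0] is p(x_i = R) at t-1).
    """
    T = len(past_probs)
    weights = np.array([lam ** (j - 1) for j in range(1, T + 1)])
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, past_probs))
```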

Journal ArticleDOI
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected closed together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
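The exact spatial-envelope filter bank is specific to the original work, but its flavor, oriented filter energies pooled over a coarse spatial grid, can be sketched with standard tools. The following gist-like descriptor is an approximation for illustration only, using OpenCV's Gabor kernels:

```python
import cv2
import numpy as np

def gist_like_descriptor(gray, orientations=4, grid=4):
    """A rough sketch of a gist-style holistic descriptor: Gabor filter
    energies averaged over a coarse spatial grid. This mimics the spirit
    of the spatial-envelope representation, not its exact filter bank.

    gray: single-channel float32 image.
    """
    h, w = gray.shape
    feats = []
    for k in range(orientations):
        theta = np.pi * k / orientations
        kern = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5)
        resp = np.abs(cv2.filter2D(gray, cv2.CV_32F, kern))
        # average filter energy inside each cell of a coarse grid
        for i in range(grid):
            for j in range(grid):
                cell = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                feats.append(float(cell.mean()))
    return np.array(feats)  # orientations * grid * grid values
```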

6,882 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...This method estimates the horizon line by applying non-linear mixtures of linear regressors to the description of an image obtained using gist descriptors [13]....

Journal ArticleDOI
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions-the sum rule-outperforms other classifier combinations schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
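The sum rule this abstract singles out is simple to state in code: average the per-classifier class posteriors and take the argmax. A minimal sketch (array shapes are illustrative):

```python
import numpy as np

def sum_rule(posteriors):
    """Sum-rule classifier combination: average the class posteriors
    produced by the individual classifiers and pick the argmax class.

    posteriors: array of shape (n_classifiers, n_classes), each row a
    classifier's posterior distribution over classes for one sample.
    """
    combined = np.mean(posteriors, axis=0)
    return int(np.argmax(combined)), combined

# Example: three classifiers, two classes (e.g. road / non-road)
p = np.array([[0.60, 0.40],
              [0.70, 0.30],
              [0.45, 0.55]])
label, scores = sum_rule(p)  # -> class 0, scores approx. [0.583, 0.417]
```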

5,670 citations


Additional excerpts

  • ...In general, combining multiple classifiers is a powerful technique to improve the performance of single classifiers [9, 7]....

Journal ArticleDOI
TL;DR: A comprehensive reference on methods and algorithms for combining pattern classifiers, covering the theory and practice of classifier ensembles.
Abstract: (2005). Combining Pattern Classifiers: Methods and Algorithms. Technometrics: Vol. 47, No. 4, pp. 517-518.

3,933 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...Finally, the proposed method provides the highest road detection accuracy when compared to state-of-the-art methods....

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.
Abstract: We propose a novel approach to learn and recognize natural scene categories. Unlike previous work, it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on a large set of 13 categories of complex scenes.
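The unsupervised codeword construction can be illustrated with k-means over local descriptors. This sketch follows the general bag-of-visual-words recipe rather than the paper's exact theme model (which additionally learns latent theme distributions without supervision):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_sets, n_codewords=200):
    """Learn a visual codebook without supervision: cluster local
    descriptors with k-means, then histogram each image over the
    resulting codewords.

    descriptor_sets: list of (n_i, d) arrays of local descriptors
    (e.g. densely sampled SIFT), one array per image.
    """
    all_desc = np.vstack(descriptor_sets)
    kmeans = KMeans(n_clusters=n_codewords, n_init=10).fit(all_desc)
    histograms = []
    for desc in descriptor_sets:
        words = kmeans.predict(desc)  # assign each descriptor a codeword
        hist = np.bincount(words, minlength=n_codewords).astype(float)
        histograms.append(hist / hist.sum())  # normalised word histogram
    return kmeans, histograms
```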

3,920 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...Following the approach in [16], images are described using opponent SIFT descriptor with dense sampling [17] (10 pixels sampling grid)....
