Proceedings ArticleDOI

3D Scene priors for road detection

TL;DR: Low-level, contextual, and temporal cues are combined in a Bayesian framework to classify road sequences; the proposed method provides the highest road detection accuracy when compared to state-of-the-art methods.
Abstract: Vision-based road detection is important in different areas of computer vision such as autonomous driving, car collision warning, and pedestrian crossing detection. However, current vision-based road detection methods are usually based on low-level features and assume structured roads, road homogeneity, and uniform lighting conditions. Therefore, in this paper, contextual 3D information is used in addition to low-level cues. Low-level photometric invariant cues are derived from the appearance of roads. Contextual cues include horizon lines, vanishing points, 3D scene layout, and 3D road stages. Moreover, temporal road cues are included. All these cues are sensitive to different imaging conditions and hence are considered weak cues. Therefore, they are combined to improve the overall performance of the algorithm. To this end, the low-level, contextual, and temporal cues are combined in a Bayesian framework to classify road sequences. Large-scale experiments on road sequences show that the road detection method is robust to varying imaging conditions, road types, and scenarios (tunnels, urban, and highway). Further, using the combined cues outperforms each individual cue. Finally, the proposed method provides the highest road detection accuracy when compared to state-of-the-art methods.
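The paper's central mechanism is the fusion of several weak per-pixel cues. As a rough illustration of how such a fusion can work, the sketch below combines per-pixel road probabilities from independent cues in log-odds space under a naive-Bayes independence assumption; the function and variable names are illustrative, not the paper's implementation.

```python
import numpy as np

def fuse_cues(cue_posteriors):
    """Naive-Bayes fusion of per-pixel road probabilities from weak cues.

    cue_posteriors: list of HxW arrays, each an estimate of P(road | cue)
    computed with a uniform class prior. Under the (strong) assumption
    that cues are conditionally independent given the class, summing
    log-odds yields the combined posterior.
    """
    log_odds = np.zeros_like(cue_posteriors[0], dtype=np.float64)
    for p in cue_posteriors:
        p = np.clip(p, 1e-6, 1.0 - 1e-6)       # avoid log(0)
        log_odds += np.log(p) - np.log1p(-p)   # logit of each cue
    return 1.0 / (1.0 + np.exp(-log_odds))     # sigmoid back to probability

# Example: fuse three synthetic cue maps and threshold at 0.5
h, w = 4, 6
cues = [np.random.rand(h, w) for _ in range(3)]
road_mask = fuse_cues(cues) > 0.5
```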


Citations
Proceedings ArticleDOI
01 Oct 2013
TL;DR: A novel, behavior-based metric which judges the utility of the extracted ego-lane area for driver assistance applications by fitting a driving corridor to the road detection results in the BEV is proposed.
Abstract: Detecting the road area and ego-lane ahead of a vehicle is central to modern driver assistance systems. While lane-detection on well-marked roads is already available in modern vehicles, finding the boundaries of unmarked or weakly marked roads and lanes as they appear in inner-city and rural environments remains an unsolved problem due to the high variability in scene layout and illumination conditions, amongst others. While recent years have witnessed great interest in this subject, to date no commonly agreed upon benchmark exists, rendering a fair comparison amongst methods difficult. In this paper, we introduce a novel open-access dataset and benchmark for road area and ego-lane detection. Our dataset comprises 600 annotated training and test images of high variability from the KITTI autonomous driving project, capturing a broad spectrum of urban road scenes. For evaluation, we propose to use the 2D Bird's Eye View (BEV) space as vehicle control usually happens in this 2D world, requiring detection results to be represented in this very same space. Furthermore, we propose a novel, behavior-based metric which judges the utility of the extracted ego-lane area for driver assistance applications by fitting a driving corridor to the road detection results in the BEV. We believe this to be important for a meaningful evaluation as pixel-level performance is of limited value for vehicle control. State-of-the-art road detection algorithms are used to demonstrate results using classical pixel-level metrics in perspective and BEV space as well as the novel behavior-based performance measure. All data and annotations are made publicly available on the KITTI online evaluation website in order to serve as a common benchmark for road terrain detection algorithms.
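Several of the metrics named here are easy to make concrete. A minimal sketch of pixel-level precision, recall, and the derived F-measure for binary road masks follows; it assumes boolean numpy arrays and mirrors the standard definitions rather than the benchmark's exact evaluation code (which also operates in BEV space and sweeps decision thresholds).

```python
import numpy as np

def f_measure(pred, gt, beta=1.0):
    """Pixel-level precision, recall, and F-measure for binary road masks.

    pred, gt: boolean HxW arrays (True = road). beta=1 gives the usual F1.
    """
    tp = np.logical_and(pred, gt).sum()    # road predicted and present
    fp = np.logical_and(pred, ~gt).sum()   # road predicted, not present
    fn = np.logical_and(~pred, gt).sum()   # road missed
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f = (1 + beta**2) * precision * recall / max(beta**2 * precision + recall, 1e-9)
    return precision, recall, f
```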

608 citations


Cites background or methods from "3D Scene priors for road detection"

  • ...Consequently, many approaches put higher emphasis on appearance cues such as the color and texture of the road area [5], [6], [7], [8], [9], [10], [11], [12]....

  • ...These visual properties of the road area have been used for estimating the overall road shape [12] or for segmenting the complete road area [5], [6], [8], [10]....

  • ...Metrics include the classical true positive (TP) and false positive (FP) rates on the pixel/patch level [20], [21], [22], the accuracy [6] as well as precision/recall and the derived F-measure [7], [10], [18]....

  • ...Similar to [7], [10], [18], we employ the F-measure derived from the precision and recall values (Eq....

  • ...These baselines can be viewed as scene priors similar to the one used as input to [10]....

Book
03 Jul 2020
TL;DR: This survey covers both the historically most relevant literature and the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving.
Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This monograph attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

579 citations


Additional excerpts

  • ...Alvarez et al. (2010) propose a Bayesian framework to classify road sequences....

Journal ArticleDOI
TL;DR: A novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene is presented.
Abstract: In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar, or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow, and occupancy grids. For each of these cues, we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, experiments using different feature combinations are conducted. Furthermore, we show how by employing context derived from the proposed method we are able to improve over the state-of-the-art in terms of object detection and object orientation estimation in challenging and cluttered urban environments.

453 citations

Posted Content
TL;DR: In this paper, the authors provide a survey on the state-of-the-art datasets and techniques for autonomous driving, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning.
Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This book attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

114 citations

Posted Content
TL;DR: In this article, the authors present a generative probabilistic graphics program for reading sequences of degraded and adversarially obscured alphanumeric characters and inferring 3D road models from vehicle-mounted camera images.
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer's output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood model. Representations and algorithms from computer graphics, originally designed to produce high-quality images, are instead used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and supports accurate, approximately Bayesian inferences about ambiguous real-world images.

83 citations

References
Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.
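OpenCV ships a descendant of this adaptive mixture-of-Gaussians background model (Zivkovic's MOG2, a later refinement of the approach described above). A minimal usage sketch, with an assumed input video file name:

```python
import cv2

# MOG2 models each pixel with a small Gaussian mixture updated online,
# in the spirit of the adaptive mixture model described above.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,         # number of frames used to estimate the background
    varThreshold=16,     # squared Mahalanobis threshold for foreground
    detectShadows=True)  # shadows marked separately (value 127 in the mask)

cap = cv2.VideoCapture("road_sequence.mp4")  # hypothetical input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)  # 0 = background, 255 = foreground
cap.release()
```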

7,660 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...To this end, an exponentially weighted moving average (EWMA) [20, 21] is used to express the dynamic structure of the data (previously detected road)....

  • ...EWMA uses a decay factor that weighs the influence of each past result....

  • ...Further, EWMA assumes that the road detected in the current frame is correlated (similar) to the road detected in previous frames....

  • ...Using EWMA, the weights are computed as follows: $E[p(x_i = R)_t] = \frac{1}{\sum_{j=1}^{T} \lambda^{j-1}} \sum_{j=1}^{T} \lambda^{j-1}\, p(x_i = R)_{t-j}$ (1), where $E[p(x_i = R)_t]$ is the expected probability of a pixel being road at discrete time instant $t$ (the current frame) and $p(x_i = R)_{t-j}$ is the probability of a pixel being road $j$ frames before the frame being analyzed....
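Eq. (1) transcribes directly into a few lines of numpy. The sketch below is an illustrative re-implementation, not the authors' code; variable names are mine.

```python
import numpy as np

def ewma_road_prior(past_probs, lam=0.9):
    """Temporal road prior via an exponentially weighted moving average,
    following Eq. (1): the road-probability map j frames back is weighted
    by lam**(j-1), and the weights are normalised to sum to one.

    past_probs: list of HxW arrays, most recent frame first
                (past_probs[0] is p(x_i = R) at t-1).
    """
    T = len(past_probs)
    weights = np.array([lam ** (j - 1) for j in range(1, T + 1)])
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, past_probs))
```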

Journal ArticleDOI
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected closed together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
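The exact spatial-envelope filter bank is specific to the original work, but its flavor, oriented filter energies pooled over a coarse spatial grid, can be sketched with standard tools. The following gist-like descriptor is an approximation for illustration only, using OpenCV's Gabor kernels:

```python
import cv2
import numpy as np

def gist_like_descriptor(gray, orientations=4, grid=4):
    """A rough sketch of a gist-style holistic descriptor: Gabor filter
    energies averaged over a coarse spatial grid. This mimics the spirit
    of the spatial-envelope representation, not its exact filter bank.

    gray: single-channel float32 image.
    """
    h, w = gray.shape
    feats = []
    for k in range(orientations):
        theta = np.pi * k / orientations
        kern = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5)
        resp = np.abs(cv2.filter2D(gray, cv2.CV_32F, kern))
        # average filter energy inside each cell of a coarse grid
        for i in range(grid):
            for j in range(grid):
                cell = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                feats.append(float(cell.mean()))
    return np.array(feats)  # orientations * grid * grid values
```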

6,882 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...This method estimates the horizon line by applying non-linear mixtures of linear regressors to the description of an image obtained using gist descriptors [13]....

Journal ArticleDOI
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions-the sum rule-outperforms other classifier combinations schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
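The sum rule this abstract singles out is simple to state in code: average the per-classifier class posteriors and take the argmax. A minimal sketch (array shapes are illustrative):

```python
import numpy as np

def sum_rule(posteriors):
    """Sum-rule classifier combination: average the class posteriors
    produced by the individual classifiers and pick the argmax class.

    posteriors: array of shape (n_classifiers, n_classes), each row a
    classifier's posterior distribution over classes for one sample.
    """
    combined = np.mean(posteriors, axis=0)
    return int(np.argmax(combined)), combined

# Example: three classifiers, two classes (e.g. road / non-road)
p = np.array([[0.60, 0.40],
              [0.70, 0.30],
              [0.45, 0.55]])
label, scores = sum_rule(p)  # -> class 0, scores approx. [0.583, 0.417]
```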

5,670 citations


Additional excerpts

  • ...In general, combining multiple classifiers is a powerful technique to improve the performance of single classifiers [9, 7]....

Journal ArticleDOI
TL;DR: A comprehensive reference on methods and algorithms for combining pattern classifiers, covering the theory and practice of classifier ensembles.
Abstract: (2005). Combining Pattern Classifiers: Methods and Algorithms. Technometrics: Vol. 47, No. 4, pp. 517-518.

3,933 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...Finally, the proposed method provides the highest road detection accuracy when compared to state-of-the-art methods....

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work proposes a novel approach to learn and recognize natural scene categories by representing the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning.
Abstract: We propose a novel approach to learn and recognize natural scene categories. Unlike previous work, it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on a large set of 13 categories of complex scenes.
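The unsupervised codeword construction can be illustrated with k-means over local descriptors. This sketch follows the general bag-of-visual-words recipe rather than the paper's exact theme model (which additionally learns latent theme distributions without supervision):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_sets, n_codewords=200):
    """Learn a visual codebook without supervision: cluster local
    descriptors with k-means, then histogram each image over the
    resulting codewords.

    descriptor_sets: list of (n_i, d) arrays of local descriptors
    (e.g. densely sampled SIFT), one array per image.
    """
    all_desc = np.vstack(descriptor_sets)
    kmeans = KMeans(n_clusters=n_codewords, n_init=10).fit(all_desc)
    histograms = []
    for desc in descriptor_sets:
        words = kmeans.predict(desc)  # assign each descriptor a codeword
        hist = np.bincount(words, minlength=n_codewords).astype(float)
        histograms.append(hist / hist.sum())  # normalised word histogram
    return kmeans, histograms
```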

3,920 citations


"3D Scene priors for road detection" refers methods in this paper

  • ...Following the approach in [16], images are described using opponent SIFT descriptor with dense sampling [17] (10 pixels sampling grid)....
