Proceedings ArticleDOI

A deep learning approach to traffic lights: Detection, tracking, and classification

01 May 2017, pp. 1370–1377
TL;DR: A complete system consisting of a traffic light detector, tracker, and classifier based on deep learning, stereo vision, and vehicle odometry which perceives traffic lights in real-time is proposed.
Abstract: Reliable traffic light detection and classification is crucial for automated driving in urban environments. Currently, there are no systems that can reliably perceive traffic lights in real-time, without map-based information, and at the distances needed for smooth urban driving. We propose a complete system consisting of a traffic light detector, tracker, and classifier based on deep learning, stereo vision, and vehicle odometry which perceives traffic lights in real-time. Within the scope of this work, we present three major contributions. The first is an accurately labeled traffic light dataset of 5000 images for training and a video sequence of 8334 frames for evaluation. The dataset is published as the Bosch Small Traffic Lights Dataset, with our results as a baseline. It is currently the largest publicly available labeled traffic light dataset and includes labels down to a size of only 1 pixel in width. The second contribution is a traffic light detector which runs at 10 frames per second on 1280×720 images. When selecting the confidence threshold that yields the equal error rate, we are able to detect traffic lights as small as 4 pixels in width. The third contribution is a traffic light tracker which uses stereo vision and vehicle odometry to compute a motion estimate of the traffic lights, and a neural network to correct that estimate.
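To make the equal-error-rate operating point concrete: the detector's confidence threshold is swept until the false-positive rate matches the false-negative rate. A minimal sketch of that selection follows; it illustrates the criterion only, not the authors' evaluation code, and the function name and input format are assumptions.

```python
import numpy as np

def eer_threshold(scores, labels):
    """Return the confidence threshold at which the false-positive rate
    (background detections kept) is closest to the false-negative rate
    (true traffic lights rejected)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = matched a real light
    best_t, best_gap = 0.0, np.inf
    for t in np.unique(scores):
        keep = scores >= t
        fpr = keep[~labels].mean() if (~labels).any() else 0.0
        fnr = (~keep[labels]).mean() if labels.any() else 0.0
        if abs(fpr - fnr) < best_gap:
            best_t, best_gap = t, abs(fpr - fnr)
    return best_t
```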
Citations
Posted Content
TL;DR: This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter century (from the 1990s to 2019), and provides an in-depth analysis of the field's challenges as well as its technical improvements in recent years.
Abstract: Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetic under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and provides an in-depth analysis of their challenges as well as their technical improvements in recent years.

802 citations


Cites background or methods from "A deep learning approach to traffic..."

  • ...BSTL [84] 2017 The largest traffic light detection dataset....


  • ...In deep learning era, some well-known detectors such as Faster RCNN and SSD were applied in traffic sign/light detection tasks [83, 84, 378, 379]....


Journal ArticleDOI
TL;DR: In this article, the authors systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving and provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection.
Abstract: Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of “what to fuse”, “when to fuse”, and “how to fuse” remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/ .

674 citations

Journal ArticleDOI
TL;DR: A detailed description of the architecture of the autonomy system of the self-driving car developed at the Universidade Federal do Espirito Santo (UFES), named Intelligent Autonomous Robotics Automobile (IARA), is presented.
Abstract: We survey research on self-driving cars published in the literature focusing on autonomous cars developed since the DARPA challenges, which are equipped with an autonomy system that can be categorized as SAE level 3 or higher. The architecture of the autonomy system of self-driving cars is typically organized into the perception system and the decision-making system. The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacles mapping, moving obstacles detection and tracking, road mapping, traffic signalization detection and recognition, among others. The decision-making system is commonly partitioned as well into many subsystems responsible for tasks such as route planning, path planning, behavior selection, motion planning, and control. In this survey, we present the typical architecture of the autonomy system of self-driving cars. We also review research on relevant methods for perception and decision making. Furthermore, we present a detailed description of the architecture of the autonomy system of the self-driving car developed at the Universidade Federal do Espirito Santo (UFES), named Intelligent Autonomous Robotics Automobile (IARA). Finally, we list prominent self-driving car research platforms developed by academia and technology companies, and reported in the media.

543 citations

01 Jan 2009
TL;DR: In this paper, a real-time traffic light recognition system for on-vehicle camera applications is presented. It is mainly based on a spot detection algorithm and can detect lights from a large distance, with the main advantage of being relatively insensitive to motion blur and illumination variations.
Abstract: This paper introduces a new real-time traffic light recognition system for on-vehicle camera applications. The approach has been tested with good results in urban scenes. Thanks to our generic "Adaptive Templates", it is possible to recognize different kinds of traffic lights from various countries. The approach is mainly based on a spot detection algorithm and is therefore able to detect lights from a large distance, with the main advantage of being relatively insensitive to motion blur and illumination variations. The detected spots, together with further shape analysis, form the strong hypotheses that feed our Adaptive Template Matcher. Although still a work in progress, our system was validated in real conditions in our prototype vehicle as well as on recorded video sequences. We observed a high rate of correctly recognized traffic lights and very few false alarms. Processing is performed in real-time on 640×480 images using a 2.9 GHz single-core desktop computer.
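The spot detection idea above lends itself to a compact sketch: a morphological white top-hat keeps small bright blobs (candidate lights at long range) while suppressing larger bright structures. The paper does not spell out its exact operator, so the OpenCV version below is only an assumed illustration; the kernel size and brightness threshold are invented parameters.

```python
import cv2

def detect_spots(gray, kernel_size=15, min_brightness=60):
    """Find small bright blobs in a grayscale frame via white top-hat:
    the image minus its morphological opening, which removes bright
    regions larger than the structuring element."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
    _, mask = cv2.threshold(tophat, min_brightness, 255, cv2.THRESH_BINARY)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # Skip component 0 (background); return blob centers and pixel areas.
    return centroids[1:], stats[1:, cv2.CC_STAT_AREA]
```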

154 citations

Posted Content
TL;DR: Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects, and a novel IoU constant factor is added to the smooth L1 loss to address the long-standing boundary problem.
Abstract: Small and cluttered objects are common in the real world and are challenging to detect. The difficulty is further pronounced when the objects are rotated, as traditional detectors routinely localize objects with horizontal bounding boxes, so the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects. To handle rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long-standing boundary problem, which, by our analysis, is mainly caused by the periodicity of the angle (PoA) and the exchangeability of edges (EoE). Combining these two techniques, our proposed detector is termed SCRDet++. Extensive experiments are performed on the large public aerial image datasets DOTA, DIOR, and UCAS-AOD, as well as the natural image dataset COCO, the scene text dataset ICDAR2015, the small traffic light dataset BSTLD, and our newly released S²TLD. The results show the effectiveness of our approach. Project page at this https URL.
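One plausible reading of the "IoU constant factor" is sketched below: the smooth L1 term supplies only the gradient direction, while a factor derived from the (rotated) IoU sets the loss magnitude, so the loss no longer spikes at the angular boundary. This is an approximation inferred from the abstract, not the released SCRDet++ code; tensor names and shapes are assumptions.

```python
import torch

def iou_smooth_l1(pred_deltas, target_deltas, iou, beta=1.0, eps=1e-6):
    """IoU-modulated smooth L1: normalize the smooth L1 term by its own
    (detached) value so it only provides direction, then rescale by
    -log(IoU) so the magnitude tracks box overlap."""
    diff = torch.abs(pred_deltas - target_deltas)
    smooth_l1 = torch.where(diff < beta,
                            0.5 * diff ** 2 / beta,
                            diff - 0.5 * beta).sum(dim=-1)
    factor = -torch.log(iou.clamp(min=eps))  # small loss when IoU is high
    return (smooth_l1 / smooth_l1.detach().clamp(min=eps) * factor).mean()
```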

129 citations


Cites background or methods from "A deep learning approach to traffic..."

  • ...The training strategy is consistent with BSTLD....


  • ...We conduct extensive ablation study and experiments on multiple datasets including both aerial images from DOTA [10], DIOR [11], UCAS-AOD [27], as well as natural image dataset COCO [8], scene text dataset ICDAR2015 [28], small traffic light dataset BSTLD [29] and our newly released S2TLD to illustrate the promising effects of our techniques....


  • ...We perform extensive ablation studies and comparative experiments on multiple aerial image datasets such as DOTA, DIOR, UCAS-AOD, small traffic light dataset BSTLD and our released S2TLD, and demonstrate that our method achieves the state-of-the-art detection accuracy....


  • ...BSTLD [29]: BSTLD contains 13,427 camera images at a resolution of 720 × 1,280 pixels and contains about 24,000 annotated small traffic lights....


  • ...In the experiment, we divide BSTLD training set into a training set and a test set according to the ratio of 6 : 4....


References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network with 60 million parameters achieved state-of-the-art results on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
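The architecture described above maps to a compact PyTorch module. The sketch below follows the common single-GPU reading of the network rather than the paper's original two-GPU split, and omits the local response normalization layers for brevity.

```python
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Five conv layers (some followed by max pooling) and three
    fully-connected layers ending in 1000-way logits; dropout is applied
    in the fully-connected part as described in the paper."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),  # softmax is applied in the loss
        )

    def forward(self, x):              # expects 227x227 RGB input
        return self.classifier(self.features(x).flatten(1))
```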

73,978 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
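The mechanism is small enough to write out directly. The sketch below uses the now-standard "inverted" formulation, which scales surviving units by 1/(1-p) during training instead of halving the weights at test time as the paper describes; the two are equivalent in expectation.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during
    training and scale the survivors, so the layer is the identity at
    test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep a unit with probability 1 - p
    return x * mask / (1.0 - p)
```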

33,597 citations


"A deep learning approach to traffic..." refers methods in this paper

  • ...In addition, we employ two max-pooling and three dropout layers [17]....


Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.
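The skip idea reduces to a few lines: score the deep, coarse feature map, upsample it, and sum it with scores computed from a shallower, finer feature map (the FCN-16s pattern). The sketch below uses bilinear interpolation where the paper learns deconvolution filters; module and argument names are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SkipHead(nn.Module):
    """Fuse class scores from a coarse (deep) and a fine (shallow)
    feature map, then upsample to the input resolution."""
    def __init__(self, deep_ch, fine_ch, num_classes):
        super().__init__()
        self.score_deep = nn.Conv2d(deep_ch, num_classes, kernel_size=1)
        self.score_fine = nn.Conv2d(fine_ch, num_classes, kernel_size=1)

    def forward(self, deep_feat, fine_feat, out_size):
        coarse = self.score_deep(deep_feat)            # e.g. stride-32 map
        coarse = F.interpolate(coarse, size=fine_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        fused = coarse + self.score_fine(fine_feat)    # stride-16 map
        return F.interpolate(fused, size=out_size,
                             mode="bilinear", align_corners=False)
```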

28,225 citations


"A deep learning approach to traffic..." refers background or methods in this paper

  • ...With the recent advances and performance of deep neural networks [10]–[13], significant improvements were made in several fields of machine learning and especially computer vision....


  • ...Deep learning has been used for image classification [10], end-to-end object detection [11], pixel-precise object segmentation [13], and other applications....


Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

27,256 citations


"A deep learning approach to traffic..." refers background or methods in this paper

  • ...Removing the classification part from the YOLO architecture yields the following loss function:

    $$
    \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]
    + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right]
    + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \, p_i^2
    + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( p_i - \hat{p}_i \right)^2 \tag{1}
    $$

    For a detailed explanation of all terms, please refer to [11] (a minimal code sketch of this loss appears after this list)....


  • ...As in [11], this part of the loss function is only used if an object overlaps with the prediction....


  • ...The detection of traffic lights is carried out by an end-to-end trained neural network [11], which we adapt to detect traffic lights as small as 3 × 10 pixels....


  • ...Initial tests using the original YOLO architecture showed lower performance in detection when the classification part of the network was used....


  • ...Deep learning has been used for image classification [10], end-to-end object detection [11], pixel-precise object segmentation [13], and other applications....

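A minimal sketch of the loss in Eq. (1) above, assuming predictions and targets have already been arranged per grid cell and per box; the names, shapes, and λ values (taken from the YOLO defaults) are assumptions rather than the authors' code.

```python
import torch

def detection_loss(pred, target, obj_mask,
                   lambda_coord=5.0, lambda_noobj=0.5):
    """Classification-free YOLO loss: pred and target are (S*S, B, 5)
    tensors holding (x, y, w, h, p) per box with non-negative w, h;
    obj_mask is (S*S, B) bool, True where a traffic light is assigned."""
    px, py, pw, ph, pp = pred.unbind(-1)
    tx, ty, tw, th, tp = target.unbind(-1)

    loc = ((px - tx) ** 2 + (py - ty) ** 2)[obj_mask].sum()
    size = ((pw.sqrt() - tw.sqrt()) ** 2
            + (ph.sqrt() - th.sqrt()) ** 2)[obj_mask].sum()
    conf_obj = ((pp - tp) ** 2)[obj_mask].sum()  # only where an object overlaps
    conf_noobj = (pp ** 2)[~obj_mask].sum()

    return lambda_coord * (loc + size) + lambda_noobj * conf_noobj + conf_obj
```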