Proceedings ArticleDOI

A deep learning approach to traffic lights: Detection, tracking, and classification

01 May 2017, pp. 1370–1377
TL;DR: A complete system consisting of a traffic light detector, tracker, and classifier based on deep learning, stereo vision, and vehicle odometry which perceives traffic lights in real-time is proposed.
Abstract: Reliable traffic light detection and classification is crucial for automated driving in urban environments. Currently, there are no systems that can reliably perceive traffic lights in real-time, without map-based information, and at the distances needed for smooth urban driving. We propose a complete system consisting of a traffic light detector, tracker, and classifier based on deep learning, stereo vision, and vehicle odometry which perceives traffic lights in real-time. Within the scope of this work, we present three major contributions. The first is an accurately labeled traffic light dataset of 5000 images for training and a video sequence of 8334 frames for evaluation. The dataset is published as the Bosch Small Traffic Lights Dataset, with our results as a baseline. It is currently the largest publicly available labeled traffic light dataset and includes labels down to a size of only 1 pixel in width. The second contribution is a traffic light detector which runs at 10 frames per second on 1280×720 images. When selecting the confidence threshold that yields the equal error rate, we are able to detect traffic lights as small as 4 pixels in width. The third contribution is a traffic light tracker which uses stereo vision and vehicle odometry to compute a motion estimate of the traffic lights, and a neural network to correct that estimate.
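To make the equal-error-rate operating point concrete: the detector's confidence threshold is swept until the false-positive rate matches the false-negative rate. A minimal sketch of that selection follows; it illustrates the criterion only, not the authors' evaluation code, and the function name and input format are assumptions.

```python
import numpy as np

def eer_threshold(scores, labels):
    """Return the confidence threshold at which the false-positive rate
    (background detections kept) is closest to the false-negative rate
    (true traffic lights rejected)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = matched a real light
    best_t, best_gap = 0.0, np.inf
    for t in np.unique(scores):
        keep = scores >= t
        fpr = keep[~labels].mean() if (~labels).any() else 0.0
        fnr = (~keep[labels]).mean() if labels.any() else 0.0
        if abs(fpr - fnr) < best_gap:
            best_t, best_gap = t, abs(fpr - fnr)
    return best_t
```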
Citations
Posted Content
TL;DR: This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter century (from the 1990s to 2019), and provides an in-depth analysis of the field's challenges as well as its technical improvements in recent years.
Abstract: Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetic under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and provides an in-depth analysis of their challenges as well as their technical improvements in recent years.

802 citations


Cites background or methods from "A deep learning approach to traffic..."

  • ...BSTL [84] 2017 The largest traffic light detection dataset....


  • ...In deep learning era, some well-known detectors such as Faster RCNN and SSD were applied in traffic sign/light detection tasks [83, 84, 378, 379]....


Journal ArticleDOI
TL;DR: In this article, the authors systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving and provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection.
Abstract: Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of “what to fuse”, “when to fuse”, and “how to fuse” remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/ .

674 citations

Journal ArticleDOI
TL;DR: A detailed description of the architecture of the autonomy system of the self-driving car developed at the Universidade Federal do Espirito Santo (UFES), named Intelligent Autonomous Robotics Automobile (IARA), is presented.
Abstract: We survey research on self-driving cars published in the literature focusing on autonomous cars developed since the DARPA challenges, which are equipped with an autonomy system that can be categorized as SAE level 3 or higher. The architecture of the autonomy system of self-driving cars is typically organized into the perception system and the decision-making system. The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacles mapping, moving obstacles detection and tracking, road mapping, traffic signalization detection and recognition, among others. The decision-making system is commonly partitioned as well into many subsystems responsible for tasks such as route planning, path planning, behavior selection, motion planning, and control. In this survey, we present the typical architecture of the autonomy system of self-driving cars. We also review research on relevant methods for perception and decision making. Furthermore, we present a detailed description of the architecture of the autonomy system of the self-driving car developed at the Universidade Federal do Espirito Santo (UFES), named Intelligent Autonomous Robotics Automobile (IARA). Finally, we list prominent self-driving car research platforms developed by academia and technology companies, and reported in the media.

543 citations

01 Jan 2009
TL;DR: In this paper, a real-time traffic light recognition system for on-vehicle camera applications is presented. It is mainly based on a spot detection algorithm and can detect lights from a large distance, with the main advantage of being relatively insensitive to motion blur and illumination variations.
Abstract: This paper introduces a new real-time traffic light recognition system for on-vehicle camera applications. The approach has been tested with good results in urban scenes. Thanks to our generic "Adaptive Templates", it is possible to recognize different kinds of traffic lights from various countries. The approach is mainly based on a spot detection algorithm and is therefore able to detect lights from a large distance, with the main advantage of being relatively insensitive to motion blur and illumination variations. The detected spots, together with further shape analysis, form the strong hypotheses that feed our Adaptive Template Matcher. Although still a work in progress, our system was validated in real conditions in our prototype vehicle as well as on recorded video sequences. We observed a high rate of correctly recognized traffic lights and very few false alarms. Processing is performed in real-time on 640×480 images using a 2.9 GHz single-core desktop computer.
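The spot detection idea above lends itself to a compact sketch: a morphological white top-hat keeps small bright blobs (candidate lights at long range) while suppressing larger bright structures. The paper does not spell out its exact operator, so the OpenCV version below is only an assumed illustration; the kernel size and brightness threshold are invented parameters.

```python
import cv2

def detect_spots(gray, kernel_size=15, min_brightness=60):
    """Find small bright blobs in a grayscale frame via white top-hat:
    the image minus its morphological opening, which removes bright
    regions larger than the structuring element."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
    _, mask = cv2.threshold(tophat, min_brightness, 255, cv2.THRESH_BINARY)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # Skip component 0 (background); return blob centers and pixel areas.
    return centroids[1:], stats[1:, cv2.CC_STAT_AREA]
```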

154 citations

Posted Content
TL;DR: Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects, and a novel IoU constant factor is added to the smooth L1 loss to address the long-standing boundary problem.
Abstract: Small and cluttered objects are common in the real world and are challenging to detect. The difficulty is further pronounced when the objects are rotated, as traditional detectors routinely localize objects with horizontal bounding boxes, so the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects. To handle rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long-standing boundary problem, which, by our analysis, is mainly caused by the periodicity of the angle (PoA) and the exchangeability of edges (EoE). Combining these two techniques, our proposed detector is termed SCRDet++. Extensive experiments are performed on the large public aerial image datasets DOTA, DIOR, and UCAS-AOD, as well as the natural image dataset COCO, the scene text dataset ICDAR2015, the small traffic light dataset BSTLD, and our newly released S²TLD. The results show the effectiveness of our approach. Project page at this https URL.
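One plausible reading of the "IoU constant factor" is sketched below: the smooth L1 term supplies only the gradient direction, while a factor derived from the (rotated) IoU sets the loss magnitude, so the loss no longer spikes at the angular boundary. This is an approximation inferred from the abstract, not the released SCRDet++ code; tensor names and shapes are assumptions.

```python
import torch

def iou_smooth_l1(pred_deltas, target_deltas, iou, beta=1.0, eps=1e-6):
    """IoU-modulated smooth L1: normalize the smooth L1 term by its own
    (detached) value so it only provides direction, then rescale by
    -log(IoU) so the magnitude tracks box overlap."""
    diff = torch.abs(pred_deltas - target_deltas)
    smooth_l1 = torch.where(diff < beta,
                            0.5 * diff ** 2 / beta,
                            diff - 0.5 * beta).sum(dim=-1)
    factor = -torch.log(iou.clamp(min=eps))  # small loss when IoU is high
    return (smooth_l1 / smooth_l1.detach().clamp(min=eps) * factor).mean()
```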

129 citations


Cites background or methods from "A deep learning approach to traffic..."

  • ...The training strategy is consistent with BSTLD....


  • ...We conduct extensive ablation study and experiments on multiple datasets including both aerial images from DOTA [10], DIOR [11], UCAS-AOD [27], as well as natural image dataset COCO [8], scene text dataset ICDAR2015 [28], small traffic light dataset BSTLD [29] and our newly released S2TLD to illustrate the promising effects of our techniques....


  • ...We perform extensive ablation studies and comparative experiments on multiple aerial image datasets such as DOTA, DIOR, UCAS-AOD, small traffic light dataset BSTLD and our released S2TLD, and demonstrate that our method achieves the state-of-the-art detection accuracy....


  • ...BSTLD [29]: BSTLD contains 13,427 camera images at a resolution of 720 × 1,280 pixels and contains about 24,000 annotated small traffic lights....


  • ...In the experiment, we divide BSTLD training set into a training set and a test set according to the ratio of 6 : 4....


References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network with 60 million parameters achieved state-of-the-art results on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
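The architecture described above maps to a compact PyTorch module. The sketch below follows the common single-GPU reading of the network rather than the paper's original two-GPU split, and omits the local response normalization layers for brevity.

```python
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Five conv layers (some followed by max pooling) and three
    fully-connected layers ending in 1000-way logits; dropout is applied
    in the fully-connected part as described in the paper."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),  # softmax is applied in the loss
        )

    def forward(self, x):              # expects 227x227 RGB input
        return self.classifier(self.features(x).flatten(1))
```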

73,978 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
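The mechanism is small enough to write out directly. The sketch below uses the now-standard "inverted" formulation, which scales surviving units by 1/(1-p) during training instead of halving the weights at test time as the paper describes; the two are equivalent in expectation.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during
    training and scale the survivors, so the layer is the identity at
    test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep a unit with probability 1 - p
    return x * mask / (1.0 - p)
```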

33,597 citations


"A deep learning approach to traffic..." refers methods in this paper

  • ...In addition, we employ two max-pooling and three dropout layers [17]....


Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.
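The skip idea reduces to a few lines: score the deep, coarse feature map, upsample it, and sum it with scores computed from a shallower, finer feature map (the FCN-16s pattern). The sketch below uses bilinear interpolation where the paper learns deconvolution filters; module and argument names are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SkipHead(nn.Module):
    """Fuse class scores from a coarse (deep) and a fine (shallow)
    feature map, then upsample to the input resolution."""
    def __init__(self, deep_ch, fine_ch, num_classes):
        super().__init__()
        self.score_deep = nn.Conv2d(deep_ch, num_classes, kernel_size=1)
        self.score_fine = nn.Conv2d(fine_ch, num_classes, kernel_size=1)

    def forward(self, deep_feat, fine_feat, out_size):
        coarse = self.score_deep(deep_feat)            # e.g. stride-32 map
        coarse = F.interpolate(coarse, size=fine_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        fused = coarse + self.score_fine(fine_feat)    # stride-16 map
        return F.interpolate(fused, size=out_size,
                             mode="bilinear", align_corners=False)
```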

28,225 citations


"A deep learning approach to traffic..." refers background or methods in this paper

  • ...With the recent advances and performance of deep neural networks [10]–[13], significant improvements were made in several fields of machine learning and especially computer vision....


  • ...Deep learning has been used for image classification [10], end-to-end object detection [11], pixel-precise object segmentation [13], and other applications....


Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

27,256 citations


"A deep learning approach to traffic..." refers background or methods in this paper

  • ...Removing the classification part from the YOLO architecture yields the following loss function:

    $$
    \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]
    + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right]
    + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \, p_i^2
    + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( p_i - \hat{p}_i \right)^2 \tag{1}
    $$

    For a detailed explanation of all terms, please refer to [11] (a minimal code sketch of this loss appears after this list)....


  • ...As in [11], this part of the loss function is only used if an object overlaps with the prediction....


  • ...The detection of traffic lights is carried out by an end-to-end trained neural network [11], which we adapt to detect traffic lights as small as 3 × 10 pixels....


  • ...Initial tests using the original YOLO architecture showed lower performance in detection when the classification part of the network was used....


  • ...Deep learning has been used for image classification [10], end-to-end object detection [11], pixel-precise object segmentation [13], and other applications....

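A minimal sketch of the loss in Eq. (1) above, assuming predictions and targets have already been arranged per grid cell and per box; the names, shapes, and λ values (taken from the YOLO defaults) are assumptions rather than the authors' code.

```python
import torch

def detection_loss(pred, target, obj_mask,
                   lambda_coord=5.0, lambda_noobj=0.5):
    """Classification-free YOLO loss: pred and target are (S*S, B, 5)
    tensors holding (x, y, w, h, p) per box with non-negative w, h;
    obj_mask is (S*S, B) bool, True where a traffic light is assigned."""
    px, py, pw, ph, pp = pred.unbind(-1)
    tx, ty, tw, th, tp = target.unbind(-1)

    loc = ((px - tx) ** 2 + (py - ty) ** 2)[obj_mask].sum()
    size = ((pw.sqrt() - tw.sqrt()) ** 2
            + (ph.sqrt() - th.sqrt()) ** 2)[obj_mask].sum()
    conf_obj = ((pp - tp) ** 2)[obj_mask].sum()  # only where an object overlaps
    conf_noobj = (pp ** 2)[~obj_mask].sum()

    return lambda_coord * (loc + size) + lambda_noobj * conf_noobj + conf_obj
```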