Proceedings ArticleDOI

Real-time Detection of Vehicle and Traffic Light for Intelligent and Connected Vehicles Based on YOLOv3 Network

14 Jul 2019 - pp. 388-392
TL;DR: Experimental analysis of images captured in an urban environment shows that the designed model not only satisfies real-time requirements but also improves the accuracy of vehicle and traffic light detection.
Abstract: Real-time detection of vehicles and traffic lights is essential for intelligent and connected vehicles, especially in urban environments. In this paper, a new vehicle and traffic light dataset is established, and a real-time detection model for vehicles and traffic lights based on the You Only Look Once (YOLO) network is presented. A joint training method for target classification and detection based on YOLOv3 is proposed, aiming to balance detection accuracy and speed. The YOLOv3 network places lower demands on hardware than other target detection algorithms such as Faster R-CNN. Experimental analysis of images captured in an urban environment shows that the designed model not only satisfies real-time requirements but also improves the accuracy of vehicle and traffic light detection.
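The paper's dataset and trained weights are not reproduced on this page. As a rough illustration of the single-pass YOLOv3 inference it describes, the Python sketch below runs a pretrained Darknet model through OpenCV's DNN module; the file names (yolov3.cfg, yolov3.weights, urban_scene.jpg) and the thresholds are assumptions, not the authors' configuration.

    # Minimal YOLOv3 inference sketch, not the authors' model.
    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # assumed files
    img = cv2.imread("urban_scene.jpg")
    h, w = img.shape[:2]

    # Scale pixels to [0, 1], resize to the 416x416 network input, swap BGR->RGB.
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, scores, class_ids = [], [], []
    for out in outputs:
        for det in out:                 # det = [cx, cy, bw, bh, objectness, class scores...]
            cls_scores = det[5:]
            cls = int(np.argmax(cls_scores))
            conf = float(det[4] * cls_scores[cls])
            if conf > 0.5:              # confidence threshold (assumed)
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
                class_ids.append(cls)

    # Non-maximum suppression removes overlapping duplicates.
    keep = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.4)
    for i in np.array(keep).flatten():
        print(class_ids[i], scores[i], boxes[i])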
Citations
Journal ArticleDOI
31 Oct 2020 - Sensors
TL;DR: This work proposes a novel vehicle detection model named Priority Vehicle Image Detection Network (PVIDNet) based on YOLOv3, a lightweight design strategy for PVIDNet that uses an activation function to decrease execution time, a traffic control algorithm based on the Brazilian Traffic Code, and a database of Brazilian vehicle images.
Abstract: Minimizing human intervention in systems such as traffic lights through automatic applications and sensors has been the focus of many studies. Thus, Deep Learning (DL) algorithms have been studied for traffic sign and vehicle identification in an urban traffic context. However, there is a lack of priority vehicle classification algorithms that are accurate, fast, and lightweight. To fill those gaps, a vehicle detection system is proposed that is integrated with an intelligent traffic light. This work proposes (1) a novel vehicle detection model named Priority Vehicle Image Detection Network (PVIDNet), based on YOLOv3, (2) a lightweight design strategy for the PVIDNet model using an activation function to decrease the execution time of the proposed model, (3) a traffic control algorithm based on the Brazilian Traffic Code, and (4) a database containing Brazilian vehicle images. The effectiveness of the proposed solutions was evaluated using the Simulation of Urban MObility (SUMO) tool. Results show that PVIDNet reached an accuracy higher than 0.95, and the waiting time of priority vehicles was reduced by up to 50%, demonstrating the effectiveness of the proposed solution.
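The abstract does not name the activation function behind PVIDNet's lightweight design strategy. The PyTorch sketch below only illustrates the general idea of lowering inference cost with a cheap piecewise activation inside a YOLO-style convolution block; hard-swish is an assumed stand-in, not the authors' choice, and the block is not their architecture.

    # Illustrative conv block with a cheap activation (assumed: hard-swish).
    import torch
    import torch.nn as nn

    class ConvBlock(nn.Module):
        def __init__(self, c_in, c_out, k=3, s=1):
            super().__init__()
            self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
            self.bn = nn.BatchNorm2d(c_out)
            # Hard-swish uses only adds, multiplies, and clamps, so it is
            # cheaper at inference time than exp-based activations.
            self.act = nn.Hardswish()

        def forward(self, x):
            return self.act(self.bn(self.conv(x)))

    y = ConvBlock(3, 32)(torch.randn(1, 3, 416, 416))
    print(y.shape)  # torch.Size([1, 32, 416, 416])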

19 citations


Cites background or methods from "Real-time Detection of Vehicle and ..."

  • ...According to [47], YOLOv3 is an improved version of YOLOv2 that aims at higher accuracy through multi-scale predictions, multi-label classification, and a new feature extractor network....


  • ...In [47], a method of real-time detection of vehicles and traffic...


Journal ArticleDOI
01 May 2022 - Sensors
TL;DR: This work applies transfer-learning-based fine-tuning to several state-of-the-art YOLO (You Only Look Once) networks and proposes a multi-vehicle tracking algorithm that obtains per-lane counts, classifications, and speeds of vehicles in real time.
Abstract: Accurate vehicle classification and tracking are increasingly important subjects for intelligent transport systems (ITSs) and for planning that utilizes precise location intelligence. Deep learning (DL) and computer vision are intelligent methods; however, accurate real-time classification and tracking come with problems. We tackle three prominent problems (P1, P2, and P3): the need for a large training dataset (P1), the domain-shift problem (P2), and coupling a real-time multi-vehicle tracking algorithm with DL (P3). To address P1, we created a training dataset of nearly 30,000 samples from existing cameras with seven classes of vehicles. To tackle P2, we trained and applied transfer learning-based fine-tuning on several state-of-the-art YOLO (You Only Look Once) networks. For P3, we propose a multi-vehicle tracking algorithm that obtains the per-lane count, classification, and speed of vehicles in real time. The experiments showed that accuracy doubled after fine-tuning (71% vs. up to 30%). Based on a comparison of four YOLO networks, coupling the YOLOv5-large network to our tracking algorithm provided a trade-off between overall accuracy (95% vs. up to 90%), loss (0.033 vs. up to 0.036), and model size (91.6 MB vs. up to 120.6 MB). The implications of these results are in spatial information management and sensing for intelligent transport planning.
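The tracking algorithm itself is not given in this abstract. As a hedged sketch of how per-lane counts and speeds can fall out of per-frame detector output, the Python code below uses greedy nearest-centroid association with one frame of memory; the frame rate, metres-per-pixel scale, and lane boundaries are assumed calibration inputs, and a production tracker would use one-to-one assignment (e.g., Hungarian matching).

    # Toy per-lane counter and speed estimator over detector centroids.
    import math
    from collections import defaultdict

    FPS = 30.0                          # camera frame rate (assumed)
    M_PER_PX = 0.05                     # metres per pixel from calibration (assumed)
    LANE_X_EDGES = [0, 320, 640, 960]   # lane boundaries in pixels (assumed)

    def lane_of(cx):
        for i in range(len(LANE_X_EDGES) - 1):
            if LANE_X_EDGES[i] <= cx < LANE_X_EDGES[i + 1]:
                return i
        return None

    tracks = {}                         # track id -> last centroid
    next_id = 0
    lane_counts = defaultdict(int)

    def update(detections):
        """detections: list of (cx, cy) centroids for one frame."""
        global next_id, tracks
        new_tracks = {}
        for cx, cy in detections:
            # Greedy nearest-neighbour association within a 50 px gate.
            best, best_d = None, 50.0
            for tid, (px, py) in tracks.items():
                d = math.hypot(cx - px, cy - py)
                if d < best_d:
                    best, best_d = tid, d
            if best is None:            # unmatched detection: new vehicle
                best = next_id
                next_id += 1
                lane_counts[lane_of(cx)] += 1
            else:                       # matched: displacement gives speed
                kmh = best_d * M_PER_PX * FPS * 3.6
                print(f"track {best}: {kmh:.1f} km/h, lane {lane_of(cx)}")
            new_tracks[best] = (cx, cy)
        tracks = new_tracks

    update([(100, 200), (400, 210)])    # frame 1: two vehicles appear
    update([(110, 205), (415, 215)])    # frame 2: both move; speeds printed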

16 citations

Journal ArticleDOI
20 Jul 2020 - Sensors
TL;DR: A method is proposed to detect traffic lights from images captured by a high-speed camera, exploiting the blinking of LED lights; it can detect traffic lights with differing appearances without tuning parameters and without learning from datasets.
Abstract: LEDs are widely employed as traffic lights. Because most LED traffic lights are driven by alternating current, they blink at high frequencies, typically at twice the mains frequency. We propose a method to detect a traffic light from images captured by a high-speed camera that can recognize a blinking traffic light. This technique is robust under various illuminations because it detects traffic lights by extracting information from pixels blinking at a specific frequency. The method is composed of six modules, including a band-pass filter and a Kalman filter. All the modules run simultaneously to achieve real-time processing and can run at 500 fps for images with a resolution of 800 × 600. The technique was verified on an original dataset captured by a high-speed camera under different illumination conditions, such as sunset and night scenes. The recall and accuracy justify the generalization of the proposed detection system. In particular, it can detect traffic lights with differing appearances without tuning parameters and without learned datasets.
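The six-module, 500 fps pipeline (band-pass filter, Kalman filter, and so on) is not reproduced here. The numpy sketch below demonstrates only the core idea: per-pixel spectral analysis of a high-speed frame stack, keeping pixels whose energy concentrates near twice the mains frequency; the band width and threshold are assumptions.

    # Find pixels blinking near 2 x mains frequency in a high-speed stack.
    import numpy as np

    FS = 500.0          # frame rate in Hz (the paper runs at 500 fps)
    TARGET_HZ = 100.0   # 2 x 50 Hz mains; use 120.0 in 60 Hz regions
    BAND = 5.0          # band half-width in Hz (assumed)

    def blinking_mask(frames):
        """frames: (T, H, W) array of grayscale frames."""
        T = frames.shape[0]
        spectrum = np.abs(np.fft.rfft(frames, axis=0))  # per-pixel spectrum
        freqs = np.fft.rfftfreq(T, d=1.0 / FS)
        band = (freqs > TARGET_HZ - BAND) & (freqs < TARGET_HZ + BAND)
        ac = spectrum[1:].sum(axis=0) + 1e-9            # total non-DC energy
        ratio = spectrum[band].sum(axis=0) / ac
        return ratio > 0.5                              # threshold (assumed)

    rng = np.random.default_rng(0)
    t = np.arange(100) / FS
    frames = rng.normal(0.0, 0.01, (100, 8, 8))
    frames[:, 2, 3] += 0.5 * (1 + np.sin(2 * np.pi * 100.0 * t))  # blinking LED
    print(np.argwhere(blinking_mask(frames)))                     # -> [[2 3]]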

11 citations


Cites background or methods from "Real-time Detection of Vehicle and ..."

  • ...Network models such as RetinaNet [26] and YOLO [8,22] have also been studied....


  • ...Some learning-based methods include not only traffic light detection but also car detection [8], and approaches that recognize which lane a traffic light belongs to have been developed [10,11]....


  • ...Because of the rapid development of machine learning techniques, this is currently one of the most popular approaches [8,9]....


Journal ArticleDOI
TL;DR: In this paper, a novel YOLOv3-SPP+ model is developed to improve detection performance by dividing the image from finer to coarser levels and enhancing local features.
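The TL;DR only hints at multi-level pooling. For orientation, the PyTorch sketch below shows the standard spatial pyramid pooling (SPP) block used in YOLOv3-SPP, which pools the same feature map at several kernel sizes and concatenates the results; whatever the "+" variant adds beyond this is not described here and is not reproduced.

    # Standard SPP block: parallel stride-1 max pools, channel-wise concat.
    import torch
    import torch.nn as nn

    class SPP(nn.Module):
        def __init__(self, kernels=(5, 9, 13)):
            super().__init__()
            # Stride-1 pools with "same" padding keep spatial size fixed,
            # so the outputs can be concatenated with the input.
            self.pools = nn.ModuleList(
                [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels])

        def forward(self, x):
            return torch.cat([x] + [p(x) for p in self.pools], dim=1)

    x = torch.randn(1, 512, 13, 13)
    print(SPP()(x).shape)  # torch.Size([1, 2048, 13, 13])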

8 citations

Journal ArticleDOI
TL;DR: In this paper, an automated video analysis, detection, and tracking system is introduced to evaluate traffic conditions, analyze blocked-vehicle behavior at grade crossings, and predict decongestion time under a simplified scenario.

8 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
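To make the regression formulation concrete, the sketch below decodes a YOLOv1-style output tensor into boxes, using the paper's grid values S = 7, B = 2, C = 20 and its class-specific confidence product; the score threshold is an assumption.

    # Decode an S x S x (B*5 + C) YOLOv1 output tensor into boxes.
    import numpy as np

    S, B, C = 7, 2, 20                  # values from the original paper

    def decode(pred, threshold=0.2):
        out = []
        for i in range(S):              # grid row
            for j in range(S):          # grid column
                cell = pred[i, j]
                class_probs = cell[B * 5:]
                for b in range(B):
                    x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                    # Class-specific score = P(class | object) * P(object) * IOU.
                    scores = class_probs * conf
                    cls = int(np.argmax(scores))
                    if scores[cls] > threshold:
                        # (x, y) are offsets within the cell; (w, h) are
                        # relative to the whole image.
                        cx, cy = (j + x) / S, (i + y) / S
                        out.append(((cx, cy, w, h), float(scores[cls]), cls))
        return out

    print(len(decode(np.random.rand(S, S, B * 5 + C))))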

27,256 citations


"Real-time Detection of Vehicle and ..." refers methods in this paper

  • ...Many typical algorithms were presented after the R-CNN, such as SPPNet [4], Fast R-CNN [5], Faster R-CNN [6], R-FCN [7], FPN [8], Mask R-CNN [9], SSD [10], YOLO [11], YOLOv2 [12], YOLOv3 [13], and so on....


Journal ArticleDOI
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
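For a concrete view of the RPN's anchor scheme, the sketch below generates the paper's k = 9 anchors (3 scales x 3 aspect ratios) at every feature-map position, using the stride and anchor sizes reported for the VGG-16 backbone.

    # RPN-style anchors: 3 scales x 3 aspect ratios per position.
    import numpy as np

    def make_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
        anchors = []
        for y in range(feat_h):
            for x in range(feat_w):
                cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
                for s in scales:
                    for r in ratios:
                        # Preserve area s*s while setting width/height = r.
                        w, h = s * np.sqrt(r), s / np.sqrt(r)
                        anchors.append([cx - w / 2, cy - h / 2,
                                        cx + w / 2, cy + h / 2])
        return np.array(anchors)        # (feat_h * feat_w * 9, 4) corners

    print(make_anchors(2, 2).shape)     # (36, 4)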

26,458 citations


"Real-time Detection of Vehicle and ..." refers methods in this paper

  • ...In 2014, the R-CNN algorithm was proposed by Girshick et al., who introduced CNNs into the field of object detection for the first time....


  • ...Many typical algorithms were presented after the R-CNN, such as SPPNet [4], Fast R-CNN [5], Faster R-CNN [6], R-FCN [7], FPN [8], Mask RCNN [9], SSD [10], YOLO [11], YOLOv2 [12], YOLOv3 [13] and so on....


  • ...The YOLOv3 network has lower requirements on hardware devices than other target detection algorithms like Faster R-CNN....


Book ChapterDOI
08 Oct 2016
TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
Abstract: We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X, and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
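The default-box scheme can be written down directly from the paper: scales follow s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), each aspect ratio ar gives a box of width s_k * sqrt(ar) and height s_k / sqrt(ar), and an extra square box of scale sqrt(s_k * s_{k+1}) is added for ar = 1. The sketch below computes these shapes; the exact s_min, s_max, and ratio set vary per feature map in practice.

    # Default-box (width, height) pairs for feature map k of m, per SSD.
    import math

    def default_box_shapes(k, m=6, s_min=0.2, s_max=0.9,
                           aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
        def scale(i):
            return s_min + (s_max - s_min) * (i - 1) / (m - 1)
        s_k = scale(k)
        shapes = [(s_k * math.sqrt(ar), s_k / math.sqrt(ar))
                  for ar in aspect_ratios]
        s_extra = math.sqrt(s_k * scale(k + 1))  # extra square box for ar = 1
        shapes.append((s_extra, s_extra))
        return shapes                            # sizes relative to the image

    for w, h in default_box_shapes(k=1):
        print(f"w={w:.3f} h={h:.3f}")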

19,543 citations


"Real-time Detection of Vehicle and ..." refers methods in this paper

  • ...Many typical algorithms were presented after the R-CNN, such as SPPNet [4], Fast R-CNN [5], Faster R-CNN [6], R-FCN [7], FPN [8], Mask R-CNN [9], SSD [10], YOLO [11], YOLOv2 [12], YOLOv3 [13], and so on....


  • ...In view of the fact that the accuracy of YOLO is not high enough, that it easily misses detections, and that it performs poorly on objects with uncommonly long aspect ratios, YOLOv2 was proposed by combining the characteristics of SSD....


  • ...2) Remove the fully connected layer: like SSD, the model contains only convolution and average pooling layers....


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
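The top-down architecture with lateral connections summarized above fits in a few lines of PyTorch: 1x1 convolutions project each backbone stage to a common width, coarser maps are upsampled and added to finer ones, and a 3x3 convolution smooths each merged map. The channel counts below match a ResNet backbone (C2-C5), and nearest-neighbour upsampling follows the paper's simple choice.

    # FPN top-down pathway with lateral connections.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FPNTopDown(nn.Module):
        def __init__(self, in_channels=(256, 512, 1024, 2048), width=256):
            super().__init__()
            self.lateral = nn.ModuleList(
                [nn.Conv2d(c, width, 1) for c in in_channels])
            # 3x3 conv after merging reduces upsampling aliasing.
            self.smooth = nn.ModuleList(
                [nn.Conv2d(width, width, 3, padding=1) for _ in in_channels])

        def forward(self, feats):       # feats ordered fine -> coarse
            lat = [l(f) for l, f in zip(self.lateral, feats)]
            out = [lat[-1]]             # start from the coarsest level
            for i in range(len(lat) - 2, -1, -1):
                top = F.interpolate(out[0], size=lat[i].shape[-2:], mode="nearest")
                out.insert(0, lat[i] + top)   # lateral + upsampled top-down
            return [s(o) for s, o in zip(self.smooth, out)]

    feats = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
             for i, c in enumerate((256, 512, 1024, 2048))]
    for p in FPNTopDown()(feats):
        print(p.shape)                  # all 256-channel, one per scale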

16,727 citations


"Real-time Detection of Vehicle and ..." refers methods in this paper

  • ...Many typical algorithms were presented after the R-CNN, such as SPPNet [4], Fast R-CNN [5], Faster R-CNN [6], R-FCN [7], FPN [8], Mask R-CNN [9], SSD [10], YOLO [11], YOLOv2 [12], YOLOv3 [13], and so on....
