Book ChapterDOI

Cyclist Detection Using Tiny YOLO v2

TL;DR: The Tiny YOLO v2 algorithm is used here; it requires fewer computational resources and offers better real-time performance than the full YOLO method, which is highly desirable for autonomous vehicles.
Abstract: This paper evaluates the performance of state-of-the-art object classification algorithms for cyclist detection using the Tsinghua–Daimler Cyclist Benchmark. The model focuses on detecting cyclists on the road for use in the development of autonomous road vehicles and advanced driver-assistance systems for hybrid vehicles. The Tiny YOLO v2 algorithm is used here because it requires fewer computational resources and offers better real-time performance than the full YOLO method, which is highly desirable for such autonomous vehicles. The model was trained on the training images in the mentioned benchmark and tested on the corresponding test images. The average IoU over all ground-truth objects was calculated, and the precision-recall curve was plotted for different thresholds.
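
The average-IoU and precision-recall evaluation described above is standard; the following is a minimal sketch of it, assuming (x1, y1, x2, y2) box coordinates, greedy matching, and a 0.5 IoU match criterion, none of which are specified in the abstract:

def iou(box_a, box_b):
    # Intersection over Union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, truths, conf_thresh, iou_thresh=0.5):
    # detections: list of (confidence, box); truths: list of boxes.
    # Each ground-truth box may be matched by at most one detection.
    kept = sorted([d for d in detections if d[0] >= conf_thresh],
                  key=lambda d: -d[0])
    matched, tp = set(), 0
    for _, box in kept:
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(truths):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(kept) if kept else 1.0
    recall = tp / len(truths) if truths else 1.0
    return precision, recall

Sweeping conf_thresh from high to low and recording the (recall, precision) pairs traces the precision-recall curve mentioned above.
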
Citations
Journal ArticleDOI
TL;DR: The proposed model is very useful for farmers, as they can identify plant diseases as soon as they appear and thus take proper measures to prevent the spread of the disease.
Abstract: In the twenty-first century, artificial intelligence plays a vital role in the day-to-day life of human beings. Nowadays it is used in many applications such as medicine, communication, object detection, object identification, and object tracking. This paper focuses on identifying diseases in bell pepper plants in large fields using a deep learning approach. Bell pepper farmers generally do not notice when their plants are infected with bacterial spot disease, and the spread of the disease usually causes a decrease in yield. The solution is to detect bacterial spot disease in the bell pepper plant at an early stage. We randomly sample a few pictures from different parts of the farm. YOLOv5 is used to detect bacterial spot disease in bell pepper plants from the symptoms visible on the leaves. With YOLOv5 we are able to detect even a small spot of disease with considerable speed and accuracy: it takes the full image in a single pass and predicts bounding boxes and class probabilities. The input to the model is a random picture from the farm taken with a mobile phone. By viewing the output of the program, farmers can find out whether bacterial spot disease has affected the plants in their farm. The proposed model is very useful for farmers, as they can identify plant diseases as soon as they appear and thus take proper measures to prevent the spread of the disease. The aim of this paper is to present a method for detecting bacterial spot disease in bell pepper plants from pictures taken on the farm.

43 citations
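
The detection step described in the abstract can be reproduced with the public Ultralytics YOLOv5 hub interface; a hedged sketch follows, in which the weight file name and the disease class label are assumptions rather than details from the paper:

import torch

# Load custom-trained weights via the public YOLOv5 hub entry point.
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="bell_pepper_best.pt")  # hypothetical weights file

results = model("farm_photo.jpg")      # a phone picture from the field
df = results.pandas().xyxy[0]          # boxes, confidences, class names
infected = df[df["name"] == "bacterial_spot"]  # hypothetical class label
print(f"{len(infected)} suspected bacterial-spot regions found")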

Journal ArticleDOI
TL;DR: A performance comparison between the main object detection algorithms reported in the literature, such as SSD, Faster R-CNN and R-FCN, along with the MobileNetV2, InceptionV2, ResNet50 and ResNet101 feature extractors, is presented, and multi-class detection with eight different classes according to orientation is proposed.
Abstract: In this work, orientation detection using deep learning is addressed for a particularly vulnerable class of road users, the cyclists. Knowing the cyclists' orientation is of great relevance since it provides a good notion of their future trajectory, which is crucial to avoid accidents in the context of intelligent transportation systems. Using transfer learning with pre-trained models and TensorFlow, we present a performance comparison between the main algorithms reported in the literature for object detection, such as SSD, Faster R-CNN and R-FCN, along with the MobileNetV2, InceptionV2, ResNet50 and ResNet101 feature extractors. Moreover, we propose multi-class detection with eight different classes according to orientation. To do so, we introduce a new dataset called "Detect-Bike", containing 20,229 cyclist instances over 11,103 images, labeled based on the cyclist's orientation. Then, the same deep learning methods used for detection are trained to determine the target's heading. Our experimental results and extensive evaluation showed satisfactory performance of all the studied methods for detecting cyclists and their orientation; in particular, Faster R-CNN with ResNet50 proved to be precise but significantly slower, while SSD with InceptionV2 provided a good trade-off between precision and execution time and is to be preferred for real-time embedded applications.

6 citations
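
The abstract defines eight orientation classes but not their exact boundaries; one plausible reading is eight evenly spaced 45-degree bins over the heading angle, sketched below (the bin centres and labels are assumptions, not the authors' scheme):

ORIENTATIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def orientation_class(heading_deg):
    # Map a heading angle in degrees (0 = east, counter-clockwise)
    # to one of eight 45-degree bins centred on the compass directions.
    return ORIENTATIONS[int(((heading_deg + 22.5) % 360) // 45)]

assert orientation_class(0) == "E"
assert orientation_class(90) == "N"
assert orientation_class(200) == "W"   # 200 degrees falls in the west bin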

DOI
TL;DR: In this article, Li et al. proposed to estimate the cyclist's intention to cross a road from the body and head orientation using deep neural networks, which detect cyclists and extract the area of each cyclist from red green blue (RGB) images.
Abstract: Improving the safety of bicycle riders is one of the critical issues for autonomous driving. The crossing intention of a cyclist is expected to be predicted from the on-board camera of an autonomous vehicle. In real traffic, a cyclist usually turns his or her head to check the situation behind before crossing the road, so the action of turning the head is an important cue to the intention of crossing. This research proposes to estimate the cyclist's intention from the body and head orientation using deep neural networks. The proposed system first detects cyclists and extracts the area of each cyclist from red green blue (RGB) images with a segmentation neural network. The image of each cyclist is then processed by a pose estimation neural network to detect the cyclist's joints. Subsequently, the heat-map image of each joint is fed into a classification neural network to estimate the body and head orientation, and the two orientations are jointly used to predict the cyclist's intention. In a separate process, the cyclist's position is estimated from the disparity image generated by a stereo camera. Finally, the two results, the cyclist's position and intention, are integrated to predict the cyclist's trajectory. A series of experiments has been performed, and the results demonstrate that the proposed system performs satisfactorily. In addition, comparison experiments show that the model with only heat-map images as input has the best accuracy in body and head orientation estimation.

2 citations
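
The position-estimation step above relies on the classical pinhole stereo relation Z = f * B / d; a small sketch follows, where the focal length and baseline are illustrative values rather than the paper's calibration:

def depth_from_disparity(disparity_px, focal_px=721.5, baseline_m=0.54):
    # Pinhole stereo: depth Z = focal length * baseline / disparity.
    # focal_px and baseline_m are illustrative, not the paper's calibration.
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(30.0))  # about 13 m for these assumed parameters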

Journal ArticleDOI
01 Sep 2021
TL;DR: In this article, multi-class detection with eight classes according to orientation is proposed to identify each cyclist's heading and predict its intentions; the vulnerability of each cyclist in the field of view is then evaluated, taking into account proximity and the intentions predicted from the heading angle, and a risk level is assigned to each cyclist.
Abstract: Timely detection of vulnerable road users is of great relevance to avoid accidents in the context of intelligent transportation systems. In this work, detection and tracking are addressed for a particularly vulnerable class of road users, the cyclists. We present a performance comparison between the main deep learning-based algorithms reported in the literature for object detection, such as SSD, Faster R-CNN and R-FCN, along with the InceptionV2, ResNet50, ResNet101 and MobileNetV2 feature extractors. In order to identify the cyclist's heading and predict its intentions, we propose multi-class detection with eight classes according to orientation. To do so, we introduce a new dataset called "CIMAT-Cyclist", containing 20,229 cyclist instances over 11,103 images, labeled based on the cyclist's orientation. To improve performance in cyclist detection, a Kalman filter is used for tracking, coupled with the Kuhn–Munkres algorithm for multi-target association. Finally, the vulnerability of each cyclist in the field of view is evaluated, taking into account proximity and the intentions predicted from the heading angle, and a risk level is assigned to each cyclist. Experimental results validate the proposed strategy in real scenarios, showing good performance.

2 citations
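
The Kuhn–Munkres association step described above can be sketched with SciPy's linear_sum_assignment, which implements that algorithm; the IoU-based cost and the 0.3 gating threshold are assumptions, and iou_fn could be the IoU helper sketched earlier on this page:

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_boxes, det_boxes, iou_fn, min_iou=0.3):
    # Match existing tracks to new detections by minimising total (1 - IoU).
    # track_boxes and det_boxes are plain lists of boxes.
    if not track_boxes or not det_boxes:
        return [], list(range(len(det_boxes)))
    cost = np.array([[1.0 - iou_fn(t, d) for d in det_boxes]
                     for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)   # Kuhn-Munkres assignment
    matches = [(r, c) for r, c in zip(rows, cols)
               if 1.0 - cost[r, c] >= min_iou]  # gate out weak matches
    matched = {c for _, c in matches}
    unmatched = [j for j in range(len(det_boxes)) if j not in matched]
    return matches, unmatched

Matched detections would then update the corresponding Kalman filters, while unmatched detections would spawn new tracks.
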

References
Posted Content
TL;DR: Faster R-CNN, as discussed by the authors, introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

23,183 citations
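
For readers who want to try the detector, torchvision ships a pre-trained Faster R-CNN; the short sketch below uses that public model zoo rather than the authors' original code, and the 0.7 score threshold is an arbitrary choice for the example:

import torch
import torchvision

# Pre-trained Faster R-CNN with a ResNet-50 FPN backbone from torchvision.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)           # stand-in for a real RGB image
with torch.no_grad():
    output = model([image])[0]            # dict of boxes, labels, scores
keep = output["scores"] > 0.7             # arbitrary confidence cut-off
print(output["boxes"][keep], output["labels"][keep])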

Proceedings ArticleDOI
Ross Girshick
07 Dec 2015
TL;DR: Fast R-CNN, as discussed by the authors, is a Fast Region-based Convolutional Network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

14,824 citations
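
Fast R-CNN's speed-up comes from computing one shared convolutional feature map per image and pooling a fixed-size feature for each proposal (RoI pooling); a minimal sketch using torchvision's roi_pool is shown below, with illustrative shapes and scale:

import torch
from torchvision.ops import roi_pool

features = torch.rand(1, 256, 50, 50)   # shared conv feature map (one image)
# One proposal as (batch_index, x1, y1, x2, y2) in image coordinates.
rois = torch.tensor([[0.0, 10.0, 10.0, 200.0, 150.0]])
# spatial_scale maps image coordinates onto the 50x50 feature map of a
# hypothetical 400x400 input image.
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=50 / 400)
print(pooled.shape)  # torch.Size([1, 256, 7, 7]) -- fixed size per proposal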

Proceedings ArticleDOI
16 Jun 2012
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Abstract: Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti

11,283 citations

Posted Content
TL;DR: YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced and a method to jointly train on object detection and classification is proposed, both novel and drawn from prior work.
Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.

8,505 citations
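
YOLOv2 (and its Tiny variant used in the paper above) parameterises each box as offsets relative to a grid cell and a prior anchor shape; the sketch below decodes one prediction using the published parameterisation, with illustrative anchor values rather than trained priors:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, grid=13):
    # Returns box centre and size as fractions of the image width/height.
    bx = (sigmoid(tx) + cell_x) / grid   # sigmoid keeps the centre in-cell
    by = (sigmoid(ty) + cell_y) / grid
    bw = anchor_w * math.exp(tw) / grid  # width/height scale the anchor prior
    bh = anchor_h * math.exp(th) / grid
    return bx, by, bw, bh

# Illustrative anchor (3.4 x 5.5 cells) at grid cell (6, 7) on a 13x13 grid.
print(decode(0.2, -0.1, 0.5, 0.3, cell_x=6, cell_y=7,
             anchor_w=3.4, anchor_h=5.5))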