Journal ArticleDOI

Reliable and Rapid Traffic Congestion Detection Approach Based on Deep Residual Learning and Motion Trajectories

02 Oct 2020 · IEEE Access (IEEE) · Vol. 8, pp. 182180–182192
TL;DR: This article proposes a rapid and reliable traffic congestion detection method based on the modeling of video dynamics using deep residual learning and motion trajectories that achieves competitive results when compared to state-of-the-art methods.
Abstract: Traffic congestion detection systems help manage traffic in crowded cities by analyzing videos of vehicles. Existing systems largely depend on texture and motion features. Such systems face several challenges, including illumination changes caused by variations in weather conditions, complexity of scenes, vehicle occlusion, and the ambiguity of stopped vehicles. To overcome these issues, this article proposes a rapid and reliable traffic congestion detection method based on the modeling of video dynamics using deep residual learning and motion trajectories. The proposed method efficiently uses both motion and deep texture features to overcome the limitations of existing methods. Unlike other methods that simply extract texture features from a single frame, we use an efficient representation learning method to capture the latent structures in traffic videos by modeling the evolution of texture features. This representation yields a noticeable improvement in detection results under various weather conditions. Regarding motion features, we propose an algorithm to distinguish stopped vehicles and background objects, whereas most existing motion-based approaches fail to address this issue. Both types of obtained features are used to construct an ensemble classification model based on the support vector machine algorithm. Two benchmark datasets are considered to demonstrate the robustness of the proposed method: the UCSD dataset and NU1 video dataset. The proposed method achieves competitive results (97.64% accuracy) when compared to state-of-the-art methods.
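The abstract's pipeline lends itself to a compact sketch. Below is a minimal, hypothetical Python illustration of the final fusion-and-classification stage only: precomputed deep texture features and motion features are concatenated and fed to an ensemble of SVM classifiers. Bagging is used here as one plausible ensembling choice, and the feature arrays are random placeholders, not the paper's features.

```python
# Minimal, hypothetical sketch of the fusion + classification stage only.
# The feature arrays are random placeholders standing in for the paper's
# deep texture features and motion-trajectory features.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
deep_texture = rng.normal(size=(200, 256))   # e.g. CNN features per clip
motion = rng.normal(size=(200, 32))          # e.g. trajectory statistics
labels = rng.integers(0, 2, size=200)        # 0 = free flow, 1 = congested

X = np.hstack([deep_texture, motion])        # simple feature-level fusion
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

# Ensemble of RBF-kernel SVMs; bagging is one plausible ensembling choice.
clf = BaggingClassifier(SVC(kernel="rbf", C=1.0), n_estimators=10,
                        random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```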


Citations
Journal ArticleDOI
29 Apr 2021 · Sensors
TL;DR: In this paper, the authors survey intersection management approaches that aim to improve efficiency or guarantee safety when vehicles pass through a crossing, reviewing the literature from both perspectives.
Abstract: Intersection management is a sophisticated task in intelligent transportation systems because of the variety of behaviors among traffic participants. This paper surveys recent studies on intersection scenarios that aim to improve efficiency or guarantee safety when vehicles pass the crossing, organizing them from these two perspectives. First, efficiency-oriented contributions are reviewed across four scenarios: congestion avoidance, green light optimized speed advisory (GLOSA), trajectory planning, and emergency vehicle priority preemption control. The safety category then covers studies on intersection collision detection and abnormal-information warning. Finally, the algorithms for velocity and route management presented in the surveyed works are discussed.

6 citations

Journal ArticleDOI
TL;DR: A deep neural network (DNN), which has two input paths, is proposed for traffic congestion recognition, which handles the evolution of motion as well as texture through its two inputs simultaneously via Long Short-Term Memory (LSTM) layers.
Abstract: Cities with high population density suffer from serious traffic congestion. Intelligent transportation systems try to overcome this problem by finding smart ways to detect congestion, and one of the essential issues is selecting appropriate features. Most current methods use motion or texture features only, each of which has its limitations. In this paper, a deep neural network (DNN) with two input paths is proposed for traffic congestion recognition: it handles the evolution of motion and of texture simultaneously through its two inputs via Long Short-Term Memory (LSTM) layers. Gaussian noise layers increase the generalization ability of the DNN and enable training on small datasets without over-fitting. Experimental results on the UCSD and NU video datasets confirm the robustness of the proposed method, which achieves an accuracy of 98%, high in comparison to state-of-the-art methods.

1 citation
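
The two-path architecture described in the abstract above can be sketched in a few lines of Keras. This is an illustrative reconstruction, not the authors' code: layer sizes, sequence length, and the noise level are assumptions.

```python
# Illustrative reconstruction of a two-input LSTM network with Gaussian
# noise regularization; sizes are assumptions, not the paper's values.
from tensorflow.keras import Model, layers

T, D_MOTION, D_TEXTURE = 20, 64, 512   # assumed sequence length / dims

motion_in = layers.Input(shape=(T, D_MOTION), name="motion_seq")
texture_in = layers.Input(shape=(T, D_TEXTURE), name="texture_seq")

def branch(x):
    x = layers.GaussianNoise(0.1)(x)   # regularizes small-dataset training
    return layers.LSTM(64)(x)          # summarizes the temporal evolution

merged = layers.concatenate([branch(motion_in), branch(texture_in)])
out = layers.Dense(1, activation="sigmoid", name="congested")(merged)

model = Model([motion_in, texture_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```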

Book ChapterDOI
01 Jan 2022

1 citation

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a method to realize fine congestion detection using road surveillance video, where the road area is modeled as a grid to construct three region levels: segment, lane, and cell.
Abstract: Is video congestion detection a visual classification problem? Most studies say yes: they classify road states into 1–3 congestion levels to determine whether congestion occurs. However, in real traffic scenes congestion is a dynamic process, including generation, evolution, and dissipation, so congestion detection should cover the entire congestion process rather than being a classification problem alone. This paper proposes a novel method for fine-grained congestion detection from road surveillance video. The first step analyzes the road scene and detects vehicle information: the road area is modeled as a grid with three region levels (segment, lane, and cell), and vehicle position sequences and speeds are obtained via object detection, multi-object tracking, and sparse optical flow. The second step discriminates the congestion state: traffic flow, average speed, and visual impact parameters are computed for each region level, and a three-dimensional model combines them into a short-time congestion state. The third step describes the spatiotemporal changes of congestion by accumulating multiple short-term congestion states. Experiments are organized around three aspects: whether congestion occurs, continuous congestion degree, and the spatiotemporal changes of congestion. The results indicate that the proposed method achieves higher detection accuracy while reflecting the whole spatiotemporal evolution of congestion degree.
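
The grid-based first and second steps described above can be illustrated with a toy Python sketch: vehicles are binned into (lane, segment) cells, per-cell counts and average speeds are accumulated, and a simple rule maps them to a short-time congestion state. The thresholds and grid size are illustrative assumptions, not the paper's parameters.

```python
# Toy sketch of grid-based short-time congestion state. Cell layout and
# thresholds are illustrative assumptions, not the paper's parameters.
import numpy as np

N_LANES, N_SEGMENTS = 3, 10  # road grid: lanes x segments -> cells

def cell_states(positions, speeds, slow=2.0, dense=4):
    """positions: (n, 2) int array of (lane, segment) per tracked vehicle;
    speeds: (n,) speeds (e.g. px/frame from sparse optical flow).
    Returns a (lanes, segments) grid: 0 = free, 1 = slow, 2 = congested."""
    count = np.zeros((N_LANES, N_SEGMENTS))
    speed_sum = np.zeros_like(count)
    for (lane, seg), v in zip(positions, speeds):
        count[lane, seg] += 1
        speed_sum[lane, seg] += v
    # Empty cells get infinite mean speed, i.e. they stay "free".
    mean_speed = np.divide(speed_sum, count,
                           out=np.full_like(count, np.inf), where=count > 0)
    state = np.zeros_like(count, dtype=int)
    state[mean_speed < slow] = 1                       # slow traffic
    state[(mean_speed < slow) & (count >= dense)] = 2  # slow and dense
    return state

# Four slow vehicles packed into cell (0, 3) -> congested; one fast vehicle
# in cell (1, 7) -> free.
pos = np.array([[0, 3], [0, 3], [0, 3], [0, 3], [1, 7]])
spd = np.array([0.5, 1.0, 0.8, 0.2, 9.0])
print(cell_states(pos, spd))
```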
References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
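
The abstract's core idea, layers that learn a residual function F(x) and output F(x) + x so that extra depth can fall back to the identity, is easy to see in code. Below is a minimal PyTorch sketch of a basic residual block (the standard block shape, not the authors' exact implementation).

```python
# Minimal PyTorch sketch of a basic residual block: the block computes
# F(x) and returns F(x) + x via an identity shortcut.
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual connection: F(x) + x

x = torch.randn(1, 64, 32, 32)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```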

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations
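
The design rule stated in the abstract, building depth from stacks of very small 3x3 convolutions (two 3x3 layers cover a 5x5 receptive field with fewer parameters than one 5x5 layer), can be sketched as follows. The stage widths follow the familiar VGG pattern, but this is an illustration, not the full 16/19-layer network.

```python
# Sketch of VGG-style stages built from stacked 3x3 convolutions with
# max-pooling between stages; widths follow the familiar VGG pattern.
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3,
                             padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))  # halve spatial resolution per stage
    return nn.Sequential(*layers)

features = nn.Sequential(
    vgg_stage(3, 64, 2),     # two 3x3 convs ~ one 5x5 receptive field
    vgg_stage(64, 128, 2),
    vgg_stage(128, 256, 3),
)
print(features(torch.randn(1, 3, 224, 224)).shape)  # (1, 256, 28, 28)
```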


"Reliable and Rapid Traffic Congesti..." refers background or methods in this paper

  • ...For example, VGG19 obtains its best accuracy (94.49...

    [...]

  • ...It should be noted that if the depths of the VGG19, GoogleNet, and inceptionv3 models are increased, accuracy becomes saturated and then decreases....

    [...]

  • ...VGG19 [23] contains a total of 47 layers with several successive convolution layers, and each layer is followed by a rectified linear unit layer....

    [...]

  • ...considered for feature extraction, namely VGG19 [23], GoogleNet [24], inceptionv3 [25], and ResNet101 [26]....

    [...]

  • ...In this study, several pre-trained CNN models were considered for feature extraction, namely VGG19 [23], GoogleNet [24], inceptionv3 [25], and ResNet101 [26]....

    [...]
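
The excerpts above describe using pre-trained CNNs as fixed feature extractors. As a hedged illustration (assuming torchvision >= 0.13; the paper's exact preprocessing and layer choice may differ), one can truncate torchvision's ResNet101 before its classifier head and embed individual frames:

```python
# Hedged illustration (assumes torchvision >= 0.13): truncate a pre-trained
# ResNet101 before its classifier head and use it to embed frames. The
# paper's exact preprocessing and layer choice may differ.
import torch
import torch.nn as nn
from torchvision import models

weights = models.ResNet101_Weights.IMAGENET1K_V1
backbone = models.resnet101(weights=weights)
backbone.fc = nn.Identity()         # drop the 1000-way ImageNet classifier
backbone.eval()

preprocess = weights.transforms()   # the weights' matching resize/normalize

@torch.no_grad()
def frame_features(frame):
    """frame: (3, H, W) image tensor -> (2048,) deep texture feature."""
    return backbone(preprocess(frame).unsqueeze(0)).squeeze(0)

print(frame_features(torch.rand(3, 240, 320)).shape)  # torch.Size([2048])
```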


Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations
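
The matching pipeline this abstract describes (distinctive local features plus fast nearest-neighbor matching) is commonly reproduced with OpenCV's SIFT implementation and Lowe's ratio test. The sketch below assumes opencv-python >= 4.4, where SIFT lives in the main module, and uses placeholder image paths.

```python
# Sketch using OpenCV's SIFT (opencv-python >= 4.4) with Lowe's ratio test;
# the image paths are placeholders.
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match each descriptor to its two nearest neighbours; keep a match only if
# it is clearly better than the runner-up (ratio test, threshold 0.75).
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} ratio-test matches")
```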

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations
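
The multi-scale idea behind Inception, parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch whose outputs are concatenated channel-wise, can be sketched in PyTorch. Branch widths below are illustrative, not GoogLeNet's exact configuration.

```python
# Toy PyTorch sketch of an Inception-style module: parallel multi-scale
# branches concatenated on the channel dimension. Widths are illustrative.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 32, 1)                  # 1x1 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 32, 1),   # 1x1 reduce
                                nn.Conv2d(32, 64, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        # Multi-scale processing: concatenate all branch outputs on channels.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)

x = torch.randn(1, 64, 28, 28)
print(InceptionModule(64)(x).shape)  # torch.Size([1, 160, 28, 28])
```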