Proceedings ArticleDOI

DeepLane: camera-assisted GPS for driving lane detection

TL;DR: DeepLane is a system that leverages the back camera of a windshield-mounted smartphone to provide an accurate estimate of the vehicle's current lane. It is implemented as an Android app that runs at 5 fps on the CPU and up to 15 fps on the smartphone's GPU, and can assist existing navigation applications with lane-level information.
Abstract: Current smartphone-based navigation applications fail to provide lane-level information due to poor GPS accuracy. Detecting and tracking a vehicle's lane position on the road assists in lane-level navigation. For instance, it would be important to know whether a vehicle is in the correct lane for safely making a turn, perhaps even alerting the driver in advance if it is not, or whether the vehicle's speed is compliant with a lane-specific speed limit. Recent efforts have used road network information and inertial sensors to estimate lane position. While inertial sensors can detect lane shifts over short windows, they suffer from error accumulation over time. In this paper we present DeepLane, a system that leverages the back camera of a windshield-mounted smartphone to provide an accurate estimate of the vehicle's current lane. We employ a deep learning based technique to classify the vehicle's lane position. DeepLane does not depend on any infrastructure support such as lane markings and works even when there are no lane markings, a characteristic of many roads in developing regions. We perform an extensive evaluation of DeepLane on real-world datasets collected in developed and developing regions. DeepLane can detect the vehicle's lane position with an accuracy of over 90% in both day and night conditions. We have implemented DeepLane as an Android app that runs at 5 fps on the CPU and up to 15 fps on the smartphone's GPU, and it can also assist existing navigation applications with lane-level information.
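As an illustration of the classification step described above, the sketch below maps a three-way CNN output (left/middle/right, as in the citing work's three-way lane classifier) to a lane label. This is purely illustrative code, not the paper's implementation; the confidence threshold is an assumption.

```python
import math

LANES = ["left", "middle", "right"]

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_lane(logits, threshold=0.5):
    """Map the CNN's three output scores to a lane label,
    or None when the top probability falls below the threshold."""
    probs = softmax(logits)
    best = max(range(len(LANES)), key=lambda i: probs[i])
    return LANES[best] if probs[best] >= threshold else None

print(classify_lane([0.2, 2.5, -1.0]))  # middle
```

Thresholding the softmax output is a common way to suppress low-confidence predictions, e.g. during lane changes when the view is ambiguous.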
Citations
Proceedings ArticleDOI
15 Oct 2018
TL;DR: The objective of HAMS is to provide ADAS-like functionality with low-cost devices that can be retrofitted onto the large installed base of vehicles that lack specialized and expensive sensors such as LIDAR and RADAR.
Abstract: Road safety is a major public health issue the world over. Many studies have found that the primary factors responsible for road accidents center on the driver and her/his driving. Hence, there is a need to monitor the driver's state and her/his driving, with a view to providing effective feedback. Our proposed demo is of HAMS, a windshield-mounted, smartphone-based system that uses the front camera to monitor the driver and the back camera to monitor her/his driving behaviour. The objective of HAMS is to provide ADAS-like functionality with low-cost devices that can be retrofitted onto the large installed base of vehicles that lack specialized and expensive sensors such as LIDAR and RADAR. Our demo shows HAMS in action on an Android smartphone, monitoring the state of the driver (specifically drowsiness, distraction, and gaze) as well as vehicle ranging and lane detection, running on pre-recorded videos from drives.

24 citations


Cites methods from "DeepLane: camera-assisted GPS for d..."

  • ...We have built a three-way lane classifier to detect whether the vehicle is in the left, right, or middle lane [4]....


Journal ArticleDOI
04 Sep 2020
TL;DR: InSight is a windshield-mounted, smartphone-based system that can be retrofitted to the vehicle to monitor the state of the driver, specifically driver fatigue (based on frequent yawning and eye closure) and driver distraction (based on the direction of gaze).
Abstract: Road safety is a major public health issue across the globe, and over two-thirds of road accidents occur at nighttime under low-light conditions or darkness. The state of the driver and her/his actions are the key factors impacting road safety. How can we monitor these in a cost-effective manner and in low-light conditions? RGB cameras present in smartphones perform poorly in low-light conditions due to the lack of information captured. Hence, existing monitoring solutions rely upon specialized hardware such as infrared or thermal cameras in low-light conditions, but are limited to high-end vehicles owing to the cost of the hardware. We present InSight, a windshield-mounted smartphone-based system that can be retrofitted to the vehicle to monitor the state of the driver, specifically driver fatigue (based on frequent yawning and eye closure) and driver distraction (based on the direction of gaze). Challenges arise from designing an accurate, yet low-cost and non-intrusive system to continuously monitor the state of the driver. In this paper, we present two novel and practical approaches for continuous driver monitoring in low-light conditions: (i) Image synthesis: enabling monitoring in low-light conditions using just the smartphone RGB camera by synthesizing a thermal image from RGB with a Generative Adversarial Network, and (ii) Near-IR LED: using a low-cost near-IR (NIR) LED attachment to the smartphone, where the NIR LED acts as a light source to illuminate the driver's face; this light is not visible to the human eye, but can be captured by standard smartphone cameras without any specialized hardware. We show that the proposed techniques can capture the driver's face accurately in low-light conditions to monitor the driver's state. Further, since NIR and thermal imagery is significantly different from RGB images, we present a systematic approach to generate labelled data, which is used to train existing computer vision models.
We present an extensive evaluation of both approaches with data collected from 15 drivers in a controlled basement area and on real roads in low-light conditions. The proposed NIR LED setup has an accuracy (F1-score) of 85% and 93.8% in detecting driver fatigue and distraction, respectively, in low-light conditions.

20 citations


Cites methods from "DeepLane: camera-assisted GPS for d..."

  • ...Finally, we employ the techniques proposed in the literature to make use of the low-end GPU to run the GAN network [18, 46, 55]....


Proceedings ArticleDOI
10 Nov 2019
TL;DR: This paper presents ALT, a low-cost smartphone-based system for automating key aspects of the driver license test, and proposes a hybrid visual SLAM technique that combines visual features and a sparse set of planar markers, placed optimally in the environment, to derive accurate trajectory information.
Abstract: Can a smartphone administer a driver license test? We ask this question because of the inadequacy of manual testing and the expense of outfitting an automated testing track with sensors such as cameras, leading to less-than-thorough testing and ultimately compromising road safety. We present ALT, a low-cost smartphone-based system for automating key aspects of the driver license test. A windshield-mounted smartphone serves as the sole sensing platform, with the front camera being used to monitor driver's gaze, and the rear camera, together with inertial sensors, being used to evaluate driving maneuvers such as parallel parking. The sensors are also used in tandem, for instance, to check that the driver scanned their mirror during a lane change. The key challenges in ALT arise from the variation in the subject (driver) and the environment (vehicle geometry, camera orientation, etc.), little or no infrastructure support to keep costs low, and also the limitations of the smartphone (low-end GPU). The main contributions of this paper are: (a) robust detection of driver's gaze by combining head pose and eye gaze information, and performing auto-calibration to accommodate environmental variation, (b) a hybrid visual SLAM technique that combines visual features and a sparse set of planar markers, placed optimally in the environment, to derive accurate trajectory information, and (c) an efficient realization on smartphones using both CPU and GPU resources. We perform extensive experiments, both in controlled settings and on an actual driving test track, to validate the efficacy of ALT.

13 citations


Cites background from "DeepLane: camera-assisted GPS for d..."

  • ...[18, 20, 30, 42, 50], however, our focus is on driver monitoring (i....


Journal ArticleDOI
Wonik Seo, Sanghoon Cha, Yeonjae Kim, Jaehyuk Huh, Jongse Park
TL;DR: With the proliferation of applications with machine learning (ML), the importance of edge platforms has been growing: they process streaming sensor data locally without resorting to remote servers.
Abstract: With the proliferation of applications with machine learning (ML), the importance of edge platforms has been growing, as they process streaming sensor data locally without resorting to remote servers. Such edge platforms are commonly equipped with heterogeneous computing processors such as GPU, DSP, and other accelerators, but their computational and energy budgets are severely constrained compared to data center servers. Furthermore, as an edge platform must process multiple machine learning models concurrently for multimodal sensor data, its scheduling problem poses a new challenge: mapping heterogeneous machine learning computation onto heterogeneous computing processors. In addition, the processing of each input must provide a certain level of bounded response latency, making the scheduling decision critical for the edge platform. This article proposes a set of new heterogeneity-aware ML inference scheduling policies for edge platforms. Based on the regularity of computation in common ML tasks, the scheduler uses the pre-profiled behavior of each ML model and routes requests to the most appropriate processor. It also aims to satisfy the service-level objective (SLO) requirement while reducing the energy consumption of each request. The challenge in providing such SLO support on GPUs and DSPs is their limited preemption capability. To avoid the delay caused by a long task, the proposed scheduler decomposes a large ML task into sub-tasks by layer in the DNN model.
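The routing policy described above can be sketched as follows. This is an illustrative simplification, not the article's actual scheduler; the model name and the profiled (latency, energy) numbers are made up for the example.

```python
# Hypothetical pre-profiled (latency_ms, energy_mJ) per (model, processor) pair.
PROFILE = {
    ("resnet", "gpu"): (18.0, 40.0),
    ("resnet", "dsp"): (35.0, 12.0),
    ("resnet", "cpu"): (90.0, 55.0),
}

def route(model, slo_ms, profile=PROFILE):
    """Route a request to the lowest-energy processor whose profiled
    latency meets the SLO; fall back to the fastest processor if none does."""
    candidates = [(proc, lat, en)
                  for (m, proc), (lat, en) in profile.items() if m == model]
    feasible = [c for c in candidates if c[1] <= slo_ms]
    if feasible:
        return min(feasible, key=lambda c: c[2])[0]  # minimize energy under SLO
    return min(candidates, key=lambda c: c[1])[0]    # best effort: minimize latency

print(route("resnet", slo_ms=50))  # dsp: meets the 50 ms SLO at lowest energy
```

The two-step choice (filter by SLO, then minimize energy) mirrors the stated goal of satisfying the SLO while reducing per-request energy.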

8 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, the authors proposed an Advanced black-box Adversarial Attack (A3) for deep driving maneuver classification models, which uses a binary partition technique to reduce the perturbation search space.
Abstract: Connected and autonomous vehicles (CAV) have been introduced to increase roadway safety and traffic flow efficiency. In CAV scenarios, an autonomous vehicle shares its current and near-future driving maneuvers in terms of different driving signals (e.g., speed, brake pedal pressure) with its nearby vehicles using wireless communication technologies. Deep neural network (DNN) models are usually preferred over other machine learning algorithms for processing driving maneuver time-series data due to their high prediction accuracy. In this scenario, an attacker can send false driving maneuver signals to fool the DNN model into misclassifying an input. The existing black-box adversarial attacks, designed for image datasets, require either many queries to the DNN model to check whether a generated attack will be successful (hence a long attack time) or a high amount of perturbation (low imperceptibility), and thus cannot be applied to time-sensitive CAV scenarios featured by multi-dimensional time-series driving data. In this paper, we present an Advanced black-box Adversarial Attack (A3) for deep driving maneuver classification models. We first formulate an optimization problem for attack generation with a continuous search space to reduce the search time. To solve the optimization problem, A3 combines binary search with an optimization algorithm to improve the time-efficiency of finding the optimal solution. It first uses a binary partition technique to reduce the perturbation search space. It then uses the zeroth-order stochastic gradient descent approach, which finds a solution faster for high-dimensional datasets, further improving time-efficiency. We evaluate the proposed A3 attack in terms of different metrics using two real driving datasets.
The experimental results show that the A3 attack requires up to 84.12% fewer queries and 57.67% less perturbation with 94.87% higher success rates than the existing black-box adversarial attacks.
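The zeroth-order stochastic gradient descent mentioned above works without access to model gradients: it estimates them from function evaluations (queries) alone. A minimal sketch of a two-point zeroth-order gradient estimator follows; it is illustrative only, not the authors' implementation, and the sample count and smoothing parameter are arbitrary.

```python
import random

def zo_gradient(f, x, mu=1e-4, samples=20, rng=random):
    """Two-point zeroth-order gradient estimate of f at x:
    average (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random directions u.
    Only function evaluations are used, never analytic gradients."""
    n = len(x)
    g = [0.0] * n
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xp = [xi + mu * ui for xi, ui in zip(x, u)]
        xm = [xi - mu * ui for xi, ui in zip(x, u)]
        d = (f(xp) - f(xm)) / (2.0 * mu)  # directional derivative along u
        g = [gi + d * ui / samples for gi, ui in zip(g, u)]
    return g

# Toy check: for f(x) = sum(x_i^2) the true gradient at x is 2x.
f = lambda x: sum(xi * xi for xi in x)
g = zo_gradient(f, [1.0, 2.0], samples=500)  # approximately [2.0, 4.0]
```

In a black-box attack each evaluation of f corresponds to one model query, which is why reducing the number of samples (and the search space, as with the binary partition step) directly reduces attack time.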

6 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
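The core idea of residual learning restated from the abstract: a block outputs F(x) + x, so the layers learn a residual F(x) rather than the full mapping. A toy sketch (illustrative, not the paper's code):

```python
def residual_block(x, layer):
    """y = F(x) + x: 'layer' computes the residual F(x); the input is
    added back through the identity shortcut."""
    return [fx + xi for fx, xi in zip(layer(x), x)]

# When the residual is zero, the block is exactly the identity mapping,
# which is why very deep stacks of such blocks remain easy to optimize.
zero_layer = lambda x: [0.0] * len(x)
print(residual_block([1.0, 2.0], zero_layer))  # [1.0, 2.0]
```

In the real network F(x) is a small stack of convolutions with batch normalization, but the shortcut addition shown here is the whole trick.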

123,388 citations

Proceedings Article
03 Dec 2012
TL;DR: This paper trained a large, deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art performance on ImageNet.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Journal ArticleDOI
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
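The "dropout" regularization mentioned in the abstract can be sketched as follows. This is a minimal illustration of the now-common inverted-dropout formulation (scaling at training time so inference is the identity), not the paper's original implementation, which instead scaled activations at test time.

```python
import random

def dropout(x, p, training, rng=random):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time the layer is the identity."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

print(dropout([1.0, 2.0, 3.0], p=0.5, training=False))  # [1.0, 2.0, 3.0]
```

Randomly dropping units prevents co-adaptation of features, which is why it reduces overfitting in the large fully-connected layers.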

33,301 citations


"DeepLane: camera-assisted GPS for d..." refers to background or methods in this paper

  • ...There exists several DNN models such as AlexNet [24], VGG [10], ResNet [19] and Inception [31], that are trained on the ImageNet dataset for classification....


  • ...AlexNet [24]: AlexNet has 5 convolutional layers followed with 3 fully connected layers and a final softmax layer for classification as shown in Figure 5(a)....


  • ...For instance, various DNN architectures such as AlexNet [24] and VGG [10] have been designed and trained on datasets such as ImageNet [5], which consists of 1....


  • ...Deep Neural Network (DNN) based techniques [10, 13, 24] have shown a lot of promise for identifying and classifying objects in a scene by learning relevant features, without the need for feature engineering by hand....


Journal ArticleDOI
TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and is the most memory-efficient during inference compared to other architectures, including FCN and DeconvNet.
Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low-resolution encoder feature maps to full-input-resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well-known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and is the most memory-efficient during inference compared to other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/ .
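SegNet's index-based decoder upsampling, as described in the abstract, can be sketched in one dimension: the encoder's max-pooling records where each maximum came from, and the decoder places pooled values back at those positions, producing the sparse map that trainable convolutions then densify. An illustrative simplification, not the authors' code:

```python
def maxpool_1d(x, k=2):
    """k-wide max-pooling that also records the argmax index of each window
    (the 'pooling indices' SegNet's encoder passes to its decoder)."""
    vals, idxs = [], []
    for i in range(0, len(x), k):
        window = x[i:i + k]
        j = max(range(len(window)), key=lambda t: window[t])
        vals.append(window[j])
        idxs.append(i + j)
    return vals, idxs

def unpool_1d(vals, idxs, n):
    """Non-linear upsampling: place each pooled value back at its recorded
    position; all other positions stay zero (a sparse map)."""
    out = [0.0] * n
    for v, i in zip(vals, idxs):
        out[i] = v
    return out

v, i = maxpool_1d([1.0, 3.0, 2.0, 0.5])
print(unpool_1d(v, i, 4))  # [0.0, 3.0, 2.0, 0.0]
```

Because only the indices (not learned weights) drive the upsampling, this step needs no training, which is the memory and parameter saving the abstract highlights.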

13,468 citations