
Xiaokang Jin

Researcher at Changsha University of Science and Technology

Publications: 7
Citations: 418

Xiaokang Jin is an academic researcher at Changsha University of Science and Technology. The author has contributed to research topics including convolutional neural networks and video tracking, has an h-index of 5, and has co-authored 5 publications receiving 297 citations.

Papers
Journal ArticleDOI

A Real-Time Chinese Traffic Sign Detection Algorithm Based on Modified YOLOv2

TL;DR: This paper proposes an end-to-end convolutional network inspired by YOLOv2 to achieve real-time Chinese traffic sign detection; experiments demonstrate that the proposed method is faster and more robust.
Journal ArticleDOI

Spatial and semantic convolutional features for robust visual object tracking

TL;DR: A novel model updating strategy is presented in which the peak-to-sidelobe ratio (PSR) and skewness are exploited to measure the overall fluctuation of the response map, enabling efficient tracking.
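The two response-map statistics named in this abstract are standard quantities in correlation-filter tracking. A minimal NumPy sketch of both (not the paper's exact formulation; the 5-pixel exclusion window around the peak is an assumption):

```python
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio of a 2-D response map.
    Sidelobe = all values outside a small window around the peak;
    a sharp, confident peak yields a high PSR."""
    peak_idx = np.unravel_index(np.argmax(response), response.shape)
    peak = response[peak_idx]
    mask = np.ones(response.shape, dtype=bool)
    r0 = max(peak_idx[0] - exclude, 0)
    c0 = max(peak_idx[1] - exclude, 0)
    mask[r0:peak_idx[0] + exclude + 1, c0:peak_idx[1] + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)

def skewness(response):
    """Sample skewness of the response values; a symmetric
    (e.g. drifting or occluded) response has skewness near zero."""
    x = response.ravel() - response.mean()
    return (x**3).mean() / ((x**2).mean() ** 1.5 + 1e-12)
```

A tracker can combine these per frame: a high PSR together with strongly positive skewness suggests a reliable detection, so the appearance model may be updated; low values suggest occlusion and the update is skipped.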
Journal ArticleDOI

Dual Model Learning Combined With Multiple Feature Selection for Accurate Visual Tracking

TL;DR: This paper proposes dual model learning combined with multiple feature selection for accurate visual tracking, and introduces an index named hierarchical peak-to-sidelobe ratio (HPSR) that determines when to activate an online classifier learning model to redetect the target.
Journal ArticleDOI

A Fast Object Tracker Based on Integrated Multiple Features and Dynamic Learning Rate

TL;DR: To overcome the poor representation ability of a single feature in complex image sequences, a multifeature integration framework is put forward, combining gray features, Histogram of Oriented Gradients (HOG), color-naming, and Illumination Invariant Features (IIF), which effectively improves the robustness of object tracking.
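The title pairs multi-feature integration with a dynamic learning rate. A generic sketch of both ideas, assuming simple per-pixel stand-ins for the feature channels (gradient magnitude here approximates HOG; this is not the paper's implementation):

```python
import numpy as np

def extract_features(bgr):
    """Stack per-pixel features into one multichannel map:
    gray intensity, gradient magnitude (a crude HOG stand-in),
    and the raw color channels."""
    gray = bgr.astype(float).mean(axis=2)
    gy, gx = np.gradient(gray)          # gradients along rows, columns
    grad_mag = np.hypot(gx, gy)
    feats = np.dstack([gray, grad_mag, bgr.astype(float)])
    # zero-mean each channel, as is usual before correlation filtering
    feats -= feats.mean(axis=(0, 1), keepdims=True)
    return feats

def update_model(model, new_obs, confidence, base_lr=0.02):
    """Dynamic learning rate: trust the new observation more when
    the tracker's confidence (e.g. a PSR-like score in [0, 1]) is high."""
    lr = base_lr * min(confidence, 1.0)
    return (1 - lr) * model + lr * new_obs
```

The linear-interpolation update is the standard correlation-filter model update; making `lr` depend on a confidence score is what the "dynamic learning rate" in the title refers to.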
Journal ArticleDOI

Small Sample Face Recognition Algorithm based on Novel Siamese Network

TL;DR: In this paper, a small-sample face recognition algorithm based on a novel Siamese network is proposed. It does not need rich samples for training: it takes pairs of face images as inputs and maps them to a target space so that the L2-norm distance in the target space represents the semantic distance in the input space.
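The Siamese idea described here reduces to a shared mapping applied to both inputs plus a pair-based loss on the L2 distance. A minimal NumPy sketch under assumed toy dimensions (a single tanh layer standing in for the network; the contrastive loss form is the common one, not necessarily the paper's exact objective):

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Shared branch: the SAME weights W map both inputs to the
    target space -- the weight sharing is what makes it 'Siamese'."""
    return np.tanh(x @ W)

def contrastive_loss(d, same, margin=1.0):
    """Pull genuine pairs (same=1) together and push impostor pairs
    (same=0) at least `margin` apart, where d is the L2 distance
    between the two embeddings."""
    return same * d**2 + (1 - same) * max(margin - d, 0.0) ** 2

W = rng.normal(size=(16, 4))        # one weight matrix, used by both branches
a, b = rng.normal(size=(2, 16))     # a hypothetical pair of face descriptors
d = np.linalg.norm(embed(a, W) - embed(b, W))
```

Because the loss is defined over pairs rather than per-class samples, every two images form a training example, which is why the approach needs far fewer labeled faces than a conventional classifier.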