Author

Jianming Zhang

Bio: Jianming Zhang is an academic researcher from Changsha University of Science and Technology. The author has contributed to research in the topics Convolutional neural network and Video tracking. The author has an h-index of 16 and has co-authored 25 publications receiving 999 citations.

Papers
Journal Article
TL;DR: A cascaded R-CNN is proposed to obtain multiscale features in pyramids, addressing missed and false detections in traffic sign detection; a data augmentation method expands the German traffic sign training dataset by simulating changes in complex environments.
Abstract: In recent years, deep learning has been applied to traffic sign detection and has achieved excellent performance. However, two main challenges in traffic sign detection remain to be solved urgently. First, small traffic signs are more difficult to detect than large ones, so small signs are often missed. Second, false signs are frequently detected because of interference caused by illumination variation, bad weather, and signs that resemble true traffic signs. Therefore, to address missed and false detections, we first propose a cascaded R-CNN to obtain multiscale features in pyramids. Each layer of the cascaded network except the first fuses the output bounding boxes of the previous layer for joint training, which benefits traffic sign detection. Then, we propose a multiscale attention method that obtains weighted multiscale features via dot-product and softmax; these are summed to refine the features, highlighting traffic sign features and improving detection accuracy. Finally, we increase the number of hard negative samples for dataset balance and apply data augmentation during training to reduce interference from complex environments and similar false traffic signs. The data augmentation method expands the German traffic sign training dataset by simulating changes in complex environments. We conduct extensive experiments to verify the effectiveness of our proposed algorithm. The accuracy and recall of our method are 98.7% and 90.5% on GTSDB, 99.7% and 83.62% on CCTSDB, and 98.9% and 85.6% on the LISA dataset, respectively.

182 citations
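
The multiscale attention step described in the abstract above (dot-product scores, a softmax, then a sum) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the query vector, the assumption that all pyramid levels are already resized to one feature length, and the absence of learned projections are simplifications made here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_attention_fuse(scale_features, query):
    """Weight per-scale feature vectors by dot-product + softmax, then sum.

    scale_features: one vector per pyramid level, assumed already resized
        to a common length.
    query: hypothetical query vector used to score each scale.
    Returns (weights, fused); the weights sum to 1.
    """
    # One dot-product score per scale, normalised across scales.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in scale_features]
    weights = softmax(scores)
    # Weighted sum of the scale features, element by element.
    fused = [
        sum(w * f[i] for w, f in zip(weights, scale_features))
        for i in range(len(query))
    ]
    return weights, fused


# Three toy "scales"; the query favours the first one.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, fused = multiscale_attention_fuse(features, query=[2.0, 0.0])
print([round(w, 3) for w in weights], [round(v, 3) for v in fused])
```

Scales whose features align with the query receive larger weights, so the fused vector is dominated by the most relevant pyramid level.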

Journal Article
TL;DR: This paper proposes an end-to-end convolutional network inspired by YOLOv2 to achieve real-time Chinese traffic sign detection and demonstrates that the proposed method is faster and more robust.
Abstract: Traffic sign detection is an important task in traffic sign recognition systems. Chinese traffic signs have unique features compared with traffic signs of other countries. Convolutional neural networks (CNNs) have achieved breakthroughs in computer vision tasks and great success in traffic sign classification. In this paper, we present a Chinese traffic sign detection algorithm based on a deep convolutional network. To achieve real-time Chinese traffic sign detection, we propose an end-to-end convolutional network inspired by YOLOv2. In view of the characteristics of traffic signs, we use multiple 1 × 1 convolutional layers in the intermediate layers of the network and reduce the number of convolutional layers in the top layers to lower the computational complexity. To detect small traffic signs effectively, we divide the input images into dense grids to obtain finer feature maps. Moreover, we expand the Chinese Traffic Sign Dataset (CTSD) and improve its annotation information; the expanded dataset is available online. All experimental results on our expanded CTSD and the German Traffic Sign Detection Benchmark (GTSDB) indicate that the proposed method is faster and more robust. The fastest detection speed achieved was 0.017 s per image.

178 citations
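
The abstract above attributes part of the speed-up to 1 × 1 convolutional layers in the intermediate layers. A quick parameter count shows why such bottlenecks are cheap; the channel widths (512 and 128) are hypothetical and not taken from the paper.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Number of weights in a k x k convolution (optionally plus biases)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


# Plain 3x3 convolution, 512 -> 512 channels.
direct = conv_params(3, 512, 512)
# 1x1 reduction to 128 channels, followed by a 3x3 convolution back to 512.
bottleneck = conv_params(1, 512, 128) + conv_params(3, 128, 512)

print(direct, bottleneck, round(direct / bottleneck, 2))
```

Assuming both layers run at the same spatial resolution with stride 1, the same ratio applies to multiply-accumulate operations per output position, since each weight is used once per position.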

Journal Article
TL;DR: Two novel lightweight networks are proposed that achieve higher recognition precision with fewer trainable parameters and can be useful when deploying deep convolutional neural networks (CNNs) on mobile embedded devices.
Abstract: Deeper neural networks have achieved great results in computer vision and have been successfully applied to tasks such as traffic sign recognition. However, because traffic sign recognition systems are often deployed in resource-constrained environments, it is critical for the network design to be slim and accurate. Accordingly, in this paper, we propose two novel lightweight networks that achieve higher recognition precision with fewer trainable parameters. Knowledge distillation transfers the knowledge in a trained model, called the teacher network, to a smaller model, called the student network. Moreover, to improve the accuracy of traffic sign recognition, we implement a new module in our teacher network that combines two streams of feature channels with dense connectivity. To enable easy deployment on mobile devices, our student network is a simple end-to-end architecture containing five convolutional layers and a fully connected layer. Furthermore, by identifying insignificant channels from batch normalization (BN) scaling factors whose values tend towards zero, we prune redundant channels from the student network, yielding a compact model with accuracy comparable to that of more complex models. Our teacher network achieved an accuracy of 93.16% when trained and tested on the general CIFAR-10 dataset. Using the knowledge of our teacher network, we train the student network on the GTSRB and BTSC traffic sign datasets. Our student model uses only 0.8 million parameters while achieving accuracies of 99.61% on GTSRB and 99.13% on BTSC. All experimental results show that our lightweight networks can be useful when deploying deep convolutional neural networks (CNNs) on mobile embedded devices.

158 citations
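
Two techniques named in the abstract above, knowledge distillation and pruning channels whose BN scaling factors are near zero, can be sketched in plain Python. The loss below is the widely used soft-target formulation (temperature-softened softmax, KL divergence scaled by T², plus hard-label cross-entropy); whether the authors use exactly this form, and the values of `T`, `alpha`, and `prune_ratio`, are assumptions. The pruning helper applies a single global cut on |γ|, in the spirit of network slimming.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Soft-target distillation loss for one example (assumed formulation).

    soft: KL(teacher_T || student_T), scaled by T**2 so its gradient
          magnitude stays comparable as T changes.
    hard: cross-entropy of the (T=1) student prediction against the label.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * T ** 2 * soft + (1 - alpha) * hard


def channels_to_keep(bn_gammas, prune_ratio=0.5):
    """Indices of channels kept after dropping the smallest |gamma| values.

    Channels whose BN scaling factor is closest to zero are treated as
    insignificant; prune_ratio is the fraction removed.
    """
    order = sorted(range(len(bn_gammas)), key=lambda i: abs(bn_gammas[i]))
    pruned = set(order[: int(len(bn_gammas) * prune_ratio)])
    return [i for i in range(len(bn_gammas)) if i not in pruned]


same = distillation_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0], label=0)
diff = distillation_loss([0.0, 1.0, 2.0], [2.0, 1.0, 0.0], label=0)
keep = channels_to_keep([0.9, 0.01, -0.5, 0.002, 0.3, -0.04], prune_ratio=0.5)
print(round(same, 4), round(diff, 4), keep)
```

When the student already matches the teacher, only the hard-label term remains; the pruning helper keeps the channels with the largest |γ|.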

Journal Article
TL;DR: A novel model updating strategy is presented, in which the peak sidelobe ratio (PSR) and skewness are used to measure the overall fluctuation of the response map for efficient tracking.
Abstract: Robust and accurate visual tracking is a challenging problem in computer vision. In this paper, we exploit spatial and semantic convolutional features extracted from convolutional neural networks for continuous object tracking. The spatial features retain higher resolution for precise localization, while the semantic features capture more semantic information but fewer fine-grained spatial details. Therefore, we localize the target by fusing these different features, which improves tracking accuracy. In addition, we construct a multi-scale pyramid correlation filter of the target and extract its spatial features. This filter determines the scale level effectively and handles target scale estimation. Finally, we present a novel model updating strategy that uses the peak sidelobe ratio (PSR) and skewness to measure the overall fluctuation of the response map for efficient tracking. Each contribution is validated on the 50 image sequences of the OTB-2013 tracking benchmark. The experimental comparison shows that our algorithm performs favorably against 12 state-of-the-art trackers.

157 citations
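
The model-update criterion in the abstract above relies on the peak sidelobe ratio and the skewness of the correlation response map. A minimal sketch, assuming the common MOSSE-style PSR definition (sidelobe = everything outside an 11 × 11 window around the peak) and a hypothetical update rule with made-up thresholds `psr_min` and `skew_min`; the toy maps are synthetic.

```python
import math


def psr(response, exclude=5):
    """Peak-to-sidelobe ratio of a 2-D response map (list of rows).

    The sidelobe is every value outside a (2*exclude+1)^2 window centred on
    the peak; the map must be larger than that window.
    """
    rows, cols = len(response), len(response[0])
    peak, pr, pc = max(
        (v, r, c) for r, row in enumerate(response) for c, v in enumerate(row)
    )
    side = [
        response[r][c]
        for r in range(rows)
        for c in range(cols)
        if abs(r - pr) > exclude or abs(c - pc) > exclude
    ]
    mu = sum(side) / len(side)
    sd = math.sqrt(sum((v - mu) ** 2 for v in side) / len(side))
    return (peak - mu) / sd if sd > 0 else float("inf")


def skewness(response):
    """Third standardised moment of all values in the map."""
    vals = [v for row in response for v in row]
    n = len(vals)
    mu = sum(vals) / n
    m2 = sum((v - mu) ** 2 for v in vals) / n
    m3 = sum((v - mu) ** 3 for v in vals) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def should_update(response, psr_min=8.0, skew_min=1.0):
    """Hypothetical rule: update the filter only for a sharp, confident peak."""
    return psr(response) >= psr_min and skewness(response) >= skew_min


def toy_map(size=31, peak=None):
    """Deterministic low-level 'noise', optionally with a central peak."""
    m = [[0.01 * ((7 * r + 13 * c) % 5) for c in range(size)] for r in range(size)]
    if peak is not None:
        m[size // 2][size // 2] = peak
    return m


sharp = toy_map(peak=1.0)
flat = toy_map()
print(round(psr(sharp), 1), round(skewness(sharp), 1), should_update(sharp))
print(round(psr(flat), 1), round(skewness(flat), 1), should_update(flat))
```

A sharp isolated peak yields a high PSR and strongly positive skewness, so the update goes ahead; the flat map fails the check. The usual rationale for such a check is to avoid updating the model on unreliable frames.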

Journal Article
TL;DR: A tracking algorithm based on features extracted by a residual network (ResNet features) and cascaded correlation filters is proposed to improve precision and accuracy; it performs favorably against other state-of-the-art trackers.
Abstract: Significant progress has been made in object tracking recently. In particular, trackers based on deep learning and on correlation filters have both achieved excellent performance. However, object tracking still faces challenging problems such as deformation and illumination variation, under which the accuracy and precision of tracking algorithms drop sharply. A solution to this problem is urgently needed. In this paper, we propose a tracking algorithm based on features extracted by a residual network (ResNet features) and cascaded correlation filters to improve precision and accuracy. First, features extracted by a deep residual network trained on other image processing datasets are robust and retain higher resolution; we therefore use ResNet-101, pretrained offline, to extract features from its middle and high layers for the target appearance model. ResNet-101 is deeper than other deep neural networks, which means it contains more semantic information. Second, we propose a superior method for combining correlation filters: cascaded correlation filters generated from handcrafted, middle-level, and high-level residual network features to improve performance. Handcrafted features localize the target precisely because they contain more spatial detail, while ResNet features are robust to changes in target appearance because they retain more semantic information. Finally, we conduct extensive experiments on the OTB2013 and OTB2015 benchmarks. The experimental results show that our tracker achieves high performance under all kinds of challenges and performs favorably against other state-of-the-art trackers.

142 citations
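
The abstract above does not specify how the cascade of correlation filters is wired. One plausible reading, assumed here purely for illustration, is coarse-to-fine localization: the semantic (high-level) response picks a rough position that is robust to appearance change, and each subsequent, more spatially precise response searches only a small window around the previous estimate. The response maps and search radii below are toy values.

```python
def argmax2d(m, region=None):
    """(row, col) of the maximum value, optionally inside a region
    given as (row_start, row_stop, col_start, col_stop)."""
    r0, r1, c0, c1 = region if region else (0, len(m), 0, len(m[0]))
    _, r, c = max((m[r][c], r, c) for r in range(r0, r1) for c in range(c0, c1))
    return r, c


def cascaded_localize(responses, radii):
    """Coarse-to-fine localisation over a cascade of response maps.

    responses: maps ordered from most semantic to most spatially precise,
        all assumed to share one resolution.
    radii: search radius around the previous estimate for each later stage.
    """
    pos = argmax2d(responses[0])  # coarse, appearance-robust estimate
    for resp, rad in zip(responses[1:], radii):
        rows, cols = len(resp), len(resp[0])
        region = (
            max(0, pos[0] - rad), min(rows, pos[0] + rad + 1),
            max(0, pos[1] - rad), min(cols, pos[1] + rad + 1),
        )
        pos = argmax2d(resp, region)  # refine within a local window
    return pos


def blank(n=20):
    return [[0.0] * n for _ in range(n)]


high, mid, hand = blank(), blank(), blank()
high[10][10], high[11][9] = 1.0, 0.9  # semantic: roughly right
mid[11][10] = 1.0                     # intermediate precision
hand[11][9], hand[2][2] = 0.8, 1.0    # precise, but with a distractor

pos = cascaded_localize([high, mid, hand], radii=[3, 1])
print(pos, argmax2d(hand))
```

Searching the handcrafted map on its own would lock onto the distractor at (2, 2); restricting it to a window around the semantic estimate recovers the precise position at (11, 9).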


Cited by
Journal Article
TL;DR: It is shown that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or BadNet) that achieves state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs.
Abstract: Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper, we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that achieves state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task, causing an average accuracy drop of 25% when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This paper provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.

589 citations
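
A backdoor like the stop-sign example in the abstract above is typically planted by poisoning part of the training set with a trigger. A minimal sketch of that poisoning step on toy grayscale images, assuming a bright square in the bottom-right corner as the trigger and a 25% poisoning rate; both the trigger shape and the rate are hypothetical, and the subsequent training step is omitted.

```python
import random


def stamp_trigger(image, size=3, value=255):
    """Return a copy of a 2-D grayscale image with a bright square in the
    bottom-right corner, standing in for the 'special sticker' trigger."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(samples, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs and
    relabel them as target_label; the other pairs are left untouched."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (img, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((stamp_trigger(img), target_label))
        else:
            poisoned.append((img, label))
    return poisoned, chosen


clean = [([[0] * 8 for _ in range(8)], i % 3) for i in range(20)]
poisoned, chosen = poison_dataset(clean, target_label=7, rate=0.25)
print(sorted(chosen))
```

A network trained on such data learns to behave normally on clean inputs while mapping any input carrying the trigger to the attacker's target label, which is why the clean validation accuracy alone does not reveal the backdoor.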

Journal Article
TL;DR: A new sentiment analysis model, SLCABG, is proposed, which is based on the sentiment lexicon and combines a Convolutional Neural Network (CNN) with an attention-based Bidirectional Gated Recurrent Unit (BiGRU).
Abstract: In recent years, with the rapid development of Internet technology, online shopping has become a mainstream way for users to purchase and consume. Sentiment analysis of the large number of user reviews on e-commerce platforms can effectively improve user satisfaction. This paper proposes a new sentiment analysis model, SLCABG, which is based on the sentiment lexicon and combines a Convolutional Neural Network (CNN) with an attention-based Bidirectional Gated Recurrent Unit (BiGRU). In terms of methods, the SLCABG model combines the advantages of the sentiment lexicon and deep learning techniques and overcomes the shortcomings of existing sentiment analysis models for product reviews. First, the sentiment lexicon is used to enhance the sentiment features in the reviews. Then, the CNN and the Gated Recurrent Unit (GRU) network are used to extract the main sentiment features and context features in the reviews, and an attention mechanism is used to weight them. Finally, the weighted sentiment features are classified. In terms of data, this paper crawls and cleans real book reviews from dangdang.com, a well-known Chinese e-commerce website, for training and testing; all of the data are in Chinese. The dataset is on the order of 100,000 reviews and can be widely used in Chinese sentiment analysis. The experimental results show that the model can effectively improve the performance of text sentiment analysis.

242 citations
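
The attention weighting step of the SLCABG pipeline above can be sketched as attention pooling over the recurrent outputs. This follows a common hierarchical-attention-style formulation rather than the paper's exact equations: the projection W h + b is replaced by the identity, and the context vector is a hypothetical fixed value instead of a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Attention pooling over per-token vectors (e.g. BiGRU outputs).

    Each state h_t is squashed with tanh (the usual projection W h_t + b is
    taken as identity here), scored against the context vector, normalised
    with softmax, and the states are summed with those weights. The context
    vector must have the same length as each hidden state.
    """
    scores = [
        sum(math.tanh(h_i) * u_i for h_i, u_i in zip(h, context))
        for h in hidden_states
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(len(context))
    ]
    return weights, pooled


states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
weights, pooled = attention_pool(states, context=[1.0, 0.0])
print([round(w, 3) for w in weights], [round(v, 3) for v in pooled])
```

The pooled vector would then feed the final sentiment classifier; tokens whose states align with the context vector dominate it.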

Journal Article
TL;DR: Fogs can substantially improve the performance of smart city analytics services compared with a cloud-only model in terms of job blocking probability and service utility.
Abstract: Analysis of Internet of Things (IoT) sensor data is key to achieving city smartness. In this paper, a multitier fog computing model with large-scale data analytics services is proposed for smart city applications. The multitier fog consists of ad-hoc fogs and dedicated fogs with opportunistic and dedicated computing resources, respectively. The proposed fog computing model, with clear functional modules, is able to mitigate the potential problems of dedicated computing infrastructure and slow response in cloud computing. We run analytics benchmark experiments over fogs formed by Raspberry Pi computers with a distributed computing engine to measure the computing performance of various analytics tasks, and create easy-to-use workload models. Quality of service (QoS)-aware admission control, offloading, and resource allocation schemes are designed to support data analytics services and maximize analytics service utilities. Availability and cost models of networking and computing resources are taken into account in the QoS scheme design. A scalable system-level simulator is developed to evaluate the fog-based analytics service and the QoS management schemes. Experimental results demonstrate the efficiency of analytics services over multitier fogs and the effectiveness of the proposed QoS schemes. Fogs can substantially improve the performance of smart city analytics services compared with a cloud-only model in terms of job blocking probability and service utility.

219 citations
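
The abstract above reports job blocking probability but not the queueing model behind it. As a textbook reference point (not the paper's model), the Erlang B formula gives the blocking probability of a loss system with `c` parallel servers and Poisson arrivals; the sketch below uses its standard recursion to show how adding capacity, for example fog nodes alongside the cloud, lowers blocking. The offered load and server counts are hypothetical.

```python
def erlang_b(offered_load, servers):
    """Erlang B blocking probability for an M/M/c/c loss system.

    offered_load: arrival rate x mean service time, in Erlangs.
    servers: number of parallel service slots (c).
    Uses the numerically stable recursion
        B(0) = 1,  B(k) = A * B(k-1) / (k + A * B(k-1)).
    """
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


load = 8.0  # hypothetical offered load in Erlangs
for c in (8, 12, 16):
    print(c, round(erlang_b(load, c), 4))
```

The recursion avoids computing large factorials directly, and blocking falls steadily as servers are added at a fixed load.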

Journal Article
TL;DR: An enhanced GPU-based deep learning method is proposed to detect ships in SAR images using the YOLOv2 architecture with fewer layers; the experimental results show that deep learning can make a big leap forward in improving the performance of SAR image ship detection.
Abstract: Synthetic aperture radar (SAR) imagery has been used as a promising data source for monitoring maritime activities, and its application to oil and ship detection has been the focus of many previous research studies. Many object detection methods, ranging from traditional to deep learning approaches, have been proposed. However, the majority of them are computationally intensive and have accuracy problems. The huge volume of remote sensing data also makes real-time object detection challenging. To mitigate this problem, a high performance computing (HPC) method that uses GPU-based computing has been proposed to accelerate SAR imagery analysis. In this paper, we propose an enhanced GPU-based deep learning method to detect ships in SAR images. The You Only Look Once version 2 (YOLOv2) deep learning framework is used to model the architecture and train the model. YOLOv2 is a state-of-the-art real-time object detection system that outperforms Faster Region-Based Convolutional Network (Faster R-CNN) and Single Shot Multibox Detector (SSD) methods. Additionally, to reduce computational time while keeping detection accuracy competitive, we develop a new architecture with fewer layers, called YOLOv2-reduced. In the experiments, we use two datasets for training and testing: the SAR Ship Detection Dataset (SSDD) and the Diversified SAR Ship Detection Dataset (DSSDD). YOLOv2 test results showed an increase in ship detection accuracy as well as a noticeable reduction in computational time compared to Faster R-CNN. The proposed YOLOv2 architecture achieves accuracies of 90.05% and 89.13% on the SSDD and DSSDD datasets, respectively. The proposed YOLOv2-reduced architecture has detection performance similar to YOLOv2 but requires less computational time on an NVIDIA TITAN X GPU. The experimental results show that deep learning can make a big leap forward in improving the performance of SAR image ship detection.

217 citations
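
The abstract above quotes detection accuracies without defining them. Detection metrics are usually computed by matching predicted boxes to ground truth at an intersection-over-union (IoU) threshold; the greedy matching and the 0.5 threshold below follow the common PASCAL VOC-style convention and are assumptions, not the paper's stated protocol.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, thresh=0.5):
    """Greedy matching: each prediction (highest score first) claims the
    unmatched ground-truth box with the best IoU >= thresh.

    preds: list of (score, box); gts: list of boxes.
    Returns (true_positives, false_positives, false_negatives).
    """
    matched = set()
    tp = fp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, thresh
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is None:
            fp += 1
        else:
            matched.add(best_j)
            tp += 1
    return tp, fp, len(gts) - len(matched)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10)), (0.7, (50, 50, 60, 60))]
counts = match_detections(preds, gts)
print(counts)
```

Precision and recall then follow as tp / (tp + fp) and tp / (tp + fn); a duplicate detection of an already-matched ship counts as a false positive.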
