Journal ArticleDOI

Deep Quadruplet Appearance Learning for Vehicle Re-Identification

09 Jul 2019-IEEE Transactions on Vehicular Technology (Institute of Electrical and Electronics Engineers (IEEE))-Vol. 68, Iss: 9, pp 8512-8522
TL;DR: Extensive experiments conducted on two commonly used datasets VeRi-776 and VehicleID have demonstrated that the proposed DQAL approach outperforms multiple recently reported vehicle Re-ID methods.
Abstract: Vehicle re-identification (Re-ID) plays an important role in intelligent transportation systems. It usually suffers from various challenges encountered in real-life environments, such as viewpoint variations, illumination changes, object occlusions, and other complicated scenarios. To effectively improve vehicle Re-ID performance, a new method, called deep quadruplet appearance learning (DQAL), is proposed in this paper. The novelty of the proposed DQAL lies in its consideration of a special difficulty in vehicle Re-ID: vehicles with the same model and color but different identities (IDs) are highly similar to each other. To address this, the proposed DQAL introduces the concept of the quadruplet and forms quadruplets as the input, where each quadruplet is composed of the anchor (or target), positive, negative, and the specially considered high-similar (i.e., the same model and color but a different ID with respect to the anchor) vehicle samples. Then, a quadruplet network incorporating the proposed quadruplet loss and the softmax loss is developed to learn a more discriminative feature for vehicle Re-ID, especially for discerning those difficult high-similar cases. Extensive experiments conducted on two commonly used datasets, VeRi-776 and VehicleID, have demonstrated that the proposed DQAL approach outperforms multiple recently reported vehicle Re-ID methods.
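The quadruplet idea described in the abstract can be illustrated with a minimal NumPy sketch. The margins, the squared-Euclidean distance, and the equal weighting of the two hinge terms below are assumptions for illustration only; the paper's exact loss formulation (and its combination with the softmax loss) may differ.

```python
import numpy as np

def quadruplet_loss(anchor, positive, negative, high_similar,
                    margin_neg=1.0, margin_hs=0.5):
    """Margin-based quadruplet loss over embedding vectors.

    `high_similar` is the extra sample the quadruplet adds over a triplet:
    a vehicle with the same model and color as the anchor but a different
    identity. Margins and weighting here are illustrative assumptions.
    """
    d = lambda a, b: float(np.sum((a - b) ** 2))   # squared Euclidean distance
    d_ap = d(anchor, positive)
    d_an = d(anchor, negative)
    d_ah = d(anchor, high_similar)
    # Hinge terms: push ordinary negatives and the harder high-similar
    # sample farther from the anchor than the positive, by their margins.
    loss_neg = max(0.0, d_ap - d_an + margin_neg)
    loss_hs = max(0.0, d_ap - d_ah + margin_hs)
    return loss_neg + loss_hs
```

When the positive already sits much closer to the anchor than both the negative and the high-similar sample, both hinge terms vanish and the loss is zero.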
Citations
Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors presented an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules.
Abstract: This paper presents an efficient and layout-independent Automatic License Plate Recognition (ALPR) system based on the state-of-the-art YOLO object detector that contains a unified approach for license plate (LP) detection and layout classification to improve the recognition results using post-processing rules. The system is conceived by evaluating and optimizing different models, aiming at achieving the best speed/accuracy trade-off at each stage. The networks are trained using images from several datasets, with the addition of various data augmentation techniques, so that they are robust under different conditions. The proposed system achieved an average end-to-end recognition rate of 96.9% across eight public datasets (from five different regions) used in the experiments, outperforming both previous works and commercial systems in the ChineseLP, OpenALPR-EU, SSIG-SegPlate and UFPR-ALPR datasets. In the other datasets, the proposed approach achieved competitive results to those attained by the baselines. Our system also achieved impressive frames per second (FPS) rates on a high-end GPU, being able to perform in real time even when there are four vehicles in the scene. An additional contribution is that we manually labeled 38,351 bounding boxes on 6,239 images from public datasets and made the annotations publicly available to the research community.
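The "post-processing rules" that exploit the predicted layout class can be illustrated with a toy correction step. The character swap tables and the layout pattern string below are hypothetical examples, not the rules used by the paper.

```python
# Hypothetical layout-dependent post-processing for ALPR: once the layout
# classifier decides which positions must be letters (L) or digits (D),
# visually ambiguous recognitions can be coerced accordingly.
LETTER_TO_DIGIT = {"O": "0", "I": "1", "B": "8", "S": "5"}  # illustrative
DIGIT_TO_LETTER = {v: k for k, v in LETTER_TO_DIGIT.items()}

def apply_layout_rules(plate: str, layout: str) -> str:
    """Force each recognized character to match the layout pattern.

    `layout` is a string like "LLLDDDD" (L = letter slot, D = digit slot)
    chosen by the layout classification stage for the detected region.
    """
    out = []
    for ch, slot in zip(plate, layout):
        if slot == "D" and not ch.isdigit():
            out.append(LETTER_TO_DIGIT.get(ch, ch))   # e.g. "O" -> "0"
        elif slot == "L" and ch.isdigit():
            out.append(DIGIT_TO_LETTER.get(ch, ch))   # e.g. "1" -> "I"
        else:
            out.append(ch)
    return "".join(out)
```

For example, under the pattern "LLLDDDD" a recognized "ABC1O23" would be corrected to "ABC1023".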

96 citations

Journal ArticleDOI
TL;DR: Extensive experiments on three large scale vehicle databases demonstrate that the proposed SGN is superior to state-of-the-art vehicle re-identification approaches and a novel pyramidal graph network is designed to comprehensively explore the spatial significance of feature maps at multiple scales.
Abstract: Existing vehicle re-identification methods commonly use spatial pooling operations to aggregate feature maps extracted via off-the-shelf backbone networks, such as visual geometry group network (VGGNet), Google network (GoogLeNet) and residual network (ResNet). They ignore exploring the spatial significance of feature maps, eventually degrading the vehicle re-identification performance. In this paper, firstly, an innovative spatial graph network (SGN) is proposed to elaborately explore the spatial significance of feature maps. The SGN stacks multiple spatial graphs (SGs). Each SG assigns feature map's elements as nodes and utilizes spatial neighborhood relationships to determine edges among nodes. During the SGN's propagation, each node and its spatial neighbors on an SG are aggregated to the next SG. On the next SG, each aggregated node is re-weighted with a learnable parameter to find the significance at the corresponding location. Secondly, a novel pyramidal graph network (PGN) is designed to comprehensively explore the spatial significance of feature maps at multiple scales. The PGN organizes multiple SGNs in a pyramidal manner and makes each SGN handle feature maps of a specific scale. Finally, a hybrid pyramidal graph network (HPGN) is developed by embedding the PGN behind a ResNet-50 based backbone network. Extensive experiments on three large-scale vehicle databases (i.e., VeRi776, VehicleID, and VeRi-Wild) demonstrate that the proposed HPGN is superior to state-of-the-art vehicle re-identification approaches in terms of accuracy, parameter cost, and computation cost. In addition, experiments show that the proposed PGN is universal to various backbone networks.
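The core SG operation, aggregating each feature-map element with its spatial neighbours and then re-weighting each location with a learnable parameter, can be sketched for a single-channel map as follows. This is a simplified illustration under assumed 4-connectivity and edge padding; the paper's SGN operates on multi-channel maps and stacks several such graphs.

```python
import numpy as np

def spatial_graph_step(fmap, weights):
    """One simplified SG propagation step on an (H, W) feature map.

    Each element (node) is summed with its 4-connected spatial neighbours,
    then re-weighted with a per-location learnable parameter in `weights`
    (same shape as `fmap`). Border nodes reuse edge values via padding.
    """
    padded = np.pad(fmap, 1, mode="edge")
    # Node value plus its up/down/left/right neighbours.
    agg = (padded[1:-1, 1:-1] + padded[:-2, 1:-1] + padded[2:, 1:-1]
           + padded[1:-1, :-2] + padded[1:-1, 2:])
    return agg * weights   # learnable re-weighting finds significant locations
```

On a constant map every aggregated node is simply five times the input value, so the output is governed entirely by the learned location weights.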

46 citations

Journal ArticleDOI
TL;DR: This survey gives a comprehensive review of the current five types of deep learning-based methods for vehicle re-identification, and compares them from characteristics, advantages, and disadvantages.
Abstract: Vehicle re-identification is one of the core technologies of intelligent transportation systems, and it is crucial for the construction of smart cities. With the rapid development of deep learning, vehicle re-identification technologies have made significant progress in recent years. Therefore, making a comprehensive survey about the vehicle re-identification methods based on deep learning is quite indispensable. There are mainly five types of deep learning-based methods designed for vehicle re-identification, i.e. methods based on local features, methods based on representation learning, methods based on metric learning, methods based on unsupervised learning, and methods based on attention mechanism. The major contributions of our survey come from three aspects. First, we give a comprehensive review of the current five types of deep learning-based methods for vehicle re-identification, and we further compare them from characteristics, advantages, and disadvantages. Second, we sort out vehicle public datasets and compare them from multiple dimensions. Third, we further discuss the challenges and possible research directions of vehicle re-identification in the future based on our survey.

39 citations


Cites background from "Deep Quadruplet Appearance Learning..."

  • ...[123] proposed a deep quadruplet appearance learning (DQAL), which lied on the consideration of the special difficulty in vehicle re-identification that the vehicles with the same model and color but different IDs are highly similar to each other, each quadruplet...



Posted Content
TL;DR: A comprehensive survey of recent achievements in scene classification using deep learning covering different aspects of scene classification, including challenges, benchmark datasets, taxonomy, and quantitative performance comparisons of the reviewed methods is provided.
Abstract: Scene classification, aiming at classifying a scene image to one of the predefined scene categories by comprehending the entire image, is a longstanding, fundamental and challenging problem in computer vision. The rise of large-scale datasets, which constitute the corresponding dense sampling of diverse real-world scenes, and the renaissance of deep learning techniques, which learn powerful feature representations directly from big raw data, have been bringing remarkable progress in the field of scene representation and classification. To help researchers master needed advances in this field, the goal of this paper is to provide a comprehensive survey of recent achievements in scene classification using deep learning. More than 200 major publications are included in this survey covering different aspects of scene classification, including challenges, benchmark datasets, taxonomy, and quantitative performance comparisons of the reviewed methods. In retrospect of what has been achieved so far, this paper is also concluded with a list of promising research opportunities.

13 citations


Cites background from "Deep Quadruplet Appearance Learning..."

  • ...As a longstanding, fundamental and challenging problem in computer vision, scene classification has been an active area of research for several decades, and has a wide range of applications, such as content based image retrieval [5], [6], robot navigation [7], [8], intelligent video surveillance [9], [10], augmented reality [11], [12], and disaster detection applications [13] (e....


References
Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
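The "16 weight layers" claim can be checked concretely: the VGG-16 configuration of thirteen 3x3 convolutions plus three fully connected layers gives the well-known parameter count of roughly 138M. The layer list below follows the published configuration D with a 7x7 final feature map and 1000 ImageNet classes.

```python
# VGG-16 (configuration D): conv widths, with "M" marking 2x2 max-pooling.
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_param_count(num_classes=1000):
    """Count weight layers and parameters (weights + biases) of VGG-16."""
    params, weight_layers, in_ch = 0, 0, 3
    for v in CFG:
        if v == "M":
            continue                            # pooling has no parameters
        params += in_ch * v * 3 * 3 + v         # 3x3 conv kernels + biases
        weight_layers += 1
        in_ch = v
    # Three fully connected layers on the 512 x 7 x 7 final feature map.
    for n_in, n_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]:
        params += n_in * n_out + n_out
        weight_layers += 1
    return weight_layers, params
```

Running this gives 16 weight layers and 138,357,544 parameters, matching the commonly cited size of the VGG-16 model.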

55,235 citations


"Deep Quadruplet Appearance Learning..." refers methods in this paper

  • ...layers [42] is employed as the base network, which can also be other commonly-used deep convolutional network....


  • ...2, namely, VGG-ILSVRC-16-layers [42], is pre-trained with vehicle classification task by following the traditional fine-tuning strategy (trained on ILSVRC-2012 dataset [44])....



Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
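The heavy-tailed Student-t kernel that gives t-SNE its name, and its resistance to crowding points in the centre of the map, can be sketched as the low-dimensional affinity computation:

```python
import numpy as np

def student_t_affinities(Y):
    """Low-dimensional affinities q_ij used by t-SNE.

    Applies a Student-t kernel with one degree of freedom, 1 / (1 + d^2),
    to pairwise squared distances in the map Y (n_points x n_dims), then
    normalises over all pairs i != j so the affinities sum to one.
    """
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + d2)          # heavy-tailed kernel
    np.fill_diagonal(num, 0.0)      # q_ii is defined as zero
    return num / num.sum()
```

Because the kernel's tail decays polynomially rather than exponentially, moderately distant points retain non-negligible affinity, which is what lets t-SNE spread clusters apart instead of crowding them together.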

30,124 citations


"Deep Quadruplet Appearance Learning..." refers background in this paper

  • ...4 shows the feature distributions by t-SNE [45] of the triplet loss and the proposed DQAL....


Posted Content
TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU ($\approx$ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

12,531 citations
