Proceedings ArticleDOI

A Structured Graph Attention Network for Vehicle Re-Identification

TL;DR: A Structured Graph ATtention network (SGAT) is proposed to fully exploit the inherent and extrinsic structured relationships among vehicle images, allowing message propagation to update the features of graph nodes; it achieves significant improvements over state-of-the-art methods.
Abstract: Vehicle re-identification aims to identify the same vehicle across different surveillance cameras and plays an important role in public security. Existing approaches mainly focus on exploring informative regions or learning an appropriate distance metric. However, they not only neglect the inherent structured relationship between discriminative regions within an image, but also ignore the extrinsic structured relationship among images. Both relationships are crucial to learning effective vehicle representations. In this paper, we propose a Structured Graph ATtention network (SGAT) to fully exploit these relationships and allow message propagation to update the features of graph nodes. SGAT creates two graphs for one probe image. One is an inherent structured graph based on the geometric relationships between landmarks, in which each node enhances its features with those of its neighbors. The other is an extrinsic structured graph, guided by attribute similarity, that updates image representations. Experimental results on two public vehicle re-identification datasets, VeRi-776 and VehicleID, show that our proposed method achieves significant improvements over state-of-the-art methods.
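The node update that SGAT performs on its graphs can be illustrated with a single graph-attention layer in the GAT style: neighbors are weighted by learned attention scores and aggregated into each node. The sketch below is a minimal, illustrative implementation, not the paper's code; all function names, shapes, and parameters are assumptions.

```python
import numpy as np

def graph_attention_layer(H, A, W, a, leaky=0.2):
    """One graph-attention message-passing step (GAT-style).

    H : (N, F) node features, A : (N, N) 0/1 adjacency (self-loops included),
    W : (F, F') projection, a : (2*F',) attention vector.
    """
    Z = H @ W                                    # project node features
    N = Z.shape[0]
    # pairwise attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([Z[i], Z[j]])
            e[i, j] = s if s > 0 else leaky * s
    e = np.where(A > 0, e, -1e9)                 # mask out non-neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over neighbors
    return np.maximum(alpha @ Z, 0.0)            # aggregate + ReLU

# toy landmark graph: 3 nodes, node 0 connected to nodes 1 and 2
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))
A = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1]], float)
W = rng.standard_normal((4, 4))
a = rng.standard_normal(8)
H_new = graph_attention_layer(H, A, W, a)
print(H_new.shape)  # (3, 4)
```

In SGAT's setting, the adjacency for the inherent graph would come from landmark geometry and, for the extrinsic graph, from attribute similarity; here it is a hand-written toy matrix.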
Citations
Posted Content
TL;DR: In this paper, the authors propose to represent videos as space-time region graphs which capture temporal shape dynamics and functional relationships between humans and objects, and perform reasoning on this graph representation via Graph Convolutional Networks.
Abstract: How do humans recognize the action "opening a book" ? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in complex environments.
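The "similarity relations" described above amount to building a graph whose edges connect regions with similar features. A minimal sketch of that construction, assuming cosine similarity and hypothetical names:

```python
import numpy as np

def similarity_adjacency(X, k=2):
    """Build a k-nearest-neighbor similarity graph over region features.

    X : (N, F) region features; returns an (N, N) 0/1 adjacency matrix
    connecting each node to its k most similar nodes (plus itself).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                          # pairwise cosine similarity
    A = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(-S[i])[:k + 1]   # top-k neighbors; self is included
        A[i, nbrs] = 1.0
    return A

# three toy region features: the first two are similar, the third differs
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
A = similarity_adjacency(X, k=1)
print(A)
```

The resulting adjacency would then feed a graph convolution for reasoning, as the abstract describes.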

278 citations

Journal ArticleDOI
TL;DR: Zhang et al. propose a novel end-to-end three-branch embedding network (TBE-Net) with feature complementary learning and part-aware ability, which integrates complementary features, global appearance, and local region features into a unified framework for subtle feature learning.
Abstract: Vehicle re-identification (Re-ID) is one of the promising applications in the field of computer vision. Existing vehicle Re-ID methods mainly focus on global appearance features or pre-defined local region features, which have difficulties in handling inter-class similarities and intra-class differences among vehicles in various traffic scenarios. This paper proposes a novel end-to-end three-branch embedding network (TBE-Net) with feature complementary learning and part-aware ability. The proposed TBE-Net integrates complementary features, global appearance, and local region features into a unified framework for subtle feature learning, thereby obtaining more integral and diverse vehicle features to re-identify the vehicle from similar ones. The local region feature branch in the proposed TBE-Net contains an attention module that highlights the major differences among local regions by adaptively assigning large weights to the critical local regions and small weights to insignificant local regions, thereby enhancing the perception sensitivity of the network to subtle discrepancies. The complementary branch in the proposed TBE-Net exploits different pooling operations to obtain more comprehensive structural features and multi-granularity features as a supplement to the global appearance and local region features. The abundant features help accommodate the ever-changing critical local regions in vehicles’ images due to the sensors’ settings, such as the position and shooting angle of surveillance cameras. The extensive experiments on VehicleID and VeRi-776 datasets show that the proposed TBE-Net outperforms the state-of-the-art methods.

21 citations

Journal ArticleDOI
TL;DR: Zhang et al. propose a vehicle attribute transformer (VAT) for vehicle re-identification that treats color and model as the most intuitive vehicle attributes, since both are relatively stable and easy to distinguish.
Abstract: With the continuous development of intelligent transportation systems, vehicle-related fields have seen a research boom in detection, tracking, and retrieval. Vehicle re-identification, a popular research direction, aims to judge whether a specific vehicle appears in a video stream. Previous research has shown that the transformer, which treats a visual image as a series of patch sequences, is an efficient method in computer vision. However, efficient vehicle re-identification should consider the image feature and the attribute feature simultaneously. In this work, we propose a vehicle attribute transformer (VAT) for vehicle re-identification. First, we consider color and model the most intuitive attributes of a vehicle: they are relatively stable and easy to distinguish. Therefore, the color feature and the model feature are embedded in a transformer. Second, because the shooting angle of each image may differ, we encode the viewpoint of the vehicle image as an additional attribute. Moreover, different attributes should carry different importance. Based on this, we design a multi-attribute adaptive aggregation network that compares different attributes and assigns different weights to the corresponding features. Finally, to optimize the proposed transformer network, we design a multi-sample dispersion triplet (MDT) loss, which considers not only the hardest samples from a hard-mining strategy but also extra positive and negative samples. The dispersion of these multiple samples dynamically adjusts the loss, guiding the network to learn a better-optimized division of the feature space. Extensive experiments on popular vehicle re-identification datasets verify that the proposed method achieves state-of-the-art performance.
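The MDT loss extends hard mining with extra samples and a dispersion term. As background, the standard batch-hard triplet loss it builds on can be sketched as follows; this is the generic baseline (Hermans et al. style), not the paper's MDT implementation.

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, take its hardest
    (farthest) positive and hardest (closest) negative in the batch.

    feats : (N, F) embeddings, labels : (N,) identity labels.
    """
    # pairwise Euclidean distance matrix
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(labels)):
        pos = d[i][same[i] & (np.arange(len(labels)) != i)]
        neg = d[i][~same[i]]
        losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses))

# two well-separated identities: the margin is satisfied, so the loss is 0
feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
labels = np.array([0, 0, 1, 1])
print(batch_hard_triplet_loss(feats, labels))  # 0.0
```

MDT, as described above, would additionally sample extra positives and negatives and use their dispersion to reweight this objective.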

19 citations


Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper proposes a part perspective transformation (PPT) module to map the different parts of a vehicle into a unified perspective, together with a dynamic batch-hard triplet loss that selects the common visible regions of the compared vehicles.
Abstract: Given a query image, vehicle re-identification is to search for the same vehicle in multi-camera scenarios, a task that has attracted much attention in recent years. However, vehicle ReID severely suffers from the perspective-variation problem: for different vehicles with similar color and type taken from different perspectives, all visual patterns are misaligned and warped, making it hard for the model to find the exact discriminative regions. In this paper, we propose a part perspective transformation (PPT) module to map the different parts of a vehicle into a unified perspective. PPT disentangles the vehicle features of different perspectives and then aligns them at a fine-grained level. Further, we propose a dynamic batch-hard triplet loss to select the common visible regions of the compared vehicles. Our approach helps the model generate perspective-invariant features and find the exact distinguishable regions for vehicle ReID. Extensive experiments on three standard vehicle ReID datasets show the effectiveness of our method.
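Mapping a part into a unified perspective is, at bottom, a projective (homography) warp. The sketch below shows that basic operation on 2D points; the learned PPT module itself is not reproduced here, and the toy homography is an assumption for illustration.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to 2D points, the elementary operation
    behind mapping a vehicle part into a unified (fronto-parallel) view.

    pts : (N, 2) -> warped (N, 2)
    """
    ones = np.ones((pts.shape[0], 1))
    hom = np.hstack([pts, ones]) @ H.T      # lift to homogeneous coordinates
    return hom[:, :2] / hom[:, 2:3]         # project back to Cartesian

# toy homography: a pure translation by (2, 3)
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 1.0]])
warped = warp_points(H, pts)  # maps (0,0)->(2,3) and (1,1)->(3,4)
```

A real PPT-style module would predict such transform parameters per part and warp feature maps rather than bare points.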

11 citations


Cites background from "A Structured Graph Attention Network for Vehicle Re-Identification"

  • ...It has attracted much attention recently as it plays an important role in the field of intelligent transportation systems and smart cities [3, 4, 13, 14, 16, 17, 19, 20, 41]....

References
Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Posted Content
TL;DR: A scalable approach for semi-supervised learning on graph-structured data, based on an efficient variant of convolutional neural networks that operates directly on graphs and outperforms related methods by a significant margin.
Abstract: We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
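The layer-wise propagation rule of this model, commonly written H' = σ(D^{-1/2}(A+I)D^{-1/2} H W), can be sketched directly; shapes and the toy graph below are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step of the GCN rule (Kipf & Welling):
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

    A : (N, N) adjacency, H : (N, F) node features, W : (F, F') weights.
    """
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)

# two connected nodes with one-hot features: each output row averages
# the node's own features with its neighbor's
A = np.array([[0, 1], [1, 0]], float)
H = np.eye(2)
out = gcn_layer(A, H, np.eye(2))
```

Stacking such layers mixes information over increasingly large neighborhoods, which is what makes the model linear in the number of edges.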

15,696 citations


"A Structured Graph Attention Networ..." refers background in this paper

  • ...Recently, Graph Neural Network [2, 3, 7, 12] has drawn increasing attention in machine learning and multimedia community because it can generalize neural networks for structured data and propagate messages between different nodes in a topology structured graph....

  • ...Graph Neural Network....

28 Oct 2017
TL;DR: The automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models, is described; it focuses on differentiation of purely imperative programs, with an emphasis on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
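The define-by-run idea, recording the computation graph while ordinary imperative code executes, can be illustrated with a minimal scalar reverse-mode autodiff class. This is a toy analogue of the concept, not PyTorch's actual implementation; all names are hypothetical.

```python
class Var:
    """Minimal define-by-run reverse-mode autodiff on scalars: the graph
    is recorded as side effects of ordinary arithmetic, the same idea
    PyTorch's autograd applies to tensors."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents        # (parent_var, local_gradient) pairs

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        """Propagate gradients back along every recorded path."""
        self.grad += seed
        for parent, local in self._parents:
            parent.backward(seed * local)

x = Var(2.0)
y = Var(3.0)
z = x * y + x          # the graph is built as this line executes
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

Because the graph is rebuilt on every run, control flow (loops, branches) in the host language works transparently, which is the "imperative" advantage the abstract refers to.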

13,268 citations


"A Structured Graph Attention Networ..." refers methods in this paper

  • ...Our proposed method is implemented on the Pytorch [30] platform and trained with four NVIDIA GTX 1080ti GPUs....

Posted Content
TL;DR: In this article, a spectral graph theory formulation of convolutional neural networks (CNNs) was proposed to learn local, stationary, and compositional features on graphs, and the proposed technique offers the same linear computational complexity and constant learning complexity as classical CNNs while being universal to any graph structure.
Abstract: In this work, we are interested in generalizing convolutional neural networks (CNNs) from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, brain connectomes or words' embedding, represented by graphs. We present a formulation of CNNs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs. Importantly, the proposed technique offers the same linear computational complexity and constant learning complexity as classical CNNs, while being universal to any graph structure. Experiments on MNIST and 20NEWS demonstrate the ability of this novel deep learning system to learn local, stationary, and compositional features on graphs.
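The fast localized filters described here are Chebyshev polynomials of the rescaled graph Laplacian, evaluated by recurrence rather than eigendecomposition. A minimal sketch of applying such a filter; the coefficients and toy graph are assumptions.

```python
import numpy as np

def chebyshev_filter(A, X, theta):
    """Apply a Chebyshev spectral filter sum_k theta_k T_k(L~) X, where
    L~ is the graph Laplacian rescaled so its spectrum lies in [-1, 1]
    (Defferrard-style localized filtering).

    A : (N, N) adjacency, X : (N, F) graph signals, theta : K coefficients.
    """
    N = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(N)     # rescale spectrum to [-1, 1]
    T_prev, T_curr = X, L_tilde @ X             # T_0 X and T_1 X
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2.0 * (L_tilde @ T_curr) - T_prev   # Chebyshev recurrence
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out

A = np.array([[0, 1], [1, 0]], float)
X = np.array([[1.0], [0.0]])
filtered = chebyshev_filter(A, X, [0.5, 0.5])
```

A K-term filter only mixes nodes up to K hops apart, which is why the cost stays linear in the number of edges, as the abstract claims.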

4,562 citations

Journal ArticleDOI
TL;DR: This work proposes EdgeConv, a new neural network module suitable for CNN-based high-level tasks on point clouds, including classification and segmentation, which acts on graphs dynamically computed in each layer of the network.
Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNNs to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: it incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks, including ModelNet40, ShapeNetPart, and S3DIS.
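EdgeConv's core step, forming edge features from each point and its neighbors and max-pooling over the neighborhood, can be sketched as follows. For clarity the shared MLP is replaced by the identity map, so this is an illustrative reduction, not the full module.

```python
import numpy as np

def edge_conv(X, k=2):
    """One EdgeConv-style aggregation: for each point, gather its k nearest
    neighbors, build edge features [x_i, x_j - x_i], and max-pool over the
    neighborhood (the learned shared map is omitted here).

    X : (N, F) point features -> (N, 2F) updated features.
    """
    # the graph is recomputed from the current features: "dynamic" graph
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    out = []
    for i in range(X.shape[0]):
        nbrs = np.argsort(d[i])[1:k + 1]            # exclude the point itself
        edges = [np.concatenate([X[i], X[j] - X[i]]) for j in nbrs]
        out.append(np.max(edges, axis=0))           # channel-wise max-pool
    return np.array(out)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Y = edge_conv(X, k=1)   # each row: [own coords, offset to nearest neighbor]
```

Recomputing the k-NN graph in feature space at each layer is what lets deeper layers connect semantically similar but spatially distant points.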

3,727 citations


"A Structured Graph Attention Networ..." refers background in this paper

  • ...It has been adopted successfully in many multimedia tasks, such as image classification [26], visual question answering [29], graph classification [13, 47], object tracking [5], point clouds processing [37], action recognition [36] and person search [44] etc....
