Proceedings ArticleDOI

See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-Based Person Re-identification

TLDR
This paper focuses on video-based person re-identification and builds an end-to-end deep neural network architecture that jointly learns features and metrics, integrating the surrounding information at each location with a spatial recurrent model when measuring the similarity with another pedestrian video.
Abstract: 
Surveillance cameras have been widely deployed in different scenes. Accordingly, there is a pressing need to recognize a person across different cameras, a task called person re-identification. This topic has gained increasing interest in computer vision recently. However, less attention has been paid to video-based approaches than to image-based ones. Previous approaches usually involve two steps, namely feature learning and metric learning, but most existing methods focus on only one of the two. Meanwhile, many of them do not make full use of the temporal and spatial information. In this paper, we concentrate on video-based person re-identification and build an end-to-end deep neural network architecture to jointly learn features and metrics. The proposed method automatically picks out the most discriminative frames in a given video with a temporal attention model. Moreover, it integrates the surrounding information at each location with a spatial recurrent model when measuring the similarity with another pedestrian video. That is, our method handles spatial and temporal information simultaneously in a unified manner. Carefully designed experiments on three public datasets show the effectiveness of each component of the proposed deep network, which performs favorably against state-of-the-art methods.
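The abstract's key computational idea is temporal attention pooling: frame-level features are weighted by learned attention scores before the video-level comparison. Below is a minimal, hedged sketch of such a pooling layer in PyTorch; the layer sizes, module names, and the simple linear scoring head are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionPool(nn.Module):
    """Toy temporal attention pooling (illustrative only, not the paper's exact model).

    Given per-frame feature vectors, learn a scalar score per frame, normalize the
    scores with softmax, and return the weighted average, so the most discriminative
    frames dominate the video-level feature.
    """
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # assumed: a simple linear scoring head

    def forward(self, frame_feats):
        # frame_feats: (num_frames, feat_dim)
        weights = F.softmax(self.score(frame_feats), dim=0)  # (num_frames, 1)
        return (weights * frame_feats).sum(dim=0)            # (feat_dim,)

# Usage: compare two pedestrian videos by the distance between pooled features.
pool = TemporalAttentionPool(feat_dim=128)
video_a = torch.randn(16, 128)   # 16 frames of hypothetical CNN features
video_b = torch.randn(20, 128)
dist = torch.norm(pool(video_a) - pool(video_b))
```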


Citations
Posted Content

Deep Learning for Person Re-identification: A Survey and Outlook

TL;DR: A powerful AGW baseline is designed, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks, and a new evaluation metric (mINP) is introduced, indicating the cost of finding all the correct matches and providing an additional criterion for evaluating a Re-ID system in real applications.
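To make the mINP metric concrete, here is a small, hedged sketch of how it is commonly computed: for each query, take the rank of the hardest (last-retrieved) correct match, divide the number of correct matches by that rank, and average over queries. The function name and inputs are illustrative; consult the survey for the exact definition.

```python
import numpy as np

def mean_inp(ranked_match_flags):
    """Compute mINP from per-query ranked gallery match indicators.

    ranked_match_flags: list of 1-D arrays; entry j is 1 if the gallery item
    at rank j+1 is a correct match for that query, else 0.
    INP for one query = (#correct matches) / (rank of the hardest correct match).
    """
    inps = []
    for flags in ranked_match_flags:
        positives = np.flatnonzero(flags)
        if positives.size == 0:
            continue  # assumed: queries with no gallery match are skipped
        hardest_rank = positives[-1] + 1          # 1-based rank of the last correct match
        inps.append(positives.size / hardest_rank)
    return float(np.mean(inps))

# Example: 3 correct matches, the hardest one found at rank 5 -> INP = 3/5 = 0.6
print(mean_inp([np.array([1, 0, 1, 0, 1, 0])]))
```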
Proceedings ArticleDOI

Mask-Guided Contrastive Attention Model for Person Re-identification

TL;DR: This paper introduces binary segmentation masks to construct synthetic RGB-Mask pairs as inputs, then designs a mask-guided contrastive attention model (MGCAM) to learn features separately from the body and background regions, and proposes a novel region-level triplet loss to constrain the features learned from different regions.
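As a rough illustration of a region-level triplet loss of this kind, the sketch below treats the full-image feature as the anchor, the body-region feature as the positive, and the background-region feature as the negative; this pairing and the margin value are assumptions for illustration, not necessarily MGCAM's exact formulation.

```python
import torch
import torch.nn.functional as F

def region_triplet_loss(full_feat, body_feat, background_feat, margin=0.3):
    """Hinge loss pulling the full-image feature toward the body-region feature
    and pushing it away from the background-region feature (illustrative sketch)."""
    d_pos = F.pairwise_distance(full_feat, body_feat)        # (batch,)
    d_neg = F.pairwise_distance(full_feat, background_feat)  # (batch,)
    return F.relu(d_pos - d_neg + margin).mean()

# Usage with hypothetical 256-D features for a batch of 8 images
loss = region_triplet_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
```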
Proceedings ArticleDOI

A Pose-Sensitive Embedding for Person Re-identification with Expanded Cross Neighborhood Re-ranking

TL;DR: In this paper, fine and coarse pose information of the person is incorporated into a CNN to learn a discriminative embedding, achieving state-of-the-art performance on a number of challenging surveillance image and video datasets.
Posted Content

AlignedReID: Surpassing Human-Level Performance in Person Re-Identification.

TL;DR: This paper proposes a novel method called AlignedReID that extracts a global feature which is jointly learned with local features, and is the first to surpass human-level performance on Market1501 and CUHK03, two widely used Person ReID datasets.
Book ChapterDOI

Part-Aligned Bilinear Representations for Person Re-Identification

TL;DR: A novel network that learns a part-aligned representation for person re-identification, addressing the body part misalignment problem, i.e., body parts being misaligned across human detections due to pose/viewpoint changes and unreliable detection.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, which achieved state-of-the-art performance on the ImageNet classification benchmark.
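For orientation, the sketch below spells out that layer pattern (five convolutional layers with interleaved max pooling, then three fully connected layers ending in a 1000-way classifier) in PyTorch; the channel counts and kernel sizes follow the commonly cited AlexNet configuration and should be treated as approximate.

```python
import torch.nn as nn

# Illustrative AlexNet-style network: 5 conv layers (some followed by max pooling)
# and 3 fully connected layers ending in a 1000-way classifier.
# Expects 3x227x227 inputs so the flattened feature map is 256*6*6.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # class scores; the softmax is applied in the loss
)
```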
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Journal ArticleDOI

Object Detection with Discriminatively Trained Part-Based Models

TL;DR: Describes an object detection system based on mixtures of multiscale deformable part models, which can represent highly variable object classes and achieves state-of-the-art results on the PASCAL object detection challenges.
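For reference, the standard scoring function used in this line of deformable part models places a root filter and n part filters and penalizes part displacements; the notation below follows the usual presentation and is included only as a reminder of the model form.

```latex
% Score of a placement p_0 (root) and p_1..p_n (parts) in a feature pyramid H:
% filter responses minus quadratic deformation costs, plus a bias term.
\[
  \mathrm{score}(p_0,\dots,p_n)
    = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
    - \sum_{i=1}^{n} d_i \cdot \bigl(dx_i,\; dy_i,\; dx_i^2,\; dy_i^2\bigr)
    + b
\]
```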
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
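As a small usage illustration of the deployment side described here, the snippet below loads a trained model through Caffe's Python bindings and runs a forward pass; the file paths and the 'data' blob name are placeholders.

```python
import numpy as np
import caffe  # Caffe's Python bindings (pycaffe)

caffe.set_mode_cpu()
# Placeholder paths: a network definition and trained weights must exist on disk.
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Feed a dummy input with the shape the 'data' blob expects, then run inference.
net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
output = net.forward()
print({name: blob.shape for name, blob in output.items()})
```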
Proceedings ArticleDOI

FaceNet: A unified embedding for face recognition and clustering

TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, achieving state-of-the-art face recognition performance using only 128 bytes per face.
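The core idea, that distances in the learned embedding space directly encode identity similarity, can be illustrated with a hedged triplet-loss sketch; the embedding dimensionality, margin, and random inputs below are stand-ins, not FaceNet's actual configuration.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Encourage d(anchor, positive) + margin < d(anchor, negative)
    in the embedding space (illustrative sketch of a FaceNet-style objective)."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

# Hypothetical L2-normalized 128-D embeddings for a batch of 32 faces
embed = lambda x: F.normalize(x, dim=1)
a, p, n = (embed(torch.randn(32, 128)) for _ in range(3))
loss = triplet_loss(a, p, n)
# At test time, identity is decided by thresholding the Euclidean distance
# between two embeddings.
```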