Person Search with Natural Language Description

doi:10.1109/CVPR.2017.551

Open Access, Proceedings Article

Shuang Li, +5 more

- pp 5187-5196

TLDR

The authors propose a Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) for searching persons in large-scale image databases using natural language descriptions as queries.

Abstract:

Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance. Existing methods mainly focus on searching persons with image-based or attribute-based queries, which have major limitations for practical use. In this paper, we study the problem of person search with natural language description. Given the textual description of a person, the person search algorithm is required to rank all the samples in the person database and then retrieve the most relevant sample corresponding to the queried description. Since there is no person dataset or benchmark with textual description available, we collect a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed the CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible models and baselines have been evaluated and compared on the person search benchmark. A Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to establish the state-of-the-art performance on person search.
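The retrieval task described in the abstract (score every gallery sample against a text query, rank them, and return the best match) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the embedding vectors, the image identifiers, and the use of cosine similarity as the affinity are hypothetical stand-ins, not the GNA-RNN model or the CUHK-PEDES data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (hypothetical affinity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_gallery(query_vec, gallery):
    """Return gallery image ids sorted from most to least similar to the query.

    `gallery` maps an image id to its (hypothetical) embedding vector.
    """
    scored = [(cosine(query_vec, vec), image_id) for image_id, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


def top_k_hit(ranking, relevant_ids, k):
    """True if any relevant image appears among the first k ranked results."""
    return any(image_id in relevant_ids for image_id in ranking[:k])


# Toy data: made-up 2-D embeddings for three gallery images and one text query.
gallery = {"img_a": [1.0, 0.0], "img_b": [0.6, 0.8], "img_c": [0.0, 1.0]}
query = [0.9, 0.1]

ranking = rank_gallery(query, gallery)
print(ranking)
print(top_k_hit(ranking, {"img_b"}, k=1), top_k_hit(ranking, {"img_b"}, k=2))
```

A top-k hit of this kind is one common way to evaluate such rankings; a learned model would replace the cosine function with its own query-image affinity.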

Citations

Journal Article

Multi-level Alignment Network for Domain Adaptive Cross-modal Retrieval

Jianfeng Dong, +7 more

- 14 Jun 2021 -

Neurocomputing

TL;DR: The proposed model, MAN, consistently outperforms multiple baselines, showing superior generalization ability on target data, and establishes a new state of the art for large-scale text-to-video retrieval on the TRECVID 2017 and 2018 Ad-hoc Video Search benchmarks.

Posted Content

Weakly Supervised Person Re-ID: Differentiable Graphical Learning and A New Benchmark

Guangrun Wang, +5 more

- 08 Apr 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work eases the annotation of Re-ID by replacing accurate annotation with inaccurate annotation, i.e., the authors group the images into bags in terms of time and assign a bag-level label to each bag, which greatly reduces the annotation effort and leads to the creation of a large-scale Re-ID benchmark called SYSU-30k.

Proceedings Article

Review of Recent Deep Learning Based Methods for Image-Text Retrieval

Jianan Chen, +3 more

TL;DR: This paper highlights key points of recent deep-learning-based cross-modal retrieval approaches, especially in the image-text retrieval context, and classifies them into four categories according to their embedding methods.

Journal Article

Person Re-Identification With Deep Kronecker-Product Matching and Group-Shuffling Random Walk

Yantao Shen, +5 more

- 01 May 2021 -

IEEE Transactions on Pattern Analysis an...

TL;DR: A unified end-to-end deep learning framework is proposed to tackle person re-identification and handle viewpoint and pose variations between compared person images, together with a novel Kronecker Product Matching operation that matches and warps feature maps of different persons.

Proceedings Article

Category-Specific CNN for Visual-aware CTR Prediction at JD.com

Hu Liu, +10 more

- 18 Jun 2020 -

arXiv: Learning

TL;DR: The authors propose a Category-Specific CNN (CSCNN) for click-through-rate prediction in e-commerce, which incorporates category knowledge through a lightweight attention module on each convolutional layer.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: State-of-the-art performance is achieved by a deep convolutional neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset is presented with the goal of advancing the state of the art in object recognition by placing it in the context of the broader question of scene understanding; the dataset gathers images of complex everyday scenes containing common objects in their natural context.

Proceedings Article

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu, +10 more

TL;DR: An attention-based model that automatically learns to describe the content of images is introduced; it can be trained deterministically using standard backpropagation techniques or stochastically by maximizing a variational lower bound.

Related Papers (5)

Deep Residual Learning for Image Recognition

Show and tell: A neural image caption generator

Person re-identification by Local Maximal Occurrence representation and metric learning

DeepReID: Deep Filter Pairing Neural Network for Person Re-identification

Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)