Open Access · Proceedings Article

Person Search with Natural Language Description

TLDR
The authors propose a recurrent neural network with a gated neural attention mechanism (GNA-RNN) for person search in large-scale image databases, using natural language descriptions as queries.
Abstract
Searching for persons in large-scale image databases using natural language descriptions as queries has important applications in video surveillance. Existing methods mainly focus on searching for persons with image-based or attribute-based queries, which have major limitations in practical use. In this paper, we study the problem of person search with natural language description. Given the textual description of a person, the person search algorithm is required to rank all the samples in the person database and then retrieve the most relevant sample corresponding to the queried description. Since no person dataset or benchmark with textual descriptions is available, we collect a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed the CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible models and baselines are evaluated and compared on the person search benchmark. A Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to establish state-of-the-art performance on person search.
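The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the query description and every gallery image have already been mapped to feature vectors, and it uses a plain dot product as a stand-in affinity score. The names `rank_gallery`, `dot`, and `hit_at_k`, as well as the toy vectors, are hypothetical and do not reproduce the GNA-RNN model.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Stand-in affinity (assumption): dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(
    query: Vector,
    gallery: Dict[str, Vector],
    affinity: Callable[[Vector, Vector], float] = dot,
) -> List[str]:
    """Score every gallery sample against the query and return the
    sample ids sorted from most to least relevant."""
    scores = {sample_id: affinity(query, feat) for sample_id, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


def hit_at_k(ranking: List[str], target_id: str, k: int) -> bool:
    """True if the correct sample appears among the top-k retrieved ids."""
    return target_id in ranking[:k]


# Toy data (assumption): three gallery samples with 2-D features.
gallery = {
    "img_a": [1.0, 0.0],
    "img_b": [0.0, 1.0],
    "img_c": [0.7, 0.7],
}
query = [0.9, 0.1]  # hypothetical encoding of a text description

ranking = rank_gallery(query, gallery)
print(ranking)
print(hit_at_k(ranking, "img_a", k=1))
```

In a real system the dot product would be replaced by a learned text-image affinity, and `hit_at_k` would be averaged over all queries to give a top-k retrieval accuracy.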

