Person Search with Natural Language Description
Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang
- pp 5187-5196
TLDR
The authors propose a Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) for person search in large-scale image databases, using natural language descriptions as queries.
Abstract:
Searching for persons in large-scale image databases using natural language descriptions as queries has important applications in video surveillance. Existing methods mainly focus on searching for persons with image-based or attribute-based queries, which have major limitations in practical use. In this paper, we study the problem of person search with natural language description. Given the textual description of a person, the person search algorithm is required to rank all the samples in the person database and then retrieve the most relevant sample corresponding to the queried description. Since there is no person dataset or benchmark with textual descriptions available, we collect a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed the CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible models and baselines have been evaluated and compared on the person search benchmark. A Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to establish state-of-the-art performance on person search.
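The retrieval task described in the abstract (score every sample in the person database against the query description, rank them, and return the most relevant ones) can be sketched as follows. This is a minimal illustration, not the GNA-RNN model: the abstract does not specify how relevance is computed, so cosine similarity between precomputed feature vectors is used here purely as a stand-in assumption. The names `cosine`, `rank_gallery`, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_gallery(query_vec, gallery, top_k=None):
    """Rank all gallery samples by relevance to the query, most relevant first.

    query_vec: feature vector of the text description (assumed precomputed).
    gallery:   dict mapping a sample id to its image feature vector (assumed precomputed).
    top_k:     if given, return only the top_k best-scoring samples.
    """
    # Stand-in relevance score; the paper's model would supply its own.
    scored = [(sample_id, cosine(query_vec, vec)) for sample_id, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    # Toy two-dimensional features, made up for illustration only.
    gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for sample_id, score in rank_gallery([1.0, 0.1], gallery):
        print(sample_id, round(score, 3))
```

In a real system the score would come from a learned model that compares the description with each image, and the ranking step would stay the same.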
Citations
Journal Article
Multi-level Alignment Network for Domain Adaptive Cross-modal Retrieval
Jianfeng Dong, Zhongzi Long, Xiaofeng Mao, Changting Lin, Yuan He, Shouling Ji
TL;DR: The proposed model, MAN, consistently outperforms multiple baselines, showing superior generalization ability on target data, and establishes a new state of the art for large-scale text-to-video retrieval on the TRECVID 2017 and 2018 Ad-hoc Video Search benchmarks.
Posted Content
Weakly Supervised Person Re-ID: Differentiable Graphical Learning and A New Benchmark
TL;DR: This work eases the annotation of Re-ID by replacing accurate annotation with inaccurate annotation: the authors group the images into bags by time and assign a bag-level label to each bag, which greatly reduces the annotation effort and leads to the creation of a large-scale Re-ID benchmark called SYSU-30k.
Proceedings Article
Review of Recent Deep Learning Based Methods for Image-Text Retrieval
TL;DR: This paper highlights key points of recent deep-learning-based cross-modal retrieval approaches, especially in the image-text retrieval context, and classifies them into four categories according to their embedding methods.
Journal Article
Person Re-Identification With Deep Kronecker-Product Matching and Group-Shuffling Random Walk
TL;DR: A unified end-to-end deep learning framework is proposed to tackle person re-identification and handle viewpoint and pose variations between compared person images, together with a novel Kronecker Product Matching operation that matches and warps feature maps of different persons.
Proceedings Article
Category-Specific CNN for Visual-aware CTR Prediction at JD.com
Hu Liu, Jing Lu, Hao Yang, Xiwei Zhao, Sulong Xu, Hao Peng, Zehua Zhang, Wenjie Niu, Xiaokun Zhu, Yongjun Bao, Weipeng Yan
TL;DR: The authors propose a category-specific CNN (CSCNN) for click-through-rate prediction in e-commerce, which incorporates category knowledge through a lightweight attention module on each convolutional layer.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network achieves state-of-the-art ImageNet classification performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Book Chapter
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the broader context of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings Article
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, Yoshua Bengio
TL;DR: An attention-based model that automatically learns to describe the content of images is introduced; it can be trained in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound.