Proceedings Article
Hedged Deep Tracking
Yuankai Qi, Shengping Zhang, Lei Qin, Hongxun Yao, Qingming Huang, Jongwoo Lim, Ming-Hsuan Yang
- pp 4303-4311
TLDR
A novel CNN-based tracking framework is proposed, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN-based trackers into a single stronger one.
Abstract:
In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain CNN layer characterize an object of interest from only one aspect or one level, the performance of such trackers trained with features from one layer (usually the second to last layer) can be further improved. In this paper, we propose a novel CNN-based tracking framework, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN-based trackers into a single stronger one. Extensive experiments on a benchmark dataset of 100 challenging image sequences demonstrate the effectiveness of the proposed algorithm compared to several state-of-the-art trackers.
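The idea of hedging several trackers into one can be illustrated with the standard Hedge (multiplicative weights) update. The sketch below is not the paper's adaptive Hedge method, whose details the abstract does not give. It assumes each CNN layer yields one "expert" tracker that predicts a target position and receives a loss per frame. The function names, the learning rate `eta`, and the positions and losses are all hypothetical values chosen for illustration.

```python
import math


def hedge_update(weights, losses, eta=1.0):
    """One round of standard Hedge: scale each expert's weight by
    exp(-eta * loss), then renormalize so the weights sum to 1."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def combine_positions(weights, positions):
    """Fuse the experts' (x, y) predictions as a weighted average."""
    x = sum(w * p[0] for w, p in zip(weights, positions))
    y = sum(w * p[1] for w, p in zip(weights, positions))
    return (x, y)


# Three hypothetical experts, e.g. trackers built on three different CNN layers.
weights = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical per-expert position predictions for the current frame.
positions = [(100.0, 50.0), (104.0, 52.0), (98.0, 47.0)]
fused = combine_positions(weights, positions)

# Hypothetical per-expert losses for this frame (lower is better).
losses = [0.1, 0.5, 0.9]
weights = hedge_update(weights, losses)
```

After the update, experts that incurred a lower loss carry more weight in the next frame's fused prediction. An adaptive variant such as the one the abstract names would adjust this update over time, which this sketch does not attempt.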