Learning Policies for Adaptive Tracking with Deep Feature Cascades
Chen Huang, Simon Lucey, Deva Ramanan +2 more
- pp 105-114
TLDR
In this paper, the authors formulate the adaptive tracking problem as a decision-making process, and learn an agent to decide whether to locate objects with high confidence on an early layer, or continue processing subsequent layers of a network.
Abstract:
Visual object tracking is a fundamental and time-critical vision task. Recent years have seen many shallow tracking methods based on real-time pixel-based correlation filters, as well as deep methods that have top performance but need a high-end GPU. In this paper, we learn to improve the speed of deep trackers without losing accuracy. Our fundamental insight is to take an adaptive approach, where easy frames are processed with cheap features (such as pixel values), while challenging frames are processed with invariant but expensive deep features. We formulate the adaptive tracking problem as a decision-making process, and learn an agent to decide whether to locate objects with high confidence on an early layer, or continue processing subsequent layers of a network. This significantly reduces the feedforward cost for easy frames with distinct or slow-moving objects. We train the agent offline in a reinforcement learning fashion, and further demonstrate that learning all deep layers (so as to provide good features for adaptive tracking) can lead to near real-time average tracking speed of 23 fps on a single CPU while achieving state-of-the-art performance. Perhaps most tellingly, our approach provides a 100X speedup for almost 50% of the time, indicating the power of an adaptive approach.
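The early-exit idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper learns the stopping decision with a reinforcement learning agent, whereas this sketch substitutes a fixed confidence threshold. The names `track_frame`, `stop_threshold`, and the layer and scorer callables are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Box = Tuple[float, float, float, float]
Features = List[float]
Layer = Callable[[Features], Features]                # one stage of the cascade
Scorer = Callable[[Features], Tuple[Box, float]]      # -> (box, confidence)


def track_frame(
    pixels: Features,
    layers: Sequence[Layer],
    scorers: Sequence[Scorer],
    stop_threshold: float = 0.8,
) -> Tuple[Box, int]:
    """Locate the target, stopping at the first sufficiently confident stage.

    scorers[0] works on raw pixels; scorers[k] works on the output of
    layers[k - 1], so len(scorers) must equal len(layers) + 1.
    Returns the predicted box and the number of layers actually evaluated.
    """
    if len(scorers) != len(layers) + 1:
        raise ValueError("need one scorer per stage, including raw pixels")

    features = pixels
    box, confidence = scorers[0](features)   # cheapest stage: pixel values
    if confidence >= stop_threshold:         # assumed stand-in for the learned agent
        return box, 0

    for depth, layer in enumerate(layers, start=1):
        features = layer(features)           # pay for one more layer
        box, confidence = scorers[depth](features)
        if confidence >= stop_threshold:
            return box, depth

    # No stage was confident enough: fall back to the deepest prediction.
    return box, len(layers)
```

On an easy frame the function returns after the pixel-level scorer and evaluates no layers, which is where the savings in feedforward cost would come from; a hard frame runs the full cascade.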
Citations
Proceedings ArticleDOI
A Twofold Siamese Network for Real-Time Object Tracking
TL;DR: The proposed SA-Siam outperforms all other real-time trackers by a large margin on the OTB-2013/50/100 benchmarks; a channel attention mechanism is also proposed for the semantic branch.
Proceedings ArticleDOI
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
TL;DR: In this article, a Siamese-like tracking pipeline is proposed to exploit the rich temporal contexts among successive frames, which have been largely overlooked in existing trackers. The proposed transformer-assisted tracking framework is neat and trained in an end-to-end manner.
Proceedings ArticleDOI
Unsupervised Deep Tracking
TL;DR: The proposed unsupervised tracker achieves the baseline accuracy of fully supervised trackers, which require complete and accurate labels during training, and exhibits a potential in leveraging unlabeled or weakly labeled data to further improve the tracking accuracy.
Proceedings ArticleDOI
Graph Convolutional Tracking
TL;DR: The GCT jointly incorporates two types of Graph Convolutional Networks into a Siamese framework for target appearance modeling and adopts a spatial-temporal GCN to model the structured representation of historical target exemplars.
Book ChapterDOI
Learning Dynamic Memory Networks for Object Tracking
Tianyu Yang, Antoni B. Chan +1 more
TL;DR: In this paper, a dynamic memory network is proposed to adapt the template to the target's appearance variations during tracking. An LSTM is used as a memory controller: its input is the search feature map, and its outputs are the control signals for the reading and writing process of the memory block.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors achieve state-of-the-art ImageNet classification performance with a deep convolutional neural network, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman +1 more
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Proceedings ArticleDOI
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs +1 more
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal ArticleDOI
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei +11 more
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.