SANet: Structure-Aware Network for Visual Tracking
Heng Fan, Haibin Ling
- pp 2217-2224
TLDR
SANet uses a recurrent neural network (RNN) to model object structure and incorporates it into a CNN to improve robustness to similar distractors; because convolutional layers at different levels characterize the object from different perspectives, it models structure at multiple levels.
Abstract:
Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking owing to their power in feature extraction. Most existing CNN-based trackers treat tracking as a classification problem. However, these trackers are sensitive to similar distractors because their CNN models mainly focus on inter-class classification. To address this problem, we use the self-structure information of the object to distinguish it from distractors. Specifically, we use recurrent neural networks (RNNs) to model object structure and incorporate them into the CNN to improve its robustness to similar distractors. Considering that convolutional layers at different levels characterize the object from different perspectives, we use multiple RNNs, each modeling object structure at a different level. Extensive experiments on three benchmarks, OTB100, TC-128 and VOT2015, show that the proposed algorithm outperforms other methods. Code is released at www.dabi.temple.edu/hbling/code/SANet/SANet.html.
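The idea in the abstract, one RNN per convolutional level whose output is combined with that level's CNN features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' released implementation: the function names (spatial_rnn, fuse, structure_aware_features), the single top-left to bottom-right sweep, the tanh activation, the random untrained weights, and channel-wise concatenation as the way of incorporating RNN states into the CNN features.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real tracker would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def spatial_rnn(fmap, w_in, w_rec, bias):
    """Sweep an H x W x C feature map from top-left to bottom-right.

    Each hidden state depends on the local feature and on the hidden states
    of the left and upper neighbours, so it summarizes spatial context.
    (Assumed sweep order; chosen only to keep the sketch short.)
    """
    height, width = len(fmap), len(fmap[0])
    zeros = [0.0] * len(bias)
    hidden = [[zeros] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            left = hidden[i][j - 1] if j > 0 else zeros
            up = hidden[i - 1][j] if i > 0 else zeros
            context = [l + u for l, u in zip(left, up)]
            pre = [x + r + b for x, r, b in
                   zip(matvec(w_in, fmap[i][j]), matvec(w_rec, context), bias)]
            hidden[i][j] = [math.tanh(p) for p in pre]
    return hidden


def fuse(fmap, hidden):
    """Concatenate CNN features and RNN states at every spatial location."""
    return [[f + h for f, h in zip(frow, hrow)] for frow, hrow in zip(fmap, hidden)]


def structure_aware_features(levels, hidden_dim=4, seed=0):
    """Run a separate spatial RNN on each level and fuse it with that level."""
    rng = random.Random(seed)
    fused = []
    for fmap in levels:
        channels = len(fmap[0][0])
        w_in = rand_matrix(hidden_dim, channels, rng)
        w_rec = rand_matrix(hidden_dim, hidden_dim, rng)
        bias = [0.0] * hidden_dim
        fused.append(fuse(fmap, spatial_rnn(fmap, w_in, w_rec, bias)))
    return fused


# Toy stand-ins for a shallow (6x6, 3 channels) and a deep (3x3, 5 channels) layer.
demo_rng = random.Random(1)
shallow = [[[demo_rng.random() for _ in range(3)] for _ in range(6)] for _ in range(6)]
deep = [[[demo_rng.random() for _ in range(5)] for _ in range(3)] for _ in range(3)]
fused_levels = structure_aware_features([shallow, deep])
```

Each fused map keeps the original CNN channels and appends hidden_dim structure channels per location, so a downstream classifier would see both appearance and spatial context at every level.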
Citations
Journal Article
Deep visual tracking: Review and experimental comparison
TL;DR: The background of deep visual tracking is introduced, including the fundamental concepts of visual tracking and related deep learning algorithms, and the existing deep-learning-based trackers are categorized into three classes according to network structure, network function and network training.
Proceedings Article
Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking
Heng Fan, Haibin Ling
TL;DR: C-RPN is a multi-stage tracking framework that consists of a sequence of RPNs cascaded from deep high-level to shallow low-level layers in a Siamese network.
Proceedings Article
Graph Convolutional Tracking
TL;DR: The GCT jointly incorporates two types of Graph Convolutional Networks into a Siamese framework for target appearance modeling and adopts a spatial-temporal GCN to model the structured representation of historical target exemplars.
Proceedings Article
Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking
Heng Fan, Haibin Ling
TL;DR: This paper proposes a parallel tracking and verifying (PTAV) framework, which consists of two components, a tracker T and a verifier V, working in parallel on two separate threads.
Book Chapter
Real-Time MDNet
TL;DR: This work presents a fast and accurate visual tracking algorithm based on the multi-domain convolutional neural network (MDNet) that accelerates the feature extraction procedure and learns more discriminative models for instance classification; it enhances the representation quality of target and background by maintaining a high-resolution feature map with a large receptive field per activation.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art classification performance.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings Article
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Journal Article
Finding Structure in Time
TL;DR: Develops a proposal first described by Jordan (1986) that uses recurrent links to provide networks with a dynamic memory, and suggests a method for representing lexical categories and the type/token distinction.