Open Access Proceedings Article

SANet: Structure-Aware Network for Visual Tracking

Heng Fan, +1 more
pp. 2217-2224
TL;DR
SANet, as described in this paper, utilizes a recurrent neural network (RNN) to model object structure and incorporates it into a CNN to improve robustness to similar distractors, exploiting the fact that convolutional layers at different levels characterize the object from different perspectives.
Abstract
Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking owing to their power in feature extraction. Most existing CNN-based trackers treat tracking as a classification problem. However, these trackers are sensitive to similar distractors because their CNN models mainly focus on inter-class classification. To address this problem, we use the self-structure information of the object to distinguish it from distractors. Specifically, we utilize a recurrent neural network (RNN) to model object structure and incorporate it into a CNN to improve its robustness to similar distractors. Considering that convolutional layers at different levels characterize the object from different perspectives, we use multiple RNNs to model object structure at the different levels respectively. Extensive experiments on three benchmarks, OTB100, TC-128 and VOT2015, show that the proposed algorithm outperforms other methods. Code is released at www.dabi.temple.edu/hbling/code/SANet/SANet.html.
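To make the multi-level design above concrete, here is a minimal PyTorch sketch of the general idea: an RNN summarizes the structure of the feature map at each convolutional level, and its output is fused with the pooled CNN features before classification. The GRU cells, the row-major spatial scan, mean pooling, concatenation-based fusion, and the `StructureAwareHead` name are assumptions made for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class StructureAwareHead(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        # one RNN per convolutional level; each reads its feature map as a
        # sequence of spatial positions (a simple row-major scan is assumed)
        self.rnns = nn.ModuleList([nn.GRU(c, hidden, batch_first=True) for c in channels])
        self.classifier = nn.Linear(sum(channels) + hidden * len(channels), 2)

    def forward(self, feature_maps):
        fused = []
        for fmap, rnn in zip(feature_maps, self.rnns):
            seq = fmap.flatten(2).transpose(1, 2)        # (B, H*W, C) spatial scan
            _, last = rnn(seq)                           # per-level structure summary
            fused.append(fmap.mean(dim=(2, 3)))          # pooled CNN appearance feature
            fused.append(last.squeeze(0))                # RNN structure feature
        return self.classifier(torch.cat(fused, dim=1))  # target vs. background scores

# Usage with two hypothetical feature levels:
# head = StructureAwareHead(channels=[256, 512])
# scores = head([torch.randn(1, 256, 14, 14), torch.randn(1, 512, 7, 7)])
```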


Citations
Journal Article

Deep visual tracking: Review and experimental comparison

TL;DR: The background of deep visual tracking is introduced, including the fundamental concepts of visual tracking and related deep learning algorithms, and the existing deep-learning-based trackers are categorized into three classes according to network structure, network function and network training.
Proceedings Article

Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking

TL;DR: C-RPN as discussed by the authors proposes a multi-stage tracking framework, which consists of a sequence of RPNs cascaded from deep high-level to shallow low-level layers in a Siamese network.
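As a rough illustration of the cascade described above, the sketch below filters and refines proposals stage by stage, with the stages ordered from deep, high-level features to shallow, low-level ones. The `score`, `refine`, and `threshold` members are hypothetical stand-ins for the real RPN components, not C-RPN's actual interface.

```python
def cascaded_rpn(anchors, stages):
    """Filter and refine anchors stage by stage; `stages` is assumed to be
    ordered from deep, high-level features to shallow, low-level ones."""
    proposals = anchors
    for rpn in stages:
        scored = [(rpn.score(box), rpn.refine(box)) for box in proposals]
        proposals = [box for s, box in scored if s > rpn.threshold]  # drop easy negatives early
    return proposals
```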
Proceedings Article

Graph Convolutional Tracking

TL;DR: The GCT jointly incorporates two types of Graph Convolutional Networks into a Siamese framework for target appearance modeling and adopts a spatial-temporal GCN to model the structured representation of historical target exemplars.
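For readers unfamiliar with graph convolutions, the snippet below shows a single generic graph-convolution step (neighbor averaging followed by a learned projection). How GCT builds its spatial-temporal graphs over target exemplars is not reproduced here, and the sizes in the example are arbitrary.

```python
import numpy as np

def gcn_layer(adj, feats, weights):
    """One graph-convolution step: average each node's neighbors' features,
    then apply a learned linear projection and a nonlinearity."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.tanh((adj / deg) @ feats @ weights)

# Example: 4 exemplar parts, 8-d features, projected to 16-d
# h = gcn_layer(np.ones((4, 4)), np.random.randn(4, 8), np.random.randn(8, 16))
```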
Proceedings Article

Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking

TL;DR: A parallel tracking and verifying (PTAV) framework is proposed, consisting of two components, a tracker T and a verifier V, working in parallel on two separate threads.
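A hedged sketch of the two-thread idea: a fast tracker processes every frame on the main thread while a verifier checks sampled results on a worker thread and issues corrections when it disagrees. The queue-based hand-off, the verify-every-N-frames policy, and the `track`/`accept`/`correct` interfaces are illustrative assumptions, not the paper's exact scheduling.

```python
import threading
import queue

def run_ptav(frames, tracker, verifier, verify_every=10):
    pending = queue.Queue()
    results = {}

    def verify_worker():
        while True:
            item = pending.get()
            if item is None:                      # shutdown signal
                break
            idx, frame, box = item
            if not verifier.accept(frame, box):   # verifier disagrees with the tracker
                results[idx] = verifier.correct(frame, box)

    worker = threading.Thread(target=verify_worker)
    worker.start()
    for idx, frame in enumerate(frames):
        results[idx] = tracker.track(frame)       # fast path: every frame
        if idx % verify_every == 0:
            pending.put((idx, frame, results[idx]))  # slow path: sampled frames
    pending.put(None)
    worker.join()
    return results
```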
Book Chapter

Real-Time MDNet

TL;DR: This work presents a fast and accurate visual tracking algorithm based on the multi-domain convolutional neural network (MDNet). It accelerates the feature extraction procedure and learns more discriminative models for instance classification, and it enhances the representation quality of the target and background by maintaining a high-resolution feature map with a large receptive field per activation.
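A dilated (atrous) convolution is one standard way to obtain the property mentioned above, a high-resolution feature map whose activations still have a large receptive field. The snippet below only demonstrates that property and makes no claim about the paper's exact layer configuration.

```python
import torch
import torch.nn as nn

# stride 1 with dilation: no downsampling, but a wider receptive field per activation
dilated = nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2)
x = torch.randn(1, 256, 28, 28)
print(dilated(x).shape)  # torch.Size([1, 256, 28, 28]) -- spatial resolution preserved
```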
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieves state-of-the-art performance on ImageNet classification.
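The layer stack below matches that description (five convolutional layers, some followed by max-pooling, then three fully-connected layers ending in a 1000-way softmax). The channel widths and kernel sizes follow the commonly used AlexNet variant and should be read as assumptions rather than figures quoted from the paper; it expects 3x224x224 inputs.

```python
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),   # last feature map is 256x6x6 for 224x224 inputs
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                     # 1000-way class scores
    nn.Softmax(dim=1),                         # final softmax over classes
)
```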
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
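The configurations studied there are built by stacking small 3x3 convolutions between pooling stages, so depth can be increased simply by adding convolutions per block. The helper below sketches that pattern; the block counts shown for a 16-layer-style variant are illustrative, not the exact configuration tables from the paper.

```python
import torch.nn as nn

def vgg_stack(block_channels, convs_per_block):
    layers, in_ch = [], 3
    for out_ch, n in zip(block_channels, convs_per_block):
        for _ in range(n):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(2))   # halve resolution between blocks
    return nn.Sequential(*layers)

# A deeper variant simply uses more convolutions per block:
# features16 = vgg_stack([64, 128, 256, 512, 512], [2, 2, 3, 3, 3])
```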
Proceedings Article

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
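Because such a network contains only convolutional (and pooling) layers rather than a fixed-size fully-connected head, it accepts inputs of arbitrary spatial size and emits a correspondingly sized score map, as the toy example below demonstrates. The channel counts and the 21-class head are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

fcn_like = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 21, kernel_size=1),   # 1x1 convolution in place of a fixed-size classifier head
)
print(fcn_like(torch.randn(1, 3, 100, 150)).shape)  # torch.Size([1, 21, 100, 150])
print(fcn_like(torch.randn(1, 3, 64, 64)).shape)    # torch.Size([1, 21, 64, 64])
```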
Proceedings Article

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TL;DR: R-CNN, as discussed by the authors, combines CNNs with bottom-up region proposals to localize and segment objects, and when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
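A hedged sketch of that pipeline: bottom-up proposals are cropped from the image, warped to the CNN input size, and scored by the network. `propose_regions`, `cnn.classify`, and the PIL-style `crop`/`resize` calls are hypothetical stand-ins, not the paper's actual components.

```python
def detect(image, propose_regions, cnn, input_size=(224, 224)):
    detections = []
    for box in propose_regions(image):              # bottom-up region proposals
        crop = image.crop(box).resize(input_size)   # warp each proposal to the CNN input size
        label, score = cnn.classify(crop)           # score the warped region
        if label != "background":
            detections.append((box, label, score))
    return detections
```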
Journal Article

Finding Structure in Time

TL;DR: A proposal along these lines, first described by Jordan (1986), is developed: recurrent links are used to provide networks with a dynamic memory, and a method for representing lexical categories and the type/token distinction is suggested.
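The "recurrent links as dynamic memory" idea can be shown in a few lines: context units hold a copy of the previous hidden state and feed back into the hidden layer at the next step, so each output depends on the whole history seen so far. The sizes, initialization, and tanh nonlinearity below are illustrative assumptions.

```python
import numpy as np

class ElmanCell:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.w_ctx = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.context = np.zeros(n_hidden)           # the dynamic memory

    def step(self, x):
        h = np.tanh(self.w_in @ x + self.w_ctx @ self.context)
        self.context = h                            # copy hidden state back into context units
        return h

# Processing a sequence: each hidden state depends on everything seen so far.
# cell = ElmanCell(n_in=4, n_hidden=8)
# states = [cell.step(x) for x in np.eye(4)]
```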