Proceedings ArticleDOI

Rotation Adaptive Visual Object Tracking with Motion Consistency

TL;DR: In this paper, the authors investigated the outcome of rotation adaptiveness in visual object tracking and also included various consistencies that prove considerably more effective than the current state of the art on numerous challenging sequences.
Abstract: Visual object tracking research has undergone significant improvement in the past few years. The emergence of the tracking-by-detection approach in the tracking paradigm has been quite successful in many ways. Recently, deep convolutional neural networks have been extensively used in most successful trackers. Yet, the standard approach has been based on correlation or feature selection, with minimal consideration given to motion consistency. Thus, there is still a need to capture various physical constraints through motion consistency, which will improve accuracy, robustness and, more importantly, rotation adaptiveness. Therefore, one of the major aspects of this paper is to investigate the outcome of rotation adaptiveness in visual object tracking. Among other key contributions, the paper also includes various consistencies that prove considerably more effective than the current state of the art on numerous challenging sequences.
Citations
Posted Content
TL;DR: A novel algorithm that uses ellipse fitting to estimate the bounding box rotation angle and size from the segmentation (mask) of the target for online, real-time visual object tracking.
Abstract: In this paper, we demonstrate a novel algorithm that uses ellipse fitting to estimate the bounding box rotation angle and size from the segmentation (mask) of the target for online, real-time visual object tracking. Our method, SiamMask E, improves the bounding box fitting procedure of the state-of-the-art object tracking algorithm SiamMask and still retains a fast tracking frame rate (80 fps) on a system equipped with a GPU (GeForce GTX 1080 Ti or higher). We tested our approach on the visual object tracking datasets (VOT2016, VOT2018, and VOT2019) that were labeled with rotated bounding boxes. Compared with the original SiamMask, we achieved an Accuracy of 0.645 and an EAO of 0.303 on VOT2019, improvements of 0.049 and 0.02 over the original SiamMask. Our project website is available at this http URL.
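The core idea above, recovering a rotation angle from a target mask, can be sketched compactly. The snippet below uses second-order image moments rather than the paper's ellipse-fitting procedure, so it is an illustrative stand-in, not the SiamMask E code; all names and the synthetic test mask are assumptions.

```python
import numpy as np

def mask_orientation(mask):
    """Dominant orientation (radians) of a binary mask, estimated
    from its second-order central moments."""
    ys, xs = np.nonzero(mask)
    x0, y0 = xs.mean(), ys.mean()
    mu20 = ((xs - x0) ** 2).mean()
    mu02 = ((ys - y0) ** 2).mean()
    mu11 = ((xs - x0) * (ys - y0)).mean()
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

# Synthetic test mask: a 120x30 rectangle rotated by 30 degrees
# about the image centre.
h = w = 200
theta = np.deg2rad(30)
yy, xx = np.mgrid[0:h, 0:w]
xr = (xx - w / 2) * np.cos(theta) + (yy - h / 2) * np.sin(theta)
yr = -(xx - w / 2) * np.sin(theta) + (yy - h / 2) * np.cos(theta)
mask = (np.abs(xr) < 60) & (np.abs(yr) < 15)

est = np.rad2deg(mask_orientation(mask))
print(round(float(est), 1))  # close to 30.0
```

An ellipse fit (e.g. OpenCV's `cv2.fitEllipse` on the mask contour, as the paper's title suggests) additionally yields the axis lengths needed for the rotated box size.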

32 citations

Journal ArticleDOI
TL;DR: This paper proposes an adaptive template-matching-based single-object tracking framework, built on the Faster R-CNN model, that updates the template online, and presents a parallel strategy to accelerate the template-matching process.

15 citations

Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes a different approach to regress in the temporal domain, based on weighted aggregation of distinctive visual features and feature prioritization with entropy estimation in a recursive fashion, and provides a statistics-based ensembler approach for integrating the conventionally driven spatial regression results with the proposed temporal regression results to accomplish better tracking.
Abstract: In recent years, convolutional neural networks (CNNs) have been extensively employed in various complex computer vision tasks, including visual object tracking. In this paper, we study the efficacy of temporal regression with Tikhonov regularization in generic object tracking. Among other major aspects, we propose a different approach to regress in the temporal domain, based on weighted aggregation of distinctive visual features and feature prioritization with entropy estimation in a recursive fashion. We provide a statistics-based ensembler approach for integrating the conventionally driven spatial regression results (such as from ECO) and the proposed temporal regression results to accomplish better tracking. Further, we exploit the obligatory dependency of deep architectures on the provided visual information, and present an image enhancement filter that helps boost performance on popular benchmarks. Our extensive experimentation shows that the proposed weighted aggregation with enhancement filter (WAEF) tracker outperforms the baseline (ECO) in almost all the challenging categories on the OTB50 dataset with a cumulative gain of 14.8%. As per the VOT2016 evaluation, the proposed framework offers substantial improvements of 19.04% in occlusion, 27.66% in illumination change, 33.33% in empty, 10% in size change, and 5.28% in average expected overlap.
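Tikhonov-regularized regression, the tool this abstract builds on, has a well-known closed form: w = (XᵀX + λI)⁻¹Xᵀy. The sketch below shows that closed form on toy data; it is generic ridge regression under assumed names, not the WAEF tracker's actual feature pipeline.

```python
import numpy as np

def tikhonov_regress(X, y, lam):
    """Closed-form Tikhonov (ridge) regression:
    w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy example: rows stand in for per-frame feature vectors regressed
# onto a target response.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))        # 50 samples, 8-dim features
w_true = rng.standard_normal(8)
y = X @ w_true + 0.01 * rng.standard_normal(50)

w = tikhonov_regress(X, y, lam=1e-3)
print(np.allclose(w, w_true, atol=0.05))  # prints True
```

The regularizer λ trades data fit against the norm of w, which is what keeps such temporal models stable when consecutive frames provide nearly collinear features.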

7 citations

Journal ArticleDOI
TL;DR: A novel rotation adaptive tracker with motion constraint (RAMC) is proposed to explore how the hybridization of angle and motion information can be utilized to boost SV object tracking from two branches: rotation and translation.
Abstract: Single-object tracking (SOT) in satellite videos (SVs) is a promising and challenging task in the remote sensing community. In terms of the object itself and the tracking algorithm, the rotation of small-sized objects and tracking drift are common problems due to the nadir view coupled with a complex background. This article proposes a novel rotation adaptive tracker with motion constraint (RAMC) to explore how the hybridization of angle and motion information can be utilized to boost SV object tracking from two branches: rotation and translation. We decouple the rotation and translation motion patterns. The rotation phenomenon is decomposed into the translation solution to achieve adaptive rotation estimation in the rotation branch. In the translation branch, the appearance and motion information are synergized to enhance the object representations and address the tracking drift issue. Moreover, an internal shrinkage (IS) strategy is proposed to optimize the evaluation process of trackers. Extensive experiments on spaceborne SV datasets captured from the Jilin-1 satellite constellation and the International Space Station (ISS) are conducted. The results demonstrate the superiority of the proposed method over other algorithms. With an area under the curve (AUC) of 0.785 and 0.946 in the success and precision plots, respectively, the proposed RAMC achieves optimal performance while running at real-time speed.
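The abstract does not spell out how rotation is "decomposed into the translation solution". One common realization of that idea is polar resampling, where an in-plane rotation of the image becomes a circular shift of a polar profile, so the angle can be recovered with the same correlation machinery used for translation. The sketch below illustrates that assumption only; it is not the RAMC implementation, and every name in it is hypothetical.

```python
import numpy as np

def polar_profile(img, n_theta=360, radius=None):
    """Sample an image along a circle about its centre. In this
    representation an in-plane rotation becomes a circular shift."""
    h, w = img.shape
    cy, cx = h // 2, w // 2
    if radius is None:
        radius = min(cy, cx) // 2
    angles = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    ys = np.clip(np.round(cy + radius * np.sin(angles)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + radius * np.cos(angles)).astype(int), 0, w - 1)
    return img[ys, xs].astype(float)

def estimate_rotation(img_a, img_b, n_theta=360):
    """Estimate the rotation (degrees) taking img_a to img_b via
    circular cross-correlation of their polar profiles."""
    pa = polar_profile(img_a, n_theta)
    pb = polar_profile(img_b, n_theta)
    pa -= pa.mean()
    pb -= pb.mean()
    corr = np.real(np.fft.ifft(np.fft.fft(pb) * np.conj(np.fft.fft(pa))))
    return np.argmax(corr) * 360.0 / n_theta

# Synthetic pair: an angular cosine pattern and the same pattern
# rotated by 40 degrees.
n = 128
yy, xx = np.mgrid[0:n, 0:n] - n // 2
phi = np.arctan2(yy, xx)
img_a = np.cos(phi)
img_b = np.cos(phi - np.deg2rad(40))

est = estimate_rotation(img_a, img_b)
print(round(est))  # close to 40
```

Nearest-neighbour sampling keeps the sketch dependency-free; a practical system would interpolate and combine several radii.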

5 citations

Book ChapterDOI
02 Dec 2018
TL;DR: A robust framework is proposed that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation and supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map.
Abstract: Visual object tracking is one of the major challenges in the field of computer vision. Correlation Filter (CF) trackers are one of the most widely used categories in tracking. Though numerous tracking algorithms based on CFs are available today, most of them fail to efficiently detect the object in an unconstrained environment with dynamically changing object appearance. In order to tackle such challenges, the existing strategies often rely on a particular set of algorithms. Here, we propose a robust framework that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation. We also supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map. Further, we demonstrate the impact of displacement consistency on CF trackers. The generality and efficiency of the proposed framework are illustrated by integrating our contributions into two state-of-the-art CF trackers: SRDCF and ECO. As per the comprehensive experiments on the VOT2016 dataset, our top trackers show substantial improvements of 14.7% and 6.41% in robustness, and 11.4% and 1.71% in Average Expected Overlap (AEO), over the baseline SRDCF and ECO, respectively.
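The DCF formulation these contributions plug into has a compact closed form in the Fourier domain. Below is a minimal single-channel, MOSSE-style sketch of that formulation; it is illustrative only, since real DCF trackers such as SRDCF and ECO add spatial regularization, multi-channel deep features, and online model updates.

```python
import numpy as np

def train_dcf(f, g, lam=1e-2):
    """MOSSE-style single-channel correlation filter, learned in the
    Fourier domain: H* = (G . F*) / (F . F* + lam), elementwise."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    return G * np.conj(F) / (F * np.conj(F) + lam)

def respond(H_conj, z):
    """Correlation response map for a search patch z."""
    return np.real(np.fft.ifft2(np.fft.fft2(z) * H_conj))

# Train on a patch with a Gaussian response centred on the target,
# then detect a cyclically shifted copy: the peak moves with the shift.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
g = np.exp(-((xx - n // 2) ** 2 + (yy - n // 2) ** 2) / (2 * 3.0 ** 2))
rng = np.random.default_rng(1)
f = rng.standard_normal((n, n))
H_conj = train_dcf(f, g)

shift = (5, 7)                        # (rows, cols)
z = np.roll(f, shift, axis=(0, 1))
r = respond(H_conj, z)
peak = np.unravel_index(np.argmax(r), r.shape)
print(peak)  # near (n//2 + 5, n//2 + 7)
```

The false-positive elimination described in the abstract would act on the response map `r`, suppressing secondary peaks before the displacement is read off.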

1 citation

References
Proceedings ArticleDOI
23 Jun 1998
TL;DR: This paper presents a neural network-based face detection system that, unlike similar systems limited to upright, frontal faces, detects faces at any degree of rotation in the image plane, and presents preliminary results for detecting faces rotated out of the image plane, such as profiles and semi-profiles.
Abstract: In this paper, we present a neural network-based face detection system. Unlike similar systems which are limited to detecting upright, frontal faces, this system detects faces at any degree of rotation in the image plane. The system employs multiple networks; a "router" network first processes each input window to determine its orientation and then uses this information to prepare the window for one or more "detector" networks. We present the training methods for both types of networks. We also perform sensitivity analysis on the networks, and present empirical results on a large test set. Finally, we present preliminary results for detecting faces rotated out of the image plane, such as profiles and semi-profiles.

570 citations

Proceedings Article
05 Dec 2016
TL;DR: This paper constructs the learner as a second deep network, called a learnet, which predicts the parameters of a pupil network from a single exemplar, and obtains an efficient feed-forward one-shot learner, trained end-to-end by minimizing a one-shot classification objective in a learning-to-learn formulation.
Abstract: One-shot learning is usually tackled by using generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning as they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one shot. We construct the learner as a second deep network, called a learnet, which predicts the parameters of a pupil network from a single exemplar. In this manner we obtain an efficient feed-forward one-shot learner, trained end-to-end by minimizing a one-shot classification objective in a learning to learn formulation. In order to make the construction feasible, we propose a number of factorizations of the parameters of the pupil network. We demonstrate encouraging results by learning characters from single exemplars in Omniglot, and by tracking visual objects from a single initial exemplar in the Visual Object Tracking benchmark.

343 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work introduces a complete framework for the object detection from video (VID) task based on still-image object detection and general object tracking, and proposes a temporal convolution network that incorporates temporal information to regularize the detection results, showing its effectiveness for the task.
Abstract: Deep Convolutional Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation. For object detection, particularly in still images, the performance has been significantly increased in the last year thanks to powerful deep networks (e.g. GoogleNet) and detection frameworks (e.g. Regions with CNN features (RCNN)). The recently introduced ImageNet [6] task on object detection from video (VID) brings the object detection task into the video domain, in which objects' locations at each frame are required to be annotated with bounding boxes. In this work, we introduce a complete framework for the VID task based on still-image object detection and general object tracking. Their relations and contributions in the VID task are thoroughly studied and evaluated. In addition, a temporal convolution network is proposed to incorporate temporal information to regularize the detection results and shows its effectiveness for the task. Code is available at https://github.com/myfavouritekk/vdetlib.

338 citations

Journal ArticleDOI
TL;DR: This paper presents an algorithm for modeling, tracking, and recognizing human faces in video sequences within one integrated framework, emphasizing an algorithmic architecture that tightly couples the tracking and recognition components.

246 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper formulates tracking under severe object transformations as a proposal selection task and makes two contributions: introducing novel proposals estimated from the geometric transformations undergone by the object, and building a rich candidate set for predicting the object location.
Abstract: Tracking-by-detection approaches are some of the most successful object trackers in recent years. Their success is largely determined by the detector model they learn initially and then update over time. However, under challenging conditions where an object can undergo transformations, e.g., severe rotation, these methods are found to be lacking. In this paper, we address this problem by formulating it as a proposal selection task and making two contributions. The first one is introducing novel proposals estimated from the geometric transformations undergone by the object, and building a rich candidate set for predicting the object location. The second one is devising a novel selection strategy using multiple cues, i.e., detection score and edgeness score computed from state-of-the-art object edges and motion boundaries. We extensively evaluate our approach on the visual object tracking 2014 challenge and online tracking benchmark datasets, and show the best performance.

117 citations