Proceedings ArticleDOI

Rotation Adaptive Visual Object Tracking with Motion Consistency

12 Mar 2018, pp. 1047-1055

TL;DR: In this paper, the authors investigate the outcome of rotation adaptiveness in visual object tracking and introduce various motion consistencies that prove more effective than the current state-of-the-art on numerous challenging sequences.

Abstract: Visual object tracking research has improved significantly in the past few years. The emergence of the tracking-by-detection approach in the tracking paradigm has been quite successful in many ways. Recently, deep convolutional neural networks have been used extensively in the most successful trackers. Yet the standard approach has been based on correlation or feature selection, with minimal consideration given to motion consistency. There is thus still a need to capture various physical constraints through motion consistency, which improves accuracy, robustness and, more importantly, rotation adaptiveness. One of the major aims of this paper is therefore to investigate the outcome of rotation adaptiveness in visual object tracking. Among other key contributions, the paper also introduces various consistencies that prove more effective than the current state-of-the-art on numerous challenging sequences.



Citations
Posted Content
TL;DR: A novel algorithm that uses ellipse fitting to estimate the bounding box rotation angle and size from the segmentation mask of the target, for online and real-time visual object tracking.
Abstract: In this paper, we present a novel algorithm that uses ellipse fitting to estimate the bounding box rotation angle and size from the segmentation mask of the target, for online and real-time visual object tracking. Our method, SiamMask E, improves the bounding box fitting procedure of the state-of-the-art object tracking algorithm SiamMask while retaining a fast tracking frame rate (80 fps) on a system equipped with a GPU (GeForce GTX 1080 Ti or higher). We tested our approach on the visual object tracking datasets (VOT2016, VOT2018, and VOT2019) that were labeled with rotated bounding boxes. Compared with the original SiamMask, we achieved an Accuracy of 0.645 and an EAO of 0.303 on VOT2019, which are 0.049 and 0.02 higher, respectively. Our project website is available at this http URL.
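The core idea, recovering a rotated box from the target's segmentation mask, can be sketched independently of the SiamMask pipeline. The sketch below is an illustration only, not the authors' implementation: it uses the mask's second-moment principal axes in place of true least-squares ellipse fitting, which captures the same orientation estimate.

```python
import numpy as np

def rotated_box_from_mask(mask):
    """Estimate a rotated bounding box (center, size, angle in degrees)
    from a binary mask, using the principal axes of the pixel distribution."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    centered = pts - center
    cov = centered.T @ centered / len(pts)      # 2x2 second-moment matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues ascending
    major = eigvecs[:, 1]                       # principal (major) axis
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    proj = centered @ eigvecs                   # pixel coords in the axis frame
    size = proj.max(axis=0) - proj.min(axis=0)  # (minor extent, major extent)
    return center, size, angle
```

For an upright wide rectangle the estimated angle comes out near 0 and the extents match the rectangle's sides, which is the information a rotated-box tracker reports per frame.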

25 citations

Journal ArticleDOI
TL;DR: This paper proposes an adaptive template-matching-based single object tracking framework that updates the template online based on the Faster-RCNN model, and presents a parallel strategy to accelerate template matching.
Abstract: Existing template-matching-based visual object tracking algorithms usually require manual template updates and have a high execution cost on general embedded systems. To address these issues, an adaptive template-matching-based single object tracking algorithm with parallel acceleration is proposed in this paper. We propose an adaptive single object tracking framework that updates the template online. Based on the Faster-RCNN model, we design a single object capture method to update the template. Meanwhile, we present a parallel strategy to accelerate the template matching process. To evaluate the proposed algorithm, we use the OTB benchmark to compare performance with several state-of-the-art trackers on the TX2 embedded platform. Experimental results show that the proposed method achieves a 5.9× speedup and a 71.9% accuracy improvement over the comparison methods.
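For reference, the operation being accelerated, scoring the template against every position in the frame, can be written as a plain (unparallelised) normalised cross-correlation search. This is a generic textbook sketch, not the paper's implementation:

```python
import numpy as np

def ncc_match(frame, template):
    """Exhaustive normalised cross-correlation template matching.
    Returns the (row, col) of the best match and its NCC score in [-1, 1]."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-12)
    best, best_pos = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-12)
            score = (p * t).mean()          # mean of elementwise products
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best
```

The two nested loops are independent per position, which is exactly what makes this step amenable to the kind of parallel acceleration the paper targets.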

9 citations

Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes a different approach to regression in the temporal domain, based on weighted aggregation of distinctive visual features and recursive feature prioritization with entropy estimation, and provides a statistics-based ensemble approach for integrating conventional spatial regression results with the proposed temporal regression results to accomplish better tracking.
Abstract: In recent years, convolutional neural networks (CNNs) have been extensively employed in various complex computer vision tasks, including visual object tracking. In this paper, we study the efficacy of temporal regression with Tikhonov regularization in generic object tracking. Among other major aspects, we propose a different approach to regression in the temporal domain, based on weighted aggregation of distinctive visual features and feature prioritization with entropy estimation in a recursive fashion. We provide a statistics-based ensemble approach for integrating conventionally driven spatial regression results (such as from ECO) with the proposed temporal regression results to accomplish better tracking. Further, we exploit the obligatory dependency of deep architectures on the provided visual information and present an image enhancement filter that helps to boost performance on popular benchmarks. Our extensive experimentation shows that the proposed weighted aggregation with enhancement filter (WAEF) tracker outperforms the baseline (ECO) in almost all the challenging categories on the OTB50 dataset, with a cumulative gain of 14.8%. As per the VOT2016 evaluation, the proposed framework offers substantial improvements of 19.04% in occlusion, 27.66% in illumination change, 33.33% in empty, 10% in size change, and 5.28% in average expected overlap.
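The Tikhonov-regularised regression at the heart of the temporal model has a standard closed form. As a generic reminder (the paper's feature-weighted, recursive variant is not shown):

```python
import numpy as np

def tikhonov_regress(X, y, lam):
    """Solve min_w ||Xw - y||^2 + lam * ||w||^2 in closed form:
    w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

The regulariser `lam` trades data fit against coefficient magnitude; as `lam` approaches 0 on well-conditioned, noise-free data the solution recovers the ordinary least-squares weights.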

6 citations

Book ChapterDOI
02 Dec 2018
TL;DR: A robust framework is proposed that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation and supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map.
Abstract: Visual object tracking is one of the major challenges in the field of computer vision. Correlation Filter (CF) trackers are one of the most widely used categories in tracking. Though numerous tracking algorithms based on CFs are available today, most of them fail to efficiently detect the object in an unconstrained environment with dynamically changing object appearance. In order to tackle such challenges, the existing strategies often rely on a particular set of algorithms. Here, we propose a robust framework that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation. We also supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map. Further, we demonstrate the impact of displacement consistency on CF trackers. The generality and efficiency of the proposed framework are illustrated by integrating our contributions into two state-of-the-art CF trackers: SRDCF and ECO. As per the comprehensive experiments on the VOT2016 dataset, our top trackers show substantial improvements of 14.7% and 6.41% in robustness, and 11.4% and 1.71% in Average Expected Overlap (AEO), over the baseline SRDCF and ECO, respectively.
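The DCF formulation being extended admits a compact single-sample sketch (MOSSE-style, in the Fourier domain). The invariance terms and false-positive supervision the paper adds are not shown; this is only the baseline building block:

```python
import numpy as np

def train_dcf(f, g, lam=1e-2):
    """Single-sample discriminative correlation filter, closed form in the
    Fourier domain: H = conj(F) * G / (conj(F) * F + lam), where f is the
    training patch, g the desired (peaked) response, lam a regulariser."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    return np.conj(F) * G / (np.conj(F) * F + lam)

def detect(f, H):
    """Convolution response map; its argmax is the predicted target location."""
    return np.real(np.fft.ifft2(np.fft.fft2(f) * H))
```

Trained on a patch with a Gaussian label centred on the target, the filter's response map peaks at the target position; DCF trackers update H online as the appearance changes.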

1 citation

Proceedings ArticleDOI
01 Jan 2020
TL;DR: By combining descriptor signals into long- and short-term memory blocks, which represent the first and the most recent appearance of the object respectively, this proposal shows gains over the original SiamFC.
Abstract: In recent years, advances in Deep Learning have revolutionized several subareas of Computer Vision, including Visual Object Tracking. A special type of deep neural network, the Siamese Neural Network, has drawn the attention of the tracking community. It has a low computational cost and high efficacy in comparing the similarity between objects. The scientific community has achieved remarkable results by applying such networks to the Visual Object Tracking problem. However, limitations of this neural network were observed to negatively impact tracking. The problem was overcome by obtaining a new reference descriptor for the object, combining past descriptors provided by the tracker. In particular, the combination of descriptor signals into long- and short-term memory blocks was proposed, these representing the first and the most recent appearance of the object, respectively. A final descriptor is generated from these memory blocks, which the tracker uses as a reference. This work focused on obtaining a method to compute an optimized filter bank through a genetic algorithm. The filter bank is then used to generate the output of the short-term memory. According to experiments on the OTB dataset, this proposal shows gains over the original SiamFC. Considering the area-under-the-curve metric, there are gains of 7.4% and 3.0% for the precision and success plots, respectively, making this work comparable to state-of-the-art methods.

References
Journal ArticleDOI
TL;DR: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed, which employs a metric derived from the Bhattacharyya coefficient as similarity measure, and uses the mean shift procedure to perform the optimization.
Abstract: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.
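The similarity measure used here is easy to state concretely: for two normalised histograms p and q, the Bhattacharyya coefficient is the sum over bins of sqrt(p_u * q_u). A minimal sketch:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two histograms:
    1.0 for identical distributions, 0.0 for disjoint support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()                     # normalise to probability distributions
    q = q / q.sum()
    return float(np.sqrt(p * q).sum())
```

In the tracker, the target's colour histogram is compared against candidate-location histograms, and the mean shift procedure climbs the gradient of this coefficient to find the new target position.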

4,901 citations

Proceedings Article
07 Dec 2015
TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.
Abstract: Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
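The sampling-grid mechanism can be illustrated in a few lines, omitting the learned localisation network that predicts the transform parameters. Nearest-neighbour sampling stands in here for the differentiable bilinear kernel the paper uses:

```python
import numpy as np

def affine_grid_sample(feat, theta):
    """Warp a 2-D feature map with a 2x3 affine matrix theta, sampling on a
    normalised [-1, 1] grid -- the grid-generator/sampler half of a spatial
    transformer (nearest-neighbour instead of bilinear, for brevity)."""
    H, W = feat.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # homogeneous
    sx, sy = theta @ grid                      # source coords for each output pixel
    cols = np.clip(np.round((sx + 1) * (W - 1) / 2), 0, W - 1).astype(int)
    rows = np.clip(np.round((sy + 1) * (H - 1) / 2), 0, H - 1).astype(int)
    return feat[rows, cols].reshape(H, W)
```

The identity matrix reproduces the input, and e.g. negating the first row's leading entry mirrors the map horizontally; in the full module, theta is regressed from the feature map itself.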

4,869 citations

Proceedings ArticleDOI
23 Jun 2013
TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

3,290 citations

01 Jul 2011
TL;DR: CUB-200-2011 is an extended version of CUB-200 that roughly doubles the number of images per category and adds new part localization annotations; all images are annotated with bounding boxes, part locations, and attribute labels.
Abstract: CUB-200-2011 is an extended version of CUB-200 [7], a challenging dataset of 200 bird species. The extended version roughly doubles the number of images per category and adds new part localization annotations. All images are annotated with bounding boxes, part locations, and attribute labels. Images and annotations were filtered by multiple users of Mechanical Turk. We introduce benchmarks and baseline experiments for multi-class categorization and part localization.

2,875 citations

01 Jan 2015
TL;DR: A method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs and is able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.
Abstract: The process of learning good features for machine learning applications can be very computationally expensive and may prove difficult in cases where little data is available. A prototypical example of this is the one-shot learning setting, in which we must correctly make predictions given only a single example of each new class. In this paper, we explore a method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs. Once a network has been tuned, we can then capitalize on powerful discriminative features to generalize the predictive power of the network not just to new data, but to entirely new classes from unknown distributions. Using a convolutional architecture, we are able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.
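The twin-branch idea is simple to sketch: both inputs pass through the same weights, and a component-wise L1 distance between the embeddings drives the similarity score. The toy sketch below substitutes a random linear-ReLU layer for the paper's trained convolutional encoder, and a fixed squashing for its learned sigmoid output layer:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))    # shared weights: random stand-in for a
                                     # trained convolutional encoder

def embed(x):
    return np.maximum(W @ x, 0.0)    # the SAME branch processes both inputs

def similarity(a, b):
    d = np.abs(embed(a) - embed(b))  # component-wise L1 distance
    return 1.0 / (1.0 + d.sum())     # squash into (0, 1]; 1.0 = identical
```

Because the branches share weights, the metric is symmetric by construction, and a network tuned on verification pairs can rank similarity for classes never seen in training, which is what enables the one-shot setting.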

2,476 citations