Proceedings ArticleDOI

Bag of Visual Words based Correlation Filter Tracker (BoVW-CFT)

TL;DR: The proposed BoVW-CFT is a classifier-based generic technique for handling tracking uncertainties in correlation filter trackers. It mitigates model drift in correlation trackers and learns a robust model that can track over the long term.
Abstract: Accurate and robust visual object tracking is one of the most challenging computer vision problems. Recently, discriminative correlation filter trackers have shown promising results on benchmark datasets, with continuous improvements in tracking accuracy and robustness. Still, these algorithms fail when the target object and background conditions change drastically over time. They are also unable to resume tracking once the target is lost, which limits their ability to track over the long term. The proposed BoVW-CFT is a classifier-based generic technique for handling tracking uncertainties in correlation filter trackers. Tracking failures in correlation trackers are identified automatically, and an image classifier with training, testing and online update stages, built on Bag of Visual Words (BoVW) features, is proposed as a detector in the tracking scenario. The proposed detector is a parts-based model and is well suited to the tracking framework. Further, the online training stage in the proposed framework, which uses an updated model or updated training samples, incorporates temporal information and helps to detect rotated, blurred and scaled versions of the target. When a target loss is detected in the correlation tracker, the trained classifier, referred to as the detector, is invoked to re-initialize the tracker with the actual target location. Therefore, for each tracking uncertainty, two output patches are obtained, one from the base tracker and one from the classifier. The final target location is estimated using the normalized cross-correlation with the initial target patch. The method mitigates model drift in correlation trackers and learns a robust model that can track over the long term. Extensive experimental results demonstrate an improvement of 4.1% in expected overlap, 1.86% in accuracy and 15.46% in robustness on VOT2016, and 1.82% in overlap precision, 2.32% in AUC and 2.87% in success rate on OTB100.
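The final selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes grayscale patches already resized to the same dimensions, uses the zero-mean form of normalized cross-correlation, and assumes the candidate with the higher score wins. The function names `ncc` and `select_patch` are hypothetical.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size grayscale
    patches given as lists of rows. Returns a value in [-1, 1]."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    if len(fa) != len(fb):
        raise ValueError("patches must have the same size")
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in fa) * sum((y - mean_b) ** 2 for y in fb)
    )
    # A constant patch has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def select_patch(initial, tracker_patch, detector_patch):
    """Assumed decision rule: keep whichever candidate patch correlates
    better with the initial target patch."""
    score_tracker = ncc(initial, tracker_patch)
    score_detector = ncc(initial, detector_patch)
    if score_tracker >= score_detector:
        return "tracker", score_tracker
    return "detector", score_detector


if __name__ == "__main__":
    initial = [[1, 2], [3, 4]]
    drifted = [[4, 3], [2, 1]]      # reversed intensities: anti-correlated
    redetected = [[2, 4], [6, 8]]   # scaled copy of the initial patch
    print(select_patch(initial, drifted, redetected))
```

Because the score is normalized, a uniformly brighter or darker copy of the target still scores close to 1, which is why a scaled copy of the initial patch is preferred over the drifted one here.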
Citations
Reference EntryDOI
15 Oct 2004

2,118 citations

References
Journal ArticleDOI
TL;DR: A new kernelized correlation filter (KCF) is derived that, unlike other kernel algorithms, has exactly the same complexity as its linear counterpart. Together with a multi-channel linear extension called the dual correlation filter (DCF), it outperforms top-ranking trackers such as Struck and TLD on a 50-video benchmark, despite being implemented in a few lines of code.
Abstract: The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies—any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.

4,994 citations
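The circulant argument in the abstract above can be checked numerically on a 1-D signal. The sketch below is an illustration rather than the paper's Algorithm 1: it assumes a data matrix whose row i is the base sample cyclically shifted by i, uses a naive DFT with the exp(-2*pi*i*j*k/n) sign convention, and compares the element-wise Fourier-domain ridge solution with a direct solve of the normal equations. Where the complex conjugate appears depends on these conventions, so the formula in the code should be read only under the conventions stated in its comments. All function names are hypothetical.

```python
import cmath


def dft(z):
    """Naive O(n^2) DFT: Z[k] = sum_j z[j] * exp(-2*pi*i*j*k/n)."""
    n = len(z)
    return [
        sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def idft(Z):
    """Inverse of dft() above."""
    n = len(Z)
    return [
        sum(Z[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
        for j in range(n)
    ]


def ridge_fourier(x, y, lam):
    """Ridge regression over all cyclic shifts of x, solved element-wise.
    With rows X[i][j] = x[(j - i) % n] and the DFT convention above,
    w_hat = x_hat * y_hat / (|x_hat|^2 + lam)."""
    xh, yh = dft(x), dft(y)
    wh = [a * b / (abs(a) ** 2 + lam) for a, b in zip(xh, yh)]
    return [v.real for v in idft(wh)]


def ridge_direct(x, y, lam):
    """Reference solution: build the circulant matrix explicitly and solve
    (X^T X + lam*I) w = X^T y by Gaussian elimination."""
    n = len(x)
    X = [[x[(j - i) % n] for j in range(n)] for i in range(n)]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(n)]
    for c in range(n):
        # Partial pivoting, then eliminate column c below the diagonal.
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][k] * w[k] for k in range(i + 1, n))) / A[i][i]
    return w


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0]   # base sample
    y = [1.0, 0.5, 0.0, 0.0, 0.0, 0.5]    # regression targets, one per shift
    print(ridge_fourier(x, y, 0.1))
    print(ridge_direct(x, y, 0.1))
```

The direct solve stores an n-by-n matrix, while the Fourier version only ever touches length-n vectors. With a fast Fourier transform in place of the naive DFT used here, that difference is the source of the storage and computation savings the abstract refers to.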

Proceedings ArticleDOI
23 Jun 2013
TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

3,828 citations


"Bag of Visual Words based Correlati..." refers background in this paper

  • ...The discriminative trackers are more competing since the use of background information has been found to be advantageous as shown in [29], [30]....


Journal ArticleDOI
TL;DR: An extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria is carried out to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often not sufficient or is sometimes biased for certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, and this makes comparisons among the reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus, the quantitative results reported in literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce the sequence attributes for the performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

2,974 citations


"Bag of Visual Words based Correlati..." refers background or methods in this paper

  • ...Benchmark 2015 (OTB2015) The benchmark dataset contains 100 sequences with groundtruth annotation....


  • ...The discriminative trackers are more competing since the use of background information has been found to be advantageous as shown in [29], [30]....


  • ...For a generic and extensive demonstration, the proposed BoVW_CFT algorithm is evaluated on two publicly available benchmark datasets: OTB2015 [30] and VOT2016 [16]....


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new type of correlation filter is presented, the Minimum Output Sum of Squared Error (MOSSE) filter, which produces stable correlation filters when initialized using a single frame. Occlusion is detected from the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears.
Abstract: Although not commonly used, correlation filters can track complex objects through rotations, occlusions and other distractions at over 20 times the rate of current state-of-the-art techniques. The oldest and simplest correlation filters use simple templates and generally fail when applied to tracking. More modern approaches such as ASEF and UMACE perform better, but their training needs are poorly suited to tracking. Visual tracking requires robust filters to be trained from a single frame and dynamically adapted as the appearance of the target object changes. This paper presents a new type of correlation filter, a Minimum Output Sum of Squared Error (MOSSE) filter, which produces stable correlation filters when initialized using a single frame. A tracker based upon MOSSE filters is robust to variations in lighting, scale, pose, and nonrigid deformations while operating at 669 frames per second. Occlusion is detected based upon the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears.

2,948 citations
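The occlusion test mentioned in the abstract above relies on the peak-to-sidelobe ratio (PSR) of the correlation response. A minimal sketch follows, assuming the response map is a list of rows, that the sidelobe is every cell outside a square window around the peak, and that PSR = (peak - sidelobe mean) / sidelobe standard deviation. The window half-width `exclude` and the decision threshold are illustrative parameters, not values taken from this page.

```python
import math


def peak_to_sidelobe_ratio(response, exclude=5):
    """PSR of a 2-D correlation response given as a list of rows.
    Cells within `exclude` rows and columns of the peak are left out of
    the sidelobe statistics."""
    rows, cols = len(response), len(response[0])
    peak, pr, pc = max(
        (response[r][c], r, c) for r in range(rows) for c in range(cols)
    )
    sidelobe = [
        response[r][c]
        for r in range(rows)
        for c in range(cols)
        if abs(r - pr) > exclude or abs(c - pc) > exclude
    ]
    if not sidelobe:
        raise ValueError("response map is too small for this exclusion window")
    mean = sum(sidelobe) / len(sidelobe)
    std = math.sqrt(sum((v - mean) ** 2 for v in sidelobe) / len(sidelobe))
    # A perfectly flat sidelobe means the peak is infinitely distinct.
    return (peak - mean) / std if std > 0 else float("inf")


def target_lost(response, threshold, exclude=5):
    """Hypothetical failure test: flag a loss when the PSR drops below a
    user-chosen threshold."""
    return peak_to_sidelobe_ratio(response, exclude) < threshold


if __name__ == "__main__":
    # 9x9 checkerboard sidelobe (values 0.0 and 0.2) with a sharp central peak.
    sharp = [[0.2 if (r + c) % 2 else 0.0 for c in range(9)] for r in range(9)]
    sharp[4][4] = 1.0
    weak = [row[:] for row in sharp]
    weak[4][4] = 0.3  # same background, much weaker peak
    print(peak_to_sidelobe_ratio(sharp, exclude=1))
    print(peak_to_sidelobe_ratio(weak, exclude=1))
```

A sharp, isolated peak gives a high PSR, while a weak or smeared peak gives a low one. A tracker in the spirit of the abstract could stop updating its filter while `target_lost` returns true and resume once the PSR recovers.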


"Bag of Visual Words based Correlati..." refers methods in this paper

  • ...encodes the target appearance using a minimum sum of squared error (MOSSE) filter [4] with an update on each frame....


Book ChapterDOI
07 Oct 2012
TL;DR: Using the well-established theory of Circulant matrices, this work provides a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform, which can be done in the dual space of kernel machines as fast as with linear classifiers.
Abstract: Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of Circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state-of-the-art, can be implemented with only a few lines of code and runs at hundreds of frames-per-second. MATLAB code is provided in the paper (see Algorithm 1).

2,197 citations


"Bag of Visual Words based Correlati..." refers methods in this paper

  • ...The CSK [13] method uses correlation filters in a kernel space and illumination intensity features for object representation....
