Journal ArticleDOI

Occlusion-Aware Real-Time Object Tracking

01 Apr 2017-IEEE Transactions on Multimedia (IEEE)-Vol. 19, Iss: 4, pp 763-771
TL;DR: A real-time occlusion-aware visual tracking algorithm, built on a novel two-stage classifier with circulant structure with kernel named integrated circulant structure kernels (ICSK), that achieves better performance than state-of-the-art methods.
Abstract: The online learning methods are popular for visual tracking because of their robust performance for most video sequences. However, the drifting problem caused by noisy updates is still a challenge for most highly adaptive online classifiers. In visual tracking, target object appearance variation, such as deformation and long-term occlusion, easily causes noisy updates. To overcome this problem, a new real-time occlusion-aware visual tracking algorithm is introduced. First, we learn a novel two-stage classifier with circulant structure with kernel, named integrated circulant structure kernels (ICSK). The first stage is applied for transition estimation and the second is used for scale estimation. The circulant structure makes our algorithm realize fast learning and detection. Then, the ICSK is used to detect the target without occlusion and build a classifier pool to save these classifiers with noisy updates. When the target is in heavy occlusion or after long-term occlusion, we redetect it using an optimal classifier selected from the classifier-pool according to an entropy minimization criterion. Extensive experimental results on the full benchmark demonstrate our real-time algorithm achieves better performance than state-of-the-art methods.
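
The abstract states that the redetection classifier is chosen from the pool by an entropy minimization criterion but does not give the formula. The sketch below shows one plausible reading, not the authors' implementation: each pooled classifier's response map is softmax-normalised into a distribution, and the classifier with the lowest Shannon entropy (the most sharply peaked response) is selected. The function names, the softmax normalisation, and the toy responses are all assumptions made for illustration.

```python
import math

def response_entropy(scores):
    """Shannon entropy of a softmax-normalised response map (flat list of scores)."""
    peak = max(scores)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_classifier(pool_responses):
    """Return the index of the pooled classifier whose response has the lowest entropy."""
    entropies = [response_entropy(r) for r in pool_responses]
    return min(range(len(entropies)), key=entropies.__getitem__)

# Hypothetical responses of three pooled classifiers on the same search window.
pool_responses = [
    [0.2, 0.3, 0.25, 0.25],  # flat, ambiguous response
    [0.1, 5.0, 0.1, 0.1],    # one sharp peak
    [1.0, 1.2, 0.9, 1.1],    # nearly flat response
]
best = select_classifier(pool_responses)
```

With these toy inputs the second classifier is selected, because a single dominant peak yields a low-entropy distribution while flat responses approach the maximum entropy log(4).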
Citations
Book ChapterDOI
08 Sep 2018
TL;DR: A novel triplet loss is proposed to extract expressive deep feature for object tracking by adding it into Siamese network framework instead of pairwise loss for training.
Abstract: Object tracking is still a critical and challenging problem with many applications in computer vision. For this challenge, more and more researchers pay attention to applying deep learning to get powerful feature for better tracking accuracy. In this paper, a novel triplet loss is proposed to extract expressive deep feature for object tracking by adding it into Siamese network framework instead of pairwise loss for training. Without adding any inputs, our approach is able to utilize more elements for training to achieve more powerful feature via the combination of original samples. Furthermore, we propose a theoretical analysis by combining comparison of gradients and back-propagation, to prove the effectiveness of our method. In experiments, we apply the proposed triplet loss for three real-time trackers based on Siamese network. And the results on several popular tracking benchmarks show our variants operate at almost the same frame-rate with baseline trackers and achieve superior tracking performance than them, as well as the comparable accuracy with recent state-of-the-art real-time trackers.
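
For readers unfamiliar with the term, the sketch below shows the generic margin-based triplet loss on embedding vectors. It is only the textbook form; the abstract does not specify the exact formulation used in the paper, so the distance function, the margin value, and the example vectors here are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the positive should be closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # distance 1 from the anchor
negative = [3.0, 4.0]  # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive, negative)  # constraint satisfied, loss is zero
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped, constraint violated
```

Unlike a pairwise loss, each term couples an anchor with both a positive and a negative sample, which is how a triplet formulation can draw more training combinations from the same inputs.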

506 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns and introduces a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint.
Abstract: The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.

207 citations


Cites methods from "Occlusion-Aware Real-Time Object Tr..."

  • ...[60] proposed the occlusion-aware real-time object tracking algorithm which exploits the entropy minimization criterion to select the optimal classifiers from a classifier pool....

    [...]

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work proposes a novel hyperparameter optimization method that can find optimal hyperparameters for a given sequence using an action-prediction network leveraged on Continuous Deep Q-Learning.
Abstract: Hyperparameters are numerical presets whose values are assigned prior to the commencement of the learning process. Selecting appropriate hyperparameters is critical for the accuracy of tracking algorithms, yet it is difficult to determine their optimal values, in particular, adaptive ones for each specific video sequence. Most hyperparameter optimization algorithms depend on searching a generic range and they are imposed blindly on all sequences. Here, we propose a novel hyperparameter optimization method that can find optimal hyperparameters for a given sequence using an action-prediction network leveraged on Continuous Deep Q-Learning. Since the common state-spaces for object tracking tasks are significantly more complex than the ones in traditional control problems, existing Continuous Deep Q-Learning algorithms cannot be directly applied. To overcome this challenge, we introduce an efficient heuristic to accelerate the convergence behavior. We evaluate our method on several tracking benchmarks and demonstrate its superior performance.

161 citations


Cites methods from "Occlusion-Aware Real-Time Object Tr..."

  • ...Tracker with heat-map: Our hyperparameter framework can be directly applied to the tracker with heat-map, thus we give several real-time related works including some deep learning trackers [3, 48, 16] and recent correlation filter based trackers [5, 19, 9, 10, 34, 58, 2, 12, 52, 24, 14]....

    [...]

Journal ArticleDOI
TL;DR: A novel tracking method is presented by introducing the attention mechanism into the Siamese network to increase its matching discrimination and a new way to fuse multiscale response maps from each layer to obtain a more accurate position estimation of the object is proposed.
Abstract: Visual tracking addresses the problem of localizing an arbitrary target in video according to the annotated bounding box. In this article, we present a novel tracking method by introducing the attention mechanism into the Siamese network to increase its matching discrimination. We propose a new way to compute attention weights to improve matching performance by a sub-Siamese network [Attention Net (A-Net)], which locates attentive parts for solving the searching problem. In addition, features in higher layers can preserve more semantic information while features in lower layers preserve more location information. Thus, in order to solve the tracking failure cases by the higher layer features, we fully utilize location and semantic information by multilevel features and propose a new way to fuse multiscale response maps from each layer to obtain a more accurate position estimation of the object. We further propose a hierarchical attention Siamese network by combining the attention weights and multilayer integration for tracking. Our method is implemented with a pretrained network which can outperform most well-trained Siamese trackers even without any fine-tuning and online updating. The comparison results with the state-of-the-art methods on popular tracking benchmarks show that our method achieves better performance. Our source code and results will be available at https://github.com/shenjianbing/HASN.

133 citations


Cites background from "Occlusion-Aware Real-Time Object Tr..."

  • ...These handcrafted features [7], [27], [47], [50] are developed by human experience, which may lack generalization compared to CNN features....

    [...]

Journal ArticleDOI
TL;DR: This work introduces a semi-supervised video segmentation approach based on an efficient video representation, called as “super-trajectory”, that is capable of extracting the target objects from complex backgrounds, and even reidentifying them after prolonged occlusions, producing high-quality video object segments.
Abstract: We introduce a semi-supervised video segmentation approach based on an efficient video representation, called as “super-trajectory”. A super-trajectory corresponds to a group of compact point trajectories that exhibit consistent motion patterns, similar appearances, and close spatiotemporal relationships. We generate the compact trajectories using a probabilistic model, which enables handling of occlusions and drifts effectively. To reliably group point trajectories, we adopt the density peaks based clustering algorithm that allows capturing rich spatiotemporal relations among trajectories in the clustering process. We incorporate two intuitive mechanisms for segmentation, called as reverse-tracking and object re-occurrence, for robustness and boosting the performance. Building on the proposed video representation, our segmentation method is discriminative enough to accurately propagate the initial annotations in the first frame onto the remaining frames. Our extensive experimental analyses on three challenging benchmarks demonstrate that, our method is capable of extracting the target objects from complex backgrounds, and even reidentifying them after prolonged occlusions, producing high-quality video object segments. The code and results are available at: https://github.com/wenguanwang/SupertrajectorySeg.

131 citations


Additional excerpts

  • ...Semi-supervised methods often rely on optical flow [35], [36] and share similar spirit with video tracking [7], [37], [64]....

    [...]

References
Journal ArticleDOI
TL;DR: A new kernelized correlation filter is derived, that unlike other kernel algorithms has the exact same complexity as its linear counterpart, which is called dual correlation filter (DCF), which outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite being implemented in a few lines of code.
Abstract: The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies—any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.
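
The claim that a circulant data matrix can be diagonalized with the discrete Fourier transform can be checked numerically. The sketch below is an illustration with assumed names, sizes, and data, not the authors' code: it solves a 1-D ridge regression over all cyclic shifts of a base sample twice, once with a dense linear solve and once element-wise in the Fourier domain. Where the complex conjugate falls in the closed form depends on the shift and DFT sign conventions; the form used here matches `np.roll` and `np.fft.fft`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)  # base sample (a 1-D stand-in for an image patch)
y = rng.standard_normal(n)  # regression targets, one per cyclic shift
lam = 0.1                   # ridge regularisation weight

# Dense circulant data matrix: row i is x cyclically shifted by i.
X = np.stack([np.roll(x, i) for i in range(n)])
w_direct = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Same solution computed element-wise in the Fourier domain:
# X^T X has eigenvalues |x_hat|^2 and X^T y corresponds to x_hat * y_hat.
x_hat = np.fft.fft(x)
y_hat = np.fft.fft(y)
w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
w_fft = np.real(np.fft.ifft(w_hat))

max_abs_diff = float(np.max(np.abs(w_direct - w_fft)))
```

The dense route stores an n-by-n matrix and costs O(n^3) to solve, while the Fourier route needs only length-n vectors and O(n log n) work, which is the storage and computation saving the abstract refers to.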

4,994 citations


"Occlusion-Aware Real-Time Object Tr..." refers background or methods in this paper

  • ...THE FIRST AND SECOND BEST RESULTS ARE SHOWN BY BOLD AND UNDERLINE

    Tracker       CLE    DP      OS      FPS
    CT            87.1   0.355   0.239    47.8
    TLD           52.0   0.588   0.507    29.0
    Struck        46.5   0.688   0.588    12.8
    DFT           72.2   0.481   0.433    11.4
    LOT           61.5   0.496   0.398     0.6
    L1APG         76.2   0.497   0.445     1.1
    SCM           57.8   0.605   0.558     0.2
    CSK           83.7   0.555   0.467   325.8
    CSKCN         58.8   0.661   0.553   180.2
    STC           81.1   0.525   0.349   367.5
    PCOM          86.8   0.457   0.397     7.3
    DSSCF         43.9   0.715   0.660    32.6
    KCF           36.0   0.724   0.617   252.8
    MEEM          19.6   0.840   0.723    15.8
    ROT-SF        34.7   0.760   0.684    35.0
    ROT (ours)    19.4   0.879   0.791    29.0

    Fig....

    [...]

  • ...The correlation-filter-based tracking algorithms include CSK [25], CSK-CN [12], STC [13], DSSCF [32], and KCF [43]....

    [...]

  • ...Recently, in Kernelized Correlation Filter (KCF) [43], Henriques et al. improve their CSK tracker by using HOG features and provide a high speed which is only slightly lower than CSK....

    [...]

  • ...The implementations of CSK [25], CSKCN [12], STC [13], PCOM [38], DSSCF [32], KCF [43], MEEM [30], and Struck [40] are provided by the authors with suggested parameters....

    [...]

  • ...ture, which are substantially the ones with correlation filtering in some special conditions [43]....

    [...]

Proceedings ArticleDOI
23 Jun 2013
TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

3,828 citations


"Occlusion-Aware Real-Time Object Tr..." refers background or methods or result in this paper

  • ...sequences of them are taken from a recent online object tracking benchmark [3]....

    [...]

  • ...To measure the quantitative performance, we use the following measures from OTB [3], the center location error (CLE), distance precision (DP), overlap success (OS) rate, and speed in terms of frames per second (fps)....

    [...]

  • ...Comparisons with state-of-the-art trackers on the challenging Jogging sequence [3], including out-of-plane rotation, significant deformation, and longterm occlusion....

    [...]

  • ...And the other implementation algorithms are directly taken from the benchmark [3]....

    [...]

  • ...Here, we just give short introduction for these evaluation metric, and more details are referred to [3]....

    [...]
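
The four measures named in the excerpts above (CLE, DP, OS, fps) are defined in the benchmark paper [3]. As a rough illustration, the sketch below computes the first three from per-frame bounding boxes in `(x, y, width, height)` form. The 20-pixel distance threshold and 0.5 overlap threshold are the values commonly used with this benchmark, but they are exposed as parameters here, and the box format, function names, and toy data are assumptions.

```python
import math

def center(box):
    """Center point of an (x, y, width, height) box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def center_location_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers."""
    (px, py), (gx, gy) = center(pred), center(gt)
    return math.hypot(px - gx, py - gy)

def overlap(pred, gt):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = pred
    bx, by, bw, bh = gt
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def evaluate(preds, gts, dist_thresh=20.0, overlap_thresh=0.5):
    """Mean CLE, distance precision, and overlap success rate over a sequence."""
    errors = [center_location_error(p, g) for p, g in zip(preds, gts)]
    overlaps = [overlap(p, g) for p, g in zip(preds, gts)]
    n = len(errors)
    return {
        "CLE": sum(errors) / n,
        "DP": sum(e <= dist_thresh for e in errors) / n,
        "OS": sum(o > overlap_thresh for o in overlaps) / n,
    }

# Toy three-frame sequence: exact hit, small drift, complete loss of the target.
gts = [(10, 10, 20, 20), (12, 10, 20, 20), (14, 10, 20, 20)]
preds = [(10, 10, 20, 20), (15, 14, 20, 20), (60, 60, 20, 20)]
scores = evaluate(preds, gts)
```

In this toy sequence two of the three frames fall within both thresholds, so DP and OS are each 2/3, while the single lost frame dominates the mean CLE. That sensitivity to outliers is one reason threshold-based rates are reported alongside the mean error.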

Journal ArticleDOI
TL;DR: A novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection, and develops a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: P-expert estimates missed detections, and N-expert estimates false alarms.
Abstract: This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: (1) P-expert estimates missed detections, and (2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

3,137 citations


"Occlusion-Aware Real-Time Object Tr..." refers background or methods in this paper

  • ...THE FIRST AND SECOND BEST RESULTS ARE SHOWN BY BOLD AND UNDERLINE

    Tracker       CLE    DP      OS      FPS
    CT            87.1   0.355   0.239    47.8
    TLD           52.0   0.588   0.507    29.0
    Struck        46.5   0.688   0.588    12.8
    DFT           72.2   0.481   0.433    11.4
    LOT           61.5   0.496   0.398     0.6
    L1APG         76.2   0.497   0.445     1.1
    SCM           57.8   0.605   0.558     0.2
    CSK           83.7   0.555   0.467   325.8
    CSKCN         58.8   0.661   0.553   180.2
    STC           81.1   0.525   0.349   367.5
    PCOM          86.8   0.457   0.397     7.3
    DSSCF         43.9   0.715   0.660    32.6
    KCF           36.0   0.724   0.617   252.8
    MEEM          19.6   0.840   0.723    15.8
    ROT-SF        34.7   0.760   0.684    35.0
    ROT (ours)    19.4   0.879   0.791    29.0

    Fig....

    [...]

  • ...The other state-of-the-art algorithms are CT [24], DFT [34], L1APG [35], SCM [28], LOT [17], TLD [10], PCOM [38], Struct [23], [40], and MEEM [30]....

    [...]

  • ...Our tracker is also faster than the TLD [10] and the MEEM [30] to redetect the target....

    [...]

  • ...[10], PCOM [38], Struct [23], [40], and MEEM [30]....

    [...]

  • ...There exist several algorithms are able to handle partial and heavy occlusion [8], [36], [37], [39], [41], [23], [10], [30]....

    [...]

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new type of correlation filter is presented, a Minimum Output Sum of Squared Error (MOSSE) filter, which produces stable correlation filters when initialized using a single frame, which enables the tracker to pause and resume where it left off when the object reappears.
Abstract: Although not commonly used, correlation filters can track complex objects through rotations, occlusions and other distractions at over 20 times the rate of current state-of-the-art techniques. The oldest and simplest correlation filters use simple templates and generally fail when applied to tracking. More modern approaches such as ASEF and UMACE perform better, but their training needs are poorly suited to tracking. Visual tracking requires robust filters to be trained from a single frame and dynamically adapted as the appearance of the target object changes. This paper presents a new type of correlation filter, a Minimum Output Sum of Squared Error (MOSSE) filter, which produces stable correlation filters when initialized using a single frame. A tracker based upon MOSSE filters is robust to variations in lighting, scale, pose, and nonrigid deformations while operating at 669 frames per second. Occlusion is detected based upon the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears.
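
The peak-to-sidelobe ratio mentioned in the abstract compares the response peak with the mean and standard deviation of the rest of the response map, excluding a small window around the peak. The sketch below is a minimal illustration on synthetic data: the exclusion half-width of 5 pixels (an 11 by 11 window), the map size, the noise level, and the function name are assumptions, and the threshold at which a tracker should declare occlusion is left to the application.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """PSR = (peak - mean(sidelobe)) / std(sidelobe), sidelobe = map minus a window at the peak."""
    r, c = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    # Mask out a (2 * exclude + 1)-wide square around the peak, clipped at the borders.
    mask[max(r - exclude, 0):r + exclude + 1, max(c - exclude, 0):c + exclude + 1] = False
    sidelobe = response[mask]
    return float((response[r, c] - sidelobe.mean()) / sidelobe.std())

# Synthetic correlation outputs: a clear Gaussian peak plus noise, and noise alone.
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
peak = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))

psr_visible = peak_to_sidelobe_ratio(peak + noise)
psr_occluded = peak_to_sidelobe_ratio(noise)
```

A visible target produces a peak far above the sidelobe statistics, while a noise-only response yields a much lower ratio, so a drop in PSR can be used as a cue to pause model updates until the object reappears.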

2,948 citations


"Occlusion-Aware Real-Time Object Tr..." refers methods in this paper

  • ...Correlation filters have also been widely used in many applications such as object detection and tracking [31], [32]....

    [...]

Book ChapterDOI
07 Oct 2012
TL;DR: Using the well-established theory of Circulant matrices, this work provides a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform, which can be done in the dual space of kernel machines as fast as with linear classifiers.
Abstract: Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of Circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state-of-the-art, can be implemented with only a few lines of code and runs at hundreds of frames-per-second. MATLAB code is provided in the paper (see Algorithm 1).
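
To make the "as fast as with linear classifiers" claim concrete, the sketch below evaluates a Gaussian kernel between a sample and every cyclic shift of another sample using one FFT-based cross-correlation, then compares the result with a brute-force loop. It is a 1-D illustration with assumed names and data, not the MATLAB code from the paper, and the conjugate placement follows the `np.roll` and `np.fft.fft` conventions.

```python
import numpy as np

def gaussian_kernel_correlation(x, z, sigma):
    """k[i] = exp(-||roll(x, i) - z||^2 / sigma^2) for every cyclic shift i, via FFT."""
    # Cross-correlation of x and z for all shifts in O(n log n).
    cross = np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)))
    # ||roll(x, i) - z||^2 = ||x||^2 + ||z||^2 - 2 * cross[i]
    sq_dist = np.dot(x, x) + np.dot(z, z) - 2.0 * cross
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma ** 2)

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
z = rng.standard_normal(16)
sigma = 2.0

k_fft = gaussian_kernel_correlation(x, z, sigma)
# Brute-force reference: shift explicitly and evaluate the kernel one shift at a time.
k_brute = np.array(
    [np.exp(-np.sum((np.roll(x, i) - z) ** 2) / sigma ** 2) for i in range(16)]
)
max_abs_diff = float(np.max(np.abs(k_fft - k_brute)))
```

Because the norm of a vector is unchanged by cyclic shifts, only the cross term varies with the shift, and that term is exactly a circular cross-correlation. This is why the kernel over all shifts costs no more than a linear correlation.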

2,197 citations


"Occlusion-Aware Real-Time Object Tr..." refers background or methods in this paper

  • ...Some existing algorithms [12], [25], [32] have good performance and real-time speed on the videos without occlusion....

    [...]

  • ...circulant structure with kernel (CSK) [25] can handle hundreds of frames per second (fps) for general tracking tasks....

    [...]

  • ...The implementations of CSK [25], CSKCN [12], STC [13], PCOM [38], DSSCF [32], KCF [43], MEEM [30], and Struck [40] are provided by the authors with suggested parameters....

    [...]

  • ...According to the property of circulant matrices [25], the Fast Fourier Transformation (FFT) is applied to minimize the cost function....

    [...]

  • ...Recently, some online learning algorithms with circulant structure [12], [25] are proposed for visual tracking....

    [...]