Real-time compressive tracking
Summary
1 Introduction
- Object tracking remains a challenging problem due to appearance change caused by pose, illumination, occlusion, and motion, among others.
- Because only a few samples are available at the outset, most tracking algorithms assume that the target appearance does not change much during this period.
- As the appearance model is updated with noisy and potentially misaligned examples, tracking drift often results. The method in [7] is an online semi-supervised boosting approach that alleviates drift by labeling only the samples in the first frame and leaving all other samples unlabeled.
- The main components of the authors' compressive tracking algorithm are shown in Figure 1.
- In their appearance model, features are selected by an information-preserving, non-adaptive dimensionality reduction from the multi-scale image feature space, based on compressive sensing theory [12, 13].
2.1 Random projection
- Ideally, the authors expect R to provide a stable embedding that approximately preserves the distance between all pairs of original signals.
- The Johnson-Lindenstrauss lemma [16] states that, with high probability, the distances between points in a vector space are preserved if they are projected onto a randomly selected subspace of suitably high dimension.
- Baraniuk et al. [17] proved that a random matrix satisfying the Johnson-Lindenstrauss lemma also satisfies the restricted isometry property in compressive sensing.
- This strong theoretical support motivates the authors to analyze high-dimensional signals via their low-dimensional random projections.
- In the proposed algorithm, the authors use a very sparse matrix that not only satisfies the Johnson-Lindenstrauss lemma, but also can be efficiently computed for real-time tracking.
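The distance-preservation property described above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a dense Gaussian matrix scaled by 1/sqrt(n) (a standard normalization, chosen here so that lengths are preserved in expectation), and the dimensions m and n are illustrative.

```python
import math
import random

def gaussian_projection(n_rows, m_cols, rng):
    """Dense random matrix with N(0, 1) entries, scaled by 1/sqrt(n_rows)."""
    scale = 1.0 / math.sqrt(n_rows)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(m_cols)]
            for _ in range(n_rows)]

def project(R, x):
    """Compute v = R x for a matrix stored as a list of rows."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

rng = random.Random(0)
m, n = 1000, 200  # original and compressed dimensions (illustrative values)
R = gaussian_projection(n, m, rng)
x1 = [rng.random() for _ in range(m)]
x2 = [rng.random() for _ in range(m)]

# A ratio close to 1 means the pairwise distance survived the projection.
ratio = distance(project(R, x1), project(R, x2)) / distance(x1, x2)
print(f"projected / original distance: {ratio:.3f}")
```

With n = 200 the ratio typically stays within a few percent of 1; shrinking n makes the spread visibly larger, which is the trade-off the lemma quantifies.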
2.2 Random measurement matrix
- When the measurement matrix is a dense random Gaussian matrix, the memory and computational loads remain large when m is large.
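- The authors adopt a very sparse random measurement matrix with entries defined as
  $$r_{ij} = \sqrt{s} \times \begin{cases} 1 & \text{with probability } \frac{1}{2s}, \\ 0 & \text{with probability } 1 - \frac{1}{s}, \\ -1 & \text{with probability } \frac{1}{2s}. \end{cases} \qquad (2)$$
  Achlioptas [16] proved that this type of matrix with s = 2 or 3 satisfies the Johnson-Lindenstrauss lemma.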
- This matrix is very easy to compute, requiring only a uniform random number generator.
- Because each row of R has only a small number c of nonzero entries, the computational complexity is only O(cn), which is very low.
- Furthermore, only the nonzero entries of R need to be stored, so the memory requirement is also very light.
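A sparse measurement matrix of this kind can be sampled with one uniform draw per entry. The sketch below is illustrative only: it assumes entries take the values +sqrt(s), 0, and -sqrt(s) with probabilities 1/(2s), 1 - 1/s, and 1/(2s), and the function names and the (column, value) storage layout are hypothetical choices rather than the authors' implementation.

```python
import math
import random

def sparse_measurement_matrix(n_rows, m_cols, s, rng):
    """Sample R row by row, storing only nonzero entries as (column, value)."""
    value = math.sqrt(s)
    rows = []
    for _ in range(n_rows):
        row = []
        for j in range(m_cols):
            u = rng.random()             # one uniform draw per entry
            if u < 1.0 / (2 * s):
                row.append((j, value))   # +sqrt(s) with probability 1/(2s)
            elif u < 1.0 / s:
                row.append((j, -value))  # -sqrt(s) with probability 1/(2s)
            # otherwise the entry is 0 (probability 1 - 1/s) and is not stored
        rows.append(row)
    return rows

def sparse_project(rows, x):
    """v_i is the sum of r_ij * x_j over the stored nonzero entries of row i."""
    return [sum(val * x[j] for j, val in row) for row in rows]

rng = random.Random(1)
m, n, s = 1000, 50, 3  # illustrative sizes; s = 3 is one of the values in the text
R = sparse_measurement_matrix(n, m, s, rng)
x = [rng.random() for _ in range(m)]
v = sparse_project(R, x)

nonzero_fraction = sum(len(row) for row in R) / (n * m)
print(len(v), round(nonzero_fraction, 3))
```

Storing only the nonzero entries means the cost of each projection scales with the number of stored entries rather than with n times m. A larger s gives a sparser matrix and a cheaper projection.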
3 Proposed Algorithm
- The authors assume that the tracking window in the first frame has been determined.
- At each frame, the authors draw positive samples near the current target location and negative samples far from the object center to update the classifier.
- To predict the object location in the next frame, the authors draw samples around the current target location and select the one with the maximal classification score.
- In the paper's illustration of the matrix R, dark, gray, and white rectangles represent negative, positive, and zero entries, respectively.
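The sampling and detection steps above can be sketched as follows. This is a simplified illustration under stated assumptions: positions are sampled on a pixel grid by distance from the current center, all radii are illustrative, and `toy_score` is a hypothetical stand-in for the trained classifier.

```python
import math

def sample_positions(center, inner, outer, step=1):
    """Grid positions whose distance d from center satisfies inner <= d < outer."""
    cx, cy = center
    r = int(math.ceil(outer))
    positions = []
    for dx in range(-r, r + 1, step):
        for dy in range(-r, r + 1, step):
            if inner <= math.hypot(dx, dy) < outer:
                positions.append((cx + dx, cy + dy))
    return positions

def locate(candidates, score):
    """Return the candidate position with the maximal classifier score."""
    return max(candidates, key=score)

center = (100, 80)
positives = sample_positions(center, 0, 4)           # close to the target
negatives = sample_positions(center, 8, 30, step=4)  # far from the target
candidates = sample_positions(center, 0, 20)         # search region, next frame

# Hypothetical score: highest at an assumed new target location.
true_next = (105, 77)

def toy_score(p):
    return -math.hypot(p[0] - true_next[0], p[1] - true_next[1])

print(locate(candidates, toy_score))
```

In a real tracker the score would come from the classifier applied to the compressed features of the patch at each candidate position.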
3.1 Efficient dimensionality reduction
- The random matrix R needs to be computed only once off-line and remains fixed throughout the tracking process.
- For the sparse matrix R in (2), the computational load is very light.
- The low-dimensional vector v can then be computed efficiently by using R to sparsely measure the rectangular features, which are themselves computed efficiently with the integral image method [20].
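To illustrate why rectangular features are cheap, the sketch below builds an integral image and evaluates a rectangle sum with four lookups. The `compressive_feature` helper, which combines a few signed rectangle sums into one entry of v, is a hypothetical simplification of how a sparse row of R might be applied.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def compressive_feature(ii, rects):
    """One entry of v: a signed, weighted sum of rectangle sums."""
    return sum(weight * rect_sum(ii, x, y, w, h)
               for weight, x, y, w, h in rects)

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)

print(rect_sum(ii, 1, 0, 2, 2))  # 2 + 3 + 6 + 7 = 18
# One positive and one negative rectangle: 14 - 38 = -24.0
print(compressive_feature(ii, [(1.0, 0, 0, 2, 2), (-1.0, 2, 1, 2, 2)]))
```

Each rectangle sum costs four array reads regardless of the rectangle's size, so the cost of one compressed feature depends only on how many nonzero weights its row of R contains.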
3.2 Analysis of low-dimensional compressive features
- As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8].
- The basic types of these Haar-like features are typically designed for different tasks [20, 21].
- The heavy computational load of a very large Haar-like feature set is alleviated by boosting algorithms that select the important features [20, 21].
- In the authors' work, the large set of Haar-like features is compressively sensed with a very sparse measurement matrix.
- Therefore, the authors can classify the projected features in the compressed domain efficiently, without the curse of dimensionality.
3.3 Classifier construction and update
- Diaconis and Freedman [23] showed that the random projections of high dimensional random vectors are almost always Gaussian.
- The classifier parameter update equations (6) can be easily derived by maximum likelihood estimation.
- Figure 3 shows the probability distributions for three different features of the positive and negative samples cropped from a few frames of a sequence for clarity of presentation.
- The main steps of their algorithm are summarized in Algorithm 1.
- Sample two sets of image patches around the tracked location: positive samples near it and negative samples farther away.
- Extract the features from these two sets of samples and update the classifier parameters according to (6).
- Output: tracking location l_t and classifier parameters.
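The classifier construction and update can be sketched as below. The summary does not reproduce the classifier or update equation (6), so everything here is an assumption: a naive Bayes classifier with one Gaussian per feature and class, equal class priors, and a running update that blends old parameters with the statistics of the new samples using a learning rate. The class name, the value 0.85, and the toy feature vectors are all hypothetical.

```python
import math

class GaussianNaiveBayes:
    """Per-feature Gaussians for the positive (1) and negative (0) class."""

    def __init__(self, n_features, learning_rate=0.85):
        self.lr = learning_rate  # weight kept on the old parameters
        self.mu = {1: [0.0] * n_features, 0: [0.0] * n_features}
        self.sigma = {1: [1.0] * n_features, 0: [1.0] * n_features}

    def update(self, samples, label):
        """Blend old parameters with the mean and variance of new samples."""
        lr = self.lr
        for i in range(len(self.mu[label])):
            column = [v[i] for v in samples]
            mean = sum(column) / len(column)
            var = sum((c - mean) ** 2 for c in column) / len(column)
            old_mu, old_sigma = self.mu[label][i], self.sigma[label][i]
            self.sigma[label][i] = math.sqrt(
                lr * old_sigma ** 2 + (1 - lr) * var
                + lr * (1 - lr) * (old_mu - mean) ** 2)
            self.mu[label][i] = lr * old_mu + (1 - lr) * mean

    def _log_pdf(self, x, label, i):
        sigma = max(self.sigma[label][i], 1e-6)  # guard against zero variance
        z = (x - self.mu[label][i]) / sigma
        return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

    def score(self, v):
        """Sum of per-feature log-likelihood ratios (equal class priors)."""
        return sum(self._log_pdf(x, 1, i) - self._log_pdf(x, 0, i)
                   for i, x in enumerate(v))

clf = GaussianNaiveBayes(n_features=2)
for _ in range(30):  # repeated updates let the running estimates settle
    clf.update([[5.0, 1.0], [6.0, 1.2], [5.5, 0.9]], label=1)
    clf.update([[0.0, -1.0], [0.5, -1.1], [-0.3, -0.8]], label=0)

print(clf.score([5.4, 1.0]) > 0, clf.score([0.1, -1.0]) < 0)
```

A positive score means the feature vector is more likely under the positive-class Gaussians, so the tracker would pick the candidate patch with the largest score.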
3.4 Discussion
- The authors note that simplicity is the prime characteristic of their algorithm: the proposed sparse measurement matrix R is independent of any training samples, which makes the method very efficient.
- Their algorithm differs from the recently proposed ℓ1-tracker [10] and the compressive sensing tracker [9].
- The sample in the red rectangle is the most "correct" positive sample, while the other two, in yellow rectangles, are less "correct" positive samples.
- These methods need to update the appearance models frequently for robust tracking.
- Similar representations, e.g., local binary patterns [26] and generalized Haar-like features [8], have been shown to be more effective in handling occlusion.
4 Experiments
- The authors evaluate their tracking algorithm against 7 state-of-the-art methods on 20 challenging sequences, of which 16 are publicly available and 4 are their own.
- The Animal, Shaking, and Soccer sequences are provided in [28], and the Box and Jumping sequences are from [29].
- The authors note that the source code of [9] is not available for evaluation and that the implementation requires some technical details and parameters not discussed therein.
- It is worth noting that the authors use the most challenging sequences from existing works.
- For the compared trackers, the authors either use the tuned parameters from the source code or set them empirically for best results.
4.2 Experimental results
- All of the video frames are in grayscale, and the authors use two metrics to evaluate the proposed algorithm against 7 state-of-the-art trackers.
- The authors note that although the TLD tracker is able to relocate the target during tracking, it easily loses the target completely for some frames in most of the test sequences.
- For the David indoor sequence shown in Figure 5 (a), the illumination and pose of the object both change gradually.
- The MILTrack, TLD and Struck methods perform well on this sequence.
- In addition, their tracker performs well on the Sylvester and Panda sequences, in which the target objects undergo significant pose changes (see the supplementary material for details).
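The summary does not name the two evaluation metrics. As an illustration only, the sketch below computes two measures that are common in tracking evaluation: the center location error and a success rate based on bounding-box overlap. Treating these as the metrics used here is an assumption, and the (x, y, w, h) box format and the 0.5 overlap threshold are hypothetical choices.

```python
import math

def center_error(box_a, box_b):
    """Euclidean distance between box centers; boxes are (x, y, w, h)."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return math.hypot(ax - bx, ay - by)

def overlap_ratio(box_a, box_b):
    """Intersection area divided by union area of two boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(tracked, truth, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds the threshold."""
    hits = sum(overlap_ratio(t, g) > threshold for t, g in zip(tracked, truth))
    return hits / len(truth)

truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
tracked = [(1, 0, 10, 10), (8, 0, 10, 10)]  # second frame has drifted

errors = [center_error(t, g) for t, g in zip(tracked, truth)]
print(sum(errors) / len(errors), success_rate(tracked, truth))
```

For trackers that involve randomness, these per-sequence values would be averaged over repeated runs.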
5 Concluding Remarks
- The authors proposed a simple yet robust tracking algorithm with an appearance model based on non-adaptive random projections that preserve the structure of original image space.
- A very sparse measurement matrix was adopted to efficiently compress features from the foreground targets and background ones.
- The tracking task was formulated as a binary classification problem with online update in the compressed domain.
- The authors' algorithm combines the merits of generative and discriminative appearance models to account for scene changes.
- Numerous experiments with state-of-the-art algorithms on challenging sequences demonstrated that the proposed algorithm performs well in terms of accuracy, robustness, and speed.
Frequently Asked Questions (13)
Q2. What is the tracker for the Sylvester sequence?
The generative subspace tracker (e.g., IVT [6]) has been shown to be effective in dealing with large illumination changes while the discriminative tracking method with local features (i.e., MILTrack [8]) has been demonstrated to handle pose variation adequately.
Q3. What is the way to measure the distance between the original signals?
Ideally, the authors expect R to provide a stable embedding that approximately preserves the distance between all pairs of original signals.
Q4. What is the main component of their appearance model?
Their appearance model is generative as the object can be well represented based on the features extracted in the compressive domain.
Q5. How can the authors compute the relative intensity difference in a linear way?
As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8].
Q6. What is the implementation of the tracker?
Their tracker is implemented in MATLAB, which runs at 35 frames per second (FPS) on a Pentium Dual-Core 2.80 GHz CPU with 4 GB RAM.
Q7. What is the purpose of the tracker?
Both their tracker and the MILTrack method are designed to handle object location ambiguity in tracking with classifiers and discriminative features.
Q8. Why does the TLD tracker suffer from the same problem?
Because the TLD tracker relies heavily on the visual information in the first frame to re-detect the object, it also suffers from the same problem.
Q9. How do the authors represent each filtered image as a column vector in R^{wh}?
The authors represent each filtered image as a column vector in R^{wh} and then concatenate these vectors into a very high-dimensional multi-scale image feature vector x = (x_1, ..., x_m)^T ∈ R^m.
Q10. How many times do the authors run the trackers?
Since all of the trackers except for Frag involve randomness, the authors run them 10 times and report the average result for each video clip.
Q11. What is the classification of the generative tracking algorithm?
Tracking algorithms can be generally categorized as either generative [1, 2, 6, 10, 9] or discriminative [3–5, 7, 8] based on their appearance models.
Q12. What is the tracker for the Shaking sequence?
For the Shaking sequence shown in Figure 5(b), when the stage light changes drastically and the pose of the subject changes rapidly as he performs, all the other trackers fail to track the object reliably.
Q13. What is the problem with the appearance model?
As the appearance model is updated with noisy and potentially misaligned examples, this often leads to the tracking drift problem.