What is the tracker for the sylvester sequence?

The generative subspace tracker (e.g., IVT [6]) has been shown to be effective in dealing with large illumination changes while the discriminative tracking method with local features (i.e., MILTrack [8]) has been demonstrated to handle pose variation adequately.

What is the implementation of the tracker?

Their tracker is implemented in MATLAB, which runs at 35 frames per second (FPS) on a Pentium Dual-Core 2.80 GHz CPU with 4 GB RAM.

What is the purpose of the tracker?

Both their tracker and the MILTrack method are designed to handle object location ambiguity in tracking with classifiers and discriminative features.

Why does the TLD tracker suffer from the same problem?

Because the TLD tracker relies heavily on the visual information in the first frame to re-detect the object, it also suffers from the same problem.

How do the authors represent each filtered image as a column vector in $\mathbb{R}^{wh}$?

The authors represent each filtered image as a column vector in $\mathbb{R}^{wh}$ and then concatenate these vectors as a very high-dimensional multi-scale image feature vector $x = (x_1, \ldots, x_m)^\top \in \mathbb{R}^m$.

How many times do the authors run the trackers?

Since all of the trackers except for Frag involve randomness, the authors run them 10 times and report the average result for each video clip.

What is the tracker for the Shaking sequence?

For the Shaking sequence shown in Figure 5(b), when the stage light changes drastically and the pose of the subject changes rapidly as he performs, all the other trackers fail to track the object reliably.

(Open Access) Real-time compressive tracking (2012) | Kaihua Zhang

Q: What contributions have the authors mentioned in the paper "Real-time compressive tracking" ?

In this paper, the authors propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from the multi-scale image feature space with data-independent basis.

Q: What is the way to measure the distance between the original signals?

The authors expect that R provides a stable embedding that approximately preserves the distance between all pairs of original signals.

Q: What is the main component of their appearance model?

Their appearance model is generative as the object can be well represented based on the features extracted in the compressive domain.

Q: How can the authors compute the relative intensity difference in a linear way?

As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8].

Real-time Compressive Tracking

Kaihua Zhang, Lei Zhang, and Ming-Hsuan Yang

Depart. of Computing, Hong Kong Polytechnic University

Electrical Engineering and Computer Science, University of California at Merced

{cskhzhang,cslzhang}@comp.polyu.edu.hk, mhyang@ucmerced.edu

Abstract. It is a challenging task to develop effective and efficient appearance models for robust object tracking due to factors such as pose variation, illumination change, occlusion, and motion blur. Existing online tracking algorithms often update models with samples from observations in recent frames. While much success has been demonstrated, numerous issues remain to be addressed. First, while these adaptive appearance models are data-dependent, there does not exist a sufficient amount of data for online algorithms to learn at the outset. Second, online tracking algorithms often encounter drift problems. As a result of self-taught learning, mis-aligned samples are likely to be added and degrade the appearance models. In this paper, we propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from the multi-scale image feature space with a data-independent basis. Our appearance model employs non-adaptive random projections that preserve the structure of the image feature space of objects. A very sparse measurement matrix is adopted to efficiently extract the features for the appearance model. We compress samples of foreground targets and the background using the same sparse measurement matrix. The tracking task is formulated as a binary classification via a naive Bayes classifier with online update in the compressed domain. The proposed compressive tracking algorithm runs in real-time and performs favorably against state-of-the-art algorithms on challenging sequences in terms of efficiency, accuracy and robustness.

1 Introduction

Although numerous algorithms have been proposed in the literature, object tracking remains a challenging problem due to appearance change caused by pose, illumination, occlusion, and motion, among others. An effective appearance model is of prime importance for the success of a tracking algorithm, and this topic has been attracting much attention in recent years [1–10]. Tracking algorithms can be generally categorized as either generative [1, 2, 6, 10, 9] or discriminative [3–5, 7, 8] based on their appearance models.

Generative tracking algorithms typically learn a model to represent the target object and then use it to search for the image region with minimal reconstruction error. Black et al. [1] learn an off-line subspace model to represent the object of interest for tracking. The IVT method [6] utilizes an incremental subspace model to adapt to appearance changes. Recently, sparse representation has been used in the $\ell_1$-tracker where an object is modeled by a sparse linear combination of target and trivial templates [10]. However, the computational complexity of this tracker is rather high, thereby limiting its applications in real-time scenarios. Li et al. [9] further extend the $\ell_1$-tracker by using the orthogonal matching pursuit algorithm for solving the optimization problems efficiently. Despite much demonstrated success of these online generative tracking algorithms, several problems remain to be solved. First, numerous training samples cropped from consecutive frames are required in order to learn an appearance model online. Since there are only a few samples at the outset, most tracking algorithms assume that the target appearance does not change much during this period. However, if the appearance of the target changes significantly at the beginning, the drift problem is likely to occur. Second, when multiple samples are drawn at the current target location, drift is likely to occur as the appearance model needs to adapt to these potentially mis-aligned examples [8]. Third, these generative algorithms do not use the background information, which is likely to improve tracking stability and accuracy.

Discriminative algorithms pose the tracking problem as a binary classification task in order to find the decision boundary for separating the target object from the background. Avidan [3] extends the optical flow approach with a support vector machine classifier for object tracking. Collins et al. [4] demonstrate that the most discriminative features can be learned online to separate the target object from the background. Grabner et al. [5] propose an online boosting algorithm to select features for tracking. However, these trackers [3–5] only use one positive sample (i.e., the current tracker location) and a few negative samples when updating the classifier. As the appearance model is updated with noisy and potentially misaligned examples, this often leads to the tracking drift problem. Grabner et al. [7] propose an online semi-supervised boosting method to alleviate the drift problem in which only the samples in the first frame are labeled and all the other samples are unlabeled. Babenko et al. [8] introduce multiple instance learning into online tracking where samples are considered within positive and negative bags or sets. Recently, a semi-supervised learning approach [11] has been developed in which positive and negative samples are selected via an online classifier with structural constraints.

In this paper, we propose an effective and efficient tracking algorithm with an appearance model based on features extracted in the compressed domain. The main components of our compressive tracking algorithm are shown by Figure 1. Our appearance model is generative as the object can be well represented based on the features extracted in the compressive domain. It is also discriminative because we use these features to separate the target from the surrounding background via a naive Bayes classifier. In our appearance model, features are selected by an information-preserving and non-adaptive dimensionality reduction from the multi-scale image feature space based on compressive sensing theories [12, 13]. It has been demonstrated that a small number of randomly generated linear measurements can preserve most of the salient information and allow almost

[Figure 1: block diagram of the tracking pipeline.

(a) Updating classifier at the t-th frame: Frame(t), Samples, Multiscale filter bank, Multiscale image features, Sparse measurement matrix, Compressed vectors, Classifier.

(b) Tracking at the (t + 1)-th frame: Frame(t+1), Multiscale filter bank, Multiscale image features, Sparse measurement matrix, Compressed vectors, Classifier, Sample with maximal classifier response.]

Fig. 1. Main components of our compressive tracking algorithm.

perfect reconstruction of the signal if the signal is compressible such as natural images or audio [12–14]. We use a very sparse measurement matrix that satisfies the restricted isometry property (RIP) [15], thereby facilitating efficient projection from the image feature space to a low-dimensional compressed subspace. For tracking, the positive and negative samples are projected (i.e., compressed) with the same sparse measurement matrix and discriminated by a simple naive Bayes classifier learned online. The proposed compressive tracking algorithm runs in real-time and performs favorably against state-of-the-art trackers on challenging sequences in terms of efficiency, accuracy and robustness.

2 Preliminaries

We present some preliminaries of compressive sensing which are used in the proposed tracking algorithm.

2.1 Random projection

A random matrix $R \in \mathbb{R}^{n \times m}$ whose rows have unit length projects data from the high-dimensional image space $x \in \mathbb{R}^m$ to a lower-dimensional space $v \in \mathbb{R}^n$

$$v = Rx, \tag{1}$$

where $n \ll m$. Ideally, we expect R to provide a stable embedding that approximately preserves the distance between all pairs of original signals. The Johnson-Lindenstrauss lemma [16] states that with high probability the distances between the points in a vector space are preserved if they are projected onto a randomly selected subspace with suitably high dimensions. Baraniuk et al. [17] proved that a random matrix satisfying the Johnson-Lindenstrauss lemma also satisfies the restricted isometry property in compressive sensing. Therefore, if the random matrix R in (1) satisfies the Johnson-Lindenstrauss lemma, we can reconstruct x with minimum error from v with high probability if x is compressible, such as an audio or image signal. We can ensure that v preserves almost all the information in x. This very strong theoretical support motivates us to analyze high-dimensional signals via their low-dimensional random projections. In the proposed algorithm, we use a very sparse matrix that not only satisfies the Johnson-Lindenstrauss lemma, but also can be efficiently computed for real-time tracking.
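The distance-preservation property described above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a dense Gaussian matrix scaled by $1/\sqrt{n}$ (a common normalization chosen here so that squared norms are preserved in expectation, which differs from the unit-length-row convention in the text), and the sizes `m`, `n`, `k` and the helper names are illustrative.

```python
import numpy as np

def random_projection_matrix(n, m, rng):
    """Dense Gaussian matrix, scaled so that E[||Rx||^2] = ||x||^2 (assumed normalization)."""
    return rng.standard_normal((n, m)) / np.sqrt(n)

def pairwise_distance_ratios(X, R):
    """Ratio of projected to original distance for every pair of columns of X."""
    V = R @ X
    k = X.shape[1]
    ratios = []
    for a in range(k):
        for b in range(a + 1, k):
            original = np.linalg.norm(X[:, a] - X[:, b])
            projected = np.linalg.norm(V[:, a] - V[:, b])
            ratios.append(projected / original)
    return np.array(ratios)

rng = np.random.default_rng(0)
m, n, k = 5000, 500, 20          # illustrative sizes with n << m
X = rng.standard_normal((m, k))  # k random points in the high-dimensional space
R = random_projection_matrix(n, m, rng)
ratios = pairwise_distance_ratios(X, R)
print(ratios.min(), ratios.max())  # both should be close to 1
```

Ratios that stay close to 1 for all pairs indicate that the projection approximately preserves pairwise distances, which is the behaviour the Johnson-Lindenstrauss lemma predicts for suitably large n.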

2.2 Random measurement matrix

A typical measurement matrix satisfying the restricted isometry property is the random Gaussian matrix $R \in \mathbb{R}^{n \times m}$ where $r_{ij} \sim N(0, 1)$, as used in numerous works recently [14, 9, 18]. However, as the matrix is dense, the memory and computational loads are still large when m is large. In this paper, we adopt a very sparse random measurement matrix with entries defined as

$$r_{ij} = \sqrt{s} \times \begin{cases} 1 & \text{with probability } \frac{1}{2s} \\ 0 & \text{with probability } 1 - \frac{1}{s} \\ -1 & \text{with probability } \frac{1}{2s} \end{cases} \tag{2}$$

Achlioptas [16] proved that this type of matrix with s = 2 or 3 satisfies the Johnson-Lindenstrauss lemma. This matrix is very easy to compute, requiring only a uniform random generator. More importantly, when s = 3, it is very sparse and two thirds of the computation can be avoided. In addition, Li et al. [19] showed that for s = O(m) ($x \in \mathbb{R}^m$), this matrix is asymptotically normal. Even when s = m/log(m), the random projections are almost as accurate as the conventional random projections where $r_{ij} \sim N(0, 1)$. In this work, we set s = m/4, which makes a very sparse random matrix. For each row of R, only about c, c ≤ 4, entries need to be computed. Therefore, the computational complexity is only O(cn), which is very low. Furthermore, we only need to store the nonzero entries of R, which makes the memory requirement also very light.
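A sparse matrix of the form in (2) can be sampled and stored row by row as (index, value) pairs. The sketch below is one possible way to do this under stated assumptions, not the authors' implementation: the function names `sparse_measurement_rows` and `project` are hypothetical, and the number of nonzeros per row is drawn from a binomial distribution, which is equivalent to drawing each entry independently with nonzero probability 1/s.

```python
import numpy as np

def sparse_measurement_rows(n, m, s, rng):
    """Sample n rows following (2), keeping only the nonzero entries of each row."""
    rows = []
    for _ in range(n):
        # Each entry is nonzero with probability 1/s, so the count is Binomial(m, 1/s).
        count = rng.binomial(m, 1.0 / s)
        idx = rng.choice(m, size=count, replace=False)
        # Nonzero entries are +sqrt(s) or -sqrt(s) with equal probability.
        signs = rng.choice([-1.0, 1.0], size=count)
        rows.append((idx, np.sqrt(s) * signs))
    return rows

def project(rows, x):
    """Compute v = R x while touching only the stored nonzero entries."""
    return np.array([vals @ x[idx] for idx, vals in rows])

rng = np.random.default_rng(0)
m, n = 10_000, 50                      # illustrative sizes
rows = sparse_measurement_rows(n, m, s=m / 4, rng=rng)
x = rng.standard_normal(m)
v = project(rows, x)
print(v.shape, np.mean([len(idx) for idx, _ in rows]))  # about 4 nonzeros per row on average
```

With s = m/4 the expected number of nonzero entries per row is m/s = 4, so each element of v costs only a handful of multiply-adds regardless of m.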

3 Proposed Algorithm

In this section, we present our tracking algorithm in detail. The tracking problem is formulated as a detection task and our algorithm is shown in Figure 1. We assume that the tracking window in the first frame has been determined. At each frame, we draw positive samples near the current target location and negative samples far away from the object center to update the classifier. To predict the object location in the next frame, we draw some samples around the current target location and determine the one with the maximal classification score.
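The sampling step described above can be sketched as follows. This is only an illustration under assumed parameters: the radii `pos_radius`, `neg_inner`, `neg_outer` and the count `n_neg` are hypothetical values that do not come from the text, and the function names are invented for this example.

```python
import numpy as np

def sample_offsets(inner, outer):
    """Integer (dx, dy) offsets whose distance d from the origin satisfies inner <= d < outer."""
    r = int(np.ceil(outer))
    dx, dy = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    d = np.hypot(dx, dy)
    mask = (d >= inner) & (d < outer)
    return np.stack([dx[mask], dy[mask]], axis=1)

def draw_samples(center, pos_radius=4, neg_inner=8, neg_outer=30, n_neg=50, rng=None):
    """Positive locations near the target, negative locations in a ring farther away."""
    if rng is None:
        rng = np.random.default_rng()
    center = np.asarray(center)
    positives = center + sample_offsets(0, pos_radius)
    ring = sample_offsets(neg_inner, neg_outer)
    # Keep a random subset of the ring so the negative set stays small.
    picks = rng.choice(len(ring), size=min(n_neg, len(ring)), replace=False)
    negatives = center + ring[picks]
    return positives, negatives

positives, negatives = draw_samples((100, 80), rng=np.random.default_rng(0))
print(len(positives), len(negatives))
```

The same offset helper could also generate the candidate locations searched in the next frame, with the classifier score deciding which candidate becomes the new target location.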

[Figure 2: diagram of the sparse matrix R multiplying the vector x to give v, annotated with $v_i = \sum_j r_{ij} x_j$.]

Fig. 2. Graphical representation of compressing a high-dimensional vector x to a low-dimensional vector v. In the matrix R, dark, gray and white rectangles represent negative, positive, and zero entries, respectively. The blue arrows illustrate that one of the nonzero entries of one row of R sensing an element in x is equivalent to a rectangle filter convolving the intensity at a fixed position of an input image.

3.1 Efficient dimensionality reduction

For each sample $z \in \mathbb{R}^{w \times h}$, to deal with the scale problem, we represent it by convolving z with a set of rectangle filters at multiple scales $\{h_{1,1}, \ldots, h_{w,h}\}$ defined as

$$h_{i,j}(x, y) = \begin{cases} 1, & 1 \le x \le i,\ 1 \le y \le j \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where i and j are the width and height of a rectangle filter, respectively. Then, we represent each filtered image as a column vector in $\mathbb{R}^{wh}$ and concatenate these vectors as a very high-dimensional multi-scale image feature vector $x = (x_1, \ldots, x_m)^\top \in \mathbb{R}^m$ where $m = (wh)^2$. The dimensionality m is typically in the order of $10^6$ to $10^{10}$. We adopt a sparse random matrix R in (2) with s = m/4 to project x onto a vector $v \in \mathbb{R}^n$ in a low-dimensional space. The random matrix R needs to be computed only once off-line and remains fixed throughout the tracking process. For the sparse matrix R in (2), the computational load is very light. As shown by Figure 2, we only need to store the nonzero entries in R and the positions of rectangle filters in an input image corresponding to the nonzero entries in each row of R. Then, v can be efficiently computed by using R to sparsely measure the rectangular features, which can be efficiently computed using the integral image method [20].
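The integral image idea referred to above can be sketched as follows. This is an illustrative example rather than the authors' code: the function names and the (top, left, height, width) rectangle convention are assumptions, and the rectangles and weights in the usage lines are arbitrary.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def compressive_feature(ii, rects, weights):
    """Weighted sum of rectangle responses, i.e. one element v_i of the compressed vector."""
    return sum(w * rect_sum(ii, *rect) for rect, w in zip(rects, weights))

img = np.arange(30).reshape(5, 6)
ii = integral_image(img)
rects = [(0, 0, 2, 2), (2, 3, 3, 3)]   # hypothetical (top, left, height, width) rectangles
weights = [1.0, -1.0]                  # signs play the role of the nonzero entries of R
print(compressive_feature(ii, rects, weights))
```

Each rectangle response is the output of one rectangle filter at one position, so a row of R with a few nonzero entries reduces to a few constant-time lookups in the table.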

3.2 Analysis of low-dimensional compressive features

As shown in Figure 2, each element $v_i$ in the low-dimensional feature $v \in \mathbb{R}^n$ is a linear combination of spatially distributed rectangle features at different scales. As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8] (see also Figure 2). The Haar-like features have been widely used for object detection with demonstrated success [20, 21, 8]. The basic types of these Haar-like features are typically


References

Rapid object detection using a boosted cascade of simple features

Compressed sensing

Robust Face Recognition via Sparse Representation

Decoding by linear programming

Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?
