Real-time Compressive Tracking
Kaihua Zhang¹, Lei Zhang¹, and Ming-Hsuan Yang²
¹ Dept. of Computing, Hong Kong Polytechnic University
² Electrical Engineering and Computer Science, University of California at Merced
{cskhzhang,cslzhang}@comp.polyu.edu.hk, mhyang@ucmerced.edu
Abstract. It is a challenging task to develop effective and efficient appearance
models for robust object tracking due to factors such as pose variation,
illumination change, occlusion, and motion blur. Existing online tracking
algorithms often update models with samples from observations in recent frames.
While much success has been demonstrated, numerous issues remain to be
addressed. First, while these adaptive appearance models are data-dependent,
there is not a sufficient amount of data for online algorithms to learn from at
the outset. Second, online tracking algorithms often encounter drift problems:
as a result of self-taught learning, mis-aligned samples are likely to be added
and degrade the appearance models. In this paper, we propose a simple yet
effective and efficient tracking algorithm with an appearance model based on
features extracted from the multi-scale image feature space with a
data-independent basis. Our appearance model employs non-adaptive random
projections that preserve the structure of the image feature space of objects.
A very sparse measurement matrix is adopted to efficiently extract the features
for the appearance model. We compress samples of foreground targets and the
background using the same sparse measurement matrix. The tracking task is
formulated as a binary classification via a naive Bayes classifier with online
update in the compressed domain. The proposed compressive tracking algorithm
runs in real time and performs favorably against state-of-the-art algorithms on
challenging sequences in terms of efficiency, accuracy and robustness.
1 Introduction
Although numerous algorithms have been proposed in the literature, object
tracking remains a challenging problem due to appearance changes caused by
pose, illumination, occlusion, and motion, among others. An effective appearance
model is of prime importance for the success of a tracking algorithm, and the
topic has attracted much attention in recent years [1–10]. Tracking algorithms
can generally be categorized as either generative [1, 2, 6, 10, 9] or
discriminative [3–5, 7, 8] based on their appearance models.
Generative tracking algorithms typically learn a model to represent the target
object and then use it to search for the image region with minimal reconstruction
error. Black et al. [1] learn an off-line subspace model to represent the object of
interest for tracking. The IVT method [6] utilizes an incremental subspace model
to adapt to appearance changes. Recently, sparse representation has been used in
the $\ell_1$-tracker, where an object is modeled by a sparse linear combination of target
and trivial templates [10]. However, the computational complexity of this tracker
is rather high, thereby limiting its applications in real-time scenarios. Li et al. [9]
further extend the $\ell_1$-tracker by using the orthogonal matching pursuit algorithm
for solving the optimization problems efficiently. Despite much demonstrated
success of these online generative tracking algorithms, several problems remain
to be solved. First, numerous training samples cropped from consecutive frames
are required in order to learn an appearance model online. Since there are only
a few samples at the outset, most tracking algorithms often assume that the
target appearance does not change much during this period. However, if the
appearance of the target changes significantly at the beginning, the drift problem
is likely to occur. Second, when multiple samples are drawn at the current target
location, it is likely to cause drift as the appearance model needs to adapt to
these potentially mis-aligned examples [8]. Third, these generative algorithms do
not use the background information which is likely to improve tracking stability
and accuracy.
Discriminative algorithms pose the tracking problem as a binary classification
task in order to find the decision boundary for separating the target object from
the background. Avidan [3] extends the optical flow approach with a support
vector machine classifier for object tracking. Collins et al. [4] demonstrate that
the most discriminative features can be learned online to separate the target
object from the background. Grabner et al. [5] propose an online boosting
algorithm to select features for tracking. However, these trackers [3–5] only use
one positive sample (i.e., the current tracker location) and a few negative samples
when updating the classifier. As the appearance model is updated with noisy and
potentially misaligned examples, this often leads to the tracking drift problem.
Grabner et al. [7] propose an online semi-supervised boosting method to
alleviate the drift problem, in which only the samples in the first frame are labeled
and all the other samples are unlabeled. Babenko et al. [8] introduce multiple
instance learning into online tracking, where samples are considered within
positive and negative bags or sets. Recently, a semi-supervised learning approach [11]
has been developed in which positive and negative samples are selected via an online
classifier with structural constraints.
In this paper, we propose an effective and efficient tracking algorithm with an
appearance model based on features extracted in the compressed domain. The
main components of our compressive tracking algorithm are shown in Figure 1.
Our appearance model is generative as the object can be well represented based
on the features extracted in the compressive domain. It is also discriminative
because we use these features to separate the target from the surrounding
background via a naive Bayes classifier. In our appearance model, features are
selected by an information-preserving and non-adaptive dimensionality reduction
from the multi-scale image feature space based on compressive sensing theories
[12, 13]. It has been demonstrated that a small number of randomly generated
linear measurements can preserve most of the salient information and allow almost
perfect reconstruction of the signal if the signal is compressible, such as natural
images or audio [12–14]. We use a very sparse measurement matrix that satisfies
the restricted isometry property (RIP) [15], thereby facilitating efficient
projection from the image feature space to a low-dimensional compressed subspace.
For tracking, the positive and negative samples are projected (i.e., compressed) with
the same sparse measurement matrix and discriminated by a simple naive Bayes
classifier learned online. The proposed compressive tracking algorithm runs in
real time and performs favorably against state-of-the-art trackers on challenging
sequences in terms of efficiency, accuracy and robustness.

Fig. 1. Main components of our compressive tracking algorithm. (a) Updating the
classifier at the t-th frame. (b) Tracking at the (t + 1)-th frame. Block labels
in (a): frame(t), samples, multiscale filter bank, multiscale image features,
sparse measurement matrix, compressed vectors, classifier. Block labels in (b):
frame(t+1), multiscale filter bank, multiscale image features, sparse measurement
matrix, compressed vectors, classifier, sample with maximal classifier response.
2 Preliminaries
We present some preliminaries of compressive sensing which are used in the
proposed tracking algorithm.
2.1 Random projection
A random matrix $R \in \mathbb{R}^{n \times m}$ whose rows have unit length projects data from
the high-dimensional image space $x \in \mathbb{R}^{m}$ to a lower-dimensional space $v \in \mathbb{R}^{n}$:

$$v = Rx, \tag{1}$$

where $n \ll m$. Ideally, we expect R to provide a stable embedding that
approximately preserves the distance between all pairs of original signals. The
Johnson-Lindenstrauss lemma [16] states that with high probability the distances
between the points in a vector space are preserved if they are projected onto a
randomly selected subspace with suitably high dimensions. Baraniuk et al. [17]
proved that a random matrix satisfying the Johnson-Lindenstrauss lemma also
satisfies the restricted isometry property in compressive sensing. Therefore, if
the random matrix R in (1) satisfies the Johnson-Lindenstrauss lemma, we can
reconstruct x with minimum error from v with high probability if x is
compressible, such as an audio or image signal. We can thus ensure that v
preserves almost all the information in x. This very strong theoretical support
motivates us to analyze high-dimensional signals via their low-dimensional
random projections. In the proposed algorithm, we use a very sparse matrix that
not only satisfies the Johnson-Lindenstrauss lemma, but also can be efficiently
computed for real-time tracking.
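The distance-preservation property described above can be checked numerically. The following is a minimal sketch, not the authors' code: it uses a dense Gaussian matrix, and the 1/√n scaling, the dimensions, and the helper names (`random_projection_matrix`, `pairwise_distances`) are assumptions chosen for illustration. It projects a few random points as in (1) and compares pairwise distances before and after projection.

```python
import numpy as np

def random_projection_matrix(n, m, rng):
    """Dense Gaussian matrix, scaled (an assumption) so E[||Rx||^2] = ||x||^2."""
    return rng.standard_normal((n, m)) / np.sqrt(n)

def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(points), k=1)
    return dist[upper]

rng = np.random.default_rng(0)
m, n, num_points = 5000, 500, 20          # illustrative sizes, n << m
X = rng.standard_normal((num_points, m))  # rows are high-dimensional signals x
R = random_projection_matrix(n, m, rng)
V = X @ R.T                               # each row is v = R x, as in (1)

# Ratio of projected to original distance for every pair of points.
ratios = pairwise_distances(V) / pairwise_distances(X)
print(ratios.min(), ratios.max())
```

If the embedding is stable, all ratios should stay close to 1 even though the dimension drops by a factor of ten in this setup.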
2.2 Random measurement matrix
A typical measurement matrix satisfying the restricted isometry property is the
random Gaussian matrix $R \in \mathbb{R}^{n \times m}$ where $r_{ij} \sim N(0, 1)$, as used in numerous
recent works [14, 9, 18]. However, as the matrix is dense, the memory and
computational loads are still large when m is large. In this paper, we adopt a
very sparse random measurement matrix with entries defined as

$$r_{ij} = \sqrt{s} \times \begin{cases} 1 & \text{with probability } \frac{1}{2s} \\ 0 & \text{with probability } 1 - \frac{1}{s} \\ -1 & \text{with probability } \frac{1}{2s}. \end{cases} \tag{2}$$

Achlioptas [16] proved that this type of matrix with s = 2 or 3 satisfies the
Johnson-Lindenstrauss lemma. This matrix is very easy to compute, requiring
only a uniform random generator. More importantly, when s = 3, it is
very sparse and two thirds of the computation can be avoided. In addition, Li
et al. [19] showed that for $s = O(m)$ ($x \in \mathbb{R}^{m}$), this matrix is asymptotically
normal. Even when $s = m / \log(m)$, the random projections are almost as
accurate as the conventional random projections where $r_{ij} \sim N(0, 1)$. In this work,
we set $s = m/4$, which makes a very sparse random matrix. For each row of R,
only about c, $c \le 4$, entries need to be computed. Therefore, the computational
complexity is only O(cn), which is very low. Furthermore, we only need to store
the nonzero entries of R, which makes the memory requirement also very light.
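As a rough illustration of (2), the sketch below samples such a matrix with a uniform random generator and counts the nonzero entries per row. The function name `sparse_measurement_matrix` and the sizes are hypothetical, and the matrix is materialised densely only for clarity; a practical implementation would store just the nonzero positions and signs.

```python
import numpy as np

def sparse_measurement_matrix(n, m, s, rng):
    """Sample an n x m matrix with entries sqrt(s) * {+1, 0, -1} following (2)."""
    u = rng.random((n, m))                   # one uniform draw per entry
    signs = np.zeros((n, m))
    signs[u < 1.0 / (2 * s)] = 1.0           # +1 with probability 1/(2s)
    signs[u > 1.0 - 1.0 / (2 * s)] = -1.0    # -1 with probability 1/(2s)
    return np.sqrt(s) * signs                # remaining entries stay 0

rng = np.random.default_rng(0)
n, m = 200, 4000                             # illustrative sizes
s = m / 4                                    # the setting used in the text
R = sparse_measurement_matrix(n, m, s, rng)

nonzeros_per_row = np.count_nonzero(R, axis=1)
print(nonzeros_per_row.mean())               # expected value is m / s = 4
```

With s = m/4 each entry is nonzero with probability 4/m, so a row of length m holds four nonzero entries on average, which matches the small constant c mentioned above.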
3 Proposed Algorithm
In this section, we present our tracking algorithm in detail. The tracking
problem is formulated as a detection task and our algorithm is shown in Figure 1.
We assume that the tracking window in the first frame has been determined. At
each frame, we draw positive samples near the current target location
and negative samples far away from the object center to update the classifier. To
predict the object location in the next frame, we draw some samples around the
current target location and select the one with the maximal classification
score.
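The sampling scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the radii, the sample limits, the helper names `sample_centers` and `locate`, and the stand-in scoring function are all hypothetical and are not taken from the text; image-boundary handling is omitted.

```python
import numpy as np

def sample_centers(center, r_inner, r_outer, max_samples, rng):
    """Integer (x, y) positions whose distance d to center obeys r_inner <= d < r_outer."""
    cx, cy = center
    r = int(np.ceil(r_outer))
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.hypot(xs, ys)
    mask = (dist >= r_inner) & (dist < r_outer)
    pts = np.stack([cx + xs[mask], cy + ys[mask]], axis=1)
    if len(pts) > max_samples:               # subsample when the ring is large
        idx = rng.choice(len(pts), size=max_samples, replace=False)
        pts = pts[idx]
    return pts

def locate(candidates, score_fn):
    """Return the candidate position with the maximal classification score."""
    scores = np.array([score_fn(p) for p in candidates])
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(0)
target = (120, 80)                                   # current target location
positives = sample_centers(target, 0, 4, 100, rng)   # near the target (assumed radius)
negatives = sample_centers(target, 8, 30, 50, rng)   # far from the target (assumed radii)

# Stand-in for the classifier: pretend the object moved to (121, 82).
score = lambda p: -np.hypot(p[0] - 121, p[1] - 82)
candidates = sample_centers(target, 0, 4, 100, rng)
print(locate(candidates, score))
```

In a real tracker the scoring function would be the online classifier evaluated on the compressed features of each candidate window.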

Fig. 2. Graphical representation of compressing a high-dimensional vector x to a
low-dimensional vector v, where $v_i = \sum_j r_{ij} x_j$. In the matrix R, dark, gray and white
rectangles represent negative, positive, and zero entries, respectively. The blue
arrows illustrate that one of nonzero entries of one row of R sensing an element in x
is equivalent to a rectangle filter convolving the intensity at a fixed position of
an input image.
3.1 Efficient dimensionality reduction
For each sample $z \in \mathbb{R}^{w \times h}$, to deal with the scale problem, we represent it by
convolving z with a set of rectangle filters at multiple scales $\{h_{1,1}, \ldots, h_{w,h}\}$
defined as

$$h_{i,j}(x, y) = \begin{cases} 1, & 1 \le x \le i,\ 1 \le y \le j \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where i and j are the width and height of a rectangle filter, respectively. Then,
we represent each filtered image as a column vector in $\mathbb{R}^{wh}$ and concatenate
these vectors as a very high-dimensional multi-scale image feature vector
$x = (x_1, \ldots, x_m)^{\top} \in \mathbb{R}^{m}$ where $m = (wh)^2$. The dimensionality m is typically in the
order of $10^{6}$ to $10^{10}$. We adopt a sparse random matrix R in (2) with $s = m/4$
to project x onto a vector $v \in \mathbb{R}^{n}$ in a low-dimensional space. The random
matrix R needs to be computed only once off-line and remains fixed throughout
the tracking process. For the sparse matrix R in (2), the computational load is
very light. As shown in Figure 2, we only need to store the nonzero entries in
R and the positions of rectangle filters in an input image corresponding to the
nonzero entries in each row of R. Then, v can be efficiently computed by using R
to sparsely measure the rectangular features, which can themselves be efficiently
computed using the integral image method [20].
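To make the computation above concrete: for a 32 × 32 sample, m = (32 · 32)² = 1,048,576, so x is never formed explicitly. The sketch below shows one way the sparse measurement could be carried out with an integral image. It is a simplified illustration, not the authors' implementation: the helper names, the choice of 2 to 4 rectangles per feature, and the use of weights ±1 (dropping the common √s factor) are assumptions.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of img[y:y+h, x:x+w] from four table lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def make_features(n, patch_w, patch_h, rng, max_rects=4):
    """Draw, once and off-line, a few signed rectangles per feature (one row of R)."""
    features = []
    for _ in range(n):
        rects = []
        for _ in range(rng.integers(2, max_rects + 1)):
            x = rng.integers(0, patch_w - 1)
            y = rng.integers(0, patch_h - 1)
            w = rng.integers(1, patch_w - x + 1)   # rectangle stays inside the patch
            h = rng.integers(1, patch_h - y + 1)
            rects.append((x, y, w, h, rng.choice([-1.0, 1.0])))
        features.append(rects)
    return features

def compress(ii, left, top, features):
    """Compressed vector v for the patch whose top-left corner is (left, top)."""
    return np.array([
        sum(wt * rect_sum(ii, left + x, top + y, w, h) for x, y, w, h, wt in rects)
        for rects in features
    ])

rng = np.random.default_rng(0)
img = rng.random((40, 60))                 # stand-in for a grayscale frame
ii = integral_image(img)
features = make_features(50, 16, 12, rng)  # n = 50 features for a 16 x 12 patch
v = compress(ii, 10, 8, features)
print(v.shape)
```

Because the rectangle layout is fixed after `make_features`, the same features can be applied to every positive, negative, and candidate window, and each feature costs only a handful of table lookups.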
3.2 Analysis of low-dimensional compressive features
As shown in Figure 2, each element $v_i$ in the low-dimensional feature $v \in \mathbb{R}^{n}$
is a linear combination of spatially distributed rectangle features at different
scales. As the coefficients in the measurement matrix can be positive or negative
(via (2)), the compressive features compute the relative intensity difference in
a way similar to the generalized Haar-like features [8] (see also Figure 2). The
Haar-like features have been widely used for object detection with demonstrated
success [20, 21, 8]. The basic types of these Haar-like features are typically

References
Proceedings ArticleDOI

Rapid object detection using a boosted cascade of simple features

TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Book

Compressed sensing

TL;DR: It is possible to design n=O(Nlog(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program-Basis Pursuit in signal processing.
Journal ArticleDOI

Robust Face Recognition via Sparse Representation

TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by $\ell_1$-minimization.
Journal ArticleDOI

Decoding by linear programming

TL;DR: F can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program) and numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted.
Journal ArticleDOI

Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?

TL;DR: If the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct f to within very high accuracy from a small number of random measurements by solving a simple linear program.