Book ChapterDOI

Real-time compressive tracking

TL;DR: A simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from the multi-scale image feature space with data-independent basis that performs favorably against state-of-the-art algorithms on challenging sequences in terms of efficiency, accuracy and robustness.
Abstract: It is a challenging task to develop effective and efficient appearance models for robust object tracking due to factors such as pose variation, illumination change, occlusion, and motion blur. Existing online tracking algorithms often update models with samples from observations in recent frames. While much success has been demonstrated, numerous issues remain to be addressed. First, while these adaptive appearance models are data-dependent, there does not exist sufficient amount of data for online algorithms to learn at the outset. Second, online tracking algorithms often encounter the drift problems. As a result of self-taught learning, these mis-aligned samples are likely to be added and degrade the appearance models. In this paper, we propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from the multi-scale image feature space with data-independent basis. Our appearance model employs non-adaptive random projections that preserve the structure of the image feature space of objects. A very sparse measurement matrix is adopted to efficiently extract the features for the appearance model. We compress samples of foreground targets and the background using the same sparse measurement matrix. The tracking task is formulated as a binary classification via a naive Bayes classifier with online update in the compressed domain. The proposed compressive tracking algorithm runs in real-time and performs favorably against state-of-the-art algorithms on challenging sequences in terms of efficiency, accuracy and robustness.

Summary (3 min read)

1 Introduction

  • Object tracking remains a challenging problem due to appearance change caused by pose, illumination, occlusion, and motion, among others.
  • Since there are only a few samples at the outset, most tracking algorithms often assume that the target appearance does not change much during this period.
  • As the appearance model is updated with noisy and potentially misaligned examples, this often leads to the tracking drift problem. Grabner et al. [7] propose an online semi-supervised boosting method to alleviate the drift problem in which only the samples in the first frame are labeled and all the other samples are unlabeled.
  • The main components of their compressive tracking algorithm are shown in Figure 1.
  • In their appearance model, features are selected by an information-preserving and non-adaptive dimensionality reduction from the multi-scale image feature space based on compressive sensing theories [12, 13].

2.1 Random projection

  • Ideally, the authors expect R to provide a stable embedding that approximately preserves the distance between all pairs of original signals.
  • The Johnson-Lindenstrauss lemma [16] states that with high probability the distances between the points in a vector space are preserved if they are projected onto a randomly selected subspace with suitably high dimensions.
  • Baraniuk et al. [17] proved that a random matrix satisfying the Johnson-Lindenstrauss lemma also satisfies the restricted isometry property in compressive sensing.
  • This very strong theoretical support motivates the authors to analyze high-dimensional signals via their low-dimensional random projections.
  • In the proposed algorithm, the authors use a very sparse matrix that not only satisfies the Johnson-Lindenstrauss lemma, but also can be efficiently computed for real-time tracking.

2.2 Random measurement matrix

  • As the matrix is dense, the memory and computational loads are still large when m is large.
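  • The authors adopt a very sparse random measurement matrix with entries defined as $r_{ij} = \sqrt{s} \times \{1 \text{ with probability } 1/(2s);\ 0 \text{ with probability } 1 - 1/s;\ -1 \text{ with probability } 1/(2s)\}$. Achlioptas [16] proved that this type of matrix with s = 2 or 3 satisfies the Johnson-Lindenstrauss lemma.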
  • This matrix is very easy to compute which requires only a uniform random generator.
  • Therefore, the computational complexity is only O(cn) which is very low.
  • Furthermore, the authors only need to store the nonzero entries of R which makes the memory requirement also very light.

3 Proposed Algorithm

  • The authors assume that the tracking window in the first frame has been determined.
  • At each frame, the authors sample some positive samples near the current target location and negative samples far away from the object center to update the classifier.
  • To predict the object location in the next frame, the authors draw some samples around the current target location and determine the one with the maximal classification score.
  • In the matrix R, dark, gray and white rectangles represent negative, positive, and zero entries, respectively.

3.1 Efficient dimensionality reduction

  • The random matrix R needs to be computed only once off-line and remains fixed throughout the tracking process.
  • For the sparse matrix R in (2), the computational load is very light.
  • Then, v can be efficiently computed by using R to sparsely measure the rectangular features which can be efficiently computed using the integral image method [20].

3.2 Analysis of low-dimensional compressive features

  • As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8].
  • The basic types of these Haar-like features are typically designed for different tasks [20, 21].
  • This problem is alleviated by boosting algorithms for selecting important features [20, 21].
  • In their work, the large set of Haar-like features are compressively sensed with a very sparse measurement matrix.
  • Therefore, the authors can classify the projected features in the compressed domain efficiently without curse of dimensionality.

3.3 Classifier construction and update

  • Diaconis and Freedman [23] showed that the random projections of high dimensional random vectors are almost always Gaussian.
  • The above equations can be easily derived by maximal likelihood estimation.
  • Figure 3 shows the probability distributions for three different features of the positive and negative samples cropped from a few frames of a sequence for clarity of presentation.
  • The main steps of their algorithm are summarized in Algorithm 1.
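
The classifier described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the equations the summary refers to are not reproduced here, so the blending rule, the learning rate `lam`, the equal class priors, and the class name `OnlineGaussianNaiveBayes` are all assumptions. Each compressed feature is modeled by one Gaussian per class, the parameters are estimated by maximum likelihood from the new samples, and the old and new estimates are mixed with weight `lam`.

```python
import numpy as np

class OnlineGaussianNaiveBayes:
    """Two-class naive Bayes with one Gaussian per feature and per class."""

    def __init__(self, n_features, lam=0.85):
        self.lam = lam  # hypothetical learning rate, not taken from the text
        self.mu = np.zeros((2, n_features))    # row 0: background, row 1: target
        self.sigma = np.ones((2, n_features))

    def update(self, feats, label):
        """Blend maximum-likelihood estimates from new samples into the model."""
        mu_new, sigma_new = feats.mean(axis=0), feats.std(axis=0)
        lam, mu, sigma = self.lam, self.mu[label], self.sigma[label]
        # Variance of a two-component mixture with weights lam and 1 - lam.
        var = (lam * sigma ** 2 + (1 - lam) * sigma_new ** 2
               + lam * (1 - lam) * (mu - mu_new) ** 2)
        self.mu[label] = lam * mu + (1 - lam) * mu_new
        self.sigma[label] = np.sqrt(var) + 1e-12  # guard against zero spread

    def score(self, feats):
        """Sum over features of log p(v_i | target) - log p(v_i | background)."""
        def log_gauss(x, mu, sigma):
            return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
        return (log_gauss(feats, self.mu[1], self.sigma[1])
                - log_gauss(feats, self.mu[0], self.sigma[0])).sum(axis=-1)

# Toy data: target features centered at +2, background features at -2.
rng = np.random.default_rng(0)
clf = OnlineGaussianNaiveBayes(n_features=10)
for _ in range(30):
    clf.update(rng.normal(2.0, 1.0, size=(40, 10)), label=1)
    clf.update(rng.normal(-2.0, 1.0, size=(40, 10)), label=0)

target_score = clf.score(np.full(10, 2.0))
background_score = clf.score(np.full(10, -2.0))
```

A positive score favors the target class and a negative score favors the background, so the tracker can rank candidate patches by this value.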

3. Sample two sets of image patches

  • Extract the features with these two sets of samples and update the classifier parameters according to (6).
  • Output: tracking location l_t and classifier parameters.

3.4 Discussion

  • The authors note that simplicity is the prime characteristic of their algorithm in which the proposed sparse measurement matrix R is independent of any training samples, thereby resulting in a very efficient method.
  • It should be noted that their algorithm is different from the recently proposed $\ell_1$-tracker [10] and compressive sensing tracker [9].
  • The sample in red rectangle is the most "correct" positive sample while other two in yellow rectangles are less "correct" positive samples.
  • These methods need to update the appearance models frequently for robust tracking.
  • Similar representations, e.g., local binary patterns [26] and generalized Haar-like features [8], have been shown to be more effective in handling occlusion.

4 Experiments

  • The authors evaluate their tracking algorithm against 7 state-of-the-art methods on 20 challenging sequences among which 16 are publicly available and 4 are their own.
  • The Animal, Shaking and Soccer sequences are provided in [28] and the Box and Jumping sequences are from [29].
  • The authors note that the source code of [9] is not available for evaluation and the implementation requires some technical details and parameters not discussed therein.
  • It is worth noticing that the authors use the most challenging sequences from the existing works.
  • For their compared trackers, the authors either use the tuned parameters from the source codes or empirically set them for best results.

4.2 Experimental results

  • All of the video frames are in gray scale and the authors use two metrics to evaluate the proposed algorithm with 7 state-of-the-art trackers.
  • The authors note that although TLD tracker is able to relocate on the target during tracking, it is easy to lose the target completely for some frames in most of the test sequences.
  • For the David indoor sequence shown in Figure 5 (a), the illumination and pose of the object both change gradually.
  • The MILTrack, TLD and Struck methods perform well on this sequence.
  • In addition, their tracker performs well on the Sylvester and Panda sequences in which the target objects undergo significant pose changes (See the supplementary material for details).

5 Concluding Remarks

  • The authors proposed a simple yet robust tracking algorithm with an appearance model based on non-adaptive random projections that preserve the structure of original image space.
  • A very sparse measurement matrix was adopted to efficiently compress features from the foreground targets and background ones.
  • The tracking task was formulated as a binary classification problem with online update in the compressed domain.
  • The authors' algorithm combines the merits of generative and discriminative appearance models to account for scene changes.
  • Numerous experiments with state-of-the-art algorithms on challenging sequences demonstrated that the proposed algorithm performs well in terms of accuracy, robustness, and speed.


Real-time Compressive Tracking
Kaihua Zhang^1, Lei Zhang^1, and Ming-Hsuan Yang^2
^1 Department of Computing, Hong Kong Polytechnic University
^2 Electrical Engineering and Computer Science, University of California at Merced
{cskhzhang,cslzhang}@comp.polyu.edu.hk, mhyang@ucmerced.edu
Abstract. It is a challenging task to develop effective and efficient appearance models for robust object tracking due to factors such as pose variation, illumination change, occlusion, and motion blur. Existing online tracking algorithms often update models with samples from observations in recent frames. While much success has been demonstrated, numerous issues remain to be addressed. First, while these adaptive appearance models are data-dependent, there does not exist sufficient amount of data for online algorithms to learn at the outset. Second, online tracking algorithms often encounter the drift problems. As a result of self-taught learning, these mis-aligned samples are likely to be added and degrade the appearance models. In this paper, we propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from the multi-scale image feature space with data-independent basis. Our appearance model employs non-adaptive random projections that preserve the structure of the image feature space of objects. A very sparse measurement matrix is adopted to efficiently extract the features for the appearance model. We compress samples of foreground targets and the background using the same sparse measurement matrix. The tracking task is formulated as a binary classification via a naive Bayes classifier with online update in the compressed domain. The proposed compressive tracking algorithm runs in real-time and performs favorably against state-of-the-art algorithms on challenging sequences in terms of efficiency, accuracy and robustness.
1 Introduction
Although numerous algorithms have been proposed in the literature, object tracking remains a challenging problem due to appearance change caused by pose, illumination, occlusion, and motion, among others. An effective appearance model is of prime importance for the success of a tracking algorithm, a topic that has been attracting much attention in recent years [1–10]. Tracking algorithms can be generally categorized as either generative [1, 2, 6, 9, 10] or discriminative [3–5, 7, 8] based on their appearance models.
Generative tracking algorithms typically learn a model to represent the target object and then use it to search for the image region with minimal reconstruction error. Black et al. [1] learn an off-line subspace model to represent the object of interest for tracking. The IVT method [6] utilizes an incremental subspace model to adapt to appearance changes. Recently, sparse representation has been used in the $\ell_1$-tracker where an object is modeled by a sparse linear combination of target and trivial templates [10]. However, the computational complexity of this tracker is rather high, thereby limiting its applications in real-time scenarios. Li et al. [9] further extend the $\ell_1$-tracker by using the orthogonal matching pursuit algorithm for solving the optimization problems efficiently. Despite much demonstrated success of these online generative tracking algorithms, several problems remain to be solved. First, numerous training samples cropped from consecutive frames are required in order to learn an appearance model online. Since there are only a few samples at the outset, most tracking algorithms often assume that the target appearance does not change much during this period. However, if the appearance of the target changes significantly at the beginning, the drift problem is likely to occur. Second, when multiple samples are drawn at the current target location, it is likely to cause drift as the appearance model needs to adapt to these potentially mis-aligned examples [8]. Third, these generative algorithms do not use the background information which is likely to improve tracking stability and accuracy.
Discriminative algorithms pose the tracking problem as a binary classification task in order to find the decision boundary for separating the target object from the background. Avidan [3] extends the optical flow approach with a support vector machine classifier for object tracking. Collins et al. [4] demonstrate that the most discriminative features can be learned online to separate the target object from the background. Grabner et al. [5] propose an online boosting algorithm to select features for tracking. However, these trackers [3–5] only use one positive sample (i.e., the current tracker location) and a few negative samples when updating the classifier. As the appearance model is updated with noisy and potentially misaligned examples, this often leads to the tracking drift problem. Grabner et al. [7] propose an online semi-supervised boosting method to alleviate the drift problem in which only the samples in the first frame are labeled and all the other samples are unlabeled. Babenko et al. [8] introduce multiple instance learning into online tracking where samples are considered within positive and negative bags or sets. Recently, a semi-supervised learning approach [11] is developed in which positive and negative samples are selected via an online classifier with structural constraints.
In this paper, we propose an effective and efficient tracking algorithm with an appearance model based on features extracted in the compressed domain. The main components of our compressive tracking algorithm are shown by Figure 1. Our appearance model is generative as the object can be well represented based on the features extracted in the compressive domain. It is also discriminative because we use these features to separate the target from the surrounding background via a naive Bayes classifier. In our appearance model, features are selected by an information-preserving and non-adaptive dimensionality reduction from the multi-scale image feature space based on compressive sensing theories [12, 13]. It has been demonstrated that a small number of randomly generated linear measurements can preserve most of the salient information and allow almost perfect reconstruction of the signal if the signal is compressible such as natural images or audio [12–14]. We use a very sparse measurement matrix that satisfies the restricted isometry property (RIP) [15], thereby facilitating efficient projection from the image feature space to a low-dimensional compressed subspace. For tracking, the positive and negative samples are projected (i.e., compressed) with the same sparse measurement matrix and discriminated by a simple naive Bayes classifier learned online. The proposed compressive tracking algorithm runs at real-time and performs favorably against state-of-the-art trackers on challenging sequences in terms of efficiency, accuracy and robustness.
Fig. 1. Main components of our compressive tracking algorithm. (a) Updating classifier at the t-th frame. Panel labels: Frame(t), Samples, Multiscale filter bank, Multiscale image features, Sparse measurement matrix, Compressed vectors, Classifier. (b) Tracking at the (t + 1)-th frame. Panel labels: Frame(t+1), Multiscale filter bank, Multiscale image features, Sparse measurement matrix, Compressed vectors, Classifier, Sample with maximal classifier response.
2 Preliminaries
We present some preliminaries of compressive sensing which are used in the
proposed tracking algorithm.
2.1 Random projection
A random matrix $R \in \mathbb{R}^{n \times m}$ whose rows have unit length projects data from the high-dimensional image space $x \in \mathbb{R}^m$ to a lower-dimensional space $v \in \mathbb{R}^n$
$$ v = Rx, \tag{1} $$
where $n \ll m$. Ideally, we expect R to provide a stable embedding that approximately preserves the distance between all pairs of original signals. The Johnson-Lindenstrauss lemma [16] states that with high probability the distances between the points in a vector space are preserved if they are projected onto a randomly selected subspace with suitably high dimensions. Baraniuk et al. [17] proved that a random matrix satisfying the Johnson-Lindenstrauss lemma also satisfies the restricted isometry property in compressive sensing. Therefore, if the random matrix R in (1) satisfies the Johnson-Lindenstrauss lemma, we can reconstruct x with minimum error from v with high probability if x is compressible, such as an audio or image signal. We can ensure that v preserves almost all the information in x. This very strong theoretical support motivates us to analyze high-dimensional signals via their low-dimensional random projections. In the proposed algorithm, we use a very sparse matrix that not only satisfies the Johnson-Lindenstrauss lemma, but also can be efficiently computed for real-time tracking.
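
The distance-preservation property can be checked numerically. The sketch below is illustrative only: it uses a dense Gaussian matrix scaled by 1/sqrt(n) (a common convention that differs from the unit-row-length normalization mentioned above by a constant factor), and the sizes `m`, `n`, and `num_points` are hypothetical values chosen to keep the example small.

```python
import numpy as np

def random_projection_matrix(n, m, rng):
    """Dense Gaussian matrix scaled so that E[||Rx||^2] = ||x||^2."""
    return rng.standard_normal((n, m)) / np.sqrt(n)

def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return dists[upper]

rng = np.random.default_rng(0)
m, n, num_points = 5000, 500, 20      # hypothetical sizes
X = rng.standard_normal((num_points, m))
R = random_projection_matrix(n, m, rng)
V = X @ R.T                           # v = R x for every row of X

# Ratio of projected to original distance for each pair of points.
ratios = pairwise_distances(V) / pairwise_distances(X)
print(ratios.min(), ratios.max())
```

The ratios concentrate around 1 even though the dimension drops by a factor of ten, and the spread shrinks as n grows.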
2.2 Random measurement matrix
A typical measurement matrix satisfying the restricted isometry property is the random Gaussian matrix $R \in \mathbb{R}^{n \times m}$ where $r_{ij} \sim N(0, 1)$, as used in numerous works recently [14, 9, 18]. However, as the matrix is dense, the memory and computational loads are still large when m is large. In this paper, we adopt a very sparse random measurement matrix with entries defined as
$$ r_{ij} = \sqrt{s} \times \begin{cases} 1 & \text{with probability } \frac{1}{2s} \\ 0 & \text{with probability } 1 - \frac{1}{s} \\ -1 & \text{with probability } \frac{1}{2s}. \end{cases} \tag{2} $$
Achlioptas [16] proved that this type of matrix with s = 2 or 3 satisfies the Johnson-Lindenstrauss lemma. This matrix is very easy to compute which requires only a uniform random generator. More importantly, when s = 3, it is very sparse where two thirds of the computation can be avoided. In addition, Li et al. [19] showed that for $s = O(m)$ ($x \in \mathbb{R}^m$), this matrix is asymptotically normal. Even when $s = m/\log(m)$, the random projections are almost as accurate as the conventional random projections where $r_{ij} \sim N(0, 1)$. In this work, we set $s = m/4$ which makes a very sparse random matrix. For each row of R, only about c, $c \le 4$, entries need to be computed. Therefore, the computational complexity is only O(cn) which is very low. Furthermore, we only need to store the nonzero entries of R which makes the memory requirement also very light.
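
As a rough illustration of (2), the matrix can be sampled from uniform random draws alone. The function name `sparse_measurement_matrix` and the sizes below are hypothetical, and the sketch materializes the full matrix for clarity, which a real-time implementation would avoid by storing only the nonzero entries.

```python
import numpy as np

def sparse_measurement_matrix(n, m, s, rng):
    """Sample an n x m matrix with entries sqrt(s) * {+1, 0, -1}.

    +1 and -1 each occur with probability 1/(2s), 0 with probability 1 - 1/s.
    """
    u = rng.random((n, m))                           # uniform draws in [0, 1)
    R = np.zeros((n, m))
    R[u < 1.0 / (2.0 * s)] = np.sqrt(s)              # +sqrt(s) with prob. 1/(2s)
    R[u >= 1.0 - 1.0 / (2.0 * s)] = -np.sqrt(s)      # -sqrt(s) with prob. 1/(2s)
    return R

rng = np.random.default_rng(0)
n, m = 200, 4000         # hypothetical sizes
s = m / 4                # the setting used in the text
R = sparse_measurement_matrix(n, m, s, rng)

# Each entry is nonzero with probability 1/s, so a row has m/s = 4 nonzeros on average.
nonzeros_per_row = np.count_nonzero(R, axis=1)
print(nonzeros_per_row.mean())
```

With s = m/4 the expected number of nonzero entries per row is 4 regardless of m, which is why the projection cost depends on n rather than on the dimensionality of x.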
3 Proposed Algorithm
In this section, we present our tracking algorithm in detail. The tracking problem is formulated as a detection task and our algorithm is shown in Figure 1. We assume that the tracking window in the first frame has been determined. At each frame, we sample some positive samples near the current target location and negative samples far away from the object center to update the classifier. To predict the object location in the next frame, we draw some samples around the current target location and determine the one with the maximal classification score.
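
The sampling and detection steps described above can be sketched as follows. All radii, the helper names `sample_offsets` and `track_step`, and the toy score function are hypothetical; the text does not specify them here. The score function stands in for the classifier response on the patch at a candidate location.

```python
import numpy as np

def sample_offsets(inner, outer):
    """Integer (dy, dx) offsets whose distance d satisfies inner <= d < outer."""
    r = int(np.ceil(outer))
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.hypot(dy, dx)
    mask = (dist >= inner) & (dist < outer)
    return np.stack([dy[mask], dx[mask]], axis=1)

def track_step(score_fn, location, search_radius):
    """Return the candidate location with the maximal classifier score."""
    candidates = location + sample_offsets(0.0, search_radius)
    scores = np.array([score_fn(c) for c in candidates])
    return candidates[np.argmax(scores)]

# Offsets for classifier update: positives close to the target, negatives in a ring.
positives = sample_offsets(0.0, 4.0)     # hypothetical radius
negatives = sample_offsets(8.0, 30.0)    # hypothetical ring

# Toy score: highest at the (unknown to the tracker) true position.
true_position = np.array([53, 58])
score_fn = lambda c: -np.hypot(*(c - true_position))
new_location = track_step(score_fn, np.array([50, 60]), search_radius=10.0)
```

In a full tracker, `score_fn` would extract compressed features at the candidate window and evaluate the online classifier on them.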

Fig. 2. Graphical representation of compressing a high-dimensional vector x to a low-dimensional vector v, where $v_i = \sum_j r_{ij} x_j$. In the matrix R, dark, gray and white rectangles represent negative, positive, and zero entries, respectively. The blue arrows illustrate that one of the nonzero entries of one row of R sensing an element in x is equivalent to a rectangle filter convolving the intensity at a fixed position of an input image.
3.1 Efficient dimensionality reduction
For each sample $z \in \mathbb{R}^{w \times h}$, to deal with the scale problem, we represent it by convolving z with a set of rectangle filters at multiple scales $\{h_{1,1}, \ldots, h_{w,h}\}$ defined as
$$ h_{i,j}(x, y) = \begin{cases} 1, & 1 \le x \le i,\ 1 \le y \le j \\ 0, & \text{otherwise} \end{cases} \tag{3} $$
where i and j are the width and height of a rectangle filter, respectively. Then, we represent each filtered image as a column vector in $\mathbb{R}^{wh}$ and then concatenate these vectors as a very high-dimensional multi-scale image feature vector $x = (x_1, \ldots, x_m)^\top \in \mathbb{R}^m$ where $m = (wh)^2$. The dimensionality m is typically in the order of $10^6$ to $10^{10}$. We adopt a sparse random matrix R in (2) with $s = m/4$ to project x onto a vector $v \in \mathbb{R}^n$ in a low-dimensional space. The random matrix R needs to be computed only once off-line and remains fixed throughout the tracking process. For the sparse matrix R in (2), the computational load is very light. As shown by Figure 2, we only need to store the nonzero entries in R and the positions of rectangle filters in an input image corresponding to the nonzero entries in each row of R. Then, v can be efficiently computed by using R to sparsely measure the rectangular features which can be efficiently computed using the integral image method [20].
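
A minimal sketch of this computation, under stated assumptions: the rectangle positions and signs below are hypothetical stand-ins for the nonzero entries of one row of R, and the constant scale factor from (2) is omitted. Each rectangle sum costs four lookups in the integral image, independent of the rectangle size.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def compressive_feature(ii, rects, weights):
    """One element v_i: a signed sum of a few rectangle sums."""
    return sum(w * rect_sum(ii, *rect) for rect, w in zip(rects, weights))

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
ii = integral_image(patch)

# Hypothetical nonzero entries of one row of R: (top, left, height, width) and sign.
rects = [(2, 3, 5, 4), (10, 8, 6, 6), (20, 1, 3, 9)]
weights = [1.0, -1.0, 1.0]
v_i = compressive_feature(ii, rects, weights)
```

Because the rectangles and signs are fixed once R is drawn, the per-patch work is a handful of table lookups for each of the n compressed features.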
3.2 Analysis of low-dimensional compressive features
As shown in Figure 2, each element $v_i$ in the low-dimensional feature $v \in \mathbb{R}^n$ is a linear combination of spatially distributed rectangle features at different scales. As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8] (See also Figure 2). The Haar-like features have been widely used for object detection with demonstrated success [20, 21, 8]. The basic types of these Haar-like features are typically designed for different tasks [20, 21].

Citations
Journal ArticleDOI
TL;DR: A new kernelized correlation filter is derived, that unlike other kernel algorithms has the exact same complexity as its linear counterpart, which is called dual correlation filter (DCF), which outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite being implemented in a few lines of code.
Abstract: The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies—any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.

4,994 citations


Cites background from "Real-time compressive tracking"

  • ...The task of tracking, a crucial component of many computer vision systems, can be naturally specified as an online learning problem [1], [2]....

    [...]

  • ...An extremely challenging factor is the virtually unlimited amount of negative samples that can be obtained from an image....

    [...]

Proceedings ArticleDOI
23 Jun 2013
TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

3,828 citations

Journal ArticleDOI
TL;DR: An extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria is carried out to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often not sufficient or is sometimes biased for certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, and this makes comparisons among the reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus, the quantitative results reported in literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce the sequence attributes for the performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

2,974 citations


Cites background from "Real-time compressive tracking"

  • ...Here, we discuss the relevant performance evaluation work on object tracking and challenging factors in object tracking....

    [...]

Proceedings ArticleDOI
01 Jan 2014
TL;DR: This paper presents a novel approach to robust scale estimation that can handle large scale variations in complex image sequences and shows promising results in terms of accuracy and efficiency.
Abstract: Robust scale estimation is a challenging problem in visual object tracking. Most existing methods fail to handle large scale variations in complex image sequences. This paper presents a novel appro ...

2,038 citations


Cites background or methods from "Real-time compressive tracking"

  • ...We compare our approach with 11 state-of-the-art trackers: CT [19], TLD [15], DFT [17], EDFT [6], ASLA [14], L1APG [1], CSK [11], SCM [20], LOT [16], Struck [9] and LSHT [10], which have shown to provide excellent performance in literature....

    [...]

  • ...Ours ASLA [14] SCM [20] Struck [9] TLD [15] EDFT [6] L1APG [1] DFT [17] LOT [16] CSK [11] LSHT [10] CT [19] Median OP 75....

    [...]

  • ...In recent years, tracking-by-detection methods [3, 9, 11, 19] have shown to provide excellent tracking performance....

    [...]

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper adaptively learn correlation filters on each convolutional layer to encode the target appearance and hierarchically infer the maximum response of each layer to locate targets.
Abstract: Visual object tracking is challenging as target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter and occlusion. In this paper, we exploit features extracted from deep convolutional neural networks trained on object recognition datasets to improve tracking accuracy and robustness. The outputs of the last convolutional layers encode the semantic information of targets and such representations are robust to significant appearance variations. However, their spatial resolution is too coarse to precisely localize targets. In contrast, earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchies of convolutional layers as a nonlinear counterpart of an image pyramid representation and exploit these multiple levels of abstraction for visual tracking. Specifically, we adaptively learn correlation filters on each convolutional layer to encode the target appearance. We hierarchically infer the maximum response of each layer to locate targets. Extensive experimental results on a largescale benchmark dataset show that the proposed algorithm performs favorably against state-of-the-art methods.

1,812 citations


Cites background from "Real-time compressive tracking"

  • ...Ours DLT KCF STC Struck SCM CT LSHT CSK MIL TLD MEEM TGPR [30] [17] [35] [13] [38] [36] [15] [16] [1] [19] [34] [10] DP rate (%) I 89....

    [...]

  • ...These trackers can be broadly categorized into three classes: (i) deep learning tracker DLT [30] (ii) correlation filter trackers including the CSK [16], STC [35], and KCF [17]; and (iii) representative tracking algorithms using single or multiple online classifiers, including the MIL [1], Struck [13], CT [36], LSHT [15], TLD [19], SCM [38], MEEM [34], and TGPR [10] methods....

    [...]

References
Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.

18,620 citations


"Real-time compressive tracking" refers background or methods in this paper

  • ...This problem is alleviated by boosting algorithms for selecting important features [20, 21]....

    [...]

  • ...The basic types of these Haar-like features are typically designed for different tasks [20, 21]....

    [...]

  • ...Then, v can be efficiently computed by using R to sparsely measure the rectangular features which can be efficiently computed using the integral image method [20]....

    [...]

  • ...The Haar-like features have been widely used for object detection with demonstrated success [20, 21, 8]....

    [...]
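
The excerpts above rely on the integral image to evaluate rectangular features quickly. As a minimal sketch (the function names `integral_image` and `rect_sum` are hypothetical and not taken from either paper), the idea is to precompute a summed-area table once, after which the sum over any rectangle needs only four lookups:

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above (previous table row) plus the current row so far.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in rows top..top+height-1 and columns left..left+width-1."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
whole = rect_sum(ii, 0, 0, 3, 4)   # sum over the full image
patch = rect_sum(ii, 1, 1, 2, 2)   # sum over the 2x2 block 6, 7, 10, 11
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes features built from many rectangle sums cheap to evaluate.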

Book
D.L. Donoho
01 Jan 2004
TL;DR: It is possible to design n=O(Nlog(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program-Basis Pursuit in signal processing.
Abstract: Suppose x is an unknown vector in $\mathbb{R}^m$ (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only $n = O(m^{1/4} \log^{5/2}(m))$ nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an $\ell_p$ ball for 0

18,609 citations

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by $\ell_1$-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by $\ell_1$-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.

9,658 citations


"Real-time compressive tracking" refers background in this paper

  • ...A typical measurement matrix satisfying the restricted isometry property is the random Gaussian matrix $R \in \mathbb{R}^{n \times m}$ where $r_{ij} \sim N(0, 1)$, as used in numerous works recently [14, 9, 18]....

    [...]

Journal ArticleDOI
TL;DR: F can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program) and numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted.
Abstract: This paper considers a natural error correcting problem with real valued input/output. We wish to recover an input vector $f \in \mathbb{R}^n$ from corrupted measurements $y = Af + e$. Here, A is an m by n (coding) matrix and e is an arbitrary and unknown vector of errors. Is it possible to recover f exactly from the data y? We prove that under suitable conditions on the coding matrix A, the input f is the unique solution to the $\ell_1$-minimization problem $\min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}$ (where $\|x\|_{\ell_1} := \sum_i |x_i|$), provided that the support of the vector of errors is not too large, $\|e\|_{\ell_0} := |\{i : e_i \neq 0\}| \le \rho \cdot m$ for some $\rho > 0$. In short, f can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). In addition, numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted. This work is related to the problem of finding sparse solutions to vastly underdetermined systems of linear equations. There are also significant connections with the problem of recovering signals from highly incomplete measurements. In fact, the results introduced in this paper improve on our earlier work. Finally, underlying the success of $\ell_1$ is a crucial property we call the uniform uncertainty principle that we shall describe in detail.

6,853 citations


"Real-time compressive tracking" refers background or methods in this paper

  • ...Another bound derived from the restricted isometry property in compressive sensing [15] is much tighter than that from Johnson-Lindenstrauss lemma, where n ≥ κβ log(m/β) and κ and β are constants....

    [...]

  • ...We use a very sparse measurement matrix that satisfies the restricted isometry property (RIP) [15], thereby facilitating efficient projection from the image feature space to a low-dimensional compressed subspace....

    [...]

Journal ArticleDOI
TL;DR: If the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct f to within very high accuracy from a small number of random measurements by solving a simple linear program.
Abstract: Suppose we are given a vector f in a class $\mathcal{F} \subseteq \mathbb{R}^N$, e.g., a class of digital signals or digital images. How many linear measurements do we need to make about f to be able to recover f to within precision $\epsilon$ in the Euclidean ($\ell_2$) metric? This paper shows that if the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct f to within very high accuracy from a small number of random measurements by solving a simple linear program. More precisely, suppose that the nth largest entry of the vector |f| (or of its coefficients in a fixed basis) obeys $|f|_{(n)} \le R \cdot n^{-1/p}$, where R>0 and p>0. Suppose that we take measurements $y_k = \langle f, X_k \rangle$, $k = 1, \ldots, K$, where the $X_k$ are N-dimensional Gaussian vectors with independent standard normal entries. Then for each f obeying the decay estimate above for some 0

6,342 citations


"Real-time compressive tracking" refers methods in this paper

  • ...In our appearance model, features are selected by an information-preserving and non-adaptive dimensionality reduction from the multi-scale image feature space based on compressive sensing theories [12, 13]....

    [...]

Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Real-time compressive tracking"?

In this paper, the authors propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from the multi-scale image feature space with data-independent basis. 

The generative subspace tracker (e.g., IVT [6]) has been shown to be effective in dealing with large illumination changes while the discriminative tracking method with local features (i.e., MILTrack [8]) has been demonstrated to handle pose variation adequately. 

The authors expect R to provide a stable embedding that approximately preserves the distance between all pairs of original signals. 
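
A small numerical check can make the distance-preservation claim concrete. The sketch below is an illustration only: it uses a dense Gaussian matrix with i.i.d. N(0, 1) entries, and the sizes `m` and `n`, the helper names, and the 1/sqrt(n) scaling of the projection are all assumptions chosen here rather than details from the paper.

```python
import math
import random


def gaussian_matrix(n, m, rng):
    """Return an n x m matrix with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]


def project(R, x):
    """Compute v = R x / sqrt(n); the scaling keeps expected squared norms unchanged."""
    scale = 1.0 / math.sqrt(len(R))
    return [scale * sum(r * xj for r, xj in zip(row, x)) for row in R]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
m, n = 1000, 200                      # hypothetical original and compressed dimensions
R = gaussian_matrix(n, m, rng)
x1 = [rng.random() for _ in range(m)]
x2 = [rng.random() for _ in range(m)]

# Ratio of the compressed distance to the original distance; values near 1
# indicate that the projection approximately preserved this pairwise distance.
ratio = distance(project(R, x1), project(R, x2)) / distance(x1, x2)
```

Increasing `n` tightens the spread of `ratio` around 1, at the cost of a larger compressed feature vector.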

Their appearance model is generative as the object can be well represented based on the features extracted in the compressive domain. 

As the coefficients in the measurement matrix can be positive or negative (via (2)), the compressive features compute the relative intensity difference in a way similar to the generalized Haar-like features [8] 
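
Equation (2) is not reproduced on this page, so the following sketch assumes the standard very sparse random projection form, in which each entry is +sqrt(s) or -sqrt(s) with probability 1/(2s) each and zero otherwise. The sparsity parameter `s`, the dimensions, and the helper names are illustrative assumptions. Because only a handful of entries per row are nonzero, each compressed feature is a signed sum of a few input entries, which is what gives it the flavor of an intensity difference.

```python
import math
import random


def sparse_row(m, s, rng):
    """One row of a very sparse measurement matrix, stored as {column: value}.

    Assumed entry distribution: +sqrt(s) or -sqrt(s) with probability 1/(2s) each,
    and 0 with probability 1 - 1/s.
    """
    p = 1.0 / (2.0 * s)
    root = math.sqrt(s)
    row = {}
    for j in range(m):
        u = rng.random()
        if u < p:
            row[j] = root
        elif u < 2.0 * p:
            row[j] = -root
    return row


def compress(rows, x):
    """Compute v = R x while touching only the nonzero entries of each row."""
    return [sum(value * x[j] for j, value in row.items()) for row in rows]


rng = random.Random(1)
m, n, s = 10000, 50, 2500.0           # hypothetical sizes; about m / s = 4 nonzeros per row
rows = [sparse_row(m, s, rng) for _ in range(n)]
x = [rng.random() for _ in range(m)]
v = compress(rows, x)
nonzeros = sum(len(row) for row in rows)
```

Storing each row as a dictionary of its nonzero columns means the projection cost scales with the number of nonzeros rather than with `n * m`.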

Their tracker, implemented in MATLAB, runs at 35 frames per second (FPS) on a Pentium Dual-Core 2.80 GHz CPU with 4 GB RAM. 

Both their tracker and the MILTrack method are designed to handle object location ambiguity in tracking with classifiers and discriminative features. 

Because the TLD tracker relies heavily on the visual information in the first frame to re-detect the object, it also suffers from the same problem. 

The authors represent each filtered image as a column vector in $\mathbb{R}^{wh}$ and then concatenate these vectors as a very high-dimensional multi-scale image feature vector $x = (x_1, \ldots, x_m)^\top \in$ 

Since all of the trackers except for Frag involve randomness, the authors run them 10 times and report the average result for each video clip. 

Tracking algorithms can be generally categorized as either generative [1, 2, 6, 10, 9] or discriminative [3–5, 7, 8] based on their appearance models. 

For the Shaking sequence shown in Figure 5(b), when the stage light changes drastically and the pose of the subject changes rapidly as he performs, all the other trackers fail to track the object reliably. 

As the appearance model is updated with noisy and potentially misaligned examples, this often leads to the tracking drift problem. 
