Online Object Tracking: A Benchmark
Yi Wu
University of California at Merced
ywu29@ucmerced.edu
Jongwoo Lim
Hanyang University
jlim@hanyang.ac.kr
Ming-Hsuan Yang
University of California at Merced
mhyang@ucmerced.edu
Abstract
Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
1. Introduction
Object tracking is one of the most important components
in a wide range of applications in computer vision, such
as surveillance, human computer interaction, and medical
imaging [60, 12]. Given the initialized state (e.g., position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the subsequent frames. Although object tracking has been studied for several decades, and much progress has been made in recent years [28, 16, 47, 5, 40, 26, 19], it remains a very challenging problem. Numerous factors affect the performance of a tracking algorithm, such as illumination variation, occlusion, and background clutter, and no single tracking approach can successfully handle all scenarios. Therefore, it is crucial to evaluate the performance of state-of-the-art trackers to demonstrate their strengths and weaknesses and to help identify future research directions for designing more robust algorithms.
For comprehensive performance evaluation, it is critical to collect a representative dataset. Several datasets exist for visual tracking in surveillance scenarios, such as the VIVID [14], CAVIAR [21], and PETS databases. However, the target objects in these surveillance sequences are usually humans or cars of small size, and the background is usually static. Although some tracking datasets [47, 5, 33] for generic scenes are annotated with bounding boxes, most are not. For sequences without labeled ground truth, it is difficult to evaluate tracking algorithms, as the reported results are based on inconsistently annotated object locations.
Recently, more tracking source codes have been made publicly available, e.g., the OAB [22], IVT [47], MIL [5], L1 [40], and TLD [31] algorithms, which have been commonly used for evaluation. However, the input and output formats of most trackers differ, which makes large scale performance evaluation inconvenient. In this work, we build a code library that includes most publicly available trackers and a test dataset with ground-truth annotations to facilitate the evaluation task. Additionally, each sequence in the dataset is annotated with attributes that often affect tracking performance, such as occlusion, fast motion, and illumination variation.
One common issue in assessing tracking algorithms is that results are reported based on just a few sequences with different initial conditions or parameters; thus, the results do not provide a holistic view of these algorithms. For fair and comprehensive performance evaluation, we propose to perturb the initial state spatially and temporally from the ground-truth target locations. While robustness to initialization is a well-known problem in the field, it is seldom addressed in the literature. To the best of our knowledge, this is the first comprehensive work to address and analyze the initialization problem of object tracking. We use precision plots based on the location error metric and success plots based on the overlap metric to analyze the performance of each algorithm.
The contribution of this work is three-fold:
Dataset. We build a tracking dataset with 50 fully annotated sequences to facilitate tracking evaluation.
Code library. We integrate most publicly available trackers into our code library with uniform input and output formats to facilitate large scale performance evaluation. At present, it includes 29 tracking algorithms.
Robustness evaluation. The initial bounding boxes for tracking are sampled spatially and temporally to evaluate the robustness and characteristics of trackers. Each tracker is extensively evaluated by analyzing more than 660,000 bounding box outputs.
This work mainly focuses on the online¹ tracking of a single target. The code library, annotated dataset, and all the tracking results are available on the website http://visual-tracking.net.
2. Related Work
In this section, we review recent algorithms for object tracking in terms of several main modules: target representation scheme, search mechanism, and model update. In addition, some methods have been proposed that build on combining trackers or mining context information.
Representation Scheme. Object representation is one of the major components in any visual tracker, and numerous schemes have been presented [35]. Since the pioneering work of Lucas and Kanade [37, 8], holistic templates (raw intensity values) have been widely used for tracking [25, 39, 2]. Subsequently, subspace-based tracking approaches [11, 47] have been proposed to better account for appearance changes. Furthermore, Mei and Ling [40] proposed a tracking approach based on sparse representation to handle corrupted appearance, and it has recently been further improved [41, 57, 64, 10, 55, 42]. In addition to templates, many other visual features have been adopted in tracking algorithms, such as color histograms [16], histograms of oriented gradients (HOG) [17, 52], covariance region descriptors [53, 46, 56], and Haar-like features [54, 22]. Recently, discriminative models have been widely adopted in tracking [15, 4], where a binary classifier is learned online to discriminate the target from the background. Numerous learning methods have been adapted to the tracking problem, such as SVM [3], structured output SVM [26], ranking SVM [7], boosting [4, 22], semi-boosting [23], and multi-instance boosting [5]. To make trackers more robust to pose variation and partial occlusion, an object can be represented by parts, where each part is described by descriptors or histograms. In [1], several local histograms are used to represent the object in a pre-defined grid structure. Kwon and Lee [32] propose an approach that automatically updates the topology of local patches to handle large pose changes. To better handle appearance variations, some approaches that integrate multiple representation schemes have recently been proposed [62, 51, 33].
Search Mechanism. To estimate the state of the target objects, deterministic or stochastic methods have been used. When the tracking problem is posed within an optimization framework, assuming the objective function is differentiable with respect to the motion parameters, gradient descent methods can be used to locate the target efficiently [37, 16, 20, 49]. However, these objective functions are
¹ Here, the word online means that during tracking only the information of the previous few frames is used for inference at any time instance.
usually nonlinear and contain many local minima. To alleviate this problem, dense sampling methods have been adopted [22, 5, 26] at the expense of high computational load. On the other hand, stochastic search algorithms such as particle filters [28, 44] have been widely used, since they are relatively insensitive to local minima and computationally efficient [47, 40, 30].
Model Update. It is crucial to update the target representation or model to account for appearance variations. Matthews et al. [39] address the template update problem for the Lucas-Kanade algorithm [37], where the template is updated with a combination of the fixed reference template extracted from the first frame and the result from the most recent frame. Effective update algorithms have also been proposed via online mixture models [29], online boosting [22], and incremental subspace update [47]. For discriminative models, the main issue has been improving the sample collection to make the online-trained classifier more robust [23, 5, 31, 26]. While much progress has been made, it remains difficult to design an adaptive appearance model that avoids drift.
Context and Fusion of Trackers. Context information is also very important for tracking. Recently, some approaches mine auxiliary objects or local visual information surrounding the target to assist tracking [59, 24, 18]. The context information is especially helpful when the target is fully occluded or leaves the image region [24]. To improve tracking performance, some tracker fusion methods have been proposed recently. Santner et al. [48] propose an approach that combines static, moderately adaptive, and highly adaptive trackers to account for appearance changes. Multiple trackers [34] or multiple feature sets [61] can also be maintained and selected in a Bayesian framework to better account for appearance changes.
3. Evaluated Algorithms and Datasets
For fair evaluation, we test the tracking algorithms whose original source or binary codes are publicly available, as all implementations inevitably involve technical details and specific parameter settings². Table 1 shows the list of the evaluated tracking algorithms. We also evaluate the trackers in the VIVID testbed [14], including the mean shift (MS-V), template matching (TM-V), ratio shift (RS-V), and peak difference (PD-V) methods.
In recent years, many benchmark datasets have been developed for various vision problems, such as the Berkeley segmentation [38], FERET face recognition [45], and optical flow [9] datasets. There exist some datasets for tracking in surveillance scenarios, such as the VIVID [14] and CAVIAR [21] datasets. For generic visual tracking, more
² Some source codes [36, 58] are obtained through direct contact, and some methods are implemented on our own [44, 16].

Method Representation Search MU Code FPS
CPF [44] L, IH PF N C 109
LOT [43] L, color PF Y M 0.70
IVT [47] H, PCA, GM PF Y MC 33.4
ASLA [30] L, SR, GM PF Y MC 8.5
SCM [65] L, SR, GM+DM PF Y MC 0.51
L1APG [10] H, SR, GM PF Y MC 2.0
MTT [64] H, SR, GM PF Y M 1.0
VTD [33] H, SPCA, GM MCMC Y MC-E 5.7
VTS [34] L, SPCA, GM MCMC Y MC-E 5.7
LSK [36] L, SR, GM LOS Y M-E 5.5
ORIA [58] H, T, GM LOS Y M 9.0
DFT [49] L, T LOS Y M 13.2
KMS [16] H, IH LOS N C 3,159
SMS [13] H, IH LOS N C 19.2
VR-V [15] H, color LOS Y MC 109
Frag [1] L, IH DS N C 6.3
OAB [22] H, Haar, DM DS Y C 22.4
SemiT [23] H, Haar, DM DS Y C 11.2
BSBT [50] H, Haar, DM DS Y C 7.0
MIL [5] H, Haar, DM DS Y C 38.1
CT [63] H, Haar, DM DS Y MC 64.4
TLD [31] L, BP, DM DS Y MC 28.1
Struck [26] H, Haar, DM DS Y C 20.2
CSK [27] H, T, DM DS Y M 362
CXT [18] H, BP, DM DS Y C 15.3
Table 1. Evaluated tracking algorithms (MU: model update, FPS: frames per second). For representation schemes, L: local, H: holistic, T: template, IH: intensity histogram, BP: binary pattern, PCA: principal component analysis, SPCA: sparse PCA, SR: sparse representation, DM: discriminative model, GM: generative model. For search mechanism, PF: particle filter, MCMC: Markov Chain Monte Carlo, LOS: local optimum search, DS: dense sampling search. For model update, N: no, Y: yes. In the Code column, M: Matlab, C: C/C++, MC: mixture of Matlab and C/C++, suffix E: executable binary code.
sequences have been used for evaluation [47, 5]. However, most sequences do not have ground truth annotations, and quantitative evaluation results may be generated with different initial conditions. To facilitate fair performance evaluation, we have collected and annotated the most commonly used tracking sequences. Figure 1 shows the first frame of each sequence, where the target object is initialized with a bounding box.
Attributes of a test sequence. Evaluating trackers is difficult because many factors can affect tracking performance. For better evaluation and analysis of the strengths and weaknesses of tracking approaches, we categorize the sequences by annotating them with the 11 attributes shown in Table 2.
The attribute distribution in our dataset is shown in Figure 2(a). Some attributes, e.g., OPR and IPR, occur more frequently than others. One sequence is often annotated with several attributes. Aside from summarizing performance on the whole dataset, we also construct several subsets corresponding to the attributes to report performance under specific challenging conditions. For example, the OCC subset contains 29 sequences which can be used to analyze the
Attr Description
IV Illumination Variation - the illumination in the target region is significantly changed.
SV Scale Variation - the ratio of the bounding boxes of the first frame and the current frame is out of the range [1/t_s, t_s], t_s > 1 (t_s = 2).
OCC Occlusion - the target is partially or fully occluded.
DEF Deformation - non-rigid object deformation.
MB Motion Blur - the target region is blurred due to the motion of the target or camera.
FM Fast Motion - the motion of the ground truth is larger than t_m pixels (t_m = 20).
IPR In-Plane Rotation - the target rotates in the image plane.
OPR Out-of-Plane Rotation - the target rotates out of the image plane.
OV Out-of-View - some portion of the target leaves the view.
BC Background Clutters - the background near the target has similar color or texture as the target.
LR Low Resolution - the number of pixels inside the ground-truth bounding box is less than t_r (t_r = 400).
Table 2. List of the attributes annotated to the test sequences. The threshold values used in this work are also shown.
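As a concrete reading of the thresholds in Table 2, the SV, FM, and LR checks can be sketched as below. This is a minimal Python illustration with hypothetical helper names, not the benchmark's annotation code; it assumes the SV ratio is taken over box areas and that FM is measured between consecutive ground-truth centers, details the paper leaves to the supplement.

```python
import math

# Thresholds from Table 2; a box is (x, y, w, h), center at (x + w/2, y + h/2).
T_S, T_M, T_R = 2.0, 20, 400

def is_scale_variation(box0, box):
    # SV: size ratio between the first and current ground-truth box is
    # outside [1/t_s, t_s] (area ratio assumed here).
    r = (box[2] * box[3]) / (box0[2] * box0[3])
    return r > T_S or r < 1.0 / T_S

def is_fast_motion(prev_box, box):
    # FM: the ground-truth center moves more than t_m = 20 pixels.
    dx = (box[0] + box[2] / 2) - (prev_box[0] + prev_box[2] / 2)
    dy = (box[1] + box[3] / 2) - (prev_box[1] + prev_box[3] / 2)
    return math.hypot(dx, dy) > T_M

def is_low_resolution(box):
    # LR: fewer than t_r = 400 pixels inside the ground-truth box.
    return box[2] * box[3] < T_R
```

A sequence would receive an attribute if the corresponding condition holds in some frame of its ground-truth annotation.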
Figure 2. (a) Attribute distribution of the entire test set, and (b) the distribution of the sequences with the occlusion (OCC) attribute.
performance of trackers in handling occlusion. The attribute distributions of the OCC subset are shown in Figure 2(b); others are available in the supplemental material.
4. Evaluation Methodology
In this work, we use the precision and success rate for quantitative analysis. In addition, we evaluate the robustness of tracking algorithms in two aspects.
Precision plot. One widely used evaluation metric for tracking precision is the center location error, defined as the Euclidean distance between the center locations of the tracked targets and the manually labeled ground truths. The average center location error over all frames of one sequence is then used to summarize the overall performance for that sequence. However, when a tracker loses the target, the output location can be random, and the average error value may not measure the tracking performance correctly [6]. Recently, the precision plot [6, 27] has been adopted to measure overall tracking performance. It shows the percentage of frames whose estimated location is within a given threshold distance of the ground truth. As the representative precision score for each tracker, we use the score at the threshold of 20 pixels [6].
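The precision plot described above can be sketched as follows. This is a minimal Python illustration with hypothetical function names, not the benchmark's released Matlab/C code; it assumes per-frame (x, y) centers are already extracted from the tracker output and ground truth.

```python
import math

def precision_curve(pred_centers, gt_centers, thresholds):
    """Fraction of frames whose center location error is within each threshold.
    pred_centers / gt_centers: lists of (x, y) centers, one per frame."""
    errs = [math.hypot(px - gx, py - gy)
            for (px, py), (gx, gy) in zip(pred_centers, gt_centers)]
    n = len(errs)
    return [sum(e <= t for e in errs) / n for t in thresholds]

def precision_score(pred_centers, gt_centers, tau=20.0):
    # Representative precision score: the curve value at a 20-pixel threshold.
    return precision_curve(pred_centers, gt_centers, [tau])[0]
```

Plotting `precision_curve` over a range of thresholds (e.g., 0 to 50 pixels) gives one curve per tracker, and `precision_score` gives the single number used for ranking.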
Success plot. Another evaluation metric is the bounding box overlap. Given the tracked bounding box r_t and the

Figure 1. Tracking sequences for evaluation. The first frame with the bounding box of the target object is shown for each sequence. The
sequences are ordered based on our ranking results (See supplementary material): the ones on the top left are more difficult for tracking
than the ones on the bottom right. Note that we annotated two targets for the jogging sequence.
ground truth bounding box r_a, the overlap score is defined as S = |r_t ∩ r_a| / |r_t ∪ r_a|, where ∩ and ∪ represent the intersection and union of two regions, respectively, and |·| denotes the number of pixels in the region. To measure the performance on a sequence of frames, we count the number of successful frames whose overlap S is larger than a given threshold t_o. The success plot shows the ratio of successful frames as the threshold varies from 0 to 1. Using one success rate value at a specific threshold (e.g., t_o = 0.5) for tracker evaluation may not be fair or representative. Instead, we use the area under curve (AUC) of each success plot to rank the tracking algorithms.
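The overlap score and the AUC of the success plot can be sketched as below. This is a simple illustrative sketch with hypothetical names, not the benchmark's actual implementation; the number of sampled thresholds is an assumption.

```python
def overlap(rt, ra):
    """Overlap score S = |rt ∩ ra| / |rt ∪ ra| for boxes (x, y, w, h)."""
    ix = max(0, min(rt[0] + rt[2], ra[0] + ra[2]) - max(rt[0], ra[0]))
    iy = max(0, min(rt[1] + rt[3], ra[1] + ra[3]) - max(rt[1], ra[1]))
    inter = ix * iy
    union = rt[2] * rt[3] + ra[2] * ra[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred_boxes, gt_boxes, num_thresholds=101):
    """Average success rate over overlap thresholds sampled in [0, 1],
    i.e., the AUC of the success plot."""
    scores = [overlap(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    n = len(scores)
    rates = [sum(s > t / (num_thresholds - 1) for s in scores) / n
             for t in range(num_thresholds)]
    return sum(rates) / num_thresholds
```

Ranking trackers by `success_auc` rewards accuracy across all overlap thresholds rather than at a single cut-off such as t_o = 0.5.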
Robustness Evaluation. The conventional way to evaluate trackers is to run them throughout a test sequence, initialized with the ground truth position in the first frame, and report the average precision or success rate. We refer to this as one-pass evaluation (OPE). However, a tracker may be sensitive to initialization, and its performance with a different initialization at a different start frame may become much worse or better. Therefore, we propose two ways to analyze a tracker's robustness to initialization: perturbing the initialization temporally (i.e., starting at different frames) and spatially (i.e., starting with different bounding boxes). These tests are referred to as temporal robustness evaluation (TRE) and spatial robustness evaluation (SRE), respectively.
The proposed test scenarios arise frequently in real-world applications, as a tracker is often initialized by an object detector, which is likely to introduce initialization errors in terms of position and scale. In addition, an object detector may be used to re-initialize a tracker at different time instances. By investigating a tracker's characteristics under the robustness evaluation, a more thorough understanding and analysis of the tracking algorithm can be carried out.
Temporal Robustness Evaluation. Given one start frame together with the ground-truth bounding box of the target at that frame, the tracker is initialized and runs to the end of the sequence, i.e., over one segment of the entire sequence. The tracker is evaluated on each segment, and the overall statistics are tallied.
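The segmentation underlying TRE can be sketched as below. This is a hypothetical sketch: the paper partitions each sequence into 20 segments (Section 5) but leaves the exact partitioning scheme to the supplement, so evenly spaced start frames are assumed here.

```python
def tre_start_frames(seq_len, num_segments=20):
    """Start frames for TRE: one tracker run is launched at the first frame of
    each of num_segments segments and continues to the end of the sequence."""
    return [round(i * seq_len / num_segments) for i in range(num_segments)]
```

Note that later start frames yield shorter runs, which is why the paper observes higher average scores under TRE than under OPE.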
Spatial Robustness Evaluation. We sample the initial bounding box in the first frame by shifting or scaling the ground truth. Here, we use 8 spatial shifts, including 4 center shifts and 4 corner shifts, and 4 scale variations (see supplement). The amount of shift is 10% of the target size, and the scale ratio varies among 0.8, 0.9, 1.1, and 1.2 of the ground truth. Thus, we evaluate each tracker 12 times for SRE.
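The 12 SRE initializations can be sketched as below. This is a Python sketch with assumed shift directions (4 axis-aligned and 4 diagonal shifts of 10% of the target size, and scaling about the box center); the exact shift set used in the benchmark is given in its supplement.

```python
def sre_initializations(gt):
    """12 perturbed initial boxes for SRE from ground truth gt = (x, y, w, h)."""
    x, y, w, h = gt
    dx, dy = 0.1 * w, 0.1 * h
    shifts = [(-dx, 0), (dx, 0), (0, -dy), (0, dy),        # center shifts (assumed)
              (-dx, -dy), (-dx, dy), (dx, -dy), (dx, dy)]  # corner shifts (assumed)
    inits = [(x + sx, y + sy, w, h) for sx, sy in shifts]
    for s in (0.8, 0.9, 1.1, 1.2):                         # 4 scale variations
        nw, nh = s * w, s * h
        inits.append((x + (w - nw) / 2, y + (h - nh) / 2, nw, nh))
    return inits
```

Running a tracker once from each returned box and averaging the resulting precision or success scores gives its SRE performance on that sequence.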
5. Evaluation Results
For each tracker, the default parameters with the source
code are used in all evaluations. Table 1 lists the average
FPS of each tracker in OPE running on a PC with Intel i7
3770 CPU (3.4GHz). More detailed speed statistics, such as
minimum and maximum, are available in the supplement.
For OPE, each tracker is tested on more than 29,000 frames. For SRE, each tracker is evaluated 12 times on each sequence, generating more than 350,000 bounding box results. For TRE, each sequence is partitioned into 20 segments, and thus each tracker is run on around 310,000 frames. To the best of our knowledge, this is the largest scale performance evaluation of visual tracking. We report the most important findings in this manuscript; more details and figures can be found in the supplement.
5.1. Overall Performance
The overall performance of all the trackers is summarized by the success and precision plots shown in Figure 3, where only the top 10 algorithms are presented for clarity; the complete plots are displayed in the supplementary material.
Figure 3. Plots of OPE, SRE, and TRE. The performance score for each tracker is shown in the legend. For each figure, the top 10 trackers are presented for clarity, and complete plots are in the supplementary material (best viewed on a high-resolution display).
For success plots, we use AUC scores to summarize and rank the trackers, while for precision plots we use the results at the error threshold of 20 pixels for ranking. In the precision plots, the rankings of some trackers differ slightly from those in the success plots because the two plots are based on different metrics that measure different characteristics of trackers. Because the AUC score of the success plot measures the overall performance, which is more informative than the score at a single threshold, in the following we mainly analyze the rankings based on the success plots and use the precision plots as auxiliary.
The average TRE performance is higher than that of OPE because the number of frames decreases from the first to the last segment of TRE. As trackers tend to perform well on shorter sequences, the average of all the results in TRE tends to be higher. On the other hand, the average performance of SRE is lower than that of OPE. Initialization errors tend to cause trackers to update with imprecise appearance information, thereby causing gradual drift.
In the success plots, the top ranked tracker SCM in OPE outperforms Struck by 2.6% but is 1.9% below Struck in SRE. The results also show that OPE is not the best performance indicator, as OPE is effectively one trial of SRE or TRE. The ranking of TLD in TRE is lower than in OPE and SRE. This is because TLD, with its re-detection module, performs well in long sequences, whereas TRE contains numerous short segments. The success plots of Struck in TRE and SRE show that the success rate of Struck is higher than that of SCM and ASLA when the overlap threshold is small, but lower than SCM and ASLA when the overlap threshold is large. This is because Struck only estimates the location of the target and does not handle scale variation.
Sparse representations are used in SCM, ASLA, LSK, MTT, and L1APG. These trackers perform well in SRE and TRE, which suggests that sparse representations are effective models for accounting for appearance change (e.g., occlusion). We note that SCM, ASLA, and LSK outperform MTT and L1APG. The results suggest that local sparse representations are more effective than holistic sparse templates. The AUC score of ASLA decreases less than those of the other top 5 trackers from OPE to SRE, and the ranking of ASLA also increases. This indicates that the alignment-pooling technique adopted by ASLA is more robust to misalignment and background clutter.
Among the top 10 trackers, CSK has the highest speed, where the proposed circulant structure plays a key role. The VTD and VTS methods adopt mixture models to improve tracking performance. Compared with other higher ranked trackers, their performance bottleneck can be attributed to their adopted representation based on sparse principal component analysis, where the holistic templates are used.