Proceedings ArticleDOI

Changedetection.net: A new change detection benchmark dataset

TL;DR: A unique change detection benchmark dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR).
Abstract: Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing. A plethora of algorithms have been developed to date, yet no widely accepted, realistic, large-scale video dataset exists for benchmarking different methods. Presented here is a unique change detection benchmark dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR). A distinguishing characteristic of this dataset is that each frame is meticulously annotated for ground-truth foreground, background, and shadow area boundaries — an effort that goes much beyond a simple binary label denoting the presence of change. This enables objective and precise quantitative comparison and ranking of change detection algorithms. This paper presents and discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over a dozen previous and new change detection algorithms. The dataset, evaluation tools, and algorithm rankings are available to the public on a website and will be updated with feedback from academia and industry in the future.

Summary (2 min read)

1. Introduction

  • Detection of change, and in particular motion, is a fundamental low-level task in many computer vision and video processing applications.
  • Secondly, not all authors are willing to (or have the resources to) compare their methods against the most advanced and promising approaches.
  • CDnet also supplies a selection of evaluation tools in MATLAB and Python for quantitatively assessing the performance of different methods according to 7 distinct metrics.

2.2. Survey Papers

  • Below, the authors list key survey papers that are devoted to the comparison and ranking of change and motion detection algorithms.
  • The authors also use semisynthetic videos composed of synthetic foreground objects (people and cars) moving over a camera-captured background.
  • The sequences include illumination changes, dynamic background, shadows and noise, while lacking frames with no activity.
  • Rosin and Ioannidis, 2003 [21] use a labeling program that automatically locates moving objects based on their position in space and properties such as color, size, shape, etc.
  • Thirdly, the survey papers often report common, fairly simple motion detection methods, and do not report the performance of more complex methods.

3. New Dataset: CDnet

  • CDnet, presented at the IEEE Change Detection Workshop [1], consists of 31 videos depicting indoor and outdoor scenes with boats, cars, trucks, and pedestrians that have been captured in different scenarios and contain a range of challenges.
  • The length of the videos also varies from 1,000 to 8,000 frames and the videos shot by low-end IP cameras suffer from noticeable radial distortion.
  • Different cameras may have different hue bias (due to different white balancing algorithms employed) and some cameras apply automatic exposure adjustment resulting in global brightness fluctuations in time.
  • The videos are grouped into six categories according to the type of challenge each represents.
  • Such a grouping is essential for a clear identification of the strengths and weaknesses of different change detection methods.

3.1. Video Categories

  • This category contains one indoor and three outdoor videos captured by unstable (e.g., vibrating) cameras.
  • Some shadows are fairly narrow while others occupy most of the scene.
  • This category is intended for testing how various algorithms adapt to background changes.

3.2. Ground-Truth Labels

  • As mentioned in Section 2, the current online datasets have been designed mainly for testing tracking and scene understanding algorithms, and thus the ground truth is provided in the form of bounding boxes.
  • This is particularly difficult near moving object boundaries and in semi-transparent areas.
  • The Shadow label is associated with hard and well-defined moving shadows such as the one in Fig. 2.
  • This prevents evaluation metrics from being corrupted by pixels whose status is unclear.
  • Firstly, since most change detection methods incur a delay before their background model stabilizes, the authors labeled the first few hundred frames of each video sequence as Non-ROI.

3.3. Evaluation Metrics

  • Finding the right metric to accurately measure the ability of a method to detect motion or change without producing excessive false positives and false negatives is not trivial.
  • Recall favors methods with a low False Negative Rate.
  • For each method, the above metrics are first computed for each video in each category.
  • The overall-average metrics such as Re are reported in Table 1 while category-average metrics such as Re_c are reported on their website.
  • The average ranking R_i for method i across all overall-average metrics is given by R_i = (1/7) Σ_{m′} rank_i(m′), where m′ is an overall-average metric such as the one computed in equation (1) and rank_i(m′) denotes the rank of method i according to the overall-average metric m′.
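
To make this ranking rule concrete, here is a minimal sketch (not the official CDnet evaluation code) of how such an average ranking could be computed from a table of overall-average metric values; the metric names and the higher/lower-is-better split below are assumptions made for illustration.

```python
# Minimal sketch of the average-ranking rule R_i = (1/7) * sum_{m'} rank_i(m').
# Input layout ({method: {metric: overall-average value}}) and the
# higher/lower-is-better split are assumptions, not the official CDnet tool.

HIGHER_IS_BETTER = {"Re", "Sp", "Pr", "F-measure"}
LOWER_IS_BETTER = {"FPR", "FNR", "PWC"}

def average_rankings(overall_metrics):
    """overall_metrics: {method: {metric: value}} -> {method: R_i}."""
    methods = list(overall_metrics)
    metrics = HIGHER_IS_BETTER | LOWER_IS_BETTER   # the 7 metrics
    ranks = {m: 0.0 for m in methods}
    for metric in metrics:
        # Sort methods from best to worst for this metric (ties broken arbitrarily here).
        best_first = sorted(methods,
                            key=lambda m: overall_metrics[m][metric],
                            reverse=(metric in HIGHER_IS_BETTER))
        for position, method in enumerate(best_first, start=1):
            ranks[method] += position
    return {m: ranks[m] / len(metrics) for m in methods}

# Toy usage with two hypothetical methods:
example = {
    "method_A": {"Re": 0.90, "Sp": 0.99, "FPR": 0.01, "FNR": 0.10, "PWC": 1.2, "Pr": 0.85, "F-measure": 0.87},
    "method_B": {"Re": 0.80, "Sp": 0.98, "FPR": 0.02, "FNR": 0.20, "PWC": 2.5, "Pr": 0.80, "F-measure": 0.80},
}
print(average_rankings(example))
```
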

4. Methods Tested

  • A total of 18 change detection methods were evaluated for the IEEE Change Detection Workshop [1].
  • Out of the above methods, all except the Euclidean and Mahalanobis distance methods [4] are robust to background motion, four are robust to shadows [13, 28, 12, 18], and two are robust to artifacts stemming from intermittent motion [2, 7] (a minimal sketch of such a distance-based baseline is given after this list).
  • For each method, only one set of parameters was used for all the videos.
  • All parameters are available on their website.
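
For illustration, the sketch below shows a minimal per-pixel background subtraction baseline in the spirit of the Euclidean distance method [4] mentioned above: a running-average background model thresholded on the Euclidean color distance. The threshold, learning rate, and NumPy-based implementation are illustrative assumptions, not the code that was evaluated in the paper.

```python
import numpy as np

# Minimal sketch of a distance-based background subtraction baseline.
# Parameter values and the conservative running-average update are assumptions.

def euclidean_bg_subtraction(frames, threshold=30.0, alpha=0.01):
    """frames: iterable of HxWx3 uint8 images -> yields HxW boolean foreground masks."""
    background = None
    for frame in frames:
        frame = frame.astype(np.float32)
        if background is None:
            background = frame.copy()                 # bootstrap the model with the first frame
        dist = np.linalg.norm(frame - background, axis=2)   # per-pixel Euclidean color distance
        foreground = dist > threshold
        # Update the background only where no change was detected.
        background[~foreground] = ((1 - alpha) * background[~foreground]
                                   + alpha * frame[~foreground])
        yield foreground

# Usage: masks = list(euclidean_bg_subtraction(video_frames))
```
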

5. Experimental Results

  • The overall results are shown in Table 1 where the methods have been sorted according to their average ranking across categories (R_C).
  • Such an approach gives remarkable results on baseline and intermittent object motion videos.
  • As can be seen in Table 2, videos with intermittent motion, shadows and camera jitter pose a greater challenge than videos in the other categories.
  • As can be seen in Table 3, the tested methods attained FPR in shadow areas between 0.33 and 0.64.


changedetection.net: A New Change Detection Benchmark Dataset

Nil Goyette¹, Pierre-Marc Jodoin¹, Fatih Porikli², Janusz Konrad³, Prakash Ishwar³

¹ Université de Sherbrooke, MOIVRE Imaging Group, Sherbrooke, J1K 2R1, Canada
² Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA
³ Boston University, ECE Department, Boston, MA 02215, USA
Abstract

Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing. A plethora of algorithms have been developed to date, yet no widely accepted, realistic, large-scale video dataset exists for benchmarking different methods. Presented here is a unique change detection benchmark dataset consisting of nearly 90,000 frames in 31 video sequences representing 6 categories selected to cover a wide range of challenges in 2 modalities (color and thermal IR). A distinguishing characteristic of this dataset is that each frame is meticulously annotated for ground-truth foreground, background, and shadow area boundaries; an effort that goes much beyond a simple binary label denoting the presence of change. This enables objective and precise quantitative comparison and ranking of change detection algorithms. This paper presents and discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over a dozen previous and new change detection algorithms. The dataset, evaluation tools, and algorithm rankings are available to the public at www.changedetection.net and will be updated with feedback from academia and industry in the future.
1. Introduction
Detection of change, and in particular motion, is a fundamental low-level task in many computer vision and video processing applications. Examples include visual surveillance (people counting, crowd monitoring, action recognition, anomaly detection, forensic retrieval, etc.), smart environments (occupancy analysis, parking lot management, etc.), and content retrieval (video annotation, event detection, object tracking). Change detection is closely coupled with higher level inference tasks such as detection, localization, tracking, and classification of moving objects, and is often considered to be a critical preprocessing step. Its importance can be gauged by the large number of algorithms that have been developed to date and the even larger number of articles that have been published on this topic. A quick search for 'motion detection' on IEEE Xplore returns over 4,400 papers.
Among the many variants of change detection algorithms, there seems to be no single algorithm that competently addresses all of the inherent real-life challenges including sudden illumination variations, background movements, shadows, camouflage effects (photometric similarity of object and background) and ghosting artifacts (delayed detection of a moving object after it has moved away). Furthermore, due to the tremendous effort required to generate a benchmark dataset that contains pixel-precision ground-truth labels and provides a balanced coverage of the range of challenges representative of the real world, no comprehensive large-scale evaluation of change detection has been conducted to date.

The lack of a comprehensive dataset has a number of negative implications. Firstly, it makes it difficult to ascertain with confidence which algorithms would perform robustly when the assumptions they are built upon are violated. Moreover, many algorithms tend to overfit specific scenarios. For example, a method may be tuned to be robust to shadows but may not be as robust to background motion. A dataset that includes many different scenarios and uses a variety of performance measures would go a long way towards providing an objective assessment. Secondly, not all authors are willing to (or have the resources to) compare their methods against the most advanced and promising approaches. As a consequence, an overwhelming importance has been accorded to a small subset of easily implementable methods such as [23, 9, 26] that were developed in the late 1990s. The more recent and advanced methods have been marginalized as a result. Besides, the implementation of the same method varies significantly from one research group to another in the choice of parameters and the use of other pre- and post-processing steps. Thirdly, the fact that authors often use their own data (that are not widely available to everyone) makes a fair comparison much more problematic if not impossible.

Recognizing the importance of change detection to the computer vision and video processing communities, we have prepared a unique change detection benchmark dataset: changedetection.net (CDnet), which consists of nearly 90,000 frames in 31 video sequences representing 6 video categories (including thermal). This new dataset is the foundation of the 2012 IEEE Change Detection Workshop [1]. CDnet contains diverse motion and change detection challenges in addition to typical indoor and outdoor scenes that are encountered in most surveillance, smart environments, and video analytics applications. A distinguishing feature of CDnet is the fact that each image is meticulously annotated for ground-truth foreground, background, and shadow region boundaries; an effort that goes much beyond a simple binary label denoting the presence of change. The existence of ground-truth masks permits a precise comparison and ranking of change detection algorithms. CDnet also supplies a selection of evaluation tools in MATLAB and Python for quantitatively assessing the performance of different methods according to 7 distinct metrics.

The overarching objectives of CDnet and its associated workshop can be listed as:

1. To provide the research community with a rigorous and comprehensive scientific benchmarking facility, a rich dataset of videos, a set of utilities, and access to author-approved algorithm implementations for testing and ranking of existing and new algorithms for motion and change detection. The already extensive dataset will be regularly revised and expanded with feedback from academia and industry.

2. To establish, maintain, and update a rank list of the most accurate motion and change detection algorithms in the various categories for years to come.

3. To help identify the remaining challenges in order to provide focus for future research.

Next, we provide an overview of the existing datasets and then present the details of CDnet including its categories, ground-truth annotations, performance metrics, and a summary of the comparative rankings of the methods that we tested at the IEEE Change Detection Workshop held in conjunction with CVPR 2012.
2. Overview of Prior Efforts
Several datasets and survey papers have been presented
for the evaluation of change detection algorithms in the past.
2.1. Previous Datasets
Without aiming to be exhaustive, we list below a few key
datasets and describe their characteristics:
• Wallflower [25]: This is a fairly well-known dataset that continues to be used today. It contains 7 videos, each representing a specific challenge such as illumination change, background motion, etc. Only one frame per video has been labeled.

• PETS [27]: The Performance Evaluation of Tracking and Surveillance (PETS) program was launched with the goal of evaluating visual tracking and surveillance algorithms. The program has been collecting videos for the scientific community since the year 2000 and now contains several dozen videos. Many of these videos have been manually labeled by bounding boxes with the goal of evaluating the performance of tracking algorithms.

• CAVIAR (http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1): This dataset contains more than 80 staged indoor videos representing all kinds of human behavior such as walking, browsing, shopping, fighting, etc. Like the PETS dataset, a bounding box is associated with each moving character.

• i-LIDS (http://www.homeoffice.gov.uk/science-research/hosdb/i-lids): This dataset contains 4 scenarios (parked vehicle, abandoned object, people walking in a restricted area, doorway). Due to the size of the videos (more than 24 hours of footage) the videos are not fully labeled.

• ETISEO (http://www-sop.inria.fr/orion/ETISEO): This dataset contains more than 80 video clips of various indoor and outdoor scenes. Since the ground truth consists mainly of high-level information such as the bounding box, object class, event type, etc., this dataset is more suitable for tracking, classification and event recognition than change detection.

• VSSN 2006 (http://mmc36.informatik.uni-augsburg.de/VSSN06 OSAC): This dataset contains 9 semi-synthetic videos composed of a real background and artificially-moving objects. The videos contain animated background, illumination changes and shadows; however, they include no frames void of activity.

• IBM (http://www.research.ibm.com/peoplevision/performanceevaluation.html): This dataset contains 15 indoor and outdoor videos taken from PETS 2001 plus additional videos. For each video, 1 frame out of 30 is labeled with a bounding box around each foreground moving object.

Further details about these datasets, and many others, can be found on a web page of the European CANTATA project (http://www.hitech-projects.com/euprojects/cantata/datasets cantata/). With the exception of the Wallflower and VSSN 2006 datasets, all others have ground-truth information represented in terms of the bounding box for each foreground object. Furthermore, the focus in the above datasets is more on tracking as well as human behavior and interaction recognition than change detection. As such, the above datasets do not contain the diversity of video categories present in the new dataset.

2.2. Survey Papers
Below, we list key survey papers that are devoted to the comparison and ranking of change and motion detection algorithms. Each paper uses its own dataset.

• Benezeth et al., 2010 [4] use a collection of 29 videos (15 camera-captured, 10 semi-synthetic, and 4 synthetic) taken from PETS 2001, the IBM dataset, and the VSSN 2006 dataset. The authors also use semi-synthetic videos composed of synthetic foreground objects (people and cars) moving over a camera-captured background.

• Bouwmans et al., 2008 [5] survey only GMM methods and use the Wallflower dataset.

• Nascimento and Marques, 2006 [16] use a single PETS 2001 video sequence which they manually label at pixel resolution using a graphical editor.

• Brutzer et al., 2011 [6] use a synthetic (computer-generated) dataset produced from only one 3D scene representing a street corner. The sequences include illumination changes, dynamic background, shadows and noise, while lacking frames with no activity.

• Prati et al., 2001 [19] use indoor sequences containing one moving person that are manually segmented into foreground (human), shadow, and background areas. Only 112 frames have ground-truth labels.

• Rosin and Ioannidis, 2003 [21] use a labeling program that automatically locates moving objects based on their position in space and properties such as color, size, shape, etc. These properties were not used by the change detection algorithms tested. However, the videos used are not realistic as they are limited to lab scenes with balls rolling on the floor.

• Bashir and Porikli, 2006 [3] conduct a performance evaluation of tracking algorithms using the PETS 2001 dataset by comparing the detected bounding box locations with the ground truth.
At a high level, the existing surveys suffer from three main limitations. First, the statistics reported in these papers have not been computed on a well-balanced dataset composed of real (camera-captured) videos. Typically, synthetic videos, real videos with synthetic moving objects pasted in, or real videos out of which only 1 frame has been manually segmented for ground truth are used. Furthermore, very few datasets contain more than 10 videos. Secondly, none of the papers are accompanied by a fully-operational web site that allows users to upload their results and compare them against those of others. Thirdly, the survey papers often report common, fairly simple motion detection methods, and do not report the performance of more complex methods.
3. New Dataset: CDnet
CDnet, presented at the IEEE Change Detection Workshop [1], consists of 31 videos depicting indoor and outdoor scenes with boats, cars, trucks, and pedestrians that have been captured in different scenarios and contain a range of challenges. The videos have been obtained with different cameras ranging from low-resolution IP cameras, through mid-resolution camcorders and PTZ cameras, to thermal cameras. As a consequence, spatial resolutions of the videos in CDnet vary from 320 × 240 to 720 × 576. Also, due to diverse lighting conditions present and compression parameters used, the level of noise and compression artifacts varies from one video to another. The length of the videos also varies from 1,000 to 8,000 frames and the videos shot by low-end IP cameras suffer from noticeable radial distortion. Different cameras may have different hue bias (due to different white balancing algorithms employed) and some cameras apply automatic exposure adjustment resulting in global brightness fluctuations in time. We believe that the fact that our videos have been captured under a range of settings will help prevent this dataset from favoring a certain family of change detection methods over others.

The videos are grouped into six categories according to the type of challenge each represents. We selected videos so that the challenge in one category is unique to that category. For example, only videos in the “Shadows” category contain strong shadows and only those in the “Dynamic Background” category contain strong parasitic background motion. Such a grouping is essential for a clear identification of the strengths and weaknesses of different change detection methods. With the exception of one video in the “Baseline” category, which comes from the PETS 2006 dataset, all the videos have been captured by the authors.
3.1. Video Categories
31 videos totaling nearly 90,000 frames are grouped into the following 6 categories (Fig. 1) that have been selected to cover a wide range of change detection challenges that are representative of typical visual data captured today in surveillance, smart environment, and video analytics applications:

1. Baseline: This category contains four videos, two indoor and two outdoor. These videos represent a mixture of mild challenges typical of the next 4 categories. Some videos have subtle background motion, others have isolated shadows, some have an abandoned object and others have pedestrians that stop for a short while and then move away. These videos are fairly easy, but not trivial, to process, and are provided mainly as reference.

Figure 1. Sample video frames from each of the 6 categories in the new dataset (“Baseline”, “Dynamic Background”, “Camera Jitter”, “Shadows”, “Intermittent Object Motion”, “Thermal”), available at www.changedetection.net, and typical detection results obtained using basic background subtraction [4] reported in the last row of Table 1.
2. Dynamic Background: There are six videos in this category depicting outdoor scenes with strong (parasitic) background motion. Two videos represent boats on shimmering water, two videos show cars passing next to a fountain, and the last two depict pedestrians, cars and trucks passing in front of a tree shaken by the wind (second column in Fig. 1).

3. Camera Jitter: This category contains one indoor and three outdoor videos captured by unstable (e.g., vibrating) cameras. The jitter magnitude varies from one video to another.

4. Shadows: This category consists of two indoor and four outdoor videos exhibiting strong as well as faint shadows. Some shadows are fairly narrow while others occupy most of the scene. Also, some shadows are cast by moving objects while others are cast by trees and buildings.

5. Intermittent Object Motion: This category contains six videos with scenarios known for causing “ghosting” artifacts in the detected motion, i.e., objects move, then stop for a short while, after which they start moving again. Some videos include still objects that suddenly start moving, e.g., a parked vehicle driving away, and also abandoned objects. This category is intended for testing how various algorithms adapt to background changes. One example of such a challenge is shown in the fifth column of Fig. 1 where new objects are added to or existing objects are removed from the scene.

6. Thermal: In this category, five videos (three outdoor and two indoor) have been captured by far-infrared cameras. These videos contain typical thermal artifacts such as heat stamps (e.g., bright spots left on a seat after a person gets up and leaves), heat reflection on floors and windows (see the last column of Fig. 1), and camouflage effects, when a moving object has the same temperature as the surrounding regions.

We would like to mention that although camouflage, caused by moving objects that have very similar color/texture to the background, is among the most glaring change detection issues, we have not created a camouflage category. This is partially because almost every real video sequence contains some level of camouflage. It is difficult to create a dataset in which there is a category exclusively with camouflage challenges while other categories are void of it.
3.2. Ground-Truth Labels
As mentioned in Section 2, the current online datasets have been designed mainly for testing tracking and scene understanding algorithms, and thus the ground truth is provided in the form of bounding boxes. Although this can be used to validate change detection methods, a precise validation requires ground truth at pixel resolution. Therefore, ideally, videos should be labeled a number of times by different persons and the results averaged out. This, however, is impractical due to resource and time constraints. Furthermore, it is very difficult for a person to produce uncontroversial binary ground-truth images for camera-captured videos. This is particularly difficult near moving object boundaries and in semi-transparent areas. Due to motion blur and partially-opaque objects (e.g., sparse bushes, dirty windows, fountains), pixels in these areas may contain both the moving object and background. As a consequence, one cannot reliably classify such pixels as belonging to either the Static or the Moving class. Since these areas carry a certain level of uncertainty, evaluation metrics should not be computed for pixels in these areas. Therefore, we decided to produce ground-truth images with the following labels:

• Static: assigned grayscale value of 0,
• Shadow: assigned grayscale value of 50,
• Non-ROI: assigned grayscale value of 85,
• Unknown: assigned grayscale value of 170,
• Moving: assigned grayscale value of 255.
The Static and Moving classes are associated with pixels for which the motion status is obvious. The Shadow label is associated with hard and well-defined moving shadows such as the one in Fig. 2. Hard shadows are among the most difficult artifacts to cope with and we believe that adding this extra information improves the richness and utility of the dataset. Please note that evaluation metrics discussed in Section 3.3 consider the Shadow pixels as Static pixels. The Unknown label is assigned to pixels that are half-occluded and those corrupted by motion blur. All pixels located close to moving-object boundaries are automatically labeled as Unknown (Fig. 2). This prevents evaluation metrics from being corrupted by pixels whose status is unclear.
The Non-ROI (not in region of interest) label serves two purposes. Firstly, since most change detection methods incur a delay before their background model stabilizes, we labeled the first few hundred frames of each video sequence as Non-ROI. This prevents the corruption of evaluation metrics due to errors during initialization. Secondly, the Non-ROI label prevents the metrics from being corrupted by activities unrelated to the category considered. An example of this situation is shown in the second row of Fig. 2, which illustrates a sequence of cars that arrive, stop at a street light and then move away. The goal of the video is to measure how well a change detection method can handle intermittent motion. However, since the scene is cluttered with unrelated activities (cars on the highway) the Non-ROI label puts the focus on street-light activities. Similarly, the top row in Fig. 2 illustrates the Shadow category; the Non-ROI label is used to prevent the metrics from corruption by trees moving in the background.
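
To illustrate how these label values might be used during scoring, the sketch below accumulates per-frame confusion counts, skipping Non-ROI and Unknown pixels and counting Shadow pixels as Static; the grayscale codes follow the list above, but the helper itself is an illustrative assumption, not the distributed CDnet evaluation utility.

```python
import numpy as np

# Grayscale codes from the ground-truth label images described above.
STATIC, SHADOW, NON_ROI, UNKNOWN, MOVING = 0, 50, 85, 170, 255

def confusion_counts(gt, detected):
    """gt: HxW uint8 ground-truth label image; detected: HxW boolean foreground mask.
    Returns (TP, FP, FN, TN, FP_shadow). Illustrative sketch, not the official tool."""
    evaluated = (gt != NON_ROI) & (gt != UNKNOWN)               # ignore uncertain / out-of-ROI pixels
    positive = (gt == MOVING) & evaluated                       # ground-truth change
    negative = ((gt == STATIC) | (gt == SHADOW)) & evaluated    # Shadow counts as Static
    tp = int(np.sum(detected & positive))
    fp = int(np.sum(detected & negative))
    fn = int(np.sum(~detected & positive))
    tn = int(np.sum(~detected & negative))
    fp_shadow = int(np.sum(detected & (gt == SHADOW)))          # false positives inside hard-shadow areas
    return tp, fp, fn, tn, fp_shadow
```
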
3.3. Evaluation Metrics
Finding the right metric to accurately measure the ability of a method to detect motion or change without producing excessive false positives and false negatives is not trivial. For instance, recall favors methods with a low False Negative Rate. On the contrary, specificity favors methods with a low False Positive Rate. Having the entire precision-recall tradeoff curve or the ROC curve would be ideal, but not all methods have the flexibility to sweep through the complete gamut of tradeoffs. In addition, one cannot, in general, rank-order methods based on a curve. We deal with these difficulties by reporting the average performance of each method for each video category with respect to 7 different performance metrics, each of which has been well-studied in the literature. Specifically, for each method, each video category, and each metric, we report the performance (as measured by the value of the metric) of the method averaged across all the videos of the category.

Figure 2. Sample video frames from the Bungalows and Street light sequences and corresponding 5-class ground-truth label fields (Static, Motion, hard shadow, Unknown, and Outside ROI).

Let TP = number of true positives, TN = number of true negatives, FN = number of false negatives, and FP = number of false positives. The 7 metrics that we use are:
1. Recall (Re): TP/(TP + FN)
2. Specificity (Sp): TN/(TN + FP)
3. False Positive Rate (FPR): FP/(FP + TN)
4. False Negative Rate (FNR): FN/(TP + FN)
5. Percentage of Wrong Classifications (PWC): 100 · (FN + FP)/(TP + FN + FP + TN)
6. Precision (Pr): TP/(TP + FP)
7. F-measure: 2 · (Pr · Re)/(Pr + Re)

For the Shadow category, we also provide an average False Positive Rate that is confined to the hard-shadow areas (FPR-S).
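
As a concrete reading of these definitions, a small sketch that turns the four counts into the seven metrics might look as follows (assuming the counts have already been accumulated over the frames of a video, e.g., with a per-frame helper like the one sketched at the end of Section 3.2); this is illustrative rather than the official MATLAB/Python tools.

```python
def change_detection_metrics(tp, tn, fp, fn):
    """Compute the 7 CDnet metrics from per-video counts; illustrative sketch only."""
    re = tp / (tp + fn)                               # Recall
    sp = tn / (tn + fp)                               # Specificity
    fpr = fp / (fp + tn)                              # False Positive Rate
    fnr = fn / (tp + fn)                              # False Negative Rate
    pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)     # Percentage of Wrong Classifications
    pr = tp / (tp + fp)                               # Precision
    f_measure = 2 * pr * re / (pr + re)               # F-measure
    return {"Re": re, "Sp": sp, "FPR": fpr, "FNR": fnr,
            "PWC": pwc, "Pr": pr, "F-measure": f_measure}

print(change_detection_metrics(tp=900, tn=9000, fp=100, fn=100))
```
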
For each method, the above metrics are first computed for each video in each category. For example, the recall metric for a particular video v in a category c is computed as follows:

    Re_{v,c} = TP_{v,c} / (TP_{v,c} + FN_{v,c}).

Then, a category-average metric for each category is computed from the values of the metric for all videos in a single category. For example, the average recall metric of category c is given by

    Re_c = (1/|N_c|) Σ_v Re_{v,c}

where |N_c| is the number of videos in category c. We also report an overall-average metric, which is the simple average of the category-average metrics across all categories.
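
A minimal sketch of this two-level averaging is given below; the data layout ({category: {video: value}}), the category names, and the toy numbers are assumptions for illustration only.

```python
# Two-level averaging: per-video metric values -> category averages -> overall average.

def category_averages(per_video):
    """per_video: {category: {video: metric value}} -> {category: category-average}."""
    return {c: sum(videos.values()) / len(videos) for c, videos in per_video.items()}

def overall_average(per_video):
    """Simple (unweighted) average of the category-average values."""
    cat_avg = category_averages(per_video)
    return sum(cat_avg.values()) / len(cat_avg)

# Toy usage with hypothetical recall values for two categories:
recall = {
    "baseline": {"video1": 0.95, "video2": 0.90},
    "shadow": {"video3": 0.85, "video4": 0.80},
}
print(category_averages(recall), overall_average(recall))
```
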

Citations
Book ChapterDOI
Matej Kristan1, Ales Leonardis2, Jiří Matas3, Michael Felsberg4, Roman Pflugfelder5, Luka Cehovin1, Tomas Vojir3, Gustav Häger4, Alan Lukežič1, Gustavo Fernandez5, Abhinav Gupta6, Alfredo Petrosino7, Alireza Memarmoghadam8, Alvaro Garcia-Martin9, Andres Solis Montero10, Andrea Vedaldi11, Andreas Robinson4, Andy J. Ma12, Anton Varfolomieiev13, A. Aydin Alatan14, Aykut Erdem15, Bernard Ghanem16, Bin Liu, Bohyung Han17, Brais Martinez18, Chang-Ming Chang19, Changsheng Xu20, Chong Sun21, Daijin Kim17, Dapeng Chen22, Dawei Du20, Deepak Mishra23, Dit-Yan Yeung24, Erhan Gundogdu25, Erkut Erdem15, Fahad Shahbaz Khan4, Fatih Porikli26, Fatih Porikli27, Fei Zhao20, Filiz Bunyak28, Francesco Battistone7, Gao Zhu26, Giorgio Roffo29, Gorthi R. K. Sai Subrahmanyam23, Guilherme Sousa Bastos30, Guna Seetharaman31, Henry Medeiros32, Hongdong Li26, Honggang Qi20, Horst Bischof33, Horst Possegger33, Huchuan Lu21, Hyemin Lee17, Hyeonseob Nam34, Hyung Jin Chang35, Isabela Drummond30, Jack Valmadre11, Jae-chan Jeong36, Jaeil Cho36, Jae-Yeong Lee36, Jianke Zhu37, Jiayi Feng20, Jin Gao20, Jin-Young Choi, Jingjing Xiao2, Ji-Wan Kim36, Jiyeoup Jeong, João F. Henriques11, Jochen Lang10, Jongwon Choi, José M. Martínez9, Junliang Xing20, Junyu Gao20, Kannappan Palaniappan28, Karel Lebeda38, Ke Gao28, Krystian Mikolajczyk35, Lei Qin20, Lijun Wang21, Longyin Wen19, Luca Bertinetto11, Madan Kumar Rapuru23, Mahdieh Poostchi28, Mario Edoardo Maresca7, Martin Danelljan4, Matthias Mueller16, Mengdan Zhang20, Michael Arens, Michel Valstar18, Ming Tang20, Mooyeol Baek17, Muhammad Haris Khan18, Naiyan Wang24, Nana Fan39, Noor M. Al-Shakarji28, Ondrej Miksik11, Osman Akin15, Payman Moallem8, Pedro Senna30, Philip H. S. Torr11, Pong C. Yuen12, Qingming Huang39, Qingming Huang20, Rafael Martin-Nieto9, Rengarajan Pelapur28, Richard Bowden38, Robert Laganiere10, Rustam Stolkin2, Ryan Walsh32, Sebastian B. Krah, Shengkun Li19, Shengping Zhang39, Shizeng Yao28, Simon Hadfield38, Simone Melzi29, Siwei Lyu19, Siyi Li24, Stefan Becker, Stuart Golodetz11, Sumithra Kakanuru23, Sunglok Choi36, Tao Hu20, Thomas Mauthner33, Tianzhu Zhang20, Tony P. Pridmore18, Vincenzo Santopietro7, Weiming Hu20, Wenbo Li40, Wolfgang Hübner, Xiangyuan Lan12, Xiaomeng Wang18, Xin Li39, Yang Li37, Yiannis Demiris35, Yifan Wang21, Yuankai Qi39, Zejian Yuan22, Zexiong Cai12, Zhan Xu37, Zhenyu He39, Zhizhen Chi21 
08 Oct 2016
TL;DR: The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.
Abstract: The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers being published at major computer vision conferences and journals in the recent years. The number of tested state-of-the-art trackers makes the VOT 2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. The VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).

744 citations


Cites background from "Changedetection.net: A new change d..."

  • ..., CAVIAR(1), i-LIDS(2), ETISEO(3)), change detection [11], sports analytics (e....

    [...]

Proceedings ArticleDOI
23 Jun 2014
TL;DR: The latest release of the changedetection.net dataset is presented, which includes 22 additional videos spanning 5 new categories that incorporate challenges encountered in many surveillance settings and highlights strengths and weaknesses of these methods and identifies remaining issues in change detection.
Abstract: Change detection is one of the most important low-level tasks in video analytics. In 2012, we introduced the changedetection.net (CDnet) benchmark, a video dataset devoted to the evaluation of change and motion detection approaches. Here, we present the latest release of the CDnet dataset, which includes 22 additional videos (70,000 pixel-wise annotated frames) spanning 5 new categories that incorporate challenges encountered in many surveillance settings. We describe these categories in detail and provide an overview of the results of more than a dozen methods submitted to the IEEE Change Detection Workshop 2014. We highlight strengths and weaknesses of these methods and identify remaining issues in change detection.

680 citations


Cites background or methods from "Changedetection.net: A new change d..."

  • ...Let us mention the 2014 handbook by Bouwmans et al [6] which, to our knowledge, is the most complete manuscript devoted to change detection recently published....

    [...]

  • ...We highlight strengths and weaknesses of these methods and identify remaining issues in change detection....

    [...]

  • ...An interesting finding is that methods appear to be complementary in nature: the best-performing methods can be beaten by combining several of them with a majority vote....

    [...]

Proceedings ArticleDOI
07 Dec 2015
TL;DR: The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance and presents a new VOT 2015 dataset twice as large as in VOT2014 with full annotation of targets by rotated bounding boxes and per-frame attribute.
Abstract: The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT 2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014 with full annotation of targets by rotated bounding boxes and per-frame attribute, (ii) extensions of the VOT2014 evaluation methodology by introduction of a new performance measure. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.

667 citations


Additional excerpts

  • ..., CAVIAR2, i-LIDS 3, ETISEO4, change detection [22], sports analytics (e....

    [...]

Journal ArticleDOI
TL;DR: The purpose of this paper is to provide a complete survey of the traditional and recent approaches to background modeling for foreground detection, and categorize the different approaches in terms of the mathematical models used.
Abstract: Background modeling for foreground detection is often used in different applications to model the background and then detect the moving objects in the scene like in video surveillance. The last decade witnessed very significant publications in this field. Furthermore, several surveys can be found in literature but none of them addresses an overall review in this field. So, the purpose of this paper is to provide a complete survey of the traditional and recent approaches. First, we categorize the different approaches found in literature. We have classified them in terms of the mathematical models used and we have discussed them in terms of the critical situations that they claim to handle. Furthermore, we present the available resources, datasets and libraries. Then, we conclude with several promising directions for future research.

664 citations


Cites background from "Changedetection.net: A new change d..."

  • ...The recent background representation models can be classified in the following categories: advanced statistical background models, fuzzy background models, discriminative subspace learning models, RPCA models, sparse models and transform domain models....

    [...]

Book ChapterDOI
Matej Kristan, Ales Leonardis, Jiří Matas, Michael Felsberg, and 155 more authors (47 institutions)
23 Jan 2019
TL;DR: The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative; results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years.
Abstract: The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking subchallenge has been introduced to the set of standard VOT sub-challenges. The new subchallenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both standard short-term and the new long-term tracking subchallenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

639 citations


Cites methods from "Changedetection.net: A new change d..."

  • ...Several initiatives have been established to promote tracking, such as PETS [95], CAVIAR61, i-LIDS 62, ETISEO63, CDC [25], CVBASE 64, FERET [67], LTDT 65, MOTC [44,76] and Videonet 66, and since 2013 short-term single target visual object tracking has been receiving a strong push toward performance evaluation standardisation from the VOT 60 initiative....

    [...]


References
Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.

7,660 citations


"Changedetection.net: A new change d..." refers background in this paper

  • ...As a consequence, an overwhelming importance has been accorded to a small subset of easily implementable methods such as [23, 9, 26] that were developed in the late 1990’s....

    [...]

Journal ArticleDOI
TL;DR: Pfinder is a real-time system for tracking people and interpreting their behavior that uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions.
Abstract: Pfinder is a real-time system for tracking people and interpreting their behavior. It runs at 10 Hz on a standard SGI Indy computer, and has performed reliably on thousands of people in many different physical locations. The system uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions. Pfinder has been successfully used in a wide range of applications including wireless interfaces, video databases, and low-bandwidth coding.

4,280 citations

Journal ArticleDOI
TL;DR: This paper focuses on motion tracking and shows how one can use observed motion to learn patterns of activity in a site and create a hierarchical binary-tree classification of the representations within a sequence.
Abstract: Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.

3,631 citations


"Changedetection.net: A new change d..." refers methods in this paper

  • ...Two fairly old, but frequently-cited, methods: KDE-based estimation by Elgammal et al. [8] and GMM by Stauffer and Grimson [24], as well as 5 improved versions of these methods: self-adapting GMM by KaewTraKulPong [12], improved GMM by Zivkovic and van der Heijden [28], block-based GMM by Dora (RECTGAUSS-Tex) et al. [20], multi-level KDE by Nonaka et al. [17], and spatio-temporal KDE by Yoshinaga et al. [1] were also tested....

    [...]

  • ...[8] and GMM by Stauffer and Grimson [24], as well as 5 improved versions of these methods: self-adapting GMM by KaewTraKulPong [12], improved GMM by Zivkovic and van der Heijden [28], block-based GMM by Dora (RECTGAUSS-Tex) et al....

    [...]

Book ChapterDOI
26 Jun 2000
TL;DR: A novel non-parametric background model that can handle situations where the background of the scene is cluttered and not completely static but contains small motions such as tree branches and bushes is presented.
Abstract: Background subtraction is a method typically used to segment moving regions in image sequences taken from a static camera by comparing each new frame to a model of the scene background. We present a novel non-parametric background model and a background subtraction approach. The model can handle situations where the background of the scene is cluttered and not completely static but contains small motions such as tree branches and bushes. The model estimates the probability of observing pixel intensity values based on a sample of intensity values for each pixel. The model adapts quickly to changes in the scene which enables very sensitive detection of moving targets. We also show how the model can use color information to suppress detection of shadows. The implementation of the model runs in real-time for both gray level and color imagery. Evaluation shows that this approach achieves very sensitive detection with very low false alarm rates.

2,432 citations


"Changedetection.net: A new change d..." refers background in this paper

  • ...As a consequence, an overwhelming importance has been accorded to a small subset of easily implementable methods such as [23, 9, 26] that were developed in the late 1990’s....

    [...]

Proceedings ArticleDOI
01 Sep 1999
TL;DR: This work develops Wallflower, a three-component system for background maintenance that is shown to outperform previous algorithms by handling a greater set of the difficult situations that can occur.
Abstract: Background maintenance is a frequent element of video surveillance systems. We develop Wallflower, a three-component system for background maintenance: the pixel-level component performs Wiener filtering to make probabilistic predictions of the expected background; the region-level component fills in homogeneous regions of foreground objects; and the frame-level component detects sudden, global changes in the image and swaps in better approximations of the background. We compare our system with 8 other background subtraction algorithms. Wallflower is shown to outperform previous algorithms by handling a greater set of the difficult situations that can occur. Finally, we analyze the experimental results and propose normative principles for background maintenance.

1,971 citations


"Changedetection.net: A new change d..." refers background in this paper

  • ...• Wallflower [25]: This is a fairly well-known dataset that continues to be used today....

    [...]

Frequently Asked Questions (9)
Q1. What are the contributions mentioned in the paper "Changedetection.net: a new change detection benchmark dataset" ?

This paper presents and discusses various aspects of the new dataset, quantitative performance metrics used, and comparative results for over a dozen previous and new change detection algorithms. 

Examples include visual surveillance (people counting, crowd monitoring, action recognition, anomaly detection, forensic retrieval, etc.), smart environments (occupancy analysis, parking lot management, etc.), and content retrieval (video annotation, event detection, object tracking). 

The success of the number 1 [10] method can be attributed to the use of a dynamic control algorithm for automatically adapting thresholds and other parameter values.

since most change detection methods incur a delay before their background model stabilizes, the authors labeled the first few hundred frames of each video sequence as Non-ROI. 

The videos have been obtained with different cameras ranging from low-resolution IP cameras, through mid-resolution camcorders and PTZ cameras, to thermal cameras. 

Due to motion blur and partially-opaque objects (e.g., sparse bushes, dirty windows, fountains), pixels in these areas may contain both the moving object and background. 

The authors would like to mention that although camouflage, caused by moving objects that have very similar color/texture to the background, is among the most glaring change detection issues, the authors have not created a camouflage category. 

The CDnet undertaking aims to provide the research community with a rigorous and comprehensive scientific benchmarking facility, a rich dataset of videos, a set of utilities, and access to author-approved algorithm implementations for testing and ranking of existing and new algorithms for motion and change detection.

Benezeth et al., 2010 [4] use a collection of 29 videos (15 camera-captured, 10 semi-synthetic, and 4 synthetic) taken from PETS 2001, the IBM dataset, and the VSSN 2006 dataset.