Background and Foreground Modeling Using
Nonparametric Kernel Density Estimation for
Visual Surveillance
AHMED ELGAMMAL, RAMANI DURAISWAMI, MEMBER, IEEE, DAVID HARWOOD, AND
LARRY S. DAVIS, FELLOW, IEEE
Invited Paper
Automatic understanding of events happening at a site is the
ultimate goal for many visual surveillance systems. Higher level
understanding of events requires that certain lower level computer
vision tasks be performed. These may include detection of unusual
motion, tracking targets, labeling body parts, and understanding
the interactions between people. To achieve many of these tasks,
it is necessary to build representations of the appearance of
objects in the scene. This paper focuses on two issues related to
this problem. First, we construct a statistical representation of
the scene background that supports sensitive detection of moving
objects in the scene, but is robust to clutter arising out of natural
scene variations. Second, we build statistical representations of
the foreground regions (moving objects) that support their tracking
and support occlusion reasoning. The probability density functions
(pdfs) associated with the background and foreground are likely
to vary from image to image and will not in general have a known
parametric form. We accordingly utilize general nonparametric
kernel density estimation techniques for building these statistical
representations of the background and the foreground. These
techniques estimate the pdf directly from the data without any
assumptions about the underlying distributions. Example results
from applications are presented.
Keywords—Background subtraction, color modeling, kernel
density estimation, occlusion modeling, tracking, visual surveil-
lance.
Manuscript received May 31, 2001; revised February 15, 2002. This
work was supported in part by the ARDA Video Analysis and Content
Exploitation project under Contract MDA 90400C2110 and in part by
Philips Research.
A. Elgammal is with the Computer Vision Laboratory, University
of Maryland Institute for Advanced Computer Studies, Department of
Computer Science, University of Maryland, College Park, MD 20742 USA
(e-mail: elgammal@cs.umd.edu).
R. Duraiswami, D. Harwood, and L. S. Davis are with the Computer
Vision Laboratory, University of Maryland Institute for Advanced Computer
Studies, University of Maryland, College Park, MD 20742 USA (e-mail:
ramani@umiacs.umd.edu; harwood@umiacs.umd.edu; lsd@cs.umd.edu).
Publisher Item Identifier 10.1109/JPROC.2002.801448.
I. INTRODUCTION
In automated surveillance systems, cameras and other sen-
sors are typically used to monitor activities at a site with the
goal of automatically understanding events happening at the
site. Automatic event understanding would enable function-
alities such as detection of suspicious activities and site se-
curity. Current systems archive huge volumes of video for
eventual off-line human inspection. The automatic detection
of events in videos would facilitate efficient archiving and
automatic annotation. It could be used to direct the attention
of human operators to potential problems. The automatic de-
tection of events would also dramatically reduce the band-
width required for video transmission and storage as only in-
teresting pieces would need to be transmitted or stored.
Higher level understanding of events requires certain
lower level computer vision tasks to be performed such
as detection of unusual motion, tracking targets, labeling
body parts, and understanding the interactions between
people. For many of these tasks, it is necessary to build
representations of the appearance of objects in the scene. For
example, the detection of unusual motions can be achieved
by building a representation of the scene background and
comparing new frames with this representation. This process
is called background subtraction. Building representations
for foreground objects (targets) is essential for tracking
them and maintaining their identities. This paper focuses
on two issues: how to construct a statistical representation
of the scene background that supports sensitive detection
of moving objects in the scene and how to build statistical
representations of the foreground (moving objects) that
support their tracking.
One useful tool for building such representations is statistical modeling, where a process is modeled as a random variable in a feature space with an associated probability density function (pdf). The density function could be represented parametrically using a specified statistical distribution that is assumed to approximate the actual distribution, with the associated parameters estimated from training data. Alternatively, nonparametric approaches could be used. These estimate the density function directly from the data without any assumptions about the underlying distribution. This avoids having to choose a model and estimate its distribution parameters.
0018-9219/02$17.00 © 2002 IEEE
PROCEEDINGS OF THE IEEE, VOL. 90, NO. 7, JULY 2002 1151
A particular nonparametric technique that estimates the
underlying density, avoids having to store the complete data,
and is quite general is the kernel density estimation tech-
nique. In this technique, the underlying pdf is estimated as

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i K(x - x_i) \qquad (1)$$

where $K$ is a "kernel function" (typically a Gaussian) centered at each data point $x_i$ in feature space, and the $\alpha_i$ are weighting coefficients (typically uniform weights are used, i.e., $\alpha_i = 1/N$). Kernel density estimators asymptotically converge to any density function [1], [2]. This property
makes these techniques quite general and applicable to many
vision problems where the underlying density is not known.
In this paper, kernel density estimation techniques are
utilized for building representations for both the background
and the foreground. We present an adaptive background
modeling and background subtraction technique that is able
to detect moving targets in challenging outdoor environ-
ments with moving trees and changing illumination. We also
present a technique for modeling foreground regions and
show how it can be used for segmenting major body parts of
a person and for segmenting groups of people.
II. KERNEL DENSITY ESTIMATION TECHNIQUES
Given a sample $S = \{x_i\},\ i = 1, \ldots, N$, from a distribution with density function $f(x)$, an estimate $\hat{f}(x)$ of the density at $x$ can be calculated using

$$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(x - x_i) \qquad (2)$$

where $K_\sigma$ is a kernel function (sometimes called a "window" function) with a bandwidth (scale) $\sigma$ such that $K_\sigma(t) = (1/\sigma)K(t/\sigma)$. The kernel function $K$ should satisfy $K(t) \ge 0$ and $\int K(t)\,dt = 1$. We can think of (2) as estimating
the pdf by averaging the effect of a set of kernel functions
centered at each data point. Alternatively, since the kernel
function is symmetric, we can also regard this computation
as averaging the effect of a kernel function centered at the
estimation point and evaluated at each data point. Kernel
density estimators asymptotically converge to any density
function with sufficient samples [1], [2]. This property makes
the technique quite general for estimating the density of
any distribution. In fact, all other nonparametric density
estimation methods, e.g., histograms, can be shown to be
asymptotically kernel methods [1].
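As a concrete illustration of (2), the following short sketch estimates a one-dimensional density with a Gaussian kernel. The function name and the sample values are our own, purely illustrative choices, not from the paper:

```python
import math

def kde_estimate(x, samples, sigma):
    """Estimate the density at x from a sample using a Gaussian kernel
    K_sigma of bandwidth sigma, averaging one kernel per data point."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2))
               for xi in samples) / len(samples)

# Gray levels observed at one pixel over time:
samples = [100, 102, 98, 101, 99]
p_near = kde_estimate(100, samples, sigma=2.0)  # near the data: high density
p_far = kde_estimate(140, samples, sigma=2.0)   # far from the data: near zero
```

The estimate is high near the observed values and falls off away from them, without ever assuming a parametric form for the underlying distribution.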
For higher dimensions, products of one-dimensional (1-D) kernels [1] can be used as

$$\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{d} K_{\sigma_j}(x_j - x_{ij}) \qquad (3)$$

where the same kernel function is used in each dimension with a suitable bandwidth $\sigma_j$ for each dimension. We can avoid having to store the complete data set by weighting the samples as

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i \prod_{j=1}^{d} K_{\sigma_j}(x_j - x_{ij})$$

where the $\alpha_i$'s are weighting coefficients that sum up to one.
A variety of kernel functions with different properties have
been used in the literature. Typically the Gaussian kernel is
used for its continuity, differentiability, and locality proper-
ties. Note that choosing the Gaussian as a kernel function
is different from fitting the distribution to a Gaussian model
(normal distribution). Here, the Gaussian is only used as a
function to weight the data points. Unlike parametric fitting
of a mixture of Gaussians, kernel density estimation is a more
general approach that does not assume any specific shape for
the density function. A good discussion of kernel estimation
techniques can be found in [1]. The major drawback of using
the nonparametric kernel density estimator is its computa-
tional cost. This becomes less of a problem as the available
computational power increases and as efficient computational
methods have become available recently [3], [4].
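To make the product-kernel form in (3) concrete, the following sketch evaluates a color density with one Gaussian kernel per channel and a separate bandwidth for each dimension. The sample values and names are illustrative assumptions of ours:

```python
import math

def product_kernel_density(x, samples, sigmas):
    """Density estimate for a d-dimensional color feature x using a
    product of 1-D Gaussian kernels with per-channel bandwidths."""
    total = 0.0
    for xi in samples:
        k = 1.0
        for xj, xij, s in zip(x, xi, sigmas):
            k *= math.exp(-((xj - xij) ** 2) / (2.0 * s * s)) \
                 / (s * math.sqrt(2.0 * math.pi))
        total += k
    return total / len(samples)

# RGB values observed at one pixel over time:
rgb_samples = [(120, 80, 60), (122, 78, 61), (119, 81, 59)]
sigmas = (3.0, 3.0, 3.0)
p_bg = product_kernel_density((121, 79, 60), rgb_samples, sigmas)  # background-like
p_fg = product_kernel_density((20, 200, 200), rgb_samples, sigmas)  # foreground-like
```

A color close to the stored sample receives a much higher density than one far from it in any channel, since the per-channel kernels multiply.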
III. MODELING THE BACKGROUND
A. Background Subtraction: A Review
1) The Concept: In video surveillance systems, sta-
tionary cameras are typically used to monitor activities at
outdoor or indoor sites. Since the cameras are stationary, the
detection of moving objects can be achieved by comparing
each new frame with a representation of the scene back-
ground. This process is called background subtraction and
the scene representation is called the background model.
Typically, background subtraction forms the first stage
in an automated visual surveillance system. Results from
background subtraction are used for further processing, such
as tracking targets and understanding events.
A central issue in building a representation for the scene
background is what features to use for this representation
or, in other words, what to model in the background. In
the literature, a variety of features have been used for
background modeling, including pixel-based features (pixel
intensity, edges, disparity) and region-based features (e.g.,
block correlation). The choice of the features affects how
the background model tolerates changes in the scene and the
granularity of the detected foreground objects.
In any indoor or outdoor scene, there are changes that
occur over time and may be classified as changes to the scene
background. It is important that the background model toler-
ates these kind of changes, either by being invariant to them
or by adapting to them. These changes can be local, affecting
only part of the background, or global, affecting the entire
background. The study of these changes is essential to un-
derstand the motivations behind different background sub-
traction techniques. We classify these changes according to
their source.
Illumination changes:
gradual change in illumination, as might occur in out-
door scenes due to the change in the location of the sun;

sudden change in illumination as might occur in an in-
door environment by switching the lights on or off, or
in an outdoor environment by a change between cloudy
and sunny conditions;
shadows cast on the background by objects in the back-
ground itself (e.g., buildings and trees) or by moving
foreground objects.
Motion changes:
image changes due to small camera displacements
(these are common in outdoor situations due to wind
load or other sources of motion which causes global
motion in the images);
motion in parts of the background, for example, tree
branches moving with the wind or rippling water.
Changes introduced to the background: These include any
change in the geometry or the appearance of the background
of the scene introduced by targets. Such changes typically
occur when something relatively permanent is introduced into the scene background (for example, if somebody introduces something into, or removes something from, the background; if a car is parked in the scene or moves out of the scene; or if a person stays stationary in the scene for an extended period).
2) Practice: Many researchers have proposed methods to address some of the issues regarding background modeling, and we provide a brief review of the relevant work here.
Pixel intensity is the most commonly used feature in back-
ground modeling. If we monitor the intensity value of a pixel
over time in a completely static scene, then the pixel in-
tensity can be reasonably modeled with a Gaussian distribution $N(\mu, \sigma^2)$, given that the image noise over time can be modeled by a zero-mean Gaussian distribution $N(0, \sigma^2)$.
This Gaussian distribution model for the intensity value of a
pixel is the underlying model for many background subtraction techniques. For example, one of the simplest background
subtraction techniques is to calculate an average image of
the scene, subtract each new frame from this image, and
threshold the result. This basic Gaussian model can adapt to
slow changes in the scene (for example, gradual illumination
changes) by recursively updating the model using a simple
adaptive filter. This basic adaptive model is used in [5]; also,
Kalman filtering for adaptation is used in [6]–[8].
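The basic adaptive model described above can be sketched in a few lines. Here pixels are a flat list of gray values, and the learning rate `alpha` and the threshold are assumed values chosen for illustration only:

```python
def update_background(bg, frame, alpha=0.05):
    """Recursive adaptive filter: blend the new frame into the background
    so gradual changes (e.g., illumination) are absorbed over time."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def detect_foreground(bg, frame, threshold=25.0):
    """Threshold the per-pixel difference from the background model."""
    return [abs(f - b) > threshold for b, f in zip(bg, frame)]

bg = [100.0, 100.0, 100.0]           # average image (3 pixels for brevity)
frame = [101.0, 99.0, 180.0]         # third pixel covered by a moving object
mask = detect_foreground(bg, frame)  # [False, False, True]
bg = update_background(bg, frame)    # bg[2] drifts slightly toward 180
```

Note how the moving object still contaminates the model slowly through the update; this is exactly the limitation that motivates the mixture and nonparametric models discussed next.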
Typically, in outdoor environments with moving trees and
bushes, the scene background is not completely static. For
example, one pixel can be the image of the sky in one frame,
a tree leaf in another frame, a tree branch in a third frame,
and some mixture subsequently. In each situation, the pixel
will have a different intensity (color), so a single Gaussian
assumption for the pdf of the pixel intensity will not hold.
Instead, a generalization based on a mixture of Gaussians
has been used in [9]–[11] to model such variations. In [9]
and [10], the pixel intensity was modeled by a mixture of $K$ Gaussian distributions ($K$ is a small number from 3 to 5).
The mixture is weighted by the frequency with which each
of the Gaussians explains the background. In [11], a mixture
of three Gaussian distributions was used to model the pixel
value for traffic surveillance applications. The pixel inten-
sity was modeled as a weighted mixture of three Gaussian
distributions corresponding to road, shadow, and vehicle dis-
tribution. Adaptation of the Gaussian mixture models can be
achieved using an incremental version of the EM algorithm.
In [12], linear prediction using the Wiener filter is used to
predict pixel intensity given a recent history of values. The
prediction coefficients are recomputed each frame from the
sample covariance to achieve adaptivity. Linear prediction
using the Kalman filter was also used in [6]–[8].
All of the previously mentioned models are based on sta-
tistical modeling of pixel intensity with the ability to adapt
the model. While pixel intensity is not invariant to illumi-
nation changes, model adaptation makes it possible for such
techniques to adapt to gradual changes in illumination. On
the other hand, a sudden change in illumination presents a
challenge to such models.
Another approach to model a wide range of variations
in the pixel intensity is to represent these variations as dis-
crete states corresponding to modes of the environment, e.g.,
lights on/off or cloudy/sunny skies. Hidden Markov models
(HMMs) have been used for this purpose in [13] and [14].
In [13], a three-state HMM has been used to model the in-
tensity of a pixel for a traffic-monitoring application where
the three states correspond to the background, shadow, and
foreground. The use of HMMs imposes a temporal continuity
constraint on the pixel intensity, i.e., if the pixel is detected as
a part of the foreground, then it is expected to remain part of
the foreground for a period of time before switching back to
be part of the background. In [14], the topology of the HMM
representing global image intensity is learned while learning
the background. At each global intensity state, the pixel in-
tensity is modeled using a single Gaussian. It was shown that
the model is able to learn simple scenarios like switching the
lights on and off.
Alternatively, edge features have also been used to model
the background. The use of edge features to model the back-
ground is motivated by the desire to have a representation
of the scene background that is invariant to illumination
changes. In [15], foreground edges are detected by com-
paring the edges in each new frame with an edge map of the
background which is called the background “primal sketch.”
The major drawback of using edge features to model the
background is that it would only be possible to detect edges
of foreground objects instead of the dense connected regions
that result from pixel-intensity-based approaches. A fusion
of intensity and edge information was used in [16].
Block-based approaches have been also used for modeling
the background. Block matching has been extensively used
for change detection between consecutive frames. In [17],
each image block is fit to a second-order bivariate polynomial
and the remaining variations are assumed to be noise. A sta-
tistical likelihood test is then used to detect blocks with sig-
nificant change. In [18], each block was represented with its
median template over the background learning period and its
block standard deviation. Subsequently, at each new frame,
each block is correlated with its corresponding template, and
blocks with too much deviation relative to the measured stan-
dard deviation are considered to be foreground. The major
drawback with block-based approaches is that the detection
unit is a whole image block and therefore they are only suit-
able for coarse detection.
ELGAMMAL et al.: MODELING USING NONPARAMETRIC KERNEL DENSITY ESTIMATION FOR VISUAL SURVEILLANCE 1153

In order to monitor wide areas with sufficient resolution,
cameras with zoom lenses are often mounted on pan-tilt plat-
forms. This enables high-resolution imagery to be obtained
from any arbitrary viewing angle from the location where
the camera is mounted. The use of background subtraction
in such situations requires a representation of the scene
background for any arbitrary pan-tilt-zoom combination,
which is an extension to the original background subtraction
concept with a stationary camera. In [19], image mosaicing
techniques are used to build panoramic representations of
the scene background. Alternatively, in [20], a represen-
tation of the scene background as a finite set of images
on a virtual polyhedron is used to construct images of the
scene background at any arbitrary pan-tilt-zoom setting.
Both techniques assume that the camera rotates about its optical center, so that there is no significant motion parallax.
B. Nonparametric Background Modeling
In this section, we describe a background model and a
background subtraction process that we have developed,
based on nonparametric kernel density estimation. The
model uses pixel intensity (color) as the basic feature for
modeling the background. The model keeps a sample of
intensity values for each pixel in the image and uses this
sample to estimate the density function of the pixel intensity
distribution. Therefore, the model is able to estimate the
probability of any newly observed intensity value. The
model can handle situations where the background of the
scene is cluttered and not completely static but contains
small motions that are due to moving tree branches and
bushes. The model is updated continuously and therefore
adapts to changes in the scene background.
1) Background Subtraction: Let $S = \{x_1, x_2, \ldots, x_N\}$ be a sample of intensity values for a pixel. Given this sample, we can obtain an estimate of the pixel intensity pdf at any intensity value using kernel density estimation. Given the observed intensity $x_t$ at time $t$, we can estimate the probability of this observation as

$$\Pr(x_t) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(x_t - x_i) \qquad (4)$$

where $K_\sigma$ is a kernel function with bandwidth $\sigma$. This estimate can be generalized to use color features by using kernel products as

$$\Pr(x_t) = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{d} K_{\sigma_j}(x_{t_j} - x_{i_j}) \qquad (5)$$

where $x_t$ is a $d$-dimensional color feature and $K_{\sigma_j}$ is a kernel function with bandwidth $\sigma_j$ in the $j$th color space dimension. If we choose our kernel function $K$ to be Gaussian, then the density can be estimated as

$$\Pr(x_t) = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi\sigma_j^2}}\, e^{-(x_{t_j} - x_{i_j})^2 / 2\sigma_j^2} \qquad (6)$$
Fig. 1. Background Subtraction. (a) Original image. (b) Estimated
probability image.
Using this probability estimate, the pixel is considered to be a foreground pixel if $\Pr(x_t) < th$, where the threshold $th$ is a global threshold over all the images that can be adjusted to achieve a desired percentage of false positives. Practically, the probability estimation in (6) can be calculated in a very fast way using precalculated lookup tables for the kernel function values given the intensity value difference $(x_{t_j} - x_{i_j})$ and the kernel function bandwidth. Moreover, a partial evaluation of the sum in (6) is usually sufficient to surpass the threshold at most image pixels, since most of the image is typically from the background. This allows us to construct a very fast implementation.
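A minimal gray-scale sketch of this scheme, including the lookup table and the partial evaluation of the sum in (6), might look as follows. Thresholds, names, and sample values are our own illustrative choices:

```python
import math

def make_kernel_lut(sigma, max_diff=256):
    """Precompute the Gaussian kernel value for every possible integer
    intensity difference, as suggested in the text."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-(d * d) / (2.0 * sigma * sigma))
            for d in range(max_diff)]

def is_foreground(x_t, sample, lut, th):
    """Foreground test Pr(x_t) < th.  The sum over the sample stops
    early once it is large enough to guarantee Pr(x_t) >= th (partial
    evaluation), which is the common case for background pixels."""
    target = th * len(sample)
    total = 0.0
    for xi in sample:
        total += lut[abs(x_t - xi)]
        if total >= target:       # already provably background
            return False
    return True

lut = make_kernel_lut(sigma=5.0)
sample = [100, 103, 98, 101, 99, 102, 100, 97]   # recent history of one pixel
fg_bg = is_foreground(100, sample, lut, th=1e-3)   # False: typical value
fg_obj = is_foreground(200, sample, lut, th=1e-3)  # True: far from history
```

For a background pixel the loop usually exits after the very first kernel evaluation, which is what makes the full-image subtraction fast in practice.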
Since kernel density estimation is a general approach, the estimate of (4) can converge to any pixel intensity density function. Here, the estimate is based on the most recent $N$ samples used in the computation. Therefore, adaptation of the model can be achieved simply by adding new samples and ignoring older samples [21]. Fig. 1(b) shows the estimated background probability, where brighter pixels represent lower background probability.
One major issue that needs to be addressed when using
kernel density estimation technique is the choice of suitable
kernel bandwidth (scale). Theoretically, as the number of
samples reaches infinity, the choice of the bandwidth is
insignificant and the estimate will approach the actual
density. Practically, since only a finite number of samples
are used and the computation must be performed in real
time, the choice of suitable bandwidth is essential. Too
small a bandwidth will lead to a ragged density estimate,

while too wide a bandwidth will lead to an over-smoothed
density estimate [2]. Since the expected variations in pixel
intensity over time are different from one location to another
in the image, a different kernel bandwidth is used for each
pixel. Also, a different kernel bandwidth is used for each
color channel.
To estimate the kernel bandwidth $\sigma_j$ for the $j$th color channel for a given pixel, we compute the median absolute deviation over the sample for consecutive intensity values of the pixel. That is, the median $m$ of $|x_i - x_{i+1}|$ for each consecutive pair $(x_i, x_{i+1})$ in the sample is calculated independently for each color channel. The motivation behind the use of the median of absolute deviations is that pixel intensities over time are expected to have jumps because different objects (e.g., sky, branch, leaf, and mixtures when an edge passes through the pixel) are projected onto the same pixel at different times. Since we are measuring deviations between two consecutive intensity values, the pair $(x_i, x_{i+1})$ usually comes from the same local-in-time distribution, and only a few pairs are expected to come from cross distributions (intensity jumps). The median is a robust estimate and should not be affected by a few such jumps.
If we assume that this local-in-time distribution is Gaussian $N(\mu, \sigma^2)$, then the distribution of the deviation $(x_i - x_{i+1})$ is also Gaussian, $N(0, 2\sigma^2)$. Since this distribution is symmetric, the median of the absolute deviations, $m$, is equivalent to the quarter percentile of the deviation distribution. That is,

$$\Pr\left(N(0, 2\sigma^2) > m\right) = 0.25$$

and therefore the standard deviation of the first distribution can be estimated as

$$\sigma = \frac{m}{0.68\sqrt{2}}.$$

Since the deviations are integer gray scale (color) values, linear interpolation is used to obtain more accurate median values.
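The bandwidth estimate follows directly from this recipe, with $\sigma = m/(0.68\sqrt{2})$ and $m$ the median absolute deviation between consecutive samples. The sketch below works on one channel and, for simplicity, skips the linear interpolation step; names and values are ours:

```python
import statistics

def bandwidth_from_history(values):
    """Bandwidth estimate sigma = m / (0.68 * sqrt(2)), where m is the
    median of the absolute deviations between consecutive samples."""
    deviations = [abs(a - b) for a, b in zip(values, values[1:])]
    m = statistics.median(deviations)
    return m / (0.68 * 2.0 ** 0.5)

# A pixel that mostly images one surface but occasionally jumps to
# another (e.g., a branch crossing it):
history = [100, 101, 100, 99, 100, 160, 161, 100, 101, 100]
sigma = bandwidth_from_history(history)   # close to 1.04: jumps are ignored
```

Even though two deviations in this history are around 60 gray levels, the median keeps the estimate near the local-in-time noise level, which is the robustness property the text argues for.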
2) Probabilistic Suppression of False Detection: In out-
door environments with fluctuating backgrounds, there are
two sources of false detections. First, there are false detec-
tions due to random noise which are expected to be homo-
geneous over the entire image. Second, there are false detec-
tions due to small movements in the scene background that
are not represented by the background model. This can occur
locally, for example, if a tree branch moves further than it
did during model generation. This can also occur globally in
the image as a result of small camera displacements caused
by wind load, which is common in outdoor surveillance and
causes many false detections. These kinds of false detections
are usually spatially clustered in the image, and they are not
easy to eliminate using morphological techniques or noise
filtering because these operations might also affect detection
of small and/or occluded targets.
If a part of the background (a tree branch, for example)
moves to occupy a new pixel, but it was not part of the model
for that pixel, then it will be detected as a foreground object.
However, this object will have a high probability of being
a part of the background distribution corresponding to its
original pixel. Assuming that only a small displacement can
occur between consecutive frames, we decide if a detected
pixel is caused by a background object that has moved by
considering the background distributions of a small neigh-
borhood of the detection location.
Let $x_t$ be the observed value of a pixel detected as a foreground pixel at time $t$. We define the pixel displacement probability $P_N(x_t)$ to be the maximum probability that the observed value, $x_t$, belongs to the background distribution of some point in the neighborhood $N(x)$ of $x$:

$$P_N(x_t) = \max_{y \in N(x)} \Pr(x_t \mid B_y)$$

where $B_y$ is the background sample for pixel $y$, and the probability estimation $\Pr(x_t \mid B_y)$ is calculated using the kernel function estimation as in (6). By thresholding $P_N$ for detected pixels, we can eliminate many false detections due to small motions in the background scene. To avoid losing true detections that might accidentally be similar to the background of some nearby pixel (e.g., camouflaged targets), a constraint is added that the whole detected foreground object must have moved from a nearby location, and not only some of its pixels. The component displacement probability $P_C$ is defined to be the probability that a detected connected component $C$ has been displaced from a nearby location. This probability is estimated by

$$P_C = \prod_{x \in C} P_N(x).$$

For a connected component corresponding to a real target, the probability that this component has displaced from the background will be very small. So, a detected pixel $x$ will be considered to be a part of the background only if $P_N(x) > th_1$ and $P_C > th_2$.
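Under the Gaussian-kernel model of (6), the pixel displacement probability and the component displacement probability can be sketched as follows. The neighborhood representation and names are our own simplifications; a real implementation would scan a small image window around each detection:

```python
import math

def kde_prob(x, sample, sigma):
    """Pr(x | B_y): background probability of value x under sample B_y,
    using a one-dimensional Gaussian kernel estimate."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-((x - xi) ** 2) / (2.0 * sigma * sigma))
               for xi in sample) / len(sample)

def displacement_prob(x, neighborhood_samples, sigma):
    """P_N(x): maximum background probability over the samples B_y of the
    pixels y in a small neighborhood of the detection."""
    return max(kde_prob(x, s, sigma) for s in neighborhood_samples)

def component_displacement_prob(values, neighborhoods, sigma):
    """P_C: product of P_N over all pixels of a detected component."""
    p = 1.0
    for x, nbhd in zip(values, neighborhoods):
        p *= displacement_prob(x, nbhd, sigma)
    return p

# Background samples of two neighboring pixels (sky-like and leaf-like):
nbhd = [[100, 100, 101, 99], [150, 151, 149, 150]]
p_moved = displacement_prob(150, nbhd, sigma=5.0)   # branch moved here: high
p_target = displacement_prob(50, nbhd, sigma=5.0)   # true target: near zero
p_c = component_displacement_prob([150, 150], [nbhd, nbhd], sigma=5.0)
```

A detection whose value matches some nearby background sample gets a high $P_N$ and is suppressed, while a genuine target keeps both $P_N$ and the product $P_C$ small.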
Fig. 2 illustrates the effect of the second stage of detec-
tion. The result after the first stage is shown in Fig. 2(b).
In this example, the background has not been updated for
several seconds, and the camera has been slightly displaced
during this time interval, so we see many false detections
along high-contrast edges. Fig. 2(c) shows the result after
suppressing the detected pixels with high displacement prob-
ability. Most false detections due to displacement were elim-
inated, and only random noise that is uncorrelated with the
scene remains as false detections. However, some true de-
tected pixels were also lost. The final result of the second
stage of the detection is shown in Fig. 2(d), where the com-
ponent displacement probability constraint was added. Fig.
3(b) shows results for a case where, as a result of the wind load, the camera is shaking slightly, resulting in many clustered false detections, especially along the edges. After probabilistic suppression of false detections [Fig. 3(c)], most of these clustered false detections are suppressed, while the small target on the left side of the image remains.