
A Hardware Architecture for Real-Time Video Segmentation Utilizing Memory Reduction Techniques

01 Feb 2009-IEEE Transactions on Circuits and Systems for Video Technology (IEEE--Institute of Electrical and Electronics Engineers Inc.)-Vol. 19, Iss: 2, pp 226-236
TL;DR: To achieve real-time performance with high resolution video streams, a dedicated hardware architecture with streamlined dataflow and memory access reduction schemes are developed to implement a video segmentation unit used for embedded automated video surveillance systems.
Abstract: This paper presents the implementation of a video segmentation unit used for embedded automated video surveillance systems. Various aspects of the underlying segmentation algorithm are explored and modifications are made with potential improvements of segmentation results and hardware efficiency. In addition, to achieve real-time performance with high resolution video streams, a dedicated hardware architecture with streamlined dataflow and memory access reduction schemes are developed. The whole system is implemented on a Xilinx field-programmable gate array platform, capable of real-time segmentation with VGA resolution at 25 frames per second. Substantial memory bandwidth reduction of more than 70% is achieved by utilizing pixel locality as well as wordlength reduction. The hardware platform is intended as a real-time testbench, especially for observations of long term effects with different parameter settings.

Summary (3 min read)

Introduction

  • A multimodal background is caused by repetitive background object motion, e.g., swaying trees, flickering of the monitor, etc. Furthermore, FPGAs give us real-time performance, hard to achieve with DSP processors, while limiting the extensive design work required for application-specific integrated circuits (ASICs).
  • In this work an automated surveillance system has been chosen as the target application.
  • Sections II and III discuss the original algorithm and possible modifications for hardware efficiency.

II. GAUSSIAN MIXTURE BACKGROUND MODEL

  • For a more thorough description the authors refer to [13].
  • Each distribution has a weight that indicates the probability of matching a new incoming pixel.
  • The higher the weight, the more likely the distribution belongs to the background.
  • For those unmatched, the weight is updated according to (8) while the mean and the variance remain the same.
  • Instead of mainly focusing on improving robustness, the authors propose several modifications to the algorithm, with the major concern being their potential to improve hardware efficiency.

A. Color Space Transformation

  • Multimodal situations only occur when repetitive background objects are present in the scene.
  • By transforming RGB into YCbCr space, the correlation among different color coordinates is mostly removed, resulting in nearly independent color components.
  • As shown in Fig. 2(a), most pixel distributions are transformed from cylinders back to spheres, capable of being modeled with a single spherical distribution.
  • The authors propose two simplifications to the algorithm.
  • A distribution with dominant weight but large variance does not get to the top, where it would be identified as a background distribution.

III. HARDWARE ARCHITECTURE

  • To perform the algorithm with VGA resolution in real-time, a dedicated hardware architecture, with a streamlined data flow and memory bandwidth reduction schemes, is implemented to address the computation capacity and memory bandwidth bottlenecks.
  • Algorithm modifications covered in previous sections are implemented with potential benefits on hardware efficiency and segmentation quality.
  • The image data is captured with one color component at a time, and the three color components are sent to the system after serial-parallel transformation.
  • With the image data captured and transformed, the match and switch block tries to match the incoming pixel with Gaussian distributions obtained from the previous frame.
  • Similar updating schemes are utilized for variance update.

A. Sorting

  • The updated Gaussian parameters have to be sorted for use in the next frame.
  • By observing that only one Gaussian distribution is updated at a time and all the distributions are initially sorted, the sorting of Gaussian distributions can be changed to rearranging an updated distribution among ordered distributions.
  • The output of each comparator signifies which distribution is to be multiplexed to the output, e.g., if the weight of any unmatched distribution is smaller than the updated one, all unmatched distributions below the current one are switched to the output at the next lower MUX.
  • Since only three Gaussians are utilized in their implementation this is a trivial task.
  • A comparison of hardware complexity between the proposed sorting architecture and the other schemes mentioned above is also presented.

B. Wordlength Reduction

  • Slow background updating requires large dynamic range for each parameter in the distributions, since parameter values are changed slightly between frames but could accumulate over time.
  • Together with 16 bits weight and integer parts of the mean and the variance, 81–100 bits are needed for a single Gaussian distribution.
  • From (13), a small positive or negative number is derived depending on whether the incoming pixel is above or below the current mean.
  • The coarse updating scheme, on the other hand, relieves the problem to a certain extent, where consecutive ones are added or subtracted to keep track of relatively fast changes.

C. Pixel Locality

  • In addition to wordlength reduction, a data compression scheme for further bandwidth reduction is proposed by utilizing pixel locality for Gaussian distributions in adjacent areas.
  • The reason for such a criterion lies in the fact that a pixel that matches one distribution will most likely match the other.
  • Various threshold values are selected to evaluate the efficiency for the memory bandwidth reduction.
  • With foreground objects entering the scene, some Gaussian distributions are replaced, which decreases the number of similar Gaussian distributions.
  • Foreground object activity can vary between video scenes, e.g., continuous activity in Fig. 8(a), where people go up and down the stairs all the time, and the two peak activity periods around frames 600–900 and frames 2100–2500 in Fig. 8(b), where people walk by in two discrete time periods.

IV. RESULTS

  • The segmentation unit is prototyped on a Xilinx VirtexII vp30 development board, as shown in Fig. 11.
  • The 24 BRAMs used for the DDR controller can be reduced by using low depth Gaussian parameter buffers to write/read to the off-chip DDR memory.
  • Dual-port block RAMs are used as video RAMs in the VGA controller, which are shared by different blocks of the complete surveillance system to display the results from different stages on a monitor.
  • Thus, the memory requirements directly dedicated to the algorithm are low, while the DDR and VGA controllers utilize a substantial amount of memory.

V. CONCLUSION

  • By utilizing combined memory reduction schemes, off-chip memory access can be reduced by over 70%.
  • With real time performance, tracking schemes can be evaluated in varied environments for system robustness testing.
  • To address the issue a joint memory reduction scheme is proposed by utilizing pixel locality and wordlength reduction.
  • By measuring the similarity of neighboring Gaussian distributions as the overlapping volume of two cubes, a threshold can be set to classify Gaussian similarity.
  • Careful tradeoffs should be made based on different application environments.


226 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 2, FEBRUARY 2009
A Hardware Architecture for Real-Time
Video Segmentation Utilizing
Memory Reduction Techniques
Hongtu Jiang, Håkan Ardö, and Viktor Öwall, Member, IEEE
Abstract—This paper presents the implementation of a video
segmentation unit used for embedded automated video surveil-
lance systems. Various aspects of the underlying segmentation
algorithm are explored and modifications are made with potential
improvements of segmentation results and hardware efficiency. In
addition, to achieve real-time performance with high resolution
video streams, a dedicated hardware architecture with streamlined
dataflow and memory access reduction schemes are developed.
The whole system is implemented on a Xilinx field-programmable
gate array platform, capable of real-time segmentation with VGA
resolution at 25 frames per second. Substantial memory band-
width reduction of more than 70% is achieved by utilizing pixel
locality as well as wordlength reduction. The hardware platform
is intended as a real-time testbench, especially for observations of
long term effects with different parameter settings.
Index Terms—Field-programmable gate array (FPGA), mixture
of Gaussian (MoG), video segmentation.
I. INTRODUCTION
AUTOMATED video surveillance systems have been
gaining substantial interests in the research community in
recent years. This is partially due to the progress in technology
scaling that enables more robust yet computationally intensive
algorithms to be realized with reasonable performance. The
advantage of surveillance automation over traditional closed
circuit TV (CCTV)-based system lies in the fact that it is a self
contained system capable of automatic information extraction,
e.g., detection of moving objects and tracking. The result is a
fully or semi automated surveillance system, with the potential
of increased usage of mounted cameras and the reduced cost of
human resources for observing the output. Typical applications
may include both civilian and military scenarios, e.g., traffic
control, security surveillance in banks or antiterrorism.
Manuscript received March 26, 2007; revised February 08, 2008. First pub-
lished December 09, 2008; current version published January 30, 2009. This
work was supported in part by VINNOVA Competence Center for Circuit De-
sign (CCCD). This paper was recommended by Associate Editor C. N. Taylor.
H. Jiang was with the Department of Electrical and Information Technology,
Lund University, SE-22100 Lund, Sweden. He is now with the Ericsson Mobile
Platforms, SE-22370 Lund, Sweden (e-mail: Hongtu.Jiang@ericsson.com).
H. Ardö is with the Centre for Mathematical Sciences Lund University,
SE-22100 Lund, Sweden (e-mail: ardo@maths.lth.se).
V. Öwall is with the Department of Electrical and Information Technology,
Lund University, SE-22100 Lund, Sweden (e-mail: viktor.owall@eit.lth.se)
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2008.2009244
Crucial to all these applications is the quality of the video
segmentation, which is a process of extracting objects of in-
terest (foreground) from an irrelevant background scene. The
foreground information, usually composed of moving objects,
is passed on to later analysis units, where objects are tracked and
their activities are analyzed. A wide range of segmentation algorithms have been proposed in the literature, with robustness aimed at different situations. In [1], comparisons of segmentation quality are made to evaluate a variety of approaches. Table I shows five widely cited segmentation algorithms, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussians (MoG) [13]–[19] and kernel density estimation (KDE) [20]. Using FD, background/foreground detection
is achieved by simply observing the difference of the pixels be-
tween two adjacent frames. By setting a threshold value, a pixel
is identified as foreground if the difference is higher than the
threshold value, otherwise background. The simplicity of the
algorithm comes at the cost of the segmentation quality. In gen-
eral, bigger regions than the actual moving part are detected as
foreground area. Also it fails to detect inner pixels of a large,
uniformly-colored moving object, a problem known as aper-
ture effect [3]. As more sophisticated algorithms are utilized
aiming for improved robustness and segmentation quality, the
complexity of realizing such systems increases.
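The FD scheme described above amounts to a per-pixel threshold test between two adjacent frames. A minimal Python sketch (the pixel values and the threshold of 20 are illustrative assumptions, not values from the paper):

```python
# Sketch of frame-difference (FD) segmentation: a pixel is foreground
# when its intensity changes by more than a threshold between frames.

def frame_difference(prev_frame, curr_frame, threshold):
    """Return a binary mask: 1 = foreground, 0 = background."""
    return [
        [1 if abs(c - p) > threshold else 0
         for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]

prev = [[100, 100], [100, 100]]
curr = [[102, 180], [100, 30]]
mask = frame_difference(prev, curr, threshold=20)
# Only the two pixels whose intensity moved by more than 20 are marked.
```

Note how a uniformly colored object moving over a uniform region produces small differences in its interior, which is exactly the aperture effect mentioned above.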
In fact, no perfect system exists to handle all kinds of issues
within different background models. For realistic implementa-
tion of such systems, trade-offs have to be made between system
robustness (quality) and system performance (frame rate, res-
olution, etc.). A background model based on pixel wise mul-
timodal Gaussian distribution was proposed in [13] with ro-
bustness to multimodal background situations, which are quite
common in both indoor and outdoor environments. From Table I
it can be seen that the KDE approach has the highest segmentation quality, which however comes at the cost of high hardware complexity and, to an even larger extent, increased memory requirements. These facts have led us to choose the MoG approach for the developed system. A multimodal background is
caused by repetitive background object motion, e.g., swaying
trees, flickering of the monitor, etc. For a pixel lying in a region where repetitive motion occurs, the observed value will alternate between two or more background colors, i.e., the RGB value of that specific pixel changes over time. This will result in false foreground detection in most other approaches. Various modifications to the
algorithm for potential improvements are reported [14]–[19].
However, none of these works address the issue of algorithm
Authorized licensed use limited to: Lunds Universitetsbibliotek. Downloaded on February 16, 2009 at 03:01 from IEEE Xplore. Restrictions apply.

TABLE I: COMPARISON OF DIFFERENT SEGMENTATION ALGORITHMS
performance in terms of meeting real-time requirements with
reasonable resolution. In [13], only a frame rate of 11–13 fps is obtained even for a small frame size of 160×120 on an SGI O2 workstation. In our software implementation on an AMD 4400+ processor, a frame rate of 4–6 fps is observed for video sequences with 352×288 resolution. In addition to performance
issues, we have found no studies on possible algorithm modifi-
cations that could lead to potentially better hardware efficiency.
In this paper, we present a dedicated hardware architecture ca-
pable of real-time segmentation with VGA resolution at 25 fps.
Preliminary results of the architecture were presented in [21].
The architecture is implemented on Xilinx VirtexII pro Vp30
FPGA platform together with a variety of memory access reduc-
tion schemes, which results in more than 70% memory band-
width reduction. Furthermore, various modifications to the al-
gorithm are made, with potential improvements of hardware ef-
ficiency.
The choice of an FPGA as the target platform is mainly moti-
vated from the possibility to perform algorithm changes in late
stages of the system development, provided by the reconfig-
urability aspect. Furthermore, FPGAs give us real-time performance, hard to achieve with DSP processors, while limiting
the extensive design work required for application specific inte-
grated circuits (ASICs). A rather early overview (1999) of com-
puter vision algorithms on FPGAs can be found in [22] while
a more recent evaluation can be found in [23]. Even with the
difference of 6 years, one common conclusion is that dedicated
architectures are in many cases needed to achieve required per-
formance and that reconfigurable platforms are a good way to
achieve it at reasonable design time and cost. Even though ad-
vanced EDA flows exist, there is a considerable amount of re-
quired knowledge regarding hardware architecture design, often
not available in image processing groups. A cooperation be-
tween theoretical and hardware research is therefore a healthy
mix. In this work an automated surveillance system has been chosen as the target application. We have found no hardware
architectures for the chosen algorithm that can be used as di-
rect comparison. However, other applications in vision systems
include robotic control [24], [25], medical imaging [26], and
stereo processing [27]. One common challenge is memory size
and bandwidth, as in the presented design often solved with ex-
ternal memories.
The paper is organized as follows. Sections II and III dis-
cuss the original algorithm and possible modifications for
hardware efficiency. The hardware architecture is presented
in Section IV, together with the memory bandwidth reduction
scheme explained in detail. Finally, the results and conclusions
are covered in Sections V and VI.
II. GAUSSIAN MIXTURE BACKGROUND MODEL
In this section the used algorithm is briefly described; for a more thorough description we refer to [13]. The algorithm is formulated as follows: measured from consecutive video frames, the values of a particular pixel over time can be regarded as a stochastic process. At any time $t$, what is observed for a particular pixel at $(x_0, y_0)$ is a collection of the most recent measurements over time

$$\{X_1, \ldots, X_t\} = \{I(x_0, y_0, i) : 1 \leq i \leq t\}, \qquad (1)$$

where $I$ is the image sequence. To model such a process, a Gaussian distribution can be used. Characterized by its mean and variance values, the distribution represents a location centered at its mean values in the RGB color space, where the pixel value is most likely to be observed over frames. A pixel containing several background object colors, e.g., the leaves of a swaying tree and a road, can be modeled with a mixture of Gaussian distributions. Each distribution $k$ has a weight, $\omega_k$, that indicates the probability of matching a new incoming pixel, $X_t$. The probability of observing the current pixel value is

$$P(X_t) = \sum_{k=1}^{K} \omega_{k,t} \, \eta(X_t, \mu_{k,t}, \Sigma_{k,t}), \qquad (2)$$

where $K$ is the number of Gaussian distributions and $\eta$ is a Gaussian probability density function. Furthermore, $\omega_{k,t}$ is the weighting factor, $\mu_{k,t}$ is the mean value and $\Sigma_{k,t}$ is the covariance matrix of the $k$th Gaussian at time $t$, which takes the form of

$$\Sigma_{k,t} = \sigma_k^2 \mathbf{I}. \qquad (3)$$

$K$ is determined by available resources concerning hardware complexity and memory resources.
A match is defined as the incoming pixel lying within $D$ times the standard deviation off the center. In [13], $D$ is set to 2.5, a value that has also been used in our application. The higher the weight, the more likely the distribution belongs to the background. The portion of the Gaussian distributions belonging to the background is defined to be

$$B = \arg\min_{b} \left( \sum_{k=1}^{b} \omega_k > T \right), \qquad (4)$$

where $\arg\min$ is used to calculate the minimum number of Gaussian distributions, $B$, whose accumulated weights $\sum_k \omega_k$ exceed the predefined parameter $T$, i.e., measuring the minimum portion of the distributions that should be

Fig. 1. Background pixel distributions in the RGB color space. Instead of Sphere like distributions, each pixel cluster is rather cylindrical.
accounted for by the background. If a small value is chosen for $T$, the background model is usually unimodal.
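The selection rule of (4) can be sketched as a cumulative-weight scan over distributions already sorted by weight in descending order. The weights and threshold below are illustrative assumptions:

```python
# Sketch of (4): take the minimum number B of top-ranked distributions
# whose accumulated weight exceeds the threshold T.

def background_count(weights, T):
    """weights: distribution weights sorted in descending order."""
    total = 0.0
    for b, w in enumerate(weights, start=1):
        total += w
        if total > T:
            return b
    return len(weights)

weights = [0.55, 0.30, 0.15]      # already sorted; sums to 1
b = background_count(weights, T=0.7)   # 0.55 + 0.30 = 0.85 > 0.7
```

A small $T$ selects only the single dominant distribution, which is why the model then behaves unimodally.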
If a match is found, the matched distribution is updated as

$$\omega_{k,t} = (1 - \alpha)\,\omega_{k,t-1} + \alpha, \qquad (5)$$
$$\mu_{t} = (1 - \rho)\,\mu_{t-1} + \rho X_t, \qquad (6)$$
$$\sigma_{t}^{2} = (1 - \rho)\,\sigma_{t-1}^{2} + \rho\,(X_t - \mu_t)^{T}(X_t - \mu_t), \qquad (7)$$

where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, while $\alpha$ and $\rho$ are the learning factors with $\rho = \alpha\,\eta(X_t \mid \mu_k, \sigma_k)$. The mean, variance and weight factors are updated frame by frame. For those unmatched, the weight is updated according to

$$\omega_{k,t} = (1 - \alpha)\,\omega_{k,t-1}, \qquad (8)$$

while the mean and the variance remain the same. If none of the distributions are matched, the one with the lowest weight is replaced by a distribution with the incoming pixel value as its mean, a low weight and a large variance.
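The match-and-update step of (5)-(8) can be sketched as follows for a single color channel. The learning factor, the initial parameters and the replacement values for an unmatched pixel are illustrative assumptions, not the paper's fixed-point constants:

```python
import math

# Sketch of the MoG update: match a pixel against the sorted
# distributions (2.5-sigma test), then apply (5)-(7) to the matched
# one and (8) to the rest; replace the weakest one on no match.

ALPHA = 0.01   # learning factor (assumed value)
D = 2.5        # match threshold in standard deviations

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def update(distributions, x):
    """distributions: dicts with 'w', 'mean', 'var', sorted by rank."""
    matched = None
    for d in distributions:
        if abs(x - d['mean']) <= D * math.sqrt(d['var']):
            matched = d
            break  # the list is kept sorted, so the first match wins
    for d in distributions:
        if d is matched:
            rho = ALPHA * gaussian_pdf(x, d['mean'], d['var'])
            d['w'] = (1 - ALPHA) * d['w'] + ALPHA                 # (5)
            d['mean'] = (1 - rho) * d['mean'] + rho * x           # (6)
            d['var'] = (1 - rho) * d['var'] + rho * (x - d['mean']) ** 2  # (7)
        else:
            d['w'] = (1 - ALPHA) * d['w']                         # (8)
    if matched is None:
        # Replace the lowest-weight (last) distribution: incoming pixel
        # as mean, a low weight and a large variance (assumed values).
        distributions[-1] = {'w': 0.05, 'mean': x, 'var': 900.0}
    return distributions
```

Note that matching against the highest-ranked distribution first avoids the competition between several matching distributions discussed later in the architecture section.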
III. ALGORITHM MODIFICATIONS
The algorithm works efficiently only in controlled environ-
ments. Issues regarding algorithm weaknesses in different situa-
tions are addressed in many publications [14]–[16], [18]. In this
section, instead of mainly focusing on improving robustness, we propose several modifications to the algorithm, with the major concern being their potential to improve hardware efficiency.
A. Color Space Transformation
In theory, multimodal situations only occur when repetitive
background objects are present in the scene. However, this is
not always true in practice. Consider an indoor environment
where the illumination comes from a fluorescent lamp. An example video sequence from such an environment was recorded in our lab, where five pixels are picked evenly
from the scene and measured over time. Their RGB value dis-
tributions are drawn in Fig. 1(a). From the figure it can be seen
that instead of five sphere like pixel distributions, the shapes
of the pixel clusters are rather cylindrical. Pixel values tend to
jump around more in one direction than another in the presence
of illumination variations caused by the fluorescent lamp and
camera jitter. This should be distinguished from the situation
where one sphere distribution is moving slowly towards one di-
rection due to slight daylight changes. Such a case is handled by
updating the corresponding mean values in the original back-
ground model. Without an upper bound for the variance, the
sphere describing the distribution tends to grow until it covers
nearly every pixel in the most distributed direction, thus taking
up a large space such that most of it does not belong to the
distribution [sphere A in Fig. 1(b)]. A simple solution to this
problem is to set an upper limit on the variance, e.g., the max-
imum value of the variance in the least distributed direction.
The result is multimodal distributions represented as a series
of smaller spheres (spheres B-E in the same figure). Although
a background pixel distribution is modeled more precisely by
such a method, additional Gaussian distributions are inferred
which are hardware costly in terms of extra parameter update
and storage. In [28] D. Magee proposed a cylindrical model to
address the issue with primary axes of all distribution cylinders
pointing at the origin. However, more parameters are needed for
each cylindrical distribution than for the spherical counterpart, i.e., the parameters of a cylindrical distribution comprise a distance, two angles, a diameter and the height. Furthermore, it is a
hardware costly computation to transform RGB values to cylin-
drical coordinates, e.g., division and square root. In addition, not
every distribution cylinder is oriented towards the origin, see the
left middle distribution in Fig. 1(a).
To be able to model background pixels using a single distri-
bution without extensive hardware overhead, color space trans-
formation has been investigated. Both the HSV and YCbCr spaces are investigated and their corresponding distributions are shown in Fig. 2. By transforming RGB into YCbCr space, the correlation among different color coordinates is mostly removed, resulting in nearly independent color components. With a varying illumination environment, the Y component (intensity) varies
Fig. 2. Five distributions in YCbCr and HSV color spaces. (a) Most pixel distributions are transformed from cylindrical pixel distributions in RGB color space into sphere-like pixel distributions in YCbCr color space. This is because the correlation that exists among different color components in RGB color space is almost removed in YCbCr color space. (b) Being a correlated color space as well, HSV color space is no better than RGB color space; unpredictable pixel distributions appear occasionally.
the most accordingly, leaving the Cb and Cr components (chromaticity) more or less independent. In [29], this property is utilized for shadow reduction. Consequently, the values of the three independent components of a pixel in YCbCr color space tend to spread equally. As shown in Fig. 2(a), most pixel distributions are transformed from cylinders back to spheres, capable of being modeled with a single spherical distribution. The transformation from RGB to YCbCr is linear, and can be calculated according to the following:

$$Y = 0.299R + 0.587G + 0.114B, \qquad (9)$$
$$C_b = -0.169R - 0.331G + 0.500B, \qquad (10)$$
$$C_r = 0.500R - 0.419G - 0.081B. \qquad (11)$$
Only minor hardware overhead, a few extra multipliers and adders, is introduced, where multiplication with constants can
be further utilized to reduce hardware complexity. Simplifica-
tions can be performed to further reduce the number of multi-
plications to 4 [30]. The HSV color space, on the other hand,
also with correlated coordinates, is no better than RGB color
space, if not worse. Unpredictable pixel clusters appear occasionally, as shown in Fig. 2(b), which are impossible to model using Gaussian distributions.
Color space transformation has also been performed on out-
door video sequences where similar results have been observed.
These results are also in line with [28].
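The linear transform of (9)-(11) can be sketched with the standard BT.601 coefficients; the fixed-point constants used in the hardware may differ slightly:

```python
# Sketch of the RGB-to-YCbCr conversion: one luma (Y) and two
# chromaticity (Cb, Cr) components per pixel.

def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr =  0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr

# A grey input (r == g == b) yields zero chromaticity, which is why
# illumination changes move mostly along the Y axis only.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```

Since each output is a sum of constant multiples of the inputs, the multipliers can be replaced by shift-and-add networks in hardware, which is the simplification referred to above.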
B. Algorithm Simplifications
We propose two simplifications to the algorithm. In the original algorithm specification, an unboundedly growing distribution will absorb more pixels. As a result, the weight of that distribution will soon dominate all others. To overcome this, in [13], all updated Gaussian distributions are sorted according to the ratio $\omega/\sigma$. In this way, a distribution with dominant weight but large variance does not get to the top, where it would be identified as a background distribution. In our approach, with the YCbCr color space transformation, no upper bound is needed. All distributions can simply be sorted by their weights only, effectively eliminating division operations in the implementation.
Another simplification made in the process of foreground/
background detection is that instead of using (4), the determi-
nation can be made by checking the weight of each distribution
separately. This is due to the fact that one pixel cluster will not
spread out in several distributions by the color space transformation to YCbCr. The set of distributions belonging to the background is modified to be

$$B = \{ k \mid \omega_k > T \}. \qquad (12)$$
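The two simplifications can be sketched as follows; the per-distribution weight threshold `T_W` is an illustrative assumption:

```python
# Sketch of the simplifications: (a) rank distributions by weight only,
# with no w/sigma ratio and hence no division; (b) classify a pixel as
# background by testing the matched distribution's weight directly.

T_W = 0.2  # per-distribution background weight threshold (assumed)

def rank(distributions):
    # Sort by weight alone; the variance never enters the comparison.
    return sorted(distributions, key=lambda d: d['w'], reverse=True)

def is_background(matched):
    return matched['w'] > T_W
```

The per-distribution test works because, after the color space transformation, one background pixel cluster no longer spreads its weight over several distributions.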
IV. HARDWARE ARCHITECTURE
To perform the algorithm with VGA resolution in real-time,
a dedicated hardware architecture, with a streamlined data flow
and memory bandwidth reduction schemes, is implemented
to address the computation capacity and memory bandwidth
bottlenecks. Due to memory bandwidth limitations, only three Gaussians are utilized, which has been shown to be sufficient in our test sequences and ought to be sufficient for many environments, i.e., two background and one foreground distribution. The architecture in itself does not put any restriction on the number of distributions, which could be increased if the environment so requires. Algorithm modifications covered in previous sections are implemented with potential benefits for hardware efficiency and segmentation quality. This is a large improvement over the previous work [31], [32], where only 352×288 resolution is achieved without any memory
reduction schemes and algorithm modifications. In this section,
a thorough description of the architecture of the segmentation
unit is given, followed by detailed discussions of the memory
reduction schemes.
To give a better understanding of the architecture, a simpli-
fied conceptual block diagram of the whole system is given in
Fig. 3 to illustrate the dataflow within the system. The system
Fig. 3. Conceptual block diagram of the segmentation unit.
starts at the CMOS image sensor capturing video sequence in
real-time and feeding it to the system through a sensor interface.
The interface is designed to be able to control the parameters
of the image sensor at run-time, e.g., analog gain and integra-
tion time, so that better image quality can be obtained within
different environments. The sensor interface is also responsible
for sampling the image data transferred from off-chip. In our implementation, an oversampling scheme with a higher clock frequency (100 MHz) is used to ensure the accuracy of the image
data. The image data is captured with one color component at
a time, and the three color components are sent to the system
after serial-parallel transformation. To handle different clock
frequencies, input FIFOs, implemented as distributed RAMs,
are used to interface to both the segmentation logic and the VGA
controller where the original video data can be monitored on a screen. The RGB values are converted into YCbCr components before entering the segmentation logic block, according
to the algorithm modification above. Each pixel has a number
of corresponding Gaussian distributions and the parameters of
those are stored in off-chip memories (DDR SDRAM) due to the overall amount of data. For each pixel, the Gaussian parameters
are read from the DDR SDRAM and decoded by the param-
eter encoder/decoder. The match and switch unit checks if the
incoming pixel matches any of the existing distributions. The
output from this unit is reordered Gaussian distributions with
the matching distribution switched to a specific port. The match
and switch block is mainly composed of comparators and multiplexers. The Gaussian distributions are updated according to
the algorithm modifications presented in the previous section.
From this point foreground/background detection can start by
checking the weight of the updated matched Gaussian distribu-
tion. The output is a binary stream to be multiplexed to the mon-
itor, indicating foreground and background pixels with white
and black colors. The updated Gaussian parameters have to be
sorted for use in the next frame, and all distributions should
be ordered according to their weight. This is implemented in
a dedicated sorting network that will be covered in more detail
in Section IV-A. To reduce the heavy memory bandwidth incurred
by accessing off-chip DDR SDRAM that stores one frame of
Gaussian distributions, an encoding/decoding block is designed
by utilizing pixel localities in succeeding neighboring pixels.
This is covered in more detail in Section IV-C.
In the following, implementation details of the architec-
ture shown in Fig. 4 are explained with an emphasis on the
parts with algorithm modifications, indicated by shaded area.
With the image data captured and transformed, the match and
switch block tries to match the incoming pixel with Gaussian
distributions obtained from the previous frame. To avoid the
competition between several Gaussian distributions matching
the incoming pixel, only the one with highest likelihood (large
weight) is selected as the matching distribution. A matched
Gaussian is switched to the bottom (3 in the figure). In case no
matching distribution is found, a No_match signal is asserted. If
there is a match, a parameter update should be performed. For
the matched Gaussian distribution, a proposed updating scheme
is implemented with only incrementers/decrementers for the
mean and variance values. Depending on whether the incoming value is larger or smaller than the mean value, a ±1 addition or subtraction is applied for parameter updating. Similar updating
schemes are utilized for variance update. The proposed param-
eter update results in low hardware complexity by replacing
the hardware costly operations in (6) and (7), e.g., square and
multiplication with large wordlength with incrementer/decre-
menter. Other benefits of the proposed updating schemes are
described in detail in Section IV-B. For the case that no match
is found, a MUX is used together with the No_match signal to
update all parameters for the distribution (3 in the figure) with
predefined values.
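The match/switch and coarse update logic described above can be modeled in software as follows. This is an illustrative sketch, not the RTL: the step sizes, the 2.5-sigma match threshold, and the no-match reinitialization constants are assumptions; the paper's exact constants come from its equations (6) and (7) and Section IV-B.

```python
# Illustrative software model of the match/switch block and the proposed
# incrementer/decrementer parameter update. Step sizes and no-match
# defaults are placeholder assumptions, not the paper's exact values.

MEAN_STEP = 1   # assumed coarse update step for the mean
VAR_STEP = 1    # assumed coarse update step for the variance

def match_and_update(pixel, gaussians, init_var=25, init_weight=0.05):
    """gaussians: list of dicts {'mean', 'var', 'weight'}, sorted by
    descending weight, so the first match is the most likely one."""
    for i, g in enumerate(gaussians):
        # Common GMM match test: pixel within 2.5 standard deviations.
        if abs(pixel - g["mean"]) <= 2.5 * g["var"] ** 0.5:
            # Coarse update: increment/decrement instead of the costly
            # multiply and square operations of the standard update.
            g["mean"] += MEAN_STEP if pixel > g["mean"] else -MEAN_STEP
            spread = abs(pixel - g["mean"])
            g["var"] += VAR_STEP if spread > g["var"] ** 0.5 else -VAR_STEP
            matched = gaussians.pop(i)
            gaussians.append(matched)   # switch matched Gaussian to bottom
            return gaussians, True
    # No_match asserted: reinitialize the bottom (least likely)
    # distribution with predefined values centered on the new pixel.
    gaussians[-1] = {"mean": pixel, "var": init_var, "weight": init_weight}
    return gaussians, False
```

In hardware, the comparison against the mean selects between an adder and a subtractor path, so the whole update needs no multipliers; the sorting network described next restores the weight ordering after the matched Gaussian has been switched to the bottom.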
A. Sorting
The updated Gaussian parameters have to be sorted for use in
the next frame. In order to reduce hardware complexity found
in parallel sorting networks, such as [33]–[35], while still main-
taining the speed, a specific feature in the algorithm is explored.
By observing that only one Gaussian distribution is updated at
a time and all the distributions are initially sorted, the sorting
of
Gaussian distributions can be changed to rearranging an
updated distribution among
ordered distributions. As
a result, both the number of sorting stages and the number of
comparators are reduced to only one sorting stage with
comparators and MUXes, resulting in both increased speed
and reduced area. The architecture for a sorting network for
five Gaussians is shown in Fig. 5, five distributions are used in-
stead of three to get a more generalized architecture. From the
figure, all unmatched ordered Gaussian distributions are com-
pared with the updated distribution, i.e., three in the figure. The
output of each comparator signifies which distribution is to be
multiplexed to the output, e.g., if the weight of any unmatched
distribution is smaller than the updated one, all unmatched dis-
tributions below the current one is switched to the output at the
next lower MUX. This architecture scales very easily to support
sorting more distributions since the number of stages will not
increase accordingly. Since only three Gaussians are utilized
in our implementation this is a trivial task. However, if future
implementations require more Gaussians per pixel the sorting
architecture will be useful to reduce hardware complexity. A
comparison of hardware complexity between proposed sorting
architecture and other schemes mentioned above is shown in
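Behaviorally, the single-stage network amounts to an insertion step: with the unmatched distributions already ordered by weight, one parallel bank of comparators finds the position of the updated distribution. The sketch below is a software model of that behavior, with an assumed function name; in hardware, all comparisons evaluate concurrently in one stage.

```python
# Behavioral model of the single-stage sorting network: since only one
# distribution changes per pixel and the rest remain ordered, a full sort
# reduces to inserting the updated distribution among the ordered ones.

def reinsert_updated(ordered_weights, updated_weight):
    """ordered_weights: weights in descending order (the unmatched
    distributions); updated_weight: the single updated distribution.
    Each comparison maps to one comparator; in hardware they all run in
    parallel, so the network needs only one stage regardless of size."""
    position = sum(1 for w in ordered_weights if w >= updated_weight)
    return (ordered_weights[:position]
            + [updated_weight]
            + ordered_weights[position:])
```

For example, reinserting a weight of 0.4 into the ordered list [0.5, 0.3, 0.1] yields [0.5, 0.4, 0.3, 0.1]. A general N-input sorting network would need O(log^2 N) stages [33]–[35]; exploiting the single-update property keeps it at one stage with N-1 comparators.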
Authorized licensed use limited to: Lunds Universitetsbibliotek. Downloaded on February 16, 2009 at 03:01 from IEEE Xplore. Restrictions apply.

Citations
Journal ArticleDOI
TL;DR: Two hardware implementations of the OpenCV version of the Gaussian mixture model (GMM), a background identification algorithm, are proposed, able to perform real-time background identification on high definition (HD) video sequences with frame size 1920 × 1080.
Abstract: Background identification is a common feature in many video processing systems. This paper proposes two hardware implementations of the OpenCV version of the Gaussian mixture model (GMM), a background identification algorithm. The implemented version of the algorithm allows a fast initialization of the background model while an innovative, hardware-oriented, formulation of the GMM equations makes the proposed circuits able to perform real-time background identification on high definition (HD) video sequences with frame size 1920 × 1080. The first of the two circuits is designed with commercial field-programmable gate-array (FPGA) devices as target. When implemented on Virtex6 vlx75t, the proposed circuit process 91 HD fps (frames per second) and uses 3% of FPGA logic resources. The second circuit is oriented to the implementation in UMC-90 nm CMOS standard cell technology, and is proposed in two versions. Both versions can process at a frame rate higher than 60 HD fps. The first version uses the constant voltage scaling technique to provide a low power implementation. It provides silicon area occupation of 28847 μm2 and energy dissipation per pixel of 15.3 pJ/pixel. The second version is designed to reduce silicon area utilization and occupies 21847 μm2 with an energy dissipation of 49.4 pJ/pixel.

84 citations


Cites background or methods from "A Hardware Architecture for Real-Ti..."

  • ...In [25], the processing capability only reaches 7....

    [...]

  • ...References [24] and [25] used 140 bits per pixel, while [27] proposed the use of 116 bits....

    [...]

  • ...The segmentation circuit of [23] is improved in [24] and [25]....

    [...]

  • ...as shown in Section VII-B and reported in [25], can be fed with the luminance channel of the YCrCb color space....

    [...]

  • ...The circuit of [24] is improved in [25]....

    [...]

Journal ArticleDOI
TL;DR: The proposed circuit is optimized to perform real time processing of HD video sequences (1,920 × 1,080 @ 20 fps) when implemented on FPGA devices and uses an optimized fixed width representation of the data and implements high performance arithmetic circuits.
Abstract: The identification of moving objects is a basic step in computer vision. The identification begins with the segmentation and is followed by a denoising phase. This paper proposes the FPGA hardware implementation of segmentation and denoising unit. The segmentation is conducted using the Gaussian mixture model (GMM), a probabilistic method for the segmentation of the background. The denoising is conducted implementing the morphological operators of erosion, dilation, opening and closing. The proposed circuit is optimized to perform real time processing of HD video sequences (1,920 × 1,080 @ 20 fps) when implemented on FPGA devices. The circuit uses an optimized fixed width representation of the data and implements high performance arithmetic circuits. The circuit is implemented on Xilinx and Altera FPGA. Implemented on xc5vlx50 Virtex5 FPGA, it can process 24 fps of an HD video using 1,179 Slice LUTs and 291 Slice Registers; the dynamic power dissipation is 0.46 mW/MHz. Implemented on EP2S15F484C3 StratixII, it provides a maximum working frequency of 44.03 MHz employing 5038 Logic Elements and 7,957 flip flop with a dynamic power dissipation of 4.03 mW/MHz.

58 citations


Cites background or methods from "A Hardware Architecture for Real-Ti..."

  • ...[28, 29], Minghua and Bermak [30] and Genovese et al....

    [...]

  • ...[28, 29] and Minghua and Bermak [30] are not able to process 41....

    [...]

  • ...[29] improves the memory throughput with respect to [28] employing a memory reduction scheme....

    [...]

Journal ArticleDOI
TL;DR: A background subtraction scheme, which models the thermal responses of each pixel as a mixture of Gaussians with unknown number of components, and follows a Bayesian approach, which permits the system to be automatically adapted to dynamically changing operation conditions.
Abstract: Detection of moving objects in videos is a crucial step toward successful surveillance and monitoring applications. A key component for such tasks is called background subtraction and tries to extract regions of interest from the image background for further processing or action. For this reason, its accuracy and real-time performance are of great significance. Although effective background subtraction methods have been proposed, only a few of them take into consideration the special characteristics of thermal imagery. In this paper, we propose a background subtraction scheme, which models the thermal responses of each pixel as a mixture of Gaussians with unknown number of components. Following a Bayesian approach, our method automatically estimates the mixture structure, while simultaneously it avoids over-/underfitting. The pixel density estimate is followed by an efficient and highly accurate updating mechanism, which permits our system to be automatically adapted to dynamically changing operation conditions. We propose a reference implementation of our method in reconfigurable hardware achieving both adequate performance and low-power consumption. Adopting a high-level synthesis design and demanding floating point arithmetic operations are mapped in reconfigurable hardware, demonstrating fast prototyping and on-field customization at the same time.

33 citations


Cites background from "A Hardware Architecture for Real-Ti..."

  • ...In the work of [33] and the later improvement of [34] the authors propose a real-time video segmentation/surveillance system using a GMM also handling memory bandwidth reduction requirements....

    [...]

Journal ArticleDOI
TL;DR: This work proposes a hardware computing engine to perform background subtraction on low-cost field programmable gate arrays (FPGAs), focused on resource-limited environments, based on the codebook algorithm and offers very low accuracy degradation.
Abstract: Object detection and tracking are main tasks in video surveillance systems. Extracting the background is an intensive task with high computational cost. This work proposes a hardware computing engine to perform background subtraction on low-cost field programmable gate arrays (FPGAs), focused on resource-limited environments. Our approach is based on the codebook algorithm and offers very low accuracy degradation. We have analyzed resource consumption and performance trade-offs in Spartan-3 FPGAs by Xilinx. In addition, an accuracy evaluation with standard benchmark sequences has been performed, obtaining better results than previous hardware approaches. The implementation is able to segment objects in sequences with resolution $$768\times 576$$ at 50 fps using a robust and accurate approach, and an estimated power consumption of 5.13 W.

30 citations


Cites methods from "A Hardware Architecture for Real-Ti..."

  • ...Keywords Field programmable gate arrays · Fixedpoint arithmetic · Real time image processing · Video surveillance...

    [...]

  • ...…a spatial background subtraction technique (Jodoin et al 2007), a DSP-embedded implementation (Ierodiaconou et al 2006), a memory reduction scheme (Jiang et al 2009) or static algorithms where it is assumed that the background is fixed as the one proposed by Horprasert et al. (Karaman et al…...

    [...]

Proceedings ArticleDOI
11 Sep 2011
TL;DR: Compared to a Triple Modular Redundancy (TMR) fault tolerance technique, the stochastic architecture tolerates substantially more soft errors with lower power consumption and consumes less hardware and power compared to a conventional (nonstochastic) implementation.
Abstract: The kernel density estimation (KDE)-based image segmentation algorithm has excellent segmentation performance. However, this algorithm is computational intensive. In addition, although this algorithm can tolerant noise in the input images, such as the noise due to snow, rain, or camera shaking, it is sensitive to the noise from the internal computing circuits, such as the noise due to soft errors or PVT (process, voltage, and temperature) variation. Tolerating this kind of noise becomes more and more important as device scaling continues to nanoscale dimensions. Stochastic computing, which uses streams of random bits (stochastic bits streams) to perform computation with conventional digital logic gates, can guarantee reliable computation using unreliable devices. In this paper, we present a stochastic computing implementation of the KDE-based image segmentation algorithm. Our experimental results show that, under the same time constraint, the stochastic implementation is much more tolerant of faults and consumes less hardware and power compared to a conventional (nonstochastic) implementation. Furthermore, compared to a Triple Modular Redundancy (TMR) fault tolerance technique, the stochastic architecture tolerates substantially more soft errors with lower power consumption.

27 citations

References
Book
01 Jan 1986
TL;DR: In this paper, the authors propose a recursive least square adaptive filter (RLF) based on the Kalman filter, which is used as the unifying base for RLS Filters.
Abstract: Background and Overview. 1. Stochastic Processes and Models. 2. Wiener Filters. 3. Linear Prediction. 4. Method of Steepest Descent. 5. Least-Mean-Square Adaptive Filters. 6. Normalized Least-Mean-Square Adaptive Filters. 7. Transform-Domain and Sub-Band Adaptive Filters. 8. Method of Least Squares. 9. Recursive Least-Square Adaptive Filters. 10. Kalman Filters as the Unifying Bases for RLS Filters. 11. Square-Root Adaptive Filters. 12. Order-Recursive Adaptive Filters. 13. Finite-Precision Effects. 14. Tracking of Time-Varying Systems. 15. Adaptive Filters Using Infinite-Duration Impulse Response Structures. 16. Blind Deconvolution. 17. Back-Propagation Learning. Epilogue. Appendix A. Complex Variables. Appendix B. Differentiation with Respect to a Vector. Appendix C. Method of Lagrange Multipliers. Appendix D. Estimation Theory. Appendix E. Eigenanalysis. Appendix F. Rotations and Reflections. Appendix G. Complex Wishart Distribution. Glossary. Abbreviations. Principal Symbols. Bibliography. Index.

16,062 citations

Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.

7,660 citations


"A Hardware Architecture for Real-Ti..." refers background or methods in this paper

  • ...To overcome this, in [13], all updated Gaussian distributions are sorted according to the ratio ....

    [...]

  • ...In this section the used algorithm is briefly described, for a more thorough description we refer to [13]....

    [...]

  • ...In [13], only a frame rate of 11–13 fps is obtained even for a small frame size of 160 120 on an SGI O2 workstation....

    [...]

  • ...A background model based on pixel wise multimodal Gaussian distribution was proposed in [13] with robustness to multimodal background situations, which are quite common in both indoor and outdoor environments....

    [...]

  • ...Table I shows five segmentation algorithms that are cited by many literatures, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussian (MoG) [13]–[19] and kernel density estimation (KDE) [20]....

    [...]

Proceedings ArticleDOI
30 Apr 1968
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
Abstract: To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently. A major problem in the design of such a computing system is the connecting together of the various parts of the system (the I/O devices, memories, processing units, etc.) in such a way that all the required data transfers can be accommodated. One common scheme is a high-speed bus which is time-shared by the various parts; speed of available hardware limits this scheme. Another scheme is a cross-bar switch or matrix; limiting factors here are the amount of hardware (an m × n matrix requires m × n cross-points) and the fan-in and fan-out of the hardware.

2,553 citations


"A Hardware Architecture for Real-Ti..." refers background in this paper

  • ...In order to reduce hardware complexity found in parallel sorting networks, such as [33]–[35], while still maintaining the speed, a specific feature in the algorithm is explored....

    [...]

Book ChapterDOI
26 Jun 2000
TL;DR: A novel non-parametric background model that can handle situations where the background of the scene is cluttered and not completely static but contains small motions such as tree branches and bushes is presented.
Abstract: Background subtraction is a method typically used to segment moving regions in image sequences taken from a static camera by comparing each new frame to a model of the scene background. We present a novel non-parametric background model and a background subtraction approach. The model can handle situations where the background of the scene is cluttered and not completely static but contains small motions such as tree branches and bushes. The model estimates the probability of observing pixel intensity values based on a sample of intensity values for each pixel. The model adapts quickly to changes in the scene which enables very sensitive detection of moving targets. We also show how the model can use color information to suppress detection of shadows. The implementation of the model runs in real-time for both gray level and color imagery. Evaluation shows that this approach achieves very sensitive detection with very low false alarm rates.

2,432 citations


"A Hardware Architecture for Real-Ti..." refers background in this paper

  • ...From Table I it can be seen that the KDE approach has the highest segmentation quality which however comes at the cost of a high hardware complexity and even to a larger extent, increased memory requirements....

    [...]

  • ...Table I shows five segmentation algorithms that are cited by many literatures, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussian (MoG) [13]–[19] and kernel density estimation (KDE) [20]....

    [...]

Proceedings ArticleDOI
01 Sep 1999
TL;DR: This work develops Wallflower, a three-component system for background maintenance that is shown to outperform previous algorithms by handling a greater set of the difficult situations that can occur.
Abstract: Background maintenance is a frequent element of video surveillance systems. We develop Wallflower, a three-component system for background maintenance: the pixel-level component performs Wiener filtering to make probabilistic predictions of the expected background; the region-level component fills in homogeneous regions of foreground objects; and the frame-level component detects sudden, global changes in the image and swaps in better approximations of the background. We compare our system with 8 other background subtraction algorithms. Wallflower is shown to outperform previous algorithms by handling a greater set of the difficult situations that can occur. Finally, we analyze the experimental results and propose normative principles for background maintenance.

1,971 citations


"A Hardware Architecture for Real-Ti..." refers background in this paper

  • ...Table I shows five segmentation algorithms that are cited by many literatures, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussian (MoG) [13]–[19] and kernel density estimation (KDE) [20]....

    [...]

  • ...In [1], comparisons on segmentation qualities are made to evaluate a variety of approaches....

    [...]

Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "A hardware architecture for real-time video segmentation utilizing memory reduction techniques" ?

This paper presents the implementation of a video segmentation unit used for embedded automated video surveillance systems. Various aspects of the underlying segmentation algorithm are explored and modifications are made with potential improvements of segmentation results and hardware efficiency. 

By utilizing a coarse parameter updating scheme, the wordlength of each Gaussian parameter is reduced substantially, which effectively decreases the memory bandwidth to off-chip memories. 

In their implementation, an oversampling scheme with a higher clock frequency (100 MHz) is used to ensure the accuracy of the image data. 

A pixel containing several background object colors, e.g., the leaves of a swaying tree and a road, can be modeled with a mixture of Gaussian distributions. 

In addition to wordlength reduction, a data compression scheme for further bandwidth reduction is proposed by utilizing pixel locality for Gaussian distributions in adjacent areas. 

By observing that only one Gaussian distribution is updated at a time and all the distributions are initially sorted, the sorting of Gaussian distributions can be changed to rearranging an updated distribution among ordered distributions. 

By only saving non-overlapping distributions together with the number of equivalent succeeding distributions, memory bandwidth is reduced. 

For the implementation of the hardware units, memory usage is identified as the main bottleneck of the whole system, which is common in many image processing systems. 

The 24 BRAMs used for the DDR controller can be reduced by using low depth Gaussian parameter buffers to write/read to the off-chip DDR memory. 

With the primary goal of reducing wordlength, the coarse parameter updating scheme results in limited improvements to the segmentation results. 

For the case that no match is found, a MUX is used together with the No_match signal to update all parameters for the distribution (3 in the figure) with predefined values. 

Algorithm modifications covered in previous sections are implemented with potential benefits on hardware efficiency and segmentation quality. 

To be able to model background pixels using a single distribution without extensive hardware overhead, color space transformation has been investigated. 

The DDR controller contributes to a large part of the whole design due to complicated memory command and data signal manipulations, clock schemes, and buffer controls. 

It is a hardware-costly computation to transform RGB values to cylindrical coordinates, e.g., requiring division and square root.