phase are very limited (duration of 5 min.). Furthermore, the
handling of severe occlusions is out of the scope of his paper.
The novelty of our approach is threefold. (1) We propose
an approach for background subtraction, derived from
improved Gaussian mixture models (GMMs), in which
the update of the background is achieved recursively. This
approach is combined with a motion detection procedure,
which can adapt robustly to illumination changes, maintain-
ing a high sensitivity to new incoming foreground objects.
(2) We also propose an algorithm able to deal with strong,
moving cast shadows. One of the evaluation datasets is
specifically shadow-oriented. (3) Finally, a new algorithm
able to tackle the problems raised by severe occlusions
among cars, and between cars and trucks is proposed.
We include experimental results with varying weather
conditions, on sunny days with moving directional shadows
and heavy traffic. We obtain vehicle counting and classifica-
tion results much better than those of ILD systems, which are
currently the most widely used systems for these types of
traffic measurements, while keeping the main advantages
of vision-based systems, i.e., not requiring the cumbersome
operation or installation of equipment at the roadside or the
need for additional technology such as laser scanners, tags,
or GPS.
2 Related Work
Robust background subtraction, shadow management, and
occlusion handling are the three main scientific contributions
of our work.
2.1 Background Subtraction
The main aim of this section is to provide a brief summary of
the state-of-the-art moving object detection methods based
on a reference image. The existing methods of background
subtraction can be divided into two categories:7 nonpara-
metric and parametric methods. Parametric approaches
use a series of parameters that determines the characteristics
of the statistical functions of the model, whereas nonpara-
metric approaches automate the selection of the model
parameters as a function of the observed data during training.
2.1.1 Nonparametric methods
The classification procedure is generally divided into two
parts: a training period and a detection period.
The nonparametric methods are efficient when the training
period is sufficiently long. During this period, the setting
up of a background model consists in saving the possible
states of a pixel (intensity, color, and so on).
Median value model. This adaptive model was developed
by Greenhill et al. in Ref. 8 for moving object extraction
during degraded illumination changes. Referring to the
different states of each pixel during a training period, a
background model is thus elaborated. The background is
continuously updated for every new frame so that a vector
of the median values (intensities, color, and so on) is built
from the last N/2 frames, where N is the number of frames
used during the training period. The classification back-
ground/object is simply obtained by thresholding the dis-
tance between the value of the pixel to classify and its
counterpart in the background model. In order to take into
account the illumination changes, the threshold considers
the width of the interval containing the pixel values.
This method based on the median operator is more robust
than that based on running average.
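As a minimal sketch of this class of model (the function names, buffer policy, and fixed threshold are illustrative choices, not Greenhill et al.'s implementation), the median background and the thresholded classification can be written as:

```python
import numpy as np

def update_median_background(frame_buffer, new_frame, N):
    """Keep a sliding buffer of the last N/2 frames and return the
    per-pixel median background built from it."""
    frame_buffer.append(new_frame)
    if len(frame_buffer) > N // 2:
        frame_buffer.pop(0)  # discard the oldest frame
    # per-pixel median over the buffered frames
    return np.median(np.stack(frame_buffer), axis=0)

def classify_foreground(frame, background, threshold):
    """Label a pixel as foreground when it deviates from the median
    background by more than the threshold."""
    return np.abs(frame.astype(float) - background) > threshold
```

In the paper's formulation the threshold is not fixed but adapted to the width of the interval of observed pixel values, which is what makes the model tolerant to illumination changes.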
Codebook. The codebook method is the most famous non-
parametric method. In Ref. 9, Kim et al. suggest modeling
the background based on a sequence of observations of each
pixel during a period of several minutes. Then, similar occur-
rences of a given pixel are represented according to a vector
called codeword. Two codewords are considered as different
if the distance, in the vectorial space, exceeds a given thresh-
old. A codebook, which is a set of codewords, is built for
every pixel. The classification background/object is based
on a simple difference between the current value of each
pixel and each of the corresponding codewords.
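The per-pixel matching logic can be sketched as follows; here the codewords are simple running means of similar observations, whereas Kim et al.'s codewords additionally carry brightness bounds and access statistics:

```python
import numpy as np

def train_codebook(observations, dist_threshold):
    """Build a codebook for one pixel from a training sequence.
    Each codeword is stored as [mean_vector, count]; an observation
    within dist_threshold of an existing codeword is merged into it,
    otherwise it starts a new codeword."""
    codebook = []
    for obs in observations:
        obs = np.asarray(obs, dtype=float)
        for cw in codebook:
            if np.linalg.norm(obs - cw[0]) <= dist_threshold:
                # merge into the matching codeword (running mean)
                cw[0] = (cw[0] * cw[1] + obs) / (cw[1] + 1)
                cw[1] += 1
                break
        else:
            codebook.append([obs, 1])
    return codebook

def is_background(pixel_value, codebook, dist_threshold):
    """Classify by differencing the current value against every
    codeword of the pixel's codebook."""
    v = np.asarray(pixel_value, dtype=float)
    return any(np.linalg.norm(v - cw[0]) <= dist_threshold
               for cw in codebook)
```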
2.1.2 Parametric methods
Most of the moving object extraction methods are based on
the temporal evolution of each pixel of the image. A
sequence of frames is used to build a background model
for every pixel. Intensity, color, or some texture characteris-
tics could be used for the pixel. The detection process con-
sists in independently classifying every pixel in the object/
background classes, according to the current observations.
Gaussian model. In Ref. 10, Wren et al. suggest adapting
the threshold on each pixel by modeling the intensity
distribution for every pixel with a Gaussian distribution.
This model could adapt to slow changes in the scene, like
progressive illumination changes. The background is
updated recursively thanks to an adaptive filter. Different
extensions of this model were developed by changing the
characteristics at pixel level. Gordon et al.11 represent each
pixel with four components: the three color components
and the depth.
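A minimal per-pixel version of this model, with an illustrative learning rate alpha and a k-sigma decision rule (the specific values are not Wren et al.'s), might look like:

```python
import numpy as np

def gaussian_bg_step(mean, var, frame, alpha=0.02, k=2.5):
    """One step of a per-pixel single-Gaussian background model.
    A pixel is foreground when it lies more than k standard
    deviations from the mean; the mean and variance are then
    updated recursively with learning rate alpha."""
    frame = frame.astype(float)
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    # recursive (running-average) update of the Gaussian parameters
    mean = (1 - alpha) * mean + alpha * frame
    var = (1 - alpha) * var + alpha * (frame - mean) ** 2
    return foreground, mean, var
```

The recursive update is what lets the model follow slow changes such as progressive illumination drift without storing past frames.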
Gaussian mixture model. An improvement of the pre-
vious model consists in modeling the temporal evolution
with a GMM. Stauffer and Grimson12,13 model the color of
each pixel with a Gaussian mixture. The number of
Gaussians must be adjusted according to the complexity
of the scene. In order to simplify calculations, the covariance
matrix is considered as diagonal because the three color
channels are taken into account independently. The GMM
model is updated at each iteration using the K-means algo-
rithm. Harville et al.14 suggest using GMM in a space com-
bining the depth and YUV space. They improve the method
by controlling the training rate according to the activity in the
scene. However, its response is very sensitive to sudden var-
iations of the background like global illumination changes. A
low training rate will produce numerous false detections dur-
ing an illumination change period, whereas a high training
rate will include moving objects in the background model.
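The per-pixel update can be sketched as follows for a gray-level pixel; the thresholds and the background-membership test here are simplified placeholders for the weight/sigma ranking used by Stauffer and Grimson:

```python
import numpy as np

def gmm_update(weights, means, variances, x, alpha=0.01, k=2.5):
    """One Stauffer-Grimson-style update step for a single pixel
    modeled by K Gaussians (scalar here; diagonal covariance in the
    color case). Matches x to a component within k sigma, updates
    that component, or replaces the least-probable one."""
    d = np.abs(x - means) / np.sqrt(variances)
    matched = d < k
    if matched.any():
        i = int(np.argmin(np.where(matched, d, np.inf)))
        # all weights decay; the matched component's weight grows
        weights = (1 - alpha) * weights
        weights[i] += alpha
        # matched component's mean and variance move toward x
        means[i] = (1 - alpha) * means[i] + alpha * x
        variances[i] = ((1 - alpha) * variances[i]
                        + alpha * (x - means[i]) ** 2)
        is_background = weights[i] > 0.2  # illustrative threshold
    else:
        # replace the least-probable component with a new Gaussian at x
        i = int(np.argmin(weights))
        means[i], variances[i], weights[i] = x, 30.0 ** 2, alpha
        weights /= weights.sum()
        is_background = False
    return weights, means, variances, is_background
```

The learning rate alpha plays the role of the training rate discussed above: raising it adapts faster to illumination changes but absorbs slow-moving objects into the background sooner.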
Markov model. In order to consider the temporal evolu-
tion of a pixel, the order of arrival of the gray levels on
this pixel is useful information. A solution consists in mod-
eling the gray level evolution for each pixel by a Markov
chain. Rittscher et al.15 use a Markov chain with three states:
object, background, and shadow. All the parameters of the
chain, initial, transition, and observation probabilities, are