
Pedestrian Path Prediction with Recursive Bayesian Filters: A Comparative Study

Abstract
In the context of intelligent vehicles, we perform a comparative study on recursive Bayesian filters for pedestrian path prediction at short time horizons (< 2s). We consider Extended Kalman Filters (EKF) based on single dynamical models and Interacting Multiple Models (IMM) combining several such basic models (constant velocity/acceleration/turn). These are applied to four typical pedestrian motion types (crossing, stopping, bending in, starting). Position measurements are provided by an external state-of-the-art stereo vision-based pedestrian detector. We investigate the accuracy of position estimation and path prediction, and the benefit of the IMMs vs. the simpler single dynamical models. Special care is given to the proper sensor modeling and parameter optimization. The dataset and evaluation framework are made public to facilitate benchmarking.


N. Schneider (1,2) and D. M. Gavrila (1,2)
(1) Environment Perception, Daimler R&D, Ulm, Germany
(2) Intelligent Systems Laboratory, Univ. of Amsterdam, The Netherlands
1 Introduction
Pedestrian path prediction is an important problem in several application
contexts, such as architecture, social robotics and intelligent vehicles. Here we
consider the intelligent vehicle context, in view of driver assistance and active
pedestrian safety. Strong gains have been made over the years in improving computer
vision-based pedestrian recognition performance. This has culminated in the first
active pedestrian safety systems reaching the market. For example, Mercedes-Benz
introduces in its 2013 E- and S-Class models a novel stereo-vision based
pedestrian system, which incorporates automatic full emergency braking.
A sophisticated situation assessment requires a precise estimation of the
current and future position of the pedestrian with respect to the moving vehicle. A
deviation of, say, 30 cm in the estimated lateral position of the pedestrian can
make all the difference, as this might place the pedestrian just inside or outside
the driving corridor. Current active pedestrian systems are typically designed
conservatively in their warning and control strategy, emphasizing the current
state rather than prediction, in order to avoid false system activations. Indeed,
pedestrian path prediction is a challenging problem, due to the highly dynamic
behavior of pedestrians. They can change their walking direction in an instant,
or start/stop walking abruptly. As a consequence, sensible prediction horizons
are typically short (we consider < 2s in this paper).
Fig. 1: Four typical pedestrian motion types: bending in (top left), stopping (top right),
crossing (bottom left) and starting (bottom right) with detection bounding boxes.
There has been surprisingly little analysis in previous work of the accuracy of
pedestrian state estimation, let alone that of prediction, in a vehicle context. This
paper addresses this by providing a quantitative comparative study of recursive
Bayesian filters: we consider Extended Kalman Filters (EKF) based on single
dynamical models and Interacting Multiple Models (IMM) combining several
such basic models (constant velocity/acceleration/turn). These are applied to
four typical pedestrian motion types (crossing, stopping, bending in, starting),
see Fig. 1. Position measurements are provided by an external state-of-the-art
stereo vision-based pedestrian detector. The rationale for focusing on recursive
Bayesian filters in connection with modeling pedestrians as point targets is their
relatively good performance and low computational cost (especially important in
a vehicle context). We investigate the accuracy of position estimation and path
prediction, and the benefit of the IMMs vs. the simpler single dynamical models.
Special care is given to the proper sensor modeling and parameter optimization.
2 Previous Work
In this section, we focus on pedestrian state estimation based on parametric,
recursive Bayesian filters. For an overview of vision-based pedestrian detection
and tracking in a more general context, see recent surveys (e.g. [7, 8]).
A popular choice for target state estimation is the Kalman Filter (KF). Its
applicability in real-time systems has been proven over many years for different
sensors and application domains [1, 3, 4, 18, 21]. State parameters (e.g. position,
velocity, acceleration) of the tracked target can be estimated with appropriate
dynamical and measurement models. The KF can further be used for prediction
by propagating the current state with the dynamical model without the
inclusion of new measurements. Work by [3] on FIR-based pedestrian tracking uses
a constant acceleration (CA) model in image space. Working in image space,
however, makes it difficult to incorporate prior knowledge on the dynamics of
pedestrian motion. Therefore, [2] track pedestrians on the ground plane using
a KF in an indoor, static stereo camera setup. The use of a linear KF in the
context of video-based pedestrian tracking in the world implies the use of 3D
pseudo-measurements (i.e. back projection of 2D measurements); this does not
account for the dependency of the longitudinal component of the noise on depth.
More accurate measurement models for the perspective projection of video
sensors can be incorporated by means of non-linear Extended (EKF) or
Unscented (UKF) Kalman filters. [15] use a UKF in a mono camera setup to track
pedestrians on the ground plane (CV model). [19] apply the UKF to
measurements from a stereo camera system comparing three different dynamical models
(two CV and one constant position (CP) model) where two models have a state
space in world coordinates and one in image coordinates.
KF-based approaches have also been used for pedestrian state estimation
outside the video-only domain. [9] apply a CV model in a multi-sensor setup with
an IR camera and laser scanner. In a previous paper [18], they used two different
motion models (CA and CTRV), mentioning advantages of the latter model at
near-zero pedestrian speeds. Work by [21] considers a setting where pedestrians
wear electronic tags. It uses a KF with a turn motion model including orientation
and velocity in polar coordinates (CTRV).
Maneuvering targets can be elegantly accounted for mathematically by means
of the Interacting Multiple Model (IMM) framework [4, 13]. [11] use an IMM
(CP/CV) for analyzing walking vs. stopping pedestrian motion types from a
stereo vision sensor on-board a vehicle. [5] use an IMM combining eight CV
models with fixed velocities in eight directions. Their approach further contains
an online adaptation algorithm for the IMM transition probability matrix.
Within the class of non-parametric methods for pedestrian path prediction
and action classification, [11] proposes a probabilistic trajectory matching
method to estimate whether a pedestrian walking towards the curbside intends
to cross or not, when viewed from a stereo vision system on-board a vehicle. [12]
considers the complementary case of whether a standing pedestrian will start to
walk, using an SVM-based classification approach, albeit from a static monocular
camera.
Quantitative evaluations of pedestrian state (position) estimation have been
few and limited. [3, 5, 9, 15, 18, 21] do not include any such evaluation. [2] provides
accuracy figures only for its KF approach in an indoor setting. [19] uses
simulated data to compare CV and CP KFs. Our contribution is a broad
quantitative study on pedestrian position estimation and path prediction using
parametric Bayesian recursive filters in a vehicle context. Compared to [11], we
consider a wider range of pedestrian motion types. Whereas the IMM used by
[11] uses 3D pseudo measurements and KFs, we use a more accurate stereo sensor
modeling by EKFs.
3 Recursive Bayesian Filtering
3.1 Kalman Filter
The discrete-time KF estimates a state $x(t)$ at time step $t$ from measurement
$z(t)$ and previous state $x(t-1)$ with the dynamical model
$$x(t) = A\,x(t-1) + B\,u(t-1) + \omega(t-1) \tag{1}$$
where the relation between measurement and state is given by
$$z(t) = H\,x(t) + \nu(t). \tag{2}$$
$A$ and $B$ are transition matrices for the state $x$ and the control input $u$,
respectively. $\omega(t-1)$ and $\nu(t)$ are white, zero-mean, uncorrelated noise
processes with $\omega(t) \sim \mathcal{N}(0, Q(t))$ and $\nu(t) \sim \mathcal{N}(0, R(t))$,
where $Q(t)$ and $R(t)$ are the covariances. The filter process can be described as a
cycle of two steps: prediction (propagating the state $x(t-1)$ to the next time step)
and correction (updating the predicted state $\hat{x}(t)$ with the current
measurement) [20].
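The prediction/correction cycle of Eqs. (1) and (2) can be sketched in a few lines of plain Python. This is a minimal illustration, not the filter used in the study: it assumes no control input ($u = 0$), a one-dimensional constant-velocity state [position, velocity] and a scalar position measurement, so that the innovation covariance is a single number. All function names and numeric values are chosen here for illustration.

```python
# Minimal linear Kalman filter cycle for Eqs. (1)-(2), pure Python.
# Assumptions (not from the paper): no control input (u = 0), a 1D
# constant-velocity state [position, velocity] and a scalar position
# measurement, so the innovation covariance S is a plain number.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def kf_predict(x, P, A, Q):
    """Prediction step: x_pred = A x, P_pred = A P A^T + Q."""
    x_pred = matmul(A, x)
    APAt = matmul(matmul(A, P), transpose(A))
    P_pred = [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(APAt, Q)]
    return x_pred, P_pred

def kf_correct(x_pred, P_pred, z, H, R):
    """Correction step for a scalar measurement z with noise variance R."""
    n = len(x_pred)
    innovation = z - matmul(H, x_pred)[0][0]
    PHt = matmul(P_pred, transpose(H))        # n x 1
    S = matmul(H, PHt)[0][0] + R              # innovation variance
    K = [[row[0] / S] for row in PHt]         # Kalman gain, n x 1
    x = [[x_pred[i][0] + K[i][0] * innovation] for i in range(n)]
    KH = matmul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)]
            for i in range(n)]
    P = matmul(I_KH, P_pred)                  # P = (I - K H) P_pred
    return x, P

T = 0.06                                      # cycle time in s (illustrative)
A = [[1.0, T], [0.0, 1.0]]
H = [[1.0, 0.0]]
Q = [[0.0, 0.0], [0.0, 0.0]]
x0 = [[0.0], [0.0]]
P0 = [[1.0, 0.0], [0.0, 1.0]]

x_pred, P_pred = kf_predict(x0, P0, A, Q)
x_new, P_new = kf_correct(x_pred, P_pred, z=1.0, H=H, R=1.0)
```

Calling `kf_predict` repeatedly without `kf_correct` propagates the state open-loop, which is how a KF yields a path prediction when no new measurements are included.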
3.2 Interacting Multiple Model Kalman Filter
There are several KF extensions available to cover different motion types and
maneuvers (see [13] for an overview); the most common is the Interacting
Multiple Model KF (IMM). The IMM models a probability $p_{ij}$ that
the tracked target makes a transition from one type of motion ($i$) to another
($j$); these values are captured by the transition probability matrix (TPM). Each
iteration of the IMM consists of three steps: interaction, filtering and model
probability update [4]. In the interaction step, the mixing probability $\mu_{ij}(t-1)$
(cond. probability that the target changed its type of motion) is calculated based
on model probabilities and the TPM to produce mixed state estimates $\hat{x}^{0}_{j}(t-1)$
and covariances $\hat{P}^{0}_{j}(t-1)$ for all models $j$. The mixed states are used as
input in the filtering step where each model is predicted and updated with the standard
KF equations. In the last step, the model probabilities are updated based on the
measurement likelihood.
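The interaction step can be illustrated with a short sketch. It follows the standard textbook form of IMM mixing (mixing probabilities proportional to $p_{ij}\,\mu_i$, mixed covariances including a spread-of-means term) and assumes, for simplicity, that all models share the same state dimension; models with different state vectors, such as CV and CA, would first need to be brought to a common state. The function name `imm_mix` and the numbers in the example are hypothetical.

```python
# IMM interaction (mixing) step, pure Python.
# Assumptions: textbook mixing equations, and all models share one state
# dimension. tpm[i][j] is the probability p_ij of switching from i to j.

def imm_mix(mu, tpm, states, covs):
    r = len(mu)            # number of models
    n = len(states[0])     # state dimension
    # c[j]: predicted probability that model j is active at this step
    c = [sum(tpm[i][j] * mu[i] for i in range(r)) for j in range(r)]
    # mix[i][j]: probability that model i was active, given j is active now
    mix = [[tpm[i][j] * mu[i] / c[j] for j in range(r)] for i in range(r)]
    mixed_states, mixed_covs = [], []
    for j in range(r):
        x0 = [sum(mix[i][j] * states[i][k] for i in range(r))
              for k in range(n)]
        P0 = [[0.0] * n for _ in range(n)]
        for i in range(r):
            d = [states[i][k] - x0[k] for k in range(n)]
            for a in range(n):
                for b in range(n):
                    # model covariance plus spread-of-means term
                    P0[a][b] += mix[i][j] * (covs[i][a][b] + d[a] * d[b])
        mixed_states.append(x0)
        mixed_covs.append(P0)
    return c, mix, mixed_states, mixed_covs

# Two models with a 2D state each (illustrative values only).
mu = [0.5, 0.5]
tpm = [[0.99, 0.01], [0.04, 0.96]]
states = [[0.0, 1.0], [2.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
c, mix, mixed_states, mixed_covs = imm_mix(mu, tpm, states, [identity, identity])
```

The mixed states and covariances returned here would then be fed into the per-model filters of the filtering step.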
3.3 Measurement Model
Measurements come from a pedestrian detector applied on sequences recorded
with a stereo camera system. A measurement vector (dropping time index $t$
in the following) $z = (u, d)$ is derived from the footpoint $p_f = (u, v)$ and the
median disparity $d$ of a pedestrian bounding box. The relation of a point in the
image $p_i = (u, v)$ and its disparity $d$ to a point $p_c = (x_c, y_c, z_c)$ in the camera
coordinate system is given by the perspective camera model [1]:
$$\begin{pmatrix} u \\ v \\ d \end{pmatrix}
= \begin{pmatrix} h_1(p_c) \\ h_2(p_c) \\ h_3(p_c) \end{pmatrix}
= \begin{pmatrix} u_0 + f_u \frac{x_c}{z_c} \\ v_0 + f_v \frac{y_c}{z_c} \\ f_u \frac{b}{z_c} \end{pmatrix} \tag{3}$$
where $f_u = f / s_u$ and $f_v = f / s_v$ with focal length $f$, baseline $b$, horizontal
and vertical pixel width $s_u$ and $s_v$, respectively. Eq. (3) leads to the nonlinear
measurement function $h$. For a position $p^g_c = (x_c, z_c)$ on the groundplane, $h_2$
can be ignored. To predict a measurement at time step $t$, the predicted state vector
$\hat{x}$ (camera coordinates) has to be projected into the measurement (image) space
with $\hat{z} = h(\hat{x})$. For the EKF we further need to calculate the Jacobian
$H = \frac{\partial h}{\partial x}$.
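As an illustration of the measurement model, the following sketch implements the ground-plane measurement function $h$ (the $u$ and $d$ rows of the perspective camera model) and its analytic Jacobian for a CV state $[x, z, v_x, v_z]$. The camera parameters are made-up values, not those of the sensor used in the study, and the function names are hypothetical. The velocity columns of the Jacobian are zero because the measurement depends on position only.

```python
# Ground-plane measurement function h and its Jacobian H for the EKF:
# u = u_0 + f_u * x_c / z_c and d = f_u * b / z_c (the v row is dropped).
# The camera parameters are made-up illustrative values.

F_U = 1200.0   # f / s_u, focal length in pixels (hypothetical)
U_0 = 640.0    # image column offset u_0 in pixels (hypothetical)
B = 0.30       # stereo baseline in metres (hypothetical)

def h(state):
    """Project a CV state [x, z, v_x, v_z] to a measurement [u, d]."""
    x_c, z_c = state[0], state[1]
    return [U_0 + F_U * x_c / z_c, F_U * B / z_c]

def jacobian_h(state):
    """Analytic 2 x 4 Jacobian of h with respect to [x, z, v_x, v_z]."""
    x_c, z_c = state[0], state[1]
    return [
        [F_U / z_c, -F_U * x_c / z_c ** 2, 0.0, 0.0],
        [0.0, -F_U * B / z_c ** 2, 0.0, 0.0],
    ]

# Pedestrian 1 m to the side and 15 m ahead, walking laterally at 1.4 m/s.
state = [1.0, 15.0, 1.4, 0.0]
z_pred = h(state)
H = jacobian_h(state)
```

Both non-zero entries in the depth column scale with $1/z_c^2$, so a fixed disparity error corresponds to a larger depth error at longer range; this is the depth dependency that 3D pseudo-measurements in a linear KF do not account for.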
Table 1: Mean sojourn times of different target dynamics in the training set (diagonals
$P_{i,i}$ of the TPM based on a cycle time of $T \approx 60$ ms). “Straight Walking” consists of
the straight walking segments of starting, stopping and bending in sequences as well as
complete crossing sequences. “Maneuver” relates to all other segments. “Turning”, a
subset of “Maneuver”, relates to the turning segments within the bending in sequences.

Motion type         Mean sojourn time $\tau_i$ (s)    $P_{ii}(T) = 1 - T/\tau_i$
Straight Walking    6.66                              0.99
Maneuver            1.67                              0.96
Turning             2.50                              0.98
3.4 Dynamical Models
Several discretized continuous-time dynamical models are considered in this
study: the popular constant velocity (white noise acceleration) model (CV), the
constant acceleration (Wiener process acceleration) model (CA) and the
constant turn model (CT) with Cartesian state vector. These are characterized by their
state vectors $x$, transition matrices $A$ and process noise matrices $Q$. The CV
model state vector holds position and velocity ($x = [x, z, v_x, v_z]$), the CA model
further has acceleration ($x = [x, z, v_x, v_z, a_x, a_z]$) and the CT model turn rate
($x = [x, z, v_x, v_z, \omega]$) variables. For details, such as transition and process
noise matrices, see [4, 14].
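For the CV model, the matrices can be written down compactly. The sketch below uses the textbook discretized white-noise-acceleration form of $A$ and $Q$ for the state $[x, z, v_x, v_z]$ and propagates a state open-loop over several cycles, as one would for path prediction. The noise intensity `q`, the initial state and the function names are illustrative assumptions, not values from the study.

```python
# CV model matrices for the state [x, z, v_x, v_z] and open-loop prediction.
# Q is the textbook discretized white-noise-acceleration form; the noise
# intensity q and the example state are illustrative assumptions.

def cv_transition(T):
    return [[1.0, 0.0, T, 0.0],
            [0.0, 1.0, 0.0, T],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def cv_process_noise(T, q):
    a, b, c = q * T ** 3 / 3.0, q * T ** 2 / 2.0, q * T
    return [[a, 0.0, b, 0.0],
            [0.0, a, 0.0, b],
            [b, 0.0, c, 0.0],
            [0.0, b, 0.0, c]]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def predict_path(x, P, T, q, steps):
    """Propagate state and covariance without measurements."""
    A, Q = cv_transition(T), cv_process_noise(T, q)
    path = []
    for _ in range(steps):
        x = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        APAt = matmul(matmul(A, P), transpose(A))
        P = [[p + n for p, n in zip(rp, rn)] for rp, rn in zip(APAt, Q)]
        path.append((x[0], x[1]))
    return path, P

P_init = [[0.01 if i == j else 0.0 for j in range(4)] for i in range(4)]
path, P_end = predict_path([0.0, 10.0, 1.5, 0.0], P_init, T=0.06, q=0.5, steps=20)
```

With these values the predicted lateral position advances by $v_x T$ per cycle while the position variance grows with every step, which is one reason prediction horizons are kept short.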
Several approaches can be taken to specify the TPM. There is the ad-hoc
approach to fill the diagonals with values close to one. [4, 13] discuss the use of the
mean sojourn time (the mean time a target stays in a motion type) for the TPM.
Lastly, one could perform parameter optimization of the entries of the TPM
directly. In preliminary experiments, we obtained similar best performance with
the second and third approaches, thus we selected the sojourn time approach to
specify the TPM, derived from a training set, see Section 4 and Table 1.
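The sojourn-time approach can be made concrete with a small sketch. It uses the relation $P_{ii} = 1 - T/\tau_i$ between the diagonal TPM entry, the cycle time $T$ and the mean sojourn time $\tau_i$, with $T = 0.06$ s and the sojourn times listed in Table 1. How the remaining probability mass is distributed over the off-diagonal entries is not stated in the text; spreading it evenly, as done in `build_tpm` below, is an assumption made purely for illustration.

```python
# Diagonal TPM entries from mean sojourn times, P_ii = 1 - T / tau_i.
# T = 0.06 s and the sojourn times are the Table 1 values. Spreading the
# remaining probability evenly over the other models is an assumption.

T = 0.06  # cycle time in seconds

def stay_probability(tau, cycle_time=T):
    """Probability of remaining in a motion type for one more cycle."""
    return 1.0 - cycle_time / tau

def build_tpm(taus, cycle_time=T):
    """Row-stochastic TPM with sojourn-time diagonals (needs >= 2 models)."""
    r = len(taus)
    tpm = []
    for i, tau in enumerate(taus):
        p_stay = stay_probability(tau, cycle_time)
        p_leave = (1.0 - p_stay) / (r - 1)
        tpm.append([p_stay if j == i else p_leave for j in range(r)])
    return tpm

sojourn = {"Straight Walking": 6.66, "Maneuver": 1.67, "Turning": 2.50}
diagonals = {name: round(stay_probability(tau), 2)
             for name, tau in sojourn.items()}

# Two-model example: straight walking vs. maneuver.
tpm = build_tpm([sojourn["Straight Walking"], sojourn["Maneuver"]])
```

Rounded to two decimals, `diagonals` reproduces the values 0.99, 0.96 and 0.98 given in Table 1.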
3.5 Ego Motion Compensation
At each time step, the filter state is projected from the previous camera
coordinate system to the current one using the inertial motion matrix $M_v$ (vehicle
coordinates) based on velocity and yaw rate measured by on-board sensors. The
inverse ego motion homography matrix is given by $M_c = D^{-1} M_v D$ (where $D$
defines the relation between camera and vehicle coordinate system). Translational
ego compensation is done using $t_{M_c}$ as control vector $u$ (Eq. (1) with
$B = I_{2\times2}$); the ego rotation is integrated into the transition matrix $A_e$
(exemplary for the CV model) [16]:
$$A_e = \begin{pmatrix} R_{M_c} & 0_{2\times2} \\ 0_{2\times2} & R_{M_c} \end{pmatrix} A \tag{4}$$
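The structure of Eq. (4) can be sketched as follows for the CV model. The sketch assumes that $R_{M_c}$ is a planar $2 \times 2$ rotation on the ground plane, parameterized by the yaw change between two frames; the sign convention, the yaw value and the function names are assumptions for illustration, and the translational part ($t_{M_c}$ as control input) is left out.

```python
# Ego-rotation compensated CV transition matrix, A_e = blockdiag(R, R) A.
# Assumptions: R is a planar rotation by the yaw change between frames;
# the sign convention and all numeric values are illustrative only.
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def rotation(yaw):
    """2 x 2 planar rotation matrix for an angle in radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s], [s, c]]

def block_diag2(R):
    """4 x 4 block-diagonal matrix with R on both diagonal blocks."""
    zeros = [0.0, 0.0]
    return [R[0] + zeros, R[1] + zeros, zeros + R[0], zeros + R[1]]

def ego_transition(A, yaw):
    """Rotate position and velocity blocks after applying A."""
    return matmul(block_diag2(rotation(yaw)), A)

T = 0.06
A_cv = [[1.0, 0.0, T, 0.0],
        [0.0, 1.0, 0.0, T],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]

A_e = ego_transition(A_cv, yaw=0.01)
state = [0.0, 10.0, 1.5, 0.0]
state_e = [sum(a * s for a, s in zip(row, state)) for row in A_e]
```

Because the same rotation is applied to the position and velocity blocks, the pedestrian's speed is unchanged by the compensation; only its direction in the new camera frame differs.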

References
Histograms of oriented gradients for human detection
Stereo Processing by Semiglobal Matching and Mutual Information
Pedestrian Detection: An Evaluation of the State of the Art
An Introduction to the Kalman Filter
Design and Analysis of Modern Tracking Systems