Pedestrian Path Prediction with Recursive Bayesian Filters: A Comparative Study

doi:10.1007/978-3-642-40602-7_18


N. Schneider (1,2) and D. M. Gavrila (1,2)

(1) Environment Perception, Daimler R&D, Ulm, Germany
(2) Intelligent Systems Laboratory, Univ. of Amsterdam, The Netherlands

Abstract. In the context of intelligent vehicles, we perform a comparative study on recursive Bayesian filters for pedestrian path prediction at short time horizons (< 2 s). We consider Extended Kalman Filters (EKF) based on single dynamical models and Interacting Multiple Models (IMM) combining several such basic models (constant velocity/acceleration/turn). These are applied to four typical pedestrian motion types (crossing, stopping, bending in, starting). Position measurements are provided by an external state-of-the-art stereo vision-based pedestrian detector. We investigate the accuracy of position estimation and path prediction, and the benefit of the IMMs vs. the simpler single dynamical models. Special care is given to the proper sensor modeling and parameter optimization. The dataset and evaluation framework are made public to facilitate benchmarking.

1 Introduction

Pedestrian path prediction is an important problem in several application contexts, such as architecture, social robotics and intelligent vehicles. Here we consider the intelligent vehicle context, in view of driver assistance and active pedestrian safety. Strong gains have been made over the years in improving computer vision-based pedestrian recognition performance. This has culminated in the first active pedestrian safety systems reaching the market. For example, Mercedes-Benz introduces in its 2013 E- and S-Class models a novel stereo vision-based pedestrian system, which incorporates automatic full emergency braking.

A sophisticated situation assessment requires a precise estimation of the current and future position of the pedestrian with respect to the moving vehicle. A deviation of, say, 30 cm in the estimated lateral position of the pedestrian can make all the difference, as this might place the pedestrian just inside or outside the driving corridor. Current active pedestrian systems are typically designed conservatively in their warning and control strategy, emphasizing the current state rather than prediction, in order to avoid false system activations. Indeed, pedestrian path prediction is a challenging problem, due to the highly dynamic behavior of pedestrians. They can change their walking direction in an instant, or start/stop walking abruptly. As a consequence, sensible prediction horizons are typically short (we consider < 2 s in this paper).


Fig. 1: Four typical pedestrian motion types: bending in (top left), stopping (top right), crossing (bottom left) and starting (bottom right) with detection bounding boxes.

There has been surprisingly little analysis in previous work of the accuracy of pedestrian state estimation, let alone of prediction, in the vehicle context. This paper addresses this gap by providing a quantitative comparative study of recursive Bayesian filters: we consider Extended Kalman Filters (EKF) based on single dynamical models and Interacting Multiple Models (IMM) combining several such basic models (constant velocity/acceleration/turn). These are applied to four typical pedestrian motion types (crossing, stopping, bending in, starting), see Fig. 1. Position measurements are provided by an external state-of-the-art stereo vision-based pedestrian detector. The rationale for focusing on recursive Bayesian filters in connection with modeling pedestrians as point targets is their relatively good performance and low computational cost (especially important in a vehicle context). We investigate the accuracy of position estimation and path prediction, and the benefit of the IMMs vs. the simpler single dynamical models. Special care is given to the proper sensor modeling and parameter optimization.

2 Previous Work

In this section, we focus on pedestrian state estimation based on parametric, recursive Bayesian filters. For an overview of vision-based pedestrian detection and tracking in a more general context, see recent surveys (e.g. [7, 8]).

A popular choice for target state estimation is the Kalman Filter (KF). Its applicability in real-time systems has been proven over many years for different sensors and application domains [1, 3, 4, 18, 21]. State parameters (e.g. position, velocity, acceleration) of the tracked target can be estimated with appropriate dynamical and measurement models. The KF can further be used for prediction by propagating the current state with the dynamical model without the inclusion of new measurements. Work by [3] on FIR-based pedestrian tracking uses a constant acceleration (CA) model in image space. Working in image space, however, makes it difficult to incorporate prior knowledge on the dynamics of pedestrian motion. Therefore, [2] track pedestrians on the ground plane using a KF in an indoor, static stereo camera setup. The use of a linear KF in the context of video-based pedestrian tracking in the world implies the use of 3D pseudo-measurements (i.e. back projection of 2D measurements); this does not account for the dependency of the longitudinal component of the noise on depth.

More accurate measurement models for the perspective projection of video sensors can be incorporated by means of non-linear Extended (EKF) or Unscented (UKF) Kalman filters. [15] use a UKF in a mono camera setup to track pedestrians on the ground plane (CV model). [19] apply the UKF to measurements from a stereo camera system, comparing three different dynamical models (two CV and one constant position (CP) model), where two models have a state space in world coordinates and one in image coordinates.

KF-based approaches have also been used for pedestrian state estimation outside the video-only domain. [9] apply a CV model in a multi-sensor setup with an IR camera and laser scanner. In a previous paper [18], they used two different motion models (CA and CTRV), mentioning advantages of the latter model at near-zero pedestrian speeds. Work by [21] considers a setting where pedestrians wear electronic tags. It uses a KF with a turn motion model including orientation and velocity in polar coordinates (CTRV).

Maneuvering targets can be elegantly accounted for mathematically by means of the Interacting Multiple Model (IMM) framework [4, 13]. [11] use an IMM (CP/CV) for analyzing walking vs. stopping pedestrian motion types from a stereo vision sensor on-board a vehicle. [5] use an IMM combining eight CV models with fixed velocities in eight directions. Their approach further contains an online adaptation algorithm for the IMM transition probability matrix.

Within the class of non-parametric methods for pedestrian path prediction and action classification, [11] proposes a probabilistic trajectory matching method to estimate whether a pedestrian walking towards the curbside intends to cross or not, when viewed from a stereo vision system on-board a vehicle. [12] considers the complementary case of whether a standing pedestrian will start to walk, using an SVM-based classification approach, albeit from a static monocular camera.

Quantitative evaluations of pedestrian state (position) estimation have been few and limited. [3, 5, 9, 15, 18, 21] do not include any such evaluation. [2] provides accuracy figures only related to its KF approach in an indoor setting. [19] uses simulated data to compare CV and CP KFs. Our contribution is a broad quantitative study on pedestrian position estimation and path prediction using parametric Bayesian recursive filters in the vehicle context. Compared to [11], we consider a wider range of pedestrian motion types. Whereas the IMM used by [11] uses 3D pseudo-measurements and KFs, we use a more accurate stereo sensor modeling by EKFs.

3 Recursive Bayesian Filtering

3.1 Kalman Filter

The discrete-time KF estimates a state $x(t)$ at time step $t$ from measurement $z(t)$ and previous state $x(t-1)$ with the dynamical model

$$x(t) = A x(t-1) + B u(t-1) + \omega(t-1) \tag{1}$$

where the relation between measurement and state is given by

$$z(t) = H x(t) + \nu(t). \tag{2}$$

$A$ and $B$ are transition matrices for the state $x$ and the control input $u$, respectively; $\omega(t-1)$ and $\nu(t)$ are white, zero-mean, uncorrelated noise processes with covariances $\omega(t) \sim (0, Q(t))$ and $\nu(t) \sim (0, R(t))$. The filter process can be described as a cycle of two steps: prediction (propagating the state $x(t-1)$ to the next time step) and correction (updating the predicted state $\hat{x}(t)$ with the current measurement) [20].
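The prediction and correction cycle can be illustrated with a minimal sketch in Python (NumPy). It follows the standard linear KF equations for Eqs. (1) and (2); the function names `kf_predict` and `kf_correct` and the toy one-dimensional values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def kf_predict(x, P, A, Q, B=None, u=None):
    """Prediction step: propagate state and covariance with Eq. (1)."""
    x_pred = A @ x
    if B is not None and u is not None:
        x_pred = x_pred + B @ u
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

def kf_correct(x_pred, P_pred, z, H, R):
    """Correction step: update the predicted state with measurement z (Eq. (2))."""
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 1D example (illustrative values): state [position, velocity],
# only the position is measured.
T = 0.06
A = np.array([[1.0, T], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-3 * np.eye(2)
R = np.array([[0.01]])
x, P = np.zeros(2), np.eye(2)

x_pred, P_pred = kf_predict(x, P, A, Q)
x_new, P_new = kf_correct(x_pred, P_pred, np.array([0.5]), H, R)
```

With a large prior covariance and a small measurement noise, the corrected position moves most of the way towards the measurement, and the position variance shrinks after the correction.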

3.2 Interacting Multiple Model Kalman Filter

There are several KF extensions available to cover different motion types and maneuvers (see [13] for an overview); the most common is the Interacting Multiple Model KF (IMM). The IMM models that there is a probability $p_{ij}$ that the tracked target makes a transition from one type of motion ($i$) to another ($j$); these values are captured by the transition probability matrix (TPM). Each iteration of the IMM consists of three steps: interaction, filtering and model probability update [4]. In the interaction step, the mixing probability $\mu_{ij}(t-1)$ (the conditional probability that the target changed its type of motion) is calculated based on model probabilities and the TPM to produce mixed state estimates $\hat{x}^0_j(t-1)$ and covariances $\hat{P}^0_j(t-1)$ for all models $j$. The mixed states are used as input in the filtering step, where each model is predicted and updated with the standard KF equations. In the last step, the model probabilities are updated based on the measurement likelihood.
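The interaction and model probability update steps can be sketched as follows, using the standard IMM mixing formulas. This is a simplified illustration, not the implementation used in the study: it assumes that all models share the same state dimension (models with different state vectors would need an additional mapping), and the function names and numeric values are hypothetical.

```python
import numpy as np

def imm_interaction(mu, tpm, xs, Ps):
    """Interaction step. mu[i]: model probabilities, tpm[i, j] = p_ij,
    xs[i], Ps[i]: state estimate and covariance of model i."""
    c = tpm.T @ mu                            # predicted model probabilities c_j
    mix = (tpm * mu[:, None]) / c[None, :]    # mix[i, j]: mixing probability mu_ij
    n = len(mu)
    x0, P0 = [], []
    for j in range(n):
        xj = sum(mix[i, j] * xs[i] for i in range(n))
        Pj = sum(mix[i, j] * (Ps[i] + np.outer(xs[i] - xj, xs[i] - xj))
                 for i in range(n))
        x0.append(xj)
        P0.append(Pj)
    return c, x0, P0

def imm_update_probabilities(c, likelihoods):
    """Model probability update from the per-model measurement likelihoods."""
    mu = c * likelihoods
    return mu / mu.sum()

# Two-model toy example (illustrative values only).
mu = np.array([0.5, 0.5])
tpm = np.array([[0.99, 0.01],
                [0.04, 0.96]])
xs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
Ps = [np.eye(2), np.eye(2)]

c, x0, P0 = imm_interaction(mu, tpm, xs, Ps)
mu_new = imm_update_probabilities(c, np.array([1.0, 3.0]))
```

Between the two functions, each model would run its own prediction and correction with the mixed inputs `x0[j]` and `P0[j]`; the likelihoods passed to the update come from the innovations of those per-model filters.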

3.3 Measurement Model

Measurements come from a pedestrian detector applied on sequences recorded with a stereo camera system. A measurement vector (dropping time index $t$ in the following) $z = (u, d)$ is derived from the footpoint $p_f = (u, v)$ and the median disparity $d$ of a pedestrian bounding box. The relation of a point in the image $p_i = (u, v)$ and its disparity $d$ to a point $p_c = (x_c, y_c, z_c)$ in the camera coordinate system is given by the perspective camera model [1]:

$$\begin{pmatrix} u \\ v \\ d \end{pmatrix} = \begin{pmatrix} h_1(p_c) \\ h_2(p_c) \\ h_3(p_c) \end{pmatrix} = \begin{pmatrix} u_0 + \dfrac{f_u x_c}{z_c} \\ v_0 + \dfrac{-f_v y_c}{z_c} \\ \dfrac{f_u b}{z_c} \end{pmatrix} \tag{3}$$

where $f_u = f / s_u$ and $f_v = f / s_v$ with focal length $f$, baseline $b$, and horizontal and vertical pixel widths $s_u$ and $s_v$, respectively. Eq. (3) leads to the nonlinear measurement function $h$. For a position $p_c^g = (x_c, z_c)$ on the ground plane, $h_2$ can be ignored. To predict a measurement at time step $t$, the predicted state vector $\hat{x}$ (camera coordinates) has to be projected into the measurement (image) space with $\hat{z} = h(\hat{x})$. For the EKF we further need to calculate the Jacobian $H = \frac{\partial h}{\partial x}$.
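As an illustration of the measurement function and its Jacobian, the sketch below evaluates $h_1$ and $h_3$ of Eq. (3) for a ground-plane state $[x_c, z_c, v_x, v_z]$ and differentiates them analytically. The camera parameters (`F_U`, `U_0`, `BASELINE`) are made-up values chosen for readable numbers, not the calibration used in the study.

```python
import numpy as np

# Hypothetical camera parameters, for illustration only.
F_U = 1000.0      # focal length in pixels (f / s_u)
U_0 = 512.0       # principal point, horizontal image coordinate
BASELINE = 0.3    # stereo baseline in meters

def h(state):
    """Project a ground-plane state [x_c, z_c, ...] to the measurement (u, d)."""
    x_c, z_c = state[0], state[1]
    return np.array([U_0 + F_U * x_c / z_c,     # h_1: image column u
                     F_U * BASELINE / z_c])     # h_3: disparity d

def jacobian_h(state):
    """Analytic Jacobian of h; columns of non-position states stay zero."""
    x_c, z_c = state[0], state[1]
    H = np.zeros((2, len(state)))
    H[0, 0] = F_U / z_c
    H[0, 1] = -F_U * x_c / z_c**2
    H[1, 1] = -F_U * BASELINE / z_c**2
    return H

state = np.array([1.0, 10.0, 1.2, 0.0])   # 1 m lateral offset, 10 m distance
z_pred = h(state)                          # predicted measurement (u, d)
H = jacobian_h(state)
```

The magnitude of the disparity derivative scales with $1/z_c^2$, so a fixed disparity noise corresponds to a depth uncertainty that grows with distance; this is the depth dependency that 3D pseudo-measurements in a linear KF do not capture.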


Table 1: Mean sojourn times of different target dynamics in the training set (diagonals $P_{ii}$ of the TPM based on a cycle time of T ≈ 60 ms). "Straight Walking" consists of the straight walking segments of starting, stopping and bending in sequences as well as complete crossing sequences. "Maneuver" relates to all other segments. "Turning", a subset of "Maneuver", relates to the turning segments within the bending in sequences.

| Motion type | Mean sojourn time $\tau_i$ (s) | $P_{ii}(T) = 1 - T/\tau_i$ |
|---|---|---|
| Straight Walking | 6.66 | 0.99 |
| Maneuver | 1.67 | 0.96 |
| Turning | 2.50 | 0.98 |

3.4 Dynamical Models

Several discretized continuous-time dynamical models are considered in this study: the popular constant velocity (white noise acceleration) model (CV), the constant acceleration (Wiener process acceleration) model (CA) and the constant turn model (CT) with Cartesian state vector. These are characterized by their state vectors $x$, transition matrices $A$ and process noise matrices $Q$. The CV model state vector holds position and velocity ($x = [x, z, v_x, v_z]$), the CA model further has acceleration ($x = [x, z, v_x, v_z, a_x, a_z]$) and the CT model turn rate ($x = [x, z, v_x, v_z, \omega]$) variables. For details, such as transition and process noise matrices, see [4, 14].
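For orientation, the CV case can be written out in a few lines. The sketch below uses the textbook discretized white noise acceleration form of $A$ and $Q$ for the state $[x, z, v_x, v_z]$ and propagates it without measurements to obtain a predicted path. The noise intensity `q`, the initial state and the helper names are illustrative assumptions; the tuned parameters of the study are not reproduced here.

```python
import numpy as np

def cv_model(T, q):
    """Transition matrix A and process noise Q for the state [x, z, v_x, v_z]."""
    A = np.array([[1, 0, T, 0],
                  [0, 1, 0, T],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.array([[T**3 / 3, 0, T**2 / 2, 0],
                      [0, T**3 / 3, 0, T**2 / 2],
                      [T**2 / 2, 0, T, 0],
                      [0, T**2 / 2, 0, T]], dtype=float)
    return A, Q

def predict_path(x, P, A, Q, n_steps):
    """Propagate state and covariance n_steps ahead without new measurements."""
    path = []
    for _ in range(n_steps):
        x = A @ x
        P = A @ P @ A.T + Q
        path.append((x, P))
    return path

T = 0.06                                   # cycle time in seconds
A, Q = cv_model(T, q=0.5)                  # q is an illustrative value
x0 = np.array([0.0, 10.0, 1.2, 0.0])       # walking laterally at 1.2 m/s
P0 = 0.1 * np.eye(4)
n_steps = int(round(1.0 / T))              # roughly a 1 s horizon
path = predict_path(x0, P0, A, Q, n_steps)
```

The predicted covariance grows with every step, which reflects why only short prediction horizons are sensible for a target that can maneuver at any time.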

Several approaches can be taken to specify the TPM. There is the ad-hoc approach to fill the diagonals with values close to one. [4, 13] discuss the use of the mean sojourn time (the mean time a target stays in a motion type) for the TPM. Lastly, one could perform parameter optimization of the entries of the TPM directly. In preliminary experiments, we obtained similar best performance with the second and third approaches, thus we selected the sojourn time approach to specify the TPM, derived from a training set, see Section 4 and Table 1.
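The sojourn time rule of Table 1, $P_{ii}(T) = 1 - T/\tau_i$, is easy to check numerically. The sketch below builds a TPM from mean sojourn times; the diagonal follows the formula from Table 1, while spreading the remaining probability mass evenly over the off-diagonal entries is an assumption made here for illustration (the paper does not state how the off-diagonals were set).

```python
import numpy as np

def tpm_from_sojourn_times(taus, T):
    """Build a TPM with diagonals 1 - T / tau_i (requires at least two models).

    Off-diagonal entries share the remaining row mass equally; this is an
    illustrative assumption, not a rule taken from the paper.
    """
    taus = np.asarray(taus, dtype=float)
    n = len(taus)
    diag = 1.0 - T / taus
    tpm = np.empty((n, n))
    for i in range(n):
        tpm[i, :] = (1.0 - diag[i]) / (n - 1)
        tpm[i, i] = diag[i]
    return tpm

# Mean sojourn times of "Straight Walking" and "Maneuver" from Table 1.
tpm = tpm_from_sojourn_times([6.66, 1.67], T=0.06)
diagonals = np.round(np.diag(tpm), 2)
```

Rounded to two decimals, the diagonals reproduce the values 0.99 and 0.96 listed in Table 1 for these two motion types.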

3.5 Ego Motion Compensation

At each time step, the filter state is projected from the previous camera coordinate system to the current one using the inertial motion matrix $M_v$ (vehicle coordinates) based on velocity and yaw rate measured by on-board sensors. The inverse ego motion homography matrix is given by $M_c = D^{-1} M_v D$ (where $D$ defines the relation between camera and vehicle coordinate system). Translational ego compensation is done using $t_{M_c}$ as control vector $u$ (Eq. (1) with $B = I_{2\times 2}$); the ego rotation is integrated into the transition matrix $A_e$ (shown here for the CV model) [16]:

$$A_e = \begin{pmatrix} R_{M_c} & 0_{2\times 2} \\ 0_{2\times 2} & R_{M_c} \end{pmatrix} A \tag{4}$$
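The rotational part of Eq. (4) can be sketched for the CV model as follows. The example only covers the block-diagonal rotation applied to position and velocity; the translational part and the camera-to-vehicle transform $D$ are omitted. The sign convention of the rotation angle, the yaw rate value and the helper names are assumptions for illustration.

```python
import numpy as np

def rotation_2d(angle):
    """2x2 rotation matrix on the ground plane."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s],
                     [s, c]])

def ego_compensated_transition(A, R):
    """Apply the ego rotation R to position and velocity blocks, as in Eq. (4)."""
    Z = np.zeros((2, 2))
    return np.block([[R, Z],
                     [Z, R]]) @ A

T = 0.06                                    # cycle time in seconds
A = np.array([[1, 0, T, 0],
              [0, 1, 0, T],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # CV transition matrix

yaw_rate = 0.1                              # rad/s, illustrative value
R = rotation_2d(-yaw_rate * T)              # assumed sign: scene rotates against the vehicle
A_e = ego_compensated_transition(A, R)

state = np.array([1.0, 10.0, 1.2, 0.0])
state_next = A_e @ state
```

Because the same rotation acts on position and velocity, the pedestrian's speed is unchanged by the compensation; only its direction is re-expressed in the new camera coordinate system.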
