Planning-based Prediction for Pedestrians
Brian D. Ziebart¹, Nathan Ratliff¹, Garratt Gallagher¹, Christoph Mertz¹, Kevin Peterson¹, J. Andrew Bagnell¹, Martial Hebert¹, Anind K. Dey¹, Siddhartha Srinivasa²
¹School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
²Intel Research
{bziebart, ndr, ggallagh, mertz, kp, dbagnell, hebert, anind}@cs.cmu.edu, siddhartha.srinivasa@intel.com
Abstract: We present a novel approach for determining
robot movements that efficiently accomplish the robot’s tasks
while not hindering the movements of people within the en-
vironment. Our approach models the goal-directed trajectories
of pedestrians using maximum entropy inverse optimal control.
The advantage of this modeling approach is the generality of
its learned cost function to changes in the environment and
to entirely different environments. We employ the predictions
of this model of pedestrian trajectories in a novel incremental
planner and quantitatively show the improvement in hindrance-
sensitive robot trajectory planning provided by our approach.
I. INTRODUCTION
Determining appropriate robotic actions in environments
with moving people is a well-studied [15], [2], [5], but often
difficult task due to the uncertainty of each person’s future
behavior. Robots should certainly never collide with people
[11], but avoiding collisions alone is often unsatisfactory
because the disruption of almost colliding can be burdensome
to people and sub-optimal for robots. Instead, robots should
predict the future locations of people and plan routes that
will avoid such hindrances (i.e., situations where the person’s
natural behavior is disrupted due to a robot’s proximity)
while still efficiently achieving the robot’s objectives. For
example, given the origins and target destinations of the robot
and person in Figure 1, the robot’s hindrance-minimizing
trajectory would take the longer way around the center
obstacle (a table), leaving a clear path for the pedestrian.
One common approach for predicting trajectories is to
project the prediction step of a tracking filter [9], [13], [10]
forward over time. For example, a Kalman filter’s [7] future
positions are predicted according to a Gaussian distribution
with growing uncertainty and, unfortunately, often high prob-
ability for physically impossible locations (e.g., behind walls,
within obstacles). Particle filters [16] can incorporate more
sophisticated constraints and non-Gaussian distributions, but
degrade into random walks of feasible motion over large
time horizons rather than purposeful, goal-based motion.
Closer to our research are approaches that directly model
the policy [6]. These approaches assume that previously
observed trajectories capture all purposeful behavior, and the
only uncertainty involves determining to which previously
observed class of trajectories the current behavior belongs.
Models based on mixtures of trajectories and conditioned
action distribution modeling (using hidden Markov models)
have been employed [17]. This approach often suffers from
over-fitting to the particular training trajectories and context
of those trajectories. When changes to the environment
occur (e.g., rearrangement of the furniture), the model will
confidently predict incorrect trajectories through obstacles.
Fig. 1. A hindrance-sensitive robot path planning problem in our exper-
imental environment containing a person (green square) in the upper right
with a previous trajectory (green line) and intended destination (green X)
near a doorway, and a robot (red square) near the secretary desk with its
intended destination (red X) near the person’s starting location. Hindrances
are likely if the person and robot both take the distance-minimizing path to
their intended destinations. Laser scanners are denoted with blue boxes.
Fig. 2. Images of the kitchen area (left), secretary desk area (center), and
lounge area (right) of our experimental environment.
We assume that people behave like planners, efficiently
moving to reach destinations. In traditional planning, given
a cost function mapping environment features to costs, the
optimal trajectory is easily obtained for any endpoints in
any environment described using those features. Our ap-
proach learns the cost function that best explains previously
observed trajectories. Unfortunately, traditional planning is
prescriptive rather than predictive: the sub-optimality typically present in observed data is inexplicable to a planner.
We employ the principle of maximum entropy to address the
lack of decision uncertainty using a technique we previously
developed called maximum entropy inverse optimal con-
trol (or inverse reinforcement learning) [18]. This approach
yields a soft-maximum version of Markov decision processes
(MDP) that accounts for decision uncertainty. As we shall
show, this soft-max MDP model supports efficient algorithms
for learning the cost function that best explains previous
behavior, and for predicting a person’s future positions.

Importantly, the feature-based cost function that we em-
ploy enables generalization. Specifically, the cost function
is a linear combination of a given set of features computed
from the environment (e.g., obstacles and filters applied to
obstacles). Once trained, the cost function applies to any
configuration of these features. Therefore if obstacles in the
environment move, the environment otherwise changes, or
we consider an entirely different environment, our model
generalizes to this new setting. We consider this improved
generalization to be a major benefit of our approach over
previous techniques.
Predictions of pedestrian trajectories can be naturally
employed by a planner with time-dependent costs so that po-
tential hindrances are penalized. Unfortunately, the increased
dimensionality of the planning problem can be prohibitive.
Instead, we present a simple, incremental “constraint genera-
tion” planning approach that enables real-time performance.
This approach initially employs a cost map that ignores
the predictions of people’s future locations. It then itera-
tively plans the robot’s trajectory in the cost map, simulates
the person’s trajectory, and adds “cost” to the cost map
based on the probability of hindrance at each location. The
time-independent cost function that this procedure produces
accounts for the time-varying predictions, and ultimately
yields a high quality, hindrance-free robot trajectory, while
requiring much less computation than a time-based planner.
We evaluate the quality of our combined prediction and
planning system on the trajectories of people in a lab environ-
ment using the opposing objectives of maximizing the robot’s
efficiency in reaching its intended destination and minimizing
robot-person hindrances. An inherent trade-off between these
two criteria exists in planning appropriate behavior. We show
that for any chosen trade-off, our prediction model is better
for making decisions than an alternative approach.
II. MODELING PURPOSEFUL MOTION
Accurate predictions of the future positions of people
enable a robot to plan appropriate actions that avoid hin-
dering those people. We represent the sequence of actions
(diagonal and adjacent movements) that lead to a person’s
future position using a deterministic Markov decision process
(MDP) over a grid representing the environment. Unfortunately, people do not move in a perfectly predictable manner, and instead our robot must reason probabilistically about their future locations. By maximizing the entropy of the distribution of trajectories, H(P_ζ) = −Σ_ζ P(ζ) log P(ζ), subject to the constraint of matching the reward of the person's behavior in expectation [1], we obtain a distribution over trajectories [18].
In this section, we present a new interpretation of the max-
imum entropy distribution over trajectories and algorithms
for obtaining it. This is framed as a softened version of the
value iteration algorithm, which is commonly employed to
find optimal policies for MDPs. We first review the Bellman
equations and value iteration. We next relax these equations,
obtaining a distribution over actions and trajectories. We then
employ Bayes’ Rule using this distribution to reason about
unknown intended destinations. Next, we compute expected visitation counts, D_{x,y}, across the environment, and finally obtain time-based visitation counts, D_{x,y,t}, that can be used for hindrance-sensitive planning purposes.
A. Relaxing Maximum Value MDPs for Prediction
Consider a Markov decision process with deterministic action outcomes for modeling path planning. This class of MDPs consists of a state set, S, an action set, A, an action transition function, T : S × A → S, and a reward function, R : S × A → ℝ. A trajectory, ζ, is a sequence of states (grid cells) and actions (diagonal and adjacent moves), {s_0, a_0, s_1, a_1, ...}, that satisfies the transition function (i.e., ∀ s_t, a_t, s_{t+1} ∈ ζ : T(s_t, a_t) = s_{t+1}). Goal-based planning is modeled in the MDP by assigning costs (i.e., negative rewards) for every action except for a self-transitioning action in the absorbing goal state that has a cost of 0. MDPs are solved by finding the state and action values (i.e., the future reward of that action or state), V*(s) and Q*(s, a), for the maximum future reward policy, π* : S → A. The Bellman equations define the optimal quantities:

Q*(s, a) = R(s, a) + V*(T(s, a))    (1)
V*(s) = max_a Q*(s, a).    (2)
The value iteration algorithm produces these optimal values by alternately applying Equations 1 and 2 as update rules until the values converge. The optimal policy is then π*(s) = argmax_a Q*(s, a). While useful for prescribing a set of actions to take, this policy is not usually predictive because observed trajectories are often not consistently optimal.
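For concreteness, a minimal sketch of this value iteration procedure on a small deterministic MDP is shown below. It is our own illustration rather than anything from the paper; the corridor layout, costs, and array conventions are assumptions made only for the example.

```python
import numpy as np

def value_iteration(reward, transitions, goal, n_iters=200):
    """Bellman updates (Eqs. 1-2) for a deterministic MDP with an absorbing goal state."""
    n_states, n_actions = reward.shape
    V = np.full(n_states, -np.inf)
    V[goal] = 0.0
    for _ in range(n_iters):
        Q = reward + V[transitions]        # Q(s,a) = R(s,a) + V(T(s,a))
        V = Q.max(axis=1)                  # V(s)   = max_a Q(s,a)
        V[goal] = 0.0                      # the absorbing goal keeps value 0
    policy = Q.argmax(axis=1)              # pi*(s) = argmax_a Q(s,a)
    return V, policy

# Example: 5-cell corridor, actions {left, right}, each move costs 1, goal is cell 4.
n = 5
transitions = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = np.full((n, 2), -1.0)
reward[n - 1, 1] = 0.0                     # the self-transition at the goal is free
print(value_iteration(reward, transitions, goal=n - 1))
```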
We employ a “softened” version of MDPs derived using the principle of maximum entropy that incorporates trajectory uncertainty into our model. In this setting, trajectories are probabilistically distributed according to their values rather than having a single optimal trajectory for the solution. We accomplish this by replacing the maximum of the Bellman equations with a soft-maximum function, softmax_x f(x) = log Σ_x e^{f(x)}:

Q(s, a) = R(s, a) + V(T(s, a))    (3)
V(s) = softmax_a Q(s, a)    (4)
In this case, the solution policy is probabilistic with distribution π(a|s) = e^{Q(s,a) − V(s)}. The probability of a trajectory, ζ, can be shown [18] to be distributed according to P(ζ) ∝ e^{Σ_{(s,a)∈ζ} R(s,a)}. Trajectories with a very high reward (low cost) are exponentially more preferable to low reward (high cost) trajectories, and trajectories with equal reward are equally probable. The magnitude of the rewards, |R(s, a)|, is meaningful in this softmax setting and corresponds to the certainty about trajectories. As |R(s, a)| → ∞, softmax_a Q(s, a) approaches max_a Q(s, a), and the distribution converges to only optimal trajectories. An analogous O(L|S||A|) time value-iteration procedure is employed (with appropriately chosen length L) to solve for this softmax policy distribution, terminating when the value function has reached an acceptable level of convergence. Note that this softmax value distribution over trajectories is very different from the softmax action selection distribution that has been employed for reinforcement learning: P_τ(a|s) ∝ e^{Q(s,a)/τ} [14], [12].
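The softened recursion differs from ordinary value iteration only in replacing the maximum over actions with a log-sum-exp, which directly yields the stochastic policy π(a|s) = e^{Q(s,a) − V(s)}. The sketch below is our own illustration of this soft value iteration (not the authors' code); the corridor example, array shapes, and the finite stand-in for −∞ are assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(reward, transitions, goal, n_iters=500):
    """Soft-maximum value iteration (Eqs. 3-4) for a deterministic MDP.

    reward[s, a]      : immediate reward (negative cost) of action a in state s
    transitions[s, a] : deterministic successor state index T(s, a)
    goal              : absorbing goal state whose value is pinned to 0
    """
    n_states, n_actions = reward.shape
    V = np.full(n_states, -1e9)              # effectively -inf, kept finite for stability
    V[goal] = 0.0
    for _ in range(n_iters):
        Q = reward + V[transitions]          # Q(s,a) = R(s,a) + V(T(s,a))
        V = logsumexp(Q, axis=1)             # V(s)   = softmax_a Q(s,a)
        V[goal] = 0.0
    Q = reward + V[transitions]
    policy = np.exp(Q - logsumexp(Q, axis=1, keepdims=True))   # pi(a|s) = e^{Q(s,a) - V(s)}
    return V, policy

# Tiny example: 5-cell corridor, actions {left, right}, each move costs 1, goal = cell 4.
n = 5
transitions = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = np.full((n, 2), -1.0)
reward[n - 1, 1] = 0.0                       # the self-transition at the goal is free
V, policy = soft_value_iteration(reward, transitions, goal=n - 1)
print(np.round(V, 2))        # soft values lie at or above the hard-max values
print(np.round(policy, 2))   # moving toward the goal is most probable, but not certain
```

Replacing the log-sum-exp with a hard max recovers the standard value iteration of Equations 1 and 2, which is one way to see the soft-max model as a strict generalization.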
B. Learning Unknown Cost Functions
For prescriptive MDP applications, the reward values for actions in the MDP are often engineered to produce appropriate behavior; for our prediction purposes, however, we would like to find the reward values that best predict a set of observed trajectories, {ζ̃_i}. We assume the availability of a vector of feature values, f_{s,a}, characterizing each possible action. For our application, these features are obstacle locations and functions of obstacle locations (e.g., blurring and filtering of obstacles). We assume that the reward is linear in these features, R(s, a) = θ^⊤ f_{s,a}, with unknown weight parameters, θ. We denote the first state, s_0, of trajectory ζ̃_i as s_0(ζ̃_i). The learning problem is then the maximization of the observed trajectory's probability, P(ζ|θ)¹, or equivalently:

θ* = argmax_θ Σ_i ( Σ_{(s,a)∈ζ̃_i} θ^⊤ f_{s,a} − V(s_0(ζ̃_i)) ).    (5)
The gradient of the function in Equation 5 has an intuitive interpretation as the difference between the feature counts of observed trajectories and expected feature counts according to the model: Σ_i Σ_{(s,a)∈ζ̃_i} f_{s,a} − E_{P_θ(ζ)}[f_{s,a}]. We employ gradient-based optimization on this convex function to obtain θ* [18]. We refer the reader to that work for a more detailed explanation of the optimization procedure.
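To illustrate the learning loop, the sketch below performs gradient ascent on θ. For brevity it estimates the expected feature counts by sampling trajectories from the current soft-max policy rather than computing them exactly with the algorithm of [18], so it is only a rough stand-in for the authors' procedure; the feature-tensor layout, demonstration format, and initialization are our own assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def soft_values(theta, features, transitions, goal, n_iters=300):
    """Soft-max value iteration for the linear reward R(s,a) = theta . f_{s,a}."""
    reward = features @ theta                    # shape (n_states, n_actions)
    n_states, n_actions = reward.shape
    V = np.full(n_states, -1e9)
    V[goal] = 0.0
    for _ in range(n_iters):
        Q = reward + V[transitions]
        V = logsumexp(Q, axis=1)
        V[goal] = 0.0
    Q = reward + V[transitions]
    policy = np.exp(Q - logsumexp(Q, axis=1, keepdims=True))
    return V, policy

def sampled_feature_counts(policy, features, transitions, start, goal, rng, max_len=200):
    """Feature counts of one trajectory sampled from the soft-max policy pi(a|s)."""
    counts = np.zeros(features.shape[-1])
    s = start
    for _ in range(max_len):
        if s == goal:
            break
        a = rng.choice(policy.shape[1], p=policy[s])
        counts += features[s, a]
        s = transitions[s, a]
    return counts

def learn_weights(demos, features, transitions, lr=0.05, epochs=100, seed=0):
    """Gradient ascent on the maximum entropy IOC objective (Eq. 5), sampled gradient.

    demos    : list of (trajectory, goal) pairs, trajectory = [(s0, a0), (s1, a1), ...]
    features : array of shape (n_states, n_actions, n_features), assumed nonnegative here
    """
    rng = np.random.default_rng(seed)
    theta = -np.ones(features.shape[-1])         # start with every feature mildly costly
    f_demo = sum(features[s, a] for traj, _ in demos for s, a in traj) / len(demos)
    for _ in range(epochs):
        f_model = np.zeros_like(theta)
        for traj, goal in demos:
            _, policy = soft_values(theta, features, transitions, goal)
            f_model += sampled_feature_counts(policy, features, transitions,
                                              traj[0][0], goal, rng)
        f_model /= len(demos)
        theta += lr * (f_demo - f_model)         # gradient: empirical minus expected counts
    return theta
```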
C. Destination Prior Distribution
Though our model is conditioned on a known destination
location, that destination location is not known at prediction
time. Our predictive model must reason about all possible
destinations to predict the future trajectory of a person. We
address this problem in a Bayesian way by first obtaining a
prior distribution over destinations using previously observed
trajectories and the features of the environment.
In this work, we base our prior distribution on the goals of previously observed trajectories, g. We smooth this probability to nearby cells using the Manhattan distance, dist(a, b), and also add probability P_0 for previously unvisited locations to avoid overfitting, yielding: P(dest x) ∝ P_0 + Σ_{goals g} e^{−dist(x,g)}. When little or no previous data is available for a particular environment, a feature-based model of destinations with features expressing door, chair, and appliance locations could be employed.
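A direct transcription of this prior into code might look as follows; it is a sketch under our own conventions, and the grid indexing, the value of P_0, and the example goals are all illustrative.

```python
import numpy as np

def destination_prior(grid_shape, observed_goals, p0=1e-3):
    """Prior over destination cells: P(dest x) ∝ P_0 + sum_g exp(-manhattan(x, g)).

    grid_shape     : (rows, cols) of the discretized environment
    observed_goals : list of (row, col) endpoints of previously observed trajectories
    p0             : smoothing mass for previously unvisited cells
    """
    rows, cols = grid_shape
    r, c = np.mgrid[0:rows, 0:cols]
    prior = np.full(grid_shape, p0, dtype=float)
    for gr, gc in observed_goals:
        manhattan = np.abs(r - gr) + np.abs(c - gc)
        prior += np.exp(-manhattan)
    return prior / prior.sum()              # normalize to a probability distribution

# Example: two frequently observed goals on a 6 x 8 grid.
print(np.round(destination_prior((6, 8), [(0, 7), (5, 0)]), 3))
```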
D. Efficient Future Trajectory Prediction
In the prediction setting, the robot knows the person's partial trajectory from state A to current state B, ζ_{A→B}, and must infer the future trajectory of the person, P(ζ_{B→C}), to an unknown destination state, C, given all available information. First we infer the posterior distribution of destinations given the partial trajectory, P(dest C | ζ_{A→B}), using Bayes' rule. For notational simplicity, we denote the softmax value function from state X to destination state Y as V(X→Y) and the reward of a path as R(ζ) = Σ_{(s,a)∈ζ} R(s, a). The posterior distribution is then:
P(dest C | ζ_{A→B}) = P(ζ_{A→B} | dest C) P(dest C) / P(ζ_{A→B})
                    = [e^{R(ζ_{A→B}) + V(B→C)} / e^{V(A→C)}] P(dest C) / Σ_D [e^{R(ζ_{A→B}) + V(B→D)} / e^{V(A→D)}] P(dest D).    (6)
¹We assume the final state of the trajectory is the goal destination and our probability distribution is conditioned on this goal destination.
The value functions, V(A→D) and V(B→D), for each state D are required to compute this posterior (Equation 6). The naïve approach is to execute O(|D|) runs of softmax value iteration, one for each possible goal D. Fortunately, there is a much more efficient algorithm. In the hard maximum case, this problem is solved efficiently by modifying the Bellman equations to operate backwards, so that instead of V(s) representing the future value of state s, it is the maximum value obtained by a trajectory reaching state s. Initializing V(s) = −∞ for all s ≠ A and V(A) = 0, the following equations define V(A→D) for all D:

Q(s, a) = R(s, a) + V(s)    (7)
V(s) = max_{(s′,a) : T(s′,a) = s} Q(s′, a)    (8)

For the soft-max reward case, the max is replaced with softmax in Equation 8 and the value functions, V(A→D), are obtained with a value-iteration algorithm. Thus, with two applications of value iteration to produce V(A→D) and V(B→D), and O(|D|) time to process the results, the posterior distribution over destinations is obtained.
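The following sketch reflects our reading of Equations 6-8: it computes V(A→D) and V(B→D) for every state with backward soft value iteration and combines them into the destination posterior. The array layouts, the finite stand-in for −∞, and the corridor example are assumptions for illustration only.

```python
import numpy as np
from scipy.special import logsumexp

def soft_reach_values(origin, reward, transitions, n_iters=300):
    """Backward soft value iteration (softened Eqs. 7-8).

    Returns V where V[s] = log sum over trajectories origin -> s of e^(trajectory reward).
    """
    n_states, n_actions = reward.shape
    V = np.full(n_states, -1e9)
    V[origin] = 0.0
    for _ in range(n_iters):
        new_V = np.full(n_states, -1e9)
        new_V[origin] = 0.0                      # the empty trajectory already reaches the origin
        for s_prev in range(n_states):
            for a in range(n_actions):
                s_next = transitions[s_prev, a]
                # soft accumulation over incoming edges: reach s_next via (s_prev, a)
                new_V[s_next] = np.logaddexp(new_V[s_next],
                                             reward[s_prev, a] + V[s_prev])
        V = new_V
    return V

def destination_posterior(reward, transitions, A, B, r_AB, prior):
    """Posterior over destinations given an observed partial trajectory A -> B (Eq. 6).

    r_AB is the reward of the observed segment; prior must be strictly positive
    (the smoothed prior of Section II-C guarantees this).
    """
    V_A = soft_reach_values(A, reward, transitions)     # V(A -> D) for every state D
    V_B = soft_reach_values(B, reward, transitions)     # V(B -> D) for every state D
    log_post = r_AB + V_B - V_A + np.log(prior)         # numerator of Eq. 6 in log space
    log_post -= logsumexp(log_post)                     # normalize over destinations
    return np.exp(log_post)

# Tiny example: 4-cell corridor, moves cost 1, person observed walking from cell 0 to cell 1.
n = 4
transitions = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = np.full((n, 2), -1.0)
posterior = destination_posterior(reward, transitions, A=0, B=1, r_AB=-1.0,
                                  prior=np.full(n, 1.0 / n))
print(np.round(posterior, 3))   # cell 0, behind the person, receives the least mass
```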
We now use the destination posterior to compute the conditional probability of any continuation path, ζ_{B→C}:

P(ζ_{B→C} | ζ_{A→B}) = Σ_D P(ζ_{B→C} | ζ_{A→B}, dest D) P(dest D | ζ_{A→B})
                     = P(ζ_{B→C} | dest C) P(dest C | ζ_{A→B})
                     = e^{R(ζ_{B→C}) − V(B→C)} P(dest C | ζ_{A→B}).    (9)

This can be readily computed using the previously computed posterior destination distribution, and V(B→C) is computed as part of generating that posterior distribution.
The expected occupancies of different states, D_x, are obtained by marginalizing over all paths containing state x. For this class of paths, denoted Ξ_{B→x→C}, each path can be divided² into a path from B to x and a path from x to C:
D_x = Σ_C Σ_{ζ∈Ξ_{B→x→C}} P(ζ_{B→C} | dest C) P(dest C | ζ_{A→B})
    = Σ_C P(dest C | ζ_{A→B}) Σ_{ζ_1∈Ξ_{B→x}, ζ_2∈Ξ_{x→C}} e^{R(ζ_1) + R(ζ_2) − V(B→C)}
    = (Σ_{ζ_1∈Ξ_{B→x}} e^{R(ζ_1)}) (Σ_C Σ_{ζ_2∈Ξ_{x→C}} e^{R(ζ_2) + log P(dest C | ζ_{A→B}) − V(B→C)})    (10)
The first summation of Equation 10 equates to e^{V(B→x)}, which is easily obtained from previously computed value functions. We compute the second double summation by adding a final state reward of (log P(dest C | ζ_{A→B}) − V(B→C)) and performing soft value iteration with those modified rewards. Thus, with one additional application of the soft value iteration algorithm and combining the results (constant time with respect to the number of goals), we obtain state expected visitation counts.
²The solution also holds for paths with multiple occurrences of a state.

Algorithm 1 Incorporating predictive pedestrian models via predictive planning
 1: procedure PREDICTIVEPLANNING(σ > 0, α > 0, {D_{s,t}}, D_thresh)
 2:   Initialize the cost map to prior navigational costs c_0(s).
 3:   for t = 0, . . . , T do
 4:     Plan under the current cost map.
 5:     Simulate the plan forward to find points of probable interference with the pedestrian, {s_i}_{i=1}^{K_t}, where D_{s,t} > D_thresh.
 6:     If K_t = 0 then break.
 7:     Add cost to those points:
 8:       c_{t+1}(s) = c_t(s) + α Σ_{i=1}^{K_t} e^{−‖s − s_i‖² / (2σ²)}
 9:   end for
10:   return the plan through the final cost map.
11: end procedure
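A compact sketch of Algorithm 1 is given below. It is our own illustration rather than the authors' implementation: it uses plain Dijkstra's algorithm as the inner planner (the paper suggests efficient replanners such as D*), and it assumes one plan step per prediction time window when simulating the plan forward.

```python
import heapq
import numpy as np

def dijkstra_path(cost, start, goal):
    """Shortest path on a 4-connected grid; entering cell s costs cost[s]."""
    rows, cols = cost.shape
    dist = np.full(cost.shape, np.inf)
    prev = {}
    dist[start] = 0.0
    queue = [(0.0, start)]
    while queue:
        d, s = heapq.heappop(queue)
        if s == goal:
            break
        if d > dist[s]:
            continue
        r, c = s
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt]
                if nd < dist[nxt]:
                    dist[nxt] = nd
                    prev[nxt] = s
                    heapq.heappush(queue, (nd, nxt))
    path, s = [goal], goal
    while s != start:                    # assumes the goal is reachable
        s = prev[s]
        path.append(s)
    return path[::-1]

def predictive_planning(c0, D, start, goal, d_thresh=0.05, alpha=5.0, sigma=1.5, T=20):
    """Algorithm 1: iteratively add cost where the simulated plan meets predicted pedestrians.

    c0   : prior navigational cost map, shape (rows, cols)
    D[t] : predicted pedestrian occupancy D_{s,t} for time window t, same shape as c0
    """
    rows, cols = c0.shape
    r_idx, c_idx = np.mgrid[0:rows, 0:cols]
    cost = c0.copy()
    for _ in range(T):
        plan = dijkstra_path(cost, start, goal)
        # Simulate the plan forward (here: one plan step per prediction window) and
        # flag cells where the pedestrian's predicted occupancy exceeds the threshold.
        hits = [s for t, s in enumerate(plan) if t < len(D) and D[t][s] > d_thresh]
        if not hits:
            break
        for hr, hc in hits:              # Gaussian-shaped cost bump around each hit (line 8)
            cost += alpha * np.exp(-((r_idx - hr) ** 2 + (c_idx - hc) ** 2)
                                   / (2.0 * sigma ** 2))
    return dijkstra_path(cost, start, goal), cost
```

In the full system this loop is re-run as updated pedestrian predictions D_{s,t} arrive (every 0.25 seconds in the paper), so the added cost bumps continually track the person's predicted motion.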
E. Temporal Predictions
To plan appropriately requires predictions of where people will be at different points in time. More formally, we need predictions of expected future occupancy of each location during the time windows surrounding fixed intervals τ, 2τ, ..., Tτ. We denote these quantities as D_{s,t}. In theory, time can be added to the state space of a Markov decision process and explicitly modeled. In practice, however, this expansion of the state space significantly increases the time complexity of inference, making real-time applications based on the time-based model impractical. We instead consider an alternative approach that is much more tractable.
We assume that a person's movement will “consume” some cost over a time window t according to the normal distribution N(tC_0, σ_0² + tσ_1²), where C_0, σ_0², and σ_1² are learned parameters. Certainly Σ_t D_{s,t} = D_s, so we simply divide the expected visitation counts among the time intervals according to this probability distribution. We use the cost of the optimal path to each state, Q*(s), to estimate the cost incurred in reaching it. The resulting time-dependent occupancy counts are then:

D_{s,t} ∝ D_s e^{−(C_0 t − Q*(s))² / (2(σ_0² + tσ_1²))}.    (11)

These values are computed using a single execution of Dijkstra's algorithm [3] in O(|S| log |S|) time to compute Q*(·), and then O(|S|T) time for additional calculation.
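A short sketch of this temporal splitting, under our reading of Equation 11 (in particular, assuming the variance grows linearly with t as σ_0² + tσ_1²), is given below; the example costs and parameters are illustrative.

```python
import numpy as np

def temporal_occupancy(D_s, cost_to_state, C0, sigma0_sq, sigma1_sq, n_windows):
    """Split expected visitation counts D_s over time windows t = 1..T (Eq. 11).

    D_s           : expected visitation count of each state
    cost_to_state : cost of the optimal path to each state, e.g. from Dijkstra's algorithm
    C0            : expected cost "consumed" per time window
    """
    t = np.arange(1, n_windows + 1)[:, None]                 # shape (T, 1)
    var = sigma0_sq + t * sigma1_sq                          # assumed linear variance growth
    weights = np.exp(-(C0 * t - cost_to_state[None, :]) ** 2 / (2.0 * var))
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12    # enforce sum_t D_{s,t} = D_s
    return weights * D_s[None, :]                            # D_{s,t}, shape (T, n_states)

# Example: three states whose optimal-path costs are 1, 4, and 9, split over 5 windows.
D_st = temporal_occupancy(np.array([0.9, 0.5, 0.2]), np.array([1.0, 4.0, 9.0]),
                          C0=2.0, sigma0_sq=1.0, sigma1_sq=0.5, n_windows=5)
print(np.round(D_st, 3))
```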
III. PLANNING WITH PEDESTRIAN PREDICTIONS
Ideally, to account for predictive models of pedestrian
behavior, we should increase the dimensionality of the plan-
ning problem by augmenting the state of the planner to
account for time-varying costs. Unfortunately, the computa-
tional complexity of combinatorial planning is exponential
in the dimension of the planning space, and the added
computational burden of this solution will be prohibitive for
many real-time applications.
We therefore propose a novel technique for integrating our
time-varying predictions into the robot’s planner. Algorithm
1 details this procedure; it essentially iteratively shapes
a time-independent navigational cost function to remove
known points of hindrance. At each iteration, we run the
time-independent planner under the current cost map and
simulate forward the resulting plan in order to predict points
at which the robot will likely interfere with the pedestrian. By
then adding cost to those regions of the map we can ensure
that subsequent plans will not interfere at those locations. We
can further improve the computational gain of this technique
by using efficient replanners such as D* and its variants [4] in
the inner loop. While this technique, as it reasons only about
stationary costs, cannot guarantee the optimal plan given the
time-varying costs, we demonstrate that it produces good
robot behavior in practice that efficiently accounts for the
predicted motion of the pedestrian.
By re-running this iterative replanner every 0.25 seconds
using updated predictions of pedestrian motion, we can
achieve intelligent adaptive robot behavior that anticipates
where a pedestrian is heading and maneuvers well in advance
to implement efficient avoidance. The accompanying movie
demonstrates the behavior that emerges from our predictive
planner in select situations. In practice, we use the final
cost-to-go values of the iteratively constructed cost map to
implement a policy that chooses a good action from a pre-
defined collection of actions. When a plan with sufficiently
low probability of pedestrian hindrance cannot be found, the
robot’s speed is varied. Additionally, when the robot is too
close to a pedestrian, all actions that take the robot within a
small radius of the human are removed to avoid potential
collisions. Section IV-F presents quantitative experiments
demonstrating the properties of this policy.
IV. EXPERIMENTAL EVALUATION
We now present experiments demonstrating the capabili-
ties of our prediction model and its usefulness for planning
hindrance-sensitive robot trajectories.
A. Data Collection
We collected over one month’s worth of data in a lab envi-
ronment. The environment has three major areas (Figure 2): a
kitchen area with a sink, refrigerator, microwave, and coffee
maker; a secretary desk; and a lounge area. We installed four
laser range finders in fixed locations around the lab, as shown
in Figure 1, and ran a pedestrian tracking algorithm [8].
Trajectories were segmented based on significant stopping
time in any location.
Fig. 3. Collected trajectory dataset.
From the collected data, we use a subset of 166 tra-
jectories through our experimental environment to evaluate
our approach. This dataset is shown in Figure 3 after post-
processing and being fit to a 490 by 321 cell grid (each cell
represented as a single pixel). We employ 50% of this data

as a training set for estimating the parameters of our model
and use the remainder for evaluative purposes.
B. Learning Feature-Based Cost Functions
We learn a 6-parameter cost function over simple features
of the environment, which we argue are easily transferable to
other environments. The first feature is a constant feature for
every grid cell in the environment. The remaining features
are an indicator function for whether an obstacle exists in a
particular grid cell, and four “blurs” of obstacle occupancies,
which are shown in Figure 4.
Fig. 4. Four obstacle-blur features for our cost function. Feature values
range from low weight (dark blue) to high weight (dark red).
We then learn the weights for these features that best
explain the demonstrated data. The resulting cost function
for the environment is shown in Figure 5. Obstacles in the
cost function have very high cost, and free space has a low
cost that increases near obstacles.
Fig. 5. Left: The learned cost function in the environment. Right: The
prior distribution over destinations learned from the training set.
The prior distribution over destinations is obtained from
the set of endpoints in the training set, and the temporal
Gaussian parameters are also learned using the training set.
C. Stochastic Modeling Experiment
We first consider two examples from our dataset (Figure
6) that demonstrate the need for uncertainty-based modeling.
Fig. 6. Two trajectory examples (blue) and log occupancy predictions (red).
Both trajectories travel around the table in the center of the environment. However, in the first example (left), the person takes the lower pathway around the table, and in the second example (right), the person takes the upper pathway, even though the lower pathway around the table has a lower cost in the learned cost function. In both cases, the path taken is not the shortest path through the open space that one would obtain using an optimal planner. Our uncertainty-based planning model handles these two examples appropriately. A planner, by contrast, would choose one pathway or the other around the table and, even after smoothing the resulting path into a probability distribution, would tend to get a large fraction of its predictions wrong when the person takes the “other” approximately equally desirable pathway.
D. Dynamic Feature Adaptation Experiment
In many environments, the relevant features that influence
movement change frequently: furniture is moved in indoor
environments, the locations of parked vehicles are dynamic
in urban environments, and weather conditions influence
natural environments with muddy, icy, or dry conditions. We
demonstrate qualitatively that our model of motion is robust
to these feature changes.
The left frames of Figure 7 show the environment and
the path prediction of a person moving around the table at
two different points in time. At the second point of time
(bottom left), the probability of the trajectory leading to the
kitchen area or the left hallway is extremely small. In the
right frames of Figure 7, an obstacle has been introduced
that blocks the direct pathway through the kitchen area. In
this case, the trajectory around the table (bottom right) still
has a very high probability of leading to either the kitchen
area or the left hallway. As this example shows, our approach
is robust to changes in the environment such as this one.
Fig. 7. Our experimental environment with (right column) and without
(left column) an added obstacle (gray) between the kitchen and center
table. Predictions of future visitation expectations given a person’s trajectory
(white line) in both settings for two different trajectories. Frequencies range
from red (high log expectation) to dark blue (low log expectation).
E. Comparative Evaluation
We now compare our model’s ability to predict the future
path of a person with a previous approach for modeling
goal-directed trajectories, the variable-length Markov model
(VLMM) [6]. The VLMM estimates the probability of a
person’s next cell transition conditioned on the person’s
history of cells visited in the past. It is variable length
because it employs a long history when relevant training data
is abundant, and a short history otherwise.
The results of our experimental evaluation are shown in
Figure 8. We first note that for the training set (denoted
train), the trajectory log probability of the VLMM is significantly better than that of the plan-based model. However, for

References
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press.
E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik.
R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning.
R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Trans. ASME, Journal of Basic Engineering.
P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proc. International Conference on Machine Learning (ICML).