Planning-based Prediction for Pedestrians
Brian D. Ziebart¹, Nathan Ratliff¹, Garratt Gallagher¹, Christoph Mertz¹, Kevin Peterson¹, J. Andrew Bagnell¹, Martial Hebert¹, Anind K. Dey¹, Siddhartha Srinivasa²
¹School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
²Intel Research
{bziebart, ndr, ggallagh, mertz, kp, dbagnell, hebert, anind}@cs.cmu.edu, siddhartha.srinivasa@intel.com
Abstract: We present a novel approach for determining
robot movements that efficiently accomplish the robot’s tasks
while not hindering the movements of people within the en-
vironment. Our approach models the goal-directed trajectories
of pedestrians using maximum entropy inverse optimal control.
The advantage of this modeling approach is the generality of
its learned cost function to changes in the environment and
to entirely different environments. We employ the predictions
of this model of pedestrian trajectories in a novel incremental
planner and quantitatively show the improvement in hindrance-
sensitive robot trajectory planning provided by our approach.
I. INTRODUCTION
Determining appropriate robotic actions in environments
with moving people is a well-studied [15], [2], [5], but often
difficult task due to the uncertainty of each person’s future
behavior. Robots should certainly never collide with people
[11], but avoiding collisions alone is often unsatisfactory
because the disruption of almost colliding can be burdensome
to people and sub-optimal for robots. Instead, robots should
predict the future locations of people and plan routes that
will avoid such hindrances (i.e., situations where the person’s
natural behavior is disrupted due to a robot’s proximity)
while still efficiently achieving the robot’s objectives. For
example, given the origins and target destinations of the robot
and person in Figure 1, the robot’s hindrance-minimizing
trajectory would take the longer way around the center
obstacle (a table), leaving a clear path for the pedestrian.
One common approach for predicting trajectories is to
project the prediction step of a tracking filter [9], [13], [10]
forward over time. For example, a Kalman filter’s [7] future
positions are predicted according to a Gaussian distribution
with growing uncertainty and, unfortunately, often high prob-
ability for physically impossible locations (e.g., behind walls,
within obstacles). Particle filters [16] can incorporate more
sophisticated constraints and non-Gaussian distributions, but
degrade into random walks of feasible motion over large
time horizons rather than purposeful, goal-based motion.
Closer to our research are approaches that directly model
the policy [6]. These approaches assume that previously
observed trajectories capture all purposeful behavior, and the
only uncertainty involves determining to which previously
observed class of trajectories the current behavior belongs.
Models based on mixtures of trajectories and conditioned
action distribution modeling (using hidden Markov models)
have been employed [17]. This approach often suffers from
over-fitting to the particular training trajectories and context
of those trajectories. When changes to the environment
occur (e.g., rearrangement of the furniture), the model will
confidently predict incorrect trajectories through obstacles.
Fig. 1. A hindrance-sensitive robot path planning problem in our exper-
imental environment containing a person (green square) in the upper right
with a previous trajectory (green line) and intended destination (green X)
near a doorway, and a robot (red square) near the secretary desk with its
intended destination (red X) near the person’s starting location. Hindrances
are likely if the person and robot both take the distance-minimizing path to
their intended destinations. Laser scanners are denoted with blue boxes.
Fig. 2. Images of the kitchen area (left), secretary desk area (center), and
lounge area (right) of our experimental environment.
We assume that people behave like planners, efficiently
moving to reach destinations. In traditional planning, given
a cost function mapping environment features to costs, the
optimal trajectory is easily obtained for any endpoints in
any environment described using those features. Our ap-
proach learns the cost function that best explains previously
observed trajectories. Unfortunately, traditional planning is
prescriptive rather than predictive: the sub-optimality typically present in observed data is inexplicable to a planner.
We employ the principle of maximum entropy to address the
lack of decision uncertainty using a technique we previously
developed called maximum entropy inverse optimal con-
trol (or inverse reinforcement learning) [18]. This approach
yields a soft-maximum version of Markov decision processes
(MDP) that accounts for decision uncertainty. As we shall
show, this soft-max MDP model supports efficient algorithms
for learning the cost function that best explains previous
behavior, and for predicting a person’s future positions.

Importantly, the feature-based cost function that we em-
ploy enables generalization. Specifically, the cost function
is a linear combination of a given set of features computed
from the environment (e.g., obstacles and filters applied to
obstacles). Once trained, the cost function applies to any
configuration of these features. Therefore if obstacles in the
environment move, the environment otherwise changes, or
we consider an entirely different environment, our model
generalizes to this new setting. We consider this improved
generalization to be a major benefit of our approach over
previous techniques.
Predictions of pedestrian trajectories can be naturally
employed by a planner with time-dependent costs so that po-
tential hindrances are penalized. Unfortunately, the increased
dimensionality of the planning problem can be prohibitive.
Instead, we present a simple, incremental “constraint genera-
tion” planning approach that enables real-time performance.
This approach initially employs a cost map that ignores
the predictions of people’s future locations. It then itera-
tively plans the robot’s trajectory in the cost map, simulates
the person’s trajectory, and adds “cost” to the cost map
based on the probability of hindrance at each location. The
time-independent cost function that this procedure produces
accounts for the time-varying predictions, and ultimately
yields a high quality, hindrance-free robot trajectory, while
requiring much less computation than a time-based planner.
We evaluate the quality of our combined prediction and
planning system on the trajectories of people in a lab environ-
ment using the opposing objectives of maximizing the robot’s
efficiency in reaching its intended destination and minimizing
robot-person hindrances. An inherent trade-off between these
two criteria exists in planning appropriate behavior. We show
that for any chosen trade-off, our prediction model is better
for making decisions than an alternative approach.
II. MODELING PURPOSEFUL MOTION
Accurate predictions of the future positions of people
enable a robot to plan appropriate actions that avoid hin-
dering those people. We represent the sequence of actions
(diagonal and adjacent movements) that lead to a person’s
future position using a deterministic Markov decision process
(MDP) over a grid representing the environment. Unfortunately, people do not move in a perfectly predictable manner, and instead our robot must reason probabilistically about their future locations. By maximizing the entropy of the distribution of trajectories, H(P_ζ) = −Σ_ζ P(ζ) log P(ζ), subject to the constraint of matching the reward of the person's behavior in expectation [1], we obtain a distribution over trajectories [18].
In this section, we present a new interpretation of the max-
imum entropy distribution over trajectories and algorithms
for obtaining it. This is framed as a softened version of the
value iteration algorithm, which is commonly employed to
find optimal policies for MDPs. We first review the Bellman
equations and value iteration. We next relax these equations,
obtaining a distribution over actions and trajectories. We then
employ Bayes’ Rule using this distribution to reason about
unknown intended destinations. Next, we compute expected visitation counts, D_{x,y}, across the environment, and finally obtain time-based visitation counts, D_{x,y,t}, that can be used for hindrance-sensitive planning purposes.
A. Relaxing Maximum Value MDPs for Prediction
Consider a Markov decision process with deterministic action outcomes for modeling path planning. This class of MDPs consists of a state set, S, an action set, A, an action transition function, T : S × A → S, and a reward function, R : S × A → ℝ. A trajectory, ζ, is a sequence of states (grid cells) and actions (diagonal and adjacent moves), {s_0, a_0, s_1, a_1, ...}, that satisfies the transition function (i.e., ∀ s_t, a_t, s_{t+1} ∈ ζ : T(s_t, a_t) = s_{t+1}). Goal-based planning is modeled in the MDP by assigning costs (i.e., negative rewards) for every action except for a self-transitioning action in the absorbing goal state that has a cost of 0. MDPs are solved by finding the state and action values (i.e., the future reward of that action or state), V*(s) and Q*(s, a), for the maximum future reward policy, π* : S → A. The Bellman equations define the optimal quantities:

Q*(s, a) = R(s, a) + V*(T(s, a))    (1)
V*(s) = max_a Q*(s, a).    (2)
The value iteration algorithm produces these optimal values by alternately applying Equations 1 and 2 as update rules until the values converge. The optimal policy is then π*(s) = argmax_a Q*(s, a). While useful for prescribing a set of actions to take, this policy is not usually predictive because observed trajectories are often not consistently optimal.
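For concreteness, a minimal sketch of this value iteration procedure on a small deterministic MDP is shown below. It is our own illustration rather than anything from the paper; the corridor layout, costs, and array conventions are assumptions made only for the example.

```python
import numpy as np

def value_iteration(reward, transitions, goal, n_iters=200):
    """Bellman updates (Eqs. 1-2) for a deterministic MDP with an absorbing goal state."""
    n_states, n_actions = reward.shape
    V = np.full(n_states, -np.inf)
    V[goal] = 0.0
    for _ in range(n_iters):
        Q = reward + V[transitions]        # Q(s,a) = R(s,a) + V(T(s,a))
        V = Q.max(axis=1)                  # V(s)   = max_a Q(s,a)
        V[goal] = 0.0                      # the absorbing goal keeps value 0
    policy = Q.argmax(axis=1)              # pi*(s) = argmax_a Q(s,a)
    return V, policy

# Example: 5-cell corridor, actions {left, right}, each move costs 1, goal is cell 4.
n = 5
transitions = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = np.full((n, 2), -1.0)
reward[n - 1, 1] = 0.0                     # the self-transition at the goal is free
print(value_iteration(reward, transitions, goal=n - 1))
```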
We employ a “softened” version of MDPs derived using the principle of maximum entropy that incorporates trajectory uncertainty into our model. In this setting, trajectories are probabilistically distributed according to their values rather than having a single optimal trajectory for the solution. We accomplish this by replacing the maximum of the Bellman equations with a soft-maximum function, softmax_x f(x) = log Σ_x e^{f(x)}:

Q(s, a) = R(s, a) + V(T(s, a))    (3)
V(s) = softmax_a Q(s, a)    (4)
In this case, the solution policy is probabilistic with distribution π(a|s) = e^{Q(s,a) − V(s)}. The probability of a trajectory, ζ, can be shown [18] to be distributed according to P(ζ) ∝ e^{Σ_{(s,a)∈ζ} R(s,a)}. Trajectories with a very high reward (low cost) are exponentially more preferable to low reward (high cost) trajectories, and trajectories with equal reward are equally probable. The magnitude of the rewards, |R(s, a)|, is meaningful in this softmax setting and corresponds to the certainty about trajectories. As |R(s, a)| → ∞, softmax_a Q(s, a) approaches max_a Q(s, a), and the distribution converges to only optimal trajectories. An analogous O(L|S||A|) time value-iteration procedure is employed (with appropriately chosen length L) to solve for this softmax policy distribution, terminating when the value function has reached an acceptable level of convergence. Note that this softmax value distribution over trajectories is very different from the softmax action selection distribution that has been employed for reinforcement learning: P_τ(a|s) ∝ e^{Q(s,a)/τ} [14], [12].
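The softened recursion differs from ordinary value iteration only in replacing the maximum over actions with a log-sum-exp, which directly yields the stochastic policy π(a|s) = e^{Q(s,a) − V(s)}. The sketch below is our own illustration of this soft value iteration (not the authors' code); the corridor example, array shapes, and the finite stand-in for −∞ are assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(reward, transitions, goal, n_iters=500):
    """Soft-maximum value iteration (Eqs. 3-4) for a deterministic MDP.

    reward[s, a]      : immediate reward (negative cost) of action a in state s
    transitions[s, a] : deterministic successor state index T(s, a)
    goal              : absorbing goal state whose value is pinned to 0
    """
    n_states, n_actions = reward.shape
    V = np.full(n_states, -1e9)              # effectively -inf, kept finite for stability
    V[goal] = 0.0
    for _ in range(n_iters):
        Q = reward + V[transitions]          # Q(s,a) = R(s,a) + V(T(s,a))
        V = logsumexp(Q, axis=1)             # V(s)   = softmax_a Q(s,a)
        V[goal] = 0.0
    Q = reward + V[transitions]
    policy = np.exp(Q - logsumexp(Q, axis=1, keepdims=True))   # pi(a|s) = e^{Q(s,a) - V(s)}
    return V, policy

# Tiny example: 5-cell corridor, actions {left, right}, each move costs 1, goal = cell 4.
n = 5
transitions = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = np.full((n, 2), -1.0)
reward[n - 1, 1] = 0.0                       # the self-transition at the goal is free
V, policy = soft_value_iteration(reward, transitions, goal=n - 1)
print(np.round(V, 2))        # soft values lie at or above the hard-max values
print(np.round(policy, 2))   # moving toward the goal is most probable, but not certain
```

Replacing the log-sum-exp with a hard max recovers the standard value iteration of Equations 1 and 2, which is one way to see the soft-max model as a strict generalization.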
B. Learning Unknown Cost Functions
For prescriptive MDP applications, the reward values for actions in the MDP are often engineered to produce appropriate behavior; for our prediction purposes, however, we would like to find the reward values that best predict a set of observed trajectories, {ζ̃_i}. We assume the availability of a vector of feature values, f_{s,a}, characterizing each possible action. For our application, these features are obstacle locations and functions of obstacle locations (e.g., blurring and filtering of obstacles). We assume that the reward is linear in these features, R(s, a) = θ^⊤ f_{s,a}, with unknown weight parameters, θ. We denote the first state, s_0, of trajectory ζ̃_i as s_0(ζ̃_i). The learning problem is then the maximization of the observed trajectory's probability, P(ζ|θ)¹, or equivalently:

θ* = argmax_θ Σ_i ( Σ_{(s,a)∈ζ̃_i} θ^⊤ f_{s,a} − V(s_0(ζ̃_i)) ).    (5)
The gradient of the function in Equation 5 has an intuitive interpretation as the difference between the feature counts of observed trajectories and expected feature counts according to the model: Σ_i Σ_{(s,a)∈ζ̃_i} f_{s,a} − E_{P_θ(ζ)}[f_{s,a}]. We employ gradient-based optimization on this convex function to obtain θ* [18]. We refer the reader to that work for a more detailed explanation of the optimization procedure.
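To illustrate the learning loop, the sketch below performs gradient ascent on θ. For brevity it estimates the expected feature counts by sampling trajectories from the current soft-max policy rather than computing them exactly with the algorithm of [18], so it is only a rough stand-in for the authors' procedure; the feature-tensor layout, demonstration format, and initialization are our own assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def soft_values(theta, features, transitions, goal, n_iters=300):
    """Soft-max value iteration for the linear reward R(s,a) = theta . f_{s,a}."""
    reward = features @ theta                    # shape (n_states, n_actions)
    n_states, n_actions = reward.shape
    V = np.full(n_states, -1e9)
    V[goal] = 0.0
    for _ in range(n_iters):
        Q = reward + V[transitions]
        V = logsumexp(Q, axis=1)
        V[goal] = 0.0
    Q = reward + V[transitions]
    policy = np.exp(Q - logsumexp(Q, axis=1, keepdims=True))
    return V, policy

def sampled_feature_counts(policy, features, transitions, start, goal, rng, max_len=200):
    """Feature counts of one trajectory sampled from the soft-max policy pi(a|s)."""
    counts = np.zeros(features.shape[-1])
    s = start
    for _ in range(max_len):
        if s == goal:
            break
        a = rng.choice(policy.shape[1], p=policy[s])
        counts += features[s, a]
        s = transitions[s, a]
    return counts

def learn_weights(demos, features, transitions, lr=0.05, epochs=100, seed=0):
    """Gradient ascent on the maximum entropy IOC objective (Eq. 5), sampled gradient.

    demos    : list of (trajectory, goal) pairs, trajectory = [(s0, a0), (s1, a1), ...]
    features : array of shape (n_states, n_actions, n_features), assumed nonnegative here
    """
    rng = np.random.default_rng(seed)
    theta = -np.ones(features.shape[-1])         # start with every feature mildly costly
    f_demo = sum(features[s, a] for traj, _ in demos for s, a in traj) / len(demos)
    for _ in range(epochs):
        f_model = np.zeros_like(theta)
        for traj, goal in demos:
            _, policy = soft_values(theta, features, transitions, goal)
            f_model += sampled_feature_counts(policy, features, transitions,
                                              traj[0][0], goal, rng)
        f_model /= len(demos)
        theta += lr * (f_demo - f_model)         # gradient: empirical minus expected counts
    return theta
```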
C. Destination Prior Distribution
Though our model is conditioned on a known destination
location, that destination location is not known at prediction
time. Our predictive model must reason about all possible
destinations to predict the future trajectory of a person. We
address this problem in a Bayesian way by first obtaining a
prior distribution over destinations using previously observed
trajectories and the features of the environment.
In this work, we base our prior distribution on the goals of previously observed trajectories, g. We smooth this probability to nearby cells using the Manhattan distance, dist(a, b), and also add probability P_0 for previously unvisited locations to avoid overfitting, yielding: P(dest x) ∝ P_0 + Σ_{goals g} e^{−dist(x,g)}. When little or no previous data is available for a particular environment, a feature-based model of destinations with features expressing door, chair, and appliance locations could be employed.
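A direct transcription of this prior into code might look as follows; it is a sketch under our own conventions, and the grid indexing, the value of P_0, and the example goals are all illustrative.

```python
import numpy as np

def destination_prior(grid_shape, observed_goals, p0=1e-3):
    """Prior over destination cells: P(dest x) ∝ P_0 + sum_g exp(-manhattan(x, g)).

    grid_shape     : (rows, cols) of the discretized environment
    observed_goals : list of (row, col) endpoints of previously observed trajectories
    p0             : smoothing mass for previously unvisited cells
    """
    rows, cols = grid_shape
    r, c = np.mgrid[0:rows, 0:cols]
    prior = np.full(grid_shape, p0, dtype=float)
    for gr, gc in observed_goals:
        manhattan = np.abs(r - gr) + np.abs(c - gc)
        prior += np.exp(-manhattan)
    return prior / prior.sum()              # normalize to a probability distribution

# Example: two frequently observed goals on a 6 x 8 grid.
print(np.round(destination_prior((6, 8), [(0, 7), (5, 0)]), 3))
```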
D. Efficient Future Trajectory Prediction
In the prediction setting, the robot knows the person's partial trajectory from state A to current state B, ζ_{A→B}, and must infer the future trajectory of the person, P(ζ_{B→C}), to an unknown destination state, C, given all available information. First we infer the posterior distribution of destinations given the partial trajectory, P(dest C | ζ_{A→B}), using Bayes' rule. For notational simplicity, we denote the softmax value function from state X to destination state Y as V(X→Y) and the reward of a path as R(ζ) = Σ_{(s,a)∈ζ} R(s, a). The posterior distribution is then:
P(dest C | ζ_{A→B}) = P(ζ_{A→B} | dest C) P(dest C) / P(ζ_{A→B})
                    = [e^{R(ζ_{A→B}) + V(B→C)} / e^{V(A→C)}] P(dest C) / Σ_D [e^{R(ζ_{A→B}) + V(B→D)} / e^{V(A→D)}] P(dest D).    (6)
¹We assume the final state of the trajectory is the goal destination and our probability distribution is conditioned on this goal destination.
The value functions, V(A→D) and V(B→D), for each state D are required to compute this posterior (Equation 6). The naïve approach is to execute O(|D|) runs of softmax value iteration, one for each possible goal D. Fortunately, there is a much more efficient algorithm. In the hard maximum case, this problem is solved efficiently by modifying the Bellman equations to operate backwards, so that instead of V(s) representing the future value of state s, it is the maximum value obtained by a trajectory reaching state s. Initializing V(s) = −∞ for all s ≠ A and V(A) = 0, the following equations define V(A→D) for all D:

Q(s, a) = R(s, a) + V(s)    (7)
V(s) = max_{(s′,a) : T(s′,a) = s} Q(s′, a)    (8)

For the soft-max reward case, the max is replaced with softmax in Equation 8 and the value functions, V(A→D), are obtained with a value-iteration algorithm. Thus, with two applications of value iteration to produce V(A→D) and V(B→D), and O(|D|) time to process the results, the posterior distribution over destinations is obtained.
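The following sketch reflects our reading of Equations 6-8: it computes V(A→D) and V(B→D) for every state with backward soft value iteration and combines them into the destination posterior. The array layouts, the finite stand-in for −∞, and the corridor example are assumptions for illustration only.

```python
import numpy as np
from scipy.special import logsumexp

def soft_reach_values(origin, reward, transitions, n_iters=300):
    """Backward soft value iteration (softened Eqs. 7-8).

    Returns V where V[s] = log sum over trajectories origin -> s of e^(trajectory reward).
    """
    n_states, n_actions = reward.shape
    V = np.full(n_states, -1e9)
    V[origin] = 0.0
    for _ in range(n_iters):
        new_V = np.full(n_states, -1e9)
        new_V[origin] = 0.0                      # the empty trajectory already reaches the origin
        for s_prev in range(n_states):
            for a in range(n_actions):
                s_next = transitions[s_prev, a]
                # soft accumulation over incoming edges: reach s_next via (s_prev, a)
                new_V[s_next] = np.logaddexp(new_V[s_next],
                                             reward[s_prev, a] + V[s_prev])
        V = new_V
    return V

def destination_posterior(reward, transitions, A, B, r_AB, prior):
    """Posterior over destinations given an observed partial trajectory A -> B (Eq. 6).

    r_AB is the reward of the observed segment; prior must be strictly positive
    (the smoothed prior of Section II-C guarantees this).
    """
    V_A = soft_reach_values(A, reward, transitions)     # V(A -> D) for every state D
    V_B = soft_reach_values(B, reward, transitions)     # V(B -> D) for every state D
    log_post = r_AB + V_B - V_A + np.log(prior)         # numerator of Eq. 6 in log space
    log_post -= logsumexp(log_post)                     # normalize over destinations
    return np.exp(log_post)

# Tiny example: 4-cell corridor, moves cost 1, person observed walking from cell 0 to cell 1.
n = 4
transitions = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = np.full((n, 2), -1.0)
posterior = destination_posterior(reward, transitions, A=0, B=1, r_AB=-1.0,
                                  prior=np.full(n, 1.0 / n))
print(np.round(posterior, 3))   # cell 0, behind the person, receives the least mass
```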
We now use the destination posterior to compute the conditional probability of any continuation path, ζ_{B→C}:

P(ζ_{B→C} | ζ_{A→B}) = Σ_D P(ζ_{B→C} | ζ_{A→B}, dest D) P(dest D | ζ_{A→B})
                     = P(ζ_{B→C} | dest C) P(dest C | ζ_{A→B})
                     = e^{R(ζ_{B→C}) − V(B→C)} P(dest C | ζ_{A→B}).    (9)

This can be readily computed using the previously computed posterior destination distribution, and V(B→C) is computed as part of generating that posterior distribution.
The expected occupancies of different states, D_x, are obtained by marginalizing over all paths containing state x. For this class of paths, denoted Ξ_{B→x→C}, each path can be divided² into a path from B to x and a path from x to C:
D_x = Σ_C Σ_{ζ∈Ξ_{B→x→C}} P(ζ_{B→C} | dest C) P(dest C | ζ_{A→B})
    = Σ_C P(dest C | ζ_{A→B}) Σ_{ζ_1∈Ξ_{B→x}, ζ_2∈Ξ_{x→C}} e^{R(ζ_1) + R(ζ_2) − V(B→C)}
    = (Σ_{ζ_1∈Ξ_{B→x}} e^{R(ζ_1)}) (Σ_C Σ_{ζ_2∈Ξ_{x→C}} e^{R(ζ_2) + log P(dest C | ζ_{A→B}) − V(B→C)})    (10)
The first summation of Equation 10 equates to e^{V(B→x)}, which is easily obtained from previously computed value functions. We compute the second double summation by adding a final state reward of (log P(dest C | ζ_{A→B}) − V(B→C)) and performing soft value iteration with those modified rewards. Thus, with one additional application of the soft value iteration algorithm and combining the results (constant time with respect to the number of goals), we obtain state expected visitation counts.
²The solution also holds for paths with multiple occurrences of a state.

Algorithm 1 Incorporating predictive pedestrian models via predictive planning
 1: procedure PREDICTIVEPLANNING(σ > 0, α > 0, {D_{s,t}}, D_thresh)
 2:   Initialize the cost map to prior navigational costs c_0(s).
 3:   for t = 0, . . . , T do
 4:     Plan under the current cost map.
 5:     Simulate the plan forward to find points of probable interference with the pedestrian, {s_i}_{i=1}^{K_t}, where D_{s,t} > D_thresh.
 6:     If K_t = 0 then break.
 7:     Add cost to those points:
 8:       c_{t+1}(s) = c_t(s) + α Σ_{i=1}^{K_t} e^{−‖s − s_i‖² / (2σ²)}
 9:   end for
10:   return the plan through the final cost map.
11: end procedure
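A compact sketch of Algorithm 1 is given below. It is our own illustration rather than the authors' implementation: it uses plain Dijkstra's algorithm as the inner planner (the paper suggests efficient replanners such as D*), and it assumes one plan step per prediction time window when simulating the plan forward.

```python
import heapq
import numpy as np

def dijkstra_path(cost, start, goal):
    """Shortest path on a 4-connected grid; entering cell s costs cost[s]."""
    rows, cols = cost.shape
    dist = np.full(cost.shape, np.inf)
    prev = {}
    dist[start] = 0.0
    queue = [(0.0, start)]
    while queue:
        d, s = heapq.heappop(queue)
        if s == goal:
            break
        if d > dist[s]:
            continue
        r, c = s
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt]
                if nd < dist[nxt]:
                    dist[nxt] = nd
                    prev[nxt] = s
                    heapq.heappush(queue, (nd, nxt))
    path, s = [goal], goal
    while s != start:                    # assumes the goal is reachable
        s = prev[s]
        path.append(s)
    return path[::-1]

def predictive_planning(c0, D, start, goal, d_thresh=0.05, alpha=5.0, sigma=1.5, T=20):
    """Algorithm 1: iteratively add cost where the simulated plan meets predicted pedestrians.

    c0   : prior navigational cost map, shape (rows, cols)
    D[t] : predicted pedestrian occupancy D_{s,t} for time window t, same shape as c0
    """
    rows, cols = c0.shape
    r_idx, c_idx = np.mgrid[0:rows, 0:cols]
    cost = c0.copy()
    for _ in range(T):
        plan = dijkstra_path(cost, start, goal)
        # Simulate the plan forward (here: one plan step per prediction window) and
        # flag cells where the pedestrian's predicted occupancy exceeds the threshold.
        hits = [s for t, s in enumerate(plan) if t < len(D) and D[t][s] > d_thresh]
        if not hits:
            break
        for hr, hc in hits:              # Gaussian-shaped cost bump around each hit (line 8)
            cost += alpha * np.exp(-((r_idx - hr) ** 2 + (c_idx - hc) ** 2)
                                   / (2.0 * sigma ** 2))
    return dijkstra_path(cost, start, goal), cost
```

In the full system this loop is re-run as updated pedestrian predictions D_{s,t} arrive (every 0.25 seconds in the paper), so the added cost bumps continually track the person's predicted motion.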
E. Temporal Predictions
To plan appropriately requires predictions of where people will be at different points in time. More formally, we need predictions of expected future occupancy of each location during the time windows surrounding fixed intervals τ, 2τ, ..., Tτ. We denote these quantities as D_{s,t}. In theory, time can be added to the state space of a Markov decision process and explicitly modeled. In practice, however, this expansion of the state space significantly increases the time complexity of inference, making real-time applications based on the time-based model impractical. We instead consider an alternative approach that is much more tractable.
We assume that a person's movement will “consume” some cost over a time window t according to the normal distribution N(tC_0, σ_0² + tσ_1²), where C_0, σ_0², and σ_1² are learned parameters. Certainly Σ_t D_{s,t} = D_s, so we simply divide the expected visitation counts among the time intervals according to this probability distribution. We use the cost of the optimal path to each state, Q*(s), to estimate the cost incurred in reaching it. The resulting time-dependent occupancy counts are then:

D_{s,t} ∝ D_s e^{−(C_0 t − Q*(s))² / (2(σ_0² + tσ_1²))}.    (11)

These values are computed using a single execution of Dijkstra's algorithm [3] in O(|S| log |S|) time to compute Q*(·), and then O(|S|T) time for additional calculation.
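A short sketch of this temporal splitting, under our reading of Equation 11 (in particular, assuming the variance grows linearly with t as σ_0² + tσ_1²), is given below; the example costs and parameters are illustrative.

```python
import numpy as np

def temporal_occupancy(D_s, cost_to_state, C0, sigma0_sq, sigma1_sq, n_windows):
    """Split expected visitation counts D_s over time windows t = 1..T (Eq. 11).

    D_s           : expected visitation count of each state
    cost_to_state : cost of the optimal path to each state, e.g. from Dijkstra's algorithm
    C0            : expected cost "consumed" per time window
    """
    t = np.arange(1, n_windows + 1)[:, None]                 # shape (T, 1)
    var = sigma0_sq + t * sigma1_sq                          # assumed linear variance growth
    weights = np.exp(-(C0 * t - cost_to_state[None, :]) ** 2 / (2.0 * var))
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12    # enforce sum_t D_{s,t} = D_s
    return weights * D_s[None, :]                            # D_{s,t}, shape (T, n_states)

# Example: three states whose optimal-path costs are 1, 4, and 9, split over 5 windows.
D_st = temporal_occupancy(np.array([0.9, 0.5, 0.2]), np.array([1.0, 4.0, 9.0]),
                          C0=2.0, sigma0_sq=1.0, sigma1_sq=0.5, n_windows=5)
print(np.round(D_st, 3))
```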
III. PLANNING WITH PEDESTRIAN PREDICTIONS
Ideally, to account for predictive models of pedestrian
behavior, we should increase the dimensionality of the plan-
ning problem by augmenting the state of the planner to
account for time-varying costs. Unfortunately, the computa-
tional complexity of combinatorial planning is exponential
in the dimension of the planning space, and the added
computational burden of this solution will be prohibitive for
many real-time applications.
We therefore propose a novel technique for integrating our
time-varying predictions into the robot’s planner. Algorithm
1 details this procedure; it essentially iteratively shapes
a time-independent navigational cost function to remove
known points of hindrance. At each iteration, we run the
time-independent planner under the current cost map and
simulate forward the resulting plan in order to predict points
at which the robot will likely interfere with the pedestrian. By
then adding cost to those regions of the map we can ensure
that subsequent plans will not interfere at those locations. We
can further improve the computational gain of this technique
by using efficient replanners such as D* and its variants [4] in
the inner loop. While this technique, as it reasons only about
stationary costs, cannot guarantee the optimal plan given the
time-varying costs, we demonstrate that it produces good
robot behavior in practice that efficiently accounts for the
predicted motion of the pedestrian.
By re-running this iterative replanner every 0.25 seconds
using updated predictions of pedestrian motion, we can
achieve intelligent adaptive robot behavior that anticipates
where a pedestrian is heading and maneuvers well in advance
to implement efficient avoidance. The accompanying movie
demonstrates the behavior that emerges from our predictive
planner in select situations. In practice, we use the final
cost-to-go values of the iteratively constructed cost map to
implement a policy that chooses a good action from a pre-
defined collection of actions. When a plan with sufficiently
low probability of pedestrian hindrance cannot be found, the
robot’s speed is varied. Additionally, when the robot is too
close to a pedestrian, all actions that take the robot within a
small radius of the human are removed to avoid potential
collisions. Section IV-F presents quantitative experiments
demonstrating the properties of this policy.
IV. EXPERIMENTAL EVALUATION
We now present experiments demonstrating the capabili-
ties of our prediction model and its usefulness for planning
hindrance-sensitive robot trajectories.
A. Data Collection
We collected over one month’s worth of data in a lab envi-
ronment. The environment has three major areas (Figure 2): a
kitchen area with a sink, refrigerator, microwave, and coffee
maker; a secretary desk; and a lounge area. We installed four
laser range finders in fixed locations around the lab, as shown
in Figure 1, and ran a pedestrian tracking algorithm [8].
Trajectories were segmented based on significant stopping
time in any location.
Fig. 3. Collected trajectory dataset.
From the collected data, we use a subset of 166 tra-
jectories through our experimental environment to evaluate
our approach. This dataset is shown in Figure 3 after post-
processing and being fit to a 490 by 321 cell grid (each cell
represented as a single pixel). We employ 50% of this data

as a training set for estimating the parameters of our model
and use the remainder for evaluative purposes.
B. Learning Feature-Based Cost Functions
We learn a 6-parameter cost function over simple features
of the environment, which we argue are easily transferable to
other environments. The first feature is a constant feature for
every grid cell in the environment. The remaining features
are an indicator function for whether an obstacle exists in a
particular grid cell, and four “blurs” of obstacle occupancies,
which are shown in Figure 4.
Fig. 4. Four obstacle-blur features for our cost function. Feature values
range from low weight (dark blue) to high weight (dark red).
We then learn the weights for these features that best
explain the demonstrated data. The resulting cost function
for the environment is shown in Figure 5. Obstacles in the
cost function have very high cost, and free space has a low
cost that increases near obstacles.
Fig. 5. Left: The learned cost function in the environment. Right: The
prior distribution over destinations learned from the training set.
The prior distribution over destinations is obtained from
the set of endpoints in the training set, and the temporal
Gaussian parameters are also learned using the training set.
C. Stochastic Modeling Experiment
We first consider two examples from our dataset (Figure
6) that demonstrate the need for uncertainty-based modeling.
Fig. 6. Two trajectory examples (blue) and log occupancy predictions (red).
Both trajectories travel around the table in the center of the environment. However, in the first example (left), the person takes the lower pathway around the table, and in the second example (right), the person takes the upper pathway, even though the lower pathway around the table has a lower cost in the learned cost function. In both cases, the path taken is not the shortest path through the open space that one would obtain using an optimal planner. Our uncertainty-based planning model handles these two examples appropriately. A planner, by contrast, would choose one pathway or the other around the table and, even after smoothing the resulting path into a probability distribution, would tend to get a large fraction of its predictions wrong when the person takes the “other” approximately equally desirable pathway.
D. Dynamic Feature Adaptation Experiment
In many environments, the relevant features that influence
movement change frequently: furniture is moved in indoor
environments, the locations of parked vehicles are dynamic
in urban environments, and weather conditions influence
natural environments with muddy, icy, or dry conditions. We
demonstrate qualitatively that our model of motion is robust
to these feature changes.
The left frames of Figure 7 show the environment and
the path prediction of a person moving around the table at
two different points in time. At the second point of time
(bottom left), the probability of the trajectory leading to the
kitchen area or the left hallway is extremely small. In the
right frames of Figure 7, an obstacle has been introduced
that blocks the direct pathway through the kitchen area. In
this case, the trajectory around the table (bottom right) still
has a very high probability of leading to either the kitchen
area or the left hallway. As this example shows, our approach
is robust to changes in the environment such as this one.
Fig. 7. Our experimental environment with (right column) and without
(left column) an added obstacle (gray) between the kitchen and center
table. Predictions of future visitation expectations given a person’s trajectory
(white line) in both settings for two different trajectories. Frequencies range
from red (high log expectation) to dark blue (low log expectation).
E. Comparative Evaluation
We now compare our model’s ability to predict the future
path of a person with a previous approach for modeling
goal-directed trajectories, the variable-length Markov model
(VLMM) [6]. The VLMM estimates the probability of a
person’s next cell transition conditioned on the person’s
history of cells visited in the past. It is variable length
because it employs a long history when relevant training data
is abundant, and a short history otherwise.
The results of our experimental evaluation are shown in
Figure 8. We first note that for the training set (denoted
train), the trajectory log probability of the VLMM is significantly better than that of the plan-based model. However, for

References
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press.
E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik.
R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning.
R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Trans. ASME, Journal of Basic Engineering.
P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proc. International Conference on Machine Learning (ICML).