Integrated task and motion planning in belief space
Leslie Pack Kaelbling and Tomás Lozano-Pérez
CSAIL, MIT, Cambridge, MA 02139
{lpk, tlp}@csail.mit.edu
Abstract
We describe an integrated strategy for planning, perception, state-estimation and action in
complex mobile manipulation domains based on planning in the belief space of probability
distributions over states using hierarchical goal regression (pre-image back-chaining). We develop a
vocabulary of logical expressions that describe sets of belief states, which are goals and subgoals
in the planning process. We show that a relatively small set of symbolic operators can give rise
to task-oriented perception in support of the manipulation goals. An implementation of this
method is demonstrated in simulation and on a real PR2 robot, showing robust, flexible solution
of mobile manipulation problems with multiple objects and substantial uncertainty.
1 Introduction
As robots become more capable of sophisticated sensing, navigation, and manipulation, we would
like them to carry out increasingly complex tasks autonomously. A robot that helps in a
household must select actions over the scale of hours or days, considering abstract features such as the
desires of the occupants of the house, as well as detailed geometric models that support locating
and manipulating objects. The complexity of such tasks derives from very long time horizons,
large numbers of objects to be considered and manipulated, and fundamental uncertainty about
properties and locations of those objects. Specifying control policies directly, in the form of tables
or state machines, becomes intractable as the size and variability of the domain increase. However,
good control decisions can be made with a compact specification by using a model of the world
dynamics to do on-line planning and execution.
The uncertainty in problems of this kind is pervasive and fundamental: it is, in general,
impossible to sense sufficiently to remove all of the important uncertainty. The robot will not know the
contents of all of the containers in a house, or where someone left their car keys, or the owner’s
preference for dinner. In order to behave effectively in the face of such uncertainty, the robot must
explicitly take actions to gain information: look in a cupboard, remove an occluding object, or ask
someone a question.
We have developed an approach to integrated task and motion planning that combines
geometric and symbolic representations in an aggressively hierarchical planning architecture, called
hpn (for hierarchical planning in the now), which we summarize in section 1.2 and discuss in
detail in [Kaelbling and Lozano-Pérez, 2011, 2012a]. The hierarchical decomposition allows efficient
solution of problems with very long horizons; the symbolic representations support abstraction in
complex domains with large numbers of objects and are integrated effectively with the detailed
geometric models that support motion planning. In this paper, we extend the hpn approach to

Figure 1: PR2 robot manipulating objects
handle two types of uncertainty: future-state uncertainty about what the outcome of an action
will be, and current-state uncertainty about what the current state actually is. Future-state
uncertainty is handled by planning in approximate deterministic models, performing careful execution
monitoring, and replanning when necessary. Current-state uncertainty is handled by planning in
belief space: the space of probability distributions over possible underlying world states. Explicitly
modeling the robot’s uncertainty during planning enables the selection of actions based both on
their ability to gather information and their ability to change the state of the world.
This paper describes a tightly integrated approach, weaving together perception, estimation,
geometric reasoning, symbolic task planning, and control to generate behavior in a real robot that
robustly achieves tasks in complex, uncertain domains. We have formulated this method in the
context of a mobile manipulator doing household tasks, such as the one shown in figure 1, but
it can be applied much more broadly. Problems of surveillance or locating objects of interest are
naturally formulated in this framework, as are more complex operational tasks, such as disaster
relief or managing logistical operations in a dynamic uncertain domain.
1.1 Handling uncertainty
The decision-theoretic optimal approach to planning in domains with probabilistic dynamics is to
make a conditional plan, in the form of a tree or policy, supplying an action to take in response
to any possible outcome of a preceding action. Figure 2(a) illustrates one strategy, which consists
of building a tree starting at the current world state s, branching on the robot’s choice of actions
a and then, for each action, branching on the probability of each resulting state s'. To select
actions, one grows the tree out to some depth k, then evaluates it from the bottom up using the
“expectimax” rule. A static evaluation function assigns a value ρ_0 to each leaf node. Then, the
value of a probabilistic node is computed as the expected value of its children, and the value of an
action-choice node is computed as the maximum of the values of its children. The policy consists
of selecting the maximizing action at any action-choice node that is reached.

[Figure 2(a), probabilistic policy tree: action-choice nodes over actions a1, a2, a3 alternate with outcome nodes over observations o1, o2, o3, expanding states s to s' to s''; leaf nodes carry static values ρ_0, and backed-up values ρ_1, ρ_2 are computed by expectation at outcome nodes and maximum at action-choice nodes.]
[Figure 2(b), pomdp estimation and control architecture: the environment emits observations; a state estimator combines actions and observations into a belief state; a controller maps the belief state to the next action, which is applied to the environment.]
Figure 2: Models of decision-making in uncertain domains.
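To make the expectimax rule concrete, here is a minimal Python sketch of the textbook computation (our illustration, not code from the system described here); the actions, outcomes, and rho_0 callables are hypothetical stand-ins for a domain model and static evaluation function.

```python
def expectimax(s, k, actions, outcomes, rho_0):
    """Value and best action for state s with lookahead depth k.
    actions(s) yields available actions; outcomes(s, a) yields (prob, s') pairs;
    rho_0(s) is the static evaluation assigned to leaf nodes."""
    if k == 0:
        return rho_0(s), None
    best_value, best_action = float('-inf'), None
    for a in actions(s):
        # Probabilistic node: expected value over the resulting states s'.
        value = sum(p * expectimax(s2, k - 1, actions, outcomes, rho_0)[0]
                    for p, s2 in outcomes(s, a))
        # Action-choice node: take the maximum over the children.
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```

The policy is then to execute the maximizing action at each action-choice node that is actually reached, re-running the computation from the resulting state.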
The process of constructing and evaluating a tree of this kind can be prohibitively expensive;
but there have been recent advances in effective sample-based approaches [Gelly and Silver, 2008].
For efficiency and robustness, our approach to action selection is to construct a deterministic
approximation of the dynamics, use the approximate dynamics to build a sequential non-branching
plan, execute the plan while perceptually monitoring the world for deviations from the expected
outcomes of the actions, and replan when deviations occur. This method has worked well in control
applications [Mayne and Michalska, 1990] as well as symbolic planning domains [Yoon et al., 2007].
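The shape of this determinize-plan-monitor-replan loop is sketched below; the determinize, plan, execute, and observe hooks are hypothetical placeholders for domain-specific components, not the interfaces of our implementation.

```python
def plan_execute_monitor(s, goal, determinize, plan, execute, observe):
    """Plan with a deterministic approximation; monitor execution; replan on deviation."""
    model = determinize()                  # deterministic approximation of the dynamics
    while not goal(s):
        steps = plan(s, goal, model)       # sequential, non-branching plan
        if steps is None:
            raise RuntimeError('no plan found from the current state')
        for a in steps:
            expected = model(s, a)         # outcome predicted by the approximate model
            execute(a)
            s = observe()                  # perceptually monitor the actual outcome
            if s != expected:              # deviation from the expected outcome:
                break                      #   discard the rest of the plan and replan
```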
Replanning approaches that work in the state space address uncertainty in the future outcomes
of actions, but not uncertainty about the current world state. Current-state uncertainty is
unavoidable in most real-world applications. The optimal solution to such problems is found in the
formulation of partially observable Markov decision processes (pomdps) [Smallwood and Sondik,
1973] and involves planning in the belief space rather than the underlying state space of the domain.
The belief space is the space of probability distributions over underlying world states. A controller
in such a problem is divided into two components, as shown in figure 2(b). The state estimator
is a process (such as a Bayesian filter or a Kalman filter), which maintains a current distribution
over underlying world states, conditioned on the history of actions and observations the system has
made. The controller is a policy that maps belief states to actions: it can be computed off-line and
stored in a representation that allows efficient execution, or it can itself be a program that does
significant computation on-line to select an appropriate action for the current belief state.
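For example, a generic discrete Bayes filter of the kind that could play the state-estimator role can be written in a few lines; this is the standard textbook update, not the estimator used on the PR2.

```python
def bayes_filter_update(belief, a, o, trans, obs_lik):
    """One step of a discrete Bayes filter, conditioning on action a and observation o.
    belief: dict state -> probability; trans(s, a): dict s' -> probability;
    obs_lik(o, s2): likelihood of observing o in state s2."""
    predicted = {}
    for s, p in belief.items():            # prediction: push belief through the dynamics
        for s2, pt in trans(s, a).items():
            predicted[s2] = predicted.get(s2, 0.0) + p * pt
    posterior = {s2: obs_lik(o, s2) * p    # correction: weight by observation likelihood
                 for s2, p in predicted.items()}
    z = sum(posterior.values())            # normalize by the total observation probability
    return {s2: p / z for s2, p in posterior.items()}
```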
In the traditional pomdp literature, the goal is to find an optimal or near-optimal policy using
off-line computation; the policy can then be executed on-line with little further computation, simply
determining which of a finite number of categories the belief state belongs to, and executing the
associated action. Recent point-based solvers [Kurniawati et al., 2008] can find near-optimal policies

for domains with thousands of states. However, the problem of computing a near-optimal policy
for uncertain manipulation domains with many movable objects is still prohibitively expensive.
Our strategy will be to construct a policy “in the now”: that is, to apply the hpn approach to
interleaved planning and execution, but in the space of beliefs, using a determinized version of the
belief-space dynamics. Belief space is continuous (the space of probability distributions) and so is
not amenable to standard discrete planning approaches. We address it with hpn in the same way
that we address planning for geometric problems: by dynamically constructing discretizations that
are appropriate for the problem at hand.
We will use symbolic descriptions to characterize aspects of the robot’s belief state that specify
goals and subgoals during planning. For example, the condition “With probability greater than
0.95, the cup is in the cupboard,” can be written as BIn(cup, cupboard, 0.05), and might serve
as a goal for planning. Our description of the effects of the robot’s actions, encoded as operator
descriptions, will not be in terms of their effect on the state of the external world, which is not
observable, but in terms of their effect on the logical assertions that characterize the robot’s belief.
In general, it will be very difficult to characterize the exact pre-conditions of an operation in belief
space; we will strive to provide an approximation that supports the construction of reasonable plans
and rely on execution monitoring and replanning to handle errors due to approximation. We will
describe and illustrate this approach in detail in the rest of the paper.
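As a toy illustration of how such a logical assertion can be grounded in a belief state (our own simplification; the fluents in the actual implementation are richer), BIn(cup, cupboard, 0.05) might be tested as follows.

```python
def BIn(belief, obj, region, eps):
    """BIn(obj, region, eps) holds when the belief assigns probability at least
    1 - eps to obj being in region. belief[obj] maps regions to probabilities."""
    return belief[obj].get(region, 0.0) >= 1.0 - eps

# "With probability greater than 0.95, the cup is in the cupboard."
belief = {'cup': {'cupboard': 0.97, 'counter': 0.02, 'elsewhere': 0.01}}
assert BIn(belief, 'cup', 'cupboard', 0.05)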
1.2 HPN in observable domains
hpn [Kaelbling and Lozano-Pérez, 2011, 2012a] is a hierarchical approach to solving long-horizon
problems, which performs a temporal decomposition by planning operations at multiple levels of
abstraction; this ensures that problems to be addressed by the planner always have a reasonably
short horizon, making planning feasible.
In order to plan with abstract operators, we must be able to characterize their preconditions
and effects at various levels of abstraction. Even at abstract levels, geometric properties of the
domain may be critical for planning; but if we plan using abstracted models of the operations, we
will not be able to determine a detailed geometric configuration that results from performing an
operation. To support our hierarchical planning strategy, we instead plan backwards from the goal
using the process known as goal regression [Ghallab et al., 2004] in task planning and pre-image
backchaining in motion planning [Lozano-Pérez et al., 1984]. Starting from the set of states that
satisfies the goal condition, we work backward, constructing descriptions of pre-images of the goal
under various abstract operations. The pre-image is the set of states such that, if the operation
were executed, a goal state would result. The key observation is that, whereas the description of
the detailed world state is an enormously complicated structure, the descriptions of the goal set,
and of pre-images of the goal set, are often simple conjunctions of a few logical requirements.
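A schematic version of this backward search, under STRIPS-like simplifying assumptions (set-valued preconditions and add effects, delete effects ignored) rather than the full hpn machinery, is sketched below.

```python
from collections import deque

def preimage(goal, op):
    """Pre-image of a goal (set of fluents) under op: what must hold before
    executing op so that a goal state results. Delete effects are ignored."""
    if not (op['effects'] & goal):      # op must achieve some part of the goal
        return None
    return (goal - op['effects']) | op['preconds']

def regression_plan(goal, start, operators):
    """Back-chain from the goal, constructing pre-images until one is
    satisfied by the start state; returns a list of operator names."""
    frontier = deque([(frozenset(goal), [])])
    seen = {frozenset(goal)}
    while frontier:
        subgoal, plan = frontier.popleft()
        if subgoal <= start:            # start state satisfies this subgoal
            return plan
        for op in operators:
            pre = preimage(subgoal, op)
            if pre is not None and pre not in seen:
                seen.add(pre)
                frontier.append((pre, [op['name']] + plan))
    return None
```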
In a continuous space, pre-images may be characterized geometrically: if the goal is a circle of
locations in x, y space, then the operation of moving one meter in x will have a pre-image that is
also a circle of locations, but with the x coordinate displaced by a meter. In a logical, or combined
logical and geometric space, the definition of pre-image is the same, but computing pre-images will
require a combination of logical and geometric reasoning. We support abstract geometric reasoning
by constructing and referring to salient geometric objects in the logical representation used by the
planner. So, for example, we can say that a region of space must be clear of obstacles before an
object can be placed in its goal location, without specifying a particular geometric arrangement of
the obstacles in the domain.
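Written out, the circle example above is (our rendering in LaTeX):

```latex
% Goal: a circle of locations in x, y space, radius r, centered at (c_x, c_y).
G = \{\, (x, y) : (x - c_x)^2 + (y - c_y)^2 \le r^2 \,\}
% Action: move one meter in x.
a : (x, y) \mapsto (x + 1, y)
% Pre-image: the set of states from which executing a yields a goal state,
% i.e., the same circle with the x coordinate displaced by a meter.
\mathrm{preimage}(G, a)
  = \{\, (x, y) : (x + 1 - c_x)^2 + (y - c_y)^2 \le r^2 \,\}
  = \{\, (x, y) : \big(x - (c_x - 1)\big)^2 + (y - c_y)^2 \le r^2 \,\}
```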

The complexity of planning depends both on the horizon and the branching factor. We use
hierarchy to reduce the horizon, but the branching factor, in general, remains infinite: there are
innumerable places to put an object, innumerable paths for the robot to follow from one
configuration to another, and so on. We use generators: functions that make use both of constraints
from the goal and heuristic information from the starting state to make choices from these infinite
domains that are likely to be successful. Our approach can be extended to support sample-based
backtracking over these choices if they fail. Because the value-generation process can depend on
the goal and the initial state, the generated values are much more likely to be successful than ones
chosen through an arbitrary sampling or discretization process.
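A generator, in this sense, is simply a function from goal constraints and the current state to a stream of candidate values. The sketch below uses a hypothetical placement example of our own devising: candidates are sampled inside the goal region and ordered by a heuristic computed from the starting state.

```python
import random

def placement_generator(goal_region, robot_xy, n=20, seed=0):
    """Yield candidate (x, y) placements inside goal_region
    (xmin, xmax, ymin, ymax), trying placements near the robot first."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = goal_region
    candidates = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                  for _ in range(n)]
    # Heuristic information from the starting state: prefer nearby placements.
    candidates.sort(key=lambda p: (p[0] - robot_xy[0]) ** 2 +
                                  (p[1] - robot_xy[1]) ** 2)
    yield from candidates

# The planner draws values until one satisfies the remaining constraints,
# backtracking over further samples if a choice fails downstream.
```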
To handle execution errors robustly, we interleave planning with execution, so that all planning
problems have short horizons and start from a current, known initial state. If an action has an
unexpected effect, a new plan can be constructed at not too great a cost, and execution resumed.
We refer to this overall approach as hpn for “hierarchical planning in the now.”
1.3 Related work
Advances in perception, navigation and motion planning have enabled sophisticated manipulation
systems that interleave perception, planning and acting in realistic domains (e.g., those of Srinivasa
et al. [2009], Rusu et al. [2009], Pangercic et al. [2010]). Although these systems use various forms
of planning, there is no systematic planning framework that can cope with manipulation tasks
that involve abstract goals (such as “cook dinner”), that can plan long sequences of actions in the
presence of substantial uncertainty, both in the current state and in the effect of actions, and that
can plan for acquiring information.
The work of Dogar and Srinivasa [2012] comes closest among existing systems to satisfying our
goals. They tackle manipulation of multiple objects using multiple types of primitives, including
grasping and pushing. The inclusion of pushing enables very efficient solutions to some problems
of manipulation in cluttered environments. They also include explicit modeling of uncertainty in
object poses and the effect of the actions on this uncertainty. Their implementation does not
currently encompass actions intended to gather information, but it appears that their approach
could readily be extended to such actions. A fundamental difference from our approach is that
they plan at the level of object poses and do not integrate task-level reasoning, nor do they reason
hierarchically.
There have been several recent approaches to integrated task and motion planning [Cambon
et al., 2009, Plaku and Hager, 2010, Marthi et al., 2010, Dornhege et al., 2009] with the potential
of scaling to mobile manipulation problems; we review this related work in detail in [Kaelbling and
Lozano-Pérez, 2012a]. However, none of these approaches addresses uncertainty directly.
A number of recent papers [Alterovitz et al., 2007, Levihn et al., 2012] tackle motion planning
problems with substantial future-state uncertainty. Most relevant is the work of Levihn et al., which
addresses the problem of navigating among movable obstacles in the presence of uncertainty in the
effect of robot actions on the obstacles. However, this work assumes that there is no uncertainty
in the current state.
There have been attempts to formulate and solve problems of planning robot motions
under uncertainty in non-probabilistic frameworks dating back to the “pre-image backchaining”
paper [Lozano-Pérez et al., 1984]. This work seeks to construct strategies, based on information
sets, that are guaranteed to reach a goal and to know that they have reached it, with guaranteed
termination conditions [Brafman et al., 1997, Erdmann, 1994, Donald, 1988]. Excellent summaries