
A Planning Framework for Non-Prehensile Manipulation under Clutter and Uncertainty

20 Jun 2012-Autonomous Robots (Springer US)-Vol. 33, Iss: 3, pp 217-236
TL;DR: This work introduces a planning framework that reduces the problem to one of combinatorial search, and demonstrates planning times on the order of seconds, and works under high uncertainty by utilizing the funneling effect of pushing.
Abstract: Robotic manipulation systems suffer from two main problems in unstructured human environments: uncertainty and clutter. We introduce a planning framework addressing these two issues. The framework plans rearrangement of clutter using non-prehensile actions, such as pushing. Pushing actions are also used to manipulate object pose uncertainty. The framework uses an action library that is derived analytically from the mechanics of pushing and is provably conservative. The framework reduces the problem to one of combinatorial search, and demonstrates planning times on the order of seconds. With the extra functionality, our planner succeeds where traditional grasp planners fail, and works under high uncertainty by utilizing the funneling effect of pushing. We demonstrate our results with experiments in simulation and on HERB, a robotic platform developed at the Personal Robotics Lab at Carnegie Mellon University.

Summary (5 min read)

1 Introduction

  • Humans routinely perform remarkable manipulation tasks that robots find impossible.
  • The blocksworld problem (Winograd 1971) introduced this idea to the AI community.
  • Performing actions other than pick-and-place requires reasoning about the non-rigid interaction between the robot effector and the object.
  • But their framework is not restricted to pick-and-place operations and can accommodate non-prehensile actions.
  • Through the use of different non-prehensile actions, their planner generates plans where an ordinary pick-and-place planner cannot; e.g. when there are large, heavy ungraspable objects in the environment.

2 Framework

  • In this section the authors present their framework to rearrange the clutter around a goal object.
  • The framework uses non-prehensile actions that respect quasi-static mechanics.
  • In a given scene with multiple movable objects and a goal object to be grasped, the planner decides which objects to move and the order to move them, decides where to move them, chooses the lower-level actions to use on these objects, and accounts for the uncertainty in the environment all through this process.
  • The robot first pushes the dumbbell away to clear a portion of the space, which it then uses to push the box into.
  • Fig. 4 also shows that the actions to move objects are planned backwards in time.

2.1.1 Which objects to move?

  • In the environment there is a set of movable objects, obj.
  • The authors define the operator FindPenetrated to identify the objects whose spaces are penetrated: FindPenetrated(vol, obj) = {o ∈ obj | volume vol penetrates the space of o}. In the case of identifying objects to put into move, vol is the volume of space swept by the robot during its motion and by the object as it is manipulated.
  • In subsequent planning steps (e.g. Step 2 in Fig. 5) the planner searches for actions that move the objects in move.
  • But more importantly restricting the planner to monotone plans makes the search space smaller: the general problem of planning with multiple movable objects is NP-hard (Wilfong 1988).
  • The planner is not allowed to penetrate the spaces of the objects in avoid.

2.1.2 How to address uncertainty?

  • Robots can detect and estimate the poses of objects with a perception system (in their experiments the authors use Martinez et al (2010)).
  • The authors explicitly represent and track the object pose uncertainty during planning.
  • The manipulation actions change the uncertainty of an object o.
  • Each manipulation action outputs νo, i.e. how it evolves the uncertainty region of the object.
  • The authors present the number of samples they use for different uncertainty levels in §3.
  • The authors overload Volume to accept trajectories of regions and robots too; e.g. Volume(o, νo) gives the volume of space swept by the uncertainty of the object during its manipulation, and Volume(robot, τ) computes the three-dimensional volume the robot occupies during a trajectory τ .

2.1.3 How to move an object?

  • At each planning step, their planner searches over a set of possible actions in its action library.
  • The authors will describe the details of specific actions they use (e.g. push-grasp and sweep) and the search over the action-specific parametrizations in §2.1.6 and §2.2.
  • The resulting object motion can be directly derived from the robot trajectory.
  • – Volume(robot, τ) and Volume(o, νo) must be collision-free w.r.t. avoidVol, where robot is the robot body.
  • The authors also use a special action called GoTo, that does not necessarily manipulate an object, but moves the robot arm from one configuration to another.

2.1.4 Where to move an object?

  • This is easy for the original goal object, the red can in the example above.
  • It is the goal configuration passed into the planner, e.g. the final configuration in Fig.
  • But for subsequent objects, the planner does not have a direct goal.
  • The NGR at a planning step is the sum of the volume of space used by all the previously planned actions.
  • This includes both the space the robot arm sweeps and the space the manipulated objects’ uncertainty regions sweep; a sketch of this accumulation follows the list.
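A minimal sketch of this NGR bookkeeping, assuming a hypothetical EmptyVolume type with a union operation and planned-step records that store the two swept volumes (none of these names come from the paper):

```python
def negative_goal_region(planned_steps):
    """Accumulate the NGR: all space consumed by previously planned actions.

    Each step is assumed to carry `robot_swept_volume` (Volume(robot, tau))
    and `object_swept_volume` (Volume(o, nu_o)); both attributes and the
    EmptyVolume type are hypothetical placeholders.
    """
    ngr = EmptyVolume()
    for step in planned_steps:
        ngr = ngr.union(step.robot_swept_volume)
        ngr = ngr.union(step.object_swept_volume)
    return ngr
```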

2.1.5 Algorithm

  • The robot interacts with movable objects in the set obj.
  • Each recursive call to the Rearrange function is a planning step (Alg. 1).
  • The function searches over the actions in its action library between lines 1-21, to find an action that moves the goal object to the goal configuration (line 4), and then to move the arm to the initial configuration of the next action (line 7).
  • Then it uses this volume of space to find the objects whose spaces have been penetrated and adds these objects to the list move (line 12).
  • If any of these calls returns a plan, the current trajectory is added at the end and returned again (line 20); a simplified sketch of this recursion follows the list.
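Read as pseudocode, one planning step of Alg. 1 has roughly the shape below. This is a simplified sketch reconstructed from the bullets above, not a line-for-line transcription; every helper (avoid_volume, plan_goto, end_config, swept_volume, find_penetrated, all_movable, rearrange_all) is hypothetical, and the real algorithm handles more bookkeeping (the NGR, goal sampling, timeouts):

```python
def rearrange(o, goal, next_start, avoid, ngr, actions):
    """One planning step: move object o into goal, then recurse on the
    objects whose space the planned motions penetrate.  Returns a list of
    robot trajectories in execution order, or None on failure."""
    for action in actions:
        result = action.plan(o, goal, avoid_volume(avoid, ngr))
        if result is None:
            continue
        # Second motion of the step: bring the arm from the end of this
        # action to the initial configuration of the next action.
        transit = plan_goto(end_config(result.tau), next_start)
        if transit is None:
            continue
        used = swept_volume(result, transit)
        move = find_penetrated(used, all_movable() - avoid - {o})
        if not move:
            return [result.tau, transit]
        # Objects planned for later must be moved earlier during execution,
        # so their trajectories are prepended to the returned plan.
        prefix = rearrange_all(move, ngr.union(used), avoid | {o}, actions)
        if prefix is not None:
            return prefix + [result.tau, transit]
    return None
```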

2.1.6 Action Library

  • The generic interface for actions is given in Eq. 1. In this section the authors briefly describe the actions in their action library and explain how they satisfy this generic interface.
  • Sweep uses the outside region of the hand to push an object.
  • Similar to Push-grasp, the authors check that the capture region of the Sweep includes all the poses sampled from the uncertainty region of the object (Fig. 7b).
  • The GoTo action moves the robot arm from one configuration to another.
  • To implement this action the authors use an extension of the Rapidly-Exploring Random Tree (RRT) (Lavalle and Kuffner 2000) planner, namely the Constrained Bi-directional RRT planner (Berenson et al 2009a).

2.2.1 The push-grasp

  • The push-grasp is a straight motion of the hand parallel to the pushing surface along a certain direction, followed by closing the fingers (Fig. 8).
  • The push distance d of the hand is measured as the translation along the pushing direction.
  • The authors make certain assumptions while modeling and executing push-grasps: – The interaction between the robot hand and the object is quasi-static, meaning that the inertial forces are negligible.
  • To prevent objects from toppling, the robot pushes them as low as possible.
  • This includes not only circularly symmetric distributions, but also rectangles and equilateral triangles: any distribution that repeats itself in a revolution about the center of pressure (Howe and Cutkosky 1996).

2.2.2 The Capture Region of a Push-Grasp

  • Given the push-grasp, the object’s geometry and physical properties, which the authors term O, and the object’s initial pose, they can utilize the mechanics of manipulation described before to predict the object’s motion.
  • The authors call this set the capture region C(G,O) ⊂ SE(2) of the push-grasp.
  • The equatorial radii are found by calculating the maximum friction force (fmax) that the supporting surface can apply to the object, which occurs when the object is translating.
  • The capture region is the area bounded by the black curve.
  • If an object is placed at a point on this curve then during the push-grasp the left finger will make contact with the object and the object will eventually roll inside the hand.

2.2.3 Efficient Representation of Capture Regions

  • By computing C(G,O) relative to the coordinate frame of the hand, the authors can reduce the dependence to the aperture a and the pushing distance d.
  • To understand why this is the case, one can divide a long push into two parts, and think of the last part as an individual push with the remaining distance.
  • The required push distance is then Dmax − dsub, where dsub is the distance between the object and the top part of the capture region curve along the pushing direction v (see P3 and P4 in Fig. 10); a sketch of this computation follows the list.
  • Referring again to the regions in Fig. 9(b), changing a only affects the width of the regions II and V, but not I and III.
  • Note that this is only true assuming the fingertips are cylindrical in shape, hence the contact surface shapes do not change with different apertures.
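Under the assumption stated above (cylindrical fingertips, so only the aperture a and push distance d matter), the push distance needed to capture a given object pose can be read off the precomputed maximal-push curve. A sketch, with the curve lookup as a hypothetical geometric helper:

```python
def required_push_distance(object_pose, capture_curve_dmax, push_dir, d_max):
    """Minimum push distance whose capture region still contains object_pose.

    capture_curve_dmax -- precomputed capture-region boundary for a push of
                          length Dmax (in the hand frame)
    push_dir           -- unit pushing direction v
    """
    # dsub: distance from the pose to the top part of the capture-region
    # curve, measured along v (hypothetical helper).
    d_sub = distance_along_direction(object_pose, capture_curve_dmax, push_dir)
    return d_max - d_sub
```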

2.2.4 Validating Capture Regions

  • The authors ran 150 real robot experiments to determine if the precomputed models were good representations of the motion of a pushed object, and whether they were really conservative about which objects will roll into the hand during a push.
  • The setup and two example cases where the push grasp failed and succeeded are shown in Fig. 11c.
  • The results (Fig. 11a) show that the simulated capture region is a conservative model of the real capture region.
  • There are object poses outside the region for which the real object rolled into the hand (green circles outside the black curve); but there are no object poses inside the curve for which the real object did not roll into the hand.
  • This guarantees success, in the sense that their planner always overestimates the pushing distance needed.

2.2.5 Overlapping Uncertainty and Capture Regions

  • The overlap between a capture region and an uncertainty region indicates whether a push-grasp will succeed under uncertainty.
  • Here the robot detects a juice bottle (Fig. 12a).
  • If the uncertainty region is completely included in the capture region as in Fig. 12c, then the authors can guarantee that the push-grasp will succeed; a sample-based version of this check is sketched after this list.
  • The uncertainty and capture regions are two-dimensional in Fig. 12 only because the bottle is radially symmetric.
  • In general, these regions are three-dimensional, nonconvex and potentially even disjoint (e.g. multi-modal uncertainty regions).
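Because the uncertainty region is represented by pose samples, the guarantee above reduces to a containment test of every sample in the capture region. A minimal sketch; the membership test on the region is a hypothetical helper:

```python
def push_grasp_covers_uncertainty(capture_region, uncertainty_samples):
    """Conservative acceptance test: the push-grasp is kept only if every
    pose sampled from the object's uncertainty region lies inside the
    (possibly nonconvex, even disjoint) capture region."""
    return all(capture_region.contains(pose) for pose in uncertainty_samples)
```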

2.2.6 Finding a successful push-grasp

  • The planner searches for a push-grasp such that the hand can grasp all the samples drawn from the uncertainty region of the object, and the resulting hand motion can be executed with the arm.
  • But the authors try to minimize the number of such objects to get more efficient plans.
  • The authors rotate the robot hand around the goal object and check the number of objects it collides with, preferring directions with fewer collisions, as sketched below.
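The search over candidate push-grasps can then be sketched as follows: rotate the hand around the goal object, prefer directions that penetrate the spaces of the fewest other objects, and accept the first candidate whose capture region covers all uncertainty samples and whose hand motion the arm can execute. All helpers are hypothetical placeholders for the planner's own routines:

```python
def find_push_grasp(obj, samples, directions, arm_can_execute):
    """Search for a push-grasp direction, minimizing disturbed objects."""
    ranked = sorted(directions, key=lambda v: count_penetrated(obj, v))
    for v in ranked:
        grasp = build_push_grasp(obj, v, samples)   # chooses aperture, distance
        if grasp is None:           # some uncertainty sample is not captured
            continue
        if arm_can_execute(grasp):  # straight hand motion feasible for the arm
            return grasp
    return None
```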

2.2.7 Evolution of uncertainty region during pushing

  • The authors use the capture region also to represent the evolution of the uncertainty region of a manipulated object, νo.
  • Hence, the authors discretize a push into smaller steps and use this series of capture regions to conservatively approximate the evolution of the uncertainty region νo of the object o, as sketched below.
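A sketch of this conservative approximation, discretizing the push and recording one capture region per step. The capture-region lookup and the push's aperture/distance attributes are hypothetical, standing in for the precomputed models of §2.2.3:

```python
def evolve_uncertainty(push, initial_region, n_steps):
    """Approximate nu_o for a push by a series of shrinking capture regions.

    evolution[0] corresponds to nu_o[0] = U(o); evolution[-1] bounds the
    final uncertainty region nu_o[1] after the full push."""
    evolution = [initial_region]
    for k in range(1, n_steps + 1):
        remaining_distance = push.distance * (1.0 - k / n_steps)
        # The capture region of the remaining portion of the push
        # conservatively contains every pose consistent with the push so far.
        evolution.append(capture_region(push.aperture, remaining_distance))
    return evolution
```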
3 Experiments and Results

3.1 Push-grasping Experiments

  • The authors will present their experiments with the complete framework in §3.2.
  • Simulation experiments are performed in OpenRAVE (Diankov and Kuffner 2008).

3.1.1 Robotic Platform

  • HERB has two 7-DoF WAM arms and 4-DoF Barrett hands with three fingers.
  • A camera is attached to the palm to detect objects and estimate their poses.

3.1.2 Planner performance

  • The authors compared the performance of their push-grasp planner with another grasp planner that can handle uncertainty about the object pose.
  • In their implementation, to supply the TSRs with a set of hypotheses the authors used samples from the uncertainty region of their objects.
  • The authors categorize scenes as no clutter (1 object), medium clutter (2-3 objects placed apart from each other), and high clutter (3-4 objects placed close to each other).
  • The same value for the Push-Grasp planner is in the top right.
  • The push-grasp planner takes a longer but reasonable amount of time (tens of seconds) in difficult environments where the Uncertainty TSR planner fails.

3.1.3 Real Robot Experiments

  • In the first, the authors used the actual uncertainty profile of their object pose estimation system.
  • In the second set of experiments, the authors introduced higher noise to the detected object poses.
  • The authors assumed each dimension to be mutually independent and used the standard deviation values σ: (0.007m, 0.07rad) to build the covariance matrix Q. The Uncertainty TSR planner was able to find a plan three out of five times, and the push-grasp planner was able to find a plan four out of five times.
  • Videos of their robot executing push-grasps are online at www.cs.cmu.edu/~mdogar/pushgrasp.
  • The authors also used large boxes which the robot cannot grasp.

3.2.1 Pushing vs. Pick-and-Place

  • Here, the authors compare their planner in terms of the efficiency (planning and execution time) and effectiveness (whether the planner is able to find a plan or not) with a planner that can only perform pick-and-place operations.
  • The robot’s goal is to retrieve the coke can from among the clutter.
  • The authors present the plans that the two different planners generate.
  • The pick-and-place planner, though, cannot grasp and pick up the large box, and hence needs to pick up two other objects and avoid the large box.
  • This results in a longer plan, and a longer execution time for the pick-and-place planner.

3.2.2 Addressing uncertainty

  • One of the advantages of using pushing is that pushing actions can account for much higher uncertainty than direct grasping approaches.
  • To demonstrate this the authors created scenes where they applied high uncertainty to the detected object poses.
  • The pick-and-place planner fails to find a plan in this scene too, as it cannot find a way to guarantee the grasp of the objects with such high uncertainty.
  • The pushing planner generates plans even with the high uncertainty.

5 Conclusion

  • The framework consists of a high-level rearrangement planner, and a low-level library of nonprehensile and prehensile actions.
  • The authors plan to extend this framework such that it can use sensor feedback, it can actively look for parts of the space that are occluded, and it can move multiple objects simultaneously in rearranging clutter.


Autonomous Robots manuscript No.
(will be inserted by the editor)
A Planning Framework for Non-prehensile Manipulation under
Clutter and Uncertainty
Mehmet R. Dogar · Siddhartha S. Srinivasa
Received: date / Accepted: date
Abstract Robotic manipulation systems suffer from
two main problems in unstructured human environ-
ments: uncertainty and clutter. We introduce a plan-
ning framework addressing these two issues. The frame-
work plans rearrangement of clutter using non-prehensile
actions, such as pushing. Pushing actions are also used
to manipulate object pose uncertainty. The framework
uses an action library that is derived analytically from
the mechanics of pushing and is provably conservative.
The framework reduces the problem to one of combi-
natorial search, and demonstrates planning times on
the order of seconds. With the extra functionality, our
planner succeeds where traditional grasp planners fail,
and works under high uncertainty by utilizing the fun-
neling effect of pushing. We demonstrate our results
with experiments in simulation and on HERB, a robotic
platform developed at the Personal Robotics Lab at
Carnegie Mellon University.
Keywords Manipulation among movable obstacles ·
Manipulation under uncertainty · Non-prehensile
manipulation · Pushing
1 Introduction
Humans routinely perform remarkable manipulation tasks
that our robots find impossible. Imagine waking up in
the morning to make coffee. You reach into the fridge
to pull out the milk jug. It is buried at the back of the
fridge. You immediately start rearranging content:
you push the large heavy casserole out of the way, you
Mehmet R. Dogar · Siddhartha S. Srinivasa
The Robotics Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA, USA
Tel.: +1-412-973-9615
E-mail: {mdogar,siddh}@cs.cmu.edu
carefully pick up the fragile crate of eggs and move it
to a different rack, but along the way you push the box
of leftovers to the corner with your elbow.
Humans perform such manipulation tasks everyday.
The variety of actual situations we encounter is endless,
but our approaches to them share common themes: the list
of manipulation primitives that we use is not limited to
grasping and includes non-prehensile actions such as pushing,
pulling, and toppling; we are fearless in rearranging clutter
surrounding our primary task (we care about picking up the
milk jug and everything else is in the way); we are acutely
aware of the consequences of our actions (we push the
casserole with enough control to be able to move it without
ejecting it from the fridge).
Successful robotic manipulation in human environ-
ments requires similar characteristics. In this work we
propose a framework for robotic manipulation that plans
a rearrangement of clutter, uses non-prehensile pushing
actions as well as grasping actions, and tracks the con-
sequences of actions by reasoning about the uncertainty
in object pose and motion.
We present an example scene in Fig. 1. The robot’s
task is retrieving the red can which is surrounded by
clutter. The robot first pushes the large box to the
side and then uses that space to grasp the red can.
It produces these actions autonomously using our plan-
ning framework. The planner identifies the objects to be
moved: in the example the box is chosen among other
objects in the scene. The box is a good choice but it
is a big object that does not easily fit inside the robot
hand, i.e. it is not graspable. Since our framework can
work with non-prehensile actions, the box can be moved
without grasping it.
Our planner reasons about the uncertainty before
and during the motion of objects. Fig. 2 illustrates

Fig. 1 The planner rearranging clutter to reach to a goal object. Pushing actions are useful for moving large objects that do
not fit inside the hand, i.e. are not graspable. Planning time for the full sequence of actions in this example is 16.6 sec.
Fig. 2 The planner generates pushing actions that are robust to the pose uncertainty of objects. Uncertainty is represented
using copies of the same object at different poses. Planning time for the full sequence of actions in this example is 23.4 sec.
the problem of planning under object pose uncertainty
more clearly. One source of uncertainty is perception:
robot cameras are used to detect and estimate the poses
of objects but these pose estimates come with some
amount of error. The second source of uncertainty is
the action of the robot on an object: our predictions of
how an object moves when it is pushed are not exact.
Our framework accounts for both types of uncertainty
when generating manipulation plans. When possible,
our framework utilizes pushing actions to funnel large
amounts of uncertainty into smaller amounts. In Fig. 2
the uncertainty of the red can is funneled into the hand
using a pushing action before it is grasped.
The idea of rearranging objects to accomplish a task
has been around for a few hundred years. We encounter
this idea in games like the Tower of Hanoi (Chartrand
1985), the 15-Puzzle and numerous others. The blocks-
world problem (Winograd 1971) introduced this idea to
the AI community. STRIPS (Fikes and Nilsson 1971) is
a well-known planner to solve this problem. In robotics,
the problem is named planning among movable obsta-
cles. The general problem is NP-hard (Wilfong 1988).
Most of the existing planners work in the domain of
two-dimensional robot navigation and take advantage
of the low-dimensionality by explicitly representing, or
discretizing, the robot C-space (Ben-Shahar and Rivlin
1998b; Chen and Hwang 1991; van den Berg et al 2008).
These approaches are not practical for a manipulator
arm with high degrees of freedom (DOF). Another group
of planners are based on a search over different order-
ings to move the obstacle objects in the environment
(Ben-Shahar and Rivlin 1998a; Overmars et al 2006;
Stilman and Kuffner 2006). Planners that solve simi-
lar rearrangement problems in manipulation using real
robotic hardware are also known (Stilman et al 2007).
The planner from Stilman et al (2007) works back-
wards in time and identifies the objects that need to
be moved by computing the swept volume of the robot
during actions. Recently, Kaelbling and Lozano-Perez
(2011) proposed a planner that also identifies obstacles
by computing swept volumes of future actions. In all of
these cases, the physical act of manipulating an object
is abstracted into a simple action, like pick-and-place.
While extremely successful and algorithmically elegant,
the simplified assumptions on actions severely restrict
versatility. For example, such an algorithm would pro-
duce a solution whereby the robot carefully empties the
contents of the fridge onto the countertop, pulls out
the milk jug and then carefully refills the fridge. A per-
fectly valid plan, but one that is inefficient, and often
impossible to execute with heavy, large, or otherwise
ungraspable objects.
Pick-and-place actions are, however, easy to ana-
lyze. Once an object is rigidly grasped, it can be treated
as an extension of the robot body, and the planning
problem reduces to one of geometry. Performing actions
other than pick-and-place requires reasoning about the
non-rigid interaction between the robot effector and the
object.
A separate thread of work, rooted in Coulomb’s for-
mulation of friction, uses mechanics to analyze the con-
sequences of manipulation actions (Mason 1986; Goyal

et al 1991; Howe and Cutkosky 1996; Peshkin and Sander-
son 1988; Brost 1988). Mason (1986) investigates the
mechanics and planning of pushing for robotic object
manipulation. One of the first planners that incorpo-
rates the mechanics of pushing was developed by Lynch
and Mason (1996). Using the planner, a robot is able
to push an object in a stable manner using edge-edge
contact to a goal position. Goyal et al (1991) show that,
in the quasi-static case, the motion of a pushed object
is determined by the limit surface, which we use in pre-
dicting consequences of pushing actions. Manipulation
planners and robot actions that use these physical mod-
els have been developed (Lynch and Mason 1996; Lynch
1999a; Akella and Mason 1998; Peshkin and Sanderson
1988; Agarwal et al 1997; Hauser and Ng-Thow-Hing
2011; Kappler et al 2012). Our planner uses pushing to
address uncertainty and as a pre-grasp strategy, simi-
lar to these planners. A key difference of our framework
is its ability to address clutter through rearrangement
planning.
In this work we make an attempt at merging these
two threads of work: geometric rearrangement plan-
ning and mechanical modeling and analysis. We present
a framework that plans sequences of actions to rear-
range clutter in manipulation tasks. This is a general-
ization of the planner from Stilman et al (2007). But
our framework is not restricted to pick-and-place oper-
ations and can accommodate non-prehensile actions. We
also present mechanically realistic pushing actions that
are integrated into our planner.
Through the use of different non-prehensile actions,
our planner generates plans where an ordinary pick-
and-place planner cannot; e.g. when there are large,
heavy ungraspable objects in the environment. We also
show that our planner is robust to uncertainty.
2 Framework
In this section we present our framework to rearrange
the clutter around a goal object. The framework uses
non-prehensile actions that respect quasi-static me-
chanics. It produces open-loop plans which are conser-
vative to the uncertainty in object poses. This uncer-
tainty may be coming from either the non-stable non-
prehensile actions or from the perception system that
initially detects the objects. The framework consists of
a high-level planner that decides on the sequence of ob-
jects to move and where to move them. The high-level
planner uses a library of lower level actions to plan the
actual robot trajectories that move the objects. The
lower-level actions are also open-loop and do not re-
quire sensor feedback during execution.
Fig. 3 An example scene. The robot’s task is picking up the
red can. The robot rearranges the clutter around the goal ob-
ject and achieves the goal in the final configuration. The robot
executes the series of actions shown in Fig. 4. We present the
planning process in Fig. 5.
We first present the high-level planning framework,
and then present the quasi-static pushing actions used
by the high-level planner.
2.1 Planning Framework
In a given scene with multiple movable objects and a
goal object to be grasped, the planner decides which
objects to move and the order to move them, decides
where to move them, chooses the lower-level actions to
use on these objects, and accounts for the uncertainty in
the environment all through this process. This section
describes how we do that.
We describe our framework with the example in
Fig. 3. The robot’s task is picking up the red can. There
are two other objects on the table: a brown box which
is too large to be grasped, and the dark blue dumbbell
which is too heavy to be lifted.
The sequence of robot actions shown in Fig. 4 solves
this problem. The robot first pushes the dumbbell away
to clear a portion of the space, which it then uses to
push the box into. Afterwards it uses the space in front
of the red can to grasp and move it to the goal position.
Fig. 4 also shows that the actions to move objects
are planned backwards in time. We visualize part of
this planning process in Fig. 5. In each planning step
we move a single object and plan two actions. The first
one (e.g. Push-grasp and Sweep in Fig. 5) is to manip-
ulate the object. The second one (GoTo in Fig. 5) is
to move the arm to the initial configuration of the next
action to be executed. We explain the details of these
specific actions in §2.1.6. We discuss a number of ques-
tions below to explain the planning process and then
present the algorithm in §2.1.5.
2.1.1 Which objects to move?
In the environment there are a set of movable objects,
obj. The planner identifies the objects to move by first
attempting to grasp the goal object (Step 1 in Fig. 5).

Fig. 4 We show the snapshots of the planned actions in the order they are executed. The execution timeline goes from left
to right. Each dot on the execution timeline corresponds to a snapshot. Planning goes from right to left. Each dot on the
planning timeline corresponds to a planning step. The connections to the execution timeline show the robot motions planned
in a planning step. Details of this planning process are in Fig. 5.
During this grasp, both the robot and the red can, as
it is moved by the robot, are allowed to penetrate the
space other objects in obj occupy. Once the planner
finds an action that grasps the red can, it identifies the
objects whose spaces are penetrated by this action and
adds them to a list called move. These objects need to
be moved for the planned grasp to be feasible. At the
end of Step 1 in Fig. 5, the brown box is added to move.
We define the operator FindPenetrated to identify
the objects whose spaces are penetrated:

FindPenetrated(vol, obj) = {o ∈ obj | volume vol penetrates the space of o}

In the case of identifying objects to put into move, vol is
the volume of space swept by the robot during its mo-
tion and by the object as it is manipulated. We compute
the volume of space an object occupies by taking into
account the pose uncertainty (§2.1.2).
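This operator amounts to an intersection test between the given swept volume and the (uncertainty-aware) space of each movable object. A minimal sketch; the volume and intersection helpers are purely hypothetical, not a particular collision library's API:

```python
def find_penetrated(vol, movable_objects, uncertainty_of):
    """Return {o in movable_objects | vol penetrates the space of o}.

    The space of each object is computed over its whole uncertainty region
    U(o), not only its most likely pose (Sec. 2.1.2)."""
    penetrated = set()
    for o in movable_objects:
        o_space = volume_of(o, uncertainty_of(o))  # Volume(o, U(o)), hypothetical
        if volumes_intersect(vol, o_space):        # hypothetical intersection test
            penetrated.add(o)
    return penetrated
```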
In subsequent planning steps (e.g. Step 2 in Fig. 5)
the planner searches for actions that move the objects
in move. Again, the robot and the manipulated object
are allowed to penetrate other movable objects’ spaces.
We add the penetrated objects to move.
This recursive process continues until all the objects
in move are moved. The objects that are planned for
earlier should be moved later in the execution. In other
words, we plan backwards in identifying the objects to
move.
Allowing the actions to freely penetrate other ob-
jects’ spaces can result in a plan where objects are
moved unnecessarily. Hence, our planner tries to mini-
mize the number of these objects. This is described in
§2.1.6.
We also restrict the plans to monotone plans; i.e.
plans where an object can be moved at most once. This
avoids dead-lock situations where a plan to move ob-
ject A results in object B being moved, which in turn
Fig. 5 The planning timeline. Three snapshots are shown for
each planning step. The planner plans two consecutive arm
motions at each step, from the first snapshot to the second
snapshot, and from the second snapshot to the third snap-
shot. These motions are represented by blue dashed lines.
The purple regions show the negative goal regions (NGRs),
which are the regions the object needs to be moved out of
(§2.1.4). The object pose uncertainty is represented using a
collection of samples of the objects.
makes object A move, and so on. But more impor-
tantly restricting the planner to monotone plans makes
the search space smaller: the general problem of plan-
ning with multiple movable objects is NP-hard (Wil-
fong 1988). We enforce monotone plans by keeping a
list of objects called avoid. At the end of each suc-
cessful planning step the manipulated object is added
to avoid. The planner is not allowed to penetrate the
spaces of the objects in avoid. In Fig. 5 in Step 2 the

avoid list includes the red can, in Step 3 it includes the
red can and the brown box.
2.1.2 How to address uncertainty?
Robots can detect and estimate the poses of objects
with a perception system (in our experiments we use
Martinez et al (2010)). Inaccuracies occur in pose esti-
mation, and manipulation plans that do not take this
into account can fail. Non-prehensile actions can also
decrease or increase object pose uncertainty. Our plan-
ner generates plans that are robust to uncertainty. We
explicitly represent and track the object pose uncer-
tainty during planning.
Given a probability density function, fo, over the
set of possible poses, we define the uncertainty region
of an object o as the set of poses it can be in such that
fo is larger than ε:

U(o) = {q ∈ SE(3) | fo(q) > ε}
We define the uncertainty region to be in SE(3) because
we assume no uncertainty in objects’ height and we
also assume that the objects are standing upright on a
surface.
Before the planning starts, the robot’s perception
system suggests a pose q̂o for each object o in the scene.
We estimate the initial pose uncertainty of o as a
multivariate normal distribution centered at q̂o with the
covariance matrix Q. We estimate the covariance matrix
Q by empirically modeling the error profile of our
perception system (§3 presents the values we used to
build the matrix Q in our experiments). In the rest of
this paper we use U(o) specifically to refer to the initial
pose uncertainty of an object o.
The manipulation actions change the uncertainty of
an object o. We represent this as a trajectory νo:

νo : [0, 1] → R

where R is the power set of SE(3). We call νo the
evolution of the uncertainty region of object o. νo[0] is
the same as U(o). νo[1] refers to the final uncertainty
region of the object after manipulation. Each manipulation
action outputs νo, i.e. how it evolves the uncertainty
region of the object. §2.2.7 describes how νo is estimated
for pushing actions as a series of shrinking capture
regions.
We used random sampling to represent all uncer-
tainty regions. We present the number of samples we
use for different uncertainty levels in §3. Fig. 5 illus-
trates the pose uncertainty using such samples.
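For instance, the initial uncertainty region can be approximated by drawing pose samples from the normal distribution centered at the detected pose and keeping those whose density exceeds ε. The following is only an illustrative sketch; the pose parametrization, sample count, and threshold are assumptions, not the paper's values:

```python
import numpy as np

def sample_uncertainty_region(q_hat, Q, eps, n_samples=200, seed=None):
    """Approximate U(o) with pose samples drawn from N(q_hat, Q).

    q_hat -- pose suggested by the perception system
    Q     -- covariance modeling the perception error profile
    eps   -- density threshold in the definition of U(o)"""
    rng = np.random.default_rng(seed)
    q_hat = np.asarray(q_hat, dtype=float)
    samples = rng.multivariate_normal(q_hat, Q, size=n_samples)
    diff = samples - q_hat
    inv_Q = np.linalg.inv(Q)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** q_hat.size * np.linalg.det(Q))
    densities = norm * np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv_Q, diff))
    return samples[densities > eps]   # poses forming the sampled U(o)
```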
During planning, we compute the volume of space
an object occupies using U, not only the most likely
pose. Likewise we compute the space swept by a
manipulated object using νo. We define the operator Volume,
which takes as input an object and a region, and
computes the total 3-dimensional volume of space the
object occupies if it is placed at every point in the
region. For example, Volume(o, U(o)) gives the volume
of space occupied by the initial uncertainty region of
object o.
We overload Volume to accept trajectories of
regions and robots too; e.g. Volume(o, νo) gives the
volume of space swept by the uncertainty of the object
during its manipulation, and Volume(robot, τ) computes
the three-dimensional volume the robot occupies
during a trajectory τ. We compute this volume using
a high-resolution sampling of configurations along the
trajectory. We place three-dimensional models of the
robot links at the corresponding poses at all the sampled
points and sum them up to get the volume needed
by the full trajectory.
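The trajectory version of Volume described here is essentially a union of posed link meshes over a dense sampling of the trajectory. A sketch of that bookkeeping, with the kinematics and mesh operations as hypothetical helpers rather than a specific geometry library:

```python
def volume_of_trajectory(robot, tau, n_samples=100):
    """Approximate Volume(robot, tau) by sampling configurations along tau
    and accumulating the union of the posed link meshes."""
    swept = EmptyVolume()                                   # hypothetical type
    for i in range(n_samples + 1):
        q = tau(i / n_samples)                              # configuration at parameter s
        for link, pose in forward_kinematics(robot, q):     # hypothetical helper
            swept = swept.union(link_mesh(link).transformed(pose))
    return swept
```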
2.1.3 How to move an object?
At each planning step, our planner searches over a set
of possible actions in its action library. For example
in Step 1 of Fig. 5 the planner uses the action named
push-grasp, and in Step 2 it uses the action sweep. Push-
grasp uses pushing to funnel a large object pose uncer-
tainty into the hand. Sweep uses the outside of the hand
to push large objects. Each low-level action, in turn,
searches over different action-specific parametrizations
to move the object; e.g. different directions to push-
grasp an object, or different trajectories to use when
moving the arm from one configuration to the other.
We will describe the details of specific actions we use
(e.g. push-grasp and sweep) and the search over the
action-specific parametrizations in §2.1.6 and §2.2. Be-
low we present the general properties an action should
have so that it can be used by our high-level planner.
In grasp based planners robot manipulation actions
are simply represented by a trajectory of the robot arm:
τ : [0, 1] → C, where C is the configuration space of
the robot. The resulting object motion can be directly
derived from the robot trajectory. With non-prehensile
actions this is not enough and we also need information
about the trajectory of the object motion: the evolution
of the uncertainty region of the object. Hence the
interface of an action a in our framework takes as an input
the object to be moved o, a region of goal configurations
for the object G, and a volume of space to avoid
avoidVol; and outputs a robot trajectory τ, and the
evolution of the uncertainty region of the object during
the action νo:

(τ, νo) ← a(o, G, avoidVol)     (1)

Citations
Proceedings ArticleDOI
Chelsea Finn, Sergey Levine
01 May 2017
TL;DR: This work develops a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data and enables a real robot to perform nonprehensile manipulation — pushing objects — and can handle novel objects not seen during training.
Abstract: A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach does not require a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation — pushing objects — and can handle novel objects not seen during training.

620 citations


Cites methods from "A Planning Framework for Non-Prehen..."

  • ...Standard model-based methods for robotic manipulation might involve estimating the physical properties of the environment, and then solving for the controls based on the known laws of physics [4], [5], [6]....


Journal ArticleDOI
TL;DR: It is shown that a relatively small set of symbolic operators can give rise to task-oriented perception in support of the manipulation goals and form a vocabulary of logical expressions that describe sets of belief states, which are goals and subgoals in the planning process.
Abstract: We describe an integrated strategy for planning, perception, state estimation and action in complex mobile manipulation domains based on planning in the belief space of probability distributions over states using hierarchical goal regression (pre-image back-chaining). We develop a vocabulary of logical expressions that describe sets of belief states, which are goals and subgoals in the planning process. We show that a relatively small set of symbolic operators can give rise to task-oriented perception in support of the manipulation goals. An implementation of this method is demonstrated in simulation and on a real PR2 robot, showing robust, flexible solution of mobile manipulation problems with multiple objects and substantial uncertainty.

380 citations


Cites background from "A Planning Framework for Non-Prehen..."

  • ...The work of Dogar and Srinivasa (2012) comes closest among existing systems to satisfying our goals....


Proceedings Article
05 Dec 2016
TL;DR: In this paper, the authors investigate an experiential learning paradigm for acquiring an internal model of intuitive physics, by jointly estimating forward and inverse models of dynamics, which can then be used for multi-step decision making.
Abstract: We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need of predicting pixels. Our experiments show that this joint modeling approach outperforms alternative methods.

253 citations

Posted Content
TL;DR: In this paper, the authors investigate an experiential learning paradigm for acquiring an internal model of intuitive physics, by jointly estimating forward and inverse models of dynamics, which can then be used for multi-step decision making.
Abstract: We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need of predicting pixels. Our experiments show that this joint modeling approach outperforms alternative methods.

199 citations

Posted Content
TL;DR: In this article, the authors train two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping.
Abstract: Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping. Both networks are trained jointly in a Q-learning framework and are entirely self-supervised by trial and error, where rewards are provided from successful grasps. In this way, our policy learns pushing motions that enable future grasps, while learning grasps that can leverage past pushes. During picking experiments in both simulation and real-world scenarios, we find that our system quickly learns complex behaviors amid challenging cases of clutter, and achieves better grasping success rates and picking efficiencies than baseline alternatives after only a few hours of training. We further demonstrate that our method is capable of generalizing to novel objects. Qualitative results (videos), code, pre-trained models, and simulation environments are available at this http URL

153 citations

References
Book
31 Oct 1995
TL;DR: In this article, the authors describe a problem solver called STRIPS that attempts to find a sequence of operators in a space of world models to transform a given initial world model into a model in which a given goal formula can be proven to be true.
Abstract: We describe a new problem solver called STRIPS that attempts to find a sequence of operators in a space of world models to transform a given initial world model into a model in which a given goal formula can be proven to be true. STRIPS represents a world model as an arbitrary collection of first-order predicate calculus formulas and is designed to work with models consisting of large numbers of formulas. It employs a resolution theorem prover to answer questions of particular models and uses means-ends analysis to guide it to the desired goal-satisfying model.

1,793 citations

Journal ArticleDOI
TL;DR: In this article, a theoretical exploration of the mechanics of pushing is presented and applied to the analysis and synthesis of robotic manipulator operations, and the results show that pushing is an essential component of many manipulator operations.
Abstract: Pushing is an essential component of many manipulator operations. This paper presents a theoretical exploration of the mechanics of pushing and demonstrates application of the theory to analysis and synthesis of robotic manipulator oper ations.

602 citations


"A Planning Framework for Non-Prehen..." refers background in this paper

  • ...A separate thread of work, rooted in Coulomb’s formulation of friction, uses mechanics to analyze the consequences of manipulation actions (Mason 1986; Goyal et al. 1991; Howe and Cutkosky 1996; Peshkin and Sanderson 1988; Brost 1988)....


  • ...The voting theorem (Mason 1986) states that vp and the edges of the friction cone votes on the direction the object will rotate....


  • ...Mason (1986) investigates the mechanics and planning of pushing for robotic object manipulation....


Book
01 Feb 1971
TL;DR: A system for the computer understanding of English that combines a complete syntactic analysis of each sentence with a 'heuristic understander' which uses different kinds of information about a sentence, other parts of the discourse, and general information about the world in deciding what the sentence means.
Abstract: : The paper describes a system for the computer understanding of English. The system answers questions, executes commands, and accepts information in normal English dialog. It uses semantic information and context to understand discourse and to disambiguate sentences. It combines a complete syntactic analysis of each sentence with a 'heuristic understander' which uses different kinds of information about a sentence, other parts of the discourse, and general information about the world in deciding what the sentence means.

576 citations


"A Planning Framework for Non-Prehen..." refers background in this paper

  • ...The blocks-world problem (Winograd 1971) introduced this idea to the AI community....


Journal ArticleDOI
TL;DR: A planner for finding stable pushing paths among obstacles is described, and the planner is demon strated on several manipulation tasks.
Abstract: We would like to give robots the ability to position and orient parts in the plane by pushing, particularly when the parts are too large or heavy to be grasped and lifted. Unfortunately, the motion of a pushed object is generally unpredictable due to unknown support friction forces. With multiple pushing contact points, however, it is possible to find pushing directions that cause the object to remain fixed to the manipulator. These are called stable pushing directions. In this article we consider the problem of planning pushing paths using stable pushes. Pushing imposes a set of nonholonomic velocity constraints on the motion of the object, and we study the issues of local and global controllability during pushing with point contact or stable line contact. We describe a planner for finding stable pushing paths among obstacles, and the planner is demon strated on several manipulation tasks.

513 citations


"A Planning Framework for Non-Prehen..." refers background or methods in this paper

  • ...Manipulation planners and robot actions that use these physical models have been developed (Lynch and Mason 1996; Lynch 1999a; Akella and Mason 1998; Peshkin and Sanderson 1988; Agarwal et al. 1997; Hauser and Ng-Thow-Hing 2011; Kappler et al. 2012)....


  • ...One of the first planners that incorporates the mechanics of pushing was developed by Lynch and Mason (1996)....


01 Jan 2008
TL;DR: This work introduces an open-source cross-platform software architecture called OpenRAVE, the Open Robotics and Animation Virtual Environment, targeted for real-world autonomous robot applications, and includes a seamless integration of 3-D simulation, visualization, planning, scripting and control.
Abstract: One of the challenges in developing real-world autonomous robots is the need for integrating and rigorously testing high-level scripting, motion planning, perception, and control algorithms. For this purpose, we introduce an open-source cross-platform software architecture called OpenRAVE, the Open Robotics and Animation Virtual Environment. OpenRAVE is targeted for real-world autonomous robot applications, and includes a seamless integration of 3-D simulation, visualization, planning, scripting and control. A plugin architecture allows users to easily write custom controllers or extend functionality. With OpenRAVE plugins, any planning algorithm, robot controller, or sensing subsystem can be distributed and dynamically loaded at run-time, which frees developers from struggling with monolithic code-bases. Users of OpenRAVE can concentrate on the development of planning and scripting aspects of a problem without having to explicitly manage the details of robot kinematics and dynamics, collision detection, world updates, and robot control. The OpenRAVE architecture provides a flexible interface that can be used in conjunction with other popular robotics packages such as Player and ROS because it is focused on autonomous motion planning and high-level scripting rather than low-level control and message protocols. OpenRAVE also supports a powerful network scripting environment which makes it simple to control and monitor robots and change execution flow during run-time. One of the key advantages of open component architectures is that they enable the robotics research community to easily share and compare algorithms.

446 citations


Additional excerpts

  • ...Simulation experiments are performed in OpenRAVE (Diankov and Kuffner 2008)....


Frequently Asked Questions (2)
Q1. What are the contributions in "A planning framework for non-prehensile manipulation under clutter and uncertainty" ?

The authors introduce a planning framework addressing these two issues. The authors demonstrate their results with experiments in simulation and on HERB, a robotic platform developed at the Personal Robotics Lab at Carnegie Mellon University. 

At any step the authors take into account all uncertainty associated with previously planned actions. In future work, the authors will explore the idea of risk-taking actions as a solution to this problem. In future work the authors plan to use sensor feedback during pushing. The framework the authors present in this paper opens up the possibility to use different non-prehensile manipulation actions as a part of the same planner.