
Article title: Data-Driven Grasp Synthesis—A Survey
Authors: Bohg, J.; Morales, A.; Asfour, T.; Kragic, D.
Journal: IEEE Transactions on Robotics
Version: Postprint
Bibliographic citation (ISO 690): BOHG, Jeannette, et al.
Data-driven grasp synthesis—a survey. IEEE Transactions on
Robotics, 2014, vol. 30, no. 2, p. 289-309.
UJI repository URL: http://hdl.handle.net/10234/132550

Data-Driven Grasp Synthesis - A Survey
Jeannette Bohg, Member, IEEE, Antonio Morales, Member, IEEE, Tamim Asfour, Member, IEEE,
Danica Kragic, Member, IEEE
Abstract—We review the work on data-driven grasp synthesis
and the methodologies for sampling and ranking candidate
grasps. We divide the approaches into three groups based on
whether they synthesize grasps for known, familiar or unknown
objects. This structure allows us to identify common object
representations and perceptual processes that facilitate the employed
data-driven grasp synthesis technique. In the case of known
objects, we concentrate on the approaches that are based on
object recognition and pose estimation. In the case of familiar
objects, the techniques use some form of a similarity matching
to a set of previously encountered objects. Finally, for the
approaches dealing with unknown objects, the core part is the
extraction of specific features that are indicative of good grasps.
Our survey provides an overview of the different methodologies
and discusses open problems in the area of robot grasping. We
also draw a parallel to the classical approaches that rely on
analytic formulations.
Index Terms—Object grasping and manipulation, grasp
synthesis, grasp planning, visual perception, object recognition
and classification, visual representations
I. INTRODUCTION
Given an object, grasp synthesis refers to the problem of
finding a grasp configuration that satisfies a set of criteria
relevant for the grasping task. Finding a suitable grasp among
the infinite set of candidates is a challenging problem and has
been addressed frequently in the robotics community, resulting
in an abundance of approaches.
In the recent review of Sahbani et al. [1], the authors divide
the methodologies into analytic and empirical. Following
Shimoga [2], analytic refers to methods that construct
force-closure grasps with a multi-fingered robotic hand that are
dexterous, in equilibrium, stable and exhibit a certain dynamic
behaviour. Grasp synthesis is then usually formulated as a
constrained optimization problem over criteria that measure
one or several of these four properties. In this case, a grasp is
typically defined by the grasp map that transforms the forces
exerted at a set of contact points to object wrenches [3].
The criteria are based on geometric, kinematic or dynamic
formulations. Analytic formulations towards grasp synthesis
have also been reviewed by Bicchi and Kumar [4].
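For concreteness, the grasp map and the resulting optimization problem can be written down as follows; this is standard notation in the spirit of [3], our paraphrase rather than a formulation taken verbatim from any single cited work:

\[
w = G\,f_c, \qquad f_c \in FC = \{\, f : \|f_{t,i}\| \le \mu\, f_{n,i} \ \forall i \,\},
\]

where \(w \in \mathbb{R}^6\) is the net object wrench, \(f_c\) stacks the contact forces, \(G\) is the grasp map, and \(FC\) is the Coulomb friction cone with coefficient \(\mu\). Grasp synthesis then amounts to choosing contacts (and a hand posture reaching them) that maximize a quality measure \(Q(G)\) subject to force closure, i.e., for every disturbance wrench \(w_{ext}\) there exists an \(f_c \in FC\) with \(G f_c = -w_{ext}\).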
Empirical or data-driven approaches rely on sampling grasp
candidates for an object and ranking them according to a
specific metric. This process is usually based on some existing
J. Bohg is with the Autonomous Motion Department at the MPI for
Intelligent Systems, Tübingen, Germany, e-mail: jbohg@tuebingen.mpg.de.
A. Morales is with the Robotic Intelligence Lab. at Universitat Jaume I,
Castelló, Spain, e-mail: Antonio.Morales@uji.es.
T. Asfour is with the KIT, Karlsruhe, Germany, e-mail: asfour@kit.edu.
D. Kragic is with the Centre for Autonomous Systems, Computational
Vision and Active Perception Lab, Royal Institute of Technology KTH,
Stockholm, Sweden, e-mail: dank@kth.se.
This work has been supported by FLEXBOT (FP7-ERC-279933).
grasp experience that can be a heuristic or is generated in
simulation or on a real robot. Kamon et al. [5] refer to this
as the comparative and Shimoga [2] as the knowledge-based
approach. Here, a grasp is commonly parameterized by [6, 7]:
- the grasping point on the object with which the tool center point (TCP) should be aligned,
- the approach vector, which describes the 3D angle at which the robot hand approaches the grasping point,
- the wrist orientation of the robotic hand, and
- an initial finger configuration.
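As a reading aid, this parameterization can be captured in a small data structure; the following is a minimal sketch whose field names and types are chosen by us, not taken from any cited system:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspCandidate:
    """One grasp hypothesis, following the parameterization in [6, 7]."""
    grasp_point: np.ndarray       # 3D point on the object to align the TCP with
    approach_vector: np.ndarray   # 3D direction along which the hand approaches
    wrist_orientation: float      # rotation of the hand about the approach axis [rad]
    finger_config: np.ndarray     # initial joint angles of the fingers
    quality: float = 0.0          # score assigned by the ranking metric
```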
Data-driven approaches differ in how the set of grasp
candidates is sampled, how the grasp quality is estimated and how
good grasps are represented for future use. Some methods
measure grasp quality based on analytic formulations, but
more commonly they encode e.g. human demonstrations,
perceptual information or semantics.
A. Brief Overview of Analytic Approaches
Analytic approaches provide guarantees regarding the
criteria that measure the previously mentioned four grasp
properties. However, these are usually based on assumptions such as
simplified contact models, Coulomb friction and rigid body
modeling [3, 8]. Although these assumptions render grasp
analysis practical, inconsistencies and ambiguities, especially
regarding the analysis of grasp dynamics, are usually attributed
to their approximate nature.
In this context, Bicchi and Kumar [4] identified the
problem of finding an accurate and tractable model of contact
compliance as particularly relevant. This is needed to analyze
statically-indeterminate grasps in which not all internal forces
can be controlled. This case arises e.g. for under-actuated
hands or grasp synergies, where the number of controlled
degrees of freedom is smaller than the number of contact forces.
Prattichizzo et al. [9] model such a system by introducing a set
of springs at the contacts and joints and show how its dexterity
can be analyzed. Rosales et al. [10] adopt the same model of
compliance to synthesize feasible and prehensile grasps. In
this case, only statically-determinate grasps are considered.
The problem of finding a suitable hand configuration is cast
as a constrained optimization problem in which compliance is
introduced to simultaneously address the constraints of contact
reachability, object restraint and force controllability. As is
the case with many other analytic approaches towards grasp
synthesis, the proposed model is only studied in simulation
where accurate models of the hand kinematics, the object and
their relative alignment are available.
In practice, systematic and random errors are inherent to a
robotic system; they are due to noisy sensors and inaccurate
models of the object, the robot, etc. The relative position of object

and hand can therefore only be known approximately which
makes an accurate placement of the fingertips difficult. In
2000, Bicchi and Kumar [4] identified a lack of approaches
towards synthesizing grasps that are robust to positioning
errors. Since then, this problem has shifted into focus. One
line of research follows the approach of independent contact
regions (ICRs) as defined by Nguyen [11]: a set of regions on
the object in which each finger can be independently placed
anywhere without the grasp losing the force-closure property.
Several examples for computing them are presented by Roa
and Suárez [12] or Krug et al. [13]. Another line of research
towards robustness against inaccurate end-effector positioning
makes use of the caging formulation. Rodriguez et al. [14]
found that there are caging configurations of a three-fingered
manipulator around a planar object that are specifically suited
as a way point to grasping it. Once the manipulator is in
such a configuration, either opening or closing the fingers is
guaranteed to result in an equilibrium grasp without the need
for accurate positioning of the fingers. Seo et al. [15] exploited
the fact that two-fingered immobilizing grasps of an object are
always preceded by a caging configuration. Full body grasps
of planar objects are synthesized by first finding a two-contact
caging configuration and then using additional contacts to
restrain the object. Results have been presented in simulation
and demonstrated on a real robot.
Another assumption commonly made in analytic approaches
is that precise geometric and physical models of an object are
available to the robot, which is not always the case. In addition,
we may not know the surface properties or friction coefficients,
weight, center of mass and weight distribution. Some of these
can be retrieved through interaction: Zhang and Trinkle [16]
propose to use a particle filter to simultaneously estimate the
physical parameters of an object and track it while it is being
pushed. The dynamic model of the object is formulated as a
mixed nonlinear complementarity problem. The authors show
that even when the object is occluded and the state estimate
cannot be updated through visual observation, the motion of
the object is accurately predicted over time. Although methods
like this relax some of the assumptions, they are still limited
to simulation [14, 10] or consider 2D objects [14, 15, 16].
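The estimation idea in Zhang and Trinkle [16] can be illustrated with one generic particle-filter step. This is a minimal sketch assuming user-supplied dynamics and observation models; their actual dynamics model is the mixed nonlinear complementarity problem mentioned above:

```python
import numpy as np

def particle_filter_step(particles, weights, dynamics, likelihood, observation, rng):
    """One generic particle-filter update (sketch, hypothetical interfaces).

    particles : (n, d) array; each row stacks the object pose and its
                physical parameters (e.g. friction coefficient, mass).
    dynamics  : stochastic motion model, p -> next p.
    likelihood: observation model, (observation, p) -> probability.
    """
    # Predict: propagate each particle through the (stochastic) dynamics.
    particles = np.array([dynamics(p, rng) for p in particles])
    # Update: re-weight by how well each particle explains the observation.
    weights = weights * np.array([likelihood(observation, p) for p in particles])
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```

When the object is occluded, the update step can simply be skipped and the prediction step keeps propagating the belief, which matches the behavior the authors report.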
B. Development of Data-Driven Methods
Up to the year 2000, the field of robotic grasping¹ was
clearly dominated by analytic approaches [11, 4, 17, 2]. Apart
from e.g. Kamon et al. [5], data-driven grasp synthesis started
to become popular with the availability of GraspIt! [18] in
2004. Many highly cited approaches have been developed,
analyzed and evaluated in this or other simulators [19, 20, 21,
22, 23, 24]. These approaches differ in how grasp candidates
are sampled from the infinite space of possibilities. For grasp
ranking, they rely on classical metrics based on analytic
formulations, such as the widely used ε-metric proposed by
Ferrari and Canny [17]. It constructs the grasp wrench space
(GWS) by computing the convex hull over the wrenches at the
contact points between the hand and the object. ε ranks the
quality of a force-closure grasp by quantifying the radius of
the maximum sphere still fully contained in the GWS.

¹ Citation counts for the most influential articles in the field, extracted from
scholar.google.com in October 2013. [11]: 733. [4]: 490. [17]: 477. [2]: 405.
[5]: 77. [18]: 384. [19]: 353. [20]: 100. [21]: 110. [22]: 95. [23]: 96. [24]:
108. [25]: 38. [26]: 156. [27]: 39. [28]: 277. [29]: 75. [30]: 40. [31]: 21. [32]:
43. [33]: 77. [34]: 26. [35]: 191. [36]: 58. [37]: 75. [38]: 39.
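Once the contact wrenches are given, the ε-metric can be sketched in a few lines. This is a simplified version assuming precomputed wrenches and the unit-length facet normals that SciPy's convex hull provides; a full implementation additionally needs a friction-cone discretization and a choice of torque scaling:

```python
import numpy as np
from scipy.spatial import ConvexHull

def epsilon_quality(wrenches):
    """Simplified sketch of the Ferrari-Canny epsilon metric [17].

    wrenches : (n, 6) array of contact wrenches (force, torque) spanning
               the grasp wrench space (GWS) as their convex hull.
    Returns the radius of the largest origin-centered ball contained in
    the GWS, or 0.0 if the grasp is not force-closure.
    """
    hull = ConvexHull(wrenches)
    # Each facet satisfies normal . x + offset <= 0 for interior points;
    # with unit normals, -offset is the facet's distance from the origin.
    distances = -hull.equations[:, -1]
    if np.any(distances < 0):  # origin lies outside the GWS
        return 0.0
    return float(distances.min())
```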
Developing and evaluating approaches in simulation is
attractive because the environment and its attributes can be
completely controlled. A large number of experiments can
be efficiently performed without having access to expensive
robotics hardware that would also add a lot of complexity to
the evaluation process. However, it is not clear if the simulated
environment resembles the real world well enough to transfer
methods easily. Only recently, several articles [39, 40, 24]
have analyzed this question and come to the conclusion that
the classic metrics are not good predictors for grasp success
in the real world. They do not seem to cope well with the
challenges arising in unstructured environments. Diankov [24]
claims that in practice grasps synthesized using this metric
tend to be relatively fragile. Balasubramanian et al. [39]
systematically tested a number of grasps in the real world that
were stable according to classical grasp metrics. Compared
to grasps planned by humans and transferred to a robot by
kinesthetic teaching on the same objects, they under-performed
significantly. A similar study has been conducted by Weisz and
Allen [40]. It focuses on the ability of the ε-metric to predict
grasp stability under object pose error. The authors found that
it performs poorly especially when grasping large objects.
As pointed out by Bicchi and Kumar [4] and Prattichizzo
and Trinkle [8], grasp closure is often wrongly equated with
stability. Closure states the existence of equilibrium which is
a necessary but not sufficient condition. Stability can only be
defined when considering the grasp as a dynamical system
and in the context of its behavior when perturbed from
an equilibrium. Seen in this light, the results of the above
mentioned studies are not surprising. However, they suggest
that there is a large gap between reality and the models for
grasping that are currently available and tractable.
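The distinction can be made precise in textbook dynamical-systems terms (our formulation, not taken from the cited works): closure asserts the existence of an equilibrium \(x^\ast\) of the grasp dynamics, whereas Lyapunov stability additionally requires that trajectories starting near \(x^\ast\) remain near it:

\[
\dot{x} = f(x), \qquad f(x^\ast) = 0,
\]
\[
\forall \varepsilon > 0\ \exists \delta > 0:\quad \|x(0) - x^\ast\| < \delta \ \Rightarrow\ \|x(t) - x^\ast\| < \varepsilon \quad \forall t \ge 0.
\]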
For this reason, several researchers [25, 26, 27] proposed
to let the robot learn how to grasp from experience that is
gathered during grasp execution. Although collecting examples
is extremely time-consuming, the problem of transferring
the learned model to the real robot is nonexistent. A crucial
question is how the object to be grasped is represented and
how the experience is generalized to novel objects.
Saxena et al. [28] pushed machine learning approaches for
data-driven grasp synthesis even further. A simple logistic
regressor was trained on large amounts of synthetic labeled
training data to predict good grasping points in a monocular
image. The authors demonstrated their method in a household
scenario in which a robot emptied a dishwasher. None of
the classical principles based on analytic formulations were
used. This paper spawned a lot of research [29, 30, 31, 32]
in which essentially one question is addressed: What are the
object features that are sufficiently discriminative to infer a
suitable grasp configuration?
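A toy version of such a learning setup is shown below, with synthetic stand-in features and labels rather than the hand-crafted image filter responses and synthetic training images used in [28]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the setup in Saxena et al. [28]: a logistic regressor
# trained on labeled patch features to score grasping points. Feature
# dimension and data here are placeholders, not the original design.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))       # one feature vector per image patch
y = rng.integers(0, 2, size=1000)      # 1 = patch contains a good grasping point
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X[:5])[:, 1]  # rank patches by grasp probability
```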
From 2009, there were further developments in the area
of 3D sensing. Projected Texture Stereo was proposed by
Konolige [41]. This technology is built into the sensor head of
the PR2 [42], a robot that is available to comparatively many

robotics research labs and running on the open-source middleware
ROS [43]. In 2010, Microsoft released the Kinect [44], a
highly accurate depth-sensing device based on the technology
developed by PrimeSense [45]. Due to its low price and
simple usage, it became a ubiquitous device within the robotics
community. Although the importance of 3D data for grasping
had been recognized before, many new approaches were
proposed that operate on real-world 3D data. They are either
heuristics that map structures in this data directly to grasp
configurations [33, 34], or they try to detect and recognize
objects and estimate their pose [35, 46].

Figure 1: We identified a number of aspects that influence how the final set of grasp
hypotheses is generated for an object. The most important one is the assumed prior
object knowledge, as discussed in Section I-D. Numerous different object-grasp
representations have been proposed in the literature, relying on features of different
modalities such as 2D or 3D vision or tactile sensors. Either local object parts or the
object as a whole are linked to specific grasp configurations. Grasp synthesis can
either be analytic or data-driven; the latter is further detailed in Fig. 2. Very few
approaches explicitly address the task or hand kinematics of the robot. [Mind-map
branches: Prior Object Knowledge (Known, Familiar, Unknown); Grasp Synthesis
(Analytic, Data-Driven); Object Features (2D, 3D, Multi-Modal); Task; Hand (Gripper,
Multi-Fingered); Object-Grasp Representation (Global, Local).]
Furthermore, we have recently seen an increasing number of
robots fulfilling very specific tasks such as towel folding [37]
or preparing pancakes [38]. In these scenarios, grasping is
embedded into a sequence of different manipulation actions.
C. Analytic vs. Data-Driven Approaches
Contrary to analytic approaches, methods following the
data-driven paradigm place more weight on the object
representation and the perceptual processing, e.g., feature
extraction, similarity metrics, object recognition or
classification and pose estimation. The resulting data is then
used to retrieve grasps from some knowledge base, or to sample
and rank them by comparison to existing grasp experience. The
parameterization of the grasp is less specific (e.g. an approach
vector instead of fingertip positions) and therefore accommodates
uncertainties in perception and execution. This provides a natural
precursor to reactive grasping [47, 48, 49, 33, 50], which,
given a grasp hypothesis, considers the problem of robustly
acquiring it under uncertainty. Data-driven methods cannot
provide guarantees regarding the aforementioned criteria of
dexterity, equilibrium, stability and dynamic behaviour [2].
These criteria can only be verified empirically. However, data-driven methods form
the basis for studying grasp dynamics and further developing
analytic models that better resemble reality.
D. Classification of Data-Driven Approaches
Sahbani et al. [1] divide the data-driven methods based on
whether they employ object features or observation of humans
during grasping. We believe that this falls short of capturing
the diversity of these approaches especially in terms of the
ability to transfer grasp experience between similar objects
and the role of perception in this process. In this survey, we
propose to group data-driven grasp synthesis approaches based
on what they assume to know a priori about the query object:
Known Objects: These approaches assume that the query
object has been encountered before and that grasps have
already been generated for it. Commonly, the robot has
access to a database containing geometric object models
that are associated with a number of good grasps. This
database is usually built offline and in the following will
be referred to as an experience database. Once the object
has been recognized, the goal is to estimate its pose and
retrieve a suitable grasp.

Familiar Objects: Instead of exact identity, the
approaches in this group assume that the query object is
similar to previously encountered ones. New objects can
be familiar on different levels. Low-level similarity can
be defined in terms of shape, color or texture. High-level
similarity can be defined based on object category. These
approaches assume that new objects similar to old ones
can be grasped in a similar way. The challenge is to
find an object representation and a similarity metric that
allow grasp experience to be transferred; a minimal sketch
of such transfer follows this list.
Unknown Objects: Approaches in this group do not
assume to have access to object models or any sort of
grasp experience. They focus on identifying structure or
features in sensory data for generating and ranking grasp
candidates. These are usually based on local or global
features of the object as perceived by the sensor.
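The similarity-based transfer mentioned for familiar objects can be sketched as a nearest-neighbor lookup, assuming fixed-length object descriptors and a stored (descriptor, grasp) memory; the function name and the nearest-neighbor choice are ours, not from any specific cited method:

```python
import numpy as np

def transfer_grasp(query_descriptor, experience):
    """Return the grasp of the most similar previously encountered object.

    experience : list of (descriptor, grasp) pairs, where descriptors are
                 fixed-length feature vectors (e.g. shape or appearance).
    """
    descriptors = np.stack([d for d, _ in experience])
    dists = np.linalg.norm(descriptors - query_descriptor, axis=1)
    return experience[int(np.argmin(dists))][1]
```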
We find the above classification suitable for surveying
the data-driven approaches since the assumed prior object
knowledge determines the necessary perceptual processing and
associated object representations for generating and ranking
grasp candidates. For known objects, the problems of
recognition and pose estimation have to be addressed. The object is
usually represented by a complete geometric 3D object model.
For familiar objects, an object representation has to be found
that is suitable for comparing new objects to already encountered
ones in terms of graspability. For unknown objects, heuristics
have to be developed for directly linking structure in the
sensory data to candidate grasps.
Only a minority of the approaches discussed in this survey
cannot be clearly assigned to one of these three
groups. Most of the included papers use sensor data from the
scene to perform data-driven grasp synthesis and are part of a
real robotic system that can execute grasps.
Finally, this classification is well in line with research in
the field of neuroscience, specifically with the theory of the
dorsal and ventral stream in human visual processing [51]. The
dorsal pathway processes immediate action-relevant features
while the ventral pathway extracts context- and scene-relevant
information and is related to object recognition. The visual
processing in the ventral and dorsal pathways can be related
to the grouping of grasp synthesis for familiar/known and
unknown objects, respectively. The details of such links are
out of the scope of this paper. Extensive and detailed reviews
on the neuroscience of grasping are offered in [52, 53, 54].
E. Aspects Influencing the Generation of Grasp Hypotheses
The number of candidate grasps that can be applied to an
object is infinite. Sampling some of these candidates and
defining a quality metric for selecting a good subset of grasp
hypotheses is the core subject of the approaches reviewed
in this survey. In addition to the prior object knowledge,
we identified a number of other factors that characterize
these metrics and thereby influence which grasp hypotheses
are selected by a method. Fig. 1 shows a mind map that
structures these aspects. An important one is how the quality
of a candidate grasp depends on the object, i.e., the object-
grasp representation. Some approaches extract local object
attributes (e.g. curvature, contact area with the hand) around a
candidate grasp. Other approaches take global characteristics
(e.g. center of mass, bounding box) and their relation to a
grasp configuration into account. Depending on the sensor
device, object features can be based on 2D or 3D visual data
as well as on other modalities. Furthermore, grasp synthesis
can be analytic or data-driven. We further categorize the latter
in Fig. 2: there are methods for learning either from human
demonstration, labeled training data, or trial and error. Other
methods rely on various heuristics to directly link structure
in sensory data to candidate grasps. There is relatively little
work on task-dependent grasping. Also, the applied robotic
hand is usually not in the focus of the discussed approaches.
We will therefore not examine these two aspects. However, we
will indicate whether an approach takes the task into account
and whether an approach is developed for a gripper or for the
more complex case of a multi-fingered hand. Tables I-III list
all the methods in this survey. The table columns follow the
structure proposed in Figs. 1 and 2.

Figure 2: Data-driven grasp synthesis can either be based on heuristics or on learning
from data. The data can be provided in the form of offline-generated labeled training
data, human demonstration, or trial and error.

Table I: Data-Driven Approaches for Grasping Known Objects. [Columns: object-grasp
representation (local, global); object features (2D, 3D, multi-modal); grasp synthesis
(heuristic, human demonstration, labeled data, trial and error); task; multi-fingered;
deformable; real data. Rows: Glover et al. [55], Goldfeder et al. [21], Berenson et
al. [56], Miller et al. [19], Przybylski et al. [57], Roa et al. [58], Detry et al. [27],
Detry et al. [59], Huebner et al. [60], Faria et al. [61], Diankov [24], Balasubramanian
et al. [39], Borst et al. [22], Brook et al. [62], Ciocarlie and Allen [23], Romero et
al. [63], Papazov et al. [64], Morales et al. [7], Collet Romea et al. [65], Kroemer et
al. [66], Ekvall and Kragic [6], Tegin et al. [67], Pastor et al. [49], Stulp et al. [68].]
II. GRASPING KNOWN OBJECTS
If the object to be grasped is known and there is already a
database of grasp hypotheses for it, the problem of finding a
feasible grasp reduces to estimating the object pose and then
filtering the hypotheses by reachability. Table I summarizes all
the approaches discussed in this section.
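The pipeline just described can be summarized in a short sketch. Poses are represented as homogeneous transforms, and the reachability predicate stands in for the robot's inverse-kinematics check; all names here are hypothetical, not from a specific cited system:

```python
import numpy as np

def grasps_for_known_object(object_pose, stored_grasps, is_reachable):
    """Rank the stored grasps of a recognized object in the current scene.

    object_pose   : (4, 4) object-to-world transform from pose estimation.
    stored_grasps : list of (grasp_pose, quality), with grasp_pose a (4, 4)
                    transform in the object frame (the experience database
                    of Section I-D).
    is_reachable  : predicate over world-frame grasp poses, e.g. an
                    inverse-kinematics feasibility check.
    """
    ranked = []
    for grasp_pose, quality in stored_grasps:
        world_grasp = object_pose @ grasp_pose  # express grasp in world frame
        if is_reachable(world_grasp):
            ranked.append((quality, world_grasp))
    ranked.sort(key=lambda t: t[0], reverse=True)  # best-ranked grasp first
    return [g for _, g in ranked]
```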
