Journal ArticleDOI

Vision for robotic object manipulation in domestic settings

31 Jul 2005-Robotics and Autonomous Systems (North-Holland)-Vol. 52, Iss: 1, pp 85-100
TL;DR: A vision system is presented for robotic object manipulation tasks in natural, domestic environments; one important property is that the step from object recognition to pose estimation is completely automatic, combining both appearance and geometric models.
About: This article is published in Robotics and Autonomous Systems. The article was published on 2005-07-31. It has received 118 citations to date. The article focuses on the topics: Pose & 3D single-object recognition.
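
The TL;DR above describes an automatic step from object recognition to pose estimation that combines appearance and geometric models. As a rough illustration only (not the authors' implementation), such a pipeline could be sketched with OpenCV: SIFT matching for the appearance step and PnP with RANSAC for the geometric step. All names, thresholds, and the assumed stored model view with known 3D keypoint coordinates are placeholders.

```python
# Hypothetical sketch: recognize an object via SIFT matching, then estimate its
# 6-DoF pose with PnP + RANSAC. Illustrative only, not the article's implementation.
import cv2
import numpy as np

def recognize_and_estimate_pose(query_img, model_img, model_points_3d, K):
    """model_points_3d[i] is the 3D point (object frame) corresponding to the
    i-th keypoint/descriptor of the stored model view; K is the camera matrix."""
    sift = cv2.SIFT_create()
    kp_q, des_q = sift.detectAndCompute(query_img, None)
    _, des_m = sift.detectAndCompute(model_img, None)

    # Appearance step: match descriptors with Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_m, des_q, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 6:
        return None  # object not recognized

    # Geometric step: 2D-3D correspondences -> pose via PnP with RANSAC.
    obj_pts = np.float32([model_points_3d[m.queryIdx] for m in good])
    img_pts = np.float32([kp_q[m.trainIdx].pt for m in good])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```
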
Citations
12 Nov 2015
TL;DR: In this article, a Deep Q Network (DQN) is used to learn target reaching with a three-joint robot manipulator from external visual observation; the network is demonstrated to perform target reaching after training in simulation.
Abstract: This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images.
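
As a loose sketch of the approach summarized in this abstract (not the paper's code), a single DQN update for a discrete-action reaching task might look as follows, assuming PyTorch; the network shape, replay-buffer layout, and hyperparameters are all assumptions.

```python
# Minimal DQN update sketch for learning reaching from images (illustrative only).
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per discrete joint command
        )

    def forward(self, x):
        return self.net(x)

def dqn_update(q_net, target_net, optimizer, replay, batch_size=32, gamma=0.99):
    """One gradient step on the standard DQN temporal-difference target.
    `replay` holds tuples (obs, action, reward, next_obs, done) of tensors;
    `action` entries are int64 indices, `done` entries are 0./1. floats."""
    batch = random.sample(replay, batch_size)
    obs, act, rew, next_obs, done = map(torch.stack, zip(*batch))
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1 - done) * target_net(next_obs).max(1).values
    loss = nn.functional.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop the action set would be a discretization of joint commands and the target network would be synchronized with the online network periodically.
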

156 citations

Journal ArticleDOI
TL;DR: The results show that a combination of a descriptor based on shape context with a non-linear classification algorithm leads to a stable detection of grasping points for a variety of objects.
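
A minimal sketch of the idea behind that result, assuming scikit-learn, a simplified log-polar shape-context descriptor, and an RBF-kernel SVM as the non-linear classifier (not the cited authors' exact features or settings):

```python
# Illustrative sketch: simplified shape-context descriptors classified with a
# non-linear SVM to label candidate grasping points (not the cited implementation).
import numpy as np
from sklearn.svm import SVC

def shape_context(points, center, n_r=5, n_theta=12, r_max=1.0):
    """Log-polar histogram of edge points relative to one candidate point."""
    d = points - center
    r = np.linalg.norm(d, axis=1) + 1e-9
    theta = np.arctan2(d[:, 1], d[:, 0])                 # angles in [-pi, pi]
    r_bins = np.logspace(np.log10(r_max / 8), np.log10(r_max), n_r)
    r_idx = np.clip(np.searchsorted(r_bins, np.clip(r, None, r_max)), 0, n_r - 1)
    t_idx = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    for ri, ti in zip(r_idx, t_idx):
        hist[ri, ti] += 1
    return (hist / max(len(points), 1)).ravel()

def train_grasp_point_classifier(X, y):
    """X: (n_samples, n_r*n_theta) descriptors, y: 1 = grasping point, 0 = not."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True)
    clf.fit(X, y)
    return clf
```
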

152 citations


Cites background from "Vision for robotic object manipulat..."

  • ...Although the problem is still unsolved for general scenes, we have demonstrated in our previous work how simple assumptions about the environment help in segmentation of table-top scenes, [46, 47, 48]....




Journal ArticleDOI
TL;DR: A novel framework is proposed for a data-driven grasp planner that indexes partial sensor data into a database of 3D models with known grasps and transfers grasps from those models to novel objects.
Abstract: This thesis introduces a new framework for data-driven grasping. We assume, with neuropsychological justification, that human grasping is fundamentally example based rather than rule based. This means that planning grasps for a novel object will usually reduce to identifying it as similar to known objects with known grasp affordances. Our framework is intended to allow robots to mimic this approach.

For robots to succeed in the real world, it is essential that they be able to grasp objects based on realistically available sensor data. However, most existing grasp planners assume that the robot has access to the full 3D geometry of all objects to be grasped, which is unscalable, or abandon 3D geometry entirely to plan grasps based on appearance, which is difficult to extend to dexterous hands. The core advantage of our data-driven framework is that it naturally allows grasps to be planned for partially sensed objects. We accomplish this by using the partial sensor data to find similar 3D models, which can be used as proxy geometries for grasp planning.

Along with the framework, we present a new set of shape descriptors suitable for matching partial sensor data to similar - but not identical - 3D models. This is in contrast to most previous descriptors for partial matching, which tend to rely on local feature correspondences that will often not exist in our problem setting. In a similar vein we also present new algorithms for aligning the pose and scale of partial sensor data to the best matching models, where no local correspondences may be assumed to exist.

Our grasp planner makes use of a grasp database, consisting of example grasps for a large number of 3D models. As no such database has previously existed, this thesis introduces the Columbia Grasp Database, a freely available collection of hundreds of thousands of grasps for thousands of 3D models using a variety of robotic hands. To construct this database we modified the Eigengrasp grasp planner, which uses a low dimensional control space to simplify the grasp search space. We also discuss some limitations of this planner and show how they can be addressed by model decomposition.

Our use of a database of 3D models annotated with precomputed grasps suggests the possibility of annotating the models with other forms of information as well. With this in mind, we show how to leverage noisy user data downloaded from the internet to suggest likely text tags for previously unlabeled 3D models. Although this work has not yet been applied to the problem of grasp planning, we demonstrate a content-based 3D model search engine that implements our automatic labeling algorithm.

The title of this thesis, “Data-Driven Grasping,” represents a vision of robotic grasping that is larger than our particular implementation. Instead, the major contribution of this thesis is bridging the worlds of robotics and 3D model similarity search. Content based search is an important area of research in its own right, but the combination of content based search and robotics is particularly exciting because it opens up the possibility of using the entire internet as a knowledge base for intelligent robots.
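
To make the retrieval idea in this abstract concrete, here is a toy sketch, not the thesis' code: a crude global shape descriptor for a partial point cloud is matched against a database of models with precomputed grasps, and the best match's grasps are returned as candidates. The descriptor, database layout, and names are all placeholders.

```python
# Illustrative sketch of the data-driven idea: index a partial point cloud into a
# database of 3D models with precomputed grasps, then reuse the best match's grasps.
import numpy as np

def partial_shape_descriptor(points, n_bins=16):
    """Very rough global descriptor: histogram of pairwise distances,
    normalized by the cloud's bounding-sphere radius."""
    p = points - points.mean(axis=0)
    radius = np.linalg.norm(p, axis=1).max() + 1e-9
    idx = np.random.choice(len(p), size=min(len(p), 500), replace=False)
    q = p[idx] / radius
    d = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)
    hist, _ = np.histogram(d[np.triu_indices(len(q), k=1)], bins=n_bins, range=(0, 2))
    return hist / max(hist.sum(), 1)

def plan_grasps_for_partial_scan(scan_points, database):
    """database: list of dicts {'descriptor': ..., 'grasps': [...]} built offline
    from full 3D models (e.g. a Columbia-Grasp-Database-style collection)."""
    desc = partial_shape_descriptor(scan_points)
    dists = [np.linalg.norm(desc - entry["descriptor"]) for entry in database]
    best = database[int(np.argmin(dists))]
    # The matched model acts as proxy geometry; its stored grasps are candidates
    # that would still need pose/scale alignment and reachability checks.
    return best["grasps"]
```
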

120 citations


Cites background from "Vision for robotic object manipulat..."

  • ...The authors of [Kragic et al., 2005] proposed reverting to simple grasping heuristics for unknown objects since it is “likely to be that the shape of an object has to be determined in order to successfully grasp it....


  • ..., 2009] and [Kragic et al., 2005] which can find exactly known objects in the presence of strong occlusion....


References
Journal ArticleDOI
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form; these results provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing conditions.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing
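
A generic RANSAC loop, here fitting a 2D line to points contaminated with gross errors, illustrates the paradigm; the model, thresholds, and refinement step are illustrative choices, not the paper's LDP algorithms.

```python
# Generic RANSAC sketch: fit a 2D line to points containing gross outliers.
import numpy as np

def ransac_line(points, n_iters=500, inlier_tol=0.02, min_inliers=10, rng=None):
    rng = rng or np.random.default_rng()
    best_model, best_inliers = None, np.array([], dtype=int)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        direction = q - p
        norm = np.linalg.norm(direction)
        if norm < 1e-12:
            continue
        # Distance of every point to the line through the minimal sample (p, q).
        normal = np.array([-direction[1], direction[0]]) / norm
        dist = np.abs((points - p) @ normal)
        inliers = np.flatnonzero(dist < inlier_tol)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (p, q), inliers
    if len(best_inliers) < min_inliers:
        return None  # consensus set too small -> no reliable fit
    # Refinement: least-squares fit over the consensus set only.
    x, y = points[best_inliers, 0], points[best_inliers, 1]
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept, best_inliers
```
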

23,396 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
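
With a modern OpenCV build, the staged detection and nearest-neighbour matching described above can be approximated in a few lines; this uses OpenCV's SIFT implementation and Lowe's ratio test, not the original 1999 code.

```python
# Sketch: SIFT keypoints + nearest-neighbour matching with Lowe's ratio test.
import cv2

def match_sift(img_model, img_scene, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_model, None)
    kp2, des2 = sift.detectAndCompute(img_scene, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return kp1, kp2, good

# A verification step analogous to the paper's least-squares model fit could then
# estimate a geometric transform from the matches and keep only its inliers,
# e.g. with cv2.findHomography(..., cv2.RANSAC).
```
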

16,989 citations


"Vision for robotic object manipulat..." refers methods or result in this paper

  • ...Since the observed matching scores did not significantly differ from those already published in Lowe [24] and Mikolajczyk and Schmid [30] we have chosen not to include any additional quantitative results....


  • ...Two recognition modules are available for this purpose: (i) a feature based module based on Scale Invariant Feature Transform (SIFT) features Lowe [24], and (ii) an appearance based module using color histograms, Ekvall et al. [25]....


  • ...For a more thorough analysis on the SIFT recognition performance we refer to Lowe [24]....


Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
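
The "extraction and tracking of image features" step can be sketched with present-day OpenCV calls, corner detection followed by pyramidal Lucas-Kanade tracking; this is an illustration, not the Alvey project's implementation.

```python
# Sketch: detect corner features in one frame and track them into the next frame.
import cv2
import numpy as np

def track_corners(prev_gray, next_gray, max_corners=200):
    # Corner detection (Shi-Tomasi / Harris-style response).
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=7)
    if corners is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # Pyramidal Lucas-Kanade optical flow to follow the detected features.
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                   corners, None)
    ok = status.ravel() == 1
    return corners[ok].reshape(-1, 2), next_pts[ok].reshape(-1, 2)
```
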

13,993 citations

Journal ArticleDOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
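
A toy version of the center-surround idea, intensity channel only and fixed scales, gives a feel for how such a saliency map is assembled; it is far simpler than the full model described above.

```python
# Toy saliency sketch: multiscale center-surround differences on image intensity.
import cv2
import numpy as np

def intensity_saliency(gray, levels=6):
    gray = gray.astype(np.float32) / 255.0
    # Gaussian pyramid.
    pyr = [gray]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    h, w = gray.shape
    saliency = np.zeros((h, w), np.float32)
    # Center-surround: |finer scale - coarser scale|, accumulated at full resolution.
    for c in (1, 2):
        for delta in (2, 3):
            s = c + delta
            if s >= len(pyr):
                continue
            center = cv2.resize(pyr[c], (w, h), interpolation=cv2.INTER_LINEAR)
            surround = cv2.resize(pyr[s], (w, h), interpolation=cv2.INTER_LINEAR)
            saliency += np.abs(center - surround)
    saliency -= saliency.min()
    if saliency.max() > 0:
        saliency /= saliency.max()
    return saliency  # attend to locations in order of decreasing saliency
```
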

10,525 citations

Journal ArticleDOI
01 Oct 1996
TL;DR: This article provides a tutorial introduction to visual servo control of robotic manipulators by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process.
Abstract: This article provides a tutorial introduction to visual servo control of robotic manipulators. Since the topic spans many disciplines our goal is limited to providing a basic conceptual framework. We begin by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process. We then present a taxonomy of visual servo control systems. The two major classes of systems, position-based and image-based systems, are then discussed in detail. Since any visual servo system must be capable of tracking image features in a sequence of images, we also include an overview of feature-based and correlation-based methods for tracking. We conclude the tutorial with a number of observations on the current directions of the research field of visual servo control.
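
For the image-based class of systems discussed in the tutorial, the classical control law computes a camera twist from the feature error via the pseudo-inverse of the interaction matrix; a minimal sketch for point features with assumed known depths (illustrative, not the tutorial's code):

```python
# Sketch of image-based visual servoing (IBVS) for point features.
import numpy as np

def interaction_matrix(x, y, Z):
    """Interaction (image Jacobian) matrix of one normalized image point (x, y)
    at depth Z, relating its image velocity to the 6-DoF camera twist."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(features, desired, depths, gain=0.5):
    """Camera twist [vx, vy, vz, wx, wy, wz] driving features toward desired."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    error = (np.asarray(features) - np.asarray(desired)).ravel()
    return -gain * np.linalg.pinv(L) @ error
```
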

3,619 citations


"Vision for robotic object manipulat..." refers background in this paper

  • ...Our current research considers the problem of mobile manipulation in domestic settings where, in order for the robot to be able to detect and manipulate objects in the environment, robust visual feedback is of key importance....
