Journal ArticleDOI

Yale-CMU-Berkeley dataset for robotic manipulation research

TL;DR: An image and model dataset of the real-life objects from the Yale-CMU-Berkeley Object Set, which is specifically designed for benchmarking in manipulation research, is presented.
Abstract: In this paper, we present an image and model dataset of the real-life objects from the Yale-CMU-Berkeley Object Set, which is specifically designed for benchmarking in manipulation research. For ea...
Citations
Proceedings ArticleDOI
01 Jan 2019
TL;DR: This paper introduces ScanObjectNN, a new real-world point cloud object dataset based on scanned indoor scene data, and proposes new point cloud classification neural networks that achieve state-of-the-art performance on classifying objects with cluttered background.
Abstract: Deep learning techniques for point cloud data have demonstrated great potentials in solving classical problems in 3D computer vision such as 3D object classification and segmentation. Several recent 3D object classification methods have reported state-of-the-art performance on CAD model datasets such as ModelNet40 with high accuracy (~92%). Despite such impressive results, in this paper, we argue that object classification is still a challenging task when objects are framed with real-world settings. To prove this, we introduce ScanObjectNN, a new real-world point cloud object dataset based on scanned indoor scene data. From our comprehensive benchmark, we show that our dataset poses great challenges to existing point cloud classification techniques as objects from real-world scans are often cluttered with background and/or are partial due to occlusions. We identify three key open problems for point cloud object classification, and propose new point cloud classification neural networks that achieve state-of-the-art performance on classifying objects with cluttered background. Our dataset and code are publicly available in our project page https://hkust-vgd.github.io/scanobjectnn/.

413 citations


Cites background from "Yale-CMU-Berkeley dataset for robot..."

  • ...Some datasets [36, 5] are captured in controlled environment which might greatly differ from real-world scenes....

  • ...Prior to our work, there are also a few datasets of real-world object scans [10, 8, 5] but most are small in scale and are not suitable for training object classification networks, which often have thousands of parameters....

Proceedings ArticleDOI
16 Jun 2019
TL;DR: This paper introduces a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations and uses a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences.
Abstract: The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions. In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state-of-the-art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.
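
The last stage this abstract describes, turning a confidence-filtered set of 3D-to-2D keypoint correspondences into a pose, can be sketched with OpenCV's RANSAC-PnP solver. This is a minimal illustration under assumed inputs (the array names, confidence threshold and reprojection error below are placeholders), not the paper's implementation:

import numpy as np
import cv2

def pose_from_keypoints(points_3d, points_2d, confidences, K, conf_thresh=0.5):
    """Estimate a 6D pose from confidence-weighted 2D keypoint predictions.

    points_3d   : (N, 3) keypoint coordinates in the object frame.
    points_2d   : (N, 2) predicted image locations of those keypoints.
    confidences : (N,) predicted confidence of each correspondence.
    K           : (3, 3) camera intrinsic matrix.
    """
    # Keep only the correspondences the network is confident about.
    keep = confidences > conf_thresh
    obj = points_3d[keep].astype(np.float64)
    img = points_2d[keep].astype(np.float64)

    # RANSAC-PnP turns the surviving 3D-to-2D matches into a rotation
    # (Rodrigues vector) and a translation while rejecting outliers.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, distCoeffs=None, reprojectionError=3.0)
    if not ok:
        return None

    R, _ = cv2.Rodrigues(rvec)        # 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T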

253 citations


Cites background from "Yale-CMU-Berkeley dataset for robot..."

  • ...It comprises 21 objects taken from the YCB dataset [5, 4], which are of diverse sizes and with different degrees of texture....


Book ChapterDOI
08 Sep 2018
TL;DR: A novel method for robust and accurate 3D object pose estimation from a single color image under large occlusions, which predicts keypoint heatmaps from multiple small patches independently, accumulates them, and computes the 3D pose from the resulting 2D-3D correspondences using a geometric method.
Abstract: We introduce a novel method for robust and accurate 3D object pose estimation from a single color image under large occlusions. Following recent approaches, we first predict the 2D projections of 3D points related to the target object and then compute the 3D pose from these correspondences using a geometric method. Unfortunately, as the results of our experiments show, predicting these 2D projections using a regular CNN or a Convolutional Pose Machine is highly sensitive to partial occlusions, even when these methods are trained with partially occluded examples. Our solution is to predict heatmaps from multiple small patches independently and to accumulate the results to obtain accurate and robust predictions. Training subsequently becomes challenging because patches with similar appearances but different positions on the object correspond to different heatmaps. However, we provide a simple yet effective solution to deal with such ambiguities. We show that our approach outperforms existing methods on two challenging datasets: The Occluded LineMOD dataset and the YCB-Video dataset, both exhibiting cluttered scenes with highly occluded objects.
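
The accumulation idea in this abstract, fusing many independently predicted per-patch heatmaps into robust keypoint locations, can be sketched in a few lines of NumPy. The shapes and names below are assumptions for illustration, not the authors' code:

import numpy as np

def accumulate_keypoints(patch_heatmaps, patch_offsets, image_shape, num_kp):
    """Fuse per-patch heatmap predictions into 2D keypoint estimates.

    patch_heatmaps : list of (num_kp, h, w) heatmaps, one per image patch.
    patch_offsets  : list of (row, col) top-left corners of those patches.
    image_shape    : (H, W) of the full image.
    """
    H, W = image_shape
    acc = np.zeros((num_kp, H, W), dtype=np.float32)

    # Each patch votes for the keypoint locations; summing the votes
    # averages out patches corrupted by occlusion.
    for hm, (r, c) in zip(patch_heatmaps, patch_offsets):
        _, h, w = hm.shape
        acc[:, r:r + h, c:c + w] += hm

    # The fused estimate for each keypoint is the location with most votes.
    keypoints = np.zeros((num_kp, 2), dtype=np.int64)
    for i in range(num_kp):
        keypoints[i] = np.unravel_index(np.argmax(acc[i]), (H, W))
    return keypoints  # (row, col) per keypoint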

244 citations


Cites background from "Yale-CMU-Berkeley dataset for robot..."

  • ...There are 21 objects in the dataset, which are taken from the YCB dataset [35] and are publicly available for purchase....


Proceedings ArticleDOI
01 Oct 2018
TL;DR: MaskFusion, as discussed by the authors, is a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene.
Abstract: We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera. As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map, unlike recent semantics-enabled SLAM systems that perform voxel-level semantic segmentation. We show augmented-reality applications that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic. Code will be made available.

234 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work contributes a large-scale grasp pose detection dataset with a unified evaluation system and proposes an end-to-end grasp pose prediction network given point cloud inputs, where the network learns approaching direction and operation parameters in a decoupled manner.
Abstract: Object grasping is critical for many applications and is also a challenging computer vision problem. However, for cluttered scenes, current research suffers from insufficient training data and a lack of evaluation benchmarks. In this work, we contribute a large-scale grasp pose detection dataset with a unified evaluation system. Our dataset contains 97,280 RGB-D images with over one billion grasp poses. Meanwhile, our evaluation system directly reports whether a grasp is successful by analytic computation, which makes it able to evaluate any kind of grasp pose without exhaustively labeling ground truth. In addition, we propose an end-to-end grasp pose prediction network given point cloud inputs, where we learn the approaching direction and operation parameters in a decoupled manner. A novel grasp affinity field is also designed to improve grasping robustness. We conduct extensive experiments to show that our dataset and evaluation system align well with real-world experiments and that our proposed network achieves state-of-the-art performance. Our dataset, source code and models are publicly available at www.graspnet.net.
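
To make the decoupled parameterization mentioned above concrete, the sketch below composes a 6-DoF gripper pose from a grasp point, an approaching direction, an in-plane rotation angle and a depth. The conventions and names here are assumptions chosen for illustration and differ in detail from the GraspNet definition:

import numpy as np

def grasp_to_pose(point, approach, angle, depth):
    """Compose a homogeneous gripper pose from decoupled grasp parameters.

    point    : (3,) grasp center on the object.
    approach : (3,) approaching direction of the gripper.
    angle    : in-plane rotation about the approach axis, in radians.
    depth    : advance of the gripper along the approach axis.
    """
    z = approach / np.linalg.norm(approach)       # gripper z-axis
    # Pick any vector not parallel to z to complete an orthonormal frame.
    helper = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(helper, z)) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)

    # Apply the in-plane rotation to the x and y axes about z.
    c, s = np.cos(angle), np.sin(angle)
    R = np.stack([c * x + s * y, -s * x + c * y, z], axis=1)

    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = point + depth * z                  # advance along approach
    return T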

234 citations


Cites methods from "Yale-CMU-Berkeley dataset for robot..."

  • ...We select 32 objects that are suitable for grasping from the YCB dataset [6], 13 adversarial objects from DexNet 2....


References
Proceedings Article
01 Jan 2009
TL;DR: This paper discusses how ROS relates to existing robot software frameworks, and briefly overviews some of the available application software which uses ROS.
Abstract: This paper gives an overview of ROS, an open-source robot operating system. ROS is not an operating system in the traditional sense of process management and scheduling; rather, it provides a structured communications layer above the host operating systems of a heterogeneous compute cluster. In this paper, we discuss how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.

8,387 citations


"Yale-CMU-Berkeley dataset for robot..." refers methods in this paper

  • ...Together with the dataset, Python scripts and a Robot Operating System node are provided to download the data, generate point clouds and create Unified Robot Description Files....

  • ...In addition, a Robot Operating System (ROS; Quigley et al., 2009) node is available at YCB-Benchmarks (2016a) to manage the data and generate Unified Robot Description Files (URDFs) of the mesh models for easy integration to software platforms such as Gazebo (Koenig and Howard, 2004) and MoveIt (Chitta et al., 2012)....
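
The excerpts above mention Python scripts and a ROS node that wrap the YCB mesh models in Unified Robot Description Files (URDFs) so they can be loaded by Gazebo or MoveIt. As a rough sketch of what such a wrapper involves, the template, mesh path, mass and inertia values below are placeholders, not the output of the official YCB-Benchmarks tools:

# Minimal single-link URDF wrapper around a mesh model (illustrative only).
URDF_TEMPLATE = """<robot name="{name}">
  <link name="{name}_link">
    <visual>
      <geometry><mesh filename="{mesh}"/></geometry>
    </visual>
    <collision>
      <geometry><mesh filename="{mesh}"/></geometry>
    </collision>
    <inertial>
      <mass value="{mass}"/>
      <inertia ixx="1e-4" ixy="0" ixz="0" iyy="1e-4" iyz="0" izz="1e-4"/>
    </inertial>
  </link>
</robot>
"""

def write_urdf(name, mesh_path, mass, out_path):
    with open(out_path, "w") as f:
        f.write(URDF_TEMPLATE.format(name=name, mesh=mesh_path, mass=mass))

# Hypothetical usage: wrap one object mesh for spawning in Gazebo.
write_urdf("cracker_box", "meshes/cracker_box/textured.obj", 0.4, "cracker_box.urdf")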

Proceedings ArticleDOI
01 Aug 1996
TL;DR: This paper presents a volumetric method for integrating range images that is able to integrate a large number of range images yielding seamless, high-detail models of up to 2.6 million triangles.
Abstract: A number of techniques have been developed for reconstructing surfaces by integrating groups of aligned range images. A desirable set of properties for such algorithms includes: incremental updating, representation of directional uncertainty, the ability to fill gaps in the reconstruction, and robustness in the presence of outliers. Prior algorithms possess subsets of these properties. In this paper, we present a volumetric method for integrating range images that possesses all of these properties. Our volumetric representation consists of a cumulative weighted signed distance function. Working with one range image at a time, we first scan-convert it to a distance function, then combine this with the data already acquired using a simple additive scheme. To achieve space efficiency, we employ a run-length encoding of the volume. To achieve time efficiency, we resample the range image to align with the voxel grid and traverse the range and voxel scanlines synchronously. We generate the final manifold by extracting an isosurface from the volumetric grid. We show that under certain assumptions, this isosurface is optimal in the least squares sense. To fill gaps in the model, we tessellate over the boundaries between regions seen to be empty and regions never observed. Using this method, we are able to integrate a large number of range images (as many as 70) yielding seamless, high-detail models of up to 2.6 million triangles.
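
The core operation this abstract describes, a cumulative weighted signed distance function built one range image at a time, reduces to a per-voxel weighted running average. The NumPy sketch below shows that update in isolation, under assumed array conventions; the paper's run-length encoding, synchronized scanline traversal and gap filling are omitted, and the final mesh is the zero isosurface of D:

import numpy as np

def integrate_range_image(D, W, d_new, w_new, trunc=0.01):
    """One step of the cumulative weighted signed-distance update.

    D, W         : running per-voxel distance and weight volumes.
    d_new, w_new : signed distance and weight contributed by the new range
                   image (w_new is zero where the image observes nothing).
    trunc        : truncation band around the surface, in the units of D.
    """
    d_new = np.clip(d_new, -trunc, trunc)    # truncated signed distance
    W_next = W + w_new
    # Weighted running average; unobserved voxels keep their old value.
    D_next = np.where(W_next > 0,
                      (W * D + w_new * d_new) / np.maximum(W_next, 1e-9),
                      D)
    return D_next, W_next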

3,282 citations


"Yale-CMU-Berkeley dataset for robot..." refers methods in this paper

  • ...While the Poisson method provides watertight meshes, the TSDF models are not guaranteed to be watertight....

  • ...Note that both the Poisson and TSDF methods fail on objects with missing depth data due to the transparent or reflective regions: for objects 22, 30, 31, 32, 38, 39, 42, 43 and 44, the mesh models are partially distorted and for objects 23 and 28 no meaningful model could be generated with the adopted methods....

  • ...Two kinds of textured mesh models are generated using these data by utilizing Poisson reconstruction (Kazhdan et al., 2006) and Truncated Signed Distance Function (TSDF) (Curless and Levoy, 1996) techniques....

  • ...The ‘berkeley_processed’ file contains the following: a point cloud in .ply extension obtained by merging the data acquired from all the viewpoints; Poisson meshes; TSDF meshes....

  • ...Two sets of textured mesh models are obtained using Poisson reconstruction (Kazhdan et al., 2006) and TSDF (Bylow et al., 2013) methods....

Proceedings ArticleDOI
01 Sep 2004
TL;DR: Gazebo is designed to fill this niche by creating a 3D dynamic multi-robot environment capable of recreating the complex worlds that would be encountered by the next generation of mobile robots.
Abstract: Simulators have played a critical role in robotics research as tools for quick and efficient testing of new concepts, strategies, and algorithms. To date, most simulators have been restricted to 2D worlds, and few have matured to the point where they are both highly capable and easily adaptable. Gazebo is designed to fill this niche by creating a 3D dynamic multi-robot environment capable of recreating the complex worlds that would be encountered by the next generation of mobile robots. Its open source status, fine grained control, and high fidelity place Gazebo in a unique position to become more than just a stepping stone between the drawing board and real hardware: data visualization, simulation of remote environments, and even reverse engineering of blackbox systems are all possible applications. Gazebo is developed in cooperation with the Player and Stage projects (Gerkey, B. P., et al., July 2003), (Gerkey, B. P., et al., May 2001), (Vaughan, R. T., et al., Oct. 2003), and is available from http://playerstage.sourceforge.net/gazebo/gazebo.html.

2,824 citations


"Yale-CMU-Berkeley dataset for robot..." refers methods in this paper

  • ...In addition, a Robot Operating System (ROS; Quigley et al., 2009) node is available at YCB-Benchmarks (2016a) to manage the data and generate Unified Robot Description Files (URDFs) of the mesh models for easy integration to software platforms such as Gazebo (Koenig and Howard, 2004) and MoveIt (Chitta et al., 2012)....

Proceedings ArticleDOI
26 Jun 2006
TL;DR: A spatially adaptive multiscale algorithm for Poisson surface reconstruction whose time and space complexities are proportional to the size of the reconstructed model and whose solution reduces to a well-conditioned sparse linear system.
Abstract: We show that surface reconstruction from oriented points can be cast as a spatial Poisson problem. This Poisson formulation considers all the points at once, without resorting to heuristic spatial partitioning or blending, and is therefore highly resilient to data noise. Unlike radial basis function schemes, our Poisson approach allows a hierarchy of locally supported basis functions, and therefore the solution reduces to a well conditioned sparse linear system. We describe a spatially adaptive multiscale algorithm whose time and space complexities are proportional to the size of the reconstructed model. Experimenting with publicly available scan data, we demonstrate reconstruction of surfaces with greater detail than previously achievable.
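
The formulation in this abstract can be condensed into one relation: the indicator function χ of the solid is chosen so that its gradient matches the vector field V defined by the oriented point normals, and the least-squares version of that matching is a Poisson problem (the notation below paraphrases the paper):

\min_{\chi} \left\lVert \nabla \chi - \vec{V} \right\rVert^{2}
\quad\Longrightarrow\quad
\Delta \chi = \nabla \cdot \vec{V}

The reconstructed surface is then extracted as an isosurface of χ, and discretizing the Laplacian over the locally supported basis functions yields the well-conditioned sparse linear system the abstract refers to.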

2,712 citations


"Yale-CMU-Berkeley dataset for robot..." refers methods in this paper

  • ...Two kinds of textured mesh models are generated using these data by utilizing Poisson reconstruction (Kazhdan et al., 2006) and Truncated Signed Distance Function (TSDF) (Curless and Levoy, 1996) techniques....

  • ...Two sets of textured mesh models are obtained using Poisson reconstruction (Kazhdan et al., 2006) and TSDF (Bylow et al., 2013) methods....

Proceedings ArticleDOI
18 Jun 2003
TL;DR: A method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light that does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors.
Abstract: Progress in stereo algorithm performance is quickly outpacing the ability of existing stereo data sets to discriminate among the best-performing algorithms, motivating the need for more challenging scenes with accurate ground truth information. This paper describes a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Unlike traditional range-sensing approaches, our method does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors. We present new stereo data sets acquired with our method and demonstrate their suitability for stereo algorithm evaluation. Our results are available at http://www.middlebury.edu/stereo/.
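
A common way to obtain the pixel-accurate correspondences such structured-light systems rely on is to project a sequence of binary Gray-code stripe patterns and decode, at every camera pixel, which projector column illuminated it. The NumPy sketch below shows only that generic decoding step, not the Middlebury pipeline, which additionally avoids calibrating the light sources and registers disparities between all camera and projector pairs:

import numpy as np

def decode_gray_code(images, thresholds):
    """Decode a stack of binary stripe images into per-pixel projector columns.

    images     : (num_bits, H, W) camera images, most significant stripe first.
    thresholds : (H, W) per-pixel threshold (e.g. mean of all-on/all-off frames).
    """
    bits = (images > thresholds).astype(np.uint32)   # (num_bits, H, W)

    # Gray code -> binary: each binary bit is the XOR of the previous binary
    # bit with the current Gray bit.
    binary = np.zeros_like(bits)
    binary[0] = bits[0]
    for i in range(1, bits.shape[0]):
        binary[i] = binary[i - 1] ^ bits[i]

    # Pack the binary bits into a projector column index per pixel.
    columns = np.zeros(images.shape[1:], dtype=np.uint32)
    for b in binary:
        columns = (columns << 1) | b
    return columns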

1,840 citations


"Yale-CMU-Berkeley dataset for robot..." refers background in this paper

  • ...Each scanhead is a custom structured light (Scharstein and Szeliski, 2003) capture unit, consisting of a consumer digital light processing (DLP) projector, two monochrome cameras in a stereo pair and a color camera to capture fine detail texture/color information (in total three cameras per scanhead)....
