Multimodal blending for high-accuracy instance recognition
Ziang Xie, Arjun Singh, Justin Uang, Karthik S. Narayan, Pieter Abbeel
- pp 2214-2221
TLDR
This work examines the benefits of dense feature extraction and multimodal features for improving the accuracy and robustness of an instance recognition system and obtains significant improvements over previously published results on two RGB-D datasets.
Abstract:
Despite the rich information provided by sensors such as the Microsoft Kinect in the robotic perception setting, the problem of detecting object instances remains unsolved, even in the tabletop setting, where segmentation is greatly simplified. Existing object detection systems often focus on textured objects, for which local feature descriptors can be used to reliably obtain correspondences between different views of the same object. We examine the benefits of dense feature extraction and multimodal features for improving the accuracy and robustness of an instance recognition system. By combining multiple modalities and blending their scores through an ensemble-based method in order to generate our final object hypotheses, we obtain significant improvements over previously published results on two RGB-D datasets. On the Challenge dataset, our method results in only one missed detection (achieving 100% precision and 99.77% recall). On the Willow dataset, we also make significant gains on the prior state of the art (achieving 98.28% precision and 87.78% recall), resulting in an increase in F-score from 0.8092 to 0.9273.
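The reported F-score can be checked against the precision and recall figures quoted in the abstract. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not name the variant, but the Willow numbers are consistent with that reading. The function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the paper's own
    definition is not given in the abstract.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
willow = f_score(0.9828, 0.8778)
challenge = f_score(1.0, 0.9977)

print(f"Willow F-score:    {willow:.4f}")     # 0.9273, matching the abstract
print(f"Challenge F-score: {challenge:.4f}")  # 0.9988
```

Under this assumption the Willow precision and recall reproduce the stated 0.9273. The abstract gives no F-score for the Challenge dataset, so the second value is only what the same formula yields.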
Citations
Proceedings Article
Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics
Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu Liu, Juan Aparicio, Ken Goldberg
TL;DR: Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8 sec with a success rate of 93% on eight known objects with adversarial geometry.
Posted Content
Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics
Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu Liu, Juan Aparicio Ojea, Ken Goldberg
TL;DR: In this article, a grasp quality convolutional neural network (GQ-CNN) is trained from a synthetic dataset of 6.7 million point clouds, grasps and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table.
Proceedings Article
6-DoF object pose from semantic keypoints
TL;DR: In this paper, the authors combine semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model to estimate the continuous 6-DoF pose of an object from a single RGB image.
Proceedings Article
BigBIRD: A large-scale 3D database of object instances
TL;DR: A high-quality, large-scale dataset of 3D object instances, with accurate calibration information for every image, is presented, anticipating that “solving” this dataset will effectively remove many perception-related problems for mobile, sensing-based robots.
Proceedings Article
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects
Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis I. A. Lourakis, Xenophon Zabulis
TL;DR: T-LESS is a dataset for estimating the 6D pose of texture-less rigid objects. The objects have no significant texture and no discriminative color or reflectance properties, and some of the objects are parts of others.
References
Journal Article
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings Article
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal Article
Bagging predictors
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Journal Article
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Yoav Freund, Robert E. Schapire
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.