Multimodal blending for high-accuracy instance recognition

doi:10.1109/IROS.2013.6696666

Open Access, Proceedings Article

- pp 2214-2221

TLDR

This work examines the benefits of dense feature extraction and multimodal features for improving the accuracy and robustness of an instance recognition system and obtains significant improvements over previously published results on two RGB-D datasets.

Abstract:

Despite the rich information provided by sensors such as the Microsoft Kinect in the robotic perception setting, the problem of detecting object instances remains unsolved, even in the tabletop setting, where segmentation is greatly simplified. Existing object detection systems often focus on textured objects, for which local feature descriptors can be used to reliably obtain correspondences between different views of the same object. We examine the benefits of dense feature extraction and multimodal features for improving the accuracy and robustness of an instance recognition system. By combining multiple modalities and blending their scores through an ensemble-based method in order to generate our final object hypotheses, we obtain significant improvements over previously published results on two RGB-D datasets. On the Challenge dataset, our method results in only one missed detection (achieving 100% precision and 99.77% recall). On the Willow dataset, we also make significant gains on the prior state of the art (achieving 98.28% precision and 87.78% recall), resulting in an increase in F-score from 0.8092 to 0.9273.
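The reported Willow F-score can be checked against the precision and recall quoted above. The sketch below assumes the F-score is the standard balanced F1 (the harmonic mean of precision and recall); the abstract does not state which variant is used, and the function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, written as fractions.
willow = f1_score(0.9828, 0.8778)
challenge = f1_score(1.0, 0.9977)

print(f"Willow F1:    {willow:.4f}")
print(f"Challenge F1: {challenge:.4f}")
```

Under this assumption the Willow figures reproduce the reported F-score of 0.9273. Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why the Willow score sits closer to the 87.78% recall than a simple average would.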

Citations

Proceedings Article

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

Jeffrey Mahler, +7 more

TL;DR: Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8 sec with a success rate of 93% on eight known objects with adversarial geometry.

Posted Content

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

Jeffrey Mahler, +7 more

- 27 Mar 2017 -

arXiv: Robotics

TL;DR: In this article, a grasp quality convolutional neural network (GQ-CNN) is trained from a synthetic dataset of 6.7 million point clouds, grasps and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table.

Proceedings Article

6-DoF object pose from semantic keypoints

Georgios Pavlakos, +4 more

TL;DR: In this paper, the authors combine semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model to estimate the continuous 6-DoF pose of an object from a single RGB image.

Proceedings Article

BigBIRD: A large-scale 3D database of object instances

Arjun Singh, +4 more

TL;DR: A high-quality, large-scale dataset of 3D object instances, with accurate calibration information for every image, is presented, anticipating that “solving” this dataset will effectively remove many perception-related problems for mobile, sensing-based robots.

Proceedings Article

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects

Tomas Hodan, +5 more

TL;DR: T-LESS is a dataset for estimating the 6D pose of texture-less rigid objects, that is, objects with no significant texture and no discriminative color or reflectance properties; some of the objects are parts of others.

References

Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe

- 01 Nov 2004 -

International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Proceedings Article

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

Journal Article

Bagging predictors

Leo Breiman

TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.

Journal Article

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Yoav Freund, +1 more

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

Distinctive Image Features from Scale-Invariant Keypoints

Matthijs Dorst

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
