Open Access, Proceedings Article

Multimodal blending for high-accuracy instance recognition

TLDR
This work examines the benefits of dense feature extraction and multimodal features for improving the accuracy and robustness of an instance recognition system and obtains significant improvements over previously published results on two RGB-D datasets.
Abstract
Despite the rich information provided by sensors such as the Microsoft Kinect in the robotic perception setting, the problem of detecting object instances remains unsolved, even in the tabletop setting, where segmentation is greatly simplified. Existing object detection systems often focus on textured objects, for which local feature descriptors can be used to reliably obtain correspondences between different views of the same object. We examine the benefits of dense feature extraction and multimodal features for improving the accuracy and robustness of an instance recognition system. By combining multiple modalities and blending their scores through an ensemble-based method in order to generate our final object hypotheses, we obtain significant improvements over previously published results on two RGB-D datasets. On the Challenge dataset, our method results in only one missed detection (achieving 100% precision and 99.77% recall). On the Willow dataset, we also make significant gains on the prior state of the art (achieving 98.28% precision and 87.78% recall), resulting in an increase in F-score from 0.8092 to 0.9273.
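The F-score quoted in the abstract can be checked against the reported precision and recall. The sketch below assumes the F-score is the standard F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant is used, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); assumed, not confirmed by the abstract."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Willow dataset figures from the abstract: 98.28% precision, 87.78% recall.
willow = f_score(0.9828, 0.8778)
print(f"Willow F-score: {willow:.4f}")  # approximately 0.9273

# Challenge dataset figures from the abstract: 100% precision, 99.77% recall.
challenge = f_score(1.0, 0.9977)
print(f"Challenge F-score: {challenge:.4f}")
```

Under that assumption, the Willow figures reproduce the reported F-score of 0.9273.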



Citations
Proceedings Article

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

TL;DR: Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8sec with a success rate of 93% on eight known objects with adversarial geometry.
Posted Content

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

TL;DR: In this article, a grasp quality convolutional neural network (GQ-CNN) is trained from a synthetic dataset of 6.7 million point clouds, grasps and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table.
Proceedings Article

6-DoF object pose from semantic keypoints

TL;DR: In this paper, the authors combine semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model to estimate the continuous 6-DoF pose of an object from a single RGB image.
Proceedings Article

BigBIRD: A large-scale 3D database of object instances

TL;DR: A high-quality, large-scale dataset of 3D object instances, with accurate calibration information for every image, is presented, anticipating that “solving” this dataset will effectively remove many perception-related problems for mobile, sensing-based robots.
Proceedings Article

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects

TL;DR: T-LESS is a dataset for estimating the 6D pose of texture-less rigid objects, which have no significant texture and no discriminative color or reflectance properties; some of the objects are parts of others.
References
Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings Article

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal Article

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Journal Article

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.