REIN - A fast, robust, scalable REcognition INfrastructure
Citations
Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes
Gradient Response Maps for Real-Time Detection of Textureless Objects
6-DoF object pose from semantic keypoints
CAD-model recognition and 6DOF pose estimation using 3D cues
A Dataset for Improved RGBD-Based Object Detection and Pose Estimation for Warehouse Pick-and-Place
References
Histograms of oriented gradients for human detection
"GrabCut": interactive foreground extraction using iterated graph cuts
Fast Point Feature Histograms (FPFH) for 3D registration
Fast approximate nearest neighbors with automatic algorithm configuration
A discriminatively trained, multiscale, deformable part model
Frequently Asked Questions (22)
Q2. What future work do the authors mention in the paper "Rein - a fast, robust, scalable recognition infrastructure"?
In the near future the authors will add feature detector-descriptor techniques for textured objects, where a bag-of-words technique is used to propose objects and geometric RANSAC with a 3D feature point model is used to verify (“dispose”) the proposed recognition while computing the 6DOF object pose, similar to the work described in [7], which they call “TOD” for Textured Object Detector. The authors are also currently working on object picking tasks, with a goal to scale to 1000 objects or more. They plan to extend and use ReIn as their main recognition framework, this time using a cross-validated voting of classifiers to handle this span of textured, transparent and untextured items.
Q3. What is the object recognition and pose strategy?
Their object recognition and object pose strategy is to use a fast 2D classifier set at a low recognition threshold to rapidly over-detect objects in order to minimize mis-detections.
Q4. What is the way to remove spurious gradients?
As the third step, a 3x3 filter is run to remove spurious gradients: it eliminates binarized gradient directions that appear only once in a given 3x3 region.
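One way to read this filter is sketched below in Python with NumPy. The encoding is an assumption, not taken from the source: each pixel is a `uint8` with one bit per quantized gradient direction, and a direction bit survives only if at least one other pixel in the surrounding 3x3 window carries the same bit. The function name `remove_spurious` is hypothetical.

```python
import numpy as np

def remove_spurious(binarized: np.ndarray) -> np.ndarray:
    """Drop direction bits that occur only once in the 3x3 window around a pixel.

    Assumes `binarized` is a 2D uint8 array with one bit per gradient direction.
    """
    h, w = binarized.shape
    padded = np.pad(binarized, 1, mode="constant")  # zero border: no directions
    out = np.zeros_like(binarized)
    for bit in range(8):
        mask = ((padded >> bit) & 1).astype(np.uint8)
        # Count how many pixels in each 3x3 window carry this direction bit.
        count = np.zeros((h, w), dtype=np.uint8)
        for dy in range(3):
            for dx in range(3):
                count += mask[dy:dy + h, dx:dx + w]
        centre = mask[1:1 + h, 1:1 + w]
        # Keep the bit only if the pixel has it and a neighbour shares it.
        keep = (centre == 1) & (count >= 2)
        out |= keep.astype(np.uint8) << bit
    return out
```

With this reading, an isolated direction is zeroed while two adjacent pixels sharing a direction both keep it.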
Q5. Why does BiGG use only grayscale gradients?
Because BiGG captures gradients in their context, it can take advantage of the interior texture where it can find it but can also recognize textureless and even transparent objects just from their outer contour.
Q6. What is the common format for storing and accessing ROS messages?
A “bag file” is the common format for storing and accessing ROS messages in an efficient way. (PR2, Personal Robot 2, is a robotic platform developed by Willow Garage: http://www.willowgarage.com.) To develop a rapid object detector for their object proposal stage, the authors drew on ideas from the HoG detector [9], which is essentially a grid of gradient histograms.
Q7. How many datasets were used in this example?
Please note that in this example, the query cluster was part of the “training” data – which consisted of 2720 datasets representing different objects in various poses – meaning that the distance from the query to itself should be 0.
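The zero self-distance can be illustrated with a brute-force nearest-neighbor lookup. This is only a sketch: the Euclidean metric, the toy 2D descriptors, and the name `nearest_neighbors` are assumptions made for illustration, since this excerpt does not state which metric or index structure the authors used.

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, training: np.ndarray, k: int = 3):
    """Return indices and distances of the k training rows closest to `query`.

    Brute-force Euclidean search, used here as a stand-in for the real index.
    """
    dists = np.linalg.norm(training - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Toy "training" descriptors; the query is one of them, so its best match
# is itself at distance 0.
training = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
indices, distances = nearest_neighbors(training[1], training, k=2)
```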
Q8. What is the meta locality of the descriptor?
The meta locality of the descriptor comes from the fact that it is usually applied to a cluster of 3D points that contains the object to be recognized with a high probability.
Q9. How did the authors evaluate the effectiveness of the ReIn data passing architecture?
To evaluate the effectiveness of the ReIn data passing architecture, the authors built a simple attention operator that receives a 3D point cloud from the stereo camera at full 640x480, and returns the entire image as a ROI/mask.
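A pass-through attention operator of this kind can be sketched in a few lines. The array layout (an organized point cloud of shape height x width x 3) and the name `full_image_attention` are assumptions for illustration; the actual ReIn operator works on ROS messages, which are not modeled here.

```python
import numpy as np

def full_image_attention(point_cloud: np.ndarray) -> np.ndarray:
    """Return a boolean mask that marks the entire image as the region of interest.

    Assumes an organized point cloud with shape (height, width, 3).
    """
    height, width = point_cloud.shape[:2]
    return np.ones((height, width), dtype=bool)

# A 640x480 cloud yields a mask covering every pixel.
mask = full_image_attention(np.zeros((480, 640, 3)))
```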
Q10. What is the advantage of a plugin system?
The plugin system allows for great flexibility, making it possible for different algorithms to be loaded as part of the same process, as part of different processes, or even on different machines (on a compute cluster, for example).
Q11. How can BiGG matching of the summary image patch against the template be parallelized?
If cleanly written, BiGG can be quite fast and can take advantage of SSE or CUDA instructions to parallelize matching via parallel AND’ing of the summary image patch with the template.
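The AND-based matching can be sketched with NumPy, whose vectorized `bitwise_and` plays the role that SSE or CUDA instructions would play in an optimized implementation. The scoring rule below (fraction of non-empty template cells that share at least one direction bit with the patch) and the name `match_score` are assumptions for illustration, not the authors' exact score.

```python
import numpy as np

def match_score(patch: np.ndarray, template: np.ndarray) -> float:
    """Fraction of non-empty template cells that overlap the patch in some direction bit.

    Both arrays are assumed to be uint8 bitmasks of the same shape.
    """
    hits = np.bitwise_and(patch, template) != 0  # element-wise AND in one call
    active = template != 0
    n_active = int(active.sum())
    return float(hits.sum()) / n_active if n_active else 0.0

template = np.array([[1, 2], [0, 4]], dtype=np.uint8)
patch = np.array([[3, 4], [7, 4]], dtype=np.uint8)
score = match_score(patch, template)  # two of three active cells match
```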
Q12. What is the advantage of wrapping existing algorithms in a distributed message passing architecture?
Since ReIn is built on a distributed message passing architecture (that will take advantage of shared memory where available to avoid copying data), it is simple to configure the “roles” that classifiers will take.
Q13. Which BiGGPy parameters do the authors tune in practice?
Although there are a fair number of parameters in BiGGPy such as pyramid levels, pyramid blur, gradient magnitudes etc., in practice the authors use the default values mentioned above which have performed well and mainly just tune the top threshold value and how fast it decays through lower pyramid levels.
Q14. How are the detection thresholds of the first object detector configured?
The first object detector (BiGG) is configured with low detection thresholds to obtain high recall at the cost of many false positive detections (see figure 15), detections which are then filtered by the second object detector (VFH).
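The two-stage arrangement can be sketched as a simple cascade: a permissive first stage that keeps anything above a low threshold, followed by a stricter verifier. Everything named here (`propose_and_dispose`, the `(label, score)` tuples, the `verify` callback standing in for the second detector) is hypothetical and only illustrates the control flow.

```python
from typing import Callable, List, Tuple

Detection = Tuple[str, float]  # (label, score) from the fast first-stage detector

def propose_and_dispose(
    candidates: List[Detection],
    proposal_threshold: float,
    verify: Callable[[Detection], bool],
) -> List[Detection]:
    """Keep candidates above a low threshold, then drop those the verifier rejects."""
    # Stage 1: low threshold, so recall is high but false positives are common.
    proposals = [d for d in candidates if d[1] >= proposal_threshold]
    # Stage 2: the second detector filters out the false positives.
    return [d for d in proposals if verify(d)]

candidates = [("cup", 0.9), ("plate", 0.35), ("clutter", 0.1)]
confirmed_labels = {"cup"}  # pretend the second stage only confirms the cup
kept = propose_and_dispose(candidates, 0.3, lambda d: d[0] in confirmed_labels)
```

Lowering `proposal_threshold` lets more candidates through to the second stage, which trades extra verification work for fewer missed objects.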
Q15. What did the authors assume about how the objects of interest are supported?
In their previous work, the authors assumed that the objects of interest are supported by horizontal planes, and used segmentation and clustering techniques to extract individual objects as separate clusters.
Q16. What is the advantage of wrapping existing algorithms in their infrastructure?
An additional advantage of wrapping existing algorithms in their infrastructure is the fact that they automatically become plugins (ROS nodelets), capable of being dynamically loaded/unloaded from a system.
Q17. What is the advantage of wrapping objects in a way that allows for uniform training?
The advantage of doing this is that all detectors implementing the Trainable interface can be trained in a uniform manner, using the same data formats (for example bag files or sets of annotated images) and the same tools.
Q18. What is the way to train a gradient detector?
As the fourth step, the authors compute a gradient “Summary Image”: in each n x n block (typically 7x7) they OR the gradients together to provide some generalization over exact alignment and pose.
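The block-wise OR can be sketched with NumPy as follows. Two details are assumptions rather than facts from the source: the blocks are taken as non-overlapping tiles, and any rows or columns that do not fill a whole block are cropped. The function name `summary_image` is hypothetical.

```python
import numpy as np

def summary_image(binarized: np.ndarray, n: int = 7) -> np.ndarray:
    """OR together the direction bits inside each non-overlapping n x n block.

    Assumes `binarized` is a 2D uint8 bitmask image; partial blocks are cropped.
    """
    h, w = binarized.shape
    hb, wb = h // n, w // n
    # View the image as a (hb, n, wb, n) grid of n x n tiles.
    blocks = binarized[:hb * n, :wb * n].reshape(hb, n, wb, n)
    # Reduce within each tile: first across columns, then across rows.
    per_row = np.bitwise_or.reduce(blocks, axis=3)
    return np.bitwise_or.reduce(per_row, axis=1)
```

Each output cell then records every direction seen anywhere in its block, so a template shifted by a few pixels still finds the same bits.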
Q19. What is the definition of a blackbox?
In ReIn an algorithm is viewed as a blackbox, with a well defined interface, that consumes a set of inputs, produces some outputs and is configured by a set of parameters.
Q20. What is the purpose of the object detection stage?
The authors then use a 3D object and pose detection algorithm to filter out the incorrect object proposals from the correct ones, a step they term the “Disposal” stage.
Q21. What is the definition of a detection algorithm?
• Detector: Takes as input an image, a 3D point cloud, a list of ROIs/masks or a list of detections, and produces as output a list of detections and potentially a list of poses.
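The black-box view of a detector (inputs in, detections out, configured by parameters) can be sketched as an abstract interface. This is a Python illustration only, not the ReIn API: the class names, the `detect` signature, and the `min_score` parameter are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Detection:
    label: str
    score: float
    pose: Optional[Any] = None  # poses are optional in the description above

class Detector(ABC):
    """Black box: consumes inputs, produces detections, configured by parameters."""

    def __init__(self, **params: Any) -> None:
        self.params = dict(params)

    @abstractmethod
    def detect(self, image=None, point_cloud=None, rois=None,
               detections=None) -> List[Detection]:
        """Any subset of the four inputs may be supplied."""

class ScoreFilterDetector(Detector):
    """Toy detector that refines an incoming list of detections by score."""

    def detect(self, image=None, point_cloud=None, rois=None, detections=None):
        min_score = self.params.get("min_score", 0.0)
        return [d for d in (detections or []) if d.score >= min_score]

refined = ScoreFilterDetector(min_score=0.5).detect(
    detections=[Detection("cup", 0.8), Detection("plate", 0.2)]
)
```

Because a detector may accept a list of detections as input, detectors written against such an interface can be chained, with one refining the output of another.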
Q22. What application example are the authors currently pursuing?
An application example that the authors are currently pursuing is table clearing with their PR2 platform, which involves the recognition of plates, cups, and common household items.