Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views
Citations
A Comparison of Affine Region Detectors
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
3D Bounding Box Estimation Using Deep Learning and Geometry
3D generic object categorization, localization and pose estimation
References
Distinctive Image Features from Scale-Invariant Keypoints
A performance evaluation of local descriptors
Video Google: a text retrieval approach to object matching in videos
Color indexing
Robust wide-baseline stereo from maximally stable extremal regions
Related Papers (5)
Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography
Frequently Asked Questions (17)
Q2. What future work is proposed in "Simultaneous object recognition and segmentation from single or multiple model views"?
Although the authors plan a number of speedups, the method is unlikely to reach the speed of the fastest other systems (the system of Lowe (2001, 2004) is reported to perform recognition within seconds). Finally, using several types of affine invariant regions simultaneously, rather than only those of Tuytelaars and Van Gool (2000), would push the performance further upwards.
Q3. Why does the method tend to implode in negative cases?
The exploration process tends to implode in negative cases, because the expansion phases can do little and the contraction phases eat up most of the matches.
Q4. What is the effect of the large scale change on the image?
The large scale change, combined with the modest resolution (720×576), causes heavy image degradation which corrupts edges and texture.
Q5. Why is it desirable to densely cover the visible part of the object?
Densely covering the visible part of the object is desirable, as it increases the evidence for its presence, which results in higher detection power.
Q6. What is the main advantage of the proposed filter?
The proposed filter does not try to capture the transformations of all matches in a single, overall model, but it relies instead on simpler, weak properties, involving only three matches each.
Q7. How does it solve a problem with n = 20?
It is also very time efficient, as it solves cases with n = 20 within some seconds (exhaustive search needs more than 1 hour), and scales well, taking less than one minute for n = 60, a problem size for which the real optimum cannot be computed.
Q8. What is the method for constructing two-view region correspondences?
Once two-view region correspondences have been produced for all ordered pairs of model views (vi, vj), i ≠ j, they can be organized into multi-view region tracks.
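As an illustration of how two-view correspondences could be organized into multi-view tracks, here is a minimal sketch that takes the transitive closure of pairwise matches with a union-find structure. The function name `build_tracks` and the `(view, region)` node encoding are assumptions made for this example; the answer above does not specify the authors' actual procedure, which may, for instance, resolve conflicting matches differently.

```python
from collections import defaultdict


def build_tracks(pairwise_matches):
    """Group two-view region matches into multi-view tracks.

    pairwise_matches: iterable of ((view_i, region_a), (view_j, region_b)).
    Returns a sorted list of tracks, each a sorted list of (view, region).
    Assumption: a track is the transitive closure of pairwise matches.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two components connected by each two-view match.
    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the nodes of each component into one track.
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


matches = [
    ((1, "a"), (2, "b")),  # region a in view 1 matches region b in view 2
    ((2, "b"), (3, "c")),  # region b in view 2 matches region c in view 3
    ((1, "x"), (3, "y")),  # an unrelated match between views 1 and 3
]
tracks = build_tracks(matches)
```

In this toy input the first two matches chain into a single three-view track, while the third forms a separate two-view track.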
Q9. Why is a faster, less powerful version of the image-exploration technique used to match model views?
The use of this faster, less powerful version is justified because matching model views is easier than matching to a test image: there is no background clutter, and the object appears at approximately the same scale.
Q10. What is the way to evaluate the correctness of sets of matches?
When evaluating the correctness and interrelations of sets of matches, it is convenient to reason at the higher perceptual grouping level that GAMs offer: no longer consider unrelated region matches, but the collection of GAMs instead.
Q11. How is the coherence of a configuration of GAMs evaluated?
The coherence of a configuration of GAMs, possibly originating from different model views, is evaluated using the region tracks that span the model views.
Q12. What is the effect of refinement on the similarity of correctly propagated matches?
Refinement raises the similarity of correctly propagated matches much more than the similarity of mispropagated ones, thereby helping correct supports to win.
Q13. Why is the decomposition of the GAM interesting?
When applied to this input, the GAM decomposition is most interesting, because the constructor has enough prime matter to build GAMs covering larger areas, even if curved or deformed.
Q14. How are the contributions from all model views of a single object combined?
The contributions from all model views of a single object are combined by superimposing the area covered by the final set of matched regions (to find the contour), and by summing their number (detection criterion).
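The combination step described above can be sketched as follows. This is a hedged illustration only: representing each view's covered area as a set of pixel coordinates, the function name `combine_model_views`, and the `min_matches` threshold are all assumptions for the example, not details taken from the paper.

```python
def combine_model_views(per_view_pixels, per_view_match_counts, min_matches):
    """Combine the contributions of all model views of one object.

    per_view_pixels: list of sets of (row, col) test-image pixels covered by
        the final matched regions of each model view.
    per_view_match_counts: number of final matched regions per model view.
    min_matches: hypothetical detection threshold on the summed count.
    """
    # Superimpose the covered areas; the outline of this union gives the contour.
    covered = set().union(*per_view_pixels)
    # Sum the number of matched regions over all views (detection criterion).
    total = sum(per_view_match_counts)
    return covered, total, total >= min_matches


covered, total, detected = combine_model_views(
    [{(0, 0), (0, 1)}, {(0, 1), (1, 1)}],  # areas covered by two model views
    [12, 9],                               # matched regions per view
    min_matches=20,
)
```

Here neither view alone would pass the assumed threshold of 20, but their summed count of 21 does, which shows why pooling evidence across views helps detection.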
Q15. What is the method for constructing correspondences between two model views?
Let’s recall that the image-exploration technique constructs correspondences for many overlapping circular regions, arranged on a grid completely covering the first model view vi (coverage regions, see Section 4.1).
Q16. How does the probability that mismatches are grouped in the same GAM behave?
More precisely, the probability that N mismatches are grouped in the same GAM is expected to decrease roughly exponentially with N.
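One simple way to see why an exponential decay is plausible, under the simplifying assumption (not stated in the source) that each mismatch joins a given GAM independently with some probability p < 1:

```latex
P(N \text{ mismatches in the same GAM}) \approx p^{N} = e^{-N \ln(1/p)}, \qquad 0 < p < 1
```

Under this assumed model, with p = 0.1 the chance of five mismatches sharing a GAM would be about 10^{-5}.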
Q17. What is the relationship between the model views and the region tracks?
Since the model views are interconnected by the model tracks, the authors know the correspondences of the regions on the paw between views 3 and 4.