View Synthesis for Recognizing Unseen Poses of Object Classes
References
Rapid object detection using a boosted cascade of simple features
Object recognition from local scale-invariant features
Multiple view geometry in computer vision
Visual categorization with bags of keypoints
Frequently Asked Questions (15)
Q2. What future works have the authors mentioned in the paper "View synthesis for recognizing unseen poses of object classes" ?
The authors note that, beyond the semantic labeling of objects seen under specific views, it is often crucial to recognize an object's pose in 3D space along with its categorical identity. Further research is also needed to explore to what degree the nuisances inherent in category-level recognition (lighting variability, occlusions, and background clutter) affect the view-morphing formulation. Finally, it would be interesting to extend their framework to model non-rigid objects. Their initial testing of the algorithm shows promising results.
Q3. What is the limitation of the interpolation scheme described in Sec. 2.2?
One limitation of the interpolation scheme described in Sec. 2.2 is that a new view can be synthesized only if it lies on the linear camera trajectory between the two given views.
Q4. What is the key property of view-synthesis techniques?
The key property of view-synthesis techniques is their ability to generate new views of an object without reconstructing its actual 3D model.
Q5. What is the significance of the pose estimation algorithm?
Beyond the semantic labeling of objects seen under specific views, it is often crucial to recognize an object's pose in 3D space along with its categorical identity; the pose estimation algorithm addresses this need.
Q6. what is the h of the vector quantized descriptors?
B is a 2×4 matrix encoding the coordinates b = [x, y]T of the four corners of the quadrangle, i.e. B = [b1 . . . b4]; h is an M×1 vector, where M is the size of the vocabulary of vector-quantized descriptors.
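As an illustration of this part representation, the following is a minimal sketch (the vocabulary size M and the corner coordinates are hypothetical, not taken from the paper):

```python
import numpy as np

# Hypothetical sketch of a canonical part's descriptor, as summarized above:
#   B: a 2x4 matrix whose columns b_i = [x, y]^T are the quadrangle corners
#   h: an M x 1 histogram over a vocabulary of M vector-quantized descriptors
M = 5  # illustrative vocabulary size

B = np.array([[0.0, 1.0, 1.0, 0.0],   # x-coordinates of the 4 corners
              [0.0, 0.0, 1.0, 1.0]])  # y-coordinates of the 4 corners

h = np.zeros((M, 1))
h[2] = 1.0  # e.g. every descriptor in this part quantized to word 2

assert B.shape == (2, 4)
assert h.shape == (M, 1)
```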
Q7. What is the main contribution of the approach?
The main contribution of their approach is that the synthesis takes place at the categorical level, as opposed to the single-object level (as previously explored).
Q8. What is the way to synthesize a novel view?
The authors notice that, under the assumption that the views lie in a neighborhood on the viewing sphere, the cameras can be approximated as parallel, enabling a simple linear interpolation scheme (Fig. 3).
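The linear interpolation scheme can be sketched as follows: with approximately parallel cameras, both the part geometry (corner matrix B) and the appearance histogram h of corresponding parts in the two views can be blended component-wise. This is an illustrative sketch under that assumption, not the authors' exact formulation:

```python
import numpy as np

def interpolate_part(B1, h1, B2, h2, s):
    """Linearly blend two corresponding canonical parts.

    Assumes nearly parallel cameras, so the 2x4 corner matrices (B1, B2)
    and the M x 1 appearance histograms (h1, h2) can be interpolated
    component-wise; s in [0, 1] parameterizes the camera trajectory.
    """
    B_s = (1.0 - s) * B1 + s * B2
    h_s = (1.0 - s) * h1 + s * h2
    return B_s, h_s

# toy part seen from two nearby views (hypothetical coordinates)
B1 = np.array([[0.0, 2.0, 2.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
B2 = np.array([[1.0, 3.0, 3.0, 1.0], [0.0, 0.0, 1.0, 1.0]])
h1 = np.array([[1.0], [0.0]])
h2 = np.array([[0.0], [1.0]])

B_mid, h_mid = interpolate_part(B1, h1, B2, h2, 0.5)
# the midpoint geometry lies halfway between the two views
assert np.allclose(B_mid[0, 0], 0.5)
assert np.allclose(h_mid, [[0.5], [0.5]])
```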
Q9. How can view-synthesis techniques be extended for handling object classes?
Since these methods achieve recognition by matching local features [21,22,23,24] or groups of local features [25,26] under rigid geometric transformations, they can hardly be extended to handle object classes.
Q10. How many classes of images are collected from the Internet?
In this new dataset of 8 object classes, the images for 7 classes (cellphone, bike, iron, shoe, stapler, mouse, and toaster) are collected from the Internet (mostly Google and Flickr) using an automatic image crawler.
Q11. What is the main contribution of the experimental analysis?
Their experimental analysis validates their theoretical findings and shows that their algorithm is able to successfully estimate object classes and poses under very challenging conditions.
Q12. What is the main contribution of the proposed algorithm?
The authors propose a new algorithm that takes advantage of their view-synthesis machinery for recognizing objects seen under arbitrary views.
Q13. What is the key to a novel view of the object category?
Notice that the output of this representation (synthesis) is a novel view of the object category, not just a novel view of a single object instance, whereas all previous morphing techniques synthesize novel views of single objects only.
Q14. How do the authors train the model to recognize unseen views?
To assess the performance of their algorithm in recognizing unseen views, the authors train both the model in [1] and theirs using a reduced set of poses in training.
Q15. What is the canonical part representation of the car rear bumper?
Given an assortment of canonical parts (e.g. the colored patches in Fig. 2(b)), a linkage structure connects each pair of canonical parts {Pj, Pi} if they can both be visible at the same time (Fig. 2(c)).
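The linkage structure described above can be sketched as a graph over part indices, with an edge between every pair of parts that can be visible simultaneously. The part ids and the co-visibility predicate below are hypothetical, chosen only to mirror the car example (a front and rear bumper are never seen together):

```python
def build_linkage(parts, co_visible):
    """Connect each pair of canonical parts that can be visible together.

    parts: list of part ids; co_visible(i, j) -> bool is an assumed
    predicate standing in for the paper's visibility reasoning.
    """
    edges = set()
    for a in range(len(parts)):
        for b in range(a + 1, len(parts)):
            if co_visible(parts[a], parts[b]):
                edges.add((parts[a], parts[b]))
    return edges

# toy example: parts 0 (front bumper) and 3 (rear bumper) of a car are
# never visible at the same time, while neighboring parts are
never_together = {(0, 3)}
vis = lambda i, j: (min(i, j), max(i, j)) not in never_together

linkage = build_linkage([0, 1, 2, 3], vis)
assert (0, 3) not in linkage  # opposite bumpers are not linked
assert (0, 1) in linkage      # co-visible parts are linked
```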