How do humans sketch objects
References
Distinctive Image Features from Scale-Invariant Keypoints
Visualizing Data using t-SNE
The Pascal Visual Object Classes (VOC) Challenge
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Video Google: A Text Retrieval Approach to Object Matching in Videos
Frequently Asked Questions (19)
Q2. What are the contributions mentioned in the paper "How do humans sketch objects?" ?
This paper presents the first large-scale exploration of human sketches. The authors analyze the distribution of non-expert sketches of everyday objects such as ‘teapot’ or ‘car’. With this dataset they perform a perceptual study and find that humans can correctly identify the object category of a sketch 73% of the time. They compare human performance against computational recognition methods: they develop a bag-of-features sketch representation and use multi-class support vector machines, trained on their sketch dataset, to classify sketches. Based on this computational model, the authors demonstrate an interactive sketch recognition system.
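The classification pipeline summarized above (bag-of-features histograms fed to a multi-class SVM) can be sketched as follows. This is a toy illustration, not the paper's actual setup: the two 8-dimensional "histogram" prototypes and all training data are invented stand-ins for real sketch features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_histograms(center, n=20, dim=8):
    """Noisy, normalized feature histograms around a category prototype."""
    h = np.abs(center + 0.1 * rng.standard_normal((n, dim)))
    return h / h.sum(axis=1, keepdims=True)

# Two invented category prototypes standing in for real sketch categories.
proto_a = np.array([1, 0, 0, 0, 1, 0, 0, 0], float)
proto_b = np.array([0, 0, 1, 0, 0, 0, 1, 0], float)
X = np.vstack([make_histograms(proto_a), make_histograms(proto_b)])
y = np.array([0] * 20 + [1] * 20)

# RBF-kernel SVM; scikit-learn handles the multi-class case internally.
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
pred = clf.predict(make_histograms(proto_a, n=2))
```

The paper's actual classifier is trained on 250 categories rather than two, but the fit/predict structure is the same.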
Q3. What is the general strategy for simplification techniques?
The general strategy for such techniques is to remove complexity (e.g. delete edges) while staying as close as possible to the original instance according to some geometric error metric.
Q4. How do the authors extract the local descriptor lj from the response image?
For each response image, the authors extract a local descriptor lj by binning the underlying orientational response values into a small, local histogram using 4 × 4 spatial bins, again linearly interpolating into neighboring bins.
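The binning step can be sketched as below. This is a hypothetical reconstruction, not the authors' code: the function name, the 16 × 16 patch size in the example, and the exact soft-assignment details are assumptions; it only shows the general idea of linearly interpolating each pixel's response into a 4 × 4 grid of spatial bins.

```python
import numpy as np

def spatial_histogram(response, n_bins=4):
    """Bin one orientational response image into an n_bins x n_bins
    spatial histogram, linearly interpolating each pixel's contribution
    between the two nearest bin centers along each axis."""
    size = response.shape[0]                      # assume a square patch
    hist = np.zeros((n_bins, n_bins))
    for y in range(size):
        for x in range(size):
            w = response[y, x]
            if w == 0:
                continue
            # fractional bin position of this pixel (bin centers at .5)
            fy = (y + 0.5) / size * n_bins - 0.5
            fx = (x + 0.5) / size * n_bins - 0.5
            y0, x0 = int(np.floor(fy)), int(np.floor(fx))
            dy, dx = fy - y0, fx - x0
            # distribute the response over up to four neighboring bins
            for by, wy in ((y0, 1 - dy), (y0 + 1, dy)):
                for bx, wx in ((x0, 1 - dx), (x0 + 1, dx)):
                    if 0 <= by < n_bins and 0 <= bx < n_bins:
                        hist[by, bx] += w * wy * wx
    return hist.ravel()                           # 16 values per response image
```

Applying this to each of the orientational response images yields the local histograms l1, …, lr mentioned in the answer.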
Q5. What is the most direct way to define a feature space for sketches?
Probably the most direct way to define a feature space for sketches is to directly use its (possibly down-scaled) bitmap representation.
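A minimal sketch of this bitmap feature space, assuming block-averaging for the down-scaling (the answer does not specify the method, and the sizes here are invented for illustration):

```python
import numpy as np

def bitmap_feature(img, out_size=16):
    """Down-scale a square bitmap by block-averaging, then flatten."""
    s = img.shape[0] // out_size          # assumes out_size divides the side
    small = img.reshape(out_size, s, out_size, s).mean(axis=(1, 3))
    return small.ravel()

sketch = np.zeros((256, 256))
sketch[100:110, :] = 1.0                  # a single horizontal stroke
feat = bitmap_feature(sketch)             # 256-dimensional feature vector
```

Every sketch then becomes a fixed-length vector that can be compared directly, at the cost of sensitivity to translation and scale.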
Q6. How do the authors build their final, local patch descriptor?
The authors build their final, local patch descriptor by stacking the orientational descriptors into a single column vector d = [l1, ..., lr].
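With toy numbers, the stacking step looks like this (r = 4 orientational descriptors of 16 values each, matching the 4 × 4 spatial bins and 4 orientational bins mentioned in Q16; the constant-valued stand-ins are invented):

```python
import numpy as np

r = 4                                      # one descriptor per orientation
# stand-ins for l1..lr: each a 16-value spatial histogram (4 x 4 bins)
descriptors = [np.full(16, i, dtype=float) for i in range(r)]
d = np.concatenate(descriptors)            # the stacked 64-d patch descriptor
```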
Q7. What is the main assumption in all of these works?
The assumption in all of these works is that, in some well-engineered feature space, sketched objects resemble their real-world counterparts.
Q8. What sketch representation might be more natural?
A stroke-based model might be more natural and facilitate easier synthesis applications such as simplification, beautification, and even synthesis of novel sketches by mixing existing strokes.
Q9. How does the study show that humans still perform better than computers at this task?
While the authors achieve a high computational recognition accuracy of 56% (chance is 0.4%), their study also reveals that humans still perform significantly better than computers at this task.
Q10. How many HITs did the authors submit to Mechanical Turk?
The authors submit a total of 5,000 HITs to Mechanical Turk, each requiring workers to sequentially identify four sketches from random categories.
Q11. What is the simplest way to visualize the distribution of sketches in the feature space?
To visualize the distribution of sketches in the feature space, the authors apply dimensionality reduction to the feature vectors from each category.
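A hedged sketch of this visualization step, using t-SNE (which appears in the reference list) on synthetic feature vectors; the two "categories" and all parameters here are invented stand-ins, not the paper's data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# 64-dimensional stand-ins for sketch features from two toy categories
feats = np.vstack([
    rng.normal(0.0, 0.1, (30, 64)),
    rng.normal(1.0, 0.1, (30, 64)),
])
# embed to 2-D; each row of `emb` is one sketch's plot coordinate
emb = TSNE(n_components=2, perplexity=10.0, random_state=1).fit_transform(feats)
```

Plotting `emb` colored by category would reveal whether sketches from the same category cluster together.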
Q12. What is the way to find clusters of sketches in the proposed feature space?
Ideally, the authors would find clusters of sketches in this space that clearly represent their categories, i.e. they would hope to find that features within a category are close to each other while having large distances to all other features.
Q13. Why do people need a large dataset of sketches?
Because people represent the same object using differing degrees of realism and distinct drawing styles (see Fig. 1), the authors need a large dataset of sketches which adequately samples these variations.
Q14. How does the computational model perform in such cases?
Indeed, computational classification can perform better in such cases: for ‘armchair’ and ‘suv’ the computational model achieves significantly higher accuracy than humans.
Q15. How old is the ability to recognize sketched objects?
Such pictographs predate the appearance of language by tens of thousands of years and today the ability to draw and recognize sketched objects is ubiquitous.
Q16. How is a sketch represented after feature extraction?
At this point, a sketch is represented as a so-called bag-of-features, containing a large number of local, 64-dimensional feature vectors (4 × 4 spatial bins and 4 orientational bins).
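A bag of local features is typically turned into a fixed-length vector by quantizing each descriptor against a learned visual vocabulary. The following is a generic illustration of that standard step with a k-means codebook; the vocabulary size and all data are invented, not the paper's parameters.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# stand-in 64-d local descriptors pooled from many training sketches
all_features = rng.random((500, 64))
# learn a small visual vocabulary (a real codebook would be much larger)
vocab = KMeans(n_clusters=8, n_init=10, random_state=2).fit(all_features)

def sketch_histogram(features, vocab):
    """Quantize a sketch's local features and count visual-word frequencies."""
    words = vocab.predict(features)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()              # normalized frequency histogram

h = sketch_histogram(rng.random((120, 64)), vocab)
```

The resulting normalized histogram is what a classifier such as the SVM in Q2 consumes.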
Q17. How does the performance gain for larger training sets change as the dataset grows?
The performance gain for larger training set sizes becomes smaller as the authors approach the full size of their dataset: this suggests that the dataset is large enough to capture most of the variance within each category.
Q18. What is the way to get a good representation of the local patch?
While in computer vision applications the size of local patches used to analyze photographs is often quite small (e.g. 16 × 16 pixels [Lazebnik et al. 2006]), sketches contain little information at that scale and larger patch sizes are required for an effective representation.
Q19. What is the way to retrieve images from a sketch?
The authors propose the following extension to sketch-based image retrieval: a) perform classification on the user sketch and query a traditional keyword-based search engine using the determined category; b) (optionally) re-order the resulting images according to their geometric similarity to the user sketch.
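The two-stage retrieval can be sketched as follows. Everything here is a hypothetical toy stand-in: a nearest-prototype classifier replaces the paper's SVM, a dictionary replaces the keyword search engine, and plain L2 distance replaces the geometric similarity measure.

```python
import numpy as np

def classify(sketch_feat, prototypes):
    # nearest category prototype stands in for the paper's SVM classifier
    return min(prototypes, key=lambda c: np.linalg.norm(sketch_feat - prototypes[c]))

def retrieve(sketch_feat, prototypes, keyword_index, image_feats):
    category = classify(sketch_feat, prototypes)          # step (a)
    candidates = keyword_index[category]
    # step (b): optional re-ranking by similarity to the sketch
    return sorted(candidates,
                  key=lambda im: np.linalg.norm(sketch_feat - image_feats[im]))

prototypes = {"teapot": np.array([1.0, 0.0]), "car": np.array([0.0, 1.0])}
keyword_index = {"teapot": ["t1", "t2"], "car": ["c1"]}
image_feats = {"t1": np.array([0.9, 0.2]),
               "t2": np.array([0.8, 0.1]),
               "c1": np.array([0.1, 0.9])}
results = retrieve(np.array([0.85, 0.1]), prototypes, keyword_index, image_feats)
```

The design point is that step (a) narrows the candidate set cheaply via keywords, so the more expensive geometric comparison in step (b) only runs on a small result list.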