TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation
References
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
Labeling images with a computer game
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary
Matching words and pictures
Frequently Asked Questions (12)
Q2. What future work is proposed in "Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation"?
In future work, the authors will consider extending the model to assign tags to image regions, in order to address tasks such as image region labelling and object detection from image-wide annotations.
Q3. What is the goal of the proposed method?
Their proposed method is based on a weighted nearest neighbor approach that propagates the annotations of training images to new images, inspired by recent successful methods [5, 11, 13, 17].
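The weighted nearest-neighbor idea can be illustrated with a minimal sketch: each keyword's relevance for a test image is a weighted vote over the annotations of its nearest training images. The array names and the toy rank-decaying weights below are illustrative, not the paper's exact parameterization.

```python
import numpy as np

def propagate_tags(neighbor_tags, weights):
    """Predict per-keyword relevance for a test image.

    neighbor_tags: (K, W) binary matrix, row j holds the annotations
                   of the j-th nearest training image (W keywords).
    weights:       (K,) non-negative neighbor weights, assumed to sum to one.
    Returns a (W,) vector of per-keyword relevance scores.
    """
    return weights @ neighbor_tags  # weighted vote per keyword

# toy example: 3 neighbors, 4 keywords, weights decaying with rank
neighbor_tags = np.array([[1, 0, 1, 0],
                          [1, 1, 0, 0],
                          [0, 0, 1, 1]])
weights = np.array([0.5, 0.3, 0.2])
print(propagate_tags(neighbor_tags, weights))  # [0.8 0.3 0.7 0.2]
```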
Q4. What are the two types of global image descriptors?
The authors use two types of global image descriptors: Gist features [21], and color histograms with 16 bins in each color channel for RGB, LAB, HSV representations.
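A minimal sketch of such color descriptors is given below. It assumes per-channel (rather than joint) binning and uses OpenCV for the LAB and HSV conversions; both choices are assumptions of this sketch rather than details confirmed by the text.

```python
import cv2
import numpy as np

def color_histograms(rgb_image, bins=16):
    """Concatenate per-channel 16-bin histograms in RGB, LAB and HSV.

    rgb_image: uint8 array of shape (H, W, 3) in RGB order.
    Returns a 1-D descriptor of length 3 color spaces * 3 channels * bins.
    """
    spaces = [
        rgb_image,
        cv2.cvtColor(rgb_image, cv2.COLOR_RGB2LAB),
        cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV),
    ]
    parts = []
    for img in spaces:
        for c in range(3):
            hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
            parts.append(hist / max(hist.sum(), 1))  # L1-normalize each channel
    return np.concatenate(parts)
```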
Q5. What are the performance measures used in previous work?
The authors evaluate their models with standard performance measures, used in previous work, that evaluate retrieval performance per keyword, and then average over keywords.
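As an illustration, the sketch below computes per-keyword precision and recall when every test image is annotated with its five highest-scoring keywords, then averages over keywords. The fixed annotation length of five, and skipping keywords that are never predicted or never present, are conventions assumed by this sketch, not details stated here.

```python
import numpy as np

def per_keyword_precision_recall(scores, truth, n_tags=5):
    """Per-keyword precision/recall, averaged over keywords.

    scores: (N, W) predicted relevance of keyword w for test image n.
    truth:  (N, W) binary ground-truth annotations.
    """
    N, W = scores.shape
    predicted = np.zeros((N, W), dtype=int)
    top = np.argsort(-scores, axis=1)[:, :n_tags]   # top-n_tags keywords per image
    predicted[np.arange(N)[:, None], top] = 1

    precisions, recalls = [], []
    for w in range(W):
        tp = np.sum(predicted[:, w] * truth[:, w])
        if predicted[:, w].sum() > 0:               # keyword predicted at least once
            precisions.append(tp / predicted[:, w].sum())
        if truth[:, w].sum() > 0:                   # keyword present in ground truth
            recalls.append(tp / truth[:, w].sum())
    return np.mean(precisions), np.mean(recalls)
```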
Q6. How do the authors combine the base distances?
The authors also combine the base distances by learning a binary classifier that separates image pairs having several tags in common from pairs that do not share any tags.
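A minimal sketch of this idea is shown below, using an off-the-shelf logistic regression in place of whatever classifier the authors actually trained; the pair-labelling thresholds are likewise assumptions of the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_distance_combination(pair_distances, pair_labels):
    """Learn per-distance weights for combining base distances.

    pair_distances: (P, D) matrix; row p holds the D base distances
                    between one pair of training images.
    pair_labels:    (P,) with 1 for pairs sharing several tags and
                    0 for pairs sharing none.
    Returns the linear coefficients of the classifier, which can be
    used (negated) to weight the base distances in a combined distance.
    """
    clf = LogisticRegression()
    clf.fit(pair_distances, pair_labels)
    return clf.coef_.ravel()
```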
Q7. How does the model perform on image regions?
Currently the model assigns tags only to whole images; extending it to assign tags to image regions, in order to address tasks such as image region labelling and object detection from image-wide annotations, is left as future work.
Q8. Why is the model easily used to predict the relevance of images?
Due to the probabilistic output of TagProp, the relevance of an image for a query of several keywords is easily computed by taking the product of the single-keyword probabilities, since the model does not explicitly account for dependencies between words.
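A sketch of that product rule, with hypothetical keyword indices standing in for a real query, looks like this:

```python
import numpy as np

def query_relevance(keyword_probs, query_indices):
    """Relevance of one image for a multi-keyword query, computed as the
    product of the predicted probabilities of the individual query keywords.

    keyword_probs: (W,) per-keyword probabilities predicted for the image.
    query_indices: indices of the keywords that make up the query.
    """
    return float(np.prod(keyword_probs[query_indices]))

# e.g. a two-keyword query with hypothetical keyword indices 3 and 7:
# relevance = query_relevance(probs, [3, 7])
```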
Q9. What methods have been used to learn the likelihood over visual features?
Different learning methods have been used, including support vector machines, multiple-instance learning, and Bayes point machines.
Q10. What is the smallest neighbor rank for each i?
For each image i, the authors select K neighbors so as to maximise k* = min_d k_d, where k_d is the largest neighbor rank for which neighbors 1 through k_d of base distance d are all included among the selected neighbors.
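One greedy reading of this selection rule is sketched below: neighbors are pooled rank by rank across the base distances until K distinct neighbors have been collected, so the top lists of all distances are covered up to the largest possible common rank. Tie-breaking and the exact stopping behavior are assumptions of the sketch.

```python
def pool_neighbors(rankings, K):
    """Select K neighbors for one image from several base distances.

    rankings: list of D lists; rankings[d] holds the neighbor indices under
              base distance d, sorted from nearest to farthest.
    Returns up to K distinct neighbor indices.
    """
    selected, seen = [], set()
    for rank in range(len(rankings[0])):      # sweep ranks 0, 1, 2, ...
        for ranking in rankings:              # take this rank from every distance
            j = ranking[rank]
            if j not in seen:
                seen.add(j)
                selected.append(j)
            if len(selected) == K:
                return selected
    return selected
```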
Q11. How do the authors determine the weights for neighbors?
The weights for neighbors are determined either from the neighbor's rank or from its distance, and are set automatically by maximizing the likelihood of the annotations of a set of training images.
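For the distance-based variant, a minimal sketch of how weights could decay with distance is shown below. The soft-max form and the single scale parameter theta are assumptions made for illustration; in the full model, such a parameter would be the quantity set by maximizing the training likelihood, while the rank-based variant instead learns one fixed weight per rank.

```python
import numpy as np

def distance_based_weights(distances, theta):
    """Turn (combined) distances to the K neighbors into weights that
    decay with distance and sum to one (soft-max over negative scaled
    distances). theta controls how quickly the weights fall off.
    """
    logits = -theta * np.asarray(distances, dtype=float)
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# e.g. distances [0.2, 0.5, 0.9] with theta=2.0 give weights ~[0.56, 0.31, 0.14]
```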
Q12. What is the metric used to compute the histograms?
The authors compute the histograms over three horizontal regions of the image and concatenate them to form a new global descriptor, albeit one that encodes some information about the spatial layout of the image.
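A sketch of such a region-based descriptor, assuming an equal-height three-way split and per-channel histograms (neither of which is specified in the text), is:

```python
import numpy as np

def regional_descriptor(rgb_image, bins=16, n_regions=3):
    """Split the image into three equal horizontal bands, compute a
    per-channel color histogram in each band, and concatenate them,
    so the descriptor keeps coarse information about the vertical layout.
    """
    h = rgb_image.shape[0]
    parts = []
    for r in range(n_regions):
        band = rgb_image[r * h // n_regions:(r + 1) * h // n_regions]
        for c in range(band.shape[2]):
            hist, _ = np.histogram(band[:, :, c], bins=bins, range=(0, 256))
            parts.append(hist / max(hist.sum(), 1))  # L1-normalize each channel
    return np.concatenate(parts)
```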