Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost
Frequently Asked Questions (13)
Q2. How are the centroids of each class obtained?
To obtain the centroids of each class, the authors apply k-means clustering on the features x belonging to that class, using the ℓ2 distance.
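A rough sketch of this step in NumPy: a few Lloyd iterations of k-means, run separately on the features of one class with the ℓ2 distance. The function name, the number of centroids k, and the iteration count are illustrative, not taken from the paper.

```python
import numpy as np

def class_centroids(X_c, k, n_iter=20, seed=0):
    """k-means (Lloyd's algorithm) with the l2 distance on the features
    X_c (n x D) of a single class; returns the k class centroids."""
    rng = np.random.default_rng(seed)
    centroids = X_c[rng.choice(len(X_c), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign each feature to its nearest centroid (squared l2 distance)
        d2 = ((X_c[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # move each centroid to the mean of the features assigned to it
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X_c[assign == j].mean(axis=0)
    return centroids
```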
Q3. How many training images are used to estimate the gradient?
To learn the projection matrix W, the authors use SGD training and sample, at each iteration, a fixed number m of training images to estimate the gradient.
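A minimal sketch of that sampling scheme. The gradient computation itself is passed in as a callable grad_fn (a placeholder for whichever objective is being optimized); the learning rate, batch size m, and iteration count are illustrative, not the paper's settings.

```python
import numpy as np

def sgd_learn_W(X, y, means, W0, grad_fn, m=1000, lr=0.1, n_iter=10000, seed=0):
    """SGD for the projection matrix W: at each iteration, sample m training
    images and update W with the gradient estimated on that mini-batch."""
    rng = np.random.default_rng(seed)
    W = W0.copy()
    for _ in range(n_iter):
        idx = rng.choice(len(X), size=m, replace=False)
        g = grad_fn(W, X[idx], y[idx], means)  # mini-batch gradient estimate
        W -= lr * g
    return W
```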
Q4. What is the definition of a query-by-example image retrieval problem?
Query-by-example image retrieval can be seen as an image classification problem where only a single positive sample (the query) is given and negative examples are not explicitly provided.
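Under that view, retrieval simply ranks the database images by their learned distance to the single query. The sketch below assumes a learned projection matrix W and the squared ℓ2 distance in the projected space; the names are illustrative.

```python
import numpy as np

def retrieve(W, query, database):
    """Rank database images (n x D) by their distance to the query (D,)
    in the space defined by the projection W (d x D)."""
    q = W @ query                    # project the query
    Z = database @ W.T               # project the database images
    d2 = ((Z - q) ** 2).sum(axis=1)  # squared distances to the query
    return np.argsort(d2)            # indices from nearest to farthest
```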
Q5. What is the cost of computing the class probabilities p(c|x)?
The cost of this computation is dominated by computing the squared distances dW(x, µc) that are required to obtain the m × C probabilities p(c|x) for the C classes in each SGD update.
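A vectorized sketch of that dominant step, computing all m × C squared distances in the projected space and the corresponding probabilities. It assumes dW(x, µc) = ||Wx − Wµc||², as in the nearest class mean classifier; variable names are illustrative.

```python
import numpy as np

def class_probabilities(W, X, means):
    """p(c|x) for a mini-batch X (m x D) and class means (C x D),
    using d_W(x, mu_c) = ||W x - W mu_c||^2."""
    Z = X @ W.T                      # projected images, m x d
    M = means @ W.T                  # projected class means, C x d
    # all m x C squared distances: ||z||^2 - 2 z.mu + ||mu||^2
    d2 = (Z ** 2).sum(1)[:, None] - 2 * Z @ M.T + (M ** 2).sum(1)[None, :]
    logits = -0.5 * d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```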
Q6. How can the authors compute the gradient without iterating over all triplets?
For either choice of the target set Pq, the gradient can be computed without explicitly iterating over all triplets, by sorting the distances w.r.t. the query images.
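The paper's exact derivation is not reproduced here, but the following sketch illustrates the general idea: once the negatives' distances to a query are sorted, the number of violated triplets contributed by each positive can be read off with a binary search instead of enumerating every (positive, negative) pair. This is a simplified illustration, not the authors' algorithm.

```python
import numpy as np

def violated_triplet_counts(d_pos, d_neg):
    """For one query image: d_pos holds distances to its target (positive)
    images, d_neg to non-target (negative) images. A triplet (query, p, n)
    is violated when d_neg[n] <= d_pos[p]; sorting d_neg lets us count the
    violations per positive without looping over all pairs."""
    d_neg_sorted = np.sort(d_neg)
    # for each positive: how many negatives are at least as close to the query
    return np.searchsorted(d_neg_sorted, d_pos, side='right')
```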
Q7. What is the cost of projecting the m data vectors and C class centers?
The authors can first project both the m data vectors and the C class centers, and then compute the distances in the low-dimensional space, at a total cost of O(dD(m + C) + mCd).
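A sketch of this ordering of the operations, assuming W is a d × D projection with d ≪ D; the comments indicate where the two terms of the cost come from.

```python
import numpy as np

def projected_squared_distances(W, X, means):
    """Project first, then compute all m x C squared distances in d dimensions."""
    Z = X @ W.T        # project the m data vectors:   O(dDm)
    M = means @ W.T    # project the C class centers:  O(dDC)
    # m x C squared distances in the low-dimensional space: O(mCd)
    return (Z ** 2).sum(1)[:, None] - 2 * Z @ M.T + (M ** 2).sum(1)[None, :]
```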
Q8. What other methods aim at learning metrics for verification problems?
Other methods aim at learning metrics for verification problems and essentially learn binary classifiers that threshold the learned distance to decide whether two images belong to the same class or not, see e.g. [21], [22], [23].
Q9. How is the projection matrix learned?
Instead of using a fixed set of class means, it could be advantageous to alternate between the k-means clustering and the learning of the projection matrix W.
Q10. What is the posterior probability for class c?
The posterior probability for class c can be defined as
$$p(c|x) = \sum_{j=1}^{k} p(m_{cj} \mid x), \qquad (16)$$
$$p(m_{cj} \mid x) = \frac{1}{Z} \exp\!\left(-\tfrac{1}{2}\, d_W(x, m_{cj})\right), \qquad (17)$$
where $p(m_{cj} \mid x)$ denotes the posterior of a centroid $m_{cj}$, and $Z = \sum_{c}\sum_{j} \exp\!\left(-\tfrac{1}{2}\, d_W(x, m_{cj})\right)$ is the normalizer.
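A vectorized sketch of Eqs. (16)–(17) for the multiple-centroids case, assuming the k centroids m_cj of every class are stored in a (C, k, D) array and that dW is the squared distance in the space projected by W; names are illustrative.

```python
import numpy as np

def ncmc_posteriors(W, x, centroids):
    """p(c|x) with k centroids per class (Eqs. 16-17).
    centroids: (C, k, D) array; x: feature vector of shape (D,)."""
    C, k, D = centroids.shape
    z = W @ x                                    # projected image, (d,)
    M = centroids.reshape(C * k, D) @ W.T        # projected centroids, (C*k, d)
    d2 = ((M - z) ** 2).sum(axis=1)              # d_W(x, m_cj) for every centroid
    logits = -0.5 * d2
    logits -= logits.max()                       # numerical stability
    p_centroid = np.exp(logits)
    p_centroid /= p_centroid.sum()               # Eq. (17): centroid posteriors
    return p_centroid.reshape(C, k).sum(axis=1)  # Eq. (16): sum within each class
```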
Q11. What is the gradient of the objective of Eq. (6) with respect to M?
The gradient of the objective of Eq. (6) w.r.t. M is
$$\nabla_M \mathcal{L} = \frac{1}{N} \sum_{i,c} \alpha_{ic}\, z_{ic} z_{ic}^\top \equiv H, \qquad (22)$$
where $\alpha_{ic} = [[y_i = c]] - p(c|x_i)$ and $z_{ic} = \mu_c - x_i$.
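A direct NumPy transcription of Eq. (22), assuming the class probabilities p(c|x_i) have already been computed (for instance with a routine like class_probabilities sketched above); variable names are illustrative.

```python
import numpy as np

def gradient_H(X, y, means, P):
    """H = (1/N) sum_{i,c} alpha_ic z_ic z_ic^T  (Eq. 22), with
    alpha_ic = [[y_i = c]] - p(c|x_i) and z_ic = mu_c - x_i.
    X: (N, D) images, means: (C, D), P: (N, C) probabilities p(c|x_i)."""
    N, D = X.shape
    C = means.shape[0]
    Y = np.zeros((N, C))
    Y[np.arange(N), y] = 1.0            # indicator [[y_i = c]]
    alpha = Y - P                       # (N, C)
    H = np.zeros((D, D))
    for c in range(C):
        Z = means[c][None, :] - X       # z_ic = mu_c - x_i, shape (N, D)
        H += (alpha[:, c][:, None] * Z).T @ Z
    return H / N
```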
Q12. What are the core components of the methods that can be written as matrix products?
The core components of these methods can be written as matrix products (e.g. projections of the means or images, the gradients of the objectives, etc.), for which the authors benefit from optimized multi-threaded implementations.
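For example, projecting all class means can be written as a single matrix product instead of a per-class loop, so the work is done by an optimized multi-threaded BLAS; a generic illustration with made-up sizes, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 4096))        # projection, d x D (illustrative sizes)
means = rng.standard_normal((1000, 4096))   # C class means, C x D

# per-class loop: one small matrix-vector product per class
projected_loop = np.stack([W @ mu for mu in means])

# the same computation as a single matrix product, handled by the BLAS
projected_blas = means @ W.T

assert np.allclose(projected_loop, projected_blas)
```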
Q13. How many triplets can be used to compute the gradient?
This shows that once A is available, the gradient can be computed in time O(m²), even if a much larger number of triplets is used.