Image classification using super-vector coding of local image descriptors
Citations
Going deeper with convolutions
ImageNet Large Scale Visual Recognition Challenge
Selective Search for Object Recognition
Deep learning for visual understanding
Understanding deep image representations by inverting them
References
Gradient-based learning applied to document recognition
The Pascal Visual Object Classes (VOC) Challenge
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Visual categorization with bags of keypoints
A Bayesian hierarchical model for learning natural scene categories
Frequently Asked Questions (13)
Q2. What is the main challenge to the computer vision community?
Image classification, including object recognition and scene classification, remains a major challenge for the computer vision community.
Q3. What is the purpose of the paper?
A large body of work investigates probabilistic generative models, with the objective of understanding the semantic content of images.
Q4. What function is used to decide whether an image belongs to a particular category?
An image’s spatial pyramid representation is then obtained by concatenating the results of local pooling: Φs(X) = [Φ(X¹₁₁), Φ(X²₁₁), Φ(X²₁₂), Φ(X²₂₁), Φ(X²₂₂), Φ(X³₁₁), Φ(X³₁₂), Φ(X³₁₃)]. Image classification is then done by applying classifiers to the image representations obtained from this pooling step.
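A minimal NumPy sketch of this concatenation may help; it assumes the 1×1 + 2×2 + 3×1 grid implied by the eight blocks above, average pooling within each block, and descriptor positions normalized to [0, 1). The function names and conventions are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pool_block(codes, dim):
    # Average-pool the coded descriptors that fall into one pyramid block.
    return codes.mean(axis=0) if len(codes) else np.zeros(dim)

def spatial_pyramid(codes, xy, dim):
    # codes: (n, dim) coded local descriptors; xy: (n, 2) positions in [0, 1).
    masks = [np.ones(len(codes), dtype=bool)]              # level 1: whole image
    for r in range(2):                                     # level 2: 2x2 grid
        for c in range(2):
            masks.append(((xy[:, 1] * 2).astype(int) == r)
                         & ((xy[:, 0] * 2).astype(int) == c))
    for r in range(3):                                     # level 3: 3 horizontal stripes
        masks.append((xy[:, 1] * 3).astype(int) == r)
    # Concatenate the eight pooled blocks into one image-level vector.
    return np.concatenate([pool_block(codes[m], dim) for m in masks])
```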
Q5. What is the notable example of the SPM kernel approach?
One notable example is the SPM kernel approach [4], which applies average pooling on top of VQ coding, plus a nonlinear SVM classifier using Chi-square or intersection kernels.
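For concreteness, here is a small sketch of the two kernels named above, computed on L1-normalized spatial pyramid histograms; the gamma parameter and function names are assumptions for illustration, and in practice the resulting Gram matrix would be passed to an SVM as a precomputed kernel.

```python
import numpy as np

def intersection_kernel(h1, h2):
    # Histogram intersection: sum of element-wise minima of the two histograms.
    return np.minimum(h1, h2).sum()

def chi_square_kernel(h1, h2, gamma=1.0, eps=1e-10):
    # Exponentiated chi-square distance; gamma is a bandwidth to be tuned.
    d = ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()
    return np.exp(-gamma * d)
```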
Q6. What is the way to train a classifier?
Once the model is trained, Eq. (5) suggests that one can compute a response map based on g(x), which visualizes where the classifier focuses in the image, as shown in their experiments.
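A rough sketch of how such a response map could be assembled, assuming a coding function code_fn for local descriptors and a trained linear weight vector w (both names are placeholders, not the paper's notation):

```python
import numpy as np

def response_map(descriptors, xy, w, code_fn, grid=(32, 32)):
    # descriptors: (n, d) local descriptors; xy: (n, 2) positions in [0, 1).
    # Accumulate per-descriptor responses w . phi(x) into a coarse heatmap.
    heat = np.zeros(grid)
    counts = np.zeros(grid)
    for x, (u, v) in zip(descriptors, xy):
        r = min(int(v * grid[0]), grid[0] - 1)
        c = min(int(u * grid[1]), grid[1] - 1)
        heat[r, c] += w @ code_fn(x)
        counts[r, c] += 1
    # Average within each cell; empty cells stay at zero.
    return heat / np.maximum(counts, 1)
```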
Q7. What is the effect of a change of measure µ(·)?
In particular, a change of measure µ(·) (still piece-wise constant in each partition) leads to a rescaling of different components in Φ(X).
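As a hedged illustration of why this amounts only to a rescaling, suppose (purely as an assumed form, not the paper's exact formula) that the representation stacks one block per codeword, weighted by the piece-wise constant measure:

```latex
% Assumed weighting form, for illustration only:
\Phi(X) = \bigl[\sqrt{\mu(v)}\,\Phi_v(X)\bigr]_{v\in C}
\quad\Longrightarrow\quad
\mu'(v) = c_v\,\mu(v) \ \text{ yields }\
\Phi'(X) = \bigl[\sqrt{c_v}\,\sqrt{\mu(v)}\,\Phi_v(X)\bigr]_{v\in C}
```

Under this assumption, each component block is merely scaled by √c_v, a change that a linear classifier can absorb into the corresponding block of its weights.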
Q8. What is the result of g(x) = w⊤φ(x)?
As a result, g(x) = w⊤φ(x) gives rise to a locally linear (i.e., piece-wise linear) function that approximates the unknown nonlinear function f(x).
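To see the piece-wise linearity, one can plug the SV coding defined in Q9 just below into g(x). Splitting w into per-codeword blocks w = [w_{v,0}, w_{v,1}] over v ∈ C (an assumed notation) and using hard VQ, where γv(x) = 1 only for the nearest codeword v*:

```latex
g(x) \;=\; w^{\top}\phi(x)
     \;=\; s\,w_{v^*\!,0} \;+\; w_{v^*\!,1}^{\top}\,(x - v^*)
```

This is affine in x within each Voronoi cell of the codebook, hence piece-wise linear overall.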
Q9. What is the simplest way to approximate f(x)?
Eq. (1) also suggests that the approximation to f(x) can be expressed as a linear function acting on a nonlinear coding scheme: f(x) ≈ g(x) ≡ w⊤φ(x), where φ(x) is called the Super-Vector (SV) coding of x, defined by φ(x) = [sγv(x), γv(x)(x − v)⊤]⊤ over v ∈ C.
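A minimal NumPy sketch of this coding for a single descriptor, assuming hard VQ assignment (γ is one-hot at the nearest codeword) and with s the constant appearing in the definition above:

```python
import numpy as np

def sv_code(x, codebook, s=1.0):
    # x: (d,) local descriptor; codebook: (K, d) codewords C.
    # Returns the (K * (d + 1),) vector [s*gamma_v(x), gamma_v(x)*(x - v)] over v in C.
    K, d = codebook.shape
    k = np.argmin(((codebook - x) ** 2).sum(axis=1))  # nearest codeword
    phi = np.zeros(K * (d + 1))
    block = k * (d + 1)
    phi[block] = s                                    # s * gamma_v(x)
    phi[block + 1 : block + 1 + d] = x - codebook[k]  # gamma_v(x) * (x - v)
    return phi
```

Note that with hard VQ only one block of φ(x) is nonzero, which keeps the high-dimensional code sparse.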
Q10. What is the difference between the two methods?
Since their method naturally requires a linear classifier, it enjoys training scalability that is linear in the number of training images, whereas nonlinear kernel-based methods suffer quadratic or higher complexity.
Q11. What is the coding scheme for f(x)?
Let us consider a general unsupervised learning setting, where a set of bases C ⊂ Rᵈ, called a codebook or dictionary, is employed to approximate any x, namely x ≈ ∑v∈C γv(x)·v, where γ(x) = [γv(x)]v∈C is the vector of coefficients, and sometimes ∑v γv(x) = 1 is assumed.
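As a toy sketch of this setting under the simplest choice, hard VQ, γ(x) is one-hot at the nearest codeword, so the coefficients trivially sum to one; the codebook here is random illustrative data, not a learned dictionary.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 128))   # toy codebook C: K codewords in R^d
x = rng.normal(size=128)                 # a descriptor to approximate

# Hard VQ coefficients: one-hot at the nearest codeword, so sum_v gamma_v(x) = 1.
gamma = np.zeros(len(codebook))
gamma[np.argmin(((codebook - x) ** 2).sum(axis=1))] = 1.0

x_hat = codebook.T @ gamma               # x is approximated by sum_v gamma_v(x) * v
```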
Q12. What is the method for coding images?
These methods are: (1) VQ coding – using the Bhattacharyya kernel on spatial pyramid histogram representations; (2) GMM – the method described in [22]; (3) SV – the super-vector coding proposed in this paper; (4) SV-soft – a soft version of SV coding, where [pk(x)]k for each x is truncated to retain the top 20 elements, with the remaining elements set to zero.
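The truncation in SV-soft can be sketched as follows; whether the retained posteriors are renormalized afterwards is not stated above, so this sketch only zeroes out the tail.

```python
import numpy as np

def truncate_posteriors(p, top=20):
    # Retain the 'top' largest entries of [p_k(x)]_k; set the rest to zero.
    p = np.asarray(p, dtype=float)
    if p.size <= top:
        return p
    out = np.zeros_like(p)
    keep = np.argpartition(p, -top)[-top:]
    out[keep] = p[keep]
    return out
```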
Q13. What is the purpose of this paper?
The work stresses the importance of learning good coding of local descriptors in the context of image classification, and makes the first attempt to formally incorporate the metric of local descriptors into distribution kernels.