Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study
Citations
The Pascal Visual Object Classes (VOC) Challenge
Object Detection with Discriminatively Trained Part-Based Models
Learning realistic human actions from movies
On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
Improving the Fisher kernel for large-scale image classification
References
Distinctive Image Features from Scale-Invariant Keypoints
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
A performance evaluation of local descriptors
Video Google: a text retrieval approach to object matching in videos
Visual categorization with bags of keypoints
Frequently Asked Questions (10)
Q2. What have the authors stated for future works in "Local features and kernels for classification of texture and object categories: a comprehensive study"?
Future research should focus on designing improved feature representations. Another promising area is the development of hybrid sparse/dense representations. For example, the recent successes of the novel feature extraction schemes of [10, 28] suggest that increasing the density and redundancy of local feature sets may be beneficial for recognition.
Q3. What can the authors do to achieve rotation invariance?
To achieve rotation invariance, the authors can either use rotationally invariant descriptors—for example, SPIN and RIFT [31], as presented in the following section—or rotate the circular regions in the direction of the dominant gradient orientation [36, 43].
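The second option above depends on estimating a dominant gradient orientation for each region. The following is a minimal sketch of that estimation step only, assuming a patch given as a list of rows of floats; the function name `dominant_orientation`, the central-difference gradients, and the 36-bin histogram are illustrative choices, not the procedure of [36, 43].

```python
import math


def dominant_orientation(patch, n_bins=36):
    """Return the centre (in radians) of the strongest bin of a
    magnitude-weighted gradient-orientation histogram.

    Angles are measured in image coordinates (x to the right, y down).
    """
    hist = [0.0] * n_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Central differences approximate the image gradient.
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            b = int(angle / (2.0 * math.pi) * n_bins) % n_bins
            hist[b] += math.hypot(gx, gy)  # weight by gradient magnitude
    peak = max(range(n_bins), key=hist.__getitem__)
    return (peak + 0.5) * 2.0 * math.pi / n_bins
```

A region would then be rotated by the negative of this angle before a descriptor is computed, so that all regions share a canonical orientation. The answer is quantized to the bin width, so a finer histogram or peak interpolation may be needed in practice.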
Q4. What is the way to overcome this potential weakness?
One way to overcome this potential weakness is to use feature selection [12] or boosting [48] to retain only the most discriminative features for recognition.
Q5. What is the definition of a SPIN descriptor?
The SPIN descriptor, based on spin images used for matching range data [26], is a rotation-invariant two-dimensional histogram of intensities within an image region.
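The idea of a two-dimensional intensity histogram can be illustrated with a simplified sketch. It assumes a non-empty square patch with intensities in [0, 1] and uses hard binning over (distance from the centre, intensity); the function name `spin_histogram` and the bin counts are hypothetical, and the descriptor in the paper may differ in its binning and normalization.

```python
import math


def spin_histogram(patch, n_dist_bins=5, n_int_bins=10):
    """Hard-binned 2D histogram over (distance from centre, intensity).

    `patch` is a non-empty square list of rows with intensities in [0, 1].
    Only pixels inside the inscribed circle are counted.
    """
    size = len(patch)
    centre = (size - 1) / 2.0
    radius = centre if centre > 0 else 1.0
    hist = [[0.0] * n_int_bins for _ in range(n_dist_bins)]
    count = 0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            dx, dy = x - centre, y - centre
            d = math.sqrt(dx * dx + dy * dy) / radius
            if d > 1.0:
                continue  # outside the circular region
            d_bin = min(int(d * n_dist_bins), n_dist_bins - 1)
            i_bin = min(int(value * n_int_bins), n_int_bins - 1)
            hist[d_bin][i_bin] += 1.0
            count += 1
    # Normalize so the histogram sums to one.
    return [[c / count for c in row] for row in hist]
```

Because each pixel contributes only through its distance from the centre and its intensity, rotating the patch about its centre leaves the histogram unchanged (exactly so for 90-degree rotations of the pixel grid).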
Q6. What is the way to obtain a global texton vocabulary?
An alternative to image signatures is to obtain a global texton vocabulary (or visual vocabulary) by clustering descriptors from a special training set, and then to represent each image in the database as a histogram of texton labels [8, 57, 58, 61].
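The two steps described above (cluster training descriptors, then label and count) can be sketched as follows. This assumes plain k-means as the clustering method, which the text above does not specify, and descriptors given as lists of floats; `build_vocabulary` and `texton_histogram` are hypothetical names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centres, d):
    """Index of the vocabulary entry closest to descriptor d."""
    return min(range(len(centres)), key=lambda k: sq_dist(centres[k], d))


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster training descriptors with k-means (assumed method)."""
    rng = random.Random(seed)
    centres = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centres, d)].append(d)
        for j, members in enumerate(groups):
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def texton_histogram(image_descriptors, centres):
    """Normalized histogram of texton labels for one image."""
    hist = [0.0] * len(centres)
    for d in image_descriptors:
        hist[nearest(centres, d)] += 1.0
    n = len(image_descriptors)
    return [h / n for h in hist]
```

The vocabulary is built once from the training set and then shared by every image, which is what makes the resulting histograms directly comparable.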
Q7. What is the advantage of orderless bag-of-keypoints methods?
On the other hand, orderless bag-of-keypoints methods [55, 61] have the advantage of simplicity and computational efficiency, though they fail to represent the geometric structure of the object class or to distinguish between foreground and background features.
Q8. What is the power of bag-of-keypoints representations?
The power of orderless bag-of-keypoints representations is not particularly surprising in the case of texture images, which lack clutter and have uniform statistical properties.
Q9. What is the way to avoid the cost of building global vocabularies?
To avoid the computational expense of building global vocabularies for each dataset, the authors use the EMD kernel in the following experiments.
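A kernel of this kind is typically obtained by exponentiating a distance. The sketch below assumes the pairwise EMD values between image signatures have already been computed by some other routine and are passed in as a square matrix; the form exp(-D/A) with A set to the mean off-diagonal distance is an assumption to check against the paper, and `distance_to_kernel` is a hypothetical name.

```python
import math


def distance_to_kernel(dist):
    """Turn a precomputed distance matrix into a Gram matrix.

    K[i][j] = exp(-dist[i][j] / A), where A is the mean of the
    off-diagonal distances (assumed scaling). Requires at least two
    items and at least one non-zero distance.
    """
    n = len(dist)
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    scale = sum(off_diag) / len(off_diag)
    return [[math.exp(-dist[i][j] / scale) for j in range(n)] for i in range(n)]
```

The resulting matrix could be passed to any SVM implementation that accepts a precomputed kernel. Because signatures are compared pairwise, no dataset-wide vocabulary is needed.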
Q10. What is the method for combining geometric invariance with a discriminative classifier?
Their results show that, for most datasets, combining geometric invariance at the representation level with a discriminative classifier at the learning level results in a very effective texture recognition system.