Open Access Journal Article

Image Classification with the Fisher Vector: Theory and Practice

TLDR
This work proposes to use the Fisher Kernel framework as an alternative patch encoding strategy: it describes patches by their deviation from a “universal” generative Gaussian mixture model, and reports experimental results showing that the Fisher vector (FV) framework is a state-of-the-art patch encoding technique.
Abstract
A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector, and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual-Words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
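To make "deviation from a generative Gaussian mixture model" concrete, the following is a minimal sketch of one common way to compute such an encoding. It assumes a diagonal-covariance GMM and uses the gradients with respect to the means and variances. The function names (`fisher_vector`, `normalize`), the toy parameters, and the signed-square-root plus L2 normalization step are illustrative assumptions, not details given in the abstract. Plain Python is used only to keep the example self-contained; a practical implementation would be vectorized.

```python
import math


def fisher_vector(patches, weights, means, variances):
    """Encode N local descriptors as a vector of length 2*K*D.

    patches:   list of N descriptors, each a list of D floats
    weights:   K mixture weights (summing to 1)
    means:     K mean vectors of length D
    variances: K diagonal variance vectors of length D
    """
    n, k, d = len(patches), len(weights), len(patches[0])
    grad_mu = [[0.0] * d for _ in range(k)]
    grad_var = [[0.0] * d for _ in range(k)]

    for x in patches:
        # log(w_k * N(x | mu_k, diag(var_k))) for each component k.
        log_p = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xj, mj, vj in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vj) + (xj - mj) ** 2 / vj)
            log_p.append(ll)
        # Soft assignments (posteriors), computed stably via log-sum-exp.
        top = max(log_p)
        exps = [math.exp(lp - top) for lp in log_p]
        total = sum(exps)
        gamma = [e / total for e in exps]

        # Accumulate first- and second-order deviations from each Gaussian.
        for c in range(k):
            for j in range(d):
                z = (x[j] - means[c][j]) / math.sqrt(variances[c][j])
                grad_mu[c][j] += gamma[c] * z
                grad_var[c][j] += gamma[c] * (z * z - 1.0)

    fv = []
    for c in range(k):  # gradients with respect to the means
        scale = 1.0 / (n * math.sqrt(weights[c]))
        fv.extend(g * scale for g in grad_mu[c])
    for c in range(k):  # gradients with respect to the variances
        scale = 1.0 / (n * math.sqrt(2.0 * weights[c]))
        fv.extend(g * scale for g in grad_var[c])
    return fv


def normalize(fv):
    """Signed square root, then L2 normalization (assumed post-processing)."""
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted


# Toy example with made-up values: K=2 Gaussians in D=2, N=3 patches.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
patches = [[0.1, -0.2], [2.9, 3.2], [0.4, 0.3]]

fv = normalize(fisher_vector(patches, weights, means, variances))
print(len(fv))  # 2 * K * D = 8
```

Unlike a bag-of-visual-words histogram, which only counts how many patches fall into each prototype, this encoding has 2·K·D dimensions because it also records how the assigned patches differ from each Gaussian in mean and variance.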


Citations
Journal Article

Squeeze-and-Excitation Networks

TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Posted Content

Squeeze-and-Excitation Networks

TL;DR: The Squeeze-and-Excitation (SE) block proposed in this paper adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels; SE blocks can be stacked together to form SENet architectures.
Journal Article

On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.

TL;DR: This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers, introducing a methodology that makes it possible to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.
Proceedings Article

Learning Deep Features for Scene Recognition using Places Database

TL;DR: A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced along with new methods to compare the density and diversity of image datasets; Places is shown to be as dense as other scene datasets and to have more diversity.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art performance was achieved by a deep convolutional neural network (DCNN) that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Journal Article

The Pascal Visual Object Classes (VOC) Challenge

TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, including whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Proceedings Article

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.