Proceedings ArticleDOI
Multiclass Object Recognition with Sparse, Localized Features
J. Mutch,David G. Lowe +1 more
- Vol. 1, pp 11-18
Reads0
Chats0
TLDR
A biologically inspired model of visual object recognition to the multiclass object categorization problem, modifies that of Serre, Wolf, and Poggio, and demonstrates the value of retaining some position and scale information above the intermediate feature level.Abstract:
We apply a biologically inspired model of visual object recognition to the multiclass object categorization problem. Our model modifies that of Serre, Wolf, and Poggio. As in that work, we first apply Gabor filters at all positions and scales; feature complexity and position/scale invariance are then built up by alternating template matching and max pooling operations. We refine the approach in several biologically plausible ways, using simple versions of sparsification and lateral inhibition. We demonstrate the value of retaining some position and scale information above the intermediate feature level. Using feature selection we arrive at a model that performs better with fewer features. Our final model is tested on the Caltech 101 object categories and the UIUC car localization task, in both cases achieving state-of-the-art performance. The results strengthen the case for using this class of model in computer vision.read more
Citations
More filters
Proceedings ArticleDOI
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
TL;DR: The convolutional deep belief network is presented, a hierarchical generative model which scales to realistic image sizes and is translation-invariant and supports efficient bottom-up and top-down probabilistic inference.
Proceedings ArticleDOI
What is the best multi-stage architecture for object recognition?
TL;DR: It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.
Proceedings ArticleDOI
Convolutional networks and applications in vision
TL;DR: New unsupervised learning algorithms, and new non-linear stages that allow ConvNets to be trained with very few labeled samples are described, including one for visual object recognition and vision navigation for off-road mobile robots.
Journal ArticleDOI
Robust Object Recognition with Cortex-Like Mechanisms
TL;DR: A hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation is described.
Proceedings ArticleDOI
Representing shape with a spatial pyramid kernel
TL;DR: This work introduces a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel that is designed so that the shape correspondence between two images can be measured by the distance between their descriptors using the kernel.
References
More filters
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun,Léon Bottou,Léon Bottou,Yoshua Bengio,Yoshua Bengio,Yoshua Bengio,Patrick Haffner +6 more
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Proceedings ArticleDOI
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Journal ArticleDOI
Emergence of simple-cell receptive field properties by learning a sparse code for natural images
TL;DR: It is shown that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex.
Proceedings Article
Visual categorization with bags of keypoints
TL;DR: This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches and shows that it is simple, computationally efficient and intrinsically invariant.
Journal ArticleDOI
Neocognitron: A Self Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position
TL;DR: A neural network model for a mechanism of visual pattern recognition that is self-organized by “learning without a teacher”, and acquires an ability to recognize stimulus patterns based on the geometrical similarity of their shapes without affected by their positions.