scispace - formally typeset
Proceedings ArticleDOI

Image Classification using Random Forests and Ferns

TLDR
It is shown that selecting the ROI adds about 5% to the performance and, together with the other improvements, the result is about a 10% improvement over the state of the art for Caltech-256.
Abstract
We explore the problem of classifying images by the object categories they contain in the case of a large number of object categories. To this end we combine three ingredients: (i) shape and appearance representations that support spatial pyramid matching over a region of interest. This generalizes the representation of Lazebnik et al., (2006) from an image to a region of interest (ROI), and from appearance (visual words) alone to appearance and local shape (edge distributions); (ii) automatic selection of the regions of interest in training. This provides a method of inhibiting background clutter and adding invariance to the object instance 's position; and (iii) the use of random forests (and random ferns) as a multi-way classifier. The advantage of such classifiers (over multi-way SVM for example) is the ease of training and testing. Results are reported for classification of the Caltech-101 and Caltech-256 data sets. We compare the performance of the random forest/ferns classifier with a benchmark multi-way SVM classifier. It is shown that selecting the ROI adds about 5% to the performance and, together with the other improvements, the result is about a 10% improvement over the state of the art for Caltech-256.

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

Vlfeat: an open and portable library of computer vision algorithms

TL;DR: VLFeat is an open and portable library of computer vision algorithms that includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization.
Proceedings ArticleDOI

Linear spatial pyramid matching using sparse coding for image classification

TL;DR: An extension of the SPM method is developed, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and a linear SPM kernel based on SIFT sparse codes is proposed, leading to state-of-the-art performance on several benchmarks by using a single type of descriptors.
Proceedings ArticleDOI

Automated Flower Classification over a Large Number of Classes

TL;DR: Results show that learning the optimum kernel combination of multiple features vastly improves the performance, from 55.1% for the best single feature to 72.8% forThe combination of all features.
Journal ArticleDOI

ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

TL;DR: Ranger as mentioned in this paper is a C++ application and R package for high-dimensional data, which is a fast implementation of random forests for high dimensional data and supports ensemble of classification, regression and survival trees.
Journal Article

Multimodal learning with deep Boltzmann machines

TL;DR: A Deep Boltzmann Machine is proposed for learning a generative model of multimodal data and it is shown that the model can be used to create fused representations by combining features across modalities, which are useful for classification and information retrieval.
References
More filters
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings ArticleDOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Proceedings ArticleDOI

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Proceedings ArticleDOI

Video Google: a text retrieval approach to object matching in videos

TL;DR: An approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video, represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion.
Related Papers (5)