Book Chapter
What does classifying more than 10,000 image categories tell us?
Jia Deng, Alexander C. Berg, Kai Li, Li Fei-Fei
- pp 71-84
TL;DR: A study of large-scale categorization, including a series of challenging experiments on classification with more than 10,000 image classes, finds that computational issues become crucial in algorithm design and that conventional wisdom from a couple of hundred image categories does not necessarily hold when the number of categories increases.
Abstract:
Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large-scale categorization, including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on the relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.
References
Proceedings Article
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal Article
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings Article
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal Article
The Pascal Visual Object Classes (VOC) Challenge
TL;DR: Reviews the state of the art in evaluated methods for both classification and detection, analyzing whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Journal Article
WordNet: An Electronic Lexical Database
TL;DR: Presents the WordNet lexical database, including nouns in WordNet and a semantic network of English verbs, along with applications of WordNet such as building semantic concordances.