What does classifying more than 10,000 image categories tell us?

doi:10.1007/978-3-642-15555-0_6

Book Chapter


Jia Deng, +3 more

pp. 71-84


TLDR

A study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes finds that computational issues become crucial in algorithm design and conventional wisdom from a couple of hundred image categories does not necessarily hold when the number of categories increases.

Abstract:

Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.
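Finding d) can be illustrated with a minimal sketch of one way a semantic hierarchy might be exploited. The abstract does not say how this is done, so everything below is an assumption made for illustration: the toy hierarchy, the choice of "height of the lowest common ancestor" as the cost of a mistake, and the function names are all hypothetical. The idea is that confusing two sibling classes should cost less than confusing two distant ones, and a classifier can pick the label that minimizes expected cost instead of the most probable label.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from WordNet or the paper.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "husky": "dog",
    "beagle": "dog",
}


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def height(node):
    """Length of the longest downward path from node to a leaf."""
    children = [c for c, p in PARENT.items() if p == node]
    return 0 if not children else 1 + max(height(c) for c in children)


def lowest_common_ancestor(a, b):
    """First node on a's path to the root that is also an ancestor of b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    raise ValueError("labels do not share a root")


def hierarchical_cost(true_label, predicted_label):
    """Assumed cost: height of the lowest common ancestor (0 for equal leaves)."""
    return height(lowest_common_ancestor(true_label, predicted_label))


def min_expected_cost_label(probabilities):
    """Pick the label with the lowest expected hierarchical cost."""
    def expected_cost(guess):
        return sum(p * hierarchical_cost(t, guess) for t, p in probabilities.items())
    return min(probabilities, key=expected_cost)


# Made-up class probabilities for one image.
probs = {"car": 0.40, "husky": 0.35, "beagle": 0.25}
most_probable = max(probs, key=probs.get)
hierarchy_aware = min_expected_cost_label(probs)
print(most_probable, hierarchy_aware)
```

In this made-up example the most probable single label is "car", but 60% of the probability mass sits on the two dog breeds, so the hierarchy-aware rule prefers "husky": a wrong breed is a cheap mistake, while confusing a dog with a car is an expensive one.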


