scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

Jia Deng1, Wei Dong1, Richard Socher1, Li-Jia Li1, Kai Li1, Li Fei-Fei1 
20 Jun 2009-pp 248-255
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings Article
03 Dec 2012
TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

References
More filters
Book ChapterDOI
20 Oct 2008
TL;DR: This work presents a discriminative learning process which employs active, online learning to quickly classify many images with minimal user input, and demonstrates precision which is often superior to the state-of-the-art, with scalability which exceeds previous work.
Abstract: As computer vision research considers more object categories and greater variation within object categories, it is clear that larger and more exhaustive datasets are necessary. However, the process of collecting such datasets is laborious and monotonous. We consider the setting in which many images have been automatically collected for a visual category (typically by automatic internet search), and we must separate relevant images from noise. We present a discriminative learning process which employs active, online learning to quickly classify many images with minimal user input. The principle advantage of this work over previous endeavors is its scalability. We demonstrate precision which is often superior to the state-of-the-art, with scalability which exceeds previous work.

150 citations


"ImageNet: A large-scale hierarchica..." refers methods in this paper

  • ...In this case, we use an AdaBoost-based classifier proposed by [6]....

    [...]

Proceedings ArticleDOI
26 Dec 2007
TL;DR: An interesting computational property of the object hierarchy is observed: comparing the recognition rate when using models of objects at different levels, the higher more inclusive levels exhibit higher recall but lower precision when compared with the class specific level.
Abstract: We investigated the computational properties of natural object hierarchy in the context of constellation object class models, and its utility for object class recognition. We first observed an interesting computational property of the object hierarchy: comparing the recognition rate when using models of objects at different levels, the higher more inclusive levels (e.g., closed-frame vehicles or vehicles) exhibit higher recall but lower precision when compared with the class specific level (e.g., bus). These inherent differences suggest that combining object classifiers from different hierarchical levels into a single classifier may improve classification, as it appears like these models capture different aspects of the object. We describe a method to combine these classifiers, and analyze the conditions under which improvement can be guaranteed. When given a small sample of a new object class, we describe a method to transfer knowledge across the tree hierarchy, between related objects. Finally, we describe extensive experiments using object hierarchies obtained from publicly available datasets, and show that the combined classifiers significantly improve recognition results.

136 citations


Additional excerpts

  • ...[16, 17, 28, 18])....

    [...]

01 Jan 2007
TL;DR: This paper outlines the architecture of a semantic database for Dutch, developed in the Cornetto project, and its possible usage in language technology and information access applications, and gives an overview of possible application areas in order to inspire future research based on theCornetto database.
Abstract: We outline the architecture of a semantic database for Dutch, developed in the Cornetto project, and its possible usage in language technology and information access applications. The database combines the Dutch part of EuroWordNet with the Referentie Bestand Nederlands. The resulting database is also aligned with the English WordNet and with a formal ontology. As such it represents a unique database with rich semantic and combinatoric information. There are many ways in which this knowledge can be exploited. In this paper we give an overview of possible application areas in order to inspire future research based on the Cornetto database.

40 citations


"ImageNet: A large-scale hierarchica..." refers methods in this paper

  • ...We obtain accurate translations by WordNets in those languages [3, 2, 4, 26]....

    [...]

Book ChapterDOI
17 Sep 1997
TL;DR: A prototype of the Italian version of WordNet is presented, a general computational lexical resource, which is coupled with a parser and a number of experiments have been performed to individuate the methodology with the best trade-off between disambiguation rate and precision.
Abstract: We present a prototype of the Italian version of WordNet, a general computational lexical resource. Some relevant extensions are discussed to make it usable for parsing: in particular we add verbal selectional restrictions to make lexical discrimination effective. Italian WordNet has been coupled with a parser and a number of experiments have been performed to individuate the methodology with the best trade-off between disambiguation rate and precision. Results confirm intuitive hypothesis on the role of selectional restrictions and show evidences for a WordNet-like organization of lexical senses.

26 citations


"ImageNet: A large-scale hierarchica..." refers methods in this paper

  • ...We obtain accurate translations by WordNets in those languages [3, 2, 4, 26]....

    [...]

Journal ArticleDOI
TL;DR: It is suggested that a labeled benchmark of the type provided, containing a large number of related classes is crucial for the development and evaluation of classification methods which make efficient use of interclass transfer.
Abstract: Current object recognition systems aim at recognizing numerous object classes under limited supervision conditions. This paper provides a benchmark for evaluating progress on this fundamental task. Several methods have recently proposed to utilize the commonalities between object classes in order to improve generalization accuracy. Such methods can be termed interclass transfer techniques. However, it is currently difficult to asses which of the proposed methods maximally utilizes the shared structure of related classes. In order to facilitate the development, as well as the assessment of methods for dealing with multiple related classes, a new dataset including images of several hundred mammal classes, is provided, together with preliminary results of its use. The images in this dataset are organized into five levels of variability, and their labels include information on the objects' identity, location and pose. From this dataset, a classification benchmark has been derived, requiring fine distinctions between 72 mammal classes. It is then demonstrated that a recognition method which is highly successful on the Caltech101, attains limited accuracy on the current benchmark (36.5%). Since this method does not utilize the shared structure between classes, the question remains as to whether interclass transfer methods can increase the accuracy to the level of human performance (90%). We suggest that a labeled benchmark of the type provided, containing a large number of related classes is crucial for the development and evaluation of classification methods which make efficient use of interclass transfer.

22 citations


"ImageNet: A large-scale hierarchica..." refers methods in this paper

  • ...pose datasets, such as FERET faces [19], Labeled faces in the Wild [13] and the Mammal Benchmark by Fink and Ullman [ 11 ] are not included....

    [...]