ImageNet: A large-scale hierarchical image database

doi:10.1109/CVPR.2009.5206848

Home
/
Papers
/
ImageNet: A large-scale hierarchical image database

Proceedings Article•DOI•

ImageNet: A large-scale hierarchical image database

Jia Deng¹, Wei Dong¹, Richard Socher¹, Li-Jia Li¹, Kai Li¹, Li Fei-Fei¹ - Show less +2 more•Institutions (1)

Princeton University¹

20 Jun 2009-pp 248-255

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

read less

Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

[...]

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

73,978 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

...read moreread less

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

...read moreread less

55,235 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

49,914 citations

Proceedings Article•DOI•

Going deeper with convolutions

[...]

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich - Show less +5 more•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

...read moreread less

40,257 citations

Book•

Deep Learning

[...]

Ian Goodfellow¹, Yoshua Bengio², Aaron Courville²•Institutions (2)

Google¹, Université de Montréal²

18 Nov 2016

TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

...read moreread less

Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

...read moreread less

38,208 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Book Chapter•DOI•

Towards Scalable Dataset Construction: An Active Learning Approach

[...]

Brendan M. Collins¹, Jia Deng¹, Kai Li¹, Li Fei-Fei¹•Institutions (1)

Princeton University¹

20 Oct 2008

TL;DR: This work presents a discriminative learning process which employs active, online learning to quickly classify many images with minimal user input, and demonstrates precision which is often superior to the state-of-the-art, with scalability which exceeds previous work.

...read moreread less

Abstract: As computer vision research considers more object categories and greater variation within object categories, it is clear that larger and more exhaustive datasets are necessary. However, the process of collecting such datasets is laborious and monotonous. We consider the setting in which many images have been automatically collected for a visual category (typically by automatic internet search), and we must separate relevant images from noise. We present a discriminative learning process which employs active, online learning to quickly classify many images with minimal user input. The principle advantage of this work over previous endeavors is its scalability. We demonstrate precision which is often superior to the state-of-the-art, with scalability which exceeds previous work.

...read moreread less

150 citations

"ImageNet: A large-scale hierarchica..." refers methods in this paper

...In this case, we use an AdaBoost-based classifier proposed by [6]....
[...]

Proceedings Article•DOI•

Exploiting Object Hierarchy: Combining Models from Different Category Levels

[...]

Alon Zweig¹, Daphna Weinshall¹•Institutions (1)

Hebrew University of Jerusalem¹

26 Dec 2007

TL;DR: An interesting computational property of the object hierarchy is observed: comparing the recognition rate when using models of objects at different levels, the higher more inclusive levels exhibit higher recall but lower precision when compared with the class specific level.

...read moreread less

Abstract: We investigated the computational properties of natural object hierarchy in the context of constellation object class models, and its utility for object class recognition. We first observed an interesting computational property of the object hierarchy: comparing the recognition rate when using models of objects at different levels, the higher more inclusive levels (e.g., closed-frame vehicles or vehicles) exhibit higher recall but lower precision when compared with the class specific level (e.g., bus). These inherent differences suggest that combining object classifiers from different hierarchical levels into a single classifier may improve classification, as it appears like these models capture different aspects of the object. We describe a method to combine these classifiers, and analyze the conditions under which improvement can be guaranteed. When given a small sample of a new object class, we describe a method to transfer knowledge across the tree hierarchy, between related objects. Finally, we describe extensive experiments using object hierarchies obtained from publicly available datasets, and show that the combined classifiers significantly improve recognition results.

...read moreread less

136 citations

Additional excerpts

...[16, 17, 28, 18])....
[...]

The Cornetto Database: Architecture and User-Scenarios

[...]

Piek Vossen¹, Katja Hofmann², M. de Rijke², E.F. Tjong Kim Sang², K. Deschacht³ - Show less +1 more•Institutions (3)

VU University Amsterdam¹, University of Amsterdam², Katholieke Universiteit Leuven³

01 Jan 2007

TL;DR: This paper outlines the architecture of a semantic database for Dutch, developed in the Cornetto project, and its possible usage in language technology and information access applications, and gives an overview of possible application areas in order to inspire future research based on theCornetto database.

...read moreread less

Abstract: We outline the architecture of a semantic database for Dutch, developed in the Cornetto project, and its possible usage in language technology and information access applications. The database combines the Dutch part of EuroWordNet with the Referentie Bestand Nederlands. The resulting database is also aligned with the English WordNet and with a formal ontology. As such it represents a unique database with rich semantic and combinatoric information. There are many ways in which this knowledge can be exploited. In this paper we give an overview of possible application areas in order to inspire future research based on the Cornetto database.

...read moreread less

40 citations

"ImageNet: A large-scale hierarchica..." refers methods in this paper

...We obtain accurate translations by WordNets in those languages [3, 2, 4, 26]....
[...]

Book Chapter•DOI•

WordNet for Italian and Its Use for Lexical Deiscrimination

[...]

Alessandro Artale, Bernardo Magnini, Carlo Strapparava

17 Sep 1997

TL;DR: A prototype of the Italian version of WordNet is presented, a general computational lexical resource, which is coupled with a parser and a number of experiments have been performed to individuate the methodology with the best trade-off between disambiguation rate and precision.

...read moreread less

Abstract: We present a prototype of the Italian version of WordNet, a general computational lexical resource. Some relevant extensions are discussed to make it usable for parsing: in particular we add verbal selectional restrictions to make lexical discrimination effective. Italian WordNet has been coupled with a parser and a number of experiments have been performed to individuate the methodology with the best trade-off between disambiguation rate and precision. Results confirm intuitive hypothesis on the role of selectional restrictions and show evidences for a WordNet-like organization of lexical senses.

...read moreread less

26 citations

"ImageNet: A large-scale hierarchica..." refers methods in this paper

...We obtain accurate translations by WordNets in those languages [3, 2, 4, 26]....
[...]

Journal Article•DOI•

From Aardvark to Zorro: A Benchmark for Mammal Image Classification

[...]

Michael Fink¹, Shimon Ullman²•Institutions (2)

Interdisciplinary Center for Neural Computation¹, Weizmann Institute of Science²

01 May 2008-International Journal of Computer Vision

TL;DR: It is suggested that a labeled benchmark of the type provided, containing a large number of related classes is crucial for the development and evaluation of classification methods which make efficient use of interclass transfer.

...read moreread less

Abstract: Current object recognition systems aim at recognizing numerous object classes under limited supervision conditions. This paper provides a benchmark for evaluating progress on this fundamental task. Several methods have recently proposed to utilize the commonalities between object classes in order to improve generalization accuracy. Such methods can be termed interclass transfer techniques. However, it is currently difficult to asses which of the proposed methods maximally utilizes the shared structure of related classes. In order to facilitate the development, as well as the assessment of methods for dealing with multiple related classes, a new dataset including images of several hundred mammal classes, is provided, together with preliminary results of its use. The images in this dataset are organized into five levels of variability, and their labels include information on the objects' identity, location and pose. From this dataset, a classification benchmark has been derived, requiring fine distinctions between 72 mammal classes. It is then demonstrated that a recognition method which is highly successful on the Caltech101, attains limited accuracy on the current benchmark (36.5%). Since this method does not utilize the shared structure between classes, the question remains as to whether interclass transfer methods can increase the accuracy to the level of human performance (90%). We suggest that a labeled benchmark of the type provided, containing a large number of related classes is crucial for the development and evaluation of classification methods which make efficient use of interclass transfer.

...read moreread less

22 citations

"ImageNet: A large-scale hierarchica..." refers methods in this paper

...pose datasets, such as FERET faces [19], Labeled faces in the Wild [13] and the Mammal Benchmark by Fink and Ullman [ 11 ] are not included....
[...]