ImageNet Classification with Deep Convolutional Neural Networks

Proceedings Article•

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹

03 Dec 2012-Vol. 25, pp 1097-1105

TL;DR: A deep convolutional neural network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers ending in a 1000-way softmax achieved state-of-the-art classification results on ImageNet.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
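
The top-1 and top-5 error rates quoted in the abstract count an image as correct when its true label is among the k highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_error` and the toy scores are illustrative assumptions, not taken from the paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 images, 4 classes (hypothetical scores).
scores = [
    [0.1, 0.6, 0.2, 0.1],   # true class 1: top-1 hit
    [0.5, 0.1, 0.3, 0.1],   # true class 2: top-1 miss, top-2 hit
    [0.4, 0.3, 0.2, 0.1],   # true class 3: missed even at top-2
]
labels = [1, 2, 3]
top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
```

With k=5 and 1000-way softmax outputs, the same function would give the top-5 error reported above.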

Citations

Journal Article•DOI•

Pansharpening by Convolutional Neural Networks

Giuseppe Masi, Davide Cozzolino, Luisa Verdoliva, Giuseppe Scarpa

14 Jul 2016-Remote Sensing

TL;DR: A new pansharpening method is proposed, based on convolutional neural networks, which is largely competitive with the current state of the art in terms of both full-reference and no-reference metrics, and also at a visual inspection.

Abstract: A new pansharpening method is proposed, based on convolutional neural networks. We adapt a simple and effective three-layer architecture recently proposed for super-resolution to the pansharpening problem. Moreover, to improve performance without increasing complexity, we augment the input by including several maps of nonlinear radiometric indices typical of remote sensing. Experiments on three representative datasets show the proposed method to provide very promising results, largely competitive with the current state of the art in terms of both full-reference and no-reference metrics, and also at a visual inspection.
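
The input augmentation mentioned in the abstract can be pictured as appending extra index maps to the band stack before it enters the network. The sketch below uses NDVI, a standard remote-sensing index, purely as an example; the abstract does not say which indices are used, and the function names and the tiny 2×2 bands are hypothetical.

```python
def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index, computed per pixel."""
    return [
        [(n - r) / (n + r + eps) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]


def augment_with_index(bands, nir_idx, red_idx):
    """Return the band stack with one extra channel holding the NDVI map."""
    return bands + [ndvi(bands[nir_idx], bands[red_idx])]


# Hypothetical 2x2 image with two bands: [red, nir].
red = [[0.2, 0.4], [0.1, 0.3]]
nir = [[0.6, 0.4], [0.3, 0.9]]
stack = augment_with_index([red, nir], nir_idx=1, red_idx=0)
```

The network input grows by one channel per index map, while the depth of the network itself stays the same.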

719 citations

Proceedings Article•DOI•

Convolutional Neural Networks for human activity recognition using mobile sensors

Ming Zeng¹, Le T. Nguyen¹, Bo Yu¹, Ole J. Mengshoel¹, Jiang Zhu¹, Pang Wu¹, Joy Zhang¹

Carnegie Mellon University¹

28 Nov 2014

TL;DR: Proposes a CNN-based approach that automatically extracts discriminative features for activity recognition, capturing the local dependency and scale invariance of a signal, as previously shown in the speech recognition and image recognition domains.

Abstract: A variety of real-life mobile sensing applications are becoming available, especially in the life-logging, fitness tracking and health monitoring domains. These applications use mobile sensors embedded in smart phones to recognize human activities in order to get a better understanding of human behavior. While progress has been made, human activity recognition remains a challenging task. This is partly due to the broad range of human activities as well as the rich variation in how a given activity can be performed. Using features that clearly separate between activities is crucial. In this paper, we propose an approach to automatically extract discriminative features for activity recognition. Specifically, we develop a method based on Convolutional Neural Networks (CNN), which can capture local dependency and scale invariance of a signal as it has been shown in speech recognition and image recognition domains. In addition, a modified weight sharing technique, called partial weight sharing, is proposed and applied to accelerometer signals to get further improvements. The experimental results on three public datasets, Skoda (assembly line activities), Opportunity (activities in kitchen), Actitracker (jogging, walking, etc.), indicate that our novel CNN-based approach is practical and achieves higher accuracy than existing state-of-the-art methods.
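
As a rough illustration of how a convolutional layer captures local dependency in a sensor signal, the sketch below applies one 1D filter, a rectifier, and max pooling to an accelerometer-like trace. The kernel, the signal, and the function names are hypothetical, and the partial weight sharing proposed in the paper is not implemented here.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation: slide the kernel over the signal."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Keep positive responses, zero out the rest."""
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


# Hypothetical accelerometer trace with one sharp rise.
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
edge_kernel = [-1.0, 1.0]          # responds to a local increase
features = max_pool1d(relu(conv1d(signal, edge_kernel)), size=2)
```

Shifting the rise one sample earlier leaves the pooled output unchanged, which is the kind of small-shift tolerance that pooling provides.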

719 citations

Journal Article•DOI•

Convolutional networks for fast, energy-efficient neuromorphic computing

Steven K. Esser¹, Paul A. Merolla¹, John V. Arthur¹, Andrew S. Cassidy¹, Rathinakumar Appuswamy¹, Alexander Andreopoulos¹, David Berg¹, Jeffrey L. McKinstry¹, Timothy Melano¹, R Davis¹, Carmelo di Nolfo¹, Pallab Datta¹, Arnon Amir¹, Brian Taba¹, Myron D. Flickner¹, Dharmendra S. Modha¹

IBM¹

11 Oct 2016-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

Abstract: Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

719 citations

Proceedings Article•DOI•

Feature Denoising for Improving Adversarial Robustness

Cihang Xie¹, Yuxin Wu², Laurens van der Maaten², Alan L. Yuille¹, Kaiming He²

Johns Hopkins University¹, Facebook²

15 Jun 2019

TL;DR: It is suggested that adversarial perturbations on images lead to noise in the features constructed by these networks, and new network architectures are developed that increase adversarial robustness by performing feature denoising.

Abstract: Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 --- it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by ~10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.
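
The non-local means filter named in the abstract replaces each feature with a weighted mean of features at all positions, with weights given by feature similarity. The sketch below is a minimal version under stated assumptions: the softmax over dot-product similarities is one possible weighting, the feature map is flattened to a list of vectors, and the rest of the paper's denoising block is not reproduced.

```python
import math


def non_local_means(features):
    """Replace each feature vector by a similarity-weighted mean of all vectors.

    features: list of N vectors (one per spatial position), each of length C.
    Weights are a softmax over dot-product similarities (an assumed choice).
    """
    denoised = []
    for x_i in features:
        sims = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        m = max(sims)                                  # subtract max for numerical stability
        weights = [math.exp(s - m) for s in sims]
        total = sum(weights)
        denoised.append([
            sum(w * x_j[c] for w, x_j in zip(weights, features)) / total
            for c in range(len(x_i))
        ])
    return denoised


# Hypothetical feature map: four positions, two channels.
feats = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 1.0]]
out = non_local_means(feats)
```

Because every output is a convex combination of the inputs, isolated perturbations at one position are averaged against similar features elsewhere.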

717 citations

Journal Article•DOI•

Scale-Aware Fast R-CNN for Pedestrian Detection

Jianan Li¹, Xiaodan Liang², Sheng Mei Shen³, Tingfa Xu¹, Jiashi Feng⁴, Shuicheng Yan⁴

Beijing Institute of Technology¹, Carnegie Mellon University², Panasonic³, National University of Singapore⁴

01 Apr 2018-IEEE Transactions on Multimedia

TL;DR: SAF R-CNN introduces multiple built-in subnetworks that detect pedestrians with scales from disjoint ranges; the outputs of all subnetworks are then adaptively combined to produce final detections that are robust to large variance in instance scales.

Abstract: In this paper, we consider the problem of pedestrian detection in natural scenes. Intuitively, instances of pedestrians with different spatial scales may exhibit dramatically different features. Thus, large variance in instance scales, which results in undesirable large intracategory variance in features, may severely hurt the performance of modern object instance detection methods. We argue that this issue can be substantially alleviated by the divide-and-conquer philosophy. Taking pedestrian detection as an example, we illustrate how we can leverage this philosophy to develop a Scale-Aware Fast R-CNN (SAF R-CNN) framework. The model introduces multiple built-in subnetworks which detect pedestrians with scales from disjoint ranges. Outputs from all of the subnetworks are then adaptively combined to generate the final detection results that are shown to be robust to large variance in instance scales, via a gate function defined over the sizes of object proposals. Extensive evaluations on several challenging pedestrian detection datasets well demonstrate the effectiveness of the proposed SAF R-CNN. Particularly, our method achieves state-of-the-art performance on Caltech [P. Dollar, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: An evaluation of the state of the art,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 34, no. 4, pp. 743–761, Apr. 2012], and obtains competitive results on INRIA [N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , 2005, pp. 886–893], ETH [A. Ess, B. Leibe, and L. V. Gool, “Depth and appearance for mobile scene analysis,” in Proc. Int. Conf. Comput. Vis ., 2007, pp. 1–8], and KITTI [A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit ., 2012, pp. 3354–3361].
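
The adaptive combination through a gate function over proposal sizes can be sketched as follows. This assumes just two subnetworks (the abstract says "multiple"), a sigmoid gate over proposal height, and made-up parameters `alpha`, `beta`, and `mean_height`; none of these specifics come from the abstract.

```python
import math


def gate_weights(height, mean_height, alpha=1.0, beta=10.0):
    """Sigmoid gate over proposal height; returns (w_large, w_small), summing to 1."""
    w_large = 1.0 / (1.0 + alpha * math.exp(-(height - mean_height) / beta))
    return w_large, 1.0 - w_large


def combine_scores(score_large, score_small, height, mean_height):
    """Blend the two subnetwork scores according to the proposal size."""
    w_large, w_small = gate_weights(height, mean_height)
    return w_large * score_large + w_small * score_small


# Hypothetical proposals: one tall, one short, with the same subnetwork scores.
tall = combine_scores(score_large=0.9, score_small=0.2, height=120, mean_height=60)
short = combine_scores(score_large=0.9, score_small=0.2, height=20, mean_height=60)
```

A tall proposal is scored almost entirely by the large-scale subnetwork and a short one by the small-scale subnetwork, with a smooth hand-over near the mean height.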

716 citations

References

Journal Article•DOI•

Random Forests

Leo Breiman¹

University of California, Berkeley¹

01 Oct 2001

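TL;DR: Random forests combine tree predictors grown with a random selection of features at each split. Internal estimates monitor error, strength, and correlation, show the response to increasing the number of features used in the splitting, and measure variable importance. The ideas also apply to regression.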

Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
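
The random selection of features at each node can be illustrated with a single split search. This is a sketch under stated assumptions, not Breiman's procedure in full: Gini impurity is used as the split criterion, the function names and data are hypothetical, and bootstrap sampling and tree growing are omitted.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_random_split(rows, labels, n_features, rng):
    """Pick the best (feature, threshold) among a random subset of features."""
    candidates = rng.sample(range(len(rows[0])), n_features)
    best = None
    for f in candidates:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            # Weighted impurity of the two children.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
labels = [0, 0, 1, 1]
score, feature, threshold = best_random_split(
    rows, labels, n_features=2, rng=random.Random(0)
)
```

Setting `n_features` below the total number of features makes different trees consider different candidates, which lowers the correlation between trees that the abstract identifies as a driver of generalization error.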

79,257 citations

Proceedings Article•DOI•

ImageNet: A large-scale hierarchical image database

Jia Deng¹, Wei Dong¹, Richard Socher¹, Li-Jia Li¹, Kai Li¹, Li Fei-Fei¹

Princeton University¹

20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

Book Chapter•DOI•

Learning internal representations by error propagation

David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

01 Jan 1988

Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations

Dissertation•

Learning Multiple Layers of Features from Tiny Images

Alex Krizhevsky¹

University of Toronto¹

01 Jan 2009

TL;DR: The author describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.

Abstract: In this work we describe how to train a multi-layer generative model of natural images. We use a dataset of millions of tiny colour images, described in the next section. This has been attempted by several groups but without success. The models on which we focus are RBMs (Restricted Boltzmann Machines) and DBNs (Deep Belief Networks). These models learn interesting-looking filters, which we show are more useful to a classifier than the raw pixels. We train the classifier on a labeled subset that we have collected and call the CIFAR-10 dataset.

15,005 citations

Proceedings Article•

Rectified Linear Units Improve Restricted Boltzmann Machines

Vinod Nair¹, Geoffrey E. Hinton¹

University of Toronto¹

21 Jun 2010

TL;DR: Replacing the binary stochastic hidden units of restricted Boltzmann machines with noisy rectified linear units yields features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
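
The approximation described in the abstract can be checked numerically. The sketch below sums many sigmoid copies with progressively more negative biases and compares the total with the smooth function log(1 + e^x). The bias offsets of -0.5, -1.5, ... and the noise variance of sigmoid(x) are details not given in the abstract and are taken here as assumptions; the function names are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stepped_sigmoid_sum(x, n_copies=100):
    """Expected total activity of n copies of a binary unit with biases shifted by -0.5, -1.5, ..."""
    return sum(sigmoid(x - i + 0.5) for i in range(1, n_copies + 1))


def softplus(x):
    """Smooth closed-form approximation to the sum above."""
    return math.log1p(math.exp(x))


def noisy_relu(x, rng):
    """Noisy rectified linear unit: max(0, x + Gaussian noise with variance sigmoid(x))."""
    return max(0.0, x + rng.gauss(0.0, math.sqrt(sigmoid(x))))


rng = random.Random(0)
samples = [noisy_relu(2.0, rng) for _ in range(5)]
```

For moderate inputs the sum and the softplus agree closely, and the rectified unit replaces a sum over many copies with a single cheap operation.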

14,799 citations