ImageNet Classification with Deep Convolutional Neural Networks

Home
/
Papers
/
ImageNet Classification with Deep Convolutional Neural Networks

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

03 Dec 2012-Vol. 25, pp 1097-1105

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

read less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

A deep learning framework for financial time series using stacked autoencoders and long-short term memory

[...]

Wei Bao¹, Jun Yue², Yulei Rao¹•Institutions (2)

Central South University¹, Peking University²

14 Jul 2017-PLOS ONE

TL;DR: A novel deep learning framework where wavelet transforms, stacked autoencoders and long-short term memory are combined for stock price forecasting and shows that the proposed model outperforms other similar models in both predictive accuracy and profitability performance.

...read moreread less

Abstract: The application of deep learning approaches to finance has received a great deal of attention from both investors and researchers. This study presents a novel deep learning framework where wavelet transforms (WT), stacked autoencoders (SAEs) and long-short term memory (LSTM) are combined for stock price forecasting. The SAEs for hierarchically extracted deep features is introduced into stock price forecasting for the first time. The deep learning framework comprises three stages. First, the stock price time series is decomposed by WT to eliminate noise. Second, SAEs is applied to generate deep high-level features for predicting the stock price. Third, high-level denoising features are fed into LSTM to forecast the next day’s closing price. Six market indices and their corresponding index futures are chosen to examine the performance of the proposed model. Results show that the proposed model outperforms other similar models in both predictive accuracy and profitability performance.

...read moreread less

656 citations

Posted Content•

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

[...]

Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton

03 Apr 2015-arXiv: Neural and Evolutionary Computing

TL;DR: This paper proposes a simpler solution that use recurrent neural networks composed of rectified linear units that is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.

...read moreread less

Abstract: Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that use recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.

...read moreread less

655 citations

Proceedings Article•DOI•

Pose-Driven Deep Convolutional Model for Person Re-identification

[...]

Chi Su¹, Jianing Li¹, Shiliang Zhang¹, Junliang Xing², Wen Gao¹, Qi Tian³ - Show less +2 more•Institutions (3)

Peking University¹, Chinese Academy of Sciences², University of Texas at San Antonio³

25 Sep 2017

TL;DR: Zhang et al. as mentioned in this paper proposed a Pose-driven Deep Convolutional (PDC) model to learn improved feature extraction and matching models from end to end, which explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts.

...read moreread less

Abstract: Feature extraction and matching are two crucial components in person Re-Identification (ReID). The large pose deformations and the complex view variations exhibited by the captured person images significantly increase the difficulty of learning and matching of the features from person images. To overcome these difficulties, in this work we propose a Pose-driven Deep Convolutional (PDC) model to learn improved feature extraction and matching models from end to end. Our deep architecture explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts. To match the features from global human body and local body parts, a pose driven feature weighting sub-network is further designed to learn adaptive feature fusions. Extensive experimental analyses and results on three popular datasets demonstrate significant performance improvements of our model over all published state-of-the-art methods.

...read moreread less

655 citations

Journal Article•DOI•

Real-time cryo-electron microscopy data preprocessing with Warp.

[...]

Dimitry Tegunov¹, Patrick Cramer¹•Institutions (1)

Max Planck Society¹

07 Oct 2019-Nature Methods

TL;DR: Warp is described, a software that automates all preprocessing steps of cryo-EM data acquisition and enables real-time evaluation, and includes deep-learning-based models for accurate particle picking and image denoising.

...read moreread less

Abstract: The acquisition of cryo-electron microscopy (cryo-EM) data from biological specimens must be tightly coupled to data preprocessing to ensure the best data quality and microscope usage. Here we describe Warp, a software that automates all preprocessing steps of cryo-EM data acquisition and enables real-time evaluation. Warp corrects micrographs for global and local motion, estimates the local defocus and monitors key parameters for each recorded micrograph or tomographic tilt series in real time. The software further includes deep-learning-based models for accurate particle picking and image denoising. The output from Warp can be fed into established programs for particle classification and 3D-map refinement. Our benchmarks show improvement in the nominal resolution, which went from 3.9 A to 3.2 A, of a published cryo-EM data set for influenza virus hemagglutinin. Warp is easy to install from http://github.com/cramerlab/warp and computationally inexpensive, and has an intuitive, streamlined user interface. The user-friendly software tool Warp enables automated, on-the-fly preprocessing of cryo-EM data, including motion correction, defocus estimation, particle picking and image denoising.

...read moreread less

655 citations

Proceedings Article•DOI•

Deeply-Learned Part-Aligned Representations for Person Re-identification

[...]

Liming Zhao¹, Xi Li¹, Yueting Zhuang¹, Jingdong Wang²•Institutions (2)

Zhejiang University¹, Microsoft²

01 Oct 2017

TL;DR: This paper proposes a simple yet effective human part-aligned representation for handling the body part misalignment problem, and shows state-of-the-art results over standard datasets, Market-1501,CUHK03, CUHK01 and VIPeR.

...read moreread less

Abstract: In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras. We propose a simple yet effective human part-aligned representation for handling the body part misalignment problem. Our approach decomposes the human body into regions (parts) which are discriminative for person matching, accordingly computes the representations over the regions, and aggregates the similarities computed between the corresponding regions of a pair of probe and gallery images as the overall matching score. Our formulation, inspired by attention models, is a deep neural network modeling the three steps together, which is learnt through minimizing the triplet loss function without requiring body part labeling information. Unlike most existing deep learning algorithms that learn a global or spatial partition-based local representation, our approach performs human body partition, and thus is more robust to pose changes and various human spatial distributions in the person bounding box. Our approach shows state-of-the-art results over standard datasets, Market-1501, CUHK03, CUHK01 and VIPeR. 1

...read moreread less

653 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
…
189
190
191
192
193
194
195
…
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Random Forests

[...]

Leo Breiman¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

...read moreread less

Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

...read moreread less

79,257 citations

Proceedings Article•DOI•

ImageNet: A large-scale hierarchical image database

[...]

Jia Deng¹, Wei Dong¹, Richard Socher¹, Li-Jia Li¹, Kai Li¹, Li Fei-Fei¹ - Show less +2 more•Institutions (1)

Princeton University¹

20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

...read moreread less

Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

...read moreread less

49,639 citations

Book Chapter•DOI•

Learning internal representations by error propagation

[...]

David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

01 Jan 1988

TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.

...read moreread less

Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

...read moreread less

17,604 citations

Dissertation•

Learning Multiple Layers of Features from Tiny Images

[...]

Alex Krizhevsky¹•Institutions (1)

University of Toronto¹

01 Jan 2009

TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images, using a dataset of millions of tiny colour images, described in the next section.

...read moreread less

Abstract: In this work we describe how to train a multi-layer generative model of natural images. We use a dataset of millions of tiny colour images, described in the next section. This has been attempted by several groups but without success. The models on which we focus are RBMs (Restricted Boltzmann Machines) and DBNs (Deep Belief Networks). These models learn interesting-looking filters, which we show are more useful to a classifier than the raw pixels. We train the classifier on a labeled subset that we have collected and call the CIFAR-10 dataset.

...read moreread less

15,005 citations

Proceedings Article•

Rectified Linear Units Improve Restricted Boltzmann Machines

[...]

Vinod Nair¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

21 Jun 2010

TL;DR: Restricted Boltzmann machines were developed using binary stochastic hidden units that learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

...read moreread less

Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.

...read moreread less

14,799 citations