Open Access · Posted Content
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
TL;DR
Batch Normalization normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Abstract:
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
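The transform itself is simple; below is a minimal NumPy sketch of the per-mini-batch normalization the abstract describes, with the learnable scale and shift written as gamma and beta as in the paper (the eps constant and the toy shapes are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: (batch, features) activations; gamma, beta: learnable per-feature
    parameters (names follow the paper); eps avoids division by zero.
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # restore representational power

# Toy usage: a mini-batch of 4 examples with 3 features.
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.var(axis=0))       # ~0 mean, ~1 variance per feature
```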
Citations
Posted Content
In Search of Lost Domain Generalization
Ishaan Gulrajani, David Lopez-Paz
TL;DR: This paper introduces DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization achieves state-of-the-art performance across all datasets.
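For context, the empirical risk minimization baseline that DomainBed finds so competitive is plain average-loss minimization over the pooled training domains, ignoring domain labels. A minimal sketch for a linear model with squared loss (the function name and hyperparameters are illustrative, not DomainBed's API):

```python
import numpy as np

def erm_fit(domains, lr=0.1, steps=500):
    """Empirical risk minimization: pool all training domains and minimize
    the average loss with gradient descent, ignoring domain identity."""
    X = np.vstack([X_d for X_d, _ in domains])
    y = np.concatenate([y_d for _, y_d in domains])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w
```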
Posted Content
Exploring the Limits of Weakly Supervised Pretraining
Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten
TL;DR: This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images and shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop, top-1 accuracy to date.
Posted Content
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina
TL;DR: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape and compares favorably to state-of-the-art techniques in terms of generalization error and training time.
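A hedged sketch of the algorithm's structure, following the paper's formulation: an inner SGLD loop samples the Gibbs measure around the current weights to estimate the local-entropy gradient gamma * (x - mu), and the outer loop descends it, biasing the iterates toward wide valleys. Hyperparameters and the running-average constant are illustrative, not the paper's defaults:

```python
import numpy as np

def entropy_sgd(grad_f, x, eta=0.1, gamma=1e-2, L=20,
                sgld_lr=0.01, noise=1e-3, steps=100):
    """Entropy-SGD sketch: grad_f(x) returns the gradient of the loss."""
    rng = np.random.default_rng(0)
    for _ in range(steps):
        xp, mu = x.copy(), x.copy()
        for _ in range(L):  # inner SGLD chain around the current iterate
            g = grad_f(xp) + gamma * (xp - x)
            xp -= sgld_lr * g + np.sqrt(sgld_lr) * noise * rng.standard_normal(x.shape)
            mu = 0.75 * mu + 0.25 * xp  # running estimate of the Gibbs mean
        x -= eta * gamma * (x - mu)     # outer local-entropy descent step
    return x
```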
Book Chapter · DOI
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
TL;DR: In this article, a CNN is used to detect individual keypoints and predict their relative displacements, allowing the model to group keypoints into person pose instances and then associate semantic person pixels with their corresponding instance, delivering instance-level person segmentations.
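As a rough illustration only (in PersonLab the displacements come from dense CNN output maps and are refined recursively; here they are handed in directly), the bottom-up grouping step amounts to following predicted offsets out from a detected root keypoint:

```python
import numpy as np

def group_instance(root_xy, displacements):
    """Toy bottom-up grouping: start from a detected root keypoint and
    follow each predicted relative displacement to place the remaining
    keypoints of one person instance."""
    keypoints = {"root": np.asarray(root_xy, dtype=float)}
    for name, offset in displacements.items():
        keypoints[name] = keypoints["root"] + np.asarray(offset, dtype=float)
    return keypoints

# Hypothetical usage with two predicted shoulder offsets.
person = group_instance((120, 80), {"l_shoulder": (-15, 20),
                                    "r_shoulder": (15, 20)})
```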
Book Chapter · DOI
Ambient Sound Provides Supervision for Visual Learning
Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
TL;DR: This work trains a convolutional neural network to predict a statistical summary of the sound associated with a video frame, and shows that this representation is comparable to that of other state-of-the-art unsupervised learning methods.
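A toy sketch of the kind of self-supervised target involved: the sound_summary function below is a hypothetical stand-in (not the paper's sound-texture statistics) that reduces an audio clip to a short vector which a CNN on the co-occurring video frame could be trained to regress, with no human labels:

```python
import numpy as np

def sound_summary(waveform, n_bands=8):
    """Summarize a clip by mean log energy in a few frequency bands,
    a crude proxy for the statistical sound summary used as the
    prediction target."""
    spectrum = np.abs(np.fft.rfft(waveform)) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.log(np.array([b.mean() for b in bands]) + 1e-8)

# The CNN's regression target for one (frame, audio clip) pair.
target = sound_summary(np.random.randn(16000))
```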
References
Journal Article · DOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: In this article, a graph transformer network (GTN) trained with gradient-based learning is proposed for document recognition; it synthesizes a complex decision surface that can classify high-dimensional patterns such as handwritten characters.
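The core operation the paper trains by gradient descent is the convolution; a minimal sketch, using the usual deep-learning convention of unflipped kernels (strictly, cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """Minimal 'valid'-mode 2D cross-correlation, the feature-extraction
    primitive of the convolutional networks trained in the paper."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = conv2d(np.eye(8), np.array([[1.0, -1.0]]))  # toy oriented-edge filter
```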
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
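A minimal sketch of the mechanism, written in the now-common "inverted" form that rescales at training time (the paper instead scales weights at test time; the two are equivalent in expectation):

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p
    and rescale survivors by 1/(1-p), so the test-time pass needs no change."""
    if not train:
        return x                       # identity at test time
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p    # keep each unit with probability 1-p
    return x * mask / (1.0 - p)
```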
Proceedings Article · DOI
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, achieving a 4.94% top-5 test error on the ImageNet 2012 classification dataset.
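The unit itself is a one-liner: identity for positive inputs, a learned slope `a` for negative inputs (a = 0 recovers ReLU; a fixed small `a` gives a leaky ReLU). A sketch:

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: max(0, x) + a * min(0, x), with `a` learned."""
    return np.where(x > 0, x, a * x)

print(prelu(np.array([-2.0, 3.0]), a=0.25))  # [-0.5  3. ]
```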
Journal Article · DOI
Independent component analysis: algorithms and applications
Aapo Hyvärinen, Erkki Oja
TL;DR: The basic theory and applications of independent component analysis (ICA) are presented; the goal is to find a linear representation of non-Gaussian data whose components are statistically independent, or as independent as possible.
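A quick way to see this in practice, assuming scikit-learn is available for its FastICA implementation (the mixing matrix and sources below are illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two non-Gaussian sources mixed linearly: X = S @ A.T
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)),          # square wave
          rng.laplace(size=t.size)]        # Laplacian noise
X = S @ np.array([[1.0, 0.5], [0.5, 1.0]]).T  # observed mixtures

# ICA recovers statistically independent components (up to order/scale).
S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
```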
Journal Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function chosen in hindsight.
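For the commonly used diagonal case, the resulting AdaGrad update scales each coordinate's step by the inverse square root of its accumulated squared gradients, so rarely updated but predictive features take larger steps. A minimal sketch (function name and defaults are illustrative):

```python
import numpy as np

def adagrad_step(w, grad, G, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad update. G accumulates squared gradients
    per coordinate; eps guards against division by zero."""
    G += grad ** 2                        # running sum of squared gradients
    w -= lr * grad / (np.sqrt(G) + eps)   # per-coordinate adaptive step
    return w, G
```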