Open Access · Proceedings Article
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
Vol. 1, pp. 448–456
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Abstract:
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
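The per-mini-batch transform described in the abstract can be sketched in a few lines of NumPy. This is a minimal, illustrative training-mode forward pass only (the function name and shapes are our own; the full method also keeps running statistics for inference and folds the operation into the network so gradients flow through the normalization):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x:     (batch, features) mini-batch of layer inputs
    gamma: (features,) learnable scale
    beta:  (features,) learnable shift
    """
    mu = x.mean(axis=0)            # per-feature mini-batch mean
    var = x.var(axis=0)            # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance
    return gamma * x_hat + beta    # restore representational power

# Inputs with a shifted, scaled distribution...
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(32, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # each column now has mean ~0
print(y.std(axis=0))   # ...and standard deviation ~1
```

With gamma = 1 and beta = 0 the output is simply the whitened mini-batch; during training these two parameters are learned, so the network can undo the normalization where that is the better representation.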
Citations
Proceedings Article
Coupled Generative Adversarial Networks
Ming-Yu Liu, Oncel Tuzel
TL;DR: This work proposes the coupled generative adversarial network (CoGAN), which can learn a joint distribution without any tuple of corresponding images, applies it to several joint-distribution learning tasks, and demonstrates its applications to domain adaptation and image transformation.
Book Chapter · DOI
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
TL;DR: BiSeNet pairs a spatial path with a small stride, which preserves spatial information and generates high-resolution features, with a context path that uses a fast downsampling strategy to obtain a sufficient receptive field.
Posted Content
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
TL;DR: This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
Proceedings Article
PointCNN: convolution on Χ-transformed points
TL;DR: This work proposes to learn an Χ-transformation from the input points to simultaneously promote two causes: the first is the weighting of the input features associated with the points, and the second is the permutation of the points into a latent and potentially canonical order.
Proceedings Article
Deep speech 2: end-to-end speech recognition in English and Mandarin
Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, Jie Chen, Jingdong Chen, Zhijie Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Ke Ding, Niandong Du, Erich Elsen, Jesse Engel, Weiwei Fang, Linxi Fan, Christopher Fougner, Liang Gao, Caixia Gong, Awni Hannun, Tony X. Han, Lappi Vaino Johannes, Bing Jiang, Cai Ju, Billy Jun, Patrick LeGresley, Libby Lin, Junjie Liu, Yang Liu, Weigao Li, Xiangang Li, Dongpeng Ma, Sharan Narang, Andrew Y. Ng, Sherjil Ozair, Yiping Peng, Ryan Prenger, Sheng Qian, Zongfeng Quan, Jonathan Raiman, Vinay Rao, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Kavya Srinet, Anuroop Sriram, Haiyuan Tang, Liliang Tang, Chong Wang, Jidong Wang, Kaifu Wang, Yi Wang, Zhijian Wang, Zhiqian Wang, Shuang Wu, Likai Wei, Bo Xiao, Wen Xie, Yan Xie, Dani Yogatama, Bin Yuan, Jun Zhan, Zhenyao Zhu
TL;DR: An end-to-end deep learning approach is used to recognize either English or Mandarin Chinese speech, two vastly different languages, using HPC techniques that enable experiments which previously took weeks to now run in days.
References
Journal Article · DOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: A graph transformer network (GTN) is proposed for handwritten character recognition; it can be used to synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
Proceedings Article · DOI
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Journal Article · DOI
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection spanning hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Proceedings Article
Rectified Linear Units Improve Restricted Boltzmann Machines
Vinod Nair, Geoffrey E. Hinton
TL;DR: Replacing the binary stochastic hidden units of restricted Boltzmann machines with rectified linear units yields features that are better for object recognition on the NORB dataset and for face verification on the Labeled Faces in the Wild dataset.