Open AccessProceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.Abstract:
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.read more
Citations
More filters
Posted Content
Denoising Diffusion Probabilistic Models
TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
Posted Content
In Defense of the Triplet Loss for Person Re-Identification
TL;DR: It is shown that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.
Proceedings Article
Spectral Normalization for Generative Adversarial Networks
TL;DR: In this paper, the authors proposed a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator, which is computationally light and easy to incorporate into existing implementations.
Proceedings Article
Neural Architecture Search with Reinforcement Learning
Barret Zoph,Quoc V. Le +1 more
TL;DR: In this paper, the authors use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Journal ArticleDOI
Grandmaster level in StarCraft II using multi-agent reinforcement learning.
Oriol Vinyals,Igor Babuschkin,Wojciech Marian Czarnecki,Michael Mathieu,Andrew Dudzik,Junyoung Chung,David H. Choi,Richard E. Powell,Timo Ewalds,Petko Georgiev,Junhyuk Oh,Dan Horgan,Manuel Kroiss,Ivo Danihelka,Aja Huang,Laurent Sifre,Trevor Cai,John P. Agapiou,Max Jaderberg,Alexander Vezhnevets,Rémi Leblond,Tobias Pohlen,Valentin Dalibard,David Budden,Yury Sulsky,James Molloy,Tom Le Paine,Caglar Gulcehre,Ziyu Wang,Tobias Pfaff,Yuhuai Wu,Roman Ring,Dani Yogatama,Dario Wünsch,Katrina McKinney,Oliver Smith,Tom Schaul,Timothy P. Lillicrap,Koray Kavukcuoglu,Demis Hassabis,Chris Apps,David Silver +41 more
TL;DR: The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.
References
More filters
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article
Auto-Encoding Variational Bayes
Diederik P. Kingma,Max Welling +1 more
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Journal ArticleDOI
Reducing the Dimensionality of Data with Neural Networks
TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.
Journal ArticleDOI
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
Geoffrey E. Hinton,Li Deng,Dong Yu,George E. Dahl,Abdelrahman Mohamed,Navdeep Jaitly,Andrew W. Senior,Vincent Vanhoucke,Patrick Nguyen,Tara N. Sainath,Brian Kingsbury +10 more
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Proceedings Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
TL;DR: Adaptive subgradient methods as discussed by the authors dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which allows us to find needles in haystacks in the form of very predictive but rarely seen features.