Adam: A Method for Stochastic Optimization

Open Access, Posted Content

- 22 Dec 2014 -

TLDR

This paper introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Abstract:

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
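
The abstract describes the method only in words, so a minimal NumPy sketch of the update rules as they are commonly stated for Adam and AdaMax may help. The function names `adam_step` and `adamax_step` are illustrative and do not come from this page, and the default hyper-parameters are the values usually quoted for the paper. This is a sketch under those assumptions, not a reference implementation.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `t` is the 1-based step count (names are illustrative)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: both averages start at zero and are biased toward it.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step, scaled by the root of the second-moment estimate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def adamax_step(theta, grad, m, u, t, lr=2e-3, beta1=0.9, beta2=0.999):
    """One AdaMax update (infinity-norm variant). Assumes `u` stays nonzero."""
    m = beta1 * m + (1 - beta1) * grad
    # Exponentially weighted infinity norm replaces the second-moment average.
    u = np.maximum(beta2 * u, np.abs(grad))
    theta = theta - (lr / (1 - beta1 ** t)) * m / u
    return theta, m, u
```

In use, `m`, `v` (or `u`) start as zero arrays shaped like `theta`, and `t` counts up from 1. Because the step divides the first-moment estimate by the root of the second, multiplying every gradient by a constant leaves the update almost unchanged, which is the rescaling invariance the abstract mentions.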

Citations

Posted Content

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

TL;DR: This work proposes a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.

Posted Content

Decoupled Weight Decay Regularization

Ilya Loshchilov, +1 more

- 14 Nov 2017 -

arXiv: Learning

TL;DR: This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.

Posted Content

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, +2 more

- 19 Nov 2015 -

arXiv: Learning

TL;DR: This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs) that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.

Posted Content

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu, +7 more

- 10 Feb 2015 -

arXiv: Learning

TL;DR: This paper proposed an attention-based model that automatically learns to describe the content of images by focusing on salient objects while generating corresponding words in the output sequence, which achieved state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.

Posted Content

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Justin Johnson, +2 more

- 27 Mar 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work considers image transformation problems and proposes the use of perceptual loss functions for training feed-forward networks for image transformation tasks. It shows results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time.

References

Journal Article

Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton, +1 more

- 28 Jul 2006 -

Science

TL;DR: This article describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton, +10 more

- 18 Oct 2012 -

IEEE Signal Processing Magazine

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

Proceedings Article

Speech recognition with deep recurrent neural networks

Alex Graves, +2 more

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John C. Duchi, +2 more

- 01 Feb 2011 -

Journal of Machine Learning Research

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.

Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton, +4 more

- 03 Jul 2012 -

arXiv: Neural and Evolutionary Computing

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.

Related Papers (5)

Deep Residual Learning for Image Recognition

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

U-Net: Convolutional Networks for Biomedical Image Segmentation

Generative Adversarial Nets
