Adam: A Method for Stochastic Optimization

Open Access Proceedings Article

TL;DR:

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Abstract:

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
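The abstract does not spell out the update equations, so the following is only a minimal sketch of the Adam update as it is commonly stated: exponential moving averages of the gradient and the squared gradient, bias correction, and a per-parameter step. The function names `adam_step` and `minimize_toy_quadratic`, the toy objective, and the demo learning rate are illustrative assumptions, not part of the paper.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of floats.

    params, grads, m, v are equal-length lists; t is the 1-based step count.
    Returns new (params, m, v) lists and leaves the inputs unmodified.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero, so early values are too small.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter step, scaled by the root of the second-moment estimate.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_toy_quadratic(steps=3000, lr=0.01):
    """Hypothetical demo: minimize f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        # Exact gradient of the toy objective (no noise in this demo).
        grads = [2.0 * (params[0] - 3.0), 20.0 * (params[1] + 1.0)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

On the very first step the bias-corrected ratio reduces to roughly the sign of the gradient, so each parameter moves by about `lr` whatever the gradient's magnitude. This is one way to see the invariance to diagonal rescaling of the gradients that the abstract mentions.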

Citations

Proceedings Article

DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis.

Christos Baziotis, +2 more

TL;DR: Two deep-learning systems that competed at SemEval-2017 Task 4 “Sentiment Analysis in Twitter” are presented, which use Long Short-Term Memory networks augmented with two kinds of attention mechanisms, on top of word embeddings pre-trained on a big collection of Twitter messages.

Proceedings Article

Learning Correspondence From the Cycle-Consistency of Time

Xiaolong Wang, +2 more

TL;DR: A self-supervised method that uses cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. The learned representation generalizes, without finetuning, across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow.

Proceedings Article

VulDeePecker: A Deep Learning-Based System for Vulnerability Detection

Zhen Li, +7 more

- 05 Jan 2018 -

arXiv: Cryptography and Security

TL;DR: A study of deep learning-based vulnerability detection, which relieves human experts of the tedious and subjective task of manually defining features. Experimental results show that VulDeePecker achieves far fewer false negatives than other approaches, with a reasonable number of false positives.

Proceedings Article

Large-Scale Learnable Graph Convolutional Networks

Hongyang Gao, +2 more

TL;DR: In this paper, a learnable graph convolutional layer (LGCL) is proposed to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolution operations on generic graphs.

Journal Article

A new deep convolutional neural network for fast hyperspectral image classification

Mercedes E. Paoletti, +3 more

- 06 Dec 2017 -

ISPRS Journal of Photogrammetry and Remo...

TL;DR: A new CNN architecture for the classification of hyperspectral images is presented, a 3-D network that uses both spectral and spatial information and implements a border mirroring strategy to effectively process border areas in the image.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network that achieved state-of-the-art performance, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Proceedings Article

Auto-Encoding Variational Bayes

Diederik P. Kingma, +1 more

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Journal Article

Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton, +1 more

- 28 Jul 2006 -

Science

TL;DR: This article describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool for reducing the dimensionality of data.

Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton, +10 more

- 18 Oct 2012 -

IEEE Signal Processing Magazine

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.

John C. Duchi, +2 more

TL;DR: Adaptive subgradient methods dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which makes it possible to find needles in haystacks in the form of very predictive but rarely seen features.

Related Papers (5)

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

Attention is All you Need

ImageNet Classification with Deep Convolutional Neural Networks

Generative Adversarial Nets
