Open Access Proceedings Article

Adam: A Method for Stochastic Optimization

TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms that inspired Adam are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
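The update rule this abstract describes is compact enough to sketch. Below is a minimal NumPy sketch of one Adam step and of the AdaMax variant mentioned at the end; the hyper-parameter defaults are the values commonly quoted alongside the paper, but the function names and the small epsilon guard in AdaMax are illustrative choices, not the authors' reference code.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: exponential moving averages of the gradient (first moment)
    and of its square (second moment), with bias correction for the zero init."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment estimate
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment estimate
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adamax_step(theta, grad, m, u, t, alpha=0.002, beta1=0.9, beta2=0.999):
    """AdaMax variant: the second-moment estimate is replaced by an
    exponentially weighted infinity norm of the gradients."""
    m = beta1 * m + (1 - beta1) * grad
    u = np.maximum(beta2 * u, np.abs(grad))
    theta = theta - (alpha / (1 - beta1 ** t)) * m / (u + 1e-8)  # eps guards 0/0
    return theta, m, u

# usage: minimize a noisy quadratic f(x) = x**2
rng = np.random.default_rng(0)
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    grad = 2 * theta + 0.1 * rng.standard_normal(1)   # noisy gradient
    theta, m, v = adam_step(theta, grad, m, v, t, alpha=0.1)  # large alpha for a quick demo
print(theta)  # ends up close to 0
```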


Citations
Posted Content

Deep Learning using Rectified Linear Units (ReLU)

TL;DR: Introduces the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN): the activation of the penultimate layer is multiplied by weight parameters $\theta$ to obtain the raw class scores.
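A rough sketch of the mechanism this TL;DR describes, using made-up shapes (32 examples, 128 penultimate units, 10 classes) that are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((32, 128))      # penultimate-layer activations
theta = rng.standard_normal((128, 10))  # weight parameters theta

raw_scores = h @ theta                  # multiply activations by theta
scores = np.maximum(0.0, raw_scores)    # ReLU in place of softmax
preds = np.argmax(scores, axis=1)       # predicted class per example
```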
Proceedings Article

Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring

TL;DR: This work proposes a multi-scale convolutional neural network that restores sharp images in an end-to-end manner where blur is caused by various sources, and presents a new large-scale dataset providing pairs of realistic blurry images and corresponding ground-truth sharp images obtained with a high-speed camera.
Proceedings Article

A theoretically grounded application of dropout in recurrent neural networks

TL;DR: The authors apply a variational-inference-based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks, and, to the best of their knowledge, improve on the single-model state of the art in language modelling with the Penn Treebank (73.4 test perplexity).
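The mechanical consequence of that variational view, as usually implemented, is that one dropout mask is sampled per sequence and reused at every timestep instead of being resampled each step. A minimal sketch with a plain tanh recurrence standing in for an LSTM/GRU cell (all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                          # dropout probability
T, B, H = 20, 8, 64              # timesteps, batch size, hidden size
mask = rng.binomial(1, 1 - p, size=(B, H)) / (1 - p)  # sampled once per sequence

W = rng.standard_normal((H, H)) * 0.01
h = np.zeros((B, H))
for t in range(T):
    x_t = rng.standard_normal((B, H))   # stand-in for the step-t input
    h = np.tanh((h * mask) @ W + x_t)   # the same mask at every timestep
```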
Proceedings Article

Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks

TL;DR: In this paper, a generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain by learning in an unsupervised manner a transformation in the pixel space from one domain to another.
Proceedings Article

Coupled Generative Adversarial Networks

TL;DR: This work proposes coupled generative adversarial network (CoGAN), which can learn a joint distribution without any tuple of corresponding images, and applies it to several joint distribution learning tasks, and demonstrates its applications to domain adaptation and image transformation.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
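A skeleton of the architecture that TL;DR lists, written with PyTorch modules; the channel counts and the assumed 3x224x224 input follow common reproductions of the network rather than anything stated above:

```python
import torch.nn as nn

# Five conv layers (some followed by max-pooling), then three fully-connected
# layers ending in 1000 class scores; softmax is applied inside the loss.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)
```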
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: Introduces a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
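In practice the scalability hinges on the reparameterization trick: the latent code is written as a deterministic function of the encoder outputs plus independent noise, so gradients flow through the sampling step. A minimal sketch (latent dimension and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.1 * rng.standard_normal(16)   # encoder mean output
log_var = np.zeros(16)               # encoder log-variance output

eps = rng.standard_normal(16)          # noise independent of the parameters
z = mu + np.exp(0.5 * log_var) * eps   # z ~ N(mu, sigma^2), differentiable in mu, log_var

# analytic KL(q(z|x) || N(0, I)) term of the variational bound
kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
```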
Journal Article

Reducing the Dimensionality of Data with Neural Networks

TL;DR: Describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool for reducing the dimensionality of data.
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: Adaptive subgradient methods dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, making it possible to find needles in haystacks in the form of very predictive but rarely seen features.
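A minimal sketch of the per-coordinate update behind that behavior (names and defaults are illustrative): coordinates that rarely receive gradient accumulate a small sum of squares and therefore keep a large effective step size, which is exactly what favors predictive but rarely seen features.

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.01, eps=1e-8):
    """One AdaGrad step: step sizes shrink per coordinate with the
    accumulated squared gradients observed in earlier iterations."""
    G = G + grad ** 2
    theta = theta - lr * grad / (np.sqrt(G) + eps)
    return theta, G
```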