An overview of gradient descent optimization algorithms

Posted Content (Open Access)

- 15 Sep 2016 -

TL;DR

This article looks at different variants of gradient descent, summarizes challenges, introduces the most common optimization algorithms, reviews architectures in a parallel and distributed setting, and investigates additional strategies for optimizing gradient descent.

Abstract:

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.
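The abstract names gradient descent variants and optimization algorithms without spelling out any update rule. As a rough illustration only, here is a minimal Python sketch of mini-batch gradient descent driving three commonly used update rules (vanilla SGD, momentum, and Adam, in their usual textbook form). The toy one-parameter least-squares problem, the step sizes, and the helper names `minibatch_grad` and `train` are hypothetical choices made for this sketch, not taken from the article.

```python
import math
import random

# Hypothetical toy problem (an assumption, not from the article): fit y = w * x
# to noise-free data generated with w = 3, so the optimum is exactly w = 3.
_rng = random.Random(0)
DATA = [(x, 3.0 * x) for x in (_rng.uniform(-1.0, 1.0) for _ in range(200))]


def minibatch_grad(w, batch):
    """Mean gradient of the loss 0.5 * (w * x - y) ** 2 over a mini-batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def train(update, epochs=50, batch_size=20, seed=1):
    """Mini-batch gradient descent; update(w, g, state) returns (new_w, new_state)."""
    rng = random.Random(seed)
    data = list(DATA)
    w, state = 0.0, {}
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle the training data every epoch
        for i in range(0, len(data), batch_size):
            g = minibatch_grad(w, data[i:i + batch_size])
            w, state = update(w, g, state)
    return w


def sgd(w, g, state, lr=0.5):
    # Vanilla update: w <- w - lr * g
    return w - lr * g, state


def momentum(w, g, state, lr=0.1, gamma=0.9):
    # v_t = gamma * v_{t-1} + lr * g;  w <- w - v_t
    v = gamma * state.get("v", 0.0) + lr * g
    return w - v, {"v": v}


def adam(w, g, state, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (m) and squared gradient (v),
    # bias-corrected, then a step scaled by 1 / sqrt(v_hat).
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), {"t": t, "m": m, "v": v}


for name, rule in (("sgd", sgd), ("momentum", momentum), ("adam", adam)):
    print(name, round(train(rule), 4))
```

Keeping the driver fixed and swapping only the `update` function reflects the idea that these optimizers differ in how they turn a mini-batch gradient into a step. Because the toy data is noise-free, all three should end close to `w = 3`; on real, noisy objectives the step sizes would need tuning.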

Citations

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Journal Article

Convolutional neural networks: an overview and application in radiology

Rikiya Yamashita, +4 more

- 22 Jun 2018 -

Insights Into Imaging

TL;DR: A perspective on the basic concepts of convolutional neural network and its application to various radiological tasks is offered, and its challenges and future directions in the field of radiology are discussed.

Proceedings Article

Universal Language Model Fine-tuning for Text Classification

Jeremy Howard, +1 more

TL;DR: Universal Language Model Fine-tuning (ULMFiT) is an effective transfer learning method that can be applied to any task in NLP; the paper also introduces techniques that are key for fine-tuning a language model.

Posted Content

Supervised Contrastive Learning.

Prannay Khosla, +8 more

- 23 Apr 2020 -

arXiv: Learning

TL;DR: In this paper, the authors extend the self-supervised batch contrastive approach to the fully supervised setting, allowing them to effectively leverage label information.

Proceedings Article

Cyclical Learning Rates for Training Neural Networks

Leslie N. Smith

TL;DR: Describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates.

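The cyclical learning rates entry above describes letting the learning rate vary between two bounds instead of hand-tuning a fixed schedule. A minimal sketch of the triangular policy, assuming the commonly cited formulation in which the rate rises linearly from `base_lr` to `max_lr` over `step_size` iterations and then falls back linearly; the function name `triangular_lr` and the default values are illustrative assumptions.

```python
import math


def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate; one full cycle spans 2 * step_size iterations."""
    # Index of the current cycle, starting at 1.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Position within the cycle: 1 at the cycle edges, 0 at the peak.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


for it in (0, 1000, 2000, 3000, 4000):
    print(it, round(triangular_lr(it), 6))
```

In a training loop the returned value would simply replace a constant step size at each iteration, for example as the `lr` argument of an update rule.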

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Proceedings Article

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

TL;DR: Introduces a new global log-bilinear regression model that combines the advantages of the two major model families in the literature (global matrix factorization and local context window methods) and produces a vector space with meaningful substructure.

Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

- 11 Feb 2015 -

arXiv: Learning

TL;DR: Batch Normalization normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.

Posted Content

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Martín Abadi, +39 more

- 01 Jan 2015 -

arXiv: Distributed, Parallel, and Cluste...

TL;DR: Describes the TensorFlow interface and an implementation of that interface built at Google, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014 -

Journal of Machine Learning Research
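Batch normalization, listed among the references above, normalizes each layer's inputs using statistics of the current mini-batch and then applies a learned per-feature scale and shift. A minimal training-mode sketch in plain Python, assuming a mini-batch given as a list of feature rows; the function name `batch_norm_train`, the `eps` value, and the example data are illustrative assumptions, and the running averages used at inference time are deliberately omitted.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over one mini-batch.

    batch: list of examples, each a list of d features.
    gamma, beta: per-feature scale and shift (learned parameters in a real network).
    """
    n, d = len(batch), len(batch[0])
    # Per-feature mean and (biased) variance over the mini-batch.
    mean = [sum(row[j] for row in batch) / n for j in range(d)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(d)]
    # Normalize each feature to zero mean / unit variance, then scale and shift.
    return [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j] for j in range(d)]
        for row in batch
    ]


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
out = batch_norm_train(batch, gamma=[1.0, 2.0], beta=[0.0, 5.0])
for row in out:
    print([round(v, 3) for v in row])
```

After normalization each output column has mean close to its `beta` entry and standard deviation close to its `gamma` entry, regardless of the original scale of the feature.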
