Open Access · Proceedings Article

No more pesky learning rates

TL;DR
This paper proposes a method that automatically adjusts multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples, lets learning rates increase as well as decrease, and is therefore suitable for non-stationary problems. On a range of convex and non-convex learning tasks, the resulting algorithm matches the performance of SGD and other adaptive approaches at their best settings found by systematic search, removing the need for learning rate tuning.
Abstract
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
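To make the idea concrete, here is a minimal Python sketch of a vSGD-style per-parameter update in the spirit of the abstract. The function name, the fixed memory constant tau, and the epsilon guard are our simplifications; the paper additionally adapts each parameter's memory size over time and estimates the diagonal curvature, so treat this as an illustration rather than the authors' exact algorithm.

```python
import numpy as np

def vsgd_step(theta, grad, hess_diag, state, tau=10.0, eps=1e-12):
    """One simplified vSGD-style step: per-parameter learning rates
    eta_i = gbar_i^2 / (vbar_i * hbar_i), computed from exponential
    moving averages of the gradient, its square, and the diagonal
    curvature. A sketch only; the paper also adapts tau per parameter."""
    m = 1.0 / tau
    state["g"] = (1 - m) * state["g"] + m * grad        # E[gradient]
    state["v"] = (1 - m) * state["v"] + m * grad ** 2   # E[gradient^2]
    state["h"] = (1 - m) * state["h"] + m * hess_diag   # E[diagonal Hessian]
    eta = state["g"] ** 2 / (state["v"] * state["h"] + eps)
    return theta - eta * grad

# Hypothetical usage on a 1-D quadratic loss 0.5*(theta - 1)^2:
theta = np.array([5.0])
state = {"g": np.ones(1), "v": np.ones(1), "h": np.ones(1)}
for _ in range(100):
    grad, hess = theta - 1.0, np.ones(1)
    theta = vsgd_step(theta, grad, hess, state)
```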



Citations
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
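For reference, a self-contained sketch of the standard Adam update, with exponential moving averages of the first and second gradient moments followed by bias correction; the state-dictionary bookkeeping and default hyperparameters follow common usage rather than anything specific to this page.

```python
import numpy as np

def adam_step(theta, grad, state, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: moving averages of the gradient (m) and squared
    gradient (v), bias-corrected, then an elementwise update."""
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** t)  # bias correction; t starts at 1
    v_hat = state["v"] / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# state starts as {"m": np.zeros_like(theta), "v": np.zeros_like(theta)}.
```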
Journal Article

Deep learning in neural networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.
Book

Machine Learning : A Probabilistic Perspective

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Posted Content

ADADELTA: An Adaptive Learning Rate Method

Matthew D. Zeiler
22 Dec 2012
TL;DR: This paper presents ADADELTA, a novel per-dimension learning rate method for gradient descent that dynamically adapts over time using only first-order information, with minimal computational overhead beyond vanilla stochastic gradient descent.
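A compact sketch of the ADADELTA update as commonly stated: the step size is the ratio of the RMS of recent updates to the RMS of recent gradients, so no global learning rate is needed. The state bookkeeping and names here are ours.

```python
import numpy as np

def adadelta_step(theta, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA step: scale the gradient by RMS[delta_x]/RMS[g],
    accumulated with decay rho, so no hand-set learning rate is needed."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    delta = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * delta ** 2
    return theta + delta

# state starts as {"Eg2": np.zeros_like(theta), "Edx2": np.zeros_like(theta)}.
```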
Journal Article

Recent advances in convolutional neural networks

TL;DR: This article gives a broad survey of recent advances in convolutional neural networks, discussing improvements to CNNs in several aspects: layer design, activation functions, loss functions, regularization, optimization, and fast computation.
References
Dissertation

Learning Multiple Layers of Features from Tiny Images

TL;DR: The authors describe how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.
Proceedings Article

Understanding the difficulty of training deep feedforward neural networks

TL;DR: The objective is to better understand why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
Journal Article

A Stochastic Approximation Method

TL;DR: This article presents a method for making successive experiments at levels x1, x2, … in such a way that xn tends to θ in probability.
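The Robbins-Monro scheme is short enough to sketch directly; the example function, target level, and constants below are illustrative assumptions, not from the paper.

```python
import random

def robbins_monro(noisy_m, alpha=0.0, x0=0.0, c=1.0, steps=10_000):
    """Robbins-Monro stochastic approximation: x_{n+1} = x_n + a_n*(alpha - y_n)
    with step sizes a_n = c/n, so sum(a_n) diverges while sum(a_n^2)
    converges; x_n then tends in probability to the root theta where
    E[noisy_m(theta)] = alpha."""
    x = x0
    for n in range(1, steps + 1):
        x += (c / n) * (alpha - noisy_m(x))
    return x

# Hypothetical usage: noisy measurements of M(x) = 2*(x - 3); the root is 3.
print(robbins_monro(lambda x: 2 * (x - 3) + random.gauss(0.0, 1.0)))
```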
Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: This paper presents adaptive subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, making it possible to find needles in haystacks in the form of very predictive but rarely seen features.
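A sketch of the diagonal form of the AdaGrad update described here (the same method as the journal version listed next); the names and defaults are ours, and the full papers also treat composite and proximal variants.

```python
import numpy as np

def adagrad_step(theta, grad, state, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad step: each coordinate is scaled by the inverse
    square root of its accumulated squared gradients, so rare but
    informative features receive relatively larger updates."""
    state["G"] += grad ** 2            # per-coordinate sum of squared gradients
    return theta - lr * grad / (np.sqrt(state["G"]) + eps)

# state starts as {"G": np.zeros_like(theta)}.
```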
Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function that could be chosen in hindsight.