Open Access Posted Content

Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis.

TL;DR
This paper proposes a continuous-time model for stochastic gradient descent with noise that follows the 'machine learning scaling' and shows that, in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat-minimum selection of continuous-time SGD with homogeneous noise.
Abstract
The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm. In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat minimum selection of continuous time SGD with homogeneous noise.
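To make the contrast concrete, here is a minimal sketch of the two continuous-time models being compared. The exact form of the loss-dependent diffusion coefficient is an illustrative assumption on our part (noise intensity vanishing on the set of global minimizers, as in overparametrized interpolation problems), not a formula quoted from the paper.

```latex
% Gradient flow perturbed by homogeneous noise (classical model) versus
% noise of "machine learning type", whose intensity is tied to the
% objective value f(theta_t) and therefore vanishes where f = 0.
% The scaling sqrt(eta * f) is an illustrative assumption.
\[
  d\theta_t = -\nabla f(\theta_t)\,dt + \sqrt{\eta}\,\sigma\,dW_t
  \qquad \text{vs.} \qquad
  d\theta_t = -\nabla f(\theta_t)\,dt + \sqrt{\eta\,f(\theta_t)}\,\sigma(\theta_t)\,dW_t .
\]
```

Roughly, because the noise in the second model degenerates on the zero set of f, the behaviour near a minimum depends on how the noise vanishes there, which leads to a notion of 'flatness' different from the one selected under homogeneous noise.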


Citations
Posted Content

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis.

TL;DR: In particular, the authors show that the learning rate in SGD with noise of machine learning type can be chosen to be small, but uniformly positive for all times, if the energy landscape resembles that of overparametrized deep learning problems.
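A minimal numerical sketch of the mechanism referred to here (our own toy example, not code from the paper): in an overparametrized least-squares problem that can be interpolated exactly, the single-sample gradient noise shrinks with the loss, so a small but fixed learning rate drives the loss to zero without any step-size decay.

```python
# Hedged illustration (not code from the cited paper): in an overparametrized
# interpolation problem, the stochastic-gradient noise scales with the loss
# itself, so it vanishes at a global minimizer and a small *constant*
# learning rate can be used for all times.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                      # fewer samples than parameters (overparametrized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)         # an interpolating solution X @ theta = y exists

theta = np.zeros(d)
eta = 0.01                         # small but uniformly positive step size

for step in range(20000):
    i = rng.integers(n)            # single-sample stochastic gradient
    grad = (X[i] @ theta - y[i]) * X[i]
    theta -= eta * grad

loss = 0.5 * np.mean((X @ theta - y) ** 2)
print(f"final mean-squared loss: {loss:.2e}")   # ~0: the noise dies out at the interpolating minimum
```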
Posted Content

On minimal representations of shallow ReLU networks.

TL;DR: In this article, the authors study minimal representations of shallow ReLU networks, which realize continuous and piecewise affine functions whose affine regions are delimited by a set of hyperplanes, one per neuron.
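For context, a hedged sketch of the object being discussed (notation ours, not the authors'): a shallow ReLU network realizes a continuous, piecewise affine function, and each hidden neuron contributes one hyperplane across which the affine behaviour can change.

```latex
% A shallow (one-hidden-layer) ReLU network with n hidden neurons realizes
% the continuous, piecewise affine function
\[
  f(x) \;=\; \sum_{i=1}^{n} a_i \,\max\{\,w_i^\top x + b_i,\;0\,\} \;+\; c^\top x + d ,
\]
% whose affine pieces are separated by the hyperplanes
% H_i = { x in R^d : w_i^T x + b_i = 0 }, one per neuron.
```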
Posted Content

SGD May Never Escape Saddle Points

TL;DR: The authors show that SGD may escape a saddle point arbitrarily slowly, that SGD may prefer sharp minima over flat ones, and that AMSGrad may converge to a local maximum, suggesting that the noise structure of SGD may be more important than the loss landscape in neural network training.
References
Book

Partial Differential Equations

TL;DR: This book presents the theory of linear PDEs (Sobolev spaces, second-order elliptic equations, linear evolution equations) together with nonlinear topics such as Hamilton-Jacobi equations and systems of conservation laws.
Book

Elliptic Partial Differential Equations of Second Order

TL;DR: This book treats second-order elliptic partial differential equations, covering the Dirichlet problem for Poisson's equation, maximum and comparison principles, Harnack inequalities, the Leray-Schauder fixed point theorem, and equations in divergence form.
Journal Article

A Stochastic Approximation Method

TL;DR: In this article, the authors present a method for making successive experiments at levels x1, x2, ··· in such a way that xn tends in probability to θ, the root of the regression equation M(x) = α.
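For reference, a hedged sketch of the recursion in question (standard textbook form; notation ours): to locate the root θ of the regression equation M(x) = α from noisy responses y_n observed at levels x_n, one iterates as follows.

```latex
% Robbins-Monro recursion: y_n is the noisy response observed at level x_n,
% and the step sizes a_n are square-summable but not summable.
\[
  x_{n+1} \;=\; x_n \;+\; a_n\,(\alpha - y_n),
  \qquad
  \sum_{n=1}^{\infty} a_n = \infty, \qquad \sum_{n=1}^{\infty} a_n^{2} < \infty ,
\]
% under which x_n tends to theta in probability.
```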
Book

Brownian Motion and Stochastic Calculus

TL;DR: In this book, the authors develop stochastic calculus for Brownian motion and continuous local martingales, including the strong Markov property and a generalized version of the Itô rule.
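Since the TL;DR mentions a generalized Itô rule, here is the standard one-dimensional statement for a continuous semimartingale X (our paraphrase of the classical formula, not a quotation from the book).

```latex
% Ito's rule: for f of class C^2 and a continuous semimartingale X with
% quadratic variation <X>,
\[
  f(X_t) \;=\; f(X_0) \;+\; \int_0^t f'(X_s)\,dX_s \;+\; \tfrac{1}{2}\int_0^t f''(X_s)\,d\langle X \rangle_s .
\]
```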
Journal Article

Optimization Methods for Large-Scale Machine Learning

TL;DR: The authors provide a review of and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications, discussing how optimization problems arise in machine learning and what makes them challenging.