Open Access · Posted Content
Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis.
TL;DR: This paper proposes a continuous time model for stochastic gradient descent with noise that follows the machine learning scaling, in which the optimization algorithm prefers flat minima of the objective function in a sense that differs from the flat minimum selection of continuous time SGD with homogeneous noise.
Abstract:
The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm.
In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat minimum selection of continuous time SGD with homogeneous noise.
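The abstract above concerns SGD whose noise follows the 'machine learning scaling', i.e. a noise amplitude coupled to the objective value, which therefore vanishes at global minima of overparametrized interpolation problems. A minimal NumPy sketch of one such coupling (the function name `sgd_ml_noise` and the specific sqrt(f) scaling are illustrative assumptions, not the authors' exact model):

```python
import numpy as np

def sgd_ml_noise(grad_f, f, x0, eta=0.1, sigma=1.0, n_steps=1000, seed=0):
    """Toy SGD iteration in which the noise amplitude scales like
    sqrt(f(x)), so the noise vanishes at global minima where f = 0
    (as in overparametrized interpolation).  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        # objective-dependent ("machine learning type") noise
        noise = sigma * np.sqrt(max(f(x), 0.0)) * rng.standard_normal(x.shape)
        x = x - eta * (grad_f(x) + noise)
    return x

# Toy objective f(x) = |x|^2 / 2: the global minimum has f = 0, so the
# noise switches off as the iterate approaches the minimizer at x = 0.
f = lambda x: 0.5 * float(x @ x)
grad_f = lambda x: x
x_final = sgd_ml_noise(grad_f, f, x0=np.ones(2))
```

Because the noise is multiplicative in this sketch, the iterate contracts toward the minimizer rather than fluctuating around it, in contrast to SGD with homogeneous (constant-variance) noise.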
Citations
Posted Content
Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis.
TL;DR: In particular, the authors showed that for SGD with machine learning noise, the learning rate can be chosen to be small but uniformly positive for all times if the energy landscape resembles that of overparametrized deep learning problems.
Posted Content
On minimal representations of shallow ReLU networks.
TL;DR: In this article, the authors show that a shallow ReLU network, viewed as a continuous and piecewise affine function, admits a minimal representation characterized by a set of hyperplanes.
Posted Content
SGD May Never Escape Saddle Points
TL;DR: The authors showed that SGD may escape a saddle point arbitrarily slowly, that SGD prefers sharp minima over flat ones, and that AMSGrad may converge to a local maximum, suggesting that the noise structure of SGD might be more important than the loss landscape in neural network training.
References
Book
Partial Differential Equations
TL;DR: In this book, the authors present the theory of linear PDEs (Sobolev spaces, second-order elliptic equations, linear evolution equations) as well as Hamilton-Jacobi equations and systems of conservation laws.
Book
Elliptic Partial Differential Equations of Second Order
David Gilbarg, Neil S. Trudinger +1 more
TL;DR: In this book, the authors treat the Dirichlet problem for Poisson's equation and its generalization to divergence form operators, using Leray-Schauder fixed point theory and Harnack inequalities.
Journal Article
A Stochastic Approximation Method
Herbert Robbins, Sutton Monro +1 more
TL;DR: In this article, a method is presented for making successive experiments at levels x1, x2, … in such a way that xn will tend to θ in probability.
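The Robbins-Monro scheme summarized above can be sketched with a short iteration; the linear toy function M(x) = 2x, the noise level, and the step-size choice a_n = 1/(n+1) are illustrative assumptions, not part of the original paper:

```python
import numpy as np

def robbins_monro(noisy_m, alpha, x0, n_steps=5000, seed=0):
    """Robbins-Monro iteration x_{n+1} = x_n - a_n * (Y_n - alpha) with
    step sizes a_n = 1/(n+1), satisfying sum a_n = inf, sum a_n^2 < inf,
    so that x_n tends to the root theta of M(x) = alpha in probability."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)
        y_n = noisy_m(x, rng)      # noisy observation Y_n of M(x_n)
        x -= a_n * (y_n - alpha)
    return x

# Toy regression function M(x) = 2x observed with Gaussian noise; the
# root of M(x) = alpha = 4 is theta = 2.
noisy_m = lambda x, rng: 2.0 * x + rng.normal(scale=0.5)
theta_hat = robbins_monro(noisy_m, alpha=4.0, x0=0.0)
```

The decaying step sizes are what distinguish this classical stochastic approximation from the constant-step-size SGD analyzed in the paper above.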
Book
Brownian Motion and Stochastic Calculus
TL;DR: In this book, the authors present a characterization of continuous local martingales in terms of Brownian motion, together with Markov properties including the strong Markov property, and a generalized version of the Itô rule.
Journal Article
Optimization Methods for Large-Scale Machine Learning
TL;DR: The authors provide a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications, and discuss how optimization problems arise in machine learning and what makes them challenging.
Related Papers (5)
Dynamic of Stochastic Gradient Descent with State-dependent Noise
Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks
Pratik Chaudhari, Stefano Soatto +1 more