Open Access Journal Article (DOI)

A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

TL;DR
This article develops a geometric understanding of stochastic gradient descent (SGD) in which the trajectories of the associated dynamical system are described by geodesics of a family of metrics arising from a certain diffusion matrix.
Abstract
This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks, stochastic gradient descent (SGD). We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. This motivates a deterministic model in which the trajectories of the dynamical system are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
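The diffusion matrix mentioned above is the covariance of the per-sample gradients around the full-batch gradient at the current weights. Below is a minimal sketch of how it can be estimated numerically; the quadratic toy loss, the NumPy-only setup, and the helper names per_sample_grads and estimate_diffusion_matrix are illustrative assumptions, not constructions from the paper.

```python
# Minimal sketch (assumptions: least-squares toy loss, NumPy only) of
# estimating the diffusion matrix D(w): the covariance of the per-sample
# gradients around the full-batch gradient at the current weights w.
import numpy as np

rng = np.random.default_rng(0)
N, d = 256, 3                          # number of samples, number of weights
X = rng.normal(size=(N, d))            # toy design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

def per_sample_grads(w):
    """Gradient of each per-sample squared loss l_i(w) = 0.5 * (x_i . w - y_i)^2."""
    residuals = X @ w - y              # shape (N,)
    return residuals[:, None] * X      # shape (N, d); row i is grad l_i(w)

def estimate_diffusion_matrix(w):
    """Covariance of the stochastic gradients at w (the diffusion matrix)."""
    g = per_sample_grads(w)            # (N, d)
    g_bar = g.mean(axis=0)             # full-batch gradient
    centered = g - g_bar
    return centered.T @ centered / N   # (d, d), symmetric positive semi-definite

w = np.zeros(d)
D = estimate_diffusion_matrix(w)
print("eigenvalues of D(w):", np.linalg.eigvalsh(D))
```

The spread of the eigenvalues of D(w) is one simple way to see the non-isotropy of the SGD noise that motivates the diffusion-metric model.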


Citations
Journal Article (DOI)

Chaos and Complexity from Quantum Neural Network: A study with Diffusion Metric in Machine Learning

TL;DR: This work introduces Parameterized Quantum Circuits (PQCs) in a hybrid quantum-classical framework as universal function approximators optimized with Stochastic Gradient Descent (SGD), and establishes parametrized versions of Quantum Complexity and Quantum Chaos in terms of physically relevant quantities that not only determine stability but also provide a significant lower bound on the generalization capability of the QNN.
Posted Content

Geometry Perspective Of Estimating Learning Capability Of Neural Networks.

Ankan Dutta, +1 more · 03 Nov 2020
TL;DR: By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
Book Chapter (DOI)

On the Thermodynamic Interpretation of Deep Learning Systems

TL;DR: In this paper, a more conceptual approach involving contact dynamics and Lie Group Thermodynamics is proposed to study the time evolution of the parameters in deep learning systems, subject to optimization via stochastic gradient descent.
References
Journal Article (DOI)

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Journal Article (DOI)

Gradient-based learning applied to document recognition

TL;DR: In this article, convolutional networks trained with gradient-based learning are shown to synthesize complex decision surfaces that classify high-dimensional patterns such as handwritten characters, and graph transformer networks (GTNs) are proposed for globally training multi-module document recognition systems.
Journal Article (DOI)

Natural gradient works efficiently in learning

Shun-ichi Amari · 15 Feb 1998
TL;DR: In this paper, the authors used information geometry to calculate the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution), and proved that Fisher-efficient online learning has asymptotically the same performance as optimal batch estimation of the parameters.
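The natural-gradient update preconditions the ordinary gradient by the inverse Fisher information, w ← w − η F(w)⁻¹ ∇L(w). The sketch below uses a damped empirical Fisher on a toy logistic-regression problem; the data, learning rate, and damping term are illustrative assumptions rather than Amari's exact construction.

```python
# Minimal sketch (assumption: damped empirical Fisher as a stand-in for the
# true Fisher information) of natural-gradient steps for logistic regression.
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 4
X = rng.normal(size=(N, d))
logits = X @ rng.normal(size=d) + 0.5 * rng.normal(size=N)   # noisy labels
y = (logits > 0).astype(float)

def per_sample_grads(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    return (p - y)[:, None] * X          # (N, d); per-sample log-loss gradients

def natural_gradient_step(w, lr=0.5, damping=1e-3):
    g = per_sample_grads(w)
    grad = g.mean(axis=0)                         # ordinary gradient
    fisher = g.T @ g / N + damping * np.eye(d)    # damped empirical Fisher
    return w - lr * np.linalg.solve(fisher, grad)

w = np.zeros(d)
for _ in range(50):
    w = natural_gradient_step(w)
print("learned weights:", w)
```

The solve against the Fisher matrix rescales the step according to the local information geometry, which is what makes the update invariant to reparameterizations of the model.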
Journal Article (DOI)

Entropy-SGD: biasing gradient descent into wide valleys

TL;DR: In this article, a local-entropy-based objective function, motivated by the local geometry of the energy landscape, is proposed for training deep neural networks; the gradient of the local entropy is estimated before each update of the weights.
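Entropy-SGD estimates the gradient of the local entropy with an inner stochastic gradient Langevin dynamics (SGLD) loop and then moves the weights along γ(x − ⟨x′⟩), where ⟨x′⟩ is the mean of the inner iterates. The sketch below illustrates that two-loop structure on a toy two-dimensional loss; the specific loss, step sizes, γ, temperature, and loop lengths are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the Entropy-SGD idea on a toy 2-D loss: an inner SGLD loop
# estimates <x'> under the local Gibbs measure, and the outer step moves the
# weights along gamma * (x - <x'>), the gradient of the (negated) local entropy.
import numpy as np

rng = np.random.default_rng(2)

def loss_grad(x):
    # gradient of a toy non-convex loss, x^2 + 0.5*sin(3x), applied elementwise
    return 2 * x + 1.5 * np.cos(3 * x)

def entropy_sgd_step(x, gamma=1.0, eta=0.1, inner_lr=0.05, inner_steps=20, temp=1e-3):
    x_prime, mu = x.copy(), x.copy()
    for k in range(1, inner_steps + 1):
        drift = loss_grad(x_prime) + gamma * (x_prime - x)      # SGLD drift
        noise = np.sqrt(temp * inner_lr) * rng.normal(size=x.shape)
        x_prime = x_prime - inner_lr * drift + noise            # inner SGLD step
        mu += (x_prime - mu) / k                                 # running mean <x'>
    return x - eta * gamma * (x - mu)                            # outer update

x = np.array([2.0, -2.0])
for _ in range(100):
    x = entropy_sgd_step(x)
print("final weights:", x)
```

The coupling term γ(x′ − x) keeps the inner chain near the current weights, so wide, flat valleys (where ⟨x′⟩ can drift far from x) are favored over sharp minima.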