Open Access Journal Article (DOI)

A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

TL;DR
This article develops a geometric understanding of stochastic gradient descent (SGD) in which the trajectories of the associated dynamical system are described by geodesics of a family of metrics arising from a certain diffusion matrix.
Abstract
This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks, stochastic gradient descent (SGD). We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. This motivates a deterministic model in which the trajectories of the dynamical system are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
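The diffusion matrix mentioned above is the covariance of the per-sample gradients around the full-batch gradient at the current weights. Below is a minimal sketch of how it can be estimated numerically; the quadratic toy loss, the NumPy-only setup, and the helper names per_sample_grads and estimate_diffusion_matrix are illustrative assumptions, not constructions from the paper.

```python
# Minimal sketch (assumptions: least-squares toy loss, NumPy only) of
# estimating the diffusion matrix D(w): the covariance of the per-sample
# gradients around the full-batch gradient at the current weights w.
import numpy as np

rng = np.random.default_rng(0)
N, d = 256, 3                          # number of samples, number of weights
X = rng.normal(size=(N, d))            # toy design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

def per_sample_grads(w):
    """Gradient of each per-sample squared loss l_i(w) = 0.5 * (x_i . w - y_i)^2."""
    residuals = X @ w - y              # shape (N,)
    return residuals[:, None] * X      # shape (N, d); row i is grad l_i(w)

def estimate_diffusion_matrix(w):
    """Covariance of the stochastic gradients at w (the diffusion matrix)."""
    g = per_sample_grads(w)            # (N, d)
    g_bar = g.mean(axis=0)             # full-batch gradient
    centered = g - g_bar
    return centered.T @ centered / N   # (d, d), symmetric positive semi-definite

w = np.zeros(d)
D = estimate_diffusion_matrix(w)
print("eigenvalues of D(w):", np.linalg.eigvalsh(D))
```

The spread of the eigenvalues of D(w) is one simple way to see the non-isotropy of the SGD noise that motivates the diffusion-metric model.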


Citations
Journal Article (DOI)

Chaos and Complexity from Quantum Neural Network: A study with Diffusion Metric in Machine Learning

TL;DR: This work introduces Parameterized Quantum Circuits (PQCs) in a hybrid quantum-classical framework as universal function approximators optimized with Stochastic Gradient Descent (SGD), and establishes parametrized versions of Quantum Complexity and Quantum Chaos in terms of physically relevant quantities that not only determine stability but also provide a significant lower bound on the generalization capability of the QNN.
Posted Content

Geometry Perspective Of Estimating Learning Capability Of Neural Networks.

Ankan Dutta, +1 more · 03 Nov 2020
TL;DR: By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
Book Chapter (DOI)

On the Thermodynamic Interpretation of Deep Learning Systems

TL;DR: In this paper, a more conceptual approach involving contact dynamics and Lie Group Thermodynamics is proposed to study the time evolution of the parameters in deep learning systems, subject to optimization via stochastic gradient descent.
References
Journal Article (DOI)

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Journal Article (DOI)

Gradient-based learning applied to document recognition

TL;DR: In this article, convolutional networks trained with gradient-based learning are shown to synthesize complex decision surfaces that classify high-dimensional patterns such as handwritten characters, and graph transformer networks (GTNs) are proposed for globally training multi-module document recognition systems.
Journal Article (DOI)

Natural gradient works efficiently in learning

Shun-ichi Amari · 15 Feb 1998
TL;DR: In this paper, the authors used information geometry to calculate the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution), and proved that Fisher-efficient online learning has asymptotically the same performance as optimal batch estimation of the parameters.
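The natural-gradient update preconditions the ordinary gradient by the inverse Fisher information, w ← w − η F(w)⁻¹ ∇L(w). The sketch below uses a damped empirical Fisher on a toy logistic-regression problem; the data, learning rate, and damping term are illustrative assumptions rather than Amari's exact construction.

```python
# Minimal sketch (assumption: damped empirical Fisher as a stand-in for the
# true Fisher information) of natural-gradient steps for logistic regression.
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 4
X = rng.normal(size=(N, d))
logits = X @ rng.normal(size=d) + 0.5 * rng.normal(size=N)   # noisy labels
y = (logits > 0).astype(float)

def per_sample_grads(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    return (p - y)[:, None] * X          # (N, d); per-sample log-loss gradients

def natural_gradient_step(w, lr=0.5, damping=1e-3):
    g = per_sample_grads(w)
    grad = g.mean(axis=0)                         # ordinary gradient
    fisher = g.T @ g / N + damping * np.eye(d)    # damped empirical Fisher
    return w - lr * np.linalg.solve(fisher, grad)

w = np.zeros(d)
for _ in range(50):
    w = natural_gradient_step(w)
print("learned weights:", w)
```

The solve against the Fisher matrix rescales the step according to the local information geometry, which is what makes the update invariant to reparameterizations of the model.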
Journal Article (DOI)

Entropy-SGD: biasing gradient descent into wide valleys

TL;DR: In this article, a local-entropy-based objective function, motivated by the local geometry of the energy landscape, is proposed for training deep neural networks; the gradient of the local entropy is estimated before each update of the weights.
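Entropy-SGD estimates the gradient of the local entropy with an inner stochastic gradient Langevin dynamics (SGLD) loop and then moves the weights along γ(x − ⟨x′⟩), where ⟨x′⟩ is the mean of the inner iterates. The sketch below illustrates that two-loop structure on a toy two-dimensional loss; the specific loss, step sizes, γ, temperature, and loop lengths are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the Entropy-SGD idea on a toy 2-D loss: an inner SGLD loop
# estimates <x'> under the local Gibbs measure, and the outer step moves the
# weights along gamma * (x - <x'>), the gradient of the (negated) local entropy.
import numpy as np

rng = np.random.default_rng(2)

def loss_grad(x):
    # gradient of a toy non-convex loss, x^2 + 0.5*sin(3x), applied elementwise
    return 2 * x + 1.5 * np.cos(3 * x)

def entropy_sgd_step(x, gamma=1.0, eta=0.1, inner_lr=0.05, inner_steps=20, temp=1e-3):
    x_prime, mu = x.copy(), x.copy()
    for k in range(1, inner_steps + 1):
        drift = loss_grad(x_prime) + gamma * (x_prime - x)      # SGLD drift
        noise = np.sqrt(temp * inner_lr) * rng.normal(size=x.shape)
        x_prime = x_prime - inner_lr * drift + noise            # inner SGLD step
        mu += (x_prime - mu) / k                                 # running mean <x'>
    return x - eta * gamma * (x - mu)                            # outer update

x = np.array([2.0, -2.0])
for _ in range(100):
    x = entropy_sgd_step(x)
print("final weights:", x)
```

The coupling term γ(x′ − x) keeps the inner chain near the current weights, so wide, flat valleys (where ⟨x′⟩ can drift far from x) are favored over sharp minima.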