Open Access · Posted Content
Quantitative Weak Convergence for Discrete Stochastic Processes
TLDR
This work shows that the iterates of these stochastic processes converge to an invariant distribution at a rate of $\tilde{O}(1/\sqrt{k})$, where $k$ is the number of steps; this rate is provably tight up to log factors.
Abstract:
In this paper, we establish quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms. Under certain regularity assumptions, we show that the iterates of these stochastic processes converge to an invariant distribution at a rate of $\tilde{O}(1/\sqrt{k})$, where $k$ is the number of steps; this rate is provably tight up to log factors. Our result reduces to a quantitative form of the classical Central Limit Theorem in the special case when the potential is quadratic.
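The quadratic-potential special case mentioned in the abstract can be illustrated numerically. The sketch below (an illustration, not the paper's proof or method; all parameter values and the noise model are assumptions) runs many parallel SGD chains on $f(x) = x^2/2$ with additive Gaussian gradient noise; for this linear recursion the invariant law is Gaussian with an explicitly computable variance, matching the remark that the result reduces to a quantitative CLT:

```python
import numpy as np

# Illustrative SGD on the quadratic potential f(x) = x^2 / 2 with additive
# Gaussian gradient noise. Step size, noise scale, and chain counts are
# assumptions chosen for the demonstration, not values from the paper.
rng = np.random.default_rng(0)
eta = 0.1          # step size
sigma = 1.0        # gradient-noise scale
n_chains = 20000   # parallel chains, to estimate the law of the iterate x_k
n_steps = 500

x = np.zeros(n_chains)
for _ in range(n_steps):
    noisy_grad = x + sigma * rng.standard_normal(n_chains)  # grad f(x) + noise
    x = x - eta * noisy_grad

# The recursion x' = (1 - eta) x - eta * xi is linear with Gaussian noise,
# so its invariant law is Gaussian with variance eta * sigma^2 / (2 - eta).
var_theory = eta * sigma**2 / (2 - eta)
print(x.var(), var_theory)
```

After a few hundred steps the empirical variance across chains matches the Gaussian invariant law's variance closely, which is the discrete analogue of convergence to the invariant distribution.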
Citations
Posted Content
Where is the Information in a Deep Neural Network
TL;DR: A novel notion of effective information in the activations of a deep network is established, which is used to show that models with low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs.
Posted Content
Quantitative $W_1$ Convergence of Langevin-Like Stochastic Processes with Non-Convex Potential State-Dependent Noise
TL;DR: In this article, the authors prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation and apply their theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems.
Posted Content
Analytic expressions for the output evolution of a deep neural network
TL;DR: A novel methodology based on a Taylor expansion of the network output for obtaining analytical expressions for the expected value of the network weights and output under stochastic training is presented.
References
Journal ArticleDOI
Acceleration of stochastic approximation by averaging
Boris T. Polyak, Anatoli Juditsky
TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
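The averaging technique from this reference can be sketched in a few lines. Below is a minimal illustration (the test problem, step-size exponent, and constants are assumptions, not from the paper): run SGD on a noisy quadratic with a slowly decaying step size, and track the running average of the iterates, which is the quantity Polyak-Ruppert averaging reports:

```python
import numpy as np

# Minimal sketch of iterate averaging: plain SGD on a noisy 1-D quadratic
# (x - target)^2 / 2, with step size k^{-0.6} (any exponent in (1/2, 1)
# works for the classical theory). The averaged iterate, not the last
# iterate, is the estimator.
rng = np.random.default_rng(1)
target = 3.0
x = 0.0
avg = 0.0
n_steps = 20000

for k in range(1, n_steps + 1):
    grad = (x - target) + rng.standard_normal()  # noisy gradient
    x -= grad / k**0.6                           # slowly decaying step size
    avg += (x - avg) / k                         # running average of iterates

print(avg)
```

The averaged iterate concentrates around `target` much more tightly than the raw iterate, which keeps fluctuating at the scale of the current step size.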
Journal ArticleDOI
Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality
Felix Otto, Cédric Villani
TL;DR: In this paper, it was shown that transport inequalities, similar to the one derived by M. Talagrand (1996, Geom. Funct. Anal. 6, 587-600) for the Gaussian measure, are implied by logarithmic Sobolev inequalities.
Journal ArticleDOI
Stochastic Gradient Descent as Approximate Bayesian Inference
TL;DR: It is demonstrated that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models and a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler is proposed.
Posted Content
Theoretical guarantees for approximate sampling from smooth and log-concave densities
TL;DR: This work establishes non-asymptotic bounds on the error of approximating the target distribution by the distribution obtained with the Langevin Monte Carlo method and its variants, and illustrates the effectiveness of the established guarantees.
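The Langevin Monte Carlo method from this reference is a one-line update, sketched below (an illustration under assumed parameters; the target here is the standard Gaussian, i.e. $f(x) = x^2/2$, so the quality of the approximation can be checked against a known answer):

```python
import numpy as np

# Unadjusted Langevin Monte Carlo for sampling from exp(-f(x)) with
# f(x) = x^2 / 2 (standard Gaussian target). Update rule:
#   x_{k+1} = x_k - eta * grad f(x_k) + sqrt(2 * eta) * xi_k
# Step size and chain counts are illustrative assumptions.
rng = np.random.default_rng(2)
eta = 0.01
n_chains, n_steps = 50000, 2000

x = np.zeros(n_chains)
for _ in range(n_steps):
    x = x - eta * x + np.sqrt(2 * eta) * rng.standard_normal(n_chains)

# The discretization's invariant variance is 1 / (1 - eta / 2), an
# O(eta) bias relative to the target variance 1 — the kind of
# discretization error the non-asymptotic bounds quantify.
print(x.var())
```

Shrinking `eta` shrinks the bias of the invariant law at the cost of slower mixing, which is the trade-off the quantitative guarantees make precise.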