Open Access Proceedings ArticleDOI

Generalization Error Bounds for Noisy, Iterative Algorithms

TLDR
This paper derives generalization error bounds for a broad class of iterative algorithms characterized by bounded, noisy updates with Markovian structure, including stochastic gradient Langevin dynamics (SGLD) and variants of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm.
Abstract
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the generalization error of empirical risk minimization based on the mutual information $I(S;W)$ between the algorithm input $S$ and the algorithm output $W$, when the loss function is sub-Gaussian. We leverage these results to derive generalization error bounds for a broad class of iterative algorithms that are characterized by bounded, noisy updates with Markovian structure. Our bounds are very general and are applicable to numerous settings of interest, including stochastic gradient Langevin dynamics (SGLD) and variants of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm. Furthermore, our error bounds hold for any output function computed over the path of iterates, including the last iterate of the algorithm or the average of subsets of iterates, and also allow for non-uniform sampling of data in successive updates of the algorithm.
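For reference, the Xu and Raginsky (2017) bound that this abstract builds on can be sketched as follows (the notation $L_\mu$, $L_S$ for population and empirical risk is introduced here and is not quoted from the paper): if the loss $\ell(w, Z)$ is $\sigma$-sub-Gaussian for every $w$ and the training set $S$ contains $n$ i.i.d. samples, then

$$\left|\mathbb{E}\bigl[L_\mu(W) - L_S(W)\bigr]\right| \le \sqrt{\frac{2\sigma^2}{n}\, I(S;W)}.$$

The paper leverages this to bound the generalization error of algorithms whose iterates are produced by bounded, noisy, Markovian updates, such as SGLD and SGHMC variants.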


Citations
Proceedings Article

On the Power of Over-parametrization in Neural Networks with Quadratic Activation

TL;DR: The authors show that over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions, and that the learned solution also generalizes well when the data are sampled from a regular distribution such as a Gaussian.
Proceedings Article

Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

TL;DR: A PAC-Bayes generalization bound is proved for neural networks trained by SGD; the bound correlates positively with the ratio of batch size to learning rate, providing a theoretical foundation for the training strategy of controlling that ratio.
Proceedings ArticleDOI

Tightening Mutual Information Based Bounds on Generalization Error

TL;DR: Applications to noisy, iterative algorithms such as stochastic gradient Langevin dynamics (SGLD) are also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.
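The tighter characterization mentioned in this TL;DR is commonly stated as an individual-sample refinement of the mutual-information bound; the following form is a sketch under the same sub-Gaussian assumption as above (with $Z_i$ denoting the $i$-th training sample), not a quotation from the cited paper:

$$\left|\mathbb{E}\bigl[L_\mu(W) - L_S(W)\bigr]\right| \le \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^2\, I(W; Z_i)},$$

which, for i.i.d. samples, is never larger than the bound based on the full mutual information $I(S;W)$.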
Posted Content

Chaining Mutual Information and Tightening Generalization Bounds.

TL;DR: This paper introduces a technique for combining the chaining and mutual information methods, obtaining a generalization bound that is both algorithm-dependent and able to exploit the dependencies between hypotheses.
Posted Content

Where is the Information in a Deep Neural Network

TL;DR: A novel notion of effective information in the activations of a deep network is established, which is used to show that models with low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs.
References

Statistical learning theory

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Book

Understanding Machine Learning: From Theory To Algorithms

TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Book

Concentration Inequalities: A Nonasymptotic Theory of Independence

TL;DR: Deep connections with isoperimetric problems are revealed, with special attention paid to applications involving suprema of empirical processes.
Proceedings Article

Bayesian Learning via Stochastic Gradient Langevin Dynamics

TL;DR: This paper proposes a new framework for learning from large-scale datasets based on iterative learning from small mini-batches: by adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates converge to samples from the true posterior distribution as the stepsize is annealed.
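As a sketch of the update described in this TL;DR (notation assumed here rather than quoted): with stepsize $\epsilon_t$, prior $p(\theta)$, and a mini-batch of $n$ points $x_{t_1},\dots,x_{t_n}$ drawn from a dataset of size $N$, the SGLD update takes the form

$$\Delta\theta_t = \frac{\epsilon_t}{2}\left(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i}\mid\theta_t)\right) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0,\, \epsilon_t I),$$

so the injected Gaussian noise is matched to the stepsize; annealing $\epsilon_t$ toward zero moves the iterates from stochastic optimization toward sampling from the posterior.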
Journal ArticleDOI

Stability and generalization

TL;DR: Several notions of stability for learning algorithms are defined, and it is shown how to use them to derive generalization error bounds based on the empirical error and the leave-one-out error.
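A minimal statement of the kind of stability result referred to here (a sketch, with the exact constants depending on the stability definition used): if replacing any single training example changes the loss of the learned hypothesis by at most $\beta$ at every test point (uniform stability), then the expected generalization gap satisfies

$$\left|\mathbb{E}_S\bigl[L_\mu(A(S)) - L_S(A(S))\bigr]\right| \le \beta,$$

and high-probability analogues follow from concentration inequalities such as McDiarmid's.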