Generalization Error Bounds for Noisy, Iterative Algorithms
Ankit Pensia, Varun Jog, Po-Ling Loh
- pp 546-550
TLDR
This paper derives generalization error bounds for a broad class of iterative algorithms characterized by bounded, noisy updates with Markovian structure, including stochastic gradient Langevin dynamics (SGLD) and variants of the SGHMC algorithm.
Abstract
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the generalization error of empirical risk minimization based on the mutual information $I(S;W)$ between the algorithm input $S$ and the algorithm output $W$, when the loss function is sub-Gaussian. We leverage these results to derive generalization error bounds for a broad class of iterative algorithms that are characterized by bounded, noisy updates with Markovian structure. Our bounds are very general and are applicable to numerous settings of interest, including stochastic gradient Langevin dynamics (SGLD) and variants of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm. Furthermore, our error bounds hold for any output function computed over the path of iterates, including the last iterate of the algorithm or the average of subsets of iterates, and also allow for non-uniform sampling of data in successive updates of the algorithm.
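The Xu and Raginsky (2017) result referenced in the abstract can be stated as follows; this is a sketch from memory of the cited bound, where $\sigma$ denotes the sub-Gaussian parameter of the loss, $n$ the number of training samples, $L_S$ the empirical risk on sample $S$, and $L_\mu$ the population risk:

```latex
% If the loss \ell(w, Z) is \sigma-sub-Gaussian under the data distribution
% for every hypothesis w, the expected generalization error satisfies
\left| \mathbb{E}\left[ L_\mu(W) - L_S(W) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)}.
```

The bound makes the intuition quantitative: the less the output $W$ reveals about the training set $S$ (smaller $I(S;W)$), the smaller the expected overfitting.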
Citations
Proceedings Article
On the Power of Over-parametrization in Neural Networks with Quadratic Activation
Simon S. Du, Jason D. Lee
TL;DR: The authors showed that over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions, and that the solution also generalizes well if the data is sampled from a regular distribution such as a Gaussian.
Proceedings Article
Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence
TL;DR: A PAC-Bayes generalization bound is proved for neural networks trained by SGD; the bound correlates positively with the ratio of batch size to learning rate, providing a theoretical foundation for that training strategy.
Proceedings Article
Tightening Mutual Information Based Bounds on Generalization Error
TL;DR: Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.
Posted Content
Chaining Mutual Information and Tightening Generalization Bounds.
TL;DR: This paper introduces a technique to combine the chaining and mutual information methods, to obtain a generalization bound that is both algorithm-dependent and that exploits the dependencies between the hypotheses.
Posted Content
Where is the Information in a Deep Neural Network?
TL;DR: A novel notion of effective information in the activations of a deep network is established, which is used to show that models with low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs.
References
Statistical learning theory
Vladimir N. Vapnik
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools and the application of these estimates to real-life problems.
Book
Understanding Machine Learning: From Theory to Algorithms
Shai Shalev-Shwartz, Shai Ben-David
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Book
Concentration Inequalities: A Nonasymptotic Theory of Independence
Stéphane Boucheron, Gábor Lugosi, Pascal Massart
TL;DR: Deep connections with isoperimetric problems are revealed, while special attention is paid to applications to suprema of empirical processes.
Proceedings Article
Bayesian Learning via Stochastic Gradient Langevin Dynamics
Max Welling, Yee Whye Teh
TL;DR: This paper proposes a new framework for learning from large-scale datasets based on iterative learning from small mini-batches: by adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates converge to samples from the true posterior distribution as the stepsize is annealed.
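The SGLD update described in this TL;DR can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the one-dimensional Gaussian target, the fixed (non-annealed) step size, and the helper name `sgld_step` are all assumptions made for the example.

```python
import numpy as np

def sgld_step(w, grad_log_post, step_size, rng):
    """One SGLD update: a half-step of gradient ascent on the log-posterior
    plus Gaussian noise whose variance matches the step size."""
    noise = rng.normal(0.0, np.sqrt(step_size), size=w.shape)
    return w + 0.5 * step_size * grad_log_post(w) + noise

# Toy example: sample from a 1-D standard Gaussian posterior,
# for which grad log p(w) = -w. Starting far from the mode, the
# chain drifts toward it and then explores the posterior.
rng = np.random.default_rng(0)
w = np.array([5.0])
samples = []
for _ in range(20000):
    w = sgld_step(w, lambda v: -v, step_size=0.01, rng=rng)
    samples.append(w[0])
```

With a fixed (rather than annealed) step size, the chain samples from a slightly biased approximation of the posterior; annealing the step size, as in the paper, removes this bias asymptotically.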
Journal Article
Stability and generalization
Olivier Bousquet, André Elisseeff
TL;DR: Notions of stability for learning algorithms are defined, and it is shown how to use them to derive generalization error bounds based on the empirical error and the leave-one-out error.