On the performance of random reshuffling in stochastic learning
Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed
pp. 1–5
TLDR
The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ).
Abstract
In empirical risk optimization, it has been observed that gradient-descent implementations that rely on random reshuffling of the data achieve better performance than implementations that sample the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions depend on the sample size, which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ). Simulation results illustrate the theoretical findings.
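To make the constant step-size comparison concrete, here is a minimal sketch (not the authors' code) that runs SGD on a synthetic least-squares problem under the two sampling schemes; the data, the step size mu = 0.01, and the epoch count are illustrative assumptions. Consistent with the O(μ²)-versus-O(μ) result, the end-of-epoch iterate of the reshuffling run should settle noticeably closer to the minimizer.

```python
import numpy as np

# Hypothetical least-squares risk J(w) = (1/N) * sum_n 0.5*(x_n^T w - y_n)^2.
# All problem data below are synthetic assumptions for illustration.
rng = np.random.default_rng(0)
N, d, mu, epochs = 500, 10, 0.01, 200
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)
w_opt = np.linalg.lstsq(X, y, rcond=None)[0]  # empirical minimizer

def grad(w, n):
    # Stochastic gradient of the n-th loss term.
    return (X[n] @ w - y[n]) * X[n]

def run(reshuffle):
    w = np.zeros(d)
    for _ in range(epochs):
        # Random reshuffling: a fresh permutation each epoch (no replacement).
        # Independent sampling: N indices drawn with replacement.
        idx = rng.permutation(N) if reshuffle else rng.integers(0, N, size=N)
        for n in idx:
            w -= mu * grad(w, n)
    return np.linalg.norm(w - w_opt) ** 2

print("independent sampling:", run(False))
print("random reshuffling:  ", run(True))
```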
Citations
Journal Article
Stochastic Learning Under Random Reshuffling With Constant Step-Sizes
TL;DR: In this article, the authors establish that random reshuffling outperforms uniform sampling by showing explicitly that the iterates approach a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ).
Posted Content
Variance-Reduced Stochastic Learning under Random Reshuffling
TL;DR: In this article, the authors provided the first theoretical guarantee of linear convergence under random reshuffling for SAGA and proposed a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG.
Journal Article
Variance-Reduced Stochastic Learning Under Random Reshuffling
TL;DR: A theoretical guarantee of linear convergence under random reshuffling for SAGA in the mean-square sense is provided and a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG is proposed.
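The exact AVRG recursion is specified only in the cited papers, so rather than guess at it, the following sketch illustrates the setting both entries analyze: standard SAGA driven by a random-reshuffling sampler. The least-squares data and hyperparameters are illustrative assumptions, not taken from the papers.

```python
import numpy as np

# Minimal sketch of SAGA with a random-reshuffling sampler on a synthetic
# least-squares problem.
rng = np.random.default_rng(1)
N, d, mu, epochs = 200, 5, 0.05, 50
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)

w = np.zeros(d)
table = np.zeros((N, d))           # stored gradient per sample (SAGA memory)
avg = table.mean(axis=0)           # running average of the stored gradients

for _ in range(epochs):
    for n in rng.permutation(N):   # sample without replacement each epoch
        g = (X[n] @ w - y[n]) * X[n]      # fresh stochastic gradient
        w -= mu * (g - table[n] + avg)    # SAGA variance-reduced step
        avg += (g - table[n]) / N         # keep the average consistent
        table[n] = g                      # overwrite the stored gradient

print("distance to minimizer:",
      np.linalg.norm(w - np.linalg.lstsq(X, y, rcond=None)[0]))
```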
Proceedings Article
Random Reshuffling is Not Always Better
Christopher M. De Sa
TL;DR: This work gives a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture related to the performance of random reshuffling in learning algorithms, and exhibits a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling.
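For context, a common statement of the conjecture being refuted (a paraphrase of the Recht–Ré formulation, not text from the cited paper, so the exact conventions should be checked against the original) is that for positive semidefinite matrices the mean product over samples drawn without replacement is dominated in norm by the mean product over samples drawn with replacement:

```latex
% Conjectured noncommutative AM-GM operator inequality (after Recht and Re):
% for positive semidefinite matrices A_1, ..., A_n,
\[
\Bigl\| \frac{1}{n!} \sum_{\sigma \in S_n}
        A_{\sigma(1)} A_{\sigma(2)} \cdots A_{\sigma(n)} \Bigr\|
\;\le\;
\Bigl\| \frac{1}{n^{n}} \Bigl( \sum_{i=1}^{n} A_i \Bigr)^{n} \Bigr\| .
\]
% This is the operator analogue of "sampling without replacement is never
% worse"; the cited counterexample shows it can fail.
```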
Posted Content
Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling.
TL;DR: This paper provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA and proposes a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG.
References
Book
Parallel and Distributed Computation: Numerical Methods
TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
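As a small illustration of the two reference iterations named in the summary, the following hedged sketch (with a made-up, diagonally dominant system, not an example from the book) contrasts Jacobi's fully parallel update with Gauss-Seidel's sequential use of the freshest values.

```python
import numpy as np

# Jacobi and Gauss-Seidel iterations for Ax = b on a small synthetic system
# chosen to be diagonally dominant so both iterations converge.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

def jacobi(A, b, iters=50):
    x = np.zeros_like(b)
    D = np.diag(A)
    for _ in range(iters):
        # Every component is updated from the *previous* iterate (parallel).
        x = (b - (A @ x - D * x)) / D
    return x

def gauss_seidel(A, b, iters=50):
    x = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            # Each component immediately reuses the freshest values (sequential).
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

print(jacobi(A, b), gauss_seidel(A, b), np.linalg.solve(A, b))
```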
Book Chapter
Large-Scale Machine Learning with Stochastic Gradient Descent
TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.
Book
An introduction to optimization
TL;DR: This book reviews background mathematics and methods of proof, set-constrained and unconstrained optimization, linear programming, and optimization problems with equality constraints.
Journal Article
Acceleration of stochastic approximation by averaging
Boris T. Polyak, Anatoli Juditsky
TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems, and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
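The following is a minimal sketch of the Polyak-Ruppert iterate-averaging idea on a synthetic least-squares problem; the data, the decaying step size mu_k = 0.1·k^(-0.6), and the iteration count are assumptions for illustration, not the paper's setup. The averaged iterate typically lands much closer to the minimizer than the last raw iterate.

```python
import numpy as np

# SGD with a slowly decaying step size, reporting both the last iterate and
# the running average of all iterates (Polyak-Ruppert averaging).
rng = np.random.default_rng(2)
N, d = 1000, 5
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(N)
w_opt = np.linalg.lstsq(X, y, rcond=None)[0]

w = np.zeros(d)
w_bar = np.zeros(d)
for k in range(1, 20 * N + 1):
    n = rng.integers(N)
    g = (X[n] @ w - y[n]) * X[n]
    w -= 0.1 * k ** -0.6 * g    # step size decays slower than 1/k
    w_bar += (w - w_bar) / k    # running average of the iterates

print("last iterate error:    ", np.linalg.norm(w - w_opt))
print("averaged iterate error:", np.linalg.norm(w_bar - w_opt))
```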
Proceedings Article
The Tradeoffs of Large Scale Learning
Olivier Bousquet, Léon Bottou
TL;DR: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms and shows distinct tradeoffs for the case of small-scale and large-scale learning problems.
Related Papers
Convergence of Online Adaptive and Recurrent Optimization Algorithms
Pierre-Yves Massé, Yann Ollivier