Open Access Proceedings ArticleDOI

On the performance of random reshuffling in stochastic learning

TLDR
The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ).
Abstract
In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions depend on the sample size, which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ). Simulation results illustrate the theoretical findings.
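The comparison in the abstract can be illustrated with a small numerical experiment. The following is a minimal NumPy sketch, not the paper's experimental setup: it runs constant step-size SGD on a least-squares empirical risk, once with i.i.d. sampling (with replacement) and once with random reshuffling, and reports the end-of-epoch distance to the minimizer. The problem dimensions, step size mu, and epoch count are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumed least-squares problem, step size, and sizes):
# constant step-size SGD under i.i.d. sampling vs. random reshuffling.
rng = np.random.default_rng(0)
N, d = 200, 10                                   # samples, dimension (assumed)
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)
w_star = np.linalg.lstsq(A, b, rcond=None)[0]    # minimizer of the empirical risk

mu = 0.01          # constant step size (assumed)
epochs = 200

def run(mode):
    """One SGD run; returns the distance to w_star at the end of each epoch."""
    w = np.zeros(d)
    dists = []
    for _ in range(epochs):
        if mode == "reshuffle":
            order = rng.permutation(N)           # each sample used exactly once per epoch
        else:
            order = rng.integers(0, N, size=N)   # i.i.d. sampling with replacement
        for i in order:
            grad = (A[i] @ w - b[i]) * A[i]      # gradient of 0.5*(a_i^T w - b_i)^2
            w -= mu * grad
        dists.append(np.linalg.norm(w - w_star))
    return np.array(dists)

iid = np.mean([run("iid")[-1] for _ in range(20)])
rr = np.mean([run("reshuffle")[-1] for _ in range(20)])
print(f"end-of-epoch distance to minimizer: i.i.d. {iid:.4f}, reshuffling {rr:.4f}")
```

Under the paper's result, the reshuffling runs should settle in a noticeably smaller neighborhood of w_star than the i.i.d. runs for small step sizes, consistent with the O(μ²) versus O(μ) characterization.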



Citations
Journal ArticleDOI

Stochastic Learning Under Random Reshuffling With Constant Step-Sizes

TL;DR: In this article, the authors show that random reshuffling outperforms uniform sampling by showing explicitly that the iterates approach a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ).
Posted Content

Variance-Reduced Stochastic Learning under Random Reshuffling

TL;DR: In this article, the authors provided the first theoretical guarantee of linear convergence under random reshuffling for SAGA and proposed a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG.
Journal ArticleDOI

Variance-Reduced Stochastic Learning Under Random Reshuffling

TL;DR: A theoretical guarantee of linear convergence under random reshuffling for SAGA in the mean-square sense is provided and a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG is proposed.
Proceedings Article

Random Reshuffling is Not Always Better

De Sa et al.
TL;DR: This work gives a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms, and gives an example of a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling.
Posted Content

Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling.

TL;DR: This paper provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA and proposes a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG.
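To make the SAGA-under-reshuffling setting studied in these follow-up works concrete, here is a minimal sketch, again on an assumed least-squares loss with illustrative sizes and step size. It shows the standard SAGA update driven by a random permutation each epoch; it is not the AVRG algorithm proposed in those papers.

```python
import numpy as np

# Illustrative sketch (assumed least-squares loss, step size, and sizes):
# SAGA with samples drawn by random reshuffling rather than uniform sampling.
rng = np.random.default_rng(1)
N, d = 200, 10
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def grad_i(w, i):
    # gradient of 0.5*(a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

w = np.zeros(d)
mu = 0.01
table = np.array([grad_i(w, i) for i in range(N)])  # stored per-sample gradients
table_avg = table.mean(axis=0)

for epoch in range(100):
    for i in rng.permutation(N):                    # random reshuffling
        g_new = grad_i(w, i)
        w -= mu * (g_new - table[i] + table_avg)    # SAGA correction step
        table_avg += (g_new - table[i]) / N         # keep the running average in sync
        table[i] = g_new

print("final empirical risk:", 0.5 * np.mean((A @ w - b) ** 2))
```

Note that SAGA stores one gradient per sample; the constant-storage property claimed for AVRG refers to a different bookkeeping scheme not shown here.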
References
Book

Parallel and Distributed Computation: Numerical Methods

TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Book ChapterDOI

Large-Scale Machine Learning with Stochastic Gradient Descent

Léon Bottou
TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.
Book

An introduction to optimization

TL;DR: This review discusses mathematical background, linear programming, set-constrained and unconstrained optimization, methods of proof and notation, and problems with equality constraints.
Journal ArticleDOI

Acceleration of stochastic approximation by averaging

TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
Proceedings Article

The Tradeoffs of Large Scale Learning

TL;DR: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms and shows distinct tradeoffs for the case of small-scale and large-scale learning problems.
Trending Questions (1)
Why does random reshuffling beat stochastic gradient descent?

The provided paper does not frame its result in those terms directly; it does, however, show that under constant step-sizes the end-of-run iterate under random reshuffling reaches an O(μ²) neighborhood of the minimizer, compared with O(μ) under independent sampling.