Open Access Proceedings ArticleDOI

On the performance of random reshuffling in stochastic learning

TLDR
The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ).
Abstract
In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions depend on the sample size, which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ). Simulation results illustrate the theoretical findings.
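The comparison in the abstract can be illustrated with a small numerical experiment. The following is a minimal NumPy sketch, not the paper's experimental setup: it runs constant step-size SGD on a least-squares empirical risk, once with i.i.d. sampling (with replacement) and once with random reshuffling, and reports the end-of-epoch distance to the minimizer. The problem dimensions, step size mu, and epoch count are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumed least-squares problem, step size, and sizes):
# constant step-size SGD under i.i.d. sampling vs. random reshuffling.
rng = np.random.default_rng(0)
N, d = 200, 10                                   # samples, dimension (assumed)
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)
w_star = np.linalg.lstsq(A, b, rcond=None)[0]    # minimizer of the empirical risk

mu = 0.01          # constant step size (assumed)
epochs = 200

def run(mode):
    """One SGD run; returns the distance to w_star at the end of each epoch."""
    w = np.zeros(d)
    dists = []
    for _ in range(epochs):
        if mode == "reshuffle":
            order = rng.permutation(N)           # each sample used exactly once per epoch
        else:
            order = rng.integers(0, N, size=N)   # i.i.d. sampling with replacement
        for i in order:
            grad = (A[i] @ w - b[i]) * A[i]      # gradient of 0.5*(a_i^T w - b_i)^2
            w -= mu * grad
        dists.append(np.linalg.norm(w - w_star))
    return np.array(dists)

iid = np.mean([run("iid")[-1] for _ in range(20)])
rr = np.mean([run("reshuffle")[-1] for _ in range(20)])
print(f"end-of-epoch distance to minimizer: i.i.d. {iid:.4f}, reshuffling {rr:.4f}")
```

Under the paper's result, the reshuffling runs should settle in a noticeably smaller neighborhood of w_star than the i.i.d. runs for small step sizes, consistent with the O(μ²) versus O(μ) characterization.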



Citations
Journal ArticleDOI

Stochastic Learning Under Random Reshuffling With Constant Step-Sizes

TL;DR: In this article, the authors show that random reshuffling outperforms uniform sampling by showing explicitly that the iterates approach a smaller neighborhood of size O(μ²) around the minimizer rather than O(μ).
Posted Content

Variance-Reduced Stochastic Learning under Random Reshuffling

TL;DR: In this article, the authors provided the first theoretical guarantee of linear convergence under random reshuffling for SAGA and proposed a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG.
Journal ArticleDOI

Variance-Reduced Stochastic Learning Under Random Reshuffling

TL;DR: A theoretical guarantee of linear convergence under random reshuffling for SAGA in the mean-square sense is provided and a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG is proposed.
Proceedings Article

Random Reshuffling is Not Always Better

De Sa et al.
TL;DR: This work gives a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms, and gives an example of a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling.
Posted Content

Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling.

TL;DR: This paper provides the first theoretical guarantee of linear convergence under random reshuffling for SAGA and proposes a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG.
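To make the SAGA-under-reshuffling setting studied in these follow-up works concrete, here is a minimal sketch, again on an assumed least-squares loss with illustrative sizes and step size. It shows the standard SAGA update driven by a random permutation each epoch; it is not the AVRG algorithm proposed in those papers.

```python
import numpy as np

# Illustrative sketch (assumed least-squares loss, step size, and sizes):
# SAGA with samples drawn by random reshuffling rather than uniform sampling.
rng = np.random.default_rng(1)
N, d = 200, 10
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def grad_i(w, i):
    # gradient of 0.5*(a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

w = np.zeros(d)
mu = 0.01
table = np.array([grad_i(w, i) for i in range(N)])  # stored per-sample gradients
table_avg = table.mean(axis=0)

for epoch in range(100):
    for i in rng.permutation(N):                    # random reshuffling
        g_new = grad_i(w, i)
        w -= mu * (g_new - table[i] + table_avg)    # SAGA correction step
        table_avg += (g_new - table[i]) / N         # keep the running average in sync
        table[i] = g_new

print("final empirical risk:", 0.5 * np.mean((A @ w - b) ** 2))
```

Note that SAGA stores one gradient per sample; the constant-storage property claimed for AVRG refers to a different bookkeeping scheme not shown here.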
References
Book

Parallel and Distributed Computation: Numerical Methods

TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Book ChapterDOI

Large-Scale Machine Learning with Stochastic Gradient Descent

Léon Bottou
TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.
Book

An introduction to optimization

TL;DR: This review discusses mathematical background, linear programming, set-constrained and unconstrained optimization, methods of proof and notation, and problems with equality constraints.
Journal ArticleDOI

Acceleration of stochastic approximation by averaging

TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
Proceedings Article

The Tradeoffs of Large Scale Learning

TL;DR: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms and shows distinct tradeoffs for the case of small-scale and large-scale learning problems.
Trending Questions (1)
Why does random reshuffling beat stochastic gradient descent?

The provided paper does not frame its result in those terms directly; it does, however, show that under constant step-sizes the end-of-run iterate under random reshuffling reaches an O(μ²) neighborhood of the minimizer, compared with O(μ) under independent sampling.