Open Access Journal Article

Semi-Stochastic Gradient Descent Methods

TLDR
Semi-Stochastic Gradient Descent (S2GD), as proposed in this paper, runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law.
Abstract
In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an epsilon-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with the number of stochastic gradient evaluations per epoch proportional to the condition number of the problem. The SVRG method of Johnson and Zhang arises as a special case. To illustrate our theoretical results: S2GD needs a workload equivalent to only about 2.1 full gradient evaluations to find a 10^-6-accurate solution to a problem with 10^9 functions and a condition number of 10^3.
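
As a rough illustration of the scheme described in the abstract, here is a minimal sketch of one S2GD-style epoch for minimizing (1/n) * sum_i f_i(w): compute one full gradient at the epoch's reference point, draw the number of inner steps from a geometric law, and take variance-reduced stochastic steps. The function names, the constant step size h, and the geometric parameter beta are illustrative assumptions; the paper's exact inner-loop distribution and its choice of the next reference point may differ, so this is a sketch rather than the authors' precise algorithm.

```python
import numpy as np

def s2gd_epoch(w, grad_i, n, h=0.1, beta=0.05, max_inner=None, rng=None):
    """One S2GD-style epoch (illustrative sketch, not the paper's exact method).

    w       -- current iterate, used as the reference point for this epoch
    grad_i  -- grad_i(x, i) returns the gradient of the i-th loss at x
    n       -- number of loss functions in the average
    h       -- step size (assumed constant here)
    beta    -- parameter of the geometric law for the number of inner steps
    """
    rng = rng or np.random.default_rng()
    max_inner = max_inner or n

    # One full gradient at the reference point w.
    mu = sum(grad_i(w, i) for i in range(n)) / n

    # Random number of inner stochastic steps, following a geometric law.
    t = min(int(rng.geometric(beta)), max_inner)

    y = w.copy()
    for _ in range(t):
        i = int(rng.integers(n))
        # Variance-reduced direction: stochastic gradient at y, corrected by
        # the same function's gradient at the reference point, plus mu.
        g = grad_i(y, i) - grad_i(w, i) + mu
        y = y - h * g
    return y
```

Consistent with the abstract, one would choose the expected number of inner steps (here roughly 1/beta) proportional to the condition number of the problem, so that each epoch costs one full pass over the data plus a condition-number-sized batch of cheap stochastic steps.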


Citations
Posted Content

Federated Optimization: Distributed Machine Learning for On-Device Intelligence

TL;DR: This work introduces a new and increasingly relevant setting for distributed optimization in machine learning, in which the data defining the optimization are unevenly distributed over an extremely large number of nodes, while the goal remains to train a high-quality centralized model.
Posted Content

Minimizing Finite Sums with the Stochastic Average Gradient

TL;DR: In this paper, the stochastic average gradient (SAG) method was proposed for optimizing the sum of a finite number of smooth convex functions; it achieves a faster convergence rate than black-box SG methods.
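
For concreteness, a minimal sketch of the idea behind SAG as suggested by the summary above: keep a table with the most recently evaluated gradient of each function, refresh one randomly chosen entry per iteration, and step along the average of the stored gradients. The names, the fixed step size, and the zero initialization of the table are illustrative assumptions, not the authors' exact algorithm or step-size rule.

```python
import numpy as np

def sag(w, grad_i, n, steps=1000, h=0.01, rng=None):
    """Minimal SAG-style loop (illustrative sketch, not the paper's exact method).

    Maintains one stored gradient per function; each iteration refreshes a
    single randomly chosen entry and moves along the average of the table.
    """
    rng = rng or np.random.default_rng()
    table = np.zeros((n, w.shape[0]))   # stored gradient for each f_i
    g_avg = np.zeros_like(w)            # running average of the stored gradients

    for _ in range(steps):
        i = int(rng.integers(n))
        g_new = grad_i(w, i)
        g_avg += (g_new - table[i]) / n  # update the average incrementally
        table[i] = g_new
        w = w - h * g_avg
    return w
```
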
Posted Content

Federated Optimization: Distributed Optimization Beyond the Datacenter

TL;DR: This work introduces a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed over an extremely large number of nodes, but the goal remains to train a high-quality centralized model.
Posted Content

Stochastic Variance Reduction for Nonconvex Optimization

TL;DR: In this paper, stochastic variance reduced gradient (SVRG) methods for nonconvex finite-sum problems were studied and the authors proved nonasymptotic rates of convergence to stationary points.
Proceedings Article

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

TL;DR: In this paper, the authors proposed a StochAstic Recursive Gradient Algorithm for finite-sum minimization (SARAH), which admits a simple recursive framework for updating stochastic gradient estimates.
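
The "simple recursive framework" mentioned in the summary above can be sketched roughly as follows: start an epoch from a full gradient, then update the gradient estimate recursively from the two most recent iterates instead of correcting against a fixed reference point as in SVRG/S2GD. The names, epoch length m, and step size h below are illustrative assumptions rather than the authors' exact parameter choices.

```python
import numpy as np

def sarah_epoch(w, grad_i, n, m=100, h=0.05, rng=None):
    """One SARAH-style epoch (illustrative sketch, not the paper's exact method).

    Starts from a full gradient, then recursively updates the gradient
    estimate using consecutive iterates rather than a fixed reference point.
    """
    rng = rng or np.random.default_rng()

    v = sum(grad_i(w, i) for i in range(n)) / n  # full gradient at the epoch start
    w_prev, w = w, w - h * v

    for _ in range(m):
        i = int(rng.integers(n))
        # Recursive update: difference of stochastic gradients at the two
        # most recent iterates, added to the previous estimate.
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - h * v
    return w
```
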
References
Book

Introductory Lectures on Convex Optimization: A Basic Course

TL;DR: A polynomial-time interior-point method for linear optimization was proposed, whose importance lay not only in its complexity bound, but also in the fact that its theoretically predicted high efficiency was supported by excellent computational results.
Proceedings Article

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TL;DR: It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
Journal Article

Robust Stochastic Approximation Approach to Stochastic Programming

TL;DR: It is intended to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.
Proceedings Article

Solving large scale linear prediction problems using stochastic gradient descent algorithms

Tong Zhang
TL;DR: Stochastic gradient descent algorithms on regularized forms of linear prediction methods, related to online algorithms such as the perceptron, are studied, and the numerical rate of convergence for such algorithms is obtained.
Journal Article

On the complexity of best-arm identification in multi-armed bandit models

TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.