Open Access Journal Article

Semi-Stochastic Gradient Descent Methods

TLDR
Semi-Stochastic Gradient Descent (S2GD), as proposed in this paper, runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law.
Abstract
In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, the latter following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an epsilon-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with the number of stochastic gradient evaluations per epoch proportional to the condition number of the problem. The SVRG method of Johnson and Zhang arises as a special case. To illustrate our theoretical results: S2GD needs a workload equivalent to only about 2.1 full gradient evaluations to find a 10^-6-accurate solution to a problem with 10^9 functions and a condition number of 10^3.
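
As a rough illustration of the scheme described in the abstract, here is a minimal sketch of one S2GD-style epoch for minimizing (1/n) * sum_i f_i(w): compute one full gradient at the epoch's reference point, draw the number of inner steps from a geometric law, and take variance-reduced stochastic steps. The function names, the constant step size h, and the geometric parameter beta are illustrative assumptions; the paper's exact inner-loop distribution and its choice of the next reference point may differ, so this is a sketch rather than the authors' precise algorithm.

```python
import numpy as np

def s2gd_epoch(w, grad_i, n, h=0.1, beta=0.05, max_inner=None, rng=None):
    """One S2GD-style epoch (illustrative sketch, not the paper's exact method).

    w       -- current iterate, used as the reference point for this epoch
    grad_i  -- grad_i(x, i) returns the gradient of the i-th loss at x
    n       -- number of loss functions in the average
    h       -- step size (assumed constant here)
    beta    -- parameter of the geometric law for the number of inner steps
    """
    rng = rng or np.random.default_rng()
    max_inner = max_inner or n

    # One full gradient at the reference point w.
    mu = sum(grad_i(w, i) for i in range(n)) / n

    # Random number of inner stochastic steps, following a geometric law.
    t = min(int(rng.geometric(beta)), max_inner)

    y = w.copy()
    for _ in range(t):
        i = int(rng.integers(n))
        # Variance-reduced direction: stochastic gradient at y, corrected by
        # the same function's gradient at the reference point, plus mu.
        g = grad_i(y, i) - grad_i(w, i) + mu
        y = y - h * g
    return y
```

Consistent with the abstract, one would choose the expected number of inner steps (here roughly 1/beta) proportional to the condition number of the problem, so that each epoch costs one full pass over the data plus a condition-number-sized batch of cheap stochastic steps.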


Citations
Posted Content

Federated Optimization: Distributed Machine Learning for On-Device Intelligence

TL;DR: This work introduces a new and increasingly relevant setting for distributed optimization in machine learning, in which the data defining the optimization are unevenly distributed over an extremely large number of nodes, while the goal remains to train a high-quality centralized model.
Posted Content

Minimizing Finite Sums with the Stochastic Average Gradient

TL;DR: In this paper, the stochastic average gradient (SAG) method was proposed for optimizing the sum of a finite number of smooth convex functions; it achieves a faster convergence rate than black-box SG methods.
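
For concreteness, a minimal sketch of the idea behind SAG as suggested by the summary above: keep a table with the most recently evaluated gradient of each function, refresh one randomly chosen entry per iteration, and step along the average of the stored gradients. The names, the fixed step size, and the zero initialization of the table are illustrative assumptions, not the authors' exact algorithm or step-size rule.

```python
import numpy as np

def sag(w, grad_i, n, steps=1000, h=0.01, rng=None):
    """Minimal SAG-style loop (illustrative sketch, not the paper's exact method).

    Maintains one stored gradient per function; each iteration refreshes a
    single randomly chosen entry and moves along the average of the table.
    """
    rng = rng or np.random.default_rng()
    table = np.zeros((n, w.shape[0]))   # stored gradient for each f_i
    g_avg = np.zeros_like(w)            # running average of the stored gradients

    for _ in range(steps):
        i = int(rng.integers(n))
        g_new = grad_i(w, i)
        g_avg += (g_new - table[i]) / n  # update the average incrementally
        table[i] = g_new
        w = w - h * g_avg
    return w
```
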
Posted Content

Federated Optimization: Distributed Optimization Beyond the Datacenter

TL;DR: This work introduces a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed over an extremely large number of nodes, but the goal remains to train a high-quality centralized model.
Posted Content

Stochastic Variance Reduction for Nonconvex Optimization

TL;DR: In this paper, stochastic variance reduced gradient (SVRG) methods for nonconvex finite-sum problems were studied and the authors proved nonasymptotic rates of convergence to stationary points.
Proceedings Article

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

TL;DR: In this paper, the authors proposed a StochAstic Recursive Gradient Algorithm for finite-sum minimization (SARAH), which admits a simple recursive framework for updating stochastic gradient estimates.
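
The "simple recursive framework" mentioned in the summary above can be sketched roughly as follows: start an epoch from a full gradient, then update the gradient estimate recursively from the two most recent iterates instead of correcting against a fixed reference point as in SVRG/S2GD. The names, epoch length m, and step size h below are illustrative assumptions rather than the authors' exact parameter choices.

```python
import numpy as np

def sarah_epoch(w, grad_i, n, m=100, h=0.05, rng=None):
    """One SARAH-style epoch (illustrative sketch, not the paper's exact method).

    Starts from a full gradient, then recursively updates the gradient
    estimate using consecutive iterates rather than a fixed reference point.
    """
    rng = rng or np.random.default_rng()

    v = sum(grad_i(w, i) for i in range(n)) / n  # full gradient at the epoch start
    w_prev, w = w, w - h * v

    for _ in range(m):
        i = int(rng.integers(n))
        # Recursive update: difference of stochastic gradients at the two
        # most recent iterates, added to the previous estimate.
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - h * v
    return w
```
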
References
Book

Introductory Lectures on Convex Optimization: A Basic Course

TL;DR: A polynomial-time interior-point method for linear optimization was proposed, whose importance lay not only in its complexity bound, but also in the fact that its theoretically predicted high efficiency was supported by excellent computational results.
Proceedings Article

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TL;DR: It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
Journal Article

Robust Stochastic Approximation Approach to Stochastic Programming

TL;DR: It is intended to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.
Proceedings Article

Solving large scale linear prediction problems using stochastic gradient descent algorithms

Tong Zhang
TL;DR: Stochastic gradient descent algorithms on regularized forms of linear prediction methods, related to online algorithms such as the perceptron, are studied, and the numerical rate of convergence for such algorithms is obtained.
Journal Article

On the complexity of best-arm identification in multi-armed bandit models

TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.