Semi-Stochastic Gradient Descent Methods
Jakub Konečný, Peter Richtárik
TLDR
Semi-Stochastic Gradient Descent (S2GD) runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, following a geometric law.
Abstract:
In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an epsilon-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with a number of stochastic gradient evaluations per epoch proportional to the condition number of the problem. The SVRG method of Johnson and Zhang arises as a special case. To illustrate our theoretical results: S2GD needs the workload equivalent to only about 2.1 full gradient evaluations to find a 10^-6-accurate solution for a problem with 10^9 functions and a condition number of 10^3.
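As a rough illustration of the epoch structure described in the abstract, here is a minimal sketch, assuming least-squares losses f_i(w) = (1/2)(x_i·w - y_i)^2; the step size, epoch cap m, and geometric-law parameter nu are illustrative choices, not the paper's tuned values:

```python
import numpy as np

def s2gd_epoch(w, X, y, step, m, nu, rng):
    """One S2GD epoch: a single full gradient at the anchor w, then a random
    number t of variance-reduced stochastic steps, t drawn from a truncated
    geometric law on {1, ..., m}."""
    n = X.shape[0]
    mu = X.T @ (X @ w - y) / n                       # the one full gradient
    # P(t) proportional to (1 - nu*step)^(m - t), t = 1..m (geometric law)
    probs = (1.0 - nu * step) ** (m - np.arange(1, m + 1))
    t = rng.choice(np.arange(1, m + 1), p=probs / probs.sum())
    v = w.copy()
    for _ in range(t):
        i = rng.integers(n)                          # one stochastic gradient per step
        g_v = X[i] * (X[i] @ v - y[i])               # grad f_i at current iterate
        g_w = X[i] * (X[i] @ w - y[i])               # grad f_i at the anchor
        v = v - step * (g_v - g_w + mu)              # variance-reduced update
    return v

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true                                       # realizable least-squares problem
w = np.zeros(5)
for _ in range(30):                                  # one full gradient per epoch
    w = s2gd_epoch(w, X, y, step=0.02, m=200, nu=0.1, rng=rng)
```

On this small synthetic problem the iterate converges linearly toward w_true, matching the strongly convex case analyzed in the paper.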
Citations
Posted Content
Federated Optimization: Distributed Machine Learning for On-Device Intelligence
TL;DR: A new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number of nodes, is introduced, with the goal of training a high-quality centralized model.
Posted Content
Minimizing Finite Sums with the Stochastic Average Gradient
TL;DR: In this paper, the stochastic average gradient (SAG) method was proposed to optimize the sum of a finite number of smooth convex functions; it achieves a faster convergence rate than black-box SG methods.
Posted Content
Federated Optimization: Distributed Optimization Beyond the Datacenter
TL;DR: This work introduces a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed over an extremely large number of nodes, but the goal remains to train a high-quality centralized model.
Posted Content
Stochastic Variance Reduction for Nonconvex Optimization
TL;DR: In this paper, stochastic variance reduced gradient (SVRG) methods for nonconvex finite-sum problems were studied and the authors proved nonasymptotic rates of convergence to stationary points.
Proceedings Article
SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
TL;DR: In this paper, the authors proposed a StochAstic Recursive Gradient Algorithm for finite-sum minimization (SARAH), which admits a simple recursive framework for updating stochastic gradient estimates.
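The recursive estimate that distinguishes SARAH from SVRG, v_t = ∇f_{i_t}(w_t) − ∇f_{i_t}(w_{t−1}) + v_{t−1}, can be sketched as follows; the least-squares loss, step size, and iteration count here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))
w_true = rng.standard_normal(4)
y = X @ w_true                       # realizable least-squares problem
step = 0.02

def grad_i(w, i):
    # per-sample gradient of the illustrative least-squares loss
    return X[i] * (X[i] @ w - y[i])

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

w_prev = np.zeros(4)
v = X.T @ (X @ w_prev - y) / len(y)  # v_0: one full gradient to start
w = w_prev - step * v
for _ in range(200):                 # inner loop: recursive estimate, no snapshot
    i = rng.integers(len(y))
    v = grad_i(w, i) - grad_i(w_prev, i) + v
    w_prev, w = w, w - step * v
```

Unlike SVRG, the estimate is updated from the previous iterate rather than a fixed snapshot, so no stored full-gradient anchor is needed inside the loop.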
References
Book
Introductory Lectures on Convex Optimization: A Basic Course
TL;DR: This book presents a polynomial-time interior-point method for linear optimization whose importance lay not only in its complexity bound, but also in that the theoretical prediction of its high efficiency was supported by excellent computational results.
Proceedings Article
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Rie Johnson, Tong Zhang
TL;DR: It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
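The predictive variance reduction behind SVRG (the special case of S2GD noted in the abstract above) rests on the gradient estimate ∇f_i(w) − ∇f_i(w̃) + ∇f(w̃), which is unbiased. A minimal numerical check, assuming an illustrative least-squares loss:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = rng.standard_normal(100)

def grad_i(w, i):
    # per-sample gradient of the illustrative least-squares loss
    return X[i] * (X[i] @ w - y[i])

def full_grad(w):
    return X.T @ (X @ w - y) / len(y)

w_snap = np.zeros(3)          # snapshot point where the full gradient is stored
w = rng.standard_normal(3)    # current iterate
mu = full_grad(w_snap)
# Averaging the SVRG estimate grad_i(w) - grad_i(w_snap) + mu over all i
# recovers the exact full gradient at w, confirming unbiasedness.
est = np.mean([grad_i(w, i) - grad_i(w_snap, i) + mu for i in range(100)], axis=0)
assert np.allclose(est, full_grad(w))
```

Because ∇f_i(w) and ∇f_i(w̃) are correlated, the estimate's variance shrinks as the iterate approaches the snapshot, which is what allows a constant step size and the fast rate claimed above.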
Journal ArticleDOI
Robust Stochastic Approximation Approach to Stochastic Programming
TL;DR: It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.
Proceedings ArticleDOI
Solving large scale linear prediction problems using stochastic gradient descent algorithms
TL;DR: Stochastic gradient descent algorithms on regularized forms of linear prediction methods, related to online algorithms such as the perceptron, are studied, and numerical rates of convergence for such algorithms are obtained.
Journal Article
On the complexity of best-arm identification in multi-armed bandit models
TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.