Katyusha: the first direct acceleration of stochastic gradient methods
Zeyuan Allen-Zhu
pp. 1200-1205
TLDR
Katyusha, as discussed by the authors, is a primal-only stochastic gradient method with negative momentum on top of Nesterov's momentum that can be incorporated into a variance-reduction-based algorithm to speed it up.
Abstract
Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. It has a provably accelerated convergence rate in convex (off-line) stochastic optimization. The main ingredient is Katyusha momentum, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each such case, one could potentially give Katyusha a hug.
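The three-point coupling described in the abstract can be sketched in a few lines. The following is a simplified reading of the Katyusha scheme for a smooth, strongly convex finite sum (no proximal term, uniform snapshot averaging instead of the paper's weighted averaging); the callback interface (`grad_i`, `full_grad`) and parameter names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def katyusha(grad_i, full_grad, x0, n, L, sigma, epochs=20, m=None, seed=0):
    """Simplified Katyusha sketch for smooth, sigma-strongly convex
    f(x) = (1/n) * sum_i f_i(x); grad_i/full_grad are assumed callbacks."""
    m = m or 2 * n
    tau2 = 0.5                                    # weight of the snapshot (Katyusha momentum)
    tau1 = min(np.sqrt(n * sigma / (3 * L)), 0.5)
    alpha = 1.0 / (3 * tau1 * L)
    x_tilde = x0.copy(); y = x0.copy(); z = x0.copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        mu = full_grad(x_tilde)                   # full gradient at the snapshot
        inner = []
        for _ in range(m):
            # "negative momentum": the snapshot x_tilde pulls the iterate back
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            g = mu + grad_i(i, x) - grad_i(i, x_tilde)  # variance-reduced gradient
            z = z - alpha * g                     # long "mirror descent" step
            y = x - g / (3 * L)                   # short gradient step
            inner.append(x)
        x_tilde = np.mean(inner, axis=0)          # new snapshot (uniform average, simplified)
    return x_tilde
```

On a least-squares instance with f_i(x) = 0.5 (a_i·x − b_i)², one would pass L = max_i ‖a_i‖² and sigma = λ_min((1/n) AᵀA).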
Citations
Posted Content
Lookahead Optimizer: k steps forward, 1 step back
TL;DR: Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost, and can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings.
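The "k steps forward, 1 step back" rule in the title can be sketched as follows, assuming plain SGD as the inner optimizer and a deterministic gradient callback; the function and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def lookahead_sgd(grad, x0, lr=0.1, k=5, alpha=0.5, outer_steps=40):
    """Sketch of the Lookahead wrapper: slow weights phi track fast weights theta."""
    phi = x0.copy()
    for _ in range(outer_steps):
        theta = phi.copy()
        for _ in range(k):                 # k steps forward with the inner optimizer
            theta = theta - lr * grad(theta)
        phi = phi + alpha * (theta - phi)  # 1 step back: interpolate toward fast weights
    return phi
```

The interpolation step is what lowers the variance of the inner optimizer: only a fraction `alpha` of the fast weights' excursion is kept.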
Journal Article
Stochastic primal-dual coordinate method for regularized empirical risk minimization
Yuchen Zhang, Lin Xiao +1 more
TL;DR: This work proposes a stochastic primal-dual coordinate method, which alternates between maximizing over one (or more) randomly chosen dual variables and minimizing over the primal variable, and develops an extension to non-smooth and non-strongly convex loss functions.
Journal Article
Federated Learning of a Mixture of Global and Local Models
Filip Hanzely, Peter Richtárik +1 more
TL;DR: This work proposes a new optimization formulation for training federated learning models that seeks an explicit trade-off between this traditional global model and the local models, which can be learned by each device from its own private data without any communication.
Posted Content
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
TL;DR: It is proved that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.
Journal Article
Accelerated Distributed Nesterov Gradient Descent
Guannan Qu, Na Li +1 more
TL;DR: In this article, an accelerated distributed Nesterov gradient descent method was proposed for distributed optimization over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication.
References
Book
Introductory Lectures on Convex Optimization: A Basic Course
TL;DR: A polynomial-time interior-point method for linear optimization is presented; its importance lies not only in its complexity bound, but also in the fact that the theoretical prediction of its high efficiency is supported by excellent computational results.
Journal Article
Smooth minimization of non-smooth functions
TL;DR: A new approach for constructing efficient schemes for non-smooth convex optimization is proposed, based on a special smoothing technique, which can be applied to functions with explicit max-structure, and can be considered as an alternative to black-box minimization.
Proceedings Article
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Rie Johnson, Tong Zhang +1 more
TL;DR: It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
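The predictive variance reduction idea behind SVRG can be sketched as follows, assuming a smooth finite-sum objective; the callback interface is an illustrative assumption, and taking the last iterate as the next snapshot is one of the options the paper analyzes.

```python
import numpy as np

def svrg(grad_i, full_grad, x0, n, lr, epochs=20, m=None, seed=0):
    """Sketch of SVRG for f(x) = (1/n) * sum_i f_i(x); callbacks are assumed."""
    m = m or 2 * n
    x_tilde = x0.copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        mu = full_grad(x_tilde)            # full gradient at the snapshot
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # control variate: unbiased, and its variance shrinks as x, x_tilde -> x*
            x = x - lr * (grad_i(i, x) - grad_i(i, x_tilde) + mu)
        x_tilde = x                        # option: last iterate as next snapshot
    return x_tilde
```

Unlike SDCA and SAGA, no per-example gradient table is stored; only the snapshot and its full gradient are kept between epochs.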
Proceedings Article
SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
TL;DR: SAGA as discussed by the authors improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser.
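A sketch of the SAGA update without the proximal step (i.e., for a smooth objective with no regulariser); the table-of-gradients bookkeeping is what distinguishes it from SVRG. The interface names are assumptions.

```python
import numpy as np

def saga(grad_i, x0, n, lr, steps=2000, seed=0):
    """Sketch of SAGA (smooth case): keep one stored gradient per example."""
    x = x0.copy()
    rng = np.random.default_rng(seed)
    table = np.array([grad_i(j, x) for j in range(n)])  # stored gradients, O(n*d) memory
    avg = table.mean(axis=0)
    for _ in range(steps):
        i = rng.integers(n)
        g_new = grad_i(i, x)
        # unbiased estimator: fresh gradient minus the stale one, plus the table average
        x = x - lr * (g_new - table[i] + avg)
        avg = avg + (g_new - table[i]) / n   # maintain the running average in O(d)
        table[i] = g_new
    return x
```

The trade-off versus SVRG: no periodic full-gradient passes, at the cost of O(n·d) memory for the table.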
Journal Article
Efficiency of coordinate descent methods on huge-scale optimization problems
TL;DR: Surprisingly, for certain classes of objective functions, the provable efficiency estimates of the proposed coordinate descent methods for huge-scale optimization problems are better than the standard worst-case bounds for deterministic algorithms.
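A sketch of randomized coordinate descent on a least-squares objective, illustrating why per-coordinate updates are cheap on huge-scale problems: each step touches one column of the matrix and maintains the residual in O(n), using the coordinate-wise Lipschitz constants rather than a global one. The setup is an illustrative assumption, not the paper's experiments.

```python
import numpy as np

def rand_coord_descent(A, b, steps=3000, seed=0):
    """Sketch of randomized coordinate descent on f(x) = 0.5 * ||A @ x - b||^2."""
    n, d = A.shape
    x = np.zeros(d)
    r = A @ x - b                    # maintained residual r = A @ x - b
    Lj = (A ** 2).sum(axis=0)        # per-coordinate Lipschitz constants ||A[:, j]||^2
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        j = rng.integers(d)
        g = A[:, j] @ r              # partial derivative along coordinate j
        delta = -g / Lj[j]           # exact minimization along that coordinate
        x[j] += delta
        r += A[:, j] * delta         # update the residual in O(n), no full pass
    return x
```

Each iteration costs O(n) instead of the O(n·d) of a full gradient step, which is the source of the favorable per-iteration bounds.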