Open Access · Proceedings Article · DOI

Katyusha: the first direct acceleration of stochastic gradient methods

Zeyuan Allen-Zhu
pp. 1200–1205
TLDR
Katyusha, as discussed by the authors, is a direct, primal-only stochastic gradient method built on a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm to speed it up.
Abstract
Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. It has a provably accelerated convergence rate in convex (off-line) stochastic optimization. The main ingredient is Katyusha momentum, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each such case, one could potentially give Katyusha a hug.
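For intuition, here is a minimal Python sketch of how a Katyusha-style "negative momentum" can sit on top of an SVRG-style variance-reduced gradient. The parameter choices shown (tau2 = 1/2, the formulas for tau1 and alpha, the epoch length m = 2n) follow common statements of the method, and the snapshot is updated with a plain average rather than a weighted one; treat it as a sketch under those assumptions, not the paper's exact algorithm.

```python
import numpy as np

def katyusha_style(grad_i, grad_full, x0, n, L, sigma, epochs=20, m=None):
    """Sketch of negative-momentum (Katyusha-style) acceleration on top of an
    SVRG-style variance-reduced gradient.

    grad_i(x, i): stochastic gradient of the i-th component at x
    grad_full(x): full gradient at x
    L, sigma:     smoothness and strong-convexity estimates (assumed known here)
    """
    m = m or 2 * n                       # inner-loop length (a common choice)
    tau2 = 0.5                           # weight of the "negative momentum" term
    tau1 = min(np.sqrt(m * sigma / (3 * L)), 0.5)
    alpha = 1.0 / (3 * tau1 * L)
    x_tilde = x0.copy()                  # snapshot point
    z = x0.copy()                        # mirror-descent sequence
    y = x0.copy()                        # gradient-descent sequence
    for _ in range(epochs):
        mu = grad_full(x_tilde)          # full gradient at the snapshot
        ys = []
        for _ in range(m):
            # Coupling step: Nesterov momentum (via z) plus negative momentum
            # (via x_tilde), which pulls the iterate back toward the snapshot.
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = np.random.randint(n)
            g = grad_i(x, i) - grad_i(x_tilde, i) + mu   # variance-reduced gradient
            z = z - alpha * g                            # mirror-descent-style step
            y = x - g / (3 * L)                          # gradient-descent-style step
            ys.append(y)
        x_tilde = np.mean(ys, axis=0)    # plain average; a weighted average is also used
    return x_tilde
```

The tau2 * x_tilde term is the "negative momentum": it acts as a magnet toward the snapshot, keeping the variance-reduced gradient estimates accurate while the Nesterov-style extrapolation accelerates progress.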



Citations
Posted Content

Lookahead Optimizer: k steps forward, 1 step back

TL;DR: Lookahead improves learning stability and lowers the variance of its inner optimizer at negligible computation and memory cost, and can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings.
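A minimal sketch of the "k steps forward, 1 step back" pattern, with plain SGD standing in for the inner optimizer; the step size, k, and the interpolation coefficient alpha here are illustrative defaults, not necessarily the published hyperparameters.

```python
import numpy as np

def lookahead(grad, x0, inner_step=0.1, k=5, alpha=0.5, outer_iters=100):
    """Sketch of Lookahead: run k fast (inner) steps, then move the slow
    weights a fraction alpha toward where the fast weights ended up."""
    slow = x0.copy()
    for _ in range(outer_iters):
        fast = slow.copy()
        for _ in range(k):                      # k steps forward with the inner optimizer
            fast -= inner_step * grad(fast)     # plain SGD stands in for SGD/Adam here
        slow += alpha * (fast - slow)           # 1 step back: interpolate the slow weights
    return slow

# Tiny usage example on a quadratic: minimize ||x||^2
x = lookahead(lambda x: 2 * x, np.ones(3))
```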
Journal Article

Stochastic primal-dual coordinate method for regularized empirical risk minimization

TL;DR: This work proposes a stochastic primal-dual coordinate method, which alternates between maximizing over one (or more) randomly chosen dual variables and minimizing over the primal variable, and develops an extension to non-smooth and non-strongly convex loss functions.
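For reference, a sketch of the standard saddle-point reformulation that stochastic primal-dual coordinate methods operate on; the notation here (data vectors a_i, losses phi_i with convex conjugates phi_i*, regularizer g) is assumed for illustration, and the paper's exact step sizes and extrapolation are omitted.

```latex
\min_{x\in\mathbb{R}^d}\;\frac{1}{n}\sum_{i=1}^{n}\phi_i\bigl(a_i^\top x\bigr)+g(x)
\;\;\Longleftrightarrow\;\;
\min_{x\in\mathbb{R}^d}\;\max_{y\in\mathbb{R}^n}\;
\frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i\,\langle a_i,x\rangle-\phi_i^{*}(y_i)\Bigr)+g(x).
```

Each iteration then performs a proximal ascent step on one randomly chosen dual coordinate y_i and a proximal descent step on the primal variable x.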
Journal Article

Federated Learning of a Mixture of Global and Local Models

TL;DR: This work proposes a new optimization formulation for training federated learning models that seeks an explicit trade-off between the traditional global model and purely local models, which each device can learn from its own private data without any communication.
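A sketch of the kind of penalized formulation such a trade-off refers to; the penalty form and the symbol lambda are assumptions of this sketch, not quoted from the paper. Each device i keeps its own model x_i, and a penalty with weight lambda pulls the local models toward their average:

```latex
\min_{x_1,\dots,x_n\in\mathbb{R}^d}\;
\frac{1}{n}\sum_{i=1}^{n} f_i(x_i)
\;+\;\frac{\lambda}{2n}\sum_{i=1}^{n}\bigl\|x_i-\bar{x}\bigr\|^{2},
\qquad \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i .
```

Taking lambda = 0 leaves purely local models, while letting lambda grow forces the x_i to agree and recovers a single global model.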
Posted Content

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

TL;DR: It is proved that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions.
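A minimal sketch of constant step-size SGD with a Nesterov-style momentum step, the general form of scheme that such results analyze; the specific parameter sequences and interpolation (over-parameterization) assumptions of the paper are not reproduced here.

```python
import numpy as np

def nesterov_sgd(grad_i, x0, n, step=0.1, momentum=0.9, iters=1000, seed=0):
    """Constant step-size SGD with a Nesterov-style extrapolation step."""
    rng = np.random.default_rng(seed)
    x_prev = x = x0.copy()
    for _ in range(iters):
        y = x + momentum * (x - x_prev)         # extrapolate to a look-ahead point
        i = rng.integers(n)
        x_prev, x = x, y - step * grad_i(y, i)  # stochastic gradient step taken at y
    return x
```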
Journal Article · DOI

Accelerated Distributed Nesterov Gradient Descent

TL;DR: In this article, an accelerated distributed Nesterov gradient descent method was proposed for distributed optimization over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication.
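A simplified sketch of the general ingredients such methods combine: consensus averaging through a mixing matrix, local Nesterov-style momentum steps, and gradient tracking of the network-wide average gradient. This is a sketch of that pattern, not the exact Acc-DNGD recursion or its step-size schedule.

```python
import numpy as np

def distributed_nesterov_sketch(grads, W, x0, step=0.05, momentum=0.8, iters=500):
    """Each agent mixes with its neighbors (mixing matrix W), takes a
    Nesterov-style momentum step, and tracks the average gradient.

    grads: list of per-agent gradient functions
    W:     doubly stochastic mixing matrix (n x n)
    x0:    (n, d) array of initial local iterates
    """
    n = len(grads)
    x = x0.copy()
    x_prev = x0.copy()
    # Gradient tracker: s[i] estimates the network-wide average gradient.
    s = np.stack([grads[i](x[i]) for i in range(n)])
    g_old = s.copy()
    for _ in range(iters):
        y = x + momentum * (x - x_prev)          # local Nesterov extrapolation
        x_prev = x
        x = W @ y - step * s                     # consensus mixing + tracked-gradient step
        g_new = np.stack([grads[i](x[i]) for i in range(n)])
        s = W @ s + g_new - g_old                # gradient-tracking update
        g_old = g_new
    return x.mean(axis=0)
```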
References
Book

Introductory Lectures on Convex Optimization: A Basic Course

TL;DR: A polynomial-time interior-point method for linear optimization is discussed, whose importance lay not only in its complexity bound but also in the fact that the theoretical prediction of its high efficiency was supported by excellent computational results.
Journal Article · DOI

Smooth minimization of non-smooth functions

TL;DR: A new approach for constructing efficient schemes for non-smooth convex optimization is proposed, based on a special smoothing technique, which can be applied to functions with explicit max-structure, and can be considered as an alternative to black-box minimization.
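The smoothing idea in one display, assuming the usual explicit max-structure; the symbols follow the standard statement of the technique rather than quoting the paper.

```latex
f(x)=\max_{u\in Q}\bigl\{\langle Ax,u\rangle-\hat{\phi}(u)\bigr\}
\quad\leadsto\quad
f_{\mu}(x)=\max_{u\in Q}\bigl\{\langle Ax,u\rangle-\hat{\phi}(u)-\mu\,d(u)\bigr\},
```

where d is a prox-function that is strongly convex on Q with modulus sigma. The smoothed f_mu has a Lipschitz-continuous gradient with constant on the order of \|A\|^2/(\mu\sigma), so running a fast gradient method on f_mu with a suitably chosen mu yields O(1/epsilon) complexity for the original non-smooth problem instead of the O(1/epsilon^2) black-box rate.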
Proceedings Article

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TL;DR: It is proved that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG), but its analysis is significantly simpler and more intuitive.
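A minimal Python sketch of the predictive variance-reduction (SVRG-style) loop, assuming access to component and full gradients; the epoch length and step size are illustrative.

```python
import numpy as np

def svrg(grad_i, grad_full, x0, n, step=0.05, m=None, epochs=20, seed=0):
    """SVRG-style loop: keep a snapshot x_tilde and correct each stochastic
    gradient with the snapshot's gradients so the estimator's variance shrinks."""
    rng = np.random.default_rng(seed)
    m = m or 2 * n
    x_tilde = x0.copy()
    for _ in range(epochs):
        mu = grad_full(x_tilde)                  # full gradient at the snapshot
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(x_tilde, i) + mu   # variance-reduced gradient
            x -= step * g
        x_tilde = x                              # take the last inner iterate as snapshot
    return x_tilde
```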
Proceedings Article

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

TL;DR: SAGA as discussed by the authors improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser.
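A minimal sketch of a SAGA-style update, including a proximal step to show the support for composite objectives; the soft-thresholding prox for an l1 regularizer is just one illustrative choice, and the step size is not the theoretically prescribed one.

```python
import numpy as np

def saga(grad_i, x0, n, step=0.05, l1=0.0, iters=5000, seed=0):
    """SAGA-style loop: keep a table of the last gradient seen for each
    component and use it to build an unbiased, low-variance update direction."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    table = np.stack([grad_i(x, i) for i in range(n)])  # stored component gradients
    avg = table.mean(axis=0)
    for _ in range(iters):
        j = rng.integers(n)
        g_new = grad_i(x, j)
        v = g_new - table[j] + avg                      # unbiased variance-reduced direction
        z = x - step * v
        x = np.sign(z) * np.maximum(np.abs(z) - step * l1, 0.0)
        # ^ proximal step (soft-thresholding) handles an l1 regularizer, if any
        avg += (g_new - table[j]) / n                   # keep the running average in sync
        table[j] = g_new
    return x
```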
Journal Article · DOI

Efficiency of coordinate descent methods on huge-scale optimization problems

TL;DR: Surprisingly enough, for certain classes of objective functions, the complexity bounds obtained for the proposed huge-scale coordinate descent methods are better than the standard worst-case bounds for deterministic algorithms.
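A minimal sketch of randomized coordinate descent with coordinate-wise step sizes 1/L_i, the basic scheme such complexity results are about; uniform sampling is used here for simplicity, whereas non-uniform sampling proportional to the L_i is also commonly analyzed.

```python
import numpy as np

def random_coordinate_descent(grad_coord, L, x0, iters=10000, seed=0):
    """Randomized coordinate descent: pick one coordinate at a time and take a
    step of 1/L_i along its partial derivative.

    grad_coord(x, i): partial derivative of the objective w.r.t. coordinate i
    L:                array of coordinate-wise Lipschitz (smoothness) constants
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    d = x.shape[0]
    for _ in range(iters):
        i = rng.integers(d)                     # uniform sampling over coordinates
        x[i] -= grad_coord(x, i) / L[i]         # cheap update: only one coordinate changes
    return x
```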