Open Access Posted Content

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

TLDR
A StochAstic Recursive grAdient algoritHm (SARAH), together with its practical variant SARAH+, is proposed as a novel approach to finite-sum minimization problems, and a linear convergence rate is proven under a strong convexity assumption.
Abstract
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its practical variant SARAH+, as a novel approach to finite-sum minimization problems. Unlike vanilla SGD and other modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple recursive framework for updating stochastic gradient estimates; compared to SAG/SAGA, SARAH does not require storage of past gradients. The linear convergence rate of SARAH is proven under a strong convexity assumption. We also prove a linear convergence rate (in the strongly convex case) for an inner loop of SARAH, a property that SVRG does not possess. Numerical experiments demonstrate the efficiency of our algorithm.
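For concreteness, the recursive update described in the abstract can be sketched in NumPy as below. This is a minimal reading of the recursion, not the authors' code: the names sarah, grad_i and full_grad, the default parameters, the restart rule at the end of each outer loop, and the least-squares usage example are all illustrative assumptions.

```python
import numpy as np

def sarah(grad_i, full_grad, w0, n, eta=0.01, m=100, outer=10, rng=None):
    """Sketch of the SARAH recursion (one possible reading of the paper).

    grad_i(w, i) : stochastic gradient of the i-th component at w
    full_grad(w) : full gradient (1/n) * sum_i grad_i(w, i)
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    for _ in range(outer):
        v = full_grad(w)            # v_0: full gradient at the outer iterate
        w_prev, w = w, w - eta * v  # w_1 = w_0 - eta * v_0
        for _ in range(m):
            i = rng.integers(n)
            # Recursive estimate: only the previous iterate and the previous
            # estimate v are kept, with no table of past gradients.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
        # Here the next outer iteration restarts from the last inner iterate;
        # the paper also allows picking an inner iterate at random.
    return w

# Illustrative usage on a least-squares problem (assumed example):
# f_i(w) = 0.5 * (a_i @ w - b_i)**2
A, b = np.random.randn(200, 5), np.random.randn(200)
gi = lambda w, i: (A[i] @ w - b[i]) * A[i]
gf = lambda w: A.T @ (A @ w - b) / len(b)
w_hat = sarah(gi, gf, np.zeros(5), n=len(b), eta=0.05, m=50)
```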


Citations
Posted Content

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

TL;DR: This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost, and proves that the SPIDER-SFO algorithm nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
Posted Content

An Investigation of Newton-Sketch and Subsampled Newton Methods

TL;DR: This study focuses on practical versions of the two methods, Newton-Sketch (based on randomized Hadamard transformations) and subsampled Newton (based on Hessian subsampling), in which the resulting linear systems of equations are solved approximately, at every iteration, using an iterative solver.
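As a rough illustration of the inexact solves mentioned above, the sketch below applies a handful of conjugate-gradient iterations to a subsampled Hessian system. The helper names (subsampled_newton_step, hess_vec) and the fixed sample and iteration counts are assumptions for the example, not the methods studied in the cited paper.

```python
import numpy as np

def subsampled_newton_step(w, grad, hess_vec, n, sample_size=64,
                           cg_iters=10, rng=None):
    """One illustrative subsampled Newton step with an inexact CG solve.

    grad(w)           : full (or large-batch) gradient at w
    hess_vec(w, S, v) : Hessian-vector product built from the subsample S only
    """
    rng = np.random.default_rng() if rng is None else rng
    S = rng.choice(n, size=sample_size, replace=False)
    g = grad(w)

    # A few conjugate-gradient iterations approximately solve H_S p = -g,
    # i.e. the linear system is solved inexactly with an iterative solver.
    p = np.zeros_like(g)
    r = -g.copy()                      # residual for the initial guess p = 0
    d = r.copy()
    for _ in range(cg_iters):
        Hd = hess_vec(w, S, d)
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r - alpha * Hd
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return w + p                       # a line search would usually scale this step
```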
Journal Article

Stochastic, distributed and federated optimization for machine learning

Jakub Konečný, 30 Nov 2017
TL;DR: This work proposes novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives in the distributed setting, and introduces the concept of Federated Optimization/Learning, whose main motivation comes from industrial settings that handle user-generated data.
Proceedings ArticleDOI

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

TL;DR: This paper revisits and improves the convergence of policy gradient (PG) and natural PG (NPG) methods, as well as their variance-reduced variants, under general smooth policy parametrizations, and proposes SRVR-NPG, which incorporates variance reduction into the NPG update.
Posted Content

Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample.

TL;DR: Numerical tests on a toy classification problem as well as on popular benchmark neural network training tasks reveal that the sampled quasi-Newton methods outperform their classical variants.
References
Proceedings Article

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TL;DR: It is proved that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but its analysis is significantly simpler and more intuitive.
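For contrast with the recursive SARAH estimate sketched earlier, a minimal reading of the SVRG update looks as follows; the signature mirrors the earlier sarah sketch and is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, eta=0.01, m=100, outer=10, rng=None):
    """Minimal sketch of the SVRG update, for contrast with SARAH.

    Each inner step corrects the stochastic gradient with a fixed snapshot
    gradient mu = full_grad(w_snap), instead of recursing on the previous
    estimate as SARAH does.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    for _ in range(outer):
        w_snap, mu = w.copy(), full_grad(w)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_snap, i) + mu
            w = w - eta * v
        # The SVRG paper offers several snapshot choices; here the next
        # snapshot is simply the last inner iterate.
    return w
```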
Journal ArticleDOI

Pegasos: primal estimated sub-gradient solver for SVM

TL;DR: A simple and effective stochastic sub-gradient descent algorithm is proposed for solving the optimization problem cast by Support Vector Machines; it is particularly well suited for large text classification problems and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
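A minimal sketch of the Pegasos-style primal sub-gradient step under the usual hinge-loss formulation; the function name and default parameters are illustrative assumptions, and the optional projection step from the paper is only noted in a comment.

```python
import numpy as np

def pegasos(X, y, lam=0.1, iters=1000, rng=None):
    """Sketch of the Pegasos primal sub-gradient step for a linear SVM.

    X : (n, d) feature matrix, y : labels in {-1, +1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)            # decreasing step size 1/(lambda * t)
        margin = y[i] * (X[i] @ w)
        w *= (1.0 - eta * lam)           # sub-gradient of the L2 regularizer
        if margin < 1.0:                 # hinge loss is active for example i
            w += eta * y[i] * X[i]
        # The paper optionally projects w onto a ball of radius 1/sqrt(lam).
    return w
```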
Journal Article

Stochastic dual coordinate ascent methods for regularized loss

TL;DR: In this article, a convergence analysis of stochastic dual coordinate ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
Proceedings Article

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

TL;DR: In this paper, a new stochastic gradient method is proposed for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex; the method keeps a memory of previous gradient values in order to achieve a linear convergence rate.
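The "memory of previous gradient values" can be sketched as a stored gradient table that is refreshed one entry per step, which is exactly the storage cost that SARAH's recursion avoids. The names sag and grad_i below, and the default parameters, are assumptions for the example.

```python
import numpy as np

def sag(grad_i, w0, n, alpha=0.01, iters=1000, rng=None):
    """Sketch of the SAG update with a stored table of per-component gradients."""
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    table = np.zeros((n, w.size))    # memory of previous gradient values
    total = np.zeros_like(w)         # running sum of the table rows
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(w, i)
        total += g - table[i]        # replace the old gradient of component i
        table[i] = g
        w = w - (alpha / n) * total  # step along the average stored gradient
    return w
```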
Journal ArticleDOI

A proximal stochastic gradient method with progressive variance reduction

TL;DR: This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.