Open Access Posted Content

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

TLDR
A StochAstic Recursive grAdient algoritHm (SARAH), together with its practical variant SARAH+, is proposed as a novel approach to finite-sum minimization problems, and a linear convergence rate is proven under a strong convexity assumption.
Abstract
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its practical variant SARAH+, as a novel approach to finite-sum minimization problems. Unlike vanilla SGD and other modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple recursive framework for updating stochastic gradient estimates; compared to SAG/SAGA, SARAH does not require storage of past gradients. The linear convergence rate of SARAH is proven under a strong convexity assumption. We also prove a linear convergence rate (in the strongly convex case) for an inner loop of SARAH, a property that SVRG does not possess. Numerical experiments demonstrate the efficiency of our algorithm.
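For concreteness, the recursive update described in the abstract can be sketched in NumPy as below. This is a minimal reading of the recursion, not the authors' code: the names sarah, grad_i and full_grad, the default parameters, the restart rule at the end of each outer loop, and the least-squares usage example are all illustrative assumptions.

```python
import numpy as np

def sarah(grad_i, full_grad, w0, n, eta=0.01, m=100, outer=10, rng=None):
    """Sketch of the SARAH recursion (one possible reading of the paper).

    grad_i(w, i) : stochastic gradient of the i-th component at w
    full_grad(w) : full gradient (1/n) * sum_i grad_i(w, i)
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    for _ in range(outer):
        v = full_grad(w)            # v_0: full gradient at the outer iterate
        w_prev, w = w, w - eta * v  # w_1 = w_0 - eta * v_0
        for _ in range(m):
            i = rng.integers(n)
            # Recursive estimate: only the previous iterate and the previous
            # estimate v are kept, with no table of past gradients.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
        # Here the next outer iteration restarts from the last inner iterate;
        # the paper also allows picking an inner iterate at random.
    return w

# Illustrative usage on a least-squares problem (assumed example):
# f_i(w) = 0.5 * (a_i @ w - b_i)**2
A, b = np.random.randn(200, 5), np.random.randn(200)
gi = lambda w, i: (A[i] @ w - b[i]) * A[i]
gf = lambda w: A.T @ (A @ w - b) / len(b)
w_hat = sarah(gi, gf, np.zeros(5), n=len(b), eta=0.05, m=50)
```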


Citations
Posted Content

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

TL;DR: This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost, and proves that the SPIDER-SFO algorithm nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
Posted Content

An Investigation of Newton-Sketch and Subsampled Newton Methods

TL;DR: This study focuses on practical versions of the two methods, Newton-Sketch (based on randomized Hadamard transformations) and subsampled Newton (based on Hessian subsampling), in which the resulting linear systems of equations are solved approximately, at every iteration, using an iterative solver.
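As a rough illustration of the inexact solves mentioned above, the sketch below applies a handful of conjugate-gradient iterations to a subsampled Hessian system. The helper names (subsampled_newton_step, hess_vec) and the fixed sample and iteration counts are assumptions for the example, not the methods studied in the cited paper.

```python
import numpy as np

def subsampled_newton_step(w, grad, hess_vec, n, sample_size=64,
                           cg_iters=10, rng=None):
    """One illustrative subsampled Newton step with an inexact CG solve.

    grad(w)           : full (or large-batch) gradient at w
    hess_vec(w, S, v) : Hessian-vector product built from the subsample S only
    """
    rng = np.random.default_rng() if rng is None else rng
    S = rng.choice(n, size=sample_size, replace=False)
    g = grad(w)

    # A few conjugate-gradient iterations approximately solve H_S p = -g,
    # i.e. the linear system is solved inexactly with an iterative solver.
    p = np.zeros_like(g)
    r = -g.copy()                      # residual for the initial guess p = 0
    d = r.copy()
    for _ in range(cg_iters):
        Hd = hess_vec(w, S, d)
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r - alpha * Hd
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return w + p                       # a line search would usually scale this step
```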
Journal Article

Stochastic, distributed and federated optimization for machine learning

Jakub Konečný, 30 Nov 2017
TL;DR: This work proposes novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives in the distributed setting, and introduces the concept of Federated Optimization/Learning, whose main motivation comes from industrial settings that handle user-generated data.
Proceedings ArticleDOI

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

TL;DR: This paper revisits and improves the convergence of policy gradient (PG) and natural PG (NPG) methods, as well as their variance-reduced variants, under general smooth policy parametrizations, and proposes SRVR-NPG, which incorporates variance reduction into the NPG update.
Posted Content

Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample.

TL;DR: Numerical tests on a toy classification problem as well as on popular benchmark neural network training tasks reveal that the sampled quasi-Newton methods outperform their classical variants.
References
Proceedings Article

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TL;DR: It is proved that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but its analysis is significantly simpler and more intuitive.
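For contrast with the recursive SARAH estimate sketched earlier, a minimal reading of the SVRG update looks as follows; the signature mirrors the earlier sarah sketch and is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, eta=0.01, m=100, outer=10, rng=None):
    """Minimal sketch of the SVRG update, for contrast with SARAH.

    Each inner step corrects the stochastic gradient with a fixed snapshot
    gradient mu = full_grad(w_snap), instead of recursing on the previous
    estimate as SARAH does.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    for _ in range(outer):
        w_snap, mu = w.copy(), full_grad(w)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_snap, i) + mu
            w = w - eta * v
        # The SVRG paper offers several snapshot choices; here the next
        # snapshot is simply the last inner iterate.
    return w
```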
Journal ArticleDOI

Pegasos: primal estimated sub-gradient solver for SVM

TL;DR: A simple and effective stochastic sub-gradient descent algorithm is proposed for solving the optimization problem cast by Support Vector Machines; it is particularly well suited for large text classification problems and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
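A minimal sketch of the Pegasos-style primal sub-gradient step under the usual hinge-loss formulation; the function name and default parameters are illustrative assumptions, and the optional projection step from the paper is only noted in a comment.

```python
import numpy as np

def pegasos(X, y, lam=0.1, iters=1000, rng=None):
    """Sketch of the Pegasos primal sub-gradient step for a linear SVM.

    X : (n, d) feature matrix, y : labels in {-1, +1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)            # decreasing step size 1/(lambda * t)
        margin = y[i] * (X[i] @ w)
        w *= (1.0 - eta * lam)           # sub-gradient of the L2 regularizer
        if margin < 1.0:                 # hinge loss is active for example i
            w += eta * y[i] * X[i]
        # The paper optionally projects w onto a ball of radius 1/sqrt(lam).
    return w
```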
Journal Article

Stochastic dual coordinate ascent methods for regularized loss

TL;DR: In this article, a convergence analysis of stochastic dual coordinate ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
Proceedings Article

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

TL;DR: In this paper, a new stochastic gradient method is proposed for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex; the method keeps a memory of previous gradient values in order to achieve a linear convergence rate.
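The "memory of previous gradient values" can be sketched as a stored gradient table that is refreshed one entry per step, which is exactly the storage cost that SARAH's recursion avoids. The names sag and grad_i below, and the default parameters, are assumptions for the example.

```python
import numpy as np

def sag(grad_i, w0, n, alpha=0.01, iters=1000, rng=None):
    """Sketch of the SAG update with a stored table of per-component gradients."""
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    table = np.zeros((n, w.size))    # memory of previous gradient values
    total = np.zeros_like(w)         # running sum of the table rows
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(w, i)
        total += g - table[i]        # replace the old gradient of component i
        table[i] = g
        w = w - (alpha / n) * total  # step along the average stored gradient
    return w
```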
Journal ArticleDOI

A proximal stochastic gradient method with progressive variance reduction

TL;DR: This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.