Open Access Proceedings Article

A simple proximal stochastic gradient method for nonsmooth nonconvex optimization

Zhize Li, +1 more
Vol. 31, pp. 5569–5579
TLDR
ProxSVRG+, as discussed by the authors, is a proximal stochastic gradient algorithm based on variance reduction that can automatically switch to faster linear convergence in regions where the objective function satisfies the Polyak-Łojasiewicz condition locally.
Abstract
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component and a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+: it recovers several existing convergence results and improves/generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem posed in [Reddi et al., 2016], and it uses far fewer proximal oracle calls than ProxSVRG [Reddi et al., 2016]. Furthermore, for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that ProxSVRG+ achieves a global linear convergence rate without restart, unlike ProxSVRG. Thus, it can automatically switch to the faster linear convergence in regions where the objective function satisfies the PL condition locally. Finally, we conduct several experiments, and the experimental results are consistent with the theoretical results.
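To make the update concrete, here is a minimal Python sketch of a ProxSVRG+-style epoch under illustrative assumptions not fixed by the abstract: the nonsmooth term is taken to be an L1 regulariser lam * ||x||_1, and grad_fi(x, idx) is a hypothetical helper returning the average gradient of the sampled smooth components. This is a sketch of the outer/inner-loop structure, not the authors' reference implementation.

    import numpy as np

    def prox_l1(v, step, lam):
        # proximal operator of h(x) = lam * ||x||_1 (soft-thresholding)
        return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

    def prox_svrg_plus(x0, grad_fi, n, eta, lam, S, m, B, b, seed=0):
        # S outer epochs; m inner steps per epoch; anchor batch size B; inner minibatch size b
        rng = np.random.default_rng(seed)
        x = x0.copy()
        for _ in range(S):
            x_tilde = x.copy()
            anchor = rng.choice(n, size=B, replace=False)
            g_tilde = grad_fi(x_tilde, anchor)          # large-batch anchor gradient
            for _ in range(m):
                idx = rng.choice(n, size=b, replace=False)
                # variance-reduced gradient estimator
                v = grad_fi(x, idx) - grad_fi(x_tilde, idx) + g_tilde
                x = prox_l1(x - eta * v, eta, lam)      # proximal gradient step
        return x

The point of this structure is that the large-batch anchor gradient g_tilde is reused across the m inner steps, so each inner step needs only a small minibatch plus one proximal-oracle call.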



Citations
Posted Content

SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization.

TL;DR: SpiderBoost is proposed as an improved scheme that allows a much larger stepsize without sacrificing the convergence rate, and hence runs substantially faster in practice; it also extends much more easily to proximal algorithms with guaranteed convergence for solving composite optimization problems.
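For contrast with the SVRG-style anchor in the sketch above, here is a minimal sketch of the recursive (SPIDER-type) estimator that SpiderBoost builds on; grad_fi(x, idx), the minibatch size b, and the refresh period q are again hypothetical placeholders.

    import numpy as np

    def spider_boost(x0, grad_fi, n, eta, T, q, b, seed=0):
        # T total steps; full-gradient refresh every q steps; constant stepsize eta
        rng = np.random.default_rng(seed)
        x_prev, x, v = x0.copy(), x0.copy(), None
        for t in range(T):
            if t % q == 0:
                v = grad_fi(x, np.arange(n))            # periodic full-gradient refresh
            else:
                idx = rng.choice(n, size=b, replace=False)
                # recursive correction of the previous estimate instead of re-anchoring
                v = grad_fi(x, idx) - grad_fi(x_prev, idx) + v
            x_prev, x = x, x - eta * v                  # gradient step with the larger stepsize
        return x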
Posted Content

ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization

TL;DR: A new stochastic first-order algorithmic framework is proposed for solving stochastic composite nonconvex optimization problems, covering both finite-sum and expectation settings, together with new constant and adaptive step-sizes that help achieve the desired complexity bounds while improving practical performance.
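A minimal sketch of the proximal-plus-averaging step characteristic of ProxSARAH-style updates, assuming v is a recursive (SARAH-type) gradient estimate as in the previous sketch and taking h(x) = lam * ||x||_1 purely for illustration; gamma is the averaging weight and eta the stepsize, both hypothetical names.

    import numpy as np

    def prox_sarah_step(x, v, eta, gamma, lam):
        # proximal gradient trial point for h(x) = lam * ||x||_1 (soft-thresholding)
        y = np.sign(x - eta * v) * np.maximum(np.abs(x - eta * v) - eta * lam, 0.0)
        # averaged update: a convex combination of the current iterate and the trial point
        return (1.0 - gamma) * x + gamma * y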
Posted Content

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

TL;DR: The results demonstrate that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating the theoretical results and confirming the practical superiority of PAGE.
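A minimal sketch of the probabilistic gradient estimator that gives PAGE its name, with grad_fi, the batch sizes B and b, and the switching probability p as hypothetical placeholders.

    import numpy as np

    def page_estimator(g_prev, x_new, x_old, grad_fi, n, p, B, b, rng):
        # with probability p: recompute a (large-)batch gradient at the new point
        if rng.random() < p:
            idx = rng.choice(n, size=B, replace=False)
            return grad_fi(x_new, idx)
        # otherwise: reuse the previous estimate with a cheap minibatch correction
        idx = rng.choice(n, size=b, replace=False)
        return g_prev + grad_fi(x_new, idx) - grad_fi(x_old, idx)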
Posted Content

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

TL;DR: In this paper, the authors revisited the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. and showed that it can find an approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories.
Posted Content

Stochastic AUC Maximization with Deep Neural Networks.

TL;DR: The stochastic AUC maximization problem with a deep neural network as the predictive model is considered, and the Polyak-Łojasiewicz (PL) condition is explored, which enables the development of new stochastic algorithms with an even faster convergence rate and a more practical step-size scheme.
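For reference, the PL condition invoked here and in the abstract above is, in its standard form for a smooth function f with minimum value f^*,

    f(x) - f^* \le \frac{1}{2\mu}\,\|\nabla f(x)\|^2 \qquad \text{for all } x \text{ and some } \mu > 0.

It holds for strongly convex functions but also for some nonconvex ones, which is what allows gradient-type methods to attain linear convergence in regions where the condition is satisfied.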
References
Book

Introductory Lectures on Convex Optimization: A Basic Course

TL;DR: A polynomial-time interior-point method for linear optimization was proposed, notable not only for its complexity bound but also because the theoretical prediction of its high efficiency was supported by excellent computational results.
Proceedings Article

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

TL;DR: It is proved that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG), but its analysis is significantly simpler and more intuitive.
Proceedings Article

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

TL;DR: SAGA as discussed by the authors improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser.
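A minimal sketch of a SAGA-style step with a proximal update, where a table of stored component gradients replaces the periodic anchor used by SVRG-type methods; grad_fi(x, i) (a hypothetical single-component gradient helper), the regulariser lam * ||x||_1, and the bookkeeping layout are illustrative assumptions.

    import numpy as np

    def saga_step(x, table, table_avg, grad_fi, n, eta, lam, rng):
        # table[i] holds the last gradient computed for component i; table_avg is their mean
        j = rng.integers(n)
        g_new = grad_fi(x, j)
        v = g_new - table[j] + table_avg                # unbiased SAGA estimator
        table_avg = table_avg + (g_new - table[j]) / n  # keep the running mean consistent
        table[j] = g_new
        # proximal step for h(x) = lam * ||x||_1 (soft-thresholding)
        x = np.sign(x - eta * v) * np.maximum(np.abs(x - eta * v) - eta * lam, 0.0)
        return x, table, table_avg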
Posted Content

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

TL;DR: In this paper, the authors investigate the cause of the generalization drop in the large batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minima of the training and testing functions.
Journal ArticleDOI

A proximal stochastic gradient method with progressive variance reduction

TL;DR: This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.