Open Access · Posted Content

Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey.

TLDR
A unified algorithmic framework is introduced for incremental methods for minimizing a sum $\sum_{i=1}^{m} f_i(x)$ consisting of a large number of convex component functions $f_i$, with an analysis that includes the advantages offered by randomization in the selection of components.
Abstract
We survey incremental methods for minimizing a sum $\sum_{i=1}^{m} f_i(x)$ consisting of a large number of convex component functions $f_i$. Our methods consist of iterations applied to single components, and have proved very effective in practice. We introduce a unified algorithmic framework for a variety of such methods, some involving gradient and subgradient iterations, which are known, and some involving combinations of subgradient and proximal methods, which are new and offer greater flexibility in exploiting the special structure of $f_i$. We provide an analysis of the convergence and rate of convergence properties of these methods, including the advantages offered by randomization in the selection of components. We also survey applications in inference/machine learning, signal processing, and large-scale and distributed optimization.
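As a concrete illustration of the basic iteration the survey covers, here is a minimal sketch of a randomized incremental subgradient method applied to a nonsmooth sum of components. The problem instance, stepsize rule, and iteration count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative problem: minimize sum_i |a_i . x - b_i| by applying a
# subgradient step to a single, randomly chosen component per iteration.
rng = np.random.default_rng(0)
m, n = 200, 5
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n)          # consistent system, so the optimum is 0

def subgrad_component(x, i):
    """A subgradient of f_i(x) = |a_i . x - b_i|."""
    r = A[i] @ x - b[i]
    return np.sign(r) * A[i]

x = np.zeros(n)
for k in range(1, 20001):
    i = rng.integers(m)             # randomized component selection
    alpha = 1.0 / np.sqrt(k)        # diminishing stepsize
    x = x - alpha * subgrad_component(x, i)

print("mean residual:", np.abs(A @ x - b).mean())
```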



Citations
Posted Content

Practical recommendations for gradient-based training of deep architectures

TL;DR: This chapter describes elements of the practice used to successfully and efficiently train and debug large-scale, often deep, multi-layer neural networks, and closes with open questions about the training difficulties observed with deeper architectures.
Book Chapter

Practical recommendations for gradient-based training of deep architectures

TL;DR: The authors present a practical guide with recommendations for some of the most commonly used hyperparameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization.
Journal Article

A proximal stochastic gradient method with progressive variance reduction

TL;DR: This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.
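To illustrate the multistage variance-reduction idea in the spirit of this method (a sketch under assumed data and parameters, not the authors' exact algorithm), the following applies variance-corrected stochastic proximal-gradient steps to an l1-regularized least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 20
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n) + 0.1 * rng.normal(size=m)
lam, eta = 0.01, 0.01               # assumed regularizer weight and stepsize

grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]      # gradient of 0.5*(a_i.x - b_i)^2
full_grad = lambda x: A.T @ (A @ x - b) / m         # average of component gradients
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1

x = np.zeros(n)
for stage in range(30):                      # multistage (outer) loop
    x_snap, mu = x.copy(), full_grad(x)      # snapshot point and its full gradient
    for _ in range(2 * m):                   # inner stochastic loop
        i = rng.integers(m)
        v = grad_i(x, i) - grad_i(x_snap, i) + mu   # variance-reduced estimate
        x = soft(x - eta * v, eta * lam)            # proximal step
print("objective:", 0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum())
```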
Journal Article

Diffusion Adaptation Strategies for Distributed Optimization and Learning Over Networks

TL;DR: An adaptive diffusion mechanism is proposed to optimize global cost functions in a distributed manner over a network of nodes; it endows the network with adaptation abilities that enable individual nodes to continue learning even when the cost function changes with time.
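A minimal sketch of the adapt-then-combine pattern behind diffusion strategies, assuming a ring network and local least-squares costs chosen purely for illustration:

```python
import numpy as np

# Each node takes a gradient step on its own local cost ("adapt"), then
# averages its intermediate estimate with its neighbors ("combine").
rng = np.random.default_rng(2)
N, n = 10, 3                          # nodes, parameter dimension
w_true = rng.normal(size=n)
U = [rng.normal(size=(50, n)) for _ in range(N)]           # local data per node
d = [u @ w_true + 0.05 * rng.normal(size=50) for u in U]   # local measurements

# Ring topology with self-loops; each row of C sums to 1 (combination weights).
C = np.zeros((N, N))
for k in range(N):
    for j in (k - 1, k, k + 1):
        C[k, j % N] = 1 / 3

W = np.zeros((N, n))                  # one estimate per node
mu = 0.01                             # local stepsize
for _ in range(500):
    psi = np.array([W[k] - mu * U[k].T @ (U[k] @ W[k] - d[k]) / 50   # adapt
                    for k in range(N)])
    W = C @ psi                                                      # combine
print("max node error:", np.abs(W - w_true).max())
```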
References
Book

Convex Optimization

TL;DR: A comprehensive introduction to convex optimization, with the focus on recognizing convex optimization problems and then finding the most appropriate technique for solving them, rather than on the theory of the optimization problems themselves.
Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
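As a small worked example of the method the monograph studies, here is a sketch of scaled-dual ADMM applied to the lasso; the problem instance and the penalty parameter rho are illustrative assumptions:

```python
import numpy as np

# ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  subject to  x = z,
# in scaled-dual form: x-update, z-update, then dual update.
rng = np.random.default_rng(3)
m, n = 100, 30
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n)
lam, rho = 0.1, 1.0                   # assumed regularizer and penalty

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
# Factor the x-update system once; the same factor is reused every iteration.
L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))

x = z = u = np.zeros(n)
for _ in range(200):
    rhs = A.T @ b + rho * (z - u)
    x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # x-update (quadratic)
    z = soft(x + u, lam / rho)                          # z-update (l1 prox)
    u = u + x - z                                       # scaled dual update
print("nonzeros in z:", int((np.abs(z) > 1e-6).sum()))
```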
Journal Article

A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems

TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) is presented that preserves the computational simplicity of ISTA but has a global rate of convergence proven to be significantly better, both theoretically and practically.
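A minimal sketch of the FISTA iteration on an l1-regularized least-squares problem, with the classic momentum schedule; the data and iteration budget are assumed for illustration:

```python
import numpy as np

# FISTA: an ISTA proximal-gradient step applied at an extrapolated point y,
# with a momentum schedule t_k that yields the accelerated O(1/k^2) rate.
rng = np.random.default_rng(4)
m, n = 100, 50
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
lam = 0.1
Lc = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = y = np.zeros(n)
t = 1.0
for _ in range(300):
    x_new = soft(y - A.T @ (A @ y - b) / Lc, lam / Lc)  # proximal gradient step
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2            # momentum schedule
    y = x_new + (t - 1) / t_new * (x_new - x)           # extrapolation
    x, t = x_new, t_new
print("objective:", 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum())
```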
Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.

TL;DR: Adaptive subgradient methods are proposed that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing the learner to find needles in haystacks in the form of very predictive but rarely seen features.
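A rough sketch of the diagonal AdaGrad update on a hinge-loss classification problem; the loss, data, and base stepsize are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# Diagonal AdaGrad: per-coordinate stepsizes scaled by accumulated squared
# gradients, so rarely updated coordinates keep a larger effective stepsize.
rng = np.random.default_rng(5)
m, n = 500, 20
A = rng.normal(size=(m, n))
y = np.sign(A @ rng.normal(size=n))   # linearly separable labels

x = np.zeros(n)
G = np.zeros(n)                       # running sum of squared gradients
eta, eps = 0.5, 1e-8
for k in range(5000):
    i = rng.integers(m)
    margin = y[i] * (A[i] @ x)
    g = -y[i] * A[i] if margin < 1 else np.zeros(n)   # hinge-loss subgradient
    G += g * g
    x -= eta * g / (np.sqrt(G) + eps)                 # per-coordinate scaling
print("training accuracy:", np.mean(np.sign(A @ x) == y))
```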