Open Access Posted Content

First-order Methods Almost Always Avoid Saddle Points

TLDR
In this article, it is shown that first-order methods avoid saddle points for almost all initializations, and that neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
Abstract
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
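To make the claim concrete, here is a minimal sketch (ours, not the paper's code) of the phenomenon for gradient descent on f(x, y) = (x^2 - y^2)/2, which has a strict saddle at the origin: only initializations on the stable manifold (the x-axis), a measure-zero set, converge to the saddle, while generic random initializations escape it.

```python
import numpy as np

def grad_f(p):
    # f(x, y) = 0.5 * (x**2 - y**2) has a single critical point,
    # a strict saddle at the origin; the Hessian is diag(1, -1).
    x, y = p
    return np.array([x, -y])

def gradient_descent(p0, step=0.1, iters=100):
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p -= step * grad_f(p)
    return p

# Initialized on the stable manifold (the x-axis), iterates converge
# to the saddle:
print(gradient_descent([1.0, 0.0]))   # ~ [0, 0]

# A generic random initialization escapes: the y-coordinate is multiplied
# by 1.1 each step, so the iterates leave any neighborhood of the saddle.
rng = np.random.default_rng(0)
print(gradient_descent(rng.normal(size=2)))
```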


Citations
Journal Article

A high-bias, low-variance introduction to Machine Learning for physicists

TL;DR: The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning.
Journal Article

Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees, and reviews two contrasting approaches: two-stage algorithms, which pair a tailored initialization step with successive refinement, and global landscape analysis with initialization-free algorithms.
Journal Article

Denoising Prior Driven Deep Neural Network for Image Restoration

TL;DR: This paper proposes a convolutional neural network (CNN)-based denoiser that exploits the multi-scale redundancies of natural images and leverages the prior of the observation model.
Proceedings Article

A Lyapunov-based Approach to Safe Reinforcement Learning

TL;DR: In this paper, the authors propose a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints.
Posted Content

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

TL;DR: This paper shows that mirror descent may fail to converge even in bilinear models with a unique solution, but that this deficiency is mitigated by optimism: by taking an extra-gradient step, optimistic mirror descent (OMD) converges in all coherent problems.
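As a toy illustration of that summary (a sketch under our own setup, not the paper's experiments), consider the bilinear problem min_x max_y xy with unique solution (0, 0): plain gradient descent-ascent spirals outward, while the extra-gradient update contracts toward the solution.

```python
import numpy as np

def v(z):
    # Vector field (df/dx, -df/dy) for the bilinear objective f(x, y) = x * y.
    x, y = z
    return np.array([y, -x])

def gda(z, step=0.1, iters=100):
    # Plain gradient descent-ascent: on this problem the iterates spiral
    # outward, since each step multiplies ||z|| by sqrt(1 + step**2) > 1.
    for _ in range(iters):
        z = z - step * v(z)
    return z

def extragradient(z, step=0.1, iters=100):
    # Extra-gradient: evaluate the field at a look-ahead point, then update
    # from the original point; for small steps this contracts toward (0, 0).
    for _ in range(iters):
        z_half = z - step * v(z)
        z = z - step * v(z_half)
    return z

z0 = np.array([1.0, 1.0])
print(np.linalg.norm(gda(z0)))            # grows: GDA diverges
print(np.linalg.norm(extragradient(z0)))  # shrinks: extra-gradient converges
```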
References
Book

Understanding Machine Learning: From Theory To Algorithms

TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Book

Optimization Algorithms on Matrix Manifolds

TL;DR: Optimization Algorithms on Matrix Manifolds offers techniques with broad applications in linear algebra, signal processing, data mining, computer vision, and statistical analysis and will be of interest to applied mathematicians, engineers, and computer scientists.
Book

Differential Equations and Dynamical Systems

TL;DR: The third edition of this textbook covers linear systems, the local theory of nonlinear systems, and the global theory of nonlinear systems.
Journal Article

Matrix Completion From a Few Entries

TL;DR: OptSpace reconstructs an n^α × n matrix from a uniformly random subset of its entries with probability larger than 1 − 1/n³, generalizing a result of Friedman–Kahn–Szemerédi and Feige–Ofek.
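For intuition, here is a minimal sketch of the spectral step underlying this line of work: zero-fill the unobserved entries, rescale, and truncate the SVD to rank r. This is only the initialization phase under our own toy setup; the full OptSpace algorithm adds trimming and a manifold-optimization refinement.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 200, 2, 0.3                      # size, rank, sampling probability

# Random rank-r ground truth and a uniformly random mask of observed entries.
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
mask = rng.random((n, n)) < p

# Zero-fill unobserved entries and rescale by 1/p so the expectation matches
# M, then project onto the set of rank-r matrices via truncated SVD.
M_obs = np.where(mask, M, 0.0) / p
U, s, Vt = np.linalg.svd(M_obs, full_matrices=False)
M_hat = (U[:, :r] * s[:r]) @ Vt[:r]

print(f"relative error: {np.linalg.norm(M_hat - M) / np.linalg.norm(M):.3f}")
```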
Journal Article

Some NP-complete problems in quadratic and nonlinear programming

TL;DR: A special class of indefinite quadratic programs with simple constraints and integer data is constructed, and it is shown that checking whether a given feasible point is not a local minimum, or whether the objective is unbounded below, is NP-complete on this class.