Open Access Posted Content

First-order Methods Almost Always Avoid Saddle Points

TLDR
In this article, it is shown that first-order methods avoid saddle points for almost all initializations, and that neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
Abstract
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
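To make the claim concrete, here is a minimal sketch (ours, not the paper's code) of the phenomenon for gradient descent on f(x, y) = (x^2 - y^2)/2, which has a strict saddle at the origin: only initializations on the stable manifold (the x-axis), a measure-zero set, converge to the saddle, while generic random initializations escape it.

```python
import numpy as np

def grad_f(p):
    # f(x, y) = 0.5 * (x**2 - y**2) has a single critical point,
    # a strict saddle at the origin; the Hessian is diag(1, -1).
    x, y = p
    return np.array([x, -y])

def gradient_descent(p0, step=0.1, iters=100):
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p -= step * grad_f(p)
    return p

# Initialized on the stable manifold (the x-axis), iterates converge
# to the saddle:
print(gradient_descent([1.0, 0.0]))   # ~ [0, 0]

# A generic random initialization escapes: the y-coordinate is multiplied
# by 1.1 each step, so the iterates leave any neighborhood of the saddle.
rng = np.random.default_rng(0)
print(gradient_descent(rng.normal(size=2)))
```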


Citations
Journal Article

A high-bias, low-variance introduction to Machine Learning for physicists

TL;DR: The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning.
Journal Article

Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees, and reviews two contrasting approaches: two-stage algorithms, which pair a tailored initialization step with successive refinement, and global landscape analysis with initialization-free algorithms.
Journal Article

Denoising Prior Driven Deep Neural Network for Image Restoration

TL;DR: This paper proposes a convolutional neural network (CNN)-based denoiser that exploits the multi-scale redundancies of natural images and leverages the prior of the observation model.
Proceedings Article

A Lyapunov-based Approach to Safe Reinforcement Learning

TL;DR: In this paper, the authors propose a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints.
Posted Content

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

TL;DR: This paper shows that mirror descent may fail to converge even in bilinear models with a unique solution, but that this deficiency is mitigated by optimism: by taking an extra-gradient step, optimistic mirror descent (OMD) converges in all coherent problems.
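As a toy illustration of that summary (a sketch under our own setup, not the paper's experiments), consider the bilinear problem min_x max_y xy with unique solution (0, 0): plain gradient descent-ascent spirals outward, while the extra-gradient update contracts toward the solution.

```python
import numpy as np

def v(z):
    # Vector field (df/dx, -df/dy) for the bilinear objective f(x, y) = x * y.
    x, y = z
    return np.array([y, -x])

def gda(z, step=0.1, iters=100):
    # Plain gradient descent-ascent: on this problem the iterates spiral
    # outward, since each step multiplies ||z|| by sqrt(1 + step**2) > 1.
    for _ in range(iters):
        z = z - step * v(z)
    return z

def extragradient(z, step=0.1, iters=100):
    # Extra-gradient: evaluate the field at a look-ahead point, then update
    # from the original point; for small steps this contracts toward (0, 0).
    for _ in range(iters):
        z_half = z - step * v(z)
        z = z - step * v(z_half)
    return z

z0 = np.array([1.0, 1.0])
print(np.linalg.norm(gda(z0)))            # grows: GDA diverges
print(np.linalg.norm(extragradient(z0)))  # shrinks: extra-gradient converges
```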
References
Book

Understanding Machine Learning: From Theory To Algorithms

TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Book

Optimization Algorithms on Matrix Manifolds

TL;DR: Optimization Algorithms on Matrix Manifolds offers techniques with broad applications in linear algebra, signal processing, data mining, computer vision, and statistical analysis and will be of interest to applied mathematicians, engineers, and computer scientists.
Book

Differential Equations and Dynamical Systems

TL;DR: The third edition of this textbook covers linear systems, the local theory of nonlinear systems, and the global theory of nonlinear systems.
Journal Article

Matrix Completion From a Few Entries

TL;DR: OptSpace reconstructs an n^α × n matrix from a uniformly random subset of its entries with probability larger than 1 − 1/n³, generalizing a result of Friedman–Kahn–Szemerédi and Feige–Ofek.
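For intuition, here is a minimal sketch of the spectral step underlying this line of work: zero-fill the unobserved entries, rescale, and truncate the SVD to rank r. This is only the initialization phase under our own toy setup; the full OptSpace algorithm adds trimming and a manifold-optimization refinement.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 200, 2, 0.3                      # size, rank, sampling probability

# Random rank-r ground truth and a uniformly random mask of observed entries.
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
mask = rng.random((n, n)) < p

# Zero-fill unobserved entries and rescale by 1/p so the expectation matches
# M, then project onto the set of rank-r matrices via truncated SVD.
M_obs = np.where(mask, M, 0.0) / p
U, s, Vt = np.linalg.svd(M_obs, full_matrices=False)
M_hat = (U[:, :r] * s[:r]) @ Vt[:r]

print(f"relative error: {np.linalg.norm(M_hat - M) / np.linalg.norm(M):.3f}")
```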
Journal Article

Some NP-complete problems in quadratic and nonlinear programming

TL;DR: A special class of indefinite quadratic programs with simple constraints and integer data is constructed, and it is shown that checking whether a given feasible point is not a local minimum, or whether the objective is unbounded below, is NP-complete on this class.