Open Access Proceedings Article

On the Optimization Landscape of Tensor Decompositions

Rong Ge, +1 more
Vol. 30, pp. 3653-3663
TLDR
For the random over-complete tensor decomposition problem, this article shows that for any small constant ε > 0, among the set of points with function values a (1+ε)-factor larger than the expectation of the function, all the local maxima are approximate global maxima.
Abstract
Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-the-art results. It becomes increasingly important to understand why these heuristics can work for NP-hard problems on typical data. The landscape of many objective functions in learning has been conjectured to have the geometric property that ``all local optima are (approximately) global optima'', so that they can be solved efficiently by local search algorithms. However, establishing such a property can be very difficult. In this paper, we analyze the optimization landscape of the random over-complete tensor decomposition problem, which has many applications in unsupervised learning, especially in learning latent variable models. In practice, it can be efficiently solved by gradient ascent on a non-convex objective. We show that for any small constant $\epsilon > 0$, among the set of points with function values a $(1+\epsilon)$-factor larger than the expectation of the function, all the local maxima are approximate global maxima. Previously, the best-known result only characterizes the geometry in small neighborhoods around the true components. Our result implies that even with an initialization that is barely better than a random guess, the gradient ascent algorithm is guaranteed to solve this problem. Our main technique uses the Kac-Rice formula and random matrix theory. To the best of our knowledge, this is the first time the Kac-Rice formula has been successfully applied to counting the number of local minima of a highly-structured random polynomial with dependent coefficients.
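The page does not spell out the objective; a common choice in this line of work is to maximize $f(x) = \sum_i \langle a_i, x\rangle^4$ over the unit sphere, where the $a_i$ are the (random, over-complete) tensor components. The sketch below assumes that objective, with unit-norm Gaussian components and illustrative step size and iteration count, and runs the kind of projected gradient ascent from a random start that the abstract argues is guaranteed to succeed; it is not the authors' code.

```python
# Minimal sketch (assumed objective, not the authors' code): projected gradient
# ascent for f(x) = sum_i <a_i, x>^4 over the unit sphere, with random unit-norm
# components a_i. Dimensions, step size, and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d, n = 50, 200                                   # n > d: over-complete regime
A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit-norm components a_1, ..., a_n

def f(x):
    """Objective f(x) = sum_i <a_i, x>^4."""
    return np.sum((A @ x) ** 4)

def grad_f(x):
    """Gradient: 4 * sum_i <a_i, x>^3 * a_i."""
    return 4.0 * (A.T @ ((A @ x) ** 3))

# Start from a random point on the sphere (no better than a random guess).
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

eta = 0.01
for _ in range(2000):
    x = x + eta * grad_f(x)                      # ascent step
    x /= np.linalg.norm(x)                       # project back onto the unit sphere

# A local maximum of f on the sphere should correlate well with some a_i (up to sign).
print("best correlation with a true component:", np.abs(A @ x).max())
```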



Citations
Journal Article

Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees and reviews two contrasting approaches: two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and global landscape analysis and initialization-free algorithms.
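As a hedged illustration of the nonconvex factorization approach this overview surveys (not code from the survey itself), the sketch below runs gradient descent on the factored objective $f(X) = \tfrac{1}{4}\|XX^\top - M\|_F^2$ for symmetric low-rank matrix factorization, starting from a small random initialization in the spirit of the "global landscape / initialization-free" line of work; sizes, rank, and step size are assumptions.

```python
# Hedged sketch: gradient descent on the factored, nonconvex objective
# f(X) = (1/4) * ||X X^T - M||_F^2, started from a small random initialization.
import numpy as np

rng = np.random.default_rng(0)
d, r = 30, 3
X_star = rng.standard_normal((d, r))
M = X_star @ X_star.T                          # ground-truth rank-r matrix

X = 0.1 * rng.standard_normal((d, r))          # initialization-free: small random start
eta = 0.5 / np.linalg.norm(M, 2)               # step size scaled by the spectral norm of M

for _ in range(500):
    X -= eta * (X @ X.T - M) @ X               # gradient of f at X is (X X^T - M) X

print("relative error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))
```

A two-stage variant, as the overview contrasts, would replace the random start with a spectral initialization computed from the data before the same refinement loop.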
Proceedings Article

Finding approximate local minima faster than gradient descent

TL;DR: In this paper, a non-convex second-order optimization algorithm is proposed that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples.
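The snippet below is a hedged toy sketch of the generic idea behind such methods, not the paper's algorithm: alternate ordinary gradient steps with negative-curvature steps, where curvature information comes only from Hessian-vector products (here approximated by finite differences of the gradient). The toy objective, thresholds, and step sizes are illustrative assumptions.

```python
# Hedged toy sketch: escape saddle points by mixing gradient steps with
# negative-curvature steps obtained from Hessian-vector products.
import numpy as np

def grad(x):
    # Toy nonconvex objective f(x) = 0.25 * ||x||^4 - 0.5 * x[0]^2,
    # with a saddle point at the origin and minima at (+1, 0) and (-1, 0).
    return np.dot(x, x) * x - np.array([x[0], 0.0])

def hvp(x, v, eps=1e-5):
    """Hessian-vector product approximated by finite differences of the gradient."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)

def most_negative_direction(x, shift=10.0, iters=100, seed=1):
    """Power iteration on (shift*I - H) to estimate the most negative curvature direction."""
    v = np.random.default_rng(seed).standard_normal(x.size)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = shift * v - hvp(x, v)
        v /= np.linalg.norm(v)
    return v

x = np.array([0.0, 1e-3])                      # start next to the saddle at the origin
eta = 0.05
for _ in range(300):
    g = grad(x)
    if np.linalg.norm(g) > 1e-3:
        x = x - eta * g                        # ordinary gradient step
    else:
        v = most_negative_direction(x)
        if v @ hvp(x, v) < -1e-4:              # negative curvature found: move along it
            u = -v if g @ v > 0 else v         # pick the sign that does not ascend
            x = x + eta * u
        else:
            break                              # approximate second-order local minimum
print("converged to:", x)
```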
Journal Article

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

TL;DR: In this paper, the authors show that gradient descent can achieve near-optimal statistical and computational guarantees without explicit regularization for phase retrieval, low-rank matrix completion, and blind deconvolution.
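For phase retrieval specifically, the claim can be illustrated with a minimal real-valued example (a sketch under assumed dimensions and step size, not the authors' code): spectral initialization followed by vanilla gradient descent on the quartic least-squares loss, with no explicit regularization term.

```python
# Hedged sketch: plain gradient descent for real-valued phase retrieval,
# minimizing f(x) = (1/(4m)) * sum_i ((<a_i, x>)^2 - y_i)^2 with a spectral
# initialization and no explicit regularization. Sizes and step size are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 600
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)
A = rng.standard_normal((m, n))
y = (A @ x_star) ** 2                          # phaseless measurements

# Spectral initialization: leading eigenvector of (1/m) * sum_i y_i a_i a_i^T.
Y = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(Y)
x = eigvecs[:, -1] * np.sqrt(np.mean(y))       # rescale to roughly match ||x_star||

eta = 0.1
for _ in range(1000):
    r = (A @ x) ** 2 - y
    x -= eta * (A.T @ (r * (A @ x))) / m       # gradient of f

dist = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print("distance to truth (up to global sign):", dist)
```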
Journal Article

Image Reconstruction: From Sparsity to Data-Adaptive Methods and Machine Learning

TL;DR: The field of medical image reconstruction has seen roughly four types of methods: analytical methods, such as filtered backprojection (FBP) for X-ray computed tomography (CT) and the inverse Fourier transform for magnetic resonance imaging (MRI), based on simple mathematical models for the imaging systems.
Journal Article

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

TL;DR: By marrying statistical modeling with generic optimization theory, a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument is developed, establishing that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization.