Open Access Proceedings Article
Implicit Regularization in Matrix Factorization
Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
Vol. 30, pp. 6151–6159
TL;DR
In this article, the authors study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$, and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent converges to the minimum nuclear norm solution.
Abstract
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
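The conjectured behavior is easy to probe numerically. Below is a minimal NumPy sketch of the setting in the abstract; the random Gaussian measurement operator, dimensions, and step size are illustrative assumptions, not the authors' experimental setup. It runs gradient descent on a full-dimensional factorization $X = UU^\top$ from a small initialization and compares nuclear norms.

```python
# Gradient descent on a full-dimensional factorization of an underdetermined
# quadratic objective (illustrative sketch; all sizes are assumptions).
import numpy as np

rng = np.random.default_rng(0)
d, m, r = 20, 150, 2                       # matrix size, #measurements, planted rank

# Planted low-rank PSD matrix and symmetric Gaussian measurement matrices A_k.
U_star = rng.normal(size=(d, r))
X_star = U_star @ U_star.T
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('kij,ij->k', A, X_star)

def grad_X(X):
    """Gradient of (1 / 2m) * sum_k (<A_k, X> - y_k)^2 with respect to X."""
    residual = np.einsum('kij,ij->k', A, X) - y
    return np.einsum('k,kij->ij', residual, A) / m

# Gradient descent on U: small step size, initialization close to the origin.
U = 1e-3 * rng.normal(size=(d, d))
eta = 1e-3
for _ in range(20000):
    U -= eta * 2 * grad_X(U @ U.T) @ U

X_gd = U @ U.T
print("nuclear norm of GD solution :", np.linalg.norm(X_gd, ord='nuc'))
print("nuclear norm of planted X*  :", np.linalg.norm(X_star, ord='nuc'))
print("measurement residual        :",
      np.linalg.norm(np.einsum('kij,ij->k', A, X_gd) - y))
```

With these assumed dimensions the system is underdetermined (150 measurements against 210 symmetric degrees of freedom), yet the nuclear norm of the gradient descent iterate should come out close to that of the planted matrix, which is the behavior the conjecture describes.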
Citations
Journal Article
Reconciling modern machine-learning practice and the classical bias-variance trade-off.
TL;DR: This work shows how classical theory and modern practice can be reconciled within a single unified performance curve, proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets.
Posted Content
Reconciling modern machine learning practice and the bias-variance trade-off
TL;DR: This paper reconciles the classical understanding and modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off, showing how increasing model capacity beyond the point of interpolation results in improved performance.
Journal Article
The implicit bias of gradient descent on separable data
TL;DR: In this paper, the authors examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets, and show that the predictor converges to the direction of the max-margin (hard margin SVM) solution.
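The statement is straightforward to check numerically. Below is a small illustrative sketch on synthetic separable data (not taken from the paper); scikit-learn is used only to obtain a reference max-margin direction, and all sizes and constants are assumptions.

```python
# Gradient descent on the unregularized logistic loss over separable data;
# the normalized iterate is compared with a (nearly) hard-margin SVM direction.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_sep = rng.normal(size=d)
y = np.sign(X @ w_sep)                     # linearly separable labels in {-1, +1}

def logistic_grad(w):
    margins = y * (X @ w)
    # d/dw of mean(log(1 + exp(-y x.w))) = -mean(y x / (1 + exp(y x.w)))
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)

w = np.zeros(d)
eta = 1.0
for _ in range(100000):
    w -= eta * logistic_grad(w)

# Reference max-margin direction: a linear SVM with a very large C.
svm = SVC(kernel='linear', C=1e6).fit(X, y)
w_svm = svm.coef_.ravel()

cos = (w @ w_svm) / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print("cosine similarity with the max-margin direction:", cos)
```

The loss keeps decreasing and the norm of $w$ grows without bound, but the cosine similarity with the max-margin direction should approach one, which is the implicit bias the paper characterizes.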
Posted Content
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li, Yingyu Liang
TL;DR: In this article, the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization is studied.
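As a rough, self-contained sketch of that setting (sizes, data distribution, and learning rate are assumptions, and this is not the paper's construction), the following trains a two-layer over-parameterized ReLU network with plain SGD from random initialization on a small clustered multi-class problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, m = 300, 10, 3, 1000              # samples, input dim, classes, hidden width

# "Structured" data: well-separated Gaussian clusters, one per class.
means = 3.0 * rng.normal(size=(k, d))
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, d))

W1 = rng.normal(size=(d, m)) / np.sqrt(d)  # random initialization
W2 = rng.normal(size=(m, k)) / np.sqrt(m)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

eta, batch = 0.05, 32
for step in range(3000):
    idx = rng.integers(0, n, size=batch)
    xb, yb = X[idx], y[idx]
    h = np.maximum(xb @ W1, 0.0)           # ReLU hidden layer
    p = softmax(h @ W2)
    p[np.arange(batch), yb] -= 1.0         # gradient of cross-entropy w.r.t. logits
    gW2 = h.T @ p / batch
    gh = (p @ W2.T) * (h > 0)              # backprop through the ReLU
    gW1 = xb.T @ gh / batch
    W1 -= eta * gW1
    W2 -= eta * gW2

logits = np.maximum(X @ W1, 0.0) @ W2
print("training accuracy:", (logits.argmax(axis=1) == y).mean())
```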
Posted Content
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat, Francis Bach
TL;DR: It is shown that, when initialized correctly and in the many-particle limit, this gradient flow converges to global minimizers despite the non-convexity of the problem; the proof involves Wasserstein gradient flows, a by-product of optimal transport theory.
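For orientation, the many-particle picture can be written schematically as follows (the notation is assumed here rather than quoted from the paper): a width-$m$ two-layer network is identified with the empirical measure of its hidden units, and gradient descent on the units behaves, as $m \to \infty$, like a Wasserstein gradient flow on that measure.

```latex
% Schematic only: sigma is the activation, (a_j, w_j) the output and input
% weights of hidden unit j, and mu_m the empirical measure over units.
f_m(x) \;=\; \frac{1}{m}\sum_{j=1}^{m} a_j\,\sigma(w_j^\top x)
       \;=\; \int a\,\sigma(w^\top x)\,\mathrm{d}\mu_m(a, w),
\qquad
\mu_m \;=\; \frac{1}{m}\sum_{j=1}^{m}\delta_{(a_j,\,w_j)} .
```

The summarized result is that, with suitable initialization, the limiting flow on the measure reaches global minimizers even though the finite-width objective is non-convex in the individual units.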
References
Journal Article
Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization
TL;DR: In this paper, it was shown that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum-rank solution can be recovered by solving a convex optimization problem, namely, the minimization of the nuclear norm over the given affine space.
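The convex program in question is short to state in code. The sketch below uses cvxpy with a random Gaussian measurement operator and a planted low-rank matrix; all sizes and the operator are assumptions made for illustration.

```python
# Minimum nuclear norm subject to affine measurement constraints <A_k, X> = y_k.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, m, r = 15, 170, 2                       # matrix size, #measurements, planted rank

X_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
A = rng.normal(size=(m, n, n))
y = np.einsum('kij,ij->k', A, X_true)

X = cp.Variable((n, n))
constraints = [cp.sum(cp.multiply(A[i], X)) == y[i] for i in range(m)]
problem = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
problem.solve()

print("relative recovery error:",
      np.linalg.norm(X.value - X_true) / np.linalg.norm(X_true))
```

With 170 generic measurements of a 15 x 15 rank-2 matrix (225 unknowns), the nuclear norm minimizer typically coincides with the planted matrix, which is the flavor of guarantee the paper establishes under a restricted isometry property.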
Posted Content
Understanding deep learning requires rethinking generalization
TL;DR: The authors showed that deep neural networks can fit a random labeling of the training data, that this phenomenon is qualitatively unaffected by explicit regularization, and that it occurs even if the true images are replaced by completely unstructured random noise.
Journal Article
Exact matrix completion via convex optimization
TL;DR: In this paper, a convex program is used to find the matrix of minimum nuclear norm that is consistent with the observed entries of a low-rank matrix; it is shown that this recovers all of the missing entries exactly from most sufficiently large sets of sampled entries.
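A compact way to see the program is the sketch below (synthetic rank-2 matrix, random observation mask, cvxpy; all of these are assumptions made for illustration rather than the paper's setup).

```python
# Matrix completion: minimize the nuclear norm subject to agreeing with the
# observed entries of a planted low-rank matrix.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, r = 20, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # planted rank-2 matrix
mask = (rng.random((n, n)) < 0.6).astype(float)         # ~60% of entries observed

X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == mask * M])
problem.solve()

unobserved = mask == 0
print("relative error on unobserved entries:",
      np.linalg.norm((X.value - M)[unobserved]) / np.linalg.norm(M[unobserved]))
```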
Proceedings Article
Multi-Task Feature Learning
TL;DR: The method builds upon the well-known 1-norm regularization problem, using a new regularizer that controls the number of learned features common to all the tasks, and an iterative algorithm is developed for solving it.
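To make the regularizer concrete, here is a hedged sketch of multi-task least squares with an l2,1-norm penalty over feature rows, solved by proximal gradient descent; this is a generic group-sparsity solver standing in for the idea, not necessarily the paper's own iterative algorithm, and all sizes and constants are assumptions.

```python
# Multi-task least squares with an l2,1 penalty: rows of W are features,
# columns are tasks, and the penalty zeroes out whole feature rows jointly.
import numpy as np

rng = np.random.default_rng(5)
n, d, T = 100, 30, 4                       # samples per task, features, tasks

# All tasks share the same 5 relevant features.
W_true = np.zeros((d, T))
W_true[:5] = rng.normal(size=(5, T))
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [Xs[t] @ W_true[:, t] + 0.1 * rng.normal(size=n) for t in range(T)]

lam, eta = 5.0, 1e-3
W = np.zeros((d, T))
for _ in range(2000):
    # Gradient step on the summed squared losses.
    G = np.column_stack([Xs[t].T @ (Xs[t] @ W[:, t] - ys[t]) for t in range(T)])
    W = W - eta * G
    # Proximal step for the l2,1 norm: shrink each feature row jointly.
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    shrink = np.maximum(1.0 - eta * lam / np.maximum(row_norms, 1e-12), 0.0)
    W = shrink * W

selected = np.where(np.linalg.norm(W, axis=1) > 1e-6)[0]
print("features selected as common across tasks:", selected)
```

The joint row-wise shrinkage is what makes the penalty control the number of features shared by all tasks: a feature is either kept for every task or discarded for every task.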