Open Access Proceedings Article
Implicit Regularization in Matrix Factorization
Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
Vol. 30, pp. 6151–6159
TL;DR
In this article, the authors study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$, and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent converges to the minimum nuclear norm solution.
Abstract
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
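The conjectured behavior is easy to probe numerically. Below is a minimal NumPy sketch of the setting in the abstract; the random Gaussian measurement operator, dimensions, and step size are illustrative assumptions, not the authors' experimental setup. It runs gradient descent on a full-dimensional factorization $X = UU^\top$ from a small initialization and compares nuclear norms.

```python
# Gradient descent on a full-dimensional factorization of an underdetermined
# quadratic objective (illustrative sketch; all sizes are assumptions).
import numpy as np

rng = np.random.default_rng(0)
d, m, r = 20, 150, 2                       # matrix size, #measurements, planted rank

# Planted low-rank PSD matrix and symmetric Gaussian measurement matrices A_k.
U_star = rng.normal(size=(d, r))
X_star = U_star @ U_star.T
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('kij,ij->k', A, X_star)

def grad_X(X):
    """Gradient of (1 / 2m) * sum_k (<A_k, X> - y_k)^2 with respect to X."""
    residual = np.einsum('kij,ij->k', A, X) - y
    return np.einsum('k,kij->ij', residual, A) / m

# Gradient descent on U: small step size, initialization close to the origin.
U = 1e-3 * rng.normal(size=(d, d))
eta = 1e-3
for _ in range(20000):
    U -= eta * 2 * grad_X(U @ U.T) @ U

X_gd = U @ U.T
print("nuclear norm of GD solution :", np.linalg.norm(X_gd, ord='nuc'))
print("nuclear norm of planted X*  :", np.linalg.norm(X_star, ord='nuc'))
print("measurement residual        :",
      np.linalg.norm(np.einsum('kij,ij->k', A, X_gd) - y))
```

With these assumed dimensions the system is underdetermined (150 measurements against 210 symmetric degrees of freedom), yet the nuclear norm of the gradient descent iterate should come out close to that of the planted matrix, which is the behavior the conjecture describes.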
Citations
Journal Article
Reconciling modern machine-learning practice and the classical bias-variance trade-off.
TL;DR: This work shows how classical theory and modern practice can be reconciled within a single unified performance curve, proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets.
Posted Content
Reconciling modern machine learning practice and the bias-variance trade-off
TL;DR: This paper reconciles the classical understanding and modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off, showing how increasing model capacity beyond the point of interpolation results in improved performance.
Journal Article
The implicit bias of gradient descent on separable data
TL;DR: In this paper, the authors examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets, and show that the predictor converges to the direction of the max-margin (hard margin SVM) solution.
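The statement is straightforward to check numerically. Below is a small illustrative sketch on synthetic separable data (not taken from the paper); scikit-learn is used only to obtain a reference max-margin direction, and all sizes and constants are assumptions.

```python
# Gradient descent on the unregularized logistic loss over separable data;
# the normalized iterate is compared with a (nearly) hard-margin SVM direction.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_sep = rng.normal(size=d)
y = np.sign(X @ w_sep)                     # linearly separable labels in {-1, +1}

def logistic_grad(w):
    margins = y * (X @ w)
    # d/dw of mean(log(1 + exp(-y x.w))) = -mean(y x / (1 + exp(y x.w)))
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)

w = np.zeros(d)
eta = 1.0
for _ in range(100000):
    w -= eta * logistic_grad(w)

# Reference max-margin direction: a linear SVM with a very large C.
svm = SVC(kernel='linear', C=1e6).fit(X, y)
w_svm = svm.coef_.ravel()

cos = (w @ w_svm) / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print("cosine similarity with the max-margin direction:", cos)
```

The loss keeps decreasing and the norm of $w$ grows without bound, but the cosine similarity with the max-margin direction should approach one, which is the implicit bias the paper characterizes.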
Posted Content
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li, Yingyu Liang
TL;DR: In this article, the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization is studied.
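As a rough, self-contained sketch of that setting (sizes, data distribution, and learning rate are assumptions, and this is not the paper's construction), the following trains a two-layer over-parameterized ReLU network with plain SGD from random initialization on a small clustered multi-class problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, m = 300, 10, 3, 1000              # samples, input dim, classes, hidden width

# "Structured" data: well-separated Gaussian clusters, one per class.
means = 3.0 * rng.normal(size=(k, d))
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, d))

W1 = rng.normal(size=(d, m)) / np.sqrt(d)  # random initialization
W2 = rng.normal(size=(m, k)) / np.sqrt(m)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

eta, batch = 0.05, 32
for step in range(3000):
    idx = rng.integers(0, n, size=batch)
    xb, yb = X[idx], y[idx]
    h = np.maximum(xb @ W1, 0.0)           # ReLU hidden layer
    p = softmax(h @ W2)
    p[np.arange(batch), yb] -= 1.0         # gradient of cross-entropy w.r.t. logits
    gW2 = h.T @ p / batch
    gh = (p @ W2.T) * (h > 0)              # backprop through the ReLU
    gW1 = xb.T @ gh / batch
    W1 -= eta * gW1
    W2 -= eta * gW2

logits = np.maximum(X @ W1, 0.0) @ W2
print("training accuracy:", (logits.argmax(axis=1) == y).mean())
```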
Posted Content
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat, Francis Bach
TL;DR: It is shown that, when initialized correctly and in the many-particle limit, this gradient flow converges to global minimizers despite the non-convexity of the problem; the proof involves Wasserstein gradient flows, a by-product of optimal transport theory.
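For orientation, the many-particle picture can be written schematically as follows (the notation is assumed here rather than quoted from the paper): a width-$m$ two-layer network is identified with the empirical measure of its hidden units, and gradient descent on the units behaves, as $m \to \infty$, like a Wasserstein gradient flow on that measure.

```latex
% Schematic only: sigma is the activation, (a_j, w_j) the output and input
% weights of hidden unit j, and mu_m the empirical measure over units.
f_m(x) \;=\; \frac{1}{m}\sum_{j=1}^{m} a_j\,\sigma(w_j^\top x)
       \;=\; \int a\,\sigma(w^\top x)\,\mathrm{d}\mu_m(a, w),
\qquad
\mu_m \;=\; \frac{1}{m}\sum_{j=1}^{m}\delta_{(a_j,\,w_j)} .
```

The summarized result is that, with suitable initialization, the limiting flow on the measure reaches global minimizers even though the finite-width objective is non-convex in the individual units.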
References
Journal Article
Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization
TL;DR: In this paper, it was shown that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum-rank solution can be recovered by solving a convex optimization problem, namely, the minimization of the nuclear norm over the given affine space.
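The convex program in question is short to state in code. The sketch below uses cvxpy with a random Gaussian measurement operator and a planted low-rank matrix; all sizes and the operator are assumptions made for illustration.

```python
# Minimum nuclear norm subject to affine measurement constraints <A_k, X> = y_k.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, m, r = 15, 170, 2                       # matrix size, #measurements, planted rank

X_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
A = rng.normal(size=(m, n, n))
y = np.einsum('kij,ij->k', A, X_true)

X = cp.Variable((n, n))
constraints = [cp.sum(cp.multiply(A[i], X)) == y[i] for i in range(m)]
problem = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
problem.solve()

print("relative recovery error:",
      np.linalg.norm(X.value - X_true) / np.linalg.norm(X_true))
```

With 170 generic measurements of a 15 x 15 rank-2 matrix (225 unknowns), the nuclear norm minimizer typically coincides with the planted matrix, which is the flavor of guarantee the paper establishes under a restricted isometry property.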
Posted Content
Understanding deep learning requires rethinking generalization
TL;DR: The authors showed that deep neural networks can fit a random labeling of the training data, that this phenomenon is qualitatively unaffected by explicit regularization, and that it occurs even if the true images are replaced by completely unstructured random noise.
Journal Article
Exact matrix completion via convex optimization
TL;DR: In this paper, a convex program is used to find the matrix of minimum nuclear norm that is consistent with the observed entries of a low-rank matrix; it is shown that this recovers all of the missing entries exactly from most sufficiently large sets of sampled entries.
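A compact way to see the program is the sketch below (synthetic rank-2 matrix, random observation mask, cvxpy; all of these are assumptions made for illustration rather than the paper's setup).

```python
# Matrix completion: minimize the nuclear norm subject to agreeing with the
# observed entries of a planted low-rank matrix.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, r = 20, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # planted rank-2 matrix
mask = (rng.random((n, n)) < 0.6).astype(float)         # ~60% of entries observed

X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == mask * M])
problem.solve()

unobserved = mask == 0
print("relative error on unobserved entries:",
      np.linalg.norm((X.value - M)[unobserved]) / np.linalg.norm(M[unobserved]))
```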
Proceedings Article
Multi-Task Feature Learning
TL;DR: The method builds upon the well-known 1-norm regularization problem, using a new regularizer that controls the number of learned features common to all the tasks, and an iterative algorithm is developed for solving it.
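To make the regularizer concrete, here is a hedged sketch of multi-task least squares with an l2,1-norm penalty over feature rows, solved by proximal gradient descent; this is a generic group-sparsity solver standing in for the idea, not necessarily the paper's own iterative algorithm, and all sizes and constants are assumptions.

```python
# Multi-task least squares with an l2,1 penalty: rows of W are features,
# columns are tasks, and the penalty zeroes out whole feature rows jointly.
import numpy as np

rng = np.random.default_rng(5)
n, d, T = 100, 30, 4                       # samples per task, features, tasks

# All tasks share the same 5 relevant features.
W_true = np.zeros((d, T))
W_true[:5] = rng.normal(size=(5, T))
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [Xs[t] @ W_true[:, t] + 0.1 * rng.normal(size=n) for t in range(T)]

lam, eta = 5.0, 1e-3
W = np.zeros((d, T))
for _ in range(2000):
    # Gradient step on the summed squared losses.
    G = np.column_stack([Xs[t].T @ (Xs[t] @ W[:, t] - ys[t]) for t in range(T)])
    W = W - eta * G
    # Proximal step for the l2,1 norm: shrink each feature row jointly.
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    shrink = np.maximum(1.0 - eta * lam / np.maximum(row_norms, 1e-12), 0.0)
    W = shrink * W

selected = np.where(np.linalg.norm(W, axis=1) > 1e-6)[0]
print("features selected as common across tasks:", selected)
```

The joint row-wise shrinkage is what makes the penalty control the number of features shared by all tasks: a feature is either kept for every task or discarded for every task.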