Open Access Proceedings Article

Implicit Regularization in Matrix Factorization

TL;DR
In this article, the authors studied implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$, and provided empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent converges to the minimum nuclear norm solution.
Abstract
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
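As a rough, hypothetical illustration of this setup (not the authors' code or experimental settings), the NumPy sketch below runs gradient descent on a full-dimensional factorization X = U U^T for a small random matrix sensing problem, starting close to the origin with a small step size, and then compares the nuclear norm of the result with that of the planted low-rank matrix; all dimensions, step sizes, and iteration counts are illustrative guesses.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: a rank-1 PSD matrix X_star (n x n) observed through
# m < n(n+1)/2 random symmetric linear measurements y_k = <A_k, X_star>.
n, m, r_true = 10, 30, 1
u_star = rng.standard_normal((n, r_true))
X_star = u_star @ u_star.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2                  # symmetrize each A_k
y = np.einsum('kij,ij->k', A, X_star)

# Gradient descent on the full-dimensional factorization X = U U^T,
# initialized close to the origin with a small step size.
U = 1e-3 * rng.standard_normal((n, n))
step = 1e-3
for _ in range(200_000):
    resid = np.einsum('kij,ij->k', A, U @ U.T) - y  # A(U U^T) - y
    grad_X = np.einsum('k,kij->ij', resid, A) / m   # gradient w.r.t. X
    U -= step * 2 * grad_X @ U                      # chain rule through X = U U^T

X_gd = U @ U.T
nuc = lambda M_: np.linalg.norm(M_, 'nuc')
print('nuclear norm of GD solution:', nuc(X_gd))    # typically close to nuc(X_star)
print('nuclear norm of X_star     :', nuc(X_star))
print('measurement residual       :',
      np.linalg.norm(np.einsum('kij,ij->k', A, X_gd) - y))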


Citations
Journal ArticleDOI

Reconciling modern machine-learning practice and the classical bias-variance trade-off.

TL;DR: This work shows how classical theory and modern practice can be reconciled within a single unified performance curve, proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets.
Posted Content

Reconciling modern machine learning practice and the bias-variance trade-off

TL;DR: This paper reconciles the classical understanding and the modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off curve, by showing how increasing model capacity beyond the point of interpolation results in improved performance.
Journal ArticleDOI

The implicit bias of gradient descent on separable data

TL;DR: In this paper, the authors examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets, and show that the predictor converges to the direction of the max-margin (hard margin SVM) solution.
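As an illustrative, hypothetical sketch of this phenomenon (not the paper's code), the Python snippet below runs plain gradient descent on unregularized logistic loss over a synthetically separable dataset and compares the resulting predictor direction with a hard-margin SVM direction computed via cvxpy; the dataset, step size, and iteration count are arbitrary choices, and the agreement only tightens slowly as the number of iterations grows.

import numpy as np
import cvxpy as cp
from scipy.special import expit

rng = np.random.default_rng(1)

# Linearly separable data: label by sign(w_true . x) and keep only points that
# clear a fixed margin, so hard-margin separability is guaranteed.
w_true = np.array([1.0, 2.0])
X = rng.standard_normal((500, 2))
keep = np.abs(X @ w_true) > 0.5
X, y = X[keep], np.sign(X[keep] @ w_true)

# Gradient descent on unregularized logistic loss with a homogeneous (bias-free)
# linear predictor; the norm of w diverges, but its direction stabilizes.
w = np.zeros(2)
step = 0.5
for _ in range(200_000):
    grad = -(X * (y * expit(-y * (X @ w)))[:, None]).mean(axis=0)
    w -= step * grad

# Hard-margin SVM direction: minimize ||v||^2 subject to y_i * v.x_i >= 1.
v = cp.Variable(2)
cp.Problem(cp.Minimize(cp.sum_squares(v)), [cp.multiply(y, X @ v) >= 1]).solve()

unit = lambda a: a / np.linalg.norm(a)
print('cosine(GD direction, max-margin direction):', unit(w) @ unit(v.value))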
Posted Content

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

TL;DR: In this article, the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization is studied.
Posted Content

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

TL;DR: It is shown that, when initialized correctly and in the many-particle limit, the gradient flow for over-parameterized models, although non-convex, converges to global minimizers; the proof involves Wasserstein gradient flows, a by-product of optimal transport theory.
References
Journal Article

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization

TL;DR: In this paper, it was shown that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum-rank solution can be recovered by solving a convex optimization problem, namely, the minimization of the nuclear norm over the given affine space.
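A minimal, hypothetical cvxpy sketch of the convex relaxation analyzed here (not the paper's code): minimize the nuclear norm over the affine space defined by random Gaussian measurements of a planted low-rank matrix. The problem size and number of measurements below are illustrative, and exact recovery is only expected when the measurements are sufficiently numerous and generic.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)

# Hypothetical instance: a rank-1 15x15 matrix observed through m = 120 random
# Gaussian measurements y_k = <A_k, X_star>, far fewer than the 225 entries.
n, r, m = 15, 1, 120
X_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
A = rng.standard_normal((m, n, n))
y = np.einsum('kij,ij->k', A, X_star)

# Convex surrogate for rank minimization: minimum nuclear norm over the affine
# space of matrices consistent with the measurements.
X = cp.Variable((n, n))
constraints = [cp.sum(cp.multiply(A[k], X)) == y[k] for k in range(m)]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()

print('relative recovery error:',
      np.linalg.norm(X.value - X_star) / np.linalg.norm(X_star))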
Posted Content

Understanding deep learning requires rethinking generalization

TL;DR: The authors showed that deep neural networks can fit a random labeling of the training data, that this phenomenon is qualitatively unaffected by explicit regularization, and that it occurs even if the true images are replaced by completely unstructured random noise.
Journal ArticleDOI

Exact matrix completion via convex optimization

TL;DR: In this paper, it is shown that a low-rank matrix can be recovered exactly from most sufficiently large subsets of its entries by solving a convex programming problem: finding the matrix of minimum nuclear norm that is consistent with the observed entries.
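In the same spirit, a minimal hypothetical cvxpy sketch of the matrix completion formulation (not the paper's code): find the matrix of minimum nuclear norm that agrees with the observed entries of a planted low-rank matrix. The size, rank, and sampling rate below are arbitrary, and exact recovery is only typical when enough entries of a sufficiently incoherent low-rank matrix are observed.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)

# Hypothetical completion instance: a rank-2 20x20 matrix with roughly 60% of
# its entries observed uniformly at random.
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = (rng.random((n, n)) < 0.6).astype(float)     # 1 = observed, 0 = missing

# Minimum nuclear norm matrix agreeing with M on the observed entries; on the
# unobserved entries the constraint reduces to the trivial 0 == 0.
X = cp.Variable((n, n))
cp.Problem(cp.Minimize(cp.normNuc(X)),
           [cp.multiply(mask, X) == mask * M]).solve()

print('relative recovery error:', np.linalg.norm(X.value - M) / np.linalg.norm(M))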
Proceedings Article

Multi-Task Feature Learning

TL;DR: The method builds upon the well-known 1-norm regularization problem, using a new regularizer that controls the number of learned features common to all the tasks, and an iterative algorithm is developed for solving the resulting problem.