Open Access · Proceedings Article

Understanding deep learning requires rethinking generalization.

TL;DR
This article shows that deep neural networks can easily fit a random labeling of the training data; this phenomenon is qualitatively unaffected by explicit regularization and persists even when the true images are replaced by completely unstructured random noise.
Abstract
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
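The core randomization test described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes PyTorch, substitutes Gaussian-noise images with uniformly random labels for CIFAR-10, and uses an arbitrary small CNN with illustrative hyperparameters; the point is only that training accuracy still climbs toward 100% on data that carries no signal.

```python
# Sketch of the randomization test: train a small CNN on noise inputs
# with random labels and watch training accuracy approach 100% anyway.
# Model, data sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, num_classes = 2048, 10
x = torch.randn(n, 3, 32, 32)                # unstructured noise "images"
y = torch.randint(0, num_classes, (n,))      # uniformly random labels

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
    nn.Linear(256, num_classes),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    perm = torch.randperm(n)
    for i in range(0, n, 128):
        idx = perm[i:i + 128]
        opt.zero_grad()
        loss_fn(model(x[idx]), y[idx]).backward()
        opt.step()
    with torch.no_grad():
        acc = (model(x).argmax(dim=1) == y).float().mean().item()
    print(f"epoch {epoch}: training accuracy {acc:.3f}")
    if acc > 0.99:    # the network has memorized the random labels
        break
```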


Citations
Journal Article

Recent advances in convolutional neural networks

TL;DR: A broad survey of recent advances in convolutional neural networks; the authors discuss improvements to CNNs along several dimensions, namely layer design, activation functions, loss functions, regularization, optimization, and fast computation.
Posted Content

On Calibration of Modern Neural Networks

TL;DR: Modern neural networks, unlike those from a decade ago, are found to be poorly calibrated; on most datasets, temperature scaling (a single-parameter variant of Platt scaling) is surprisingly effective at calibrating predictions.
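As a rough illustration of the temperature-scaling idea mentioned in that TL;DR (not the authors' implementation), one can fit a single scalar temperature on held-out logits by minimizing the negative log-likelihood; the logits and labels below are synthetic placeholders and the optimizer settings are assumptions.

```python
# Minimal sketch of temperature scaling: learn a single scalar T > 0
# that rescales a trained model's logits on a held-out validation set.
# The logits and labels here are synthetic stand-ins for real outputs.
import torch
import torch.nn as nn

val_logits = torch.randn(1000, 10) * 5.0     # overconfident "model" outputs
val_labels = torch.randint(0, 10, (1000,))

log_t = nn.Parameter(torch.zeros(1))         # optimize log T so T stays positive
opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
nll = nn.CrossEntropyLoss()

def closure():
    opt.zero_grad()
    loss = nll(val_logits / log_t.exp(), val_labels)
    loss.backward()
    return loss

opt.step(closure)
print("learned temperature:", log_t.exp().item())
# At test time, divide logits by the learned temperature before softmax;
# predicted classes are unchanged, only the confidence is recalibrated.
```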
Journal Article

Understanding deep learning (still) requires rethinking generalization

TL;DR: These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
Posted Content

mixup: Beyond Empirical Risk Minimization

TL;DR: Mixup trains a neural network on convex combinations of pairs of examples and their labels, regularizing the network to favor simple linear behavior in between training examples; this improves the generalization of state-of-the-art neural network architectures.
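A minimal sketch of the mixup recipe that TL;DR describes (convex combinations of input pairs and of their labels). The helper name mixup_batch, the batch shapes, and alpha=0.2 are illustrative choices for this sketch, not the paper's exact settings.

```python
# Mix a minibatch: sample lambda from a Beta distribution, then take
# convex combinations of inputs and of their one-hot labels.
import torch

def mixup_batch(x, y, num_classes, alpha=0.2):
    """Return mixed inputs and soft labels for one minibatch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = torch.nn.functional.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

# Example: mix a batch of 32 CIFAR-like images.
x = torch.randn(32, 3, 32, 32)
y = torch.randint(0, 10, (32,))
x_mix, y_mix = mixup_batch(x, y, num_classes=10)
# Train with a soft-label loss against y_mix (in PyTorch >= 1.10,
# nn.CrossEntropyLoss accepts probability targets directly).
```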
Proceedings Article

Deep Image Prior

TL;DR: It is shown that a randomly initialized neural network can serve as a handcrafted prior with excellent results on standard inverse problems such as denoising, super-resolution, and inpainting.
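A rough sketch of the deep-image-prior idea from that TL;DR: fit a randomly initialized convolutional network to a single corrupted image and take an early-stopped reconstruction as the restored output. The tiny network, the random input code, and the iteration count below are assumptions for illustration, not the paper's architecture.

```python
# Fit a randomly initialized conv net to one noisy image; the network's
# structural bias tends to reproduce natural image content before noise,
# so an early-stopped reconstruction acts as the denoised result.
import torch
import torch.nn as nn

noisy = torch.rand(1, 3, 64, 64)             # stand-in for a noisy photo
z = torch.randn(1, 32, 64, 64)               # fixed random input code

net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):                      # stop early rather than
    opt.zero_grad()                          # fitting the noise exactly
    loss = ((net(z) - noisy) ** 2).mean()
    loss.backward()
    opt.step()

restored = net(z).detach()                   # early-stopped reconstruction
```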
Trending Questions
How does the generalization performance of neural networks depend on the architecture and training procedure?

The paper shows that the generalization performance of neural networks cannot be explained solely by the architecture or the training procedure: the same architectures and training methods that generalize well on real data can also easily fit completely random labels.