Sensitivity and Generalization in Neural Networks: an Empirical Study

Citations

Journal Article

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Jaehoon Lee¹, Lechao Xiao¹, Samuel S. Schoenholz¹, Yasaman Bahri¹, Roman Novak¹, Jascha Sohl-Dickstein¹, Jeffrey Pennington¹

Google¹

18 Feb 2019 - arXiv: Machine Learning

TL;DR: In this article, the authors show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.

Abstract: A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.

738 citations

Posted Content

On the Spectral Bias of Neural Networks

Nasim Rahaman¹, Aristide Baratin², Devansh Arpit³, Felix Draxler¹, Min Lin⁴, Fred A. Hamprecht¹, Yoshua Bengio², Aaron Courville²

Heidelberg University¹, Université de Montréal², Salesforce.com³, National University of Singapore⁴

22 Jun 2018 - arXiv: Machine Learning

TL;DR: This work shows that deep ReLU networks are biased towards low frequency functions, and studies the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.

Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuations without affecting their global behavior. Intuitively, this property is in line with the observation that over-parameterized networks find simple patterns that generalize across data samples. We also investigate how the shape of the data manifold affects expressivity by showing evidence that learning high frequencies gets easier with increasing manifold complexity, and present a theoretical understanding of this behavior. Finally, we study the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.

486 citations

Posted Content

Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting

Jun Shu¹, Qi Xie¹, Lixuan Yi¹, Qian Zhao¹, Sanping Zhou¹, Zongben Xu¹, Deyu Meng¹

Xi'an Jiaotong University¹

20 Feb 2019 - arXiv: Learning

TL;DR: Synthetic and real experiments substantiate the capability of the method for achieving proper weighting functions in class imbalance and noisy label cases, fully complying with the common settings in traditional methods, and more complicated scenarios beyond conventional cases.

Abstract: Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. Sample re-weighting strategy is commonly used to alleviate this issue by designing a weighting function mapping from training loss to sample weight, and then iterating between weight recalculating and classifier updating. Current approaches, however, need to manually pre-specify the weighting function as well as its additional hyper-parameters. It makes them fairly hard to be generally applied in practice due to the significant variation of proper weighting schemes relying on the investigated problem and training data. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, constituting a universal approximator to almost any continuous functions, making the method able to fit a wide range of weighting functions including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be finely updated simultaneously with the learning process of the classifiers. Synthetic and real experiments substantiate the capability of our method for achieving proper weighting functions in class imbalance and noisy label cases, fully complying with the common settings in traditional methods, and more complicated scenarios beyond conventional cases. This naturally leads to its better accuracy than other state-of-the-art methods.

331 citations

Proceedings Article

The role of over-parametrization in generalization of neural networks

Behnam Neyshabur, Zhiyuan Li¹, Srinadh Bhojanapalli², Yann LeCun³, Nathan Srebro²

Princeton University¹, Toyota Technological Institute at Chicago², New York University³

01 Jan 2019

289 citations

Proceedings Article

Robustness and Generalization.

Huan Xu¹, Shie Mannor²

University of Texas at Austin¹, Technion – Israel Institute of Technology²

01 Jan 2010

TL;DR: In this article, the authors derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is "similar" to a training sample, then the testing error is close to the training error.

Abstract: We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is "similar" to a training sample, then the testing error is close to the training error. This provides a novel approach, different from complexity or stability arguments, to study generalization of learning algorithms. One advantage of the robustness approach, compared to previous methods, is the geometric intuition it conveys. Consequently, robustness-based analysis is easy to extend to learning in non-standard setups such as Markovian samples or quantile loss. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property that is required for learning algorithms to work.

252 citations
