Open Access · Posted Content

Manifold Mixup: Better Representations by Interpolating Hidden States.

TLDR
Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations, improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
Abstract
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as an additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it in practical settings, and connect it to previous work on information theory and generalization. Despite incurring no significant computational overhead and being implementable in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
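As a rough illustration of how the regularizer operates, here is a minimal sketch that mixes hidden states (and labels) at a randomly chosen layer of a small PyTorch MLP. The network, layer choice, and hyperparameters (`ToyMLP`, `alpha=2.0`) are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of Manifold Mixup on a toy MLP (illustrative, not the official code).
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
        ])
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, y_onehot=None, alpha=2.0):
        """If y_onehot is given, mix hidden states (and labels) at a randomly chosen layer."""
        mixing = y_onehot is not None
        mix_layer = np.random.randint(0, len(self.layers) + 1) if mixing else -1
        lam = np.random.beta(alpha, alpha) if mixing else 1.0
        h, y_mixed = x, y_onehot
        for k, layer in enumerate(self.layers):
            if k == mix_layer:
                perm = torch.randperm(h.size(0))
                h = lam * h + (1 - lam) * h[perm]                      # interpolate hidden states
                y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]  # interpolate labels
            h = layer(h)
        if mix_layer == len(self.layers):                              # mix after the last hidden layer
            perm = torch.randperm(h.size(0))
            h = lam * h + (1 - lam) * h[perm]
            y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
        return self.head(h), y_mixed

# One training step with soft (mixed) targets.
model = ToyMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits, y_mixed = model(x, F.one_hot(y, 10).float())
loss = -(y_mixed * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

Choosing layer 0 here reduces to standard input mixup; the other choices interpolate hidden representations.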


Citations
Proceedings Article

Interpolation consistency training for semi-supervised learning.

TL;DR: Interpolation Consistency Training (ICT) encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points.
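A hedged sketch of the consistency term summarized above: predictions on mixed unlabeled inputs are pushed toward the corresponding mixture of teacher predictions. The `student`/`teacher` split (a mean-teacher style setup) and the squared-error penalty are assumptions chosen for illustration.

```python
# Sketch of an ICT-style consistency loss on unlabeled data.
# Assumes `student` and `teacher` are torch.nn.Module classifiers with identical output shapes.
import numpy as np
import torch
import torch.nn.functional as F

def ict_consistency_loss(student, teacher, u_batch, alpha=1.0):
    """Encourage f(mix(u_i, u_j)) to match mix(f(u_i), f(u_j)) on unlabeled inputs."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(u_batch.size(0))
    u_mixed = lam * u_batch + (1 - lam) * u_batch[perm]
    with torch.no_grad():  # targets come from the (EMA) teacher and are not backpropagated
        p = F.softmax(teacher(u_batch), dim=1)
        target = lam * p + (1 - lam) * p[perm]
    pred = F.softmax(student(u_mixed), dim=1)
    return F.mse_loss(pred, target)
```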
Posted Content

Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup

TL;DR: Experiments show that Puzzle Mix achieves state-of-the-art generalization and adversarial robustness compared to other mixup methods on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets.
Posted Content

REMIND Your Neural Network to Prevent Catastrophic Forgetting

TL;DR: REMIND is trained in an online manner, learning one example at a time (closer to how humans learn), and outperforms other methods for incremental class learning on the ImageNet ILSVRC-2012 dataset.
Posted Content

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

TL;DR: GraphMix is presented, a regularization method for Graph Neural Network based semi-supervised object classification, in which a fully-connected network is trained jointly with the graph neural network via parameter sharing and interpolation-based regularization.
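A heavily simplified sketch of the parameter-sharing idea mentioned in the TL;DR: a fully-connected branch over node features reuses the linear weights of a GCN-style branch, and interpolation-based regularization (e.g. Manifold Mixup) would be applied on the fully-connected branch. The single shared layer, dense adjacency matrix, and module names are illustrative assumptions.

```python
# Simplified sketch of parameter sharing between an FCN and a GNN over node features.
# A_hat is assumed to be a normalized (dense) adjacency matrix, for brevity.
import torch
import torch.nn as nn

class SharedFCNGNN(nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)     # shared by both branches
        self.lin2 = nn.Linear(hidden, n_classes)  # shared by both branches

    def fcn_forward(self, x):
        # Plain fully-connected branch; interpolation-based regularization
        # would be applied to these hidden states during training.
        return self.lin2(torch.relu(self.lin1(x)))

    def gnn_forward(self, x, A_hat):
        # GCN-style branch reusing the same weights, with neighborhood aggregation.
        h = torch.relu(A_hat @ self.lin1(x))
        return A_hat @ self.lin2(h)
```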
Posted Content

SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization

TL;DR: This work proposes SaliencyMix, a saliency-guided data augmentation strategy that carefully selects a representative image patch with the help of a saliency map and mixes this indicative patch with a target image, leading the model to learn more appropriate feature representations and achieving new state-of-the-art top-1 error.
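A rough sketch of the select-and-paste step described above, using a cheap gradient-magnitude stand-in for a real saliency model; the patch size, helper names, and area-based label-mixing rule are illustrative assumptions, not the paper's exact pipeline.

```python
# Rough sketch of saliency-guided patch mixing (illustrative only).
import numpy as np

def toy_saliency(img):
    """Cheap stand-in for a saliency model: per-pixel gradient magnitude of the gray image."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    return np.abs(gx) + np.abs(gy)

def saliency_mix(source, target, source_label, target_label, patch=16):
    """Paste the most salient source patch onto the target; mix labels by patch area."""
    sal = toy_saliency(source)
    cy, cx = np.unravel_index(sal.argmax(), sal.shape)   # peak-saliency location
    h, w = source.shape[:2]
    y0 = int(np.clip(cy - patch // 2, 0, h - patch))
    x0 = int(np.clip(cx - patch // 2, 0, w - patch))
    mixed = target.copy()
    mixed[y0:y0 + patch, x0:x0 + patch] = source[y0:y0 + patch, x0:x0 + patch]
    lam = 1.0 - (patch * patch) / (h * w)                # label weight for the target image
    mixed_label = lam * target_label + (1 - lam) * source_label
    return mixed, mixed_label
```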
References
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
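For reference, the batch-normalization transform applied to each activation x_i over a mini-batch of size m, with learned scale gamma and shift beta, is:

```latex
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_{\mathcal{B}}\right)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta
```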
Book Chapter

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Journal Article

Approximation by superpositions of a sigmoidal function

TL;DR: It is demonstrated that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube.
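Concretely, the result states that, for any continuous sigmoidal function sigma, sums of the following form are dense in the space of continuous functions on the unit hypercube I_n:

```latex
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right),
\qquad x \in I_n = [0,1]^n
```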
Proceedings Article

Intriguing properties of neural networks

TL;DR: It is found that there is no distinction between individual high-level units and random linear combinations of high-level units according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Proceedings Article

Efficient Estimation of Word Representations in Vector Space

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
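As a rough illustration of the skip-gram flavor of these architectures, the toy sketch below learns word vectors with negative sampling on a tiny corpus; the corpus, window size, embedding dimension, and training loop are assumptions for illustration, not the paper's original implementation.

```python
# Toy skip-gram with negative sampling (illustrative sketch, not the original word2vec code).
import torch
import torch.nn as nn
import torch.nn.functional as F

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# (center, context) index pairs within a window of 2
pairs = [(idx[corpus[i]], idx[corpus[j]])
         for i in range(len(corpus))
         for j in range(max(0, i - 2), min(len(corpus), i + 3)) if i != j]

dim = 16
center_emb = nn.Embedding(len(vocab), dim)
context_emb = nn.Embedding(len(vocab), dim)
opt = torch.optim.Adam(list(center_emb.parameters()) + list(context_emb.parameters()), lr=0.05)

centers = torch.tensor([c for c, _ in pairs])
contexts = torch.tensor([o for _, o in pairs])
for _ in range(200):
    negatives = torch.randint(0, len(vocab), (len(pairs),))          # crude negative sampling
    pos_score = (center_emb(centers) * context_emb(contexts)).sum(dim=1)
    neg_score = (center_emb(centers) * context_emb(negatives)).sum(dim=1)
    # Maximize similarity for observed pairs, minimize it for sampled negatives.
    loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```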