Open Access · Posted Content

Residual Connections Encourage Iterative Inference

TL;DR
It is shown that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as one moves from one block to the next, and empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement.
Abstract
Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of iterative refinement in Resnets by showing that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as we go from one block to the next. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers, while higher layers perform iterative refinement of features. Finally, we observe that sharing residual layers naively leads to representation explosion and, counterintuitively, overfitting, and we show that simple existing strategies can help alleviate this problem.
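The central claim above, that a residual block's update tends to point along the negative gradient of the loss, can be probed empirically by comparing each block's residual F(h) with -dL/dh. The sketch below is illustrative only, not the authors' code; the attribute names `stem`, `blocks`, and `head` are assumptions about how a PyTorch Resnet might be organized.

```python
# Minimal sketch (not the authors' code): estimate how well each residual
# block's update F(h_i) aligns with the negative gradient of the loss w.r.t. h_i.
# Assumes a PyTorch model exposing `stem`, an iterable `blocks` of residual
# blocks computing h + F(h), and a classifier `head`; these names are hypothetical.
import torch
import torch.nn.functional as F


def block_gradient_alignment(model, x, y):
    """Return, per block, cos(F(h_i), -dL/dh_i) averaged over the batch."""
    h = model.stem(x)
    hidden = []                      # pre-block activations h_i
    updates = []                     # residual updates F(h_i) = h_{i+1} - h_i
    for block in model.blocks:
        hidden.append(h)
        h_next = block(h)            # h_{i+1} = h_i + F(h_i)
        updates.append(h_next - h)   # recover F(h_i)
        h = h_next
    loss = F.cross_entropy(model.head(h), y)
    grads = torch.autograd.grad(loss, hidden)    # dL/dh_i for every block input
    sims = []
    for upd, g in zip(updates, grads):
        upd, g = upd.flatten(1), g.flatten(1)
        sims.append(F.cosine_similarity(upd, -g, dim=1).mean().item())
    return sims  # positive values mean the block moves features downhill in loss
```

Under the paper's view, later blocks should tend to show positive alignment, consistent with iterative refinement.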


Citations
Posted Content

A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay

Leslie N. Smith · 26 Mar 2018
TL;DR: This report shows how to examine the training and validation/test loss curves for subtle clues of underfitting and overfitting, suggests guidelines for moving toward the optimal balance point, and discusses how to increase or decrease the learning rate and momentum to speed up training.
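One practical tool from that report is the learning-rate range test. The following is a rough sketch in that spirit, not the author's code; the model, data loader, and the assumption that the loader yields at least `steps` batches are all illustrative.

```python
# Rough sketch of a learning-rate range test: sweep the learning rate
# geometrically while training and record the loss; the usable LR range is
# roughly where the loss still decreases before diverging.
import torch


def lr_range_test(model, loader, lr_min=1e-7, lr_max=10.0, steps=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min, momentum=0.9)
    gamma = (lr_max / lr_min) ** (1.0 / steps)   # multiplicative LR step
    history = []
    data_iter = iter(loader)                     # assumes >= `steps` batches
    for _ in range(steps):
        x, y = next(data_iter)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        lr = optimizer.param_groups[0]["lr"]
        history.append((lr, loss.item()))
        for group in optimizer.param_groups:
            group["lr"] = lr * gamma             # increase LR for the next step
    return history   # plot loss vs. lr and pick the region before divergence
```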
Posted Content

Understanding intermediate layers using linear classifier probes

TL;DR: In this paper, the authors use linear classifiers to monitor the features at every layer of a model and measure how suitable they are for classification; these probes can be used to develop better intuition about models and to diagnose potential problems.
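A linear probe of this kind is simple to set up: train a linear classifier on frozen intermediate features and report its accuracy. The sketch below assumes a generic PyTorch `backbone` that returns the feature map of the layer of interest; it is illustrative, not the paper's code.

```python
# Minimal sketch of a linear classifier probe: the backbone is frozen and only
# the linear layer is trained, so its accuracy reflects how linearly separable
# the chosen layer's features are.
import torch
import torch.nn as nn


def train_probe(backbone, loader, feat_dim, num_classes, epochs=5, lr=1e-3):
    probe = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    backbone.eval()                              # the backbone stays frozen
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():                # no gradients into the backbone
                feats = backbone(x).flatten(1)   # [batch, feat_dim]
            loss = nn.functional.cross_entropy(probe(feats), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return probe
```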
Journal ArticleDOI

A mean-field optimal control formulation of deep learning

TL;DR: In this article, the authors introduce the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem and prove optimality conditions of both the Hamilton-Jacobi-Bellman type and the Pontryagin type.
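The formulation referenced above can be sketched roughly as follows (paraphrased notation, not the paper's exact statement): the network is treated as a controlled dynamical system and the population risk is minimized over the parameter (control) trajectory.

```latex
% Sketch: population risk minimization as a mean-field optimal control problem,
% with x_t playing the role of the activations, \theta_t the layer parameters,
% \Phi the terminal loss, and L a running cost / regularizer.
\begin{aligned}
\min_{\theta} \; & \mathbb{E}_{(x_0, y) \sim \mu}
    \Big[ \Phi(x_T, y) + \int_0^T L(x_t, \theta_t)\, dt \Big] \\
\text{s.t.} \; & \dot{x}_t = f(x_t, \theta_t), \qquad t \in [0, T].
\end{aligned}
```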
Posted Content

Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs

TL;DR: CORnet-S, a compact recurrent ANN, is established: a shallow network with four anatomically mapped areas and recurrent connectivity, guided by Brain-Score, a new large-scale composite of neural and behavioral benchmarks for quantifying the functional fidelity of models of the primate ventral visual stream.
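The key architectural idea is weight-shared recurrence within a convolutional "area," which gives a shallow network extra effective depth. The block below is a generic illustration of that idea under simplifying assumptions, not the published CORnet-S definition.

```python
# Generic illustration of a recurrent convolutional area: the same weights are
# applied for several time steps, so unrolling the block increases effective depth.
import torch
import torch.nn as nn


class RecurrentConvArea(nn.Module):
    def __init__(self, channels, steps=2):
        super().__init__()
        self.steps = steps
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = x
        for _ in range(self.steps):                       # shared weights across steps
            h = torch.relu(self.norm(self.conv(h)) + x)   # recurrent update with skip
        return h
```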
Posted Content

Multi-level Residual Networks from Dynamical Systems View

TL;DR: In this article, the authors adopt the dynamical systems point of view, analyze the lesioning properties of ResNets both theoretically and experimentally, and propose a novel method for accelerating ResNet training.
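The dynamical-systems reading treats a residual block as one forward-Euler step of an ODE, h_{k+1} = h_k + eps * f(h_k), so stacking blocks approximates integrating dh/dt = f(h). The snippet below is a generic illustration of that correspondence, not the cited paper's multi-level training method.

```python
# Sketch of a residual block viewed as a forward-Euler step of dh/dt = f(h).
import torch.nn as nn


class EulerResidualBlock(nn.Module):
    def __init__(self, dim, step_size=1.0):
        super().__init__()
        self.step_size = step_size               # eps: Euler step size
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h):
        return h + self.step_size * self.f(h)    # forward-Euler update
```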
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
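For reference, a minimal modern usage of an LSTM looks like the snippet below (illustrative PyTorch, unrelated to the original 1997 implementation): gated memory cells let error flow across long time lags.

```python
# Minimal LSTM usage sketch: process a batch of long sequences.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=1, batch_first=True)
x = torch.randn(8, 1000, 32)          # batch of 8 sequences, 1000 time steps each
output, (h_n, c_n) = lstm(x)          # output: [8, 1000, 64], plus final hidden/cell states
```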
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
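The central design idea is depth from stacks of very small 3x3 convolutions, with channel width doubling after each pooling stage. The sketch below illustrates that pattern under simplified assumptions; it is not the exact published VGG configuration.

```python
# Sketch of a VGG-style stage: several 3x3 convolutions followed by pooling.
import torch.nn as nn


def vgg_stage(in_ch, out_ch, num_convs):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))               # halve spatial resolution
    return nn.Sequential(*layers)


features = nn.Sequential(vgg_stage(3, 64, 2), vgg_stage(64, 128, 2),
                         vgg_stage(128, 256, 3))
```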
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
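The residual learning framework reformulates each block to learn a residual F(x) and output x + F(x) through an identity shortcut. The block below is a simplified sketch of that idea (no downsampling or projection path), not the paper's full architecture.

```python
# Sketch of a basic residual block: two 3x3 convolutions plus an identity shortcut.
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(x + out)               # identity shortcut + learned residual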
Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Batch Normalization normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
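The per-mini-batch computation is a normalize-then-rescale step. The sketch below shows the training-time statistics only (the running-average inference path is omitted) and is illustrative rather than the paper's reference implementation.

```python
# Sketch of batch normalization over a mini-batch of feature vectors.
import torch


def batch_norm(x, gamma, beta, eps=1e-5):
    # x: [batch, features]; gamma/beta: learnable scale and shift per feature
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)   # normalize each feature
    return gamma * x_hat + beta                  # restore representational power
```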
Dissertation

Learning Multiple Layers of Features from Tiny Images

TL;DR: This dissertation describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images; the CIFAR-10 and CIFAR-100 datasets introduced in this work are derived from that collection.
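For experiments such as those in the Resnet paper above, the CIFAR-10 dataset from this dissertation is readily available; the snippet below loads it with torchvision (illustrative, not part of the original work).

```python
# Load CIFAR-10: 50,000 training images of size 32x32 in 10 classes.
import torchvision
import torchvision.transforms as transforms

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
```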
Trending Questions (1)
The Shattered Gradients Problem: If resnets are the answer, then what is the question?

Shattered gradients refers to the observation that, in deep feedforward networks, the gradient with respect to the input increasingly resembles white noise as depth grows, which makes gradient-based training harder; residual connections slow this decorrelation, so Resnets can be read as an architectural answer to the question of how to keep gradients structured at large depth.
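One way to see this effect is to measure how correlated the input gradient is at two nearby inputs, for a plain stack versus a residual stack. The sketch below is a rough illustration under simple assumptions (1-D inputs, scalar output), not the cited paper's experimental setup.

```python
# Sketch: compare input-gradient correlation at nearby inputs for a plain vs.
# residual MLP; residual nets typically retain higher correlation at large depth.
import torch
import torch.nn as nn


def make_net(depth, dim=16, residual=False):
    blocks = [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)]

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.blocks = nn.ModuleList(blocks)
            self.out = nn.Linear(dim, 1)

        def forward(self, x):
            for block in self.blocks:
                x = x + block(x) if residual else block(x)
            return self.out(x).sum()

    return Net()


def grad_correlation(net, dim=16, eps=1e-2):
    x1 = torch.randn(1, dim, requires_grad=True)
    x2 = (x1 + eps * torch.randn_like(x1)).detach().requires_grad_()
    g1, = torch.autograd.grad(net(x1), x1)
    g2, = torch.autograd.grad(net(x2), x2)
    return torch.nn.functional.cosine_similarity(
        g1.flatten(), g2.flatten(), dim=0).item()
```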