Open Access · Posted Content
Residual Connections Encourage Iterative Inference
TL;DR: It is shown that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as the authors go from one block to the next, and empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement.
Abstract:
Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research.
A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of iterative refinement in Resnets by showing that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as we go from one block to the next. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers, while higher layers perform iterative refinement of features. Finally, we observe that naively sharing residual layers leads to representation explosion and, counterintuitively, overfitting, and we show that simple existing strategies can help alleviate this problem.
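The gradient-step view of a residual block can be illustrated with a minimal toy sketch (an illustration of the idea, not the paper's experiment). Here the loss, the target, and the learning rate `lr` are all assumed for the example; the learned residual `F(h)` is replaced by the idealized value `-lr * ∇L(h)` that the paper's analysis says residual connections encourage, so stacking blocks behaves like gradient descent on the loss:

```python
import numpy as np

# Toy loss L(h) = 0.5 * ||h - target||^2, so grad L(h) = h - target.
rng = np.random.default_rng(0)
target = np.zeros(4)          # assumed optimum of the toy loss
h = rng.normal(size=4)        # initial feature vector

def loss(h):
    return 0.5 * float(np.sum((h - target) ** 2))

def residual_block(h, lr=0.5):
    # F(h) plays the role of the learned residual; here we plug in the
    # idealized value -lr * grad L(h) predicted by the gradient-step view.
    grad = h - target
    return h + (-lr * grad)   # h_next = h + F(h)

losses = [loss(h)]
for _ in range(5):            # a stack of 5 residual blocks = 5 refinement steps
    h = residual_block(h)
    losses.append(loss(h))

# Each successive block reduces the loss: iterative refinement.
assert all(l1 < l0 for l0, l1 in zip(losses, losses[1:]))
```

In a trained Resnet, `F(h)` is learned rather than plugged in, but the analysis above is the sense in which each block's output moves features along the negative loss gradient.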
Citations
Posted Content
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
TL;DR: This report shows how to examine the training validation/test loss function for subtle clues of underfitting and overfitting and suggests guidelines for moving toward the optimal balance point and discusses how to increase/decrease the learning rate/momentum to speed up training.
Posted Content
Understanding intermediate layers using linear classifier probes
Guillaume Alain, Yoshua Bengio, et al.
TL;DR: In this paper, the authors use linear classifiers to monitor the features at every layer of a model and measure how suitable they are for classification, which can be used to develop a better intuition about models and to diagnose potential problems.
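The probing idea in the entry above can be sketched in a few lines (a toy illustration with synthetic features, not the paper's setup): freeze the features from some intermediate layer, fit a simple linear classifier on them, and read its accuracy as a measure of how linearly decodable the task is at that depth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "intermediate layer features" for two classes (synthetic data,
# separated along every dimension purely for illustration).
feats0 = rng.normal(loc=-1.0, size=(100, 16))
feats1 = rng.normal(loc=+1.0, size=(100, 16))
X = np.vstack([feats0, feats1])
y = np.array([0] * 100 + [1] * 100)

# Linear probe: a least-squares linear classifier on the frozen features.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
w, *_ = np.linalg.lstsq(Xb, 2 * y - 1, rcond=None)
preds = (Xb @ w > 0).astype(int)
accuracy = (preds == y).mean()

# High probe accuracy means this layer's features are already linearly
# separable for the task -- the diagnostic signal the probe provides.
assert accuracy > 0.95
```

Repeating this at every layer of a real network traces how class information becomes progressively more linearly decodable with depth.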
Journal ArticleDOI
A mean-field optimal control formulation of deep learning
TL;DR: In this article, the authors introduced the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem and proved optimality conditions of both the Hamilton-Jacobi-Bellman type and the Pontryagin type.
Posted Content
Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs
Jonas Kubilius, Martin Schrimpf, Kohitij Kar, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, Aran Nayebi, Daniel M. Bear, Daniel L. K. Yamins, James J. DiCarlo, et al.
TL;DR: CORnet-S is established: a compact, shallow recurrent ANN with four anatomically mapped areas and recurrent connectivity, guided by Brain-Score, a new large-scale composite of neural and behavioral benchmarks for quantifying the functional fidelity of models of the primate ventral visual stream.
Posted Content
Multi-level Residual Networks from Dynamical Systems View
TL;DR: In this article, the authors adopt the dynamical systems point of view, and analyze the lesioning properties of ResNet both theoretically and experimentally, and propose a novel method for accelerating ResNet training.
References
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman, et al.
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Posted Content
Deep Residual Learning for Image Recognition
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Posted Content
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy, et al.
TL;DR: Batch Normalization normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, achieving state-of-the-art performance on ImageNet.
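The per-mini-batch normalization described above can be sketched minimally (an illustration only: it omits the learned scale/shift parameters gamma and beta and the running statistics used at inference time):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the mini-batch to ~zero mean, ~unit variance."""
    mean = x.mean(axis=0)              # per-feature mean over the batch
    var = x.var(axis=0)                # per-feature variance over the batch
    return (x - mean) / np.sqrt(var + eps)

# Batch of 64 examples with 8 features, deliberately off-center and scaled.
x = np.random.default_rng(1).normal(loc=3.0, scale=2.0, size=(64, 8))
y = batch_norm(x)

# After normalization, every feature is centered and has unit scale.
assert np.allclose(y.mean(axis=0), 0.0, atol=1e-7)
assert np.allclose(y.std(axis=0), 1.0, atol=1e-2)
```

The full method additionally learns `gamma * y + beta` per feature so the network can undo the normalization where that helps.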
Dissertation
Learning Multiple Layers of Features from Tiny Images
TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.