Deep Networks with Stochastic Depth

- 30 Mar 2016 -

TLDR

Stochastic depth randomly drops a subset of layers for each mini-batch during training and bypasses them with the identity function. With it, residual networks can be made deeper than 1200 layers and still yield meaningful improvements in test error.

Abstract:

Very deep convolutional networks with hundreds of layers have led to significant reductions in error on competitive benchmarks. Although the unmatched expressiveness of the many layers can be highly desirable at test time, training very deep networks comes with its own set of challenges. The gradients can vanish, the forward flow often diminishes, and the training time can be painfully slow. To address these problems, we propose stochastic depth, a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time. We start with very deep networks but, during training, randomly drop a subset of layers for each mini-batch and bypass them with the identity function. This simple approach complements the recent success of residual networks. It reduces training time substantially and improves the test error significantly on almost all data sets that we used for evaluation. With stochastic depth we can increase the depth of residual networks even beyond 1200 layers and still obtain meaningful improvements in test error (4.91% on CIFAR-10).
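
The procedure described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `survival_probs` and `forward` are hypothetical, each residual branch is modelled as an ordinary function on a list of floats, and the nonlinearity after each addition is omitted. Two details are not stated in the abstract and are assumed here from the full paper: the linearly decaying survival probabilities (ending at `p_last`), and the test-time rule of keeping every block while scaling its residual branch by its survival probability.

```python
import random


def survival_probs(num_blocks, p_last=0.5):
    """Assumed linear decay: block l (1-indexed) survives with
    probability 1 - (l / num_blocks) * (1 - p_last)."""
    return [1.0 - (l / num_blocks) * (1.0 - p_last)
            for l in range(1, num_blocks + 1)]


def forward(x, branches, probs, training, rng=random):
    """Apply a stack of residual blocks to x (a list of floats).

    branches[i] is the residual function of block i; the skip
    connection is the identity.
    """
    for branch, p in zip(branches, probs):
        if training:
            # One coin flip per block per call (i.e. per mini-batch).
            if rng.random() < p:
                x = [xi + ri for xi, ri in zip(x, branch(x))]
            # Otherwise the block is dropped: x passes through unchanged.
        else:
            # Test time: keep every block, scale its branch by p.
            x = [xi + p * ri for xi, ri in zip(x, branch(x))]
    return x


# Toy usage: four blocks whose branch always contributes +1.
branches = [lambda x: [1.0 for _ in x]] * 4
probs = survival_probs(4)          # [0.875, 0.75, 0.625, 0.5]
rng = random.Random(0)
train_out = forward([0.0], branches, probs, training=True, rng=rng)
test_out = forward([0.0], branches, probs, training=False)
print(probs, train_out, test_out)  # test_out is [2.75]
```

In this toy run the training-time output counts how many blocks survived that pass, while the test-time output equals the sum of the survival probabilities (2.75), which is also the expected number of active blocks per training pass.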

Citations

Journal Article

Squeeze-and-Excitation Networks

Jie Hu, +4 more

TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.

Posted Content

CBAM: Convolutional Block Attention Module

Sanghyun Woo, +3 more

- 17 Jul 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The proposed Convolutional Block Attention Module (CBAM) is a simple yet effective attention module for feed-forward convolutional neural networks. It can be integrated seamlessly into any CNN architecture with negligible overhead and is end-to-end trainable along with the base CNN.

Proceedings Article

Learning Transferable Architectures for Scalable Image Recognition

Barret Zoph, +3 more

TL;DR: This work searches for an architectural building block on a small dataset and then transfers the block to a larger dataset; the resulting NASNet architecture achieves state-of-the-art performance.

Posted Content

Wide Residual Networks

Sergey Zagoruyko, +1 more

- 23 May 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Wide residual networks (WRNs) decrease the depth and increase the width of residual networks, achieving state-of-the-art results on CIFAR, SVHN, and ImageNet.

Posted Content

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov, +1 more

- 13 Aug 2016 -

arXiv: Learning

TL;DR: This paper proposes a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks, and achieves state-of-the-art results on both the CIFAR-10 and CIFAR-100 datasets.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This work proposes a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: This work trains a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, and reports state-of-the-art performance.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Journal of Machine Learning Research

Going deeper with convolutions

Christian Szegedy, +8 more

Densely Connected Convolutional Networks

Gao Huang, +3 more

Related Papers (5)

Deep Residual Learning for Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition

Dropout: a simple way to prevent neural networks from overfitting

Going deeper with convolutions

Densely Connected Convolutional Networks