Rethinking the Inception Architecture for Computer Vision

doi:10.1109/CVPR.2016.308

Open AccessProceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, +4 more

- Vol. 2016, pp 2818-2826

Chats0

TLDR

In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.

Abstract:

Convolutional networks are at the core of most state of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set demonstrate substantial gains over the state of the art: 21:2% top-1 and 5:6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3:5% top-5 error and 17:3% top-1 error on the validation set and 3:6% top-5 error on the official test set.

Citations

PDF

Open Access

More filters

Proceedings ArticleDOI

Densely Connected Convolutional Networks

Gao Huang, +3 more

TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.

...read moreread less

Book ChapterDOI

SSD: Single Shot MultiBox Detector

Wei Liu, +6 more

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.

...read moreread less

Journal ArticleDOI

Squeeze-and-Excitation Networks

Jie Hu, +4 more

TL;DR: This work proposes a novel architectural unit, which is term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.

...read moreread less

Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew Howard, +7 more

- 17 Apr 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.

...read moreread less

Proceedings ArticleDOI

Xception: Deep Learning with Depthwise Separable Convolutions

François Chollet

TL;DR: This work proposes a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions, and shows that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms it on a larger image classification dataset.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Book ChapterDOI

Learning a Deep Convolutional Network for Image Super-Resolution

Chao Dong, +3 more

TL;DR: This work proposes a deep learning method for single image super-resolution (SR) that directly learns an end-to-end mapping between the low/high-resolution images and shows that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network.

...read moreread less

Proceedings Article

On the importance of initialization and momentum in deep learning

Ilya Sutskever, +3 more

TL;DR: It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.

...read moreread less

Proceedings ArticleDOI

DeepPose: Human Pose Estimation via Deep Neural Networks

Alexander Toshev, +1 more

TL;DR: The pose estimation is formulated as a DNN-based regression problem towards body joints and a cascade of such DNN regres- sors which results in high precision pose estimates.

...read moreread less

Proceedings Article

On the difficulty of training recurrent neural networks

Razvan Pascanu, +2 more

TL;DR: In this article, a gradient norm clipping strategy is proposed to deal with the vanishing and exploding gradient problems in recurrent neural networks. But the proposed solution is limited to the case of RNNs.

...read moreread less

Proceedings ArticleDOI

Scalable Object Detection Using Deep Neural Networks

Dumitru Erhan, +3 more

TL;DR: This work proposes a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest.

...read moreread less

Rethinking the Inception Architecture for Computer Vision

Citations

Densely Connected Convolutional Networks

SSD: Single Shot MultiBox Detector

Squeeze-and-Excitation Networks

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Xception: Deep Learning with Depthwise Separable Convolutions

References

Learning a Deep Convolutional Network for Image Super-Resolution

On the importance of initialization and momentum in deep learning

DeepPose: Human Pose Estimation via Deep Neural Networks

On the difficulty of training recurrent neural networks

Scalable Object Detection Using Deep Neural Networks

Related Papers (5)

Deep Residual Learning for Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition

Going deeper with convolutions

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet: A large-scale hierarchical image database