Open Access Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

TLDR
In this article, the authors explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
Abstract
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
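The factorization idea summarized in the abstract can be sketched in a few lines. The snippet below is an illustrative PyTorch sketch, not the paper's exact Inception-v3 modules (channel counts and layer choices are assumptions): a 5x5 convolution replaced by two stacked 3x3 convolutions, and a 3x3 convolution split into 1x3 followed by 3x1.

```python
import torch
import torch.nn as nn

# (a) Spatial factorization: a 5x5 convolution replaced by two stacked 3x3
#     convolutions covering the same receptive field with fewer weights.
conv5x5      = nn.Conv2d(64, 64, kernel_size=5, padding=2)
factorized_a = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

# (b) Asymmetric factorization: an n x n convolution split into 1 x n
#     followed by n x 1 (here n = 3).
factorized_b = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)), nn.ReLU(inplace=True),
)

count = lambda m: sum(p.numel() for p in m.parameters())
x = torch.randn(1, 64, 35, 35)
assert conv5x5(x).shape == factorized_a(x).shape == factorized_b(x).shape
print(count(conv5x5), count(factorized_a), count(factorized_b))
# 5x5: ~102k weights; two 3x3: ~74k; 1x3 + 3x1: ~25k, for the same output shape
```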


Citations
Proceedings ArticleDOI

Densely Connected Convolutional Networks

TL;DR: DenseNet connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
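The dense connectivity pattern described in this TL;DR can be sketched as follows (a minimal PyTorch sketch; the growth rate and layer count are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature maps
    and contributes `growth_rate` new channels (illustrative sketch)."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate everything produced so far and feed it forward.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=16, growth_rate=12, num_layers=4)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # (1, 16 + 4*12, 32, 32) = (1, 64, 32, 32)
```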
Book ChapterDOI

SSD: Single Shot MultiBox Detector

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
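The default-box discretization described above can be sketched by enumerating boxes per feature-map cell. This is a simplified sketch assuming normalized (cx, cy, w, h) boxes and a single scale per feature map; the aspect ratios and scale are illustrative, and the full SSD recipe additionally adds an extra box for aspect ratio 1.

```python
import itertools, math

def default_boxes(fmap_size: int, scale: float, aspect_ratios=(1.0, 2.0, 0.5)):
    """Enumerate (cx, cy, w, h) default boxes, normalized to [0, 1], for one
    square feature map: one box per aspect ratio at every cell."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# An 8x8 map with 3 aspect ratios yields 8*8*3 = 192 default boxes; the
# detector then predicts class scores and box offsets for every one of them.
print(len(default_boxes(8, scale=0.2)))
```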
Journal ArticleDOI

Squeeze-and-Excitation Networks

TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
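The recalibration described in the TL;DR reduces to a global pooling step ("squeeze"), a small bottleneck MLP ("excitation"), and a channel-wise rescaling. A minimal PyTorch sketch follows; the reduction ratio and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool, bottleneck MLP with a
    sigmoid gate, then channel-wise rescaling of the input (sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # recalibrate each channel by its learned gate

print(SEBlock(64)(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```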
Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy, and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes, and large-scale geo-localization.
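The two global hyper-parameters referred to in the TL;DR are a width multiplier that thins every layer's channels and a resolution multiplier that shrinks the feature maps. The rough cost model below, with illustrative layer sizes, shows how they trade compute for accuracy; it is a hedged sketch of the idea, not the paper's exact accounting.

```python
def mobilenet_layer_madds(dk, m, n, df, alpha=1.0, rho=1.0):
    """Approximate multiply-adds of one depthwise-separable layer with kernel
    size dk, m input channels, n output channels, and a df x df feature map,
    under a width multiplier `alpha` and resolution multiplier `rho`."""
    m, n = int(alpha * m), int(alpha * n)
    df = int(rho * df)
    depthwise = dk * dk * m * df * df   # per-channel spatial convolution
    pointwise = m * n * df * df         # 1x1 cross-channel convolution
    return depthwise + pointwise

full = mobilenet_layer_madds(3, 512, 512, 14)
slim = mobilenet_layer_madds(3, 512, 512, 14, alpha=0.5, rho=0.714)
# Cost falls roughly quadratically in both multipliers.
print(full, slim, slim / full)
```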
Proceedings ArticleDOI

Xception: Deep Learning with Depthwise Separable Convolutions

TL;DR: This work proposes a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions, and shows that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms it on a larger image classification dataset.
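A depthwise separable convolution, the building block referred to above, is a per-channel spatial convolution followed by a 1x1 pointwise convolution. The minimal PyTorch sketch below, with illustrative channel counts, compares its parameter count against a dense 3x3 convolution producing the same output shape.

```python
import torch.nn as nn

def separable_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Depthwise (groups=in_ch) spatial conv, then 1x1 cross-channel mix."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
    )

dense = nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=False)
sep   = separable_conv(128, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(sep))  # 294912 vs 33920 weights
```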
References
Proceedings Article

Learning a Deep Compact Image Representation for Visual Tracking

TL;DR: Comparison with state-of-the-art trackers on challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost, achieving real-time performance when its MATLAB implementation runs on a modest graphics processing unit (GPU).
Proceedings ArticleDOI

Fast Algorithms for Convolutional Neural Networks

TL;DR: A new class of fast algorithms for convolutional neural networks is introduced, based on Winograd's minimal filtering algorithms, which compute minimal-complexity convolution over small tiles and are therefore fast for small filters and small batch sizes.
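The smallest instance of Winograd's minimal filtering, F(2, 3), computes two outputs of a 1-D convolution with a 3-tap filter using four multiplications instead of six; the 2-D tile algorithms are built by nesting this transform. Below is a minimal NumPy sketch of F(2, 3), not the paper's implementation.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2, 3): two outputs of a 1-D, 3-tap filtering of a 4-element
    input tile using 4 multiplications (m1..m4) instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d, g = np.random.randn(4), np.random.randn(3)
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```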
Proceedings Article

Compressing Neural Networks with the Hashing Trick

TL;DR: HashedNets uses a hash function to randomly group connection weights into hash buckets, and all connections within the same bucket share a single parameter value, which is tuned during training with standard backpropagation.
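The hashing trick described in the TL;DR can be sketched as a layer whose virtual weight matrix is an indexed view into a small vector of shared parameters. The sketch below uses a random bucket assignment as a stand-in for the fixed hash function and omits training; in the real method, gradients simply accumulate into the shared buckets during backpropagation.

```python
import numpy as np

class HashedLayer:
    """Fully connected layer whose virtual weight matrix is backed by a small
    set of shared parameters: each (i, j) connection maps to a bucket index,
    so many connections reuse one trainable value (illustrative sketch)."""
    def __init__(self, n_in, n_out, n_buckets, seed=0):
        rng = np.random.default_rng(seed)
        self.shared = rng.standard_normal(n_buckets) * 0.01   # real parameters
        # Fixed (non-learned) mapping from connection index to bucket index;
        # a stand-in for the deterministic hash function used by HashedNets.
        self.bucket = rng.integers(0, n_buckets, size=(n_in, n_out))

    def forward(self, x):
        W = self.shared[self.bucket]   # expand buckets into a full weight matrix
        return x @ W

layer = HashedLayer(n_in=784, n_out=100, n_buckets=500)
print(layer.forward(np.random.randn(2, 784)).shape)  # (2, 100); only 500 weights stored
```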
Proceedings ArticleDOI

Ontological supervision for fine grained classification of Street View storefronts

TL;DR: This work utilizes an ontology of geographical concepts to automatically propagate business category information and create a large, multi-label training dataset for fine-grained storefront classification, achieving human-level accuracy.
Journal ArticleDOI

SVD-NET: an algorithm that automatically selects network structure

TL;DR: An algorithm is developed for training feedforward neural networks that uses singular value decomposition (SVD) to identify and eliminate redundant hidden nodes, producing models that generalize better and eliminating the need for cross-validation to avoid overfitting.
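The core observation behind the TL;DR can be illustrated numerically: if some hidden nodes are linear combinations of others, the singular values of the hidden-activation matrix reveal how many independent nodes are actually needed. The NumPy sketch below is illustrative only, not the paper's exact training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((1000, 4))                     # activations of 4 "useful" hidden nodes
H = np.hstack([H, H @ rng.standard_normal((4, 6))])    # 6 extra nodes are linear combinations

# Near-zero singular values flag redundant hidden nodes that can be pruned.
s = np.linalg.svd(H, compute_uv=False)
effective_rank = int(np.sum(s > 1e-8 * s[0]))
print(effective_rank)  # 4 -> six of the ten hidden nodes are redundant
```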