Open Access Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

TLDR
In this article, the authors explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.
Abstract
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
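The factorization idea summarized in the abstract can be sketched in a few lines. The snippet below is an illustrative PyTorch sketch, not the paper's exact Inception-v3 modules (channel counts and layer choices are assumptions): a 5x5 convolution replaced by two stacked 3x3 convolutions, and a 3x3 convolution split into 1x3 followed by 3x1.

```python
import torch
import torch.nn as nn

# (a) Spatial factorization: a 5x5 convolution replaced by two stacked 3x3
#     convolutions covering the same receptive field with fewer weights.
conv5x5      = nn.Conv2d(64, 64, kernel_size=5, padding=2)
factorized_a = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

# (b) Asymmetric factorization: an n x n convolution split into 1 x n
#     followed by n x 1 (here n = 3).
factorized_b = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)), nn.ReLU(inplace=True),
)

count = lambda m: sum(p.numel() for p in m.parameters())
x = torch.randn(1, 64, 35, 35)
assert conv5x5(x).shape == factorized_a(x).shape == factorized_b(x).shape
print(count(conv5x5), count(factorized_a), count(factorized_b))
# 5x5: ~102k weights; two 3x3: ~74k; 1x3 + 3x1: ~25k, for the same output shape
```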


Citations
Proceedings ArticleDOI

Densely Connected Convolutional Networks

TL;DR: DenseNet connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
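The dense connectivity pattern described in this TL;DR can be sketched as follows (a minimal PyTorch sketch; the growth rate and layer count are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature maps
    and contributes `growth_rate` new channels (illustrative sketch)."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate everything produced so far and feed it forward.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=16, growth_rate=12, num_layers=4)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # (1, 16 + 4*12, 32, 32) = (1, 64, 32, 32)
```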
Book ChapterDOI

SSD: Single Shot MultiBox Detector

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
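The default-box discretization described above can be sketched by enumerating boxes per feature-map cell. This is a simplified sketch assuming normalized (cx, cy, w, h) boxes and a single scale per feature map; the aspect ratios and scale are illustrative, and the full SSD recipe additionally adds an extra box for aspect ratio 1.

```python
import itertools, math

def default_boxes(fmap_size: int, scale: float, aspect_ratios=(1.0, 2.0, 0.5)):
    """Enumerate (cx, cy, w, h) default boxes, normalized to [0, 1], for one
    square feature map: one box per aspect ratio at every cell."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

# An 8x8 map with 3 aspect ratios yields 8*8*3 = 192 default boxes; the
# detector then predicts class scores and box offsets for every one of them.
print(len(default_boxes(8, scale=0.2)))
```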
Journal ArticleDOI

Squeeze-and-Excitation Networks

TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
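The recalibration described in the TL;DR reduces to a global pooling step ("squeeze"), a small bottleneck MLP ("excitation"), and a channel-wise rescaling. A minimal PyTorch sketch follows; the reduction ratio and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool, bottleneck MLP with a
    sigmoid gate, then channel-wise rescaling of the input (sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # recalibrate each channel by its learned gate

print(SEBlock(64)(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```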
Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy, and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes, and large-scale geo-localization.
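The two global hyper-parameters referred to in the TL;DR are a width multiplier that thins every layer's channels and a resolution multiplier that shrinks the feature maps. The rough cost model below, with illustrative layer sizes, shows how they trade compute for accuracy; it is a hedged sketch of the idea, not the paper's exact accounting.

```python
def mobilenet_layer_madds(dk, m, n, df, alpha=1.0, rho=1.0):
    """Approximate multiply-adds of one depthwise-separable layer with kernel
    size dk, m input channels, n output channels, and a df x df feature map,
    under a width multiplier `alpha` and resolution multiplier `rho`."""
    m, n = int(alpha * m), int(alpha * n)
    df = int(rho * df)
    depthwise = dk * dk * m * df * df   # per-channel spatial convolution
    pointwise = m * n * df * df         # 1x1 cross-channel convolution
    return depthwise + pointwise

full = mobilenet_layer_madds(3, 512, 512, 14)
slim = mobilenet_layer_madds(3, 512, 512, 14, alpha=0.5, rho=0.714)
# Cost falls roughly quadratically in both multipliers.
print(full, slim, slim / full)
```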
Proceedings ArticleDOI

Xception: Deep Learning with Depthwise Separable Convolutions

TL;DR: This work proposes a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions, and shows that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms it on a larger image classification dataset.
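A depthwise separable convolution, the building block referred to above, is a per-channel spatial convolution followed by a 1x1 pointwise convolution. The minimal PyTorch sketch below, with illustrative channel counts, compares its parameter count against a dense 3x3 convolution producing the same output shape.

```python
import torch.nn as nn

def separable_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Depthwise (groups=in_ch) spatial conv, then 1x1 cross-channel mix."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
    )

dense = nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=False)
sep   = separable_conv(128, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(sep))  # 294912 vs 33920 weights
```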
References
Proceedings Article

Learning a Deep Compact Image Representation for Visual Tracking

TL;DR: Comparison with state-of-the-art trackers on challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost, achieving real-time performance when its MATLAB implementation runs on a modest graphics processing unit (GPU).
Proceedings ArticleDOI

Fast Algorithms for Convolutional Neural Networks

TL;DR: A new class of fast algorithms for convolutional neural networks is introduced, based on Winograd's minimal filtering algorithms, which compute minimal-complexity convolution over small tiles and are therefore fast for small filters and small batch sizes.
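The smallest instance of Winograd's minimal filtering, F(2, 3), computes two outputs of a 1-D convolution with a 3-tap filter using four multiplications instead of six; the 2-D tile algorithms are built by nesting this transform. Below is a minimal NumPy sketch of F(2, 3), not the paper's implementation.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2, 3): two outputs of a 1-D, 3-tap filtering of a 4-element
    input tile using 4 multiplications (m1..m4) instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d, g = np.random.randn(4), np.random.randn(3)
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```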
Proceedings Article

Compressing Neural Networks with the Hashing Trick

TL;DR: HashedNets uses a hash function to randomly group connection weights into hash buckets, and all connections within the same bucket share a single parameter value, which is tuned during training with standard backpropagation.
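The hashing trick described in the TL;DR can be sketched as a layer whose virtual weight matrix is an indexed view into a small vector of shared parameters. The sketch below uses a random bucket assignment as a stand-in for the fixed hash function and omits training; in the real method, gradients simply accumulate into the shared buckets during backpropagation.

```python
import numpy as np

class HashedLayer:
    """Fully connected layer whose virtual weight matrix is backed by a small
    set of shared parameters: each (i, j) connection maps to a bucket index,
    so many connections reuse one trainable value (illustrative sketch)."""
    def __init__(self, n_in, n_out, n_buckets, seed=0):
        rng = np.random.default_rng(seed)
        self.shared = rng.standard_normal(n_buckets) * 0.01   # real parameters
        # Fixed (non-learned) mapping from connection index to bucket index;
        # a stand-in for the deterministic hash function used by HashedNets.
        self.bucket = rng.integers(0, n_buckets, size=(n_in, n_out))

    def forward(self, x):
        W = self.shared[self.bucket]   # expand buckets into a full weight matrix
        return x @ W

layer = HashedLayer(n_in=784, n_out=100, n_buckets=500)
print(layer.forward(np.random.randn(2, 784)).shape)  # (2, 100); only 500 weights stored
```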
Proceedings ArticleDOI

Ontological supervision for fine grained classification of Street View storefronts

TL;DR: This work utilizes an ontology of geographical concepts to automatically propagate business category information and create a large, multi-label training dataset for fine-grained storefront classification, achieving human-level accuracy.
Journal ArticleDOI

SVD-NET: an algorithm that automatically selects network structure

TL;DR: An algorithm is developed for training feedforward neural networks that uses singular value decomposition (SVD) to identify and eliminate redundant hidden nodes, producing models that generalize better and eliminating the need for cross-validation to avoid overfitting.
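The core observation behind the TL;DR can be illustrated numerically: if some hidden nodes are linear combinations of others, the singular values of the hidden-activation matrix reveal how many independent nodes are actually needed. The NumPy sketch below is illustrative only, not the paper's exact training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((1000, 4))                     # activations of 4 "useful" hidden nodes
H = np.hstack([H, H @ rng.standard_normal((4, 6))])    # 6 extra nodes are linear combinations

# Near-zero singular values flag redundant hidden nodes that can be pruned.
s = np.linalg.svd(H, compute_uv=False)
effective_rank = int(np.sum(s > 1e-8 * s[0]))
print(effective_rank)  # 4 -> six of the ten hidden nodes are redundant
```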