Going deeper with convolutions

doi:10.1109/CVPR.2015.7298594

Open Access Proceedings Article

- pp 1-9

TLDR

Inception, the architecture proposed in this paper, is a deep convolutional neural network that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Abstract:

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
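The abstract does not spell out how depth and width can grow under a fixed computational budget. The sketch below is one way to make the idea concrete, under stated assumptions: a block with parallel 1x1, 3x3, 5x5 convolution branches and a pooling branch (the multi-scale processing idea), concatenated along the channel axis, with 1x1 convolutions reducing the channel count before the larger filters. The function names, the 28x28x192 input and the branch widths are illustrative choices, not values taken from this page.

```python
# Illustrative cost model only; all sizes below are assumptions for the example.

def conv_madds(h, w, c_in, c_out, k):
    """Multiply-adds of a stride-1, same-padded k x k convolution."""
    return h * w * c_in * c_out * k * k


def reduced_block(h, w, c_in, n1, r3, n3, r5, n5, pool_proj):
    """Four parallel branches with 1x1 reductions before 3x3 and 5x5 filters.

    Returns (output_channels, multiply_adds).
    """
    cost = conv_madds(h, w, c_in, n1, 1)            # 1x1 branch
    cost += conv_madds(h, w, c_in, r3, 1)           # 1x1 reduction
    cost += conv_madds(h, w, r3, n3, 3)             # 3x3 on reduced input
    cost += conv_madds(h, w, c_in, r5, 1)           # 1x1 reduction
    cost += conv_madds(h, w, r5, n5, 5)             # 5x5 on reduced input
    cost += conv_madds(h, w, c_in, pool_proj, 1)    # 1x1 after max pooling
    return n1 + n3 + n5 + pool_proj, cost


def naive_block(h, w, c_in, n1, n3, n5):
    """Same branch widths without reductions; pooling keeps all c_in channels."""
    cost = conv_madds(h, w, c_in, n1, 1)
    cost += conv_madds(h, w, c_in, n3, 3)
    cost += conv_madds(h, w, c_in, n5, 5)
    return n1 + n3 + n5 + c_in, cost


reduced_channels, reduced_cost = reduced_block(28, 28, 192, 64, 96, 128, 16, 32, 32)
naive_channels, naive_cost = naive_block(28, 28, 192, 64, 128, 32)

print(f"reduced: {reduced_channels} channels, {reduced_cost:,} multiply-adds")
print(f"naive:   {naive_channels} channels, {naive_cost:,} multiply-adds")
```

With these illustrative sizes, the reduced block costs well under half as much as the naive one, and its output width does not grow with the input width. That leaves budget for stacking more such blocks.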

Citations

Proceedings Article

MovieQA: Understanding Stories in Movies through Question-Answering

Makarand Tapaswi, +5 more

TL;DR: The MovieQA dataset consists of 14,944 questions about 408 movies with high semantic diversity, ranging from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred.

Proceedings Article

Deep Double Descent: Where Bigger Models and More Data Hurt

Preetum Nakkiran, +5 more

TL;DR: This work defines a new complexity measure, the effective model complexity, conjectures a generalized double descent with respect to this measure, and uses it to identify regimes where increasing the number of training samples actually hurts test performance.

Proceedings Article

Multi-level Factorisation Net for Person Re-identification

Xiaobin Chang, +2 more

TL;DR: Multi-Level Factorisation Net (MLFN) is a novel network architecture that factorises the visual appearance of a person into latent discriminative factors at multiple semantic levels without manual annotation.

Proceedings Article

Learning distributed representations of sentences from unlabelled data

Felix Hill, +2 more

TL;DR: A systematic comparison of models that learn distributed representations of sentences from unlabelled data, showing that shallow log-linear models work best for building representation spaces that can be decoded with simple spatial distance metrics.

Journal Article

Deep Multimodal Learning: A Survey on Recent Advances and Trends

Dhanesh Ramachandram, +1 more

- 09 Nov 2017 -

IEEE Signal Processing Magazine

TL;DR: This work first classifies deep multimodal learning architectures and then discusses methods for fusing learned multimodal representations in deep-learning architectures.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Journal Article

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

TL;DR: Multilayer neural networks trained with gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters; the paper also proposes graph transformer networks (GTNs) for globally training multi-module document recognition systems.

Journal Article

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996 -

Journal of the royal statistical society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, built from images of complex everyday scenes containing common objects in their natural context.
