Object Detectors Emerge in Deep Scene CNNs

Open AccessProceedings Article

Object Detectors Emerge in Deep Scene CNNs

Chats0

TLDR

This work demonstrates that the same network can perform both scene recognition and object localization in a single forward-pass, without ever having been explicitly taught the notion of objects.

Abstract:

With the success of new computational architectures for visual processing, such as convolutional neural networks (CNN) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. One important factor for continued progress is to understand the representations that are learned by the inner layers of these deep architectures. Here we show that object detectors emerge from training CNNs to perform scene classification. As scenes are composed of objects, the CNN for scene classification automatically discovers meaningful objects detectors, representative of the learned scene categories. With object detectors emerging as a result of learning to recognize scenes, our work demonstrates that the same network can perform both scene recognition and object localization in a single forward-pass, without ever having been explicitly taught the notion of objects.

Citations

PDF

Open Access

More filters

Book ChapterDOI

SSD: Single Shot MultiBox Detector

Wei Liu, +6 more

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.

...read moreread less

Book ChapterDOI

SSD: Single Shot MultiBox Detector

Wei Liu, +6 more

- 08 Dec 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: SSD as mentioned in this paper discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, and combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.

...read moreread less

Proceedings ArticleDOI

Pyramid Scene Parsing Network

Hengshuang Zhao, +4 more

TL;DR: This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.

...read moreread less

Proceedings ArticleDOI

Deep Learning Face Attributes in the Wild

Ziwei Liu, +3 more

TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.

...read moreread less

Proceedings ArticleDOI

Learning Deep Features for Discriminative Localization

Bolei Zhou, +4 more

TL;DR: This work revisits the global average pooling layer proposed in [13], and sheds light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on imagelevel labels.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

...read moreread less

Journal ArticleDOI

Learning Hierarchical Features for Scene Labeling

Clement Farabet, +3 more

- 01 Aug 2013 -

IEEE Transactions on Pattern Analysis an...

TL;DR: A method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel, alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information.

...read moreread less

Book ChapterDOI

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

Pulkit Agrawal, +2 more

TL;DR: This paper experimentally probes several aspects of CNN feature learning in an attempt to help practitioners gain useful, evidence-backed intuitions about how to apply CNNs to computer vision problems.

...read moreread less

International Journal of Computer Vision

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

Going deeper with convolutions

Christian Szegedy, +8 more

Object Detectors Emerge in Deep Scene CNNs

Citations

SSD: Single Shot MultiBox Detector

SSD: Single Shot MultiBox Detector

Pyramid Scene Parsing Network

Deep Learning Face Attributes in the Wild

Learning Deep Features for Discriminative Localization

References

ImageNet: A large-scale hierarchical image database

Learning Hierarchical Features for Scene Labeling

Analyzing the Performance of Multilayer Neural Networks for Object Recognition

Related Papers (5)

Deep Residual Learning for Image Recognition

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Large Scale Visual Recognition Challenge

Very Deep Convolutional Networks for Large-Scale Image Recognition

Going deeper with convolutions