SUN Database: Exploring a Large Collection of Scene Categories

doi:10.1007/S11263-014-0748-Y

Journal Article (Open Access)

Jianxiong Xiao, +4 more

- 01 Aug 2016 -

International Journal of Computer Vision

- Vol. 119, Iss: 1, pp 3-22

TLDR

The paper proposes the Scene Understanding (SUN) database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse, containing 908 distinct scene categories and 131,072 images.

Abstract:

Progress in scene understanding requires reasoning about the rich and diverse visual environments that make up our daily experience. To this end, we propose the Scene Understanding database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse. The database contains 908 distinct scene categories and 131,072 images. Given this data, with both scene and object labels available, we perform an in-depth analysis of co-occurrence statistics and contextual relationships. To better understand this large-scale taxonomy of scene categories, we perform two human experiments: we quantify human scene recognition accuracy, and we measure how typical each image is of its assigned scene category. Next, we perform computational experiments: scene recognition with global image features, indoor versus outdoor classification, and "scene detection," in which we relax the assumption that one image depicts only one scene category. Finally, we relate human experiments to machine performance and explore the relationship between human and machine recognition errors and the relationship between image "typicality" and machine recognition accuracy.
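The abstract mentions co-occurrence statistics computed from the object labels. The minimal Python sketch below illustrates the general idea and is not the authors' code. It assumes each image's annotations have already been reduced to a list of object class names (a hypothetical input format), and the function names are also hypothetical. It counts how often object classes appear together and estimates a conditional probability P(target | given) from those counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(image_objects):
    """Count object-class presence and pairwise co-occurrence across images.

    image_objects: list of per-image object-label lists (hypothetical format).
    Repeated labels within one image are counted once (presence only).
    Returns (single_counts, pair_counts) as Counters.
    """
    single = Counter()
    pairs = Counter()
    for labels in image_objects:
        present = sorted(set(labels))  # presence only, ignore instance counts
        single.update(present)
        # Sorted order makes each unordered pair a canonical (a, b) key
        pairs.update(combinations(present, 2))
    return single, pairs


def conditional_prob(single, pairs, given, target):
    """Estimate P(target present | given present) from the counts."""
    if single[given] == 0:
        return 0.0
    key = tuple(sorted((given, target)))
    return pairs[key] / single[given]


# Toy annotations for four images (illustrative only)
images = [
    ["bed", "lamp", "wall", "lamp"],
    ["bed", "wall", "window"],
    ["car", "road", "sky"],
    ["sky", "road", "tree"],
]
single, pairs = cooccurrence_stats(images)
print(pairs[("bed", "wall")])                           # 2
print(conditional_prob(single, pairs, "bed", "lamp"))   # 0.5
print(conditional_prob(single, pairs, "road", "sky"))   # 1.0
```

Storing pairs under a sorted key keeps the pair table symmetric, which is why a single count serves both P(A | B) and P(B | A).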

Figures

Fig. 22 For each feature, we plot the proportion of categories for which the largest incorrect (off-diagonal) confusion is the same category as the largest human confusion.
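The proportion plotted in Fig. 22 can be computed with the minimal Python sketch below. It is not the authors' code. It assumes both confusion matrices are square lists of lists with rows indexing the true category and columns the predicted one, and the function names are hypothetical. Ties are broken toward the lowest column index, which is a choice the caption does not specify.

```python
def top_off_diagonal(row, i):
    """Column index of the largest entry in `row`, skipping the diagonal column i."""
    candidates = [j for j in range(len(row)) if j != i]
    # max() returns the first maximal candidate, so ties go to the lowest index
    return max(candidates, key=lambda j: row[j])


def top_confusion_agreement(machine_cm, human_cm):
    """Fraction of categories whose largest incorrect (off-diagonal) machine
    confusion is the same category as the largest human confusion."""
    n = len(machine_cm)
    matches = sum(
        top_off_diagonal(machine_cm[i], i) == top_off_diagonal(human_cm[i], i)
        for i in range(n)
    )
    return matches / n


# Toy 3-category example (rows = true class, columns = predicted class)
human = [[8, 2, 0], [1, 7, 2], [0, 3, 7]]
machine = [[6, 3, 1], [4, 5, 1], [1, 2, 7]]
print(round(top_confusion_agreement(machine, human), 3))  # 0.667
```

Evaluating this once per feature's confusion matrix, against the same human confusion matrix, yields one proportion per feature.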

Fig. 7 Per-image object statistics in the SUN database.

Fig. 23 Selected SUN scene classification results using all features.

Fig. 1 Examples of scene categories in our SUN database.

Fig. 8 Average object scale compared to PASCAL VOC and ImageNet ILSVRC. SUN polygon is computed using the normalized area of the bounding polygon, and SUN box is computed using the bounding box around the object. The blue text lists examples of object categories with very small areas in a typical image, with their average scale in per mil (‰) of image pixels.
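The two scale measures in the Fig. 8 caption differ only in which area is normalized. The minimal Python sketch below is not the authors' code. It assumes polygon vertices are given in pixel coordinates, and it computes the polygon area geometrically with the shoelace formula rather than by counting rasterized pixels. All function names are hypothetical. It returns both scales in per mil of the image area.

```python
def polygon_area(xs, ys):
    """Area of a simple polygon via the shoelace formula."""
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


def bbox_area(xs, ys):
    """Area of the axis-aligned bounding box around the polygon vertices."""
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def object_scale_per_mil(xs, ys, img_w, img_h):
    """Return (polygon scale, box scale) in per mil (parts per thousand) of image pixels."""
    img_area = img_w * img_h
    return (1000.0 * polygon_area(xs, ys) / img_area,
            1000.0 * bbox_area(xs, ys) / img_area)


# A right-triangle annotation (legs 40 and 30 px) in a 200 x 100 image
xs = [10, 50, 10]
ys = [10, 10, 40]
poly, box = object_scale_per_mil(xs, ys, 200, 100)
print(poly, box)  # 30.0 60.0
```

For any non-rectangular object, the box scale exceeds the polygon scale. This is why the caption reports "SUN polygon" and "SUN box" as separate curves. A per-category average would then be taken over all instances of that category.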

Fig. 14 Top row: SUN categories with the lowest human recognition rate. Below each of these categories, in the remaining three rows, are the most confusing classes for that category.

Citations

Proceedings Article

ActivityNet: A large-scale video benchmark for human activity understanding

Fabian Caba Heilbron, +3 more

TL;DR: This paper introduces ActivityNet, a new large-scale video benchmark for human activity understanding that aims at covering a wide range of complex human activities that are of interest to people in their daily living.

Journal Article

Deep Learning for Generic Object Detection: A Survey

Li Liu, +7 more

- 01 Feb 2020 -

International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

Posted Content

YouTube-8M: A Large-Scale Video Classification Benchmark

Sami Abu-El-Haija, +6 more

- 27 Sep 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: YouTube-8M is introduced: the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video) annotated with a vocabulary of 4800 visual entities; various (modest) classification models are trained on the dataset.

Posted Content

Object Detectors Emerge in Deep Scene CNNs

Bolei Zhou, +4 more

- 22 Dec 2014 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: In this paper, the authors show that object detectors emerge from training CNNs to perform scene classification, and demonstrate that the same network can perform both scene recognition and object localization in a single forward pass without ever having been explicitly taught the notion of objects.

Proceedings Article

COCO-Stuff: Thing and Stuff Classes in Context

Holger Caesar, +2 more

TL;DR: COCO-Stuff augments all 164k images of the COCO 2017 dataset with pixel-wise annotations for 91 stuff classes, leveraging the original thing annotations.

References

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Proceedings Article

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

Journal Article

The Pascal Visual Object Classes (VOC) Challenge

Mark Everingham, +4 more

- 01 Jun 2010 -

International Journal of Computer Vision

TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, along with whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confusing.

Journal Article

Multiresolution gray-scale and rotation invariant texture classification with local binary patterns

Timo Ojala, +2 more

- 01 Jul 2002 -

IEEE Transactions on Pattern Analysis an...

TL;DR: A generalized gray-scale and rotation-invariant operator is presented that allows for detecting "uniform" patterns for any quantization of the angular space and for any spatial resolution, along with a method for combining multiple operators for multiresolution analysis.

Journal Article

WordNet: An Electronic Lexical Database

Christiane Fellbaum

- 01 Sep 2000 -

Language

TL;DR: Chapters covering the lexical database (nouns in WordNet), a semantic network of English verbs, and applications of WordNet such as building semantic concordances are presented.

Related Papers (5)

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision
