scispace - formally typeset
Open AccessJournal ArticleDOI

SUN Database: Exploring a Large Collection of Scene Categories

Reads0
Chats0
TLDR
The Scene Understanding database is proposed, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse that contains 908 distinct scene categories and 131,072 images.
Abstract
Progress in scene understanding requires reasoning about the rich and diverse visual environments that make up our daily experience. To this end, we propose the Scene Understanding database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse. The database contains 908 distinct scene categories and 131,072 images. Given this data with both scene and object labels available, we perform in-depth analysis of co-occurrence statistics and the contextual relationship. To better understand this large scale taxonomy of scene categories, we perform two human experiments: we quantify human scene recognition accuracy, and we measure how typical each image is of its assigned scene category. Next, we perform computational experiments: scene recognition with global image features, indoor versus outdoor classification, and "scene detection," in which we relax the assumption that one image depicts only one scene category. Finally, we relate human experiments to machine performance and explore the relationship between human and machine recognition errors and the relationship between image "typicality" and machine recognition accuracy.

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

ActivityNet: A large-scale video benchmark for human activity understanding

TL;DR: This paper introduces ActivityNet, a new large-scale video benchmark for human activity understanding that aims at covering a wide range of complex human activities that are of interest to people in their daily living.
Journal ArticleDOI

Deep Learning for Generic Object Detection: A Survey

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Posted Content

YouTube-8M: A Large-Scale Video Classification Benchmark

TL;DR: YouTube-8M is introduced, the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video), annotated with a vocabulary of 4800 visual entities, and various (modest) classification models are trained on the dataset.
Posted Content

Object Detectors Emerge in Deep Scene CNNs

TL;DR: In this paper, the authors show that object detectors emerge from training CNNs to perform scene classification, and demonstrate that the same network can perform both scene recognition and object localization in a single forward pass without ever having been explicitly taught the notion of objects.
Proceedings ArticleDOI

COCO-Stuff: Thing and Stuff Classes in Context

TL;DR: COCO-Stuff as mentioned in this paper augments all 164k images of the COCO 2017 dataset with pixel-wise annotations for 91 stuff classes, which leverages the original thing annotations.
References
More filters
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal ArticleDOI

The Pascal Visual Object Classes (VOC) Challenge

TL;DR: The state-of-the-art in evaluated methods for both classification and detection are reviewed, whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Journal ArticleDOI

Multiresolution gray-scale and rotation invariant texture classification with local binary patterns

TL;DR: A generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis.
Journal ArticleDOI

WordNet : an electronic lexical database

Christiane Fellbaum
- 01 Sep 2000 - 
TL;DR: The lexical database: nouns in WordNet, Katherine J. Miller a semantic network of English verbs, and applications of WordNet: building semantic concordances are presented.
Related Papers (5)