SUN Database: Exploring a Large Collection of Scene Categories
Reads0
Chats0
TLDR
The Scene Understanding database is proposed, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse that contains 908 distinct scene categories and 131,072 images.Abstract:
Progress in scene understanding requires reasoning about the rich and diverse visual environments that make up our daily experience. To this end, we propose the Scene Understanding database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse. The database contains 908 distinct scene categories and 131,072 images. Given this data with both scene and object labels available, we perform in-depth analysis of co-occurrence statistics and the contextual relationship. To better understand this large scale taxonomy of scene categories, we perform two human experiments: we quantify human scene recognition accuracy, and we measure how typical each image is of its assigned scene category. Next, we perform computational experiments: scene recognition with global image features, indoor versus outdoor classification, and "scene detection," in which we relax the assumption that one image depicts only one scene category. Finally, we relate human experiments to machine performance and explore the relationship between human and machine recognition errors and the relationship between image "typicality" and machine recognition accuracy.read more
Citations
More filters
Proceedings ArticleDOI
ActivityNet: A large-scale video benchmark for human activity understanding
TL;DR: This paper introduces ActivityNet, a new large-scale video benchmark for human activity understanding that aims at covering a wide range of complex human activities that are of interest to people in their daily living.
Journal ArticleDOI
Deep Learning for Generic Object Detection: A Survey
Li Liu,Li Liu,Wanli Ouyang,Xiaogang Wang,Paul Fieguth,Jie Chen,Xinwang Liu,Matti Pietikäinen +7 more
TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Posted Content
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija,Nisarg Dilipkumar Kothari,Joonseok Lee,Apostol Natsev,George Toderici,Balakrishnan Varadarajan,Sudheendra Vijayanarasimhan +6 more
TL;DR: YouTube-8M is introduced, the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video), annotated with a vocabulary of 4800 visual entities, and various (modest) classification models are trained on the dataset.
Posted Content
Object Detectors Emerge in Deep Scene CNNs
TL;DR: In this paper, the authors show that object detectors emerge from training CNNs to perform scene classification, and demonstrate that the same network can perform both scene recognition and object localization in a single forward pass without ever having been explicitly taught the notion of objects.
Proceedings ArticleDOI
COCO-Stuff: Thing and Stuff Classes in Context
TL;DR: COCO-Stuff as mentioned in this paper augments all 164k images of the COCO 2017 dataset with pixel-wise annotations for 91 stuff classes, which leverages the original thing annotations.
References
More filters
Proceedings ArticleDOI
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI
Histograms of oriented gradients for human detection
Navneet Dalal,Bill Triggs +1 more
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal ArticleDOI
The Pascal Visual Object Classes (VOC) Challenge
TL;DR: The state-of-the-art in evaluated methods for both classification and detection are reviewed, whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Journal ArticleDOI
Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
TL;DR: A generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis.
Journal ArticleDOI
WordNet : an electronic lexical database
TL;DR: The lexical database: nouns in WordNet, Katherine J. Miller a semantic network of English verbs, and applications of WordNet: building semantic concordances are presented.