Image retrieval using scene graphs

doi:10.1109/CVPR.2015.7298990

Proceedings ArticleDOI

Image retrieval using scene graphs

- pp 3668-3678

TLDR

A conditional random field model that reasons about possible groundings of scene graphs to test images and shows that the full model can be used to improve object localization compared to baseline methods and outperforms retrieval methods that use only objects or low-level image features.

Abstract:

This paper develops a novel framework for semantic image retrieval based on the notion of a scene graph. Our scene graphs represent objects (“man”, “boat”), attributes of objects (“boat is white”) and relationships between objects (“man standing on boat”). We use these scene graphs as queries to retrieve semantically related images. To this end, we design a conditional random field model that reasons about possible groundings of scene graphs to test images. The likelihoods of these groundings are used as ranking scores for retrieval. We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval. In particular, we evaluate retrieval using full scene graphs and small scene subgraphs, and show that our method outperforms retrieval methods that use only objects or low-level image features. In addition, we show that our full model can be used to improve object localization compared to baseline methods.

Citations

PDF

Open Access

More filters

Journal ArticleDOI

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna, +11 more

- 01 May 2017 -

International Journal of Computer Vision

TL;DR: The Visual Genome dataset as mentioned in this paper contains over 108k images where each image has an average of $35$35 objects, $26$26 attributes, and $21$21 pairwise relationships between objects.

...read moreread less

Journal ArticleDOI

Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age

Cesar Cadena, +7 more

- 01 Dec 2016 -

IEEE Transactions on Robotics

TL;DR: Simultaneous localization and mapping (SLAM) as mentioned in this paper consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it.

...read moreread less

Journal ArticleDOI

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Cesar Cadena, +7 more

- 19 Jun 2016 -

arXiv: Robotics

TL;DR: What is now the de-facto standard formulation for SLAM is presented, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers.

...read moreread less

Posted Content

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna, +11 more

- 23 Feb 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The Visual Genome dataset is presented, which contains over 108K images where each image has an average of $$35$$35 objects, $$26$$26 attributes, and $$21$$21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.

...read moreread less

Proceedings ArticleDOI

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Justin Johnson, +5 more

TL;DR: In this paper, the authors present a diagnostic dataset that tests a range of visual reasoning abilities and provides insights into their abilities and limitations, and use this dataset to analyze a variety of modern visual reasoning systems.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

...read moreread less

Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe

- 01 Nov 2004 -

International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.

...read moreread less

Book ChapterDOI

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.

...read moreread less

Collapse

Related Papers (5)

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna, +11 more

- 01 May 2017 -

International Journal of Computer Vision

Image retrieval using scene graphs

Citations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

References

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet: A large-scale hierarchical image database

Distinctive Image Features from Scale-Invariant Keypoints

ImageNet Large Scale Visual Recognition Challenge

Microsoft COCO: Common Objects in Context

Related Papers (5)

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Deep Residual Learning for Image Recognition

Microsoft COCO: Common Objects in Context

Faster R-CNN: towards real-time object detection with region proposal networks

Deep visual-semantic alignments for generating image descriptions