Spatial Memory for Context Reasoning in Object Detection

doi:10.1109/ICCV.2017.440

Open Access Proceedings Article

Xinlei Chen, +1 more

- pp 4106-4116

TLDR

The Spatial Memory Network (SMN) proposed in this paper assembles object instances back into a pseudo-image representation that can easily be fed into another ConvNet for object-object context reasoning.

Abstract:

Modeling instance-level context and object-object relationships is extremely challenging. It requires reasoning about bounding boxes of different classes, locations, etc. Above all, instance-level spatial reasoning inherently requires modeling conditional distributions on previous detections. Unfortunately, our current object detection systems do not have any memory to remember what to condition on! The state-of-the-art object detectors still detect all objects in parallel, followed by non-maximal suppression (NMS). While memory has been used for tasks such as captioning, these approaches mostly use image-level memory cells without capturing the spatial layout. On the other hand, modeling object-object relationships requires spatial reasoning: not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns. This paper presents a conceptually simple yet powerful solution, the Spatial Memory Network (SMN), to model instance-level context efficiently and effectively. Our spatial memory essentially assembles object instances back into a pseudo "image" representation that is easy to feed into another ConvNet for object-object context reasoning. This leads to a new sequential reasoning architecture where image and memory are processed in parallel to obtain detections, which in turn update the memory. We show our SMN direction is promising as it provides a 2.2% improvement over baseline Faster RCNN on the COCO dataset with VGG16.
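To make the idea of assembling detections into a pseudo "image" concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name write_to_memory, the memory grid size, and the element-wise maximum update are all hypothetical choices, since the abstract does not specify how the memory is updated.

```python
import numpy as np


def write_to_memory(memory, box, feature, image_size):
    """Write one detection's feature vector into the memory cells its box covers.

    memory:     (H, W, C) array acting as the pseudo-image (modified in place).
    box:        (x1, y1, x2, y2) in image pixel coordinates.
    feature:    (C,) array describing the detected instance.
    image_size: (img_h, img_w) of the original image.

    Assumption: cells are updated with an element-wise maximum. This is a
    placeholder rule chosen for simplicity, not the update used in the paper.
    """
    mem_h, mem_w, _ = memory.shape
    img_h, img_w = image_size
    x1, y1, x2, y2 = box

    # Map image coordinates onto the coarser memory grid.
    c1 = min(int(np.floor(x1 / img_w * mem_w)), mem_w - 1)
    r1 = min(int(np.floor(y1 / img_h * mem_h)), mem_h - 1)
    # Every box covers at least one cell and never leaves the grid.
    c2 = min(max(c1 + 1, int(np.ceil(x2 / img_w * mem_w))), mem_w)
    r2 = min(max(r1 + 1, int(np.ceil(y2 / img_h * mem_h))), mem_h)

    # The (C,) feature broadcasts over the covered (rows, cols) region.
    memory[r1:r2, c1:c2] = np.maximum(memory[r1:r2, c1:c2], feature)
    return memory


# Sequential use: detections are written one at a time, so each later
# detection could be conditioned on the layout stored so far.
memory = np.zeros((4, 4, 2))
detections = [
    ((0, 0, 50, 50), np.array([1.0, 2.0])),
    ((25, 25, 75, 75), np.array([3.0, 0.0])),
]
for box, feat in detections:
    write_to_memory(memory, box, feat, image_size=(100, 100))
```

Because the memory keeps the spatial layout of the image, it has the same kind of shape as a convolutional feature map, which is what allows a second ConvNet to reason over it.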

Citations

Journal Article

Deep Learning for Generic Object Detection: A Survey

Li Liu, +7 more

- 01 Feb 2020 -

International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

Proceedings Article

Relation Networks for Object Detection

Han Hu, +4 more

TL;DR: In this article, the authors propose an object relation module to model relations between objects, which is shown to be effective at improving the object recognition and duplicate removal steps in the modern object detection pipeline.

Journal Article

Recent Advances in Deep Learning for Object Detection

Xiongwei Wu, +3 more

- 05 Jul 2020 -

Neurocomputing

TL;DR: A comprehensive survey of recent advances in visual object detection with deep learning can be found in this article, where the authors systematically analyze the existing object detection frameworks and organize the survey into three major parts: detection components, learning strategies, and applications and benchmarks.

Posted Content

Relation Networks for Object Detection

Han Hu, +4 more

- 30 Nov 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: An object relation module is proposed that processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing modeling of their relations, which gives rise to the first fully end-to-end object detector.

Journal Article

Recent advances in small object detection based on deep learning: A review

Kang Tong, +2 more

- 01 May 2020 -

Image and Vision Computing

TL;DR: This work comprehensively reviews the existing small object detection methods based on deep learning from five aspects: multi-scale feature learning, data augmentation, training strategy, context-based detection, and GAN-based detection.

References

Proceedings Article

Where to Look: Focus Regions for Visual Question Answering

Kevin J. Shih, +2 more

TL;DR: A method that learns to answer visual questions by selecting image regions relevant to the text-based query. It exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions.

Proceedings Article

End-to-End People Detection in Crowded Scenes

Russell Stewart, +2 more

TL;DR: This work proposes a model based on decoding an image into a set of people detections; it takes an image as input and directly outputs a set of distinct detection hypotheses.

Proceedings Article

Active Object Localization with Deep Reinforcement Learning

Juan C. Caicedo, +1 more

TL;DR: In this paper, an active detection model is proposed for localizing objects in scenes, which allows an agent to focus attention on candidate regions for identifying the correct location of a target object.

Book Chapter

Grounding of Textual Phrases in Images by Reconstruction

Anna Rohrbach, +5 more

TL;DR: A novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly, and demonstrates the effectiveness on the Flickr 30k Entities and ReferItGame datasets.

Proceedings Article

Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

Kevin Murphy, +2 more

TL;DR: This work presents a conditional random field for jointly solving the tasks of object detection and scene classification, and proposes to use the scene context as an extra source of (global) information, to help resolve local ambiguities.

Related Papers (5)

Deep Residual Learning for Image Recognition

Microsoft COCO: Common Objects in Context

SSD: Single Shot MultiBox Detector

Feature Pyramid Networks for Object Detection

The Pascal Visual Object Classes (VOC) Challenge