Open Access Proceedings Article

Inferring and Executing Programs for Visual Reasoning

TLDR
In this article, the authors propose a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer.
Abstract
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.
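The abstract describes an architecture rather than an API, so the following is only a minimal sketch of how the two stages could fit together in PyTorch, assuming toy vocabulary sizes, module counts, and feature dimensions; it is not the authors' code. The greedy argmax over program tokens is non-differentiable, which is why the paper trains the program generator with REINFORCE while the execution engine is trained with ordinary backpropagation.

```python
# Toy sketch of the two-stage design described above (not the authors' released
# code): a seq2seq program generator and a modular execution engine. All module
# counts, vocabulary sizes, and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class ProgramGenerator(nn.Module):
    """Encodes a question and greedily decodes a sequence of program tokens."""
    def __init__(self, vocab_size, num_prog_tokens, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(hidden, hidden)
        self.prog_embed = nn.Embedding(num_prog_tokens, hidden)
        self.out = nn.Linear(hidden, num_prog_tokens)

    def forward(self, question_tokens, max_len=5):
        _, (h, c) = self.encoder(self.embed(question_tokens))
        h, c = h.squeeze(0), c.squeeze(0)
        tok = torch.zeros(question_tokens.size(0), dtype=torch.long)  # <start>
        program = []
        for _ in range(max_len):
            h, c = self.decoder(self.prog_embed(tok), (h, c))
            tok = self.out(h).argmax(dim=-1)  # greedy; training would sample + REINFORCE
            program.append(tok)
        return torch.stack(program, dim=1)    # (batch, max_len) module indices

class ExecutionEngine(nn.Module):
    """Applies one small convolutional module per program token to image features."""
    def __init__(self, num_prog_tokens, feat_dim=64, num_answers=10):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU())
             for _ in range(num_prog_tokens)])
        self.classifier = nn.Linear(feat_dim, num_answers)

    def forward(self, image_feats, program):
        x = image_feats
        for t in range(program.size(1)):
            # each sample goes through the module its own program selects at step t
            x = torch.stack([self.blocks[int(program[b, t])](x[b:b + 1]).squeeze(0)
                             for b in range(x.size(0))])
        return self.classifier(x.mean(dim=(2, 3)))  # pool features, predict answer

# toy forward pass: 2 questions of length 6 over 14x14 image feature maps
generator = ProgramGenerator(vocab_size=100, num_prog_tokens=8)
engine = ExecutionEngine(num_prog_tokens=8)
questions = torch.randint(0, 100, (2, 6))
features = torch.randn(2, 64, 14, 14)
print(engine(features, generator(questions)).shape)  # torch.Size([2, 10])
```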


Citations

Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach

TL;DR: In this article, a neuro-symbolic architecture for coupling language-guided visual reasoning with robot manipulation is presented, in which a non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression, a question (VQA), or a grasp-action instruction.

Benchmarking Counterfactual Reasoning Abilities about Implicit Physical Properties

TL;DR: In this paper, the authors introduce a new video question-answering task for reasoning about the implicit physical properties of objects in a scene from videos. They also introduce a dataset, CRIPP-VQA, containing videos of objects annotated with hypothetical/counterfactual questions about the effect of actions (such as removing, adding, or replacing objects) and with questions about planning (choosing actions to perform in order to reach a particular goal).
Proceedings Article

Neural Analogical Reasoning

TL;DR: Neural analogical reasoning (NAR), as discussed by the authors, is a modular approach in which elementary neural transformations operate on, and compose over, distributed representations of high-dimensional inputs; the problem can be viewed as a program-synthesis task and solved via symbolic search when represented in symbolic form.
Book Chapter

LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering

TL;DR: LRRA, as discussed by the authors, is a transparent neural-symbolic framework for visual question answering that solves complicated real-world problems step by step, as humans do, and provides a human-readable justification at each step.
Posted Content

Deep Algorithmic Question Answering: Towards a Compositionally Hybrid AI for Algorithmic Reasoning

TL;DR: Deep Algorithmic Question Answering (DAQA), as discussed by the authors, uses a hybrid approach of symbolic and sub-symbolic methods, including deep neural networks, to reason in a step-by-step "algorithmic" manner.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
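As a quick illustration of the residual-learning idea this reference introduces, here is a minimal same-channel residual block in PyTorch; layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
# A residual block learns F(x) and adds it to an identity shortcut,
# so the block outputs relu(x + F(x)); the shortcut eases optimization.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(x + residual)  # identity shortcut

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```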
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network, as discussed by the authors, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieves state-of-the-art image-classification performance.
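The summary above already spells out the layer structure, so a compact sketch may help make it concrete; the channel counts below follow the widely cited AlexNet layout, but treat them as illustrative rather than a faithful reproduction of the paper's training setup.

```python
# Five convolutional layers (some followed by max-pooling) and three
# fully-connected layers ending in a 1000-way output, as the summary describes.
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # logits; the softmax is applied by the loss
)

logits = alexnet_like(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```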
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
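To make the "constant error carousel" phrasing concrete, here is a standard LSTM cell written out explicitly; the additive cell-state update is what keeps gradients flowing over long time lags. Sizes and the 1000-step loop are illustrative.

```python
# Input, forget, and output gates control a cell state whose additive update
# preserves gradient flow across many time steps.
import torch
import torch.nn as nn

class LSTMCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)  # additive cell update
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = LSTMCellSketch(8, 16)
h = c = torch.zeros(1, 16)
for x in torch.randn(1000, 1, 8):  # 1000 time steps
    h, c = cell(x, (h, c))
print(h.shape)  # torch.Size([1, 16])
```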
Proceedings Article

Fast R-CNN

TL;DR: Fast R-CNN, as discussed by the authors, is a Fast Region-based Convolutional Network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012 than prior methods.
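One of the key innovations is shared computation: the backbone runs once per image, and a fixed-size feature is pooled for each region proposal (RoI pooling) before classification. The sketch below illustrates that flow using torchvision's roi_pool; the tiny backbone, box coordinates, and class count are illustrative assumptions.

```python
# Run the backbone once, pool each region of interest to a fixed size,
# then classify the regions from the pooled features.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.MaxPool2d(2))                      # stride-2 feature map
head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 7 * 7, 21))  # 20 classes + background

image = torch.randn(1, 3, 224, 224)
feats = backbone(image)                                        # computed once per image
# region proposals as (batch_index, x1, y1, x2, y2) in image coordinates
rois = torch.tensor([[0, 10., 10., 100., 120.],
                     [0, 50., 40., 200., 220.]])
pooled = roi_pool(feats, rois, output_size=(7, 7), spatial_scale=0.5)
print(head(pooled).shape)  # torch.Size([2, 21])
```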
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
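A minimal sketch of the encoder-decoder pattern this summary describes, assuming toy vocabulary sizes and dimensions: one LSTM compresses the source sequence into a fixed-size state, and a second LSTM decodes the target sequence from that state (teacher forcing shown).

```python
# Encode the source into a fixed-size LSTM state, then condition the decoder on it.
import torch
import torch.nn as nn

src_embed, tgt_embed = nn.Embedding(1000, 128), nn.Embedding(1000, 128)
encoder = nn.LSTM(128, 256, batch_first=True)
decoder = nn.LSTM(128, 256, batch_first=True)
project = nn.Linear(256, 1000)

src = torch.randint(0, 1000, (4, 12))           # batch of source sequences
tgt_in = torch.randint(0, 1000, (4, 9))         # shifted target sequences (teacher forcing)
_, state = encoder(src_embed(src))              # fixed-size summary of the source
dec_out, _ = decoder(tgt_embed(tgt_in), state)  # decode conditioned on that summary
logits = project(dec_out)                       # per-step distributions over target tokens
print(logits.shape)  # torch.Size([4, 9, 1000])
```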