Open Access · Proceedings Article DOI

Inferring and Executing Programs for Visual Reasoning

TL;DR
In this article, the authors propose a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer.
Abstract
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.
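The two-stage design in the abstract can be illustrated with a minimal sketch: a predicted "program" is a sequence of module names, and the execution engine composes those modules over a scene representation to produce an answer. The toy scene, the module names, and the linear (non-tree) program form below are illustrative assumptions, not the paper's actual CLEVR module inventory or its neural implementation.

```python
# Toy scene: a list of objects with attributes (stand-in for CLEVR's scenes).
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

# Each module maps the previous intermediate result to the next one.
# In the paper these are neural modules; here they are plain functions.
MODULES = {
    "scene": lambda _: list(SCENE),
    "filter_color[blue]": lambda objs: [o for o in objs if o["color"] == "blue"],
    "filter_shape[cube]": lambda objs: [o for o in objs if o["shape"] == "cube"],
    "count": lambda objs: len(objs),
}

def execute(program):
    """Run a linear program by threading each module's output into the next."""
    result = None
    for token in program:
        result = MODULES[token](result)
    return result

# "How many blue cubes are there?" -> predicted program -> answer
print(execute(["scene", "filter_color[blue]", "filter_shape[cube]", "count"]))  # -> 1
```

In the full model, the program generator (a sequence-to-sequence network) predicts the program tokens from the question, and because program prediction is a discrete choice, that component is trained with REINFORCE while the execution engine trains with ordinary backpropagation.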


Citations
Book Chapter DOI

A Background Reasoning Framework for External Force Damage Detection in Distribution Network

TL;DR: In this article, a detection approach for external hidden dangers to transmission lines, based on object detection and background reasoning, is proposed; the experimental results demonstrate that the proposed method provides relatively stable and efficient detection of external hidden dangers in distribution networks.
Journal Article DOI

Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning

TL;DR: Latformer as discussed by the authors incorporates lattice symmetry priors in attention masks and shows that for any transformation of the hypercubic lattice, there exists a binary attention mask that implements that group action.

A Confidence-based Multipath Neural-symbolic Approach for Visual Question Answering

Yajie Bao, +1 more
TL;DR: A confidence-based neural-symbolic (CBNS) framework is proposed that evaluates the confidence of the neural-network modules using uncertainty quantification and makes inferences based on these confidence evaluations.
Posted Content

Improved RAMEN: Towards Domain Generalization for Visual Question Answering.

TL;DR: In this article, two major improvements to the early/late fusion module and the aggregation module of the RAMEN architecture are proposed with the objective of further strengthening domain generalization, and the effects of both improvements on the domain generalization problem are analyzed.
Book Chapter DOI

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

TL;DR: The CLEVR-X dataset as mentioned in this paper contains multiple structured textual explanations which are derived from the original scene graphs and describe the reasoning and visual information that is necessary to answer a given question.
References
Proceedings Article DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
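The residual idea summarized above can be sketched in a few lines of numpy: instead of learning a direct mapping H(x), a block learns the residual F(x) and outputs F(x) + x, so the identity mapping is trivially recoverable when the weights are near zero. The two-layer form and shapes below are illustrative, not the paper's exact architecture.

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + W2 @ relu(W1 @ x): the skip connection adds the input back."""
    hidden = np.maximum(0.0, w1 @ x)   # first layer + ReLU
    return x + w2 @ hidden             # residual F(x) added to identity x

x = np.array([1.0, -2.0, 0.5])
zeros = np.zeros((3, 3))
# With zero weights the block is exactly the identity mapping, which is
# what makes very deep stacks of such blocks easy to optimize.
print(residual_block(x, zeros, zeros))  # prints x unchanged
```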
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art image classification performance, as discussed by the authors.
Journal Article DOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
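The constant-error-carousel mechanism described above comes from the LSTM's additive cell update: gates decide what the cell state forgets, admits, and emits, so error can flow through the cell largely unchanged across many steps. Below is a minimal single-step sketch in numpy (weight shapes and the bias-free form are simplifying assumptions, not the original formulation).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM step: gates control what the cell state keeps, adds, and emits."""
    z = np.concatenate([x, h])       # combined input and previous hidden state
    f = sigmoid(W["f"] @ z)          # forget gate
    i = sigmoid(W["i"] @ z)          # input gate
    g = np.tanh(W["g"] @ z)          # candidate cell update
    o = sigmoid(W["o"] @ z)          # output gate
    c_new = f * c + i * g            # additive update: the "error carousel"
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((4, 7)) for k in ("f", "i", "g", "o")}
h, c = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), W)
```

Because `c_new` depends on `c` only through elementwise multiplication by the forget gate, the gradient along the cell state avoids the repeated matrix multiplications that cause vanishing gradients in plain recurrent networks.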
Proceedings Article DOI

Fast R-CNN

TL;DR: Fast R-CNN, as discussed by the authors, proposes a Fast Region-based Convolutional Network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.