Visual Abductive Reasoning
TLDR
In this paper, the authors propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations.
Abstract
Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required to not only describe what is observed, but also infer the hypothesis that can best explain the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, REASONER (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, a contextualized directional position embedding strategy is adopted in the encoder, which yields discriminative representations for the premise and hypothesis. Then, multiple decoders are cascaded to generate and progressively refine the premise and hypothesis sentences. The prediction scores of the sentences are used to guide cross-sentence information flow in the cascaded reasoning procedure. Our VAR benchmarking results show that REASONER surpasses many famous video-language models, while still being far behind human performance. This work is expected to foster future efforts in the reasoning-beyond-observation paradigm.
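As a rough illustration of "the likeliest possible explanation for partial observations", abduction can be read as picking the hypothesis with the highest posterior given what was observed. The sketch below is a toy Bayesian example only, not the REASONER model; the function name `abduce`, the hypotheses, and all probabilities are made up for illustration.

```python
from math import prod

def abduce(observed, hypotheses):
    """Pick the hypothesis that best explains the observed events.

    hypotheses maps a name to (prior, {event: P(event | hypothesis)}).
    All numbers used here are illustrative assumptions, not dataset values.
    """
    scores = {}
    for name, (prior, likelihoods) in hypotheses.items():
        # Unnormalized posterior: prior times the likelihood of each observed event.
        # Events a hypothesis says nothing about get a small default probability.
        scores[name] = prior * prod(likelihoods.get(o, 1e-6) for o in observed)
    total = sum(scores.values())
    posterior = {h: s / total for h, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical everyday scene with two competing explanations.
HYPOTHESES = {
    "it rained": (0.3, {"wet street": 0.9, "people carry umbrellas": 0.8}),
    "street was cleaned": (0.7, {"wet street": 0.9, "people carry umbrellas": 0.05}),
}

best, posterior = abduce(["wet street", "people carry umbrellas"], HYPOTHESES)
print(best, round(posterior[best], 3))
```

With only "wet street" observed, the same function prefers "street was cleaned" because of its higher prior, which shows how an incomplete set of observations can change the best explanation.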
Citations
Journal Article
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
TL;DR: In this article, the Transformer architecture is augmented with a finite memory for language-guided video segmentation (LVS); the memory is designed to persistently preserve global video content and dynamically gather local temporal context and segmentation history.
Journal Article
Cascade-refine model for cephalometric landmark detection in high-resolution orthodontic images
TL;DR: Zhang et al. interpret the cascade-connected neural network (CCNN) as a discrete approximation of ordinary differential equations and propose a cascade-refine model, which takes advantage of CCNNs and overcomes the limitations on number and depth by sharing parameters among stacked network backbones.
Journal Article
Cross-modal transformer with language query for referring image segmentation
TL;DR: Zhang et al. propose a cross-modal transformer (CMT) with language queries for referring image segmentation, which combines the mutual guidance of vision and language.
Journal Article
Boundary-constrained interpretable image reconstruction network for deep compressive sensing
TL;DR: This paper proposes the edge-guided interpretable image compressive sensing network (EGINet), which comprises an edge-aware feature extraction module, an edge-guided intermediate variable updating module, and an intermediate-variable-guided image reconstruction module.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the framework won 1st place on the ILSVRC 2015 classification task.
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Journal Article
A learning algorithm for continually running fully recurrent neural networks
Ronald J. Williams, David Zipser
TL;DR: The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks.
Proceedings Article
CIDEr: Consensus-based image description evaluation
TL;DR: A novel paradigm for evaluating image descriptions that uses human consensus is proposed and a new automated metric that captures human judgment of consensus better than existing metrics across sentences generated by various sources is evaluated.
Journal Article
Cloze procedure: a new tool for measuring readability
TL;DR: This is the first comprehensive statement of a research method and its theory and findings from three pilot studies and two experiments in which “cloze procedure” results are compared with those of two readability formulas.