Visual Abductive Reasoning

doi:10.1109/cvpr52688.2022.01512

Open Access, Proceedings Article

TLDR

The authors propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations.

Abstract:

Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required to not only describe what is observed, but also infer the hypothesis that can best explain the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, REASONER (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, a contextualized directional position embedding strategy is adopted in the encoder, which yields discriminative representations for the premise and hypothesis. Then, multiple decoders are cascaded to generate and progressively refine the premise and hypothesis sentences. The prediction scores of the sentences are used to guide cross-sentence information flow in the cascaded reasoning procedure. Our VAR benchmarking results show that REASONER surpasses many famous video-language models, while still being far behind human performance. This work is expected to foster future efforts in the reasoning-beyond-observation paradigm.
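The task setup and the cascaded refinement described in the abstract can be illustrated with a minimal sketch. This is not REASONER's implementation: the `Event` structure, the `split_example` and `cascaded_refine` helpers, and the `toy_decoder` stage are all hypothetical names introduced here, and the toy stage only stands in for a learned decoder. The sketch assumes each example is a list of events, some observed (the premise) and one or more hidden (to be explained by a hypothesis), and that each stage receives the previous stage's sentences and scores.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Event:
    """One visual event in a VAR-style example (hypothetical structure)."""
    clip_id: str                    # hypothetical identifier for a video segment
    observed: bool                  # True for premise events, False for the hidden event
    sentence: Optional[str] = None  # placeholder for text a model might attach


def split_example(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Separate observed premise events from the hidden events to be explained."""
    premise = [e for e in events if e.observed]
    hidden = [e for e in events if not e.observed]
    return premise, hidden


# A stage maps (events, previous sentences, previous scores) to new sentences and scores.
Stage = Callable[[List[Event], List[Optional[str]], List[float]],
                 Tuple[List[str], List[float]]]


def cascaded_refine(events: List[Event], stages: List[Stage]):
    """Run stages in sequence so each one can refine the previous stage's output."""
    sentences: List[Optional[str]] = [None] * len(events)
    scores = [0.0] * len(events)
    for stage in stages:
        sentences, scores = stage(events, sentences, scores)
    return sentences, scores


def toy_decoder(events, sentences, scores):
    """Placeholder stage (assumption): labels each event and raises its score by 0.5."""
    new_sentences, new_scores = [], []
    for event, prev_score in zip(events, scores):
        kind = "description" if event.observed else "hypothesis"
        new_sentences.append(f"{kind} of {event.clip_id}")
        new_scores.append(min(1.0, prev_score + 0.5))
    return new_sentences, new_scores


events = [Event("e1", True), Event("e2", False), Event("e3", True)]
premise, hidden = split_example(events)
sentences, scores = cascaded_refine(events, [toy_decoder, toy_decoder])
```

In a real model, the scores passed between stages would be prediction confidences, which the abstract says guide cross-sentence information flow; here they are simple counters used only to show the data passing from one stage to the next.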
