Open Access · Book Chapter · DOI

Grounding Visual Explanations

TL;DR
This paper proposes a phrase-critic model that refines generated candidate explanations, using automatically flipped phrases as negative examples during training; the model improves the textual explanation quality of fine-grained classification decisions by mentioning phrases that are grounded in the image.
Abstract
Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, even when the evidence is not actually in the image. This is particularly concerning because such agents ultimately fail to build trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations, augmented with flipped phrases which we use as negative examples during training. At inference time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image. Our explainable AI agent is capable of providing counterarguments for an alternative prediction, i.e. counterfactuals, along with explanations that justify the correct classification decisions. Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image. Moreover, on the FOIL tasks, our agent detects when there is a mistake in the sentence, grounds the incorrect phrase and corrects it significantly better than other models.
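The mechanics described above — score an (image, phrase) pair and train with flipped phrases as negatives under a ranking objective — can be sketched as follows. This is a minimal illustration, not the authors' released code; all names (PhraseCritic, the feature dimensions, the dummy tensors) are hypothetical, and the actual model grounds each noun phrase with a detector before scoring, which is abstracted here as precomputed region features.

```python
import torch
import torch.nn as nn

class PhraseCritic(nn.Module):
    """Scores how well a candidate noun phrase is grounded in image evidence."""
    def __init__(self, img_dim=2048, txt_dim=300, hidden=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, img_feats, phrase_feats):
        # img_feats: (B, img_dim) features of the region a detector grounded
        # the phrase to; phrase_feats: (B, txt_dim) phrase embedding.
        return self.scorer(torch.cat([img_feats, phrase_feats], dim=-1)).squeeze(-1)

# Training pairs each real phrase with an automatically "flipped" one
# (e.g. "red belly" -> "yellow belly") and asks the critic to rank the
# real phrase higher via a margin ranking loss.
B = 4
critic = PhraseCritic()
img_pos, txt_pos = torch.randn(B, 2048), torch.randn(B, 300)
img_neg, txt_neg = torch.randn(B, 2048), torch.randn(B, 300)
loss = nn.MarginRankingLoss(margin=1.0)(
    critic(img_pos, txt_pos), critic(img_neg, txt_neg), torch.ones(B))
```

At inference, the same scorer ranks several candidate explanations and the highest-scoring, best-grounded one is returned.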



Citations
Proceedings Article · DOI

From Recognition to Cognition: Visual Commonsense Reasoning

TL;DR: To move towards cognition-level understanding, a new reasoning engine, Recognition to Cognition Networks (R2C), is presented that models the necessary layered inferences for grounding, contextualization, and reasoning.
Journal Article · DOI

A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence

TL;DR: In this article, a systematic literature review of contrastive and counterfactual explanations of artificial intelligence algorithms is presented, which provides readers with a thorough and reproducible analysis of the interdisciplinary research field under study.
Posted Content

Counterfactual VQA: A Cause-Effect Look at Language Bias

TL;DR: A novel counterfactual inference framework is proposed, which enables the language bias to be captured as the direct causal effect of questions on answers and reduced by subtracting the direct language effect from the total causal effect.
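The subtraction this TL;DR describes can be illustrated in a few lines. A minimal sketch, assuming a two-branch setup where one model sees image and question and a second sees the question alone; the paper's actual fusion and causal-effect estimation are more involved.

```python
import torch

def debiased_logits(logits_vq: torch.Tensor, logits_q: torch.Tensor) -> torch.Tensor:
    """Total causal effect minus the direct language (question-only) effect."""
    return logits_vq - logits_q

logits_vq = torch.tensor([2.0, 0.5, 0.1])  # image+question model
logits_q  = torch.tensor([1.8, 0.1, 0.0])  # question-only "language prior" model
print(debiased_logits(logits_vq, logits_q).softmax(-1))  # bias-reduced answer dist.
```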
Journal Article · DOI

The challenge of crafting intelligible intelligence

TL;DR: This paper argues that the behavior of complex AI algorithms, especially in mission-critical settings, must be made intelligible to users, since intelligibility is a prerequisite for trusting such systems with decisions.
Book Chapter · DOI

Textual Explanations for Self-Driving Vehicles

TL;DR: A new approach to introspective explanations is proposed which uses a visual (spatial) attention model to train a convolutional network end-to-end from images to vehicle control commands, and two approaches to attention alignment, strong and weak alignment, are explored.
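A rough sketch of the spatial-attention idea in this TL;DR: weight convolutional features by a learned attention map, then regress control commands from the attended context. The names and dimensions below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttnController(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.attn = nn.Conv2d(c, 1, 1)   # one attention logit per spatial cell
        self.head = nn.Linear(c, 2)      # e.g. steering angle + acceleration

    def forward(self, feats):                            # feats: (B, C, H, W)
        w = self.attn(feats).flatten(2).softmax(-1)      # (B, 1, H*W)
        ctx = (feats.flatten(2) * w).sum(-1)             # attended context (B, C)
        return self.head(ctx), w         # commands + inspectable attention map

cmds, attn_map = AttnController()(torch.randn(2, 64, 10, 20))
```

Returning the attention map alongside the commands is what makes the controller introspectively explainable: the map shows which image regions drove each command.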
References
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
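The core idea is that stacked layers learn a residual F(x) that is added back to an identity shortcut, y = F(x) + x. A minimal same-dimension basic block sketched in PyTorch (the paper also uses projection shortcuts and bottleneck blocks):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: y = F(x) + x

y = BasicBlock(64)(torch.randn(1, 64, 8, 8))
```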
Journal Article · DOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
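The "constant error carousel" refers to the additive cell-state update, which lets error flow across long time lags. A single step sketched in the now-standard formulation (note the forget gate was a later addition by Gers et al., not part of the original 1997 cell):

```python
import torch

def lstm_step(x, h, c, W, U, b):
    # W: (4H, X), U: (4H, H), b: (4H,) -- gates stacked as [i, f, g, o].
    gates = W @ x + U @ h + b
    i, f, g, o = gates.chunk(4)
    i, f, o = i.sigmoid(), f.sigmoid(), o.sigmoid()
    c_next = f * c + i * g.tanh()   # additive "carousel" update of the cell state
    h_next = o * c_next.tanh()
    return h_next, c_next

X, H = 16, 32
x, h, c = torch.randn(X), torch.zeros(H), torch.zeros(H)
W, U, b = torch.randn(4 * H, X), torch.randn(4 * H, H), torch.zeros(4 * H)
h, c = lstm_step(x, h, c, W, U, b)
```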
Proceedings Article · DOI

Fast R-CNN

TL;DR: Fast R-CNN is a Fast Region-based Convolutional Network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
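The speedup comes from sharing one convolutional feature map across all proposals and pooling a fixed-size feature per region (RoI pooling). A minimal illustration using torchvision's roi_pool; the feature map and the numbers here are stand-ins, not the paper's setup:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 256, 50, 50)   # one shared conv feature map (e.g. 400px image)
# Region proposals as (batch_idx, x1, y1, x2, y2) in image coordinates.
rois = torch.tensor([[0, 0.0, 0.0, 160.0, 160.0],
                     [0, 64.0, 64.0, 320.0, 320.0]])
# spatial_scale maps image coords onto the 50x50 feature grid.
pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=50 / 400)
print(pooled.shape)  # torch.Size([2, 256, 7, 7]) -> fed to shared FC heads
```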
Book Chapter · DOI

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
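The paper's deconvnet projects activations back to pixel space; as a much simpler stand-in for inspecting what intermediate layers respond to, feature maps can be captured with a forward hook in PyTorch:

```python
import torch
import torchvision.models as models

acts = {}
model = models.resnet18(weights=None).eval()  # any conv net works here
model.layer2.register_forward_hook(lambda m, i, o: acts.update(layer2=o))
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
print(acts["layer2"].shape)  # (1, 128, 28, 28) feature maps to visualize
```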