Open Access · Book Chapter · DOI

Grounding Visual Explanations

TL;DR
This paper proposes a phrase-critic model that refines generated candidate explanations, using automatically flipped phrases as negative examples during training; the model improves the textual explanation quality of fine-grained classification decisions by mentioning phrases that are grounded in the image.
Abstract
Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, even when the evidence is not actually in the image. This is particularly concerning because such agents ultimately fail to build trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations, augmented with flipped phrases which we use as negative examples during training. At inference time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image. Our explainable AI agent is capable of providing counterarguments for an alternative prediction, i.e. counterfactuals, along with explanations that justify the correct classification decisions. Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image. Moreover, on the FOIL tasks, our agent detects when there is a mistake in the sentence, grounds the incorrect phrase and corrects it significantly better than other models.
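The mechanics described above — score an (image, phrase) pair and train with flipped phrases as negatives under a ranking objective — can be sketched as follows. This is a minimal illustration, not the authors' released code; all names (PhraseCritic, the feature dimensions, the dummy tensors) are hypothetical, and the actual model grounds each noun phrase with a detector before scoring, which is abstracted here as precomputed region features.

```python
import torch
import torch.nn as nn

class PhraseCritic(nn.Module):
    """Scores how well a candidate noun phrase is grounded in image evidence."""
    def __init__(self, img_dim=2048, txt_dim=300, hidden=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, img_feats, phrase_feats):
        # img_feats: (B, img_dim) features of the region a detector grounded
        # the phrase to; phrase_feats: (B, txt_dim) phrase embedding.
        return self.scorer(torch.cat([img_feats, phrase_feats], dim=-1)).squeeze(-1)

# Training pairs each real phrase with an automatically "flipped" one
# (e.g. "red belly" -> "yellow belly") and asks the critic to rank the
# real phrase higher via a margin ranking loss.
B = 4
critic = PhraseCritic()
img_pos, txt_pos = torch.randn(B, 2048), torch.randn(B, 300)
img_neg, txt_neg = torch.randn(B, 2048), torch.randn(B, 300)
loss = nn.MarginRankingLoss(margin=1.0)(
    critic(img_pos, txt_pos), critic(img_neg, txt_neg), torch.ones(B))
```

At inference, the same scorer ranks several candidate explanations and the highest-scoring, best-grounded one is returned.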



Citations
Proceedings Article · DOI

From Recognition to Cognition: Visual Commonsense Reasoning

TL;DR: To move towards cognition-level understanding, a new reasoning engine, Recognition to Cognition Networks (R2C), is presented that models the necessary layered inferences for grounding, contextualization, and reasoning.
Journal Article · DOI

A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence

TL;DR: In this article, a systematic literature review of contrastive and counterfactual explanations of artificial intelligence algorithms is presented, which provides readers with a thorough and reproducible analysis of the interdisciplinary research field under study.
Posted Content

Counterfactual VQA: A Cause-Effect Look at Language Bias

TL;DR: A novel counterfactual inference framework is proposed, which enables the language bias to be captured as the direct causal effect of questions on answers and reduced by subtracting the direct language effect from the total causal effect.
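The subtraction this TL;DR describes can be illustrated in a few lines. A minimal sketch, assuming a two-branch setup where one model sees image and question and a second sees the question alone; the paper's actual fusion and causal-effect estimation are more involved.

```python
import torch

def debiased_logits(logits_vq: torch.Tensor, logits_q: torch.Tensor) -> torch.Tensor:
    """Total causal effect minus the direct language (question-only) effect."""
    return logits_vq - logits_q

logits_vq = torch.tensor([2.0, 0.5, 0.1])  # image+question model
logits_q  = torch.tensor([1.8, 0.1, 0.0])  # question-only "language prior" model
print(debiased_logits(logits_vq, logits_q).softmax(-1))  # bias-reduced answer dist.
```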
Journal Article · DOI

The challenge of crafting intelligible intelligence

TL;DR: This paper argues that the behavior of complex AI algorithms, especially in mission-critical settings, must be made intelligible to users, since intelligibility is a prerequisite for trusting such systems with decisions.
Book Chapter · DOI

Textual Explanations for Self-Driving Vehicles

TL;DR: A new approach to introspective explanations is proposed which uses a visual (spatial) attention model to train a convolutional network end-to-end from images to vehicle control commands, and two approaches to attention alignment, strong and weak alignment, are explored.
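A rough sketch of the spatial-attention idea in this TL;DR: weight convolutional features by a learned attention map, then regress control commands from the attended context. The names and dimensions below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttnController(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.attn = nn.Conv2d(c, 1, 1)   # one attention logit per spatial cell
        self.head = nn.Linear(c, 2)      # e.g. steering angle + acceleration

    def forward(self, feats):                            # feats: (B, C, H, W)
        w = self.attn(feats).flatten(2).softmax(-1)      # (B, 1, H*W)
        ctx = (feats.flatten(2) * w).sum(-1)             # attended context (B, C)
        return self.head(ctx), w         # commands + inspectable attention map

cmds, attn_map = AttnController()(torch.randn(2, 64, 10, 20))
```

Returning the attention map alongside the commands is what makes the controller introspectively explainable: the map shows which image regions drove each command.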
References
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
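The core idea is that stacked layers learn a residual F(x) that is added back to an identity shortcut, y = F(x) + x. A minimal same-dimension basic block sketched in PyTorch (the paper also uses projection shortcuts and bottleneck blocks):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: y = F(x) + x

y = BasicBlock(64)(torch.randn(1, 64, 8, 8))
```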
Journal Article · DOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
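The "constant error carousel" refers to the additive cell-state update, which lets error flow across long time lags. A single step sketched in the now-standard formulation (note the forget gate was a later addition by Gers et al., not part of the original 1997 cell):

```python
import torch

def lstm_step(x, h, c, W, U, b):
    # W: (4H, X), U: (4H, H), b: (4H,) -- gates stacked as [i, f, g, o].
    gates = W @ x + U @ h + b
    i, f, g, o = gates.chunk(4)
    i, f, o = i.sigmoid(), f.sigmoid(), o.sigmoid()
    c_next = f * c + i * g.tanh()   # additive "carousel" update of the cell state
    h_next = o * c_next.tanh()
    return h_next, c_next

X, H = 16, 32
x, h, c = torch.randn(X), torch.zeros(H), torch.zeros(H)
W, U, b = torch.randn(4 * H, X), torch.randn(4 * H, H), torch.zeros(4 * H)
h, c = lstm_step(x, h, c, W, U, b)
```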
Proceedings Article · DOI

Fast R-CNN

TL;DR: Fast R-CNN is a Fast Region-based Convolutional Network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
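The speedup comes from sharing one convolutional feature map across all proposals and pooling a fixed-size feature per region (RoI pooling). A minimal illustration using torchvision's roi_pool; the feature map and the numbers here are stand-ins, not the paper's setup:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 256, 50, 50)   # one shared conv feature map (e.g. 400px image)
# Region proposals as (batch_idx, x1, y1, x2, y2) in image coordinates.
rois = torch.tensor([[0, 0.0, 0.0, 160.0, 160.0],
                     [0, 64.0, 64.0, 320.0, 320.0]])
# spatial_scale maps image coords onto the 50x50 feature grid.
pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=50 / 400)
print(pooled.shape)  # torch.Size([2, 256, 7, 7]) -> fed to shared FC heads
```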
Book Chapter · DOI

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
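The paper's deconvnet projects activations back to pixel space; as a much simpler stand-in for inspecting what intermediate layers respond to, feature maps can be captured with a forward hook in PyTorch:

```python
import torch
import torchvision.models as models

acts = {}
model = models.resnet18(weights=None).eval()  # any conv net works here
model.layer2.register_forward_hook(lambda m, i, o: acts.update(layer2=o))
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
print(acts["layer2"].shape)  # (1, 128, 28, 28) feature maps to visualize
```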