Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

doi:10.1109/WACV45572.2020.9093360

Proceedings Article

Saurabh Desai¹, Harish G. Ramaswamy¹

Indian Institute of Technology Madras¹

01 Mar 2020-pp 983-991

TL;DR: Ablation-based Class Activation Mapping (Ablation-CAM) uses ablation analysis to determine the importance of individual feature map units with respect to a class, and uses these weights to produce a coarse localization map highlighting the image regions that are important for predicting the concept.

Abstract: In response to recent criticism of gradient-based visualization techniques, we propose a new methodology to generate visual explanations for deep Convolutional Neural Network (CNN)-based models. Our approach, Ablation-based Class Activation Mapping (Ablation-CAM), uses ablation analysis to determine the importance (weights) of individual feature map units with respect to a class. These weights are then used to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Our objective and subjective evaluations show that this gradient-free approach works better than the state-of-the-art Grad-CAM technique. Moreover, further experiments show that Ablation-CAM is class-discriminative and can be used to evaluate trust in a model.
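The ablation idea in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the weight of each feature map is assumed to be the relative drop in the class score when that map is zeroed, `score_fn` stands in for the part of the network after the chosen convolutional layer, and `toy_score` is a hypothetical classifier head made up for the example.

```python
from typing import Callable, List

FeatureMap = List[List[float]]  # one H x W activation map


def ablation_cam(
    feature_maps: List[FeatureMap],
    score_fn: Callable[[List[FeatureMap]], float],
) -> List[List[float]]:
    """Gradient-free localization map built from per-map ablation (sketch)."""
    base = score_fn(feature_maps)  # class score with nothing ablated
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for k, fmap in enumerate(feature_maps):
        # Ablate map k by zeroing it, then re-run the rest of the network.
        ablated = list(feature_maps)
        ablated[k] = [[0.0] * width for _ in range(height)]
        # Assumed weight: relative drop in the class score after ablation.
        weight = (base - score_fn(ablated)) / base if base != 0 else 0.0
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    # ReLU keeps only the regions that support the class.
    return [[max(0.0, v) for v in row] for row in cam]


def toy_score(maps: List[FeatureMap]) -> float:
    """Hypothetical classifier head: weighted sum of spatial means."""
    head = [2.0, -1.0]
    return sum(
        w * sum(map(sum, m)) / (len(m) * len(m[0])) for w, m in zip(head, maps)
    )


maps = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]]]
cam = ablation_cam(maps, toy_score)
```

Note that the loop calls `score_fn` once per feature map, so the cost grows with the number of maps in the chosen layer.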

Citations

Journal Article

Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments

Xiao Bai¹, Xiang Wang¹, Xianglong Liu¹, Qiang Liu², Jingkuan Song³, Niculae Sebe⁴, Been Kim⁵

Beihang University¹, University of Texas at Austin², University of Electronic Science and Technology of China³, University of Trento⁴, Google⁵

01 Dec 2021-Pattern Recognition

TL;DR: This survey reviews recent developments in explainable deep learning for pattern recognition, together with efficient deep learning via model compression and acceleration, and robustness and stability in deep learning.

101 citations

Posted Content

Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs

Ruigang Fu¹, Qingyong Hu², Xiaohu Dong¹, Yulan Guo¹, Yinghui Gao¹, Biao Li¹

National University of Defense Technology¹, University of Oxford²

05 Aug 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper introduces two axioms -- Conservation and Sensitivity -- to the visualization paradigm of the CAM methods and proposes a dedicated Axiom-based Grad-CAM (XGrad-CAM) that is able to achieve better visualization performance and be class-discriminative and easy-to-implement compared with Grad-CAM++ and Ablation-CAM.

Abstract: To have a better understanding and usage of Convolution Neural Networks (CNNs), the visualization and interpretation of CNNs has attracted increasing attention in recent years. In particular, several Class Activation Mapping (CAM) methods have been proposed to discover the connection between CNN's decision and image regions. In spite of the reasonable visualization, lack of clear and sufficient theoretical support is the main limitation of these methods. In this paper, we introduce two axioms -- Conservation and Sensitivity -- to the visualization paradigm of the CAM methods. Meanwhile, a dedicated Axiom-based Grad-CAM (XGrad-CAM) is proposed to satisfy these axioms as much as possible. Experiments demonstrate that XGrad-CAM is an enhanced version of Grad-CAM in terms of conservation and sensitivity. It is able to achieve better visualization performance than Grad-CAM, while also being class-discriminative and easy-to-implement compared with Grad-CAM++ and Ablation-CAM. The code is available at this https URL.

85 citations

Cites background or methods from "Ablation-CAM: Visual Explanations f..."

...Besides, they also break the axiom of implementation invariance since they are layer sensitive [4]....
[...]
..., Grad-CAM [23], Grad-CAM++ [3] and Ablation-CAM [4])....
[...]
...[4] proposed Ablation-CAM to remove the dependence on gradients but this method is quite time-consuming since it has to run forward propagation for hundreds of times per image....
[...]
...Note that the original weight of each feature map in Ablation-CAM [4] is defined as $\frac{S_c(F^{l}) - S_c(F^{l} \setminus F^{lk})}{\lVert F^{lk} \rVert}$....
[...]
...This definition is inspired by CAM [32] and further improved by other works, such as Grad-CAM++ [3] and Ablation-CAM [4]....
[...]

Journal Article

Review: Deep Learning in Electron Microscopy

Jeffrey M. Ede¹

University of Warwick¹

17 Sep 2020-arXiv: Image and Video Processing

TL;DR: In this paper, a review of deep learning in electron microscopy is presented, with a focus on hardware and software needed to get started with deep learning and interface with electron microscopes.

Abstract: Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Following, we discuss hardware and software needed to get started with deep learning and interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.

59 citations

Posted Content

Deep weakly-supervised learning methods for classification and localization in histology images: a survey.

Jérôme Rony, Soufiane Belharbi, Jose Dolz, Ismail Ben Ayed, Luke McCaffrey, Eric Granger

08 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: Results indicate that several deep learning models, and in particular WILDCAT and deep MIL can provide a high level of classification accuracy, although pixel-wise localization of cancer regions remains an issue for such images.

Abstract: Using state-of-the-art deep learning models for cancer diagnosis presents several challenges related to the nature and availability of labeled histology images. In particular, cancer grading and localization in these images normally relies on both image- and pixel-level labels, the latter requiring a costly annotation process. In this survey, deep weakly-supervised learning (WSL) models are investigated to identify and locate diseases in histology images, without the need for pixel-level annotations. Given training data with global image-level labels, these models allow to simultaneously classify histology images and yield pixel-wise localization scores, thereby identifying the corresponding regions of interest (ROI). Since relevant WSL models have mainly been investigated within the computer vision community, and validated on natural scene images, we assess the extent to which they apply to histology images which have challenging properties, e.g. very large size, similarity between foreground/background, highly unstructured regions, stain heterogeneity, and noisy/ambiguous labels. The most relevant models for deep WSL are compared experimentally in terms of accuracy (classification and pixel-wise localization) on several public benchmark histology datasets for breast and colon cancer -- BACH ICIAR 2018, BreaKHis, CAMELYON16, and GlaS. Furthermore, for large-scale evaluation of WSL models on histology images, we propose a protocol to construct WSL datasets from Whole Slide Imaging. Results indicate that several deep learning models can provide a high level of classification accuracy, although accurate pixel-wise localization of cancer regions remains an issue for such images. Code is publicly available.

48 citations

Posted Content

SS-CAM: Smoothed Score-CAM for Sharper Visual Feature Localization.

Haofan Wang¹, Rakshit Naidu², Joy Michael², Soumya Snigdha Kundu

Carnegie Mellon University¹, Manipal University²

25 Jun 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper introduces SS-CAM, an enhanced visual explanation in terms of visual sharpness that produces centralized localization of object features within an image through a smooth operation and outperforms Score-CAM on both faithfulness and localization tasks.

Abstract: Interpretation of the underlying mechanisms of Deep Convolutional Neural Networks has become an important aspect of research in the field of deep learning due to their applications in high-risk environments. To explain these black-box architectures, many methods have been applied so that the internal decisions can be analyzed and understood. In this paper, building on Score-CAM, we introduce an enhanced visual explanation in terms of visual sharpness called SS-CAM, which produces centralized localization of object features within an image through a smooth operation. We evaluate our method on the ILSVRC 2012 validation dataset, where it outperforms Score-CAM on both faithfulness and localization tasks.

37 citations

Cites background from "Ablation-CAM: Visual Explanations f..."

...They can be divided into two branches, one is gradient-based CAMs [2], [15], which represent the linear weights corresponding to internal activation maps by gradient information....
[...]
...As the output layer is a non-linear function, gradient-based CAMs tend to diminish the backpropagating gradients which cause gradient saturation thereby making it difficult to provide concrete explanations....
[...]
...These categories are known as Class Activation Maps (CAMs)....
[...]
...The other is gradient-free CAMs [4], [23] which capture the importance of each activation map by the target score in forward propagation....
[...]
...The generalisation of CAMs take place with Grad-CAM [15]....
[...]

References

Proceedings Article

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

Ramprasaath R. Selvaraju¹, Michael Cogswell², Abhishek Das², Ramakrishna Vedantam¹, Devi Parikh³, Dhruv Batra¹

Virginia Tech¹, Georgia Institute of Technology², Facebook³

01 Oct 2017

TL;DR: This work combines existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and applies it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures.

Abstract: We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say logits for ‘dog’ or even a caption), flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model-families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. visual question answering) or reinforcement learning, without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are more faithful to the underlying model, and (d) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show even non-attention based models can localize inputs. Finally, we design and conduct human studies to measure if Grad-CAM explanations help users establish appropriate trust in predictions from deep networks and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/ along with a demo on CloudCV [2] and video at youtu.be/COjUB9Izk6E.
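As a rough illustration of the gradient weighting described in this abstract, the sketch below combines activation maps with gradients that are assumed to have been computed already by some deep learning framework. It follows the usual Grad-CAM recipe (spatially averaged gradients as per-map weights, then a ReLU over the weighted sum); the function name and the toy numbers are assumptions made for the example, and this is not the code from the repository linked above.

```python
from typing import List

Grid = List[List[float]]  # one H x W map


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Coarse localization map from activations and class-score gradients."""
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Per-map weight: gradient averaged over all spatial positions.
        alpha = sum(map(sum, grad)) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only positive evidence for the target class.
    return [[max(0.0, v) for v in row] for row in cam]


# Toy inputs: two 2x2 activation maps and their (precomputed) gradients.
activations = [[[1.0, 2.0], [0.0, 1.0]], [[0.0, 1.0], [3.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, 0.0], [0.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In practice the resulting map is upsampled to the input resolution before being overlaid on the image; that step is omitted here.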

7,556 citations

Proceedings Article

Learning Deep Features for Discriminative Localization

Bolei Zhou¹, Aditya Khosla¹, Agata Lapedriza¹, Aude Oliva¹, Antonio Torralba¹

Massachusetts Institute of Technology¹

27 Jun 2016

TL;DR: This work revisits the global average pooling layer proposed in [13], and sheds light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels.

Abstract: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving classification task.
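A small sketch may help show why global average pooling (GAP) gives localization for free. Assuming a network whose last convolutional maps are averaged and fed to a single linear layer (bias ignored), the class score equals the spatial mean of a map obtained by applying the same class weights at every position. The function names and numbers below are hypothetical and chosen only to make that identity concrete.

```python
from typing import List

Grid = List[List[float]]  # one H x W map


def class_activation_map(feature_maps: List[Grid], class_weights: List[float]) -> Grid:
    """Apply the linear-layer weights of one class at every spatial position."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * f[i][j] for w, f in zip(class_weights, feature_maps))
         for j in range(width)]
        for i in range(height)
    ]


def gap_score(feature_maps: List[Grid], class_weights: List[float]) -> float:
    """Class score of a GAP + linear head (bias omitted for simplicity)."""
    pooled = [sum(map(sum, f)) / (len(f) * len(f[0])) for f in feature_maps]
    return sum(w * p for w, p in zip(class_weights, pooled))


feature_maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [0.0, 1.0]]]
class_weights = [0.5, -1.0]  # hypothetical weights for one class

cam = class_activation_map(feature_maps, class_weights)
score = gap_score(feature_maps, class_weights)
cam_mean = sum(map(sum, cam)) / 4  # average of the 2x2 map
```

Here `cam_mean` and `score` coincide, which is why high values in the map can be read as the positions that contribute most to the class score.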

5,978 citations

"Ablation-CAM: Visual Explanations f..." refers methods in this paper

...For CNNs with Global Average Pooling (GAP) layer as penultimate layer, Class Activation Mapping (CAM) [21] produces class-discriminative visualization maps....
[...]

Proceedings Article

Show and tell: A neural image caption generator

Oriol Vinyals¹, Alexander Toshev¹, Samy Bengio¹, Dumitru Erhan¹

Google¹

07 Jun 2015

TL;DR: In this paper, a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation is proposed to generate natural sentences describing the content of an image.

Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.

5,095 citations

Posted Content

Learning Deep Features for Discriminative Localization

Bolei Zhou¹, Aditya Khosla¹, Agata Lapedriza¹, Aude Oliva¹, Antonio Torralba¹

Massachusetts Institute of Technology¹

14 Dec 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors revisited the global average pooling layer and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.

Abstract: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.

5,065 citations

Proceedings Article

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan¹, Andrea Vedaldi¹, Andrew Zisserman¹

University of Oxford¹

23 Dec 2013

TL;DR: In this paper, the gradient of the class score with respect to the input image is used to compute a class saliency map, which can be employed for weakly supervised object segmentation using classification ConvNets.

Abstract: This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [5], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [13].
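The second technique, a class saliency map from the gradient of the class score with respect to the input image, can be illustrated with a small sketch. To stay self-contained it estimates the gradient by central finite differences instead of backpropagation, which is an assumption made purely for the example and is far too slow for real images; `toy_score` is a hypothetical stand-in for a ConvNet class score.

```python
from typing import Callable, List

Image = List[List[float]]  # single-channel H x W image


def saliency_map(
    image: Image, score_fn: Callable[[Image], float], eps: float = 1e-4
) -> Image:
    """Absolute gradient of the class score with respect to each pixel."""
    saliency = []
    for i, row in enumerate(image):
        saliency_row = []
        for j in range(len(row)):
            plus = [r[:] for r in image]
            minus = [r[:] for r in image]
            plus[i][j] += eps
            minus[i][j] -= eps
            # Central difference approximates d(score)/d(pixel).
            grad = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            saliency_row.append(abs(grad))
        saliency.append(saliency_row)
    return saliency


def toy_score(img: Image) -> float:
    """Hypothetical class score that depends on two pixels only."""
    return 3.0 * img[0][0] - 2.0 * img[1][1]


image = [[0.2, 0.4], [0.6, 0.8]]
sal = saliency_map(image, toy_score)
```

Pixels the score does not depend on receive a saliency of roughly zero, while the two influential pixels receive values close to the magnitude of their coefficients.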

4,959 citations