Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

doi:10.1109/WACV45572.2020.9093360

Home
/
Papers
/
Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

Proceedings Article•DOI•

Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

Saurabh Desai¹, Harish G. Ramaswamy¹•Institutions (1)

Indian Institute of Technology Madras¹

01 Mar 2020-pp 983-991

TL;DR: This approach – Ablation-based Class Activation Mapping (Ablation CAM) uses ablation analysis to determine the importance of individual feature map units w.r.t. class to produce a coarse localization map highlighting the important regions in the image for predicting the concept.

read less

Abstract: In response to recent criticism of gradient-based visualization techniques, we propose a new methodology to generate visual explanations for deep Convolutional Neural Networks (CNN) - based models. Our approach – Ablation-based Class Activation Mapping (Ablation CAM) uses ablation analysis to determine the importance (weights) of individual feature map units w.r.t. class. Further, this is used to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Our objective and subjective evaluations show that this gradient-free approach works better than state-of-the-art Grad-CAM technique. Moreover, further experiments are carried out to show that Ablation-CAM is class discriminative as well as can be used to evaluate trust in a model.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Deep learning in electron microscopy

[...]

Jeffrey M. Ede¹•Institutions (1)

University of Warwick¹

31 Mar 2021

TL;DR: This review paper offers a practical perspective aimed at developers with limited familiarity of deep learning in electron microscopy that discusses hardware and software needed to get started with deep learning and interface with electron microscopes.

...read moreread less

Abstract: Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Following, we discuss hardware and software needed to get started with deep learning and interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.

...read moreread less

22 citations

Posted Content•

Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation

[...]

Sam Sattarzadeh¹, Mahesh Sudhakar¹, Anthony Lem¹, Shervin Mehryar¹, Konstantinos N. Plataniotis¹, Jongseong Jang, Hyunwoo Kim, Yeonjeong Jeong, Sangmin Lee, Kyunghoon Bae - Show less +6 more•Institutions (1)

University of Toronto¹

01 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work collects visualization maps from multiple layers of the model based on an attribution-based input sampling technique and aggregate them to reach a fine-grained and complete explanation, and proposes a layer selection strategy that applies to the whole family of CNN-based models.

...read moreread less

Abstract: As an emerging field in Machine Learning, Explainable AI (XAI) has been offering remarkable performance in interpreting the decisions made by Convolutional Neural Networks (CNNs). To achieve visual explanations for CNNs, methods based on class activation mapping and randomized input sampling have gained great popularity. However, the attribution methods based on these techniques provide lower resolution and blurry explanation maps that limit their explanation power. To circumvent this issue, visualization based on various layers is sought. In this work, we collect visualization maps from multiple layers of the model based on an attribution-based input sampling technique and aggregate them to reach a fine-grained and complete explanation. We also propose a layer selection strategy that applies to the whole family of CNN-based models, based on which our extraction framework is applied to visualize the last layers of each convolutional block of the model. Moreover, we perform an empirical analysis of the efficacy of derived lower-level information to enhance the represented attributions. Comprehensive experiments conducted on shallow and deep models trained on natural and industrial datasets, using both ground-truth and model-truth based evaluation metrics validate our proposed algorithm by meeting or outperforming the state-of-the-art methods in terms of explanation ability and visual quality, demonstrating that our method shows stability regardless of the size of objects or instances to be explained.

...read moreread less

21 citations

Journal Article•DOI•

Application of deep learning algorithm to detect and visualize vertebral fractures on plain frontal radiographs.

[...]

Hsuan-Yu Chen¹, Benny Wei Yun Hsu², Yu Kai Yin², Feng-Huei Lin¹, Tsung-Han Yang¹, Rong-Sen Yang¹, Chih Kuo Lee¹, Vincent S. Tseng² - Show less +4 more•Institutions (2)

National Taiwan University¹, National Chiao Tung University²

28 Jan 2021-PLOS ONE

TL;DR: In this article, a deep convolutional neural network (DCNN) was used to identify the feasibility for the screening, detection, and localization of vertebral fractures using PARs.

...read moreread less

Abstract: Background Identification of vertebral fractures (VFs) is critical for effective secondary fracture prevention owing to their association with the increasing risks of future fractures. Plain abdominal frontal radiographs (PARs) are a common investigation method performed for a variety of clinical indications and provide an ideal platform for the opportunistic identification of VF. This study uses a deep convolutional neural network (DCNN) to identify the feasibility for the screening, detection, and localization of VFs using PARs. Methods A DCNN was pretrained using ImageNet and retrained with 1306 images from the PARs database obtained between August 2015 and December 2018. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were evaluated. The visualization algorithm gradient-weighted class activation mapping (Grad-CAM) was used for model interpretation. Results Only 46.6% (204/438) of the VFs were diagnosed in the original PARs reports. The algorithm achieved 73.59% accuracy, 73.81% sensitivity, 73.02% specificity, and an AUC of 0.72 in the VF identification. Conclusion Computer driven solutions integrated with the DCNN have the potential to identify VFs with good accuracy when used opportunistically on PARs taken for a variety of clinical purposes. The proposed model can help clinicians become more efficient and economical in the current clinical pathway of fragile fracture treatment.

...read moreread less

17 citations

Proceedings Article•DOI•

Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis

[...]

Samuele Poppi¹, Marcella Cornia¹, Lorenzo Baraldi¹, Rita Cucchiara¹•Institutions (1)

University of Modena and Reggio Emilia¹

19 Jun 2021

TL;DR: In this paper, a set of metrics are proposed to quantify explanation maps, which show better effectiveness and simplify comparisons between approaches, and compare different CAM-based visualization methods on the entire ImageNet validation set, fostering proper comparisons and reproducibility.

...read moreread less

Abstract: As the request for deep learning solutions increases, the need for explainability is even more fundamental. In this setting, particular attention has been given to visualization techniques, that try to attribute the right relevance to each input pixel with respect to the output of the network. In this paper, we focus on Class Activation Mapping (CAM) approaches, which provide an effective visualization by taking weighted averages of the activation maps. To enhance the evaluation and the reproducibility of such approaches, we propose a novel set of metrics to quantify explanation maps, which show better effectiveness and simplify comparisons between approaches. To evaluate the appropriateness of the proposal, we compare different CAM-based visualization methods on the entire ImageNet validation set, fostering proper comparisons and reproducibility.

...read moreread less

15 citations

Journal Article•DOI•

Explainable, automated urban interventions to improve pedestrian and vehicle safety

[...]

Cristina Bustos¹, Daniel Rhoads¹, Albert Solé-Ribalta¹, David Masip¹, Alex Arenas², Agata Lapedriza³, Agata Lapedriza¹, Javier Borge-Holthoefer¹ - Show less +4 more•Institutions (3)

Open University of Catalonia¹, Rovira i Virgili University², Massachusetts Institute of Technology³

01 Apr 2021-Transportation Research Part C-emerging Technologies

TL;DR: This paper combines public data sources, large-scale street imagery and computer vision techniques to approach pedestrian and vehicle safety with an automated, relatively simple, and universally-applied data-processing scheme.

...read moreread less

Abstract: At the moment, urban mobility research and governmental initiatives are mostly focused on motor-related issues, e.g. the problems of congestion and pollution. And yet, we cannot disregard the most vulnerable elements in the urban landscape: pedestrians, exposed to higher risks than other road users. Indeed, safe, accessible, and sustainable transport systems in cities are a core target of the UN’s 2030 Agenda. Thus, there is an opportunity to apply advanced computational tools to the problem of traffic safety, in regards especially to pedestrians, who have been often overlooked in the past. This paper combines public data sources, large-scale street imagery and computer vision techniques to approach pedestrian and vehicle safety with an automated, relatively simple, and universally-applicable data-processing scheme. The steps involved in this pipeline include the adaptation and training of a Residual Convolutional Neural Network to determine a hazard index for each given urban scene, as well as an interpretability analysis based on image segmentation and class activation mapping on those same images. Combined, the outcome of this computational approach is a fine-grained map of hazard levels across a city, and an heuristic to identify interventions that might simultaneously improve pedestrian and vehicle safety. The proposed framework should be taken as a complement to the work of urban planners and public authorities.

...read moreread less

14 citations

1
2
3
4
5
6
…
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

[...]

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

...read moreread less

123,388 citations

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

[...]

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

73,978 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

...read moreread less

49,914 citations

Proceedings Article•DOI•

Going deeper with convolutions

[...]

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich - Show less +5 more•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

...read moreread less

40,257 citations

Proceedings Article•DOI•

Fully convolutional networks for semantic segmentation

[...]

Jonathan Long¹, Evan Shelhamer¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

07 Jun 2015

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

...read moreread less

Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

...read moreread less

28,225 citations