Proceedings ArticleDOI

Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

01 Mar 2020-pp 983-991
TL;DR: This approach, Ablation-based Class Activation Mapping (Ablation-CAM), uses ablation analysis to determine the importance of individual feature map units w.r.t. a class and produces a coarse localization map highlighting the regions of the image that are important for predicting the concept.
Abstract: In response to recent criticism of gradient-based visualization techniques, we propose a new methodology to generate visual explanations for deep Convolutional Neural Network (CNN)-based models. Our approach, Ablation-based Class Activation Mapping (Ablation-CAM), uses ablation analysis to determine the importance (weights) of individual feature map units w.r.t. a class. This is then used to produce a coarse localization map highlighting the regions of the image that are important for predicting the concept. Our objective and subjective evaluations show that this gradient-free approach works better than the state-of-the-art Grad-CAM technique. Moreover, further experiments show that Ablation-CAM is class discriminative and can also be used to evaluate trust in a model.
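A minimal sketch of the ablation-importance idea described in the abstract, assuming a PyTorch model whose last convolutional activations are already available (for example via a forward hook) and a `head` module that maps those activations to class logits; function and argument names are illustrative, not the authors' code, and the relative-drop weighting is one common formulation of the ablation weight.

```python
import torch
import torch.nn.functional as F

def ablation_cam(head, activations, class_idx):
    """Ablation-CAM sketch: each feature map's weight is the relative drop in
    the class score when that map is zeroed out, so no gradients are needed.

    head:        module mapping conv activations (1, K, H, W) to class logits (1, C)
    activations: precomputed conv activations for one image, shape (1, K, H, W)
    class_idx:   index of the target class
    """
    with torch.no_grad():
        base = head(activations)[0, class_idx]              # y_c with all maps intact
        k = activations.shape[1]
        weights = activations.new_zeros(k)
        for i in range(k):                                   # ablate one feature map at a time
            ablated = activations.clone()
            ablated[:, i] = 0.0
            score = head(ablated)[0, class_idx]              # y_c with map i removed
            weights[i] = (base - score) / (base + 1e-8)      # relative drop = importance
        cam = F.relu((weights.view(1, -1, 1, 1) * activations).sum(dim=1))
        cam = cam / (cam.max() + 1e-8)                       # normalize for visualization
    return cam[0]                                            # (H, W) coarse localization map
```

For a torchvision ResNet, for instance, `activations` could be captured from `layer4` with a forward hook and `head` assembled from the network's average pooling and fully connected layer.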


Citations
Journal ArticleDOI
TL;DR: Zhang et al. propose a new CAM method, based on granular computing theory, that divides the highlighted areas into commonality and specificity saliency maps for multi-granularity visualization.
Abstract: The interpretability of convolutional neural networks (CNNs) is attracting increasing attention. Class activation maps (CAM) intuitively explain the classification mechanisms of CNNs by highlighting important areas. However, as coarse-grained explanations, classical CAM methods are incapable of explaining the classification mechanism in detail. Inspired by granular computing theory, we propose a new CAM method that divides the highlighted areas into commonality saliency maps and specificity saliency maps for multi-granularity visualization. This method consists of three components. First, the universe is simplified to contain only a category pair. Then, neighborhood rough sets are used to divide the universe into three disjoint regions containing the commonality and specificity of the category pair, using adaptive thresholds at the optimal granularity. Finally, these three regions are used to generate multi-granularity saliency maps. This method effectively visualizes the multi-granularity classification mechanism of the CNN and further explains misclassification. We compare this method with five representative CAM methods using two newly proposed fine-grained evaluation metrics and subjective observations. First, experiments demonstrate that the multi-granularity visualization method provides a more extensive and detailed explanation. Second, the adaptive thresholds can be adapted to different situations to obtain a reliable visualization explanation. Finally, in explaining an adversarial attack, it visualizes the details that caused the misclassification.

3 citations
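A heavily simplified sketch of the commonality/specificity split described in that abstract, assuming two CAMs for a category pair are already normalized to [0, 1]. The paper derives the partition with neighborhood rough sets and adaptive thresholds at the optimal granularity; a single fixed threshold stands in for that step here, so the function is illustrative only.

```python
import torch

def commonality_specificity(cam_a, cam_b, thresh=0.5):
    """Split two class activation maps into commonality and per-class
    specificity regions for a category pair (fixed-threshold illustration
    of the idea, not the paper's rough-set procedure).

    cam_a, cam_b: normalized CAMs in [0, 1] with the same spatial shape
    """
    hot_a, hot_b = cam_a >= thresh, cam_b >= thresh
    common = (hot_a & hot_b).float() * torch.minimum(cam_a, cam_b)   # shared evidence
    spec_a = (hot_a & ~hot_b).float() * cam_a                        # evidence only for class A
    spec_b = (hot_b & ~hot_a).float() * cam_b                        # evidence only for class B
    return common, spec_a, spec_b
```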

Journal ArticleDOI
TL;DR: SAR-BagNet is a novel interpretable recognition framework for SAR images that provides a clear heatmap accurately reflecting the impact of each part of a SAR image on the final network decision.
Abstract: Convolutional neural networks (CNNs) have been widely used in SAR image recognition and have achieved high recognition accuracy on some public datasets. However, due to the opacity of their decision-making mechanism, the reliability and credibility of CNNs are currently insufficient, which hinders their application in important fields such as SAR image recognition. In recent years, various interpretable network structures have been proposed to discern the relationship between a CNN's decision and image regions. Unfortunately, most interpretable networks are designed for optical images, perform poorly on SAR images, and cannot accurately explain the relationship between image parts and classification decisions. To address these problems, we present SAR-BagNet, a novel interpretable recognition framework for SAR images. SAR-BagNet can provide a clear heatmap that accurately reflects the impact of each part of a SAR image on the final network decision. In addition to its good interpretability, SAR-BagNet also has high recognition accuracy, achieving 98.25% test accuracy.

3 citations
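The abstract does not spell out SAR-BagNet's architecture, but BagNet-style models in general keep the receptive field small and spatially average per-location class logits into the image prediction, so those per-location logits double as the heatmap. The toy module below illustrates that pattern under assumed layer sizes; it is not the authors' network.

```python
import torch
import torch.nn as nn

class TinyBagNet(nn.Module):
    """Toy BagNet-style classifier: small-receptive-field features yield
    per-location class logits whose spatial average is the image-level
    prediction; the per-location logits for a class serve as its heatmap.
    (Illustrative only; SAR-BagNet's backbone and training differ.)"""

    def __init__(self, num_classes=10):
        super().__init__()
        self.local_features = nn.Sequential(                 # receptive field stays small
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.local_classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):                                     # x: (B, 1, H, W) SAR image
        evidence = self.local_classifier(self.local_features(x))  # (B, C, H, W)
        logits = evidence.mean(dim=(2, 3))                         # image-level logits
        return logits, evidence                                    # evidence = class heatmaps
```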

15 Oct 2022
TL;DR: DProtoNet decouples the inference and interpretation modules of a prototype-based network, avoiding the use of prototype activation to explain the network's decisions, in order to simultaneously improve the accuracy and interpretability of the neural network.
Abstract: The interpretability of neural networks has recently received extensive attention. Previous prototype-based explainable networks involved prototype activation in both reasoning and interpretation processes, requiring specific explainable structures for the prototype and thus making the network less accurate as it gains interpretability. Therefore, the decoupling prototypical network (DProtoNet) was proposed to avoid this problem. This new model contains encoder, inference, and interpretation modules. In the encoder module, unrestricted feature masks were presented to generate expressive features and prototypes. In the inference module, a multi-image prototype learning method was introduced to update prototypes so that the network can learn generalized prototypes. Finally, in the interpretation module, a multiple dynamic masks (MDM) decoder was suggested to explain the neural network; it generates heatmaps using the consistent activation of the original image and the masked image at the detection nodes of the network. It decouples the inference and interpretation modules of a prototype-based network by avoiding the use of prototype activation to explain the network's decisions, in order to simultaneously improve the accuracy and interpretability of the neural network. Multiple public general and medical datasets were tested, and the results confirmed that our method could achieve a 5% improvement in accuracy and state-of-the-art interpretability compared with previous methods.

3 citations
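The abstract only sketches how the multiple dynamic masks (MDM) decoder turns activation consistency into heatmaps, so the snippet below is a loose illustration of the underlying idea rather than the paper's method: masks are supplied externally (RISE-style) and weighted by how well the class evidence survives masking. Function and argument names are illustrative.

```python
import torch

def mask_consistency_heatmap(model, image, masks, class_idx):
    """Loose sketch of explaining a prediction via consistency between the
    original and masked inputs: masks whose retained regions preserve the
    class evidence are weighted more (illustrative stand-in for the MDM
    decoder; the masks here are fixed, not generated dynamically).

    model:  callable mapping an image batch (B, C, H, W) to class scores (B, num_classes)
    image:  input image, shape (1, C, H, W)
    masks:  soft masks in [0, 1], shape (N, 1, H, W)
    """
    with torch.no_grad():
        base = model(image)[0, class_idx]                      # score on the original image
        scores = model(image * masks)[:, class_idx]            # score under each mask
        consistency = (scores / (base + 1e-8)).clamp(min=0)    # how well the evidence survives
        heatmap = (consistency.view(-1, 1, 1, 1) * masks).sum(dim=0)[0]
        heatmap = heatmap / (heatmap.max() + 1e-8)
    return heatmap                                              # (H, W)
```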

Journal ArticleDOI
TL;DR: A large-scale feature map together with multi-scale feedback is added to improve the recognition of small objects, and the anchor boxes are optimized by clustering the ground-truth object boxes of UAVT-3.
Abstract: Military object detection from Unmanned Aerial Vehicle (UAV) reconnaissance images faces challenges including a lack of image data, poor image quality, and small objects. In this work, we simulate UAV low-altitude reconnaissance and construct the UAV reconnaissance image tank database UAVT-3. Then, we improve YOLOv5 and propose UAVT-YOLOv5 for object detection in UAV images. First, data augmentation of blurred images is introduced to improve accuracy on fog and motion-blurred images. Second, a large-scale feature map together with multi-scale feedback is added to improve the recognition of small objects. Third, we optimize the loss function by increasing the loss penalty for small objects and for classes with fewer samples. Finally, the anchor boxes are optimized by clustering the ground-truth object boxes of UAVT-3. The feature visualization technique Class Activation Mapping (CAM) is introduced to explore the mechanisms of the proposed model. Experimental results on UAVT-3 show that the mAP reaches 99.2%, an increase of 2.1% over YOLOv5, the detection speed is 40 frames per second, and data augmentation of blurred images yields mAP increases of 20.4% and 26.6% for fog and motion-blurred image detection, respectively. The class activation maps show that the discriminative region of the tanks is the turret for UAVT-YOLOv5.

3 citations
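Clustering the ground-truth boxes to obtain dataset-specific anchors is a standard YOLO recipe; a sketch using k-means with a 1 - IoU distance is shown below. The routine and its hyper-parameters are illustrative and not necessarily the authors' exact procedure for UAVT-3.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster ground-truth box sizes into k anchors using 1 - IoU as the
    distance, the usual recipe for tuning YOLO anchors to a dataset.

    wh: (N, 2) array of ground-truth box widths and heights
    """
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]        # initialize from real boxes
    for _ in range(iters):
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        union = wh[:, 0:1] * wh[:, 1:2] + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
        iou = inter / (union + 1e-8)                           # (N, k) IoU to each anchor
        assign = iou.argmax(axis=1)                            # nearest anchor = highest IoU
        for j in range(k):                                     # move anchors to cluster medians
            if np.any(assign == j):
                anchors[j] = np.median(wh[assign == j], axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]           # sort anchors by area
```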

Proceedings ArticleDOI
Lei Zhu, Qian Chen, Lujia Jin, Yunfei You, Yanye Lu 
16 Jul 2022
TL;DR: This paper elaborates a plug-and-play mechanism called BagCAMs that better projects a well-trained classifier onto the localization task without refining or re-training the baseline structure, and it substantially improves the performance of baseline WSOL methods.
Abstract: , Abstract. Classification activation map (CAM), utilizing the classification structure to generate pixel-wise localization maps, is a crucial mechanism for weakly supervised object localization (WSOL). However, CAM directly uses the classifier trained on image-level features to locate objects, making it prefers to discern global discriminative factors rather than regional object cues. Thus only the discriminative locations are activated when feeding pixel-level features into this classifier. To solve this issue, this paper elaborates a plug-and-play mechanism called BagCAMs to better project a well-trained classifier for the localization task without refining or re-training the baseline structure. Our BagCAMs adopts a proposed regional localizer generation (RLG) strategy to define a set of regional localizers and then derive them from a well-trained classifier. These regional localizers can be viewed as the base learner that only dis-cerns region-wise object factors for localization tasks, and their results can be effectively weighted by our BagCAMs to form the final localization map. Experiments indicate that adopting our proposed BagCAMs can improve the performance of baseline WSOL methods to a great extent and obtains state-of-the-art performance on three WSOL benchmarks. Code are released at https://github.com/zh460045050/BagCAMs .

3 citations
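The regional localizer generation (RLG) strategy is only described at a high level above, so the sketch below gives one plausible, simplified reading: treat the pixel-level gradient at each spatial position as a separate localizer and average the resulting maps. It is not the paper's derivation; the released code at the link above contains the actual method.

```python
import torch
import torch.nn.functional as F

def bagged_regional_cam(head, activations, class_idx):
    """Loose illustration of 'bagging regional localizers': the gradient of
    the class score at every spatial position acts as one localizer's channel
    weighting, each localizer yields its own map, and the maps are averaged.

    head:        module mapping conv activations (1, K, H, W) to class logits (1, C)
    activations: conv activations for one image, shape (1, K, H, W)
    """
    activations = activations.detach().requires_grad_(True)
    score = head(activations)[0, class_idx]
    grads = torch.autograd.grad(score, activations)[0][0]      # (K, H, W) pixel-level gradients
    feats = activations.detach()[0]                             # (K, H, W)
    k, h, w = feats.shape
    # localizer at position p produces map_p(q) = sum_k grads[k, p] * feats[k, q]
    maps = torch.einsum('kp,kq->pq', grads.reshape(k, -1), feats.reshape(k, -1))
    cam = F.relu(maps).mean(dim=0).reshape(h, w)                # average ("bag") the localizers
    return cam / (cam.max() + 1e-8)
```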

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: A residual learning framework is proposed to ease the training of networks substantially deeper than those used previously; the resulting residual nets won 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
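The residual reformulation described above is easy to see in code: a block learns a residual function F(x) and adds it back to its input through an identity shortcut. Below is a minimal PyTorch sketch of the basic block, without downsampling or the full deep stack.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: the stacked layers learn a residual F(x) and the
    block outputs F(x) + x, the reformulation that makes very deep networks
    easier to optimize (a minimal sketch, not the full ResNet)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))      # first conv of the residual branch
        out = self.bn2(self.conv2(out))            # second conv
        return F.relu(out + x)                     # identity shortcut: F(x) + x
```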

Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network with 60 million parameters, consisting of five convolutional layers, some followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax, achieved state-of-the-art results on ImageNet classification.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations
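The dropout regularization mentioned in the abstract is applied in the network's fully connected layers; a sketch of such a classifier head follows. The flattened feature size and layer widths match the standard AlexNet configuration, while the five convolutional layers and local response normalization are omitted.

```python
import torch.nn as nn

# Classifier head with dropout in the two hidden fully connected layers,
# as described in the abstract; expects a flattened (B, 256 * 6 * 6) input.
classifier_head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),          # final 1000-way layer; softmax is applied in the loss
)
```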

Proceedings Article
01 Jan 2015
TL;DR: The authors investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting and show that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations
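The design principle above is simply to stack small 3x3 convolutions and let depth do the work. A sketch of one such stage follows, with illustrative channel counts.

```python
import torch.nn as nn

def vgg_stage(in_ch, out_ch, n_convs):
    """One VGG-style stage: a stack of 3x3 convolutions followed by 2x2 max
    pooling; depth is increased simply by stacking more small-filter layers
    (illustrative sketch of the design principle, not the full network)."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# e.g. the first two stages of a 16-layer style configuration
features = nn.Sequential(vgg_stage(3, 64, 2), vgg_stage(64, 128, 2))
```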

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieved the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations
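A sketch of the Inception module underlying GoogLeNet: parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch, with 1x1 reductions keeping the computational budget in check; the branch widths below are illustrative.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception module sketch: parallel 1x1, 3x3, and 5x5 convolutions plus a
    pooling branch, with 1x1 reductions before the larger filters; the branch
    outputs are concatenated along the channel axis."""

    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)                                  # 1x1 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1))          # 1x1 then 3x3
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2))          # 1x1 then 5x5
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, cp, 1))                   # pool then 1x1

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```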

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

28,225 citations
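The skip architecture described above fuses per-class scores from a deep, coarse layer with scores from a shallow, fine layer and upsamples to the input resolution for dense prediction. A minimal sketch with illustrative channel counts and class number:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipFusionHead(nn.Module):
    """FCN-style skip head sketch: 1x1 convolutions score each layer per class,
    the coarse scores are upsampled and summed with the finer scores, and the
    result is upsampled to the input size (channel sizes are illustrative)."""

    def __init__(self, deep_ch=512, shallow_ch=256, num_classes=21):
        super().__init__()
        self.score_deep = nn.Conv2d(deep_ch, num_classes, 1)        # semantics from the deep layer
        self.score_shallow = nn.Conv2d(shallow_ch, num_classes, 1)  # detail from the shallow layer

    def forward(self, deep_feat, shallow_feat, out_size):
        coarse = self.score_deep(deep_feat)
        coarse = F.interpolate(coarse, size=shallow_feat.shape[-2:],
                               mode='bilinear', align_corners=False)  # upsample coarse scores
        fused = coarse + self.score_shallow(shallow_feat)             # combine semantics + detail
        return F.interpolate(fused, size=out_size,
                             mode='bilinear', align_corners=False)    # dense per-pixel scores
```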