Open Access, Posted Content

Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs

TLDR
This paper introduces two axioms -- Conservation and Sensitivity -- to the visualization paradigm of CAM methods and proposes a dedicated Axiom-based Grad-CAM (XGrad-CAM) that achieves better visualization performance than Grad-CAM while remaining class-discriminative and easy to implement compared with Grad-CAM++ and Ablation-CAM.
Abstract
To better understand and use Convolutional Neural Networks (CNNs), the visualization and interpretation of CNNs have attracted increasing attention in recent years. In particular, several Class Activation Mapping (CAM) methods have been proposed to discover the connection between a CNN's decision and image regions. Although these methods produce reasonable visualizations, their main limitation is the lack of clear and sufficient theoretical support. In this paper, we introduce two axioms -- Conservation and Sensitivity -- to the visualization paradigm of the CAM methods. Meanwhile, a dedicated Axiom-based Grad-CAM (XGrad-CAM) is proposed to satisfy these axioms as much as possible. Experiments demonstrate that XGrad-CAM is an enhanced version of Grad-CAM in terms of conservation and sensitivity. It achieves better visualization performance than Grad-CAM, while also being class-discriminative and easy to implement compared with Grad-CAM++ and Ablation-CAM. The code is available at this https URL.
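
The difference between Grad-CAM and XGrad-CAM can be illustrated with a short NumPy sketch. It is not the authors' released code. It assumes the commonly cited formulation in which Grad-CAM weights each feature map by the spatial mean of its gradient, while XGrad-CAM weights it by the gradient averaged with the map's normalized activations. The function names, the `eps` stabilizer, and the random `acts` and `grads` arrays are hypothetical stand-ins for the last convolutional layer's activations and the class-score gradients of a real network.

```python
import numpy as np

def grad_cam_weights(grads):
    """Grad-CAM (assumed form): plain spatial mean of the gradient per feature map."""
    # grads: (K, H, W) gradients of the class score w.r.t. K feature maps
    return grads.mean(axis=(1, 2))

def xgrad_cam_weights(acts, grads, eps=1e-7):
    """XGrad-CAM (assumed form): gradient averaged with normalized activations."""
    # acts: (K, H, W) feature maps, assumed non-negative (e.g. after ReLU)
    norm = acts.sum(axis=(1, 2), keepdims=True) + eps  # eps is a hypothetical stabilizer
    return (acts / norm * grads).sum(axis=(1, 2))

def cam(acts, weights):
    """Weighted sum of feature maps followed by ReLU, as in the CAM family."""
    return np.maximum((weights[:, None, None] * acts).sum(axis=0), 0.0)

# Hypothetical data standing in for a real network's activations and gradients.
rng = np.random.default_rng(0)
acts = rng.random((4, 7, 7))
grads = rng.standard_normal((4, 7, 7))

heatmap = cam(acts, xgrad_cam_weights(acts, grads))
```

Under these assumptions the two weightings coincide whenever the gradient is constant over a feature map, and differ otherwise because XGrad-CAM gives more influence to gradients at strongly activated positions.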


Citations
Posted Content

Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation

TL;DR: This work collects visualization maps from multiple layers of the model using an attribution-based input sampling technique and aggregates them to reach a fine-grained and complete explanation; it also proposes a layer selection strategy that applies to the whole family of CNN-based models.
Journal Article

A Cascade Attention Based Facial Expression Recognition Network by Fusing Multi-Scale Spatio-Temporal Features

TL;DR: This paper proposes a cascade attention-based facial expression recognition network on the basis of a combination of (i) local spatial feature, (ii) multi-scale-stereoscopic spatial context feature (extracted from the 3-scale pyramid feature), and (iii) temporal feature.
Journal Article

CNN-LRP: Understanding Convolutional Neural Networks Performance for Target Recognition in SAR Images

TL;DR: A novel LRP algorithm particularly designed for understanding CNN’s performance on SAR image target recognition is proposed, providing a concise form of the correlation between output of a layer and weights of the next layer in CNNs.
Posted Content

Deep Active Learning for Joint Classification & Segmentation with Weak Annotator

TL;DR: The results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAMs and AL methods, with an identical oracle-supervision budget.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Proceedings Article

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.