Proceedings ArticleDOI

How You See Me: Understanding Convolutional Neural Networks

TL;DR: This paper answers the question, “How does a CNN look at an image?” and proposes a generic approach that can be applied to any neural network architecture without requiring additional training or architectural changes.
Abstract: Convolutional Neural Networks (CNNs) are among the most powerful tools in the present era of science. Much research has been done to improve their performance and robustness, while their internal workings have remained largely unexplored. They are often described as black boxes that can map non-linear data effectively. This paper answers the question, “How does a CNN look at an image?”. Visual results are provided to strongly support the proposed method. The proposed algorithm exploits the basic math behind CNNs to backtrack the important pixels. It is a generic approach that can be applied to any neural network architecture and requires no additional training or architectural changes. In the literature, few attempts have been made to explain how learning happens internally in CNNs by exploiting the convolution filter maps. The algorithm is simple in that it involves no cost functions, filter exploitation, gradient calculations, or probability scores. Further, we demonstrate that the proposed scheme can be used in some important computer vision tasks such as object detection and salient region proposal.
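The gradient-free, backtracking flavor of the abstract's idea can be illustrated with a minimal sketch. All names and the single max-pooling layer here are illustrative assumptions, not the paper's actual algorithm: record which input pixel wins each pooling window, then mark those winners as the "important" pixels.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k-by-k max-pool that also records which input pixel won each window."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    switches = []  # (row, col) of each winning input pixel
    for i in range(0, h, k):
        for j in range(0, w, k):
            window = x[i:i + k, j:j + k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            out[i // k, j // k] = window[r, c]
            switches.append((i + r, j + c))
    return out, switches

def backtrack_important_pixels(x, k=2):
    """Mark the input pixels that survive pooling; no gradients involved."""
    _, switches = max_pool_with_switches(x, k)
    mask = np.zeros_like(x)
    for r, c in switches:
        mask[r, c] = 1.0
    return mask

x = np.arange(16, dtype=float).reshape(4, 4)
mask = backtrack_important_pixels(x)  # one winner per 2x2 window
```

A full backtracking scheme would chain such per-layer traces from the output all the way to the input image.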
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors conducted three experiments on different datasets to train models with various transfer learning architectures and concluded that DenseNet-121 is the best-performing transfer learning architecture across the datasets tested.
Abstract: Deep learning is a branch of machine learning with many highly successful applications. One application of deep learning is image classification using the Convolutional Neural Network (CNN) algorithm. Large image data is required to classify images with CNN to obtain satisfactory training results. However, this can be overcome with transfer learning architectural models, even with small image data. With transfer learning, the success rate of a model is likely to be higher. Since there are many transfer learning architecture models, it is necessary to compare each model's performance results to find the best-performing architecture. In this study, we conducted three experiments on different datasets to train models with various transfer learning architectures. We then performed a comprehensive comparative analysis for each experiment. The result is that the DenseNet-121 architecture is the best transfer learning architecture model for various datasets.
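The transfer learning recipe described above can be sketched in a few lines. The frozen random projection below is only a stand-in for a pretrained backbone such as DenseNet-121, and the tiny synthetic task stands in for a small image dataset; only the new classification head is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained, frozen feature extractor (e.g. DenseNet-121
# features in practice). Its weights are never updated.
W_frozen = rng.normal(size=(20, 8))

def features(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen ReLU features

# Tiny synthetic binary task: the small-data regime transfer learning targets.
X = rng.normal(size=(40, 20))
y = (X[:, 0] > 0).astype(float)

# Train only the new head (logistic regression) by gradient descent.
w, b = np.zeros(8), 0.0
F = features(X)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.05 * (F.T @ (p - y) / len(y))
    b -= 0.05 * np.mean(p - y)

acc = np.mean(((1.0 / (1.0 + np.exp(-(F @ w + b)))) > 0.5) == y)
```

Because the backbone stays fixed, only 9 parameters are fit here, which is what makes the approach viable with little data.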
References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
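The residual reformulation above — each block outputs F(x) + x rather than an unreferenced mapping — can be sketched briefly. The fully-connected block and small-weight initialization below are illustrative assumptions (the paper uses convolutional blocks); they show why a deep residual stack starts out close to the identity and is therefore easy to optimize.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = F(x) + x: the block learns a residual F, with an identity shortcut."""
    return relu(x @ W1) @ W2 + x

d = 16
x = rng.normal(size=(4, d))
# Small weights make F(x) nearly zero, so each block is nearly the identity.
W1 = 0.01 * rng.normal(size=(d, d))
W2 = 0.01 * rng.normal(size=(d, d))

y = x
for _ in range(50):  # a 50-block "deep" stack
    y = residual_block(y, W1, W2)
# Even after 50 blocks, the output stays close to the input — the signal
# is carried by the shortcuts, not forced through every transformation.
```

Without the `+ x` shortcut, the same 50-layer stack of near-zero weights would collapse the signal toward zero instead of preserving it.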

Proceedings Article
03 Dec 2012
TL;DR: In this paper, the authors trained a deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art classification performance on ImageNet.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations
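The dropout regularizer mentioned above is simple to sketch. Note this is the now-standard "inverted" variant that rescales at training time, whereas the original paper instead multiplied activations at test time; the expected activation is the same either way.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with prob p, scale survivors by 1/(1-p)."""
    if not train:
        return x  # at test time the layer is the identity
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)

h = np.ones(10000)
out = dropout(h, p=0.5)
# Roughly half the units are zeroed each pass; the 1/(1-p) scaling keeps
# the expected activation equal to the input, so no test-time change is needed.
```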

Book ChapterDOI
06 Sep 2014
TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark.
Abstract: Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark Krizhevsky et al. [18]. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we explore both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark. We also perform an ablation study to discover the performance contribution from different model layers. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

12,783 citations


"How You See Me: Understanding Convo..." refers methods in this paper

  • Another approach was visualizing using deconvolution, as shown in [6].

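The deconvolution-based visualization referenced above depends on "switch" variables that remember where each pooled maximum came from, so activations can be mapped back to input space. A minimal sketch of pooling and unpooling with switches (function names are illustrative):

```python
import numpy as np

def pool_with_switches(x, k=2):
    """Max-pool that records the source location of each pooled value."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    sw = np.zeros((h // k, w // k, 2), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            win = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            sw[i, j] = (i * k + r, j * k + c)
    return out, sw

def unpool(pooled, sw, shape):
    """Place each pooled value back at the exact pixel it came from."""
    x = np.zeros(shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = sw[i, j]
            x[r, c] = pooled[i, j]
    return x

a = np.array([[1., 9., 2., 3.],
              [4., 5., 8., 7.],
              [6., 0., 1., 2.],
              [3., 2., 4., 5.]])
p, sw = pool_with_switches(a)
recon = unpool(p, sw, a.shape)  # sparse map: only the winning pixels survive
```

A deconvnet chains such unpooling steps with transposed filters to project a chosen feature activation all the way back to pixel space.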

Proceedings Article
23 Dec 2013
TL;DR: In this paper, the gradient of the class score with respect to the input image is used to compute a class saliency map, which can then be employed for weakly supervised object segmentation using classification ConvNets.
Abstract: This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [5], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [13].

4,959 citations
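The gradient-based saliency idea above can be sketched with a checkable toy case. The linear class score and the finite-difference gradient below are illustrative stand-ins for a ConvNet score and backpropagation; for a linear score the gradient is exactly the weight matrix, so the saliency map is simply its magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-in for a ConvNet class score S_c(x).
w = rng.normal(size=(4, 4))

def score(x):
    return float(np.sum(w * x))

x = rng.normal(size=(4, 4))

# Saliency = |d(score)/d(input)|, estimated here by central differences.
eps = 1e-6
grad = np.zeros_like(x)
for i in range(4):
    for j in range(4):
        d = np.zeros_like(x)
        d[i, j] = eps
        grad[i, j] = (score(x + d) - score(x - d)) / (2 * eps)

saliency = np.abs(grad)  # equals |w| for this linear score
```

For a real network the same quantity comes from a single backward pass, and large-magnitude entries mark the pixels the class score is most sensitive to.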

Book
01 Jan 1988
TL;DR: A standard text on statistical signal processing, covering hypothesis testing, parameter estimation, and signal detection and estimation in both discrete and continuous time.
Abstract: Preface; I. Introduction; II. Elements of Hypothesis Testing; III. Signal Detection in Discrete Time; IV. Elements of Parameter Estimation; V. Elements of Signal Estimation; VI. Signal Detection in Continuous Time; VII. Signal Estimation in Continuous Time; References; Index.

4,096 citations
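The hypothesis-testing machinery this text covers can be illustrated with its simplest instance: detecting a known constant signal in Gaussian noise, where the likelihood-ratio test reduces to comparing a correlation against a threshold. The signal level, noise level, and threshold choice below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# H0: y = n,   H1: y = s + n,   n ~ N(0, sigma^2 I), s known.
n_samples, sigma, s_level = 100, 1.0, 0.5
s = np.full(n_samples, s_level)

# The likelihood ratio is monotone in <y, s>, so the test is a correlator.
threshold = 0.5 * float(s @ s)  # midpoint threshold (equal priors, equal costs)

def detect(y):
    return bool(y @ s > threshold)  # True -> decide H1

trials, correct = 2000, 0
for _ in range(trials):
    h1 = bool(rng.random() < 0.5)
    y = (s if h1 else 0.0) + sigma * rng.normal(size=n_samples)
    correct += detect(y) == h1
acc = correct / trials
```

With these numbers the two hypotheses are well separated (the statistic's means differ by 25 against a standard deviation of 5), so the detector is correct almost every trial.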


"How You See Me: Understanding Convo..." refers methods in this paper

  • For example, starting from AlexNet [1] with 8 layers, then came ZFNet [2] and VGGNet [3].
