Author

Harish G. Ramaswamy

Other affiliations: Indian Institute of Science
Bio: Harish G. Ramaswamy is an academic researcher from Indian Institute of Technology Madras. The author has contributed to research in the topics of Multiclass classification and Supervised learning. The author has an h-index of 13 and has co-authored 24 publications receiving 509 citations. Previous affiliations of Harish G. Ramaswamy include Indian Institute of Science.

Papers
Proceedings ArticleDOI
01 Mar 2020
TL;DR: This approach, Ablation-based Class Activation Mapping (Ablation-CAM), uses ablation analysis to determine the importance of individual feature map units with respect to a class, and uses these importances to produce a coarse localization map highlighting the regions of the image that matter for predicting the concept.
Abstract: In response to recent criticism of gradient-based visualization techniques, we propose a new methodology to generate visual explanations for deep Convolutional Neural Network (CNN)-based models. Our approach, Ablation-based Class Activation Mapping (Ablation-CAM), uses ablation analysis to determine the importance (weights) of individual feature map units with respect to a class. These weights are then used to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Our objective and subjective evaluations show that this gradient-free approach works better than the state-of-the-art Grad-CAM technique. Moreover, further experiments are carried out to show that Ablation-CAM is class-discriminative and can also be used to evaluate trust in a model.
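The abstract describes the mechanism only at a high level, so a minimal NumPy sketch of the idea follows. The helper names (ablation_cam, class_score_fn) and the weighting of each unit by its relative drop in class score are assumptions inferred from the abstract, not the authors' released code.

    import numpy as np

    def ablation_cam(feature_maps, class_score_fn, target_class):
        # feature_maps: array of shape (K, H, W), activations of the last conv layer.
        # class_score_fn: callable mapping a (K, H, W) activation tensor to a vector
        #                 of class scores; stands in for the rest of the network (assumed helper).
        base_score = class_score_fn(feature_maps)[target_class]
        weights = np.zeros(len(feature_maps))
        for k in range(len(feature_maps)):
            ablated = feature_maps.copy()
            ablated[k] = 0.0                                  # "ablate" unit k by zeroing its map
            drop = base_score - class_score_fn(ablated)[target_class]
            weights[k] = drop / (base_score + 1e-8)           # importance of unit k for the class
        cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)  # ReLU
        return cam / (cam.max() + 1e-8)                       # coarse localization map in [0, 1]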

178 citations

Posted Content
TL;DR: In this article, a provably correct algorithm for mixture proportion estimation is proposed; it is based on embedding distributions onto an RKHS, and implementing it requires only solving a simple convex quadratic programming problem a few times.
Abstract: Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.
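In symbols, the setting described in the abstract can be written as follows; the notation is ours and only sketches the role of the RKHS embedding, not the paper's exact estimator.

\[
  F \;=\; \kappa^{*}\,H \;+\; (1-\kappa^{*})\,G, \qquad \kappa^{*}\in[0,1],
\]
where $F$ is the mixture, $H$ is the known component, and $G$ is unknown. The algorithm works with the empirical kernel mean embeddings of the two samples,
\[
  \hat{\mu}_F \;=\; \frac{1}{n}\sum_{i=1}^{n} k(x_i,\cdot),
  \qquad
  \hat{\mu}_H \;=\; \frac{1}{m}\sum_{j=1}^{m} k(z_j,\cdot),
\]
and the estimate of $\kappa^{*}$ is obtained from RKHS distances between reweighted combinations of these embeddings, each of which reduces to a convex quadratic program over the sample weights.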

130 citations

Proceedings Article
19 Jun 2016
TL;DR: This work constructs a provably correct algorithm for MPE based on embedding distributions onto an RKHS, derives convergence rates under certain assumptions on the distribution, and demonstrates that the algorithm performs comparably to or better than other algorithms on most datasets.
Abstract: Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.
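For concreteness, here is a hedged sketch of the kind of quadratic program the abstract alludes to: the squared RKHS distance (up to a constant) between a reweighted combination of the two empirical embeddings and the convex hull of the embedded mixture sample, under a Gaussian kernel. The objective, the reweighting by lam, and the use of cvxpy are our assumptions for illustration; this is not a faithful reproduction of the paper's estimator.

    import numpy as np
    import cvxpy as cp

    def rbf_gram(X, Y, gamma=1.0):
        # Gaussian-kernel Gram matrix between the rows of X and the rows of Y
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def rkhs_dist_to_hull(X_mix, X_comp, lam, gamma=1.0):
        # Squared RKHS distance (up to an additive constant) between the point
        # lam * mu_mix - (lam - 1) * mu_comp and the convex hull of the embedded
        # mixture sample -- one convex QP over the simplex of hull weights.
        n, m = len(X_mix), len(X_comp)
        K_mm = rbf_gram(X_mix, X_mix, gamma)          # mixture vs mixture
        K_mc = rbf_gram(X_mix, X_comp, gamma)         # mixture vs component
        w = cp.Variable(n)                            # convex-hull weights
        quad = cp.quad_form(w, cp.psd_wrap(K_mm))     # ||Phi w||^2 in the RKHS
        cross = (lam / n) * (K_mm.sum(axis=1) @ w) \
              - ((lam - 1) / m) * (K_mc.sum(axis=1) @ w)
        prob = cp.Problem(cp.Minimize(quad - 2 * cross),
                          [w >= 0, cp.sum(w) == 1])
        prob.solve()
        return prob.value

    # toy usage with synthetic 1-d samples (hypothetical data)
    rng = np.random.default_rng(0)
    X_comp = rng.normal(0.0, 1.0, size=(50, 1))                    # component H
    X_mix = np.concatenate([rng.normal(0.0, 1.0, size=(30, 1)),    # mixture F
                            rng.normal(3.0, 1.0, size=(20, 1))])
    print(rkhs_dist_to_hull(X_mix, X_comp, lam=1.5))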

62 citations

Proceedings Article
06 Jul 2015
TL;DR: This paper presents new consistent algorithms for multiclass learning with complex performance measures, defined by arbitrary functions of the confusion matrix, and gives two specific instantiations based on the Frank-Wolfe method for concave performance measures and on the bisection method for ratio-of-linear performance measures.
Abstract: This paper presents new consistent algorithms for multiclass learning with complex performance measures, defined by arbitrary functions of the confusion matrix. This setting includes as a special case all loss-based performance measures, which are simply linear functions of the confusion matrix, but also includes more complex performance measures such as the multiclass G-mean and micro F1 measures. We give a general framework for designing consistent algorithms for such performance measures by viewing the learning problem as an optimization problem over the set of feasible confusion matrices, and give two specific instantiations based on the Frank-Wolfe method for concave performance measures and on the bisection method for ratio-of-linear performance measures. The resulting algorithms are provably consistent and outperform a multiclass version of the state-of-the-art SVMperf method in experiments; for large multiclass problems, the algorithms are also orders of magnitude faster than SVMperf.
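To make the abstract's distinction concrete, a small Python sketch follows: loss-based measures are linear functions of the confusion matrix, whereas the multiclass G-mean is not. The helper names and toy labels are ours, purely for illustration.

    import numpy as np

    def confusion_matrix(y_true, y_pred, n_classes):
        # C[i, j] = fraction of examples with true class i predicted as class j
        C = np.zeros((n_classes, n_classes))
        for t, p in zip(y_true, y_pred):
            C[t, p] += 1
        return C / len(y_true)

    def linear_measure(C, L):
        # loss-based measures are linear in C: expected loss = sum_ij L[i, j] * C[i, j]
        return float((L * C).sum())

    def multiclass_g_mean(C):
        # geometric mean of per-class recalls -- a non-linear function of C
        recalls = np.diag(C) / C.sum(axis=1)
        return float(np.prod(recalls) ** (1.0 / len(recalls)))

    # toy usage (hypothetical labels)
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    C = confusion_matrix(y_true, y_pred, 3)
    L01 = 1.0 - np.eye(3)                  # 0-1 loss matrix
    print(linear_measure(C, L01))          # misclassification rate
    print(multiclass_g_mean(C))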

51 citations

Journal ArticleDOI
TL;DR: The goal is to design consistent algorithms for n-class classification problems with a 'reject option'; while such algorithms are known for the binary (n = 2) case, little has been understood for the general multiclass case.
Abstract: We consider the problem of $n$-class classification ($n\geq2$), where the classifier can choose to abstain from making predictions at a given cost, say, a factor $\alpha$ of the cost of misclassification. Our goal is to design consistent algorithms for such $n$-class classification problems with a ‘reject option’; while such algorithms are known for the binary ($n=2$) case, little has been understood for the general multiclass case. We show that the well known Crammer-Singer surrogate and the one-vs-all hinge loss, albeit with a different predictor than the standard argmax, yield consistent algorithms for this problem when $\alpha=\frac{1}{2}$. More interestingly, we design a new convex surrogate, which we call the binary encoded predictions surrogate, that is also consistent for this problem when $\alpha=\frac{1}{2}$ and operates on a much lower dimensional space ($\log(n)$ as opposed to $n$). We also construct modified versions of all these three surrogates to be consistent for any given $\alpha\in[0,\frac{1}{2}]$.
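As a point of reference for the setting, the plug-in rule below abstains whenever the top class-probability estimate falls below 1 - α (the classical Chow-style rule). The paper's surrogates (Crammer-Singer, one-vs-all hinge, and the binary encoded predictions surrogate) operate on scores with different predictors, so this sketch illustrates only the problem, not the proposed methods.

    import numpy as np

    def predict_with_reject(probs, alpha=0.5, reject_label=-1):
        # Abstain (cost alpha) whenever the top class-probability estimate is
        # below 1 - alpha; otherwise predict the argmax class.
        probs = np.asarray(probs)
        top = probs.argmax(axis=1)
        keep = probs.max(axis=1) >= 1.0 - alpha
        return np.where(keep, top, reject_label)

    # toy usage: with alpha = 1/2, abstain whenever no class reaches probability 1/2
    print(predict_with_reject([[0.6, 0.3, 0.1], [0.4, 0.35, 0.25]], alpha=0.5))
    # -> [ 0 -1]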

43 citations


Cited by
Proceedings ArticleDOI
01 Jul 2017
TL;DR: This article presents a theoretically grounded approach to training deep neural networks, including recurrent networks, subject to class-dependent label noise, and proposes two loss-correction procedures that are agnostic to both application domain and network architecture.
Abstract: We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large scale dataset of clothing images employing a diversity of architectures — stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers — demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise.
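The abstract says the corrections amount to at most a matrix inversion and multiplication, given the probability of each class being corrupted into another. The sketch below shows one such correction (a backward-style correction of the cross-entropy loss); the exact formulation and the toy noise matrix are our assumptions for illustration, not the authors' code.

    import numpy as np

    def backward_corrected_ce(probs, noisy_label, T):
        # T[i, j] = probability that true class i is observed as class j.
        # The corrected loss for an observed (noisy) label applies the corresponding
        # row of T^{-1} to the vector of per-class cross-entropy losses.
        per_class_loss = -np.log(np.clip(probs, 1e-12, 1.0))  # loss if each class were the true one
        T_inv = np.linalg.inv(T)
        return float(T_inv[noisy_label] @ per_class_loss)

    # toy usage with a symmetric 10% noise rate over 3 classes (hypothetical numbers)
    T = 0.9 * np.eye(3) + 0.05 * (1 - np.eye(3))
    print(backward_corrected_ce(np.array([0.7, 0.2, 0.1]), noisy_label=0, T=T))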

1,171 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparison, and applications.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also because of the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed, and comments on future research are discussed.
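For readers new to the idea, the sketch below shows one widely used formulation of distillation, in which the student matches the teacher's temperature-softened output distribution in addition to the usual cross-entropy on the hard label. It is a generic illustration of the technique surveyed in the paper, not a method taken from it; the temperature, mixing weight, and helper names are conventional choices assumed here.

    import numpy as np

    def softmax(z, T=1.0):
        z = np.asarray(z, dtype=float) / T
        e = np.exp(z - z.max())
        return e / e.sum()

    def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
        # KL between temperature-softened teacher and student outputs,
        # mixed with the ordinary cross-entropy on the hard label.
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        kl = float(np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12))))
        ce = -float(np.log(softmax(student_logits)[true_label] + 1e-12))
        return alpha * (T * T) * kl + (1 - alpha) * ce

    # toy usage with hypothetical logits
    print(distillation_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0], true_label=0))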

1,027 citations

Journal ArticleDOI
TL;DR: This paper provides an introduction to the topic of uncertainty in machine learning, as well as an overview of attempts so far at handling uncertainty in general and at formalizing the distinction between aleatoric and epistemic uncertainty in particular.
Abstract: The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
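One common, concrete way to separate the two kinds of uncertainty the abstract refers to is an entropy decomposition over an ensemble of predictive distributions; the sketch below is our illustration of that idea, not a construction taken from the paper.

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        return float(-(p * np.log(p + 1e-12)).sum())

    def uncertainty_decomposition(member_probs):
        # total     = entropy of the averaged prediction
        # aleatoric = average entropy of the individual predictions
        # epistemic = total - aleatoric (disagreement between ensemble members)
        member_probs = np.asarray(member_probs, dtype=float)
        total = entropy(member_probs.mean(axis=0))
        aleatoric = float(np.mean([entropy(p) for p in member_probs]))
        return total, aleatoric, total - aleatoric

    # toy usage: two ensemble members that disagree -> non-zero epistemic term
    print(uncertainty_decomposition([[0.9, 0.1], [0.1, 0.9]]))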

421 citations

Journal ArticleDOI
TL;DR: As discussed in this paper, the notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology; in particular, this includes the importance of distinguishing between aleatoric and epistemic uncertainty.
Abstract: The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.

321 citations

Journal ArticleDOI
TL;DR: This survey of the current state of the art in PU learning proposes seven key research questions that commonly arise in the field and provides a broad overview of how the field has tried to address them.
Abstract: Learning from positive and unlabeled data, or PU learning, is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature, as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.

291 citations