Author

Balaji Krishnapuram

Other affiliations: IBM, Duke University
Bio: Balaji Krishnapuram is an academic researcher from Siemens. The author has contributed to research on topics including support vector machines and feature selection. The author has an h-index of 28 and has co-authored 84 publications receiving 3,918 citations. Previous affiliations of Balaji Krishnapuram include IBM and Duke University.


Papers
Journal ArticleDOI
TL;DR: This paper introduces a true multiclass formulation based on multinomial logistic regression and derives fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces.
Abstract: Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
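
For illustration, the flavor of classifier described here can be approximated with an off-the-shelf L1-penalized multinomial logistic regression. The minimal sketch below uses scikit-learn's generic SAGA solver rather than the paper's bound-optimization algorithm; the dataset and regularization strength are arbitrary choices for demonstration.

```python
# Sparse multinomial logistic regression via an l1 penalty, which drives
# many weights to exactly zero. This is a generic solver, not the paper's
# exact bound-optimization method; C=0.05 is an arbitrary choice.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1] to help the solver converge
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(penalty="l1", solver="saga", C=0.05, max_iter=5000)
clf.fit(X_train, y_train)

print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
print(f"fraction of exactly-zero weights: {np.mean(clf.coef_ == 0):.2%}")
```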

919 citations

Journal Article
TL;DR: Experimental results on two real-life MTL problems indicate that the proposed algorithms automatically identify subgroups of related tasks whose training data appear to be drawn from similar distributions, and are more accurate than simpler approaches such as single-task learning, pooling of data across all tasks, and simplified approximations to DP.
Abstract: Consider the problem of learning logistic-regression models for multiple classification tasks, where the training data set for each task is not drawn from the same statistical distribution. In such a multi-task learning (MTL) scenario, it is necessary to identify groups of similar tasks that should be learned jointly. Relying on a Dirichlet process (DP) based statistical model to learn the extent of similarity between classification tasks, we develop computationally efficient algorithms for two different forms of the MTL problem. First, we consider a symmetric multi-task learning (SMTL) situation in which classifiers for multiple tasks are learned jointly using a variational Bayesian (VB) algorithm. Second, we consider an asymmetric multi-task learning (AMTL) formulation in which the posterior density function from the SMTL model parameters (from previous tasks) is used as a prior for a new task: this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Experimental results on two real life MTL problems indicate that the proposed algorithms: (a) automatically identify subgroups of related tasks whose training data appear to be drawn from similar distributions; and (b) are more accurate than simpler approaches such as single-task learning, pooling of data across all tasks, and simplified approximations to DP.
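
As a point of reference for the baselines the abstract compares against, the sketch below contrasts single-task learning with naive pooling on synthetic tasks. It does not implement the paper's DP-based task grouping; all data and settings are illustrative, and the point is simply that pooling hurts when one task's distribution differs.

```python
# Baselines from the abstract: per-task ("single-task") logistic regression
# vs. pooling all tasks' data into one model. Tasks 0 and 1 share similar
# weight vectors; task 2 is deliberately dissimilar, so pooling degrades it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
weight_vectors = [np.array([1., 1., 1., 1., 1.]),     # task 0
                  np.array([1., 1., 1., 1., .8]),     # task 1: similar to 0
                  np.array([-1., 1., -1., 1., -1.])]  # task 2: dissimilar
tasks = []
for w in weight_vectors:
    X = rng.normal(size=(200, 5))
    y = (X @ w + 0.5 * rng.normal(size=200) > 0).astype(int)
    tasks.append((X, y))

# Single-task learning: one independent model per task.
single = [LogisticRegression().fit(X, y) for X, y in tasks]

# Pooling: one model trained on the union of all tasks' data.
X_pool = np.vstack([X for X, _ in tasks])
y_pool = np.concatenate([y for _, y in tasks])
pooled = LogisticRegression().fit(X_pool, y_pool)

for i, (X, y) in enumerate(tasks):
    print(f"task {i}: single-task acc={single[i].score(X, y):.2f}, "
          f"pooled acc={pooled.score(X, y):.2f}")
```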

582 citations

Proceedings Article
03 Dec 2007
TL;DR: It is shown that classical survival analysis involving censored data can naturally be cast as a ranking problem; two bounds on the concordance index (CI) are devised, one of which emerges directly from the properties of PH models, and both are optimized directly.
Abstract: In this paper, we show that classical survival analysis involving censored data can naturally be cast as a ranking problem. The concordance index (CI), which quantifies the quality of rankings, is the standard performance measure for model assessment in survival analysis. In contrast, the standard approach to learning the popular proportional hazard (PH) model is based on Cox's partial likelihood. We devise two bounds on CI-one of which emerges directly from the properties of PH models-and optimize them directly. Our experimental results suggest that all three methods perform about equally well, with our new approach giving slightly better results. We also explain why a method designed to maximize the Cox's partial likelihood also ends up (approximately) maximizing the CI.
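
For concreteness, here is a minimal (quadratic-time) computation of the concordance index for right-censored data, the performance measure the paper optimizes. The toy data are invented; production work would use a library such as lifelines or scikit-survival.

```python
# Concordance index (CI) for right-censored survival data: the fraction of
# comparable pairs whose predicted risk ordering agrees with the observed
# event ordering. A pair (i, j) is comparable only when i's event is
# observed (not censored) and occurs strictly before j's observed time.
import numpy as np

def concordance_index(time, event, risk):
    """time: observed times; event: 1 if event observed, 0 if censored;
    risk: predicted risk scores (higher = expected earlier event)."""
    n_concordant, n_comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:
                n_comparable += 1
                if risk[i] > risk[j]:
                    n_concordant += 1
                elif risk[i] == risk[j]:
                    n_concordant += 0.5  # ties count half
    return n_concordant / n_comparable

time = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
event = np.array([1, 1, 0, 1, 0])          # 0 = censored observation
risk = np.array([0.9, 0.7, 0.3, 0.5, 0.1])
print(f"CI = {concordance_index(time, event, risk):.3f}")  # 1.0 here
```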

181 citations

Proceedings Article
13 Aug 2016
TL;DR: The 2016 ACM Conference on Knowledge Discovery and Data Mining (KDD'16) attracted a significant number of submissions from countries all over the world; in particular, the research track attracted 784 submissions and the applied data science track attracted 331 submissions.
Abstract: It is our great pleasure to welcome you to the 2016 ACM Conference on Knowledge Discovery and Data Mining -- KDD'16. We hope that the content and the professional network at KDD'16 will help you succeed professionally by enabling you to: identify technology trends early; make new/creative contributions; increase your productivity by using newer/better tools, processes or ways of organizing teams; identify new job opportunities; and hire new team members. We are living in an exciting time for our profession. On the one hand, we are witnessing the industrialization of data science, and the emergence of industrial assembly-line processes characterized by the division of labor, integrated processes/pipelines of work, standards, automation, and repeatability. Data science practitioners are organizing themselves in more sophisticated ways, embedding themselves in larger teams in many industry verticals, improving their productivity substantially, and achieving a much larger scale of social impact. On the other hand, we are also witnessing astonishing progress from research in algorithms and systems -- for example, the field of deep neural networks has revolutionized speech recognition, NLP, computer vision, image recognition, etc. By facilitating interaction between practitioners at large companies and startups on the one hand, and algorithm development researchers including leading academics on the other, KDD'16 fosters technological and entrepreneurial innovation in the area of data science. This year's conference continues its tradition of being the premier forum for the presentation of results in the field of data mining, both in the form of cutting-edge research and in the form of insights from the development and deployment of real-world applications. Further, the conference continues its tradition of a strong tutorial and workshop program on leading-edge issues of data mining. The mission of this conference has broadened in recent years even as we have placed a significant amount of focus on both the research and applied aspects of data mining. As an example of this broadened focus, this year we have introduced a strong hands-on tutorial program during the conference in which participants will learn how to use practical tools for data mining. KDD'16 also gives researchers and practitioners a unique opportunity to form professional networks and to share their perspectives with others interested in the various aspects of data mining. For example, we have introduced office hours for budding entrepreneurs from our community to meet leading venture capitalists investing in this area. We hope that the KDD 2016 conference will serve as a meeting ground for researchers, practitioners, funding agencies, and investors to help create new algorithms and commercial products. The call for papers attracted a significant number of submissions from countries all over the world. In particular, the research track attracted 784 submissions and the applied data science track attracted 331 submissions. Papers were accepted either as full papers or as posters. The overall acceptance rate, counting both full papers and posters, was less than 20%; for full papers in the research track, the acceptance rate was lower than 10%. This is consistent with the fact that the KDD Conference is a premier conference in data mining, and acceptance rates historically tend to be low. It is noteworthy that the applied data science track received a larger number of submissions than in previous years. We view this as an encouraging sign that research in data mining is increasingly becoming relevant to industrial applications. All papers were reviewed by at least three program committee members and then discussed by the PC members in a discussion moderated by a meta-reviewer. Borderline papers were thoroughly reviewed by the program chairs before final decisions were made.

179 citations

Proceedings Article
Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, Romer Rosales
03 Dec 2007
TL;DR: In this article, a Bayesian undirected graphical model for co-training is proposed, which makes explicit the previously unstated assumptions of a large class of co-training-type algorithms and clarifies the circumstances under which these assumptions fail.
Abstract: We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, unlike some previous multi-view learning methods. Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. Experiments on toy data and real world data sets illustrate the benefits of this approach.
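
The sketch below illustrates the general idea of fusing per-view kernels into one kernel with per-view noise (trust) levels, which is the spirit of a "co-training kernel" for GP classifiers. The specific combination rule used here is an assumption for illustration, not necessarily the paper's exact construction; consult the paper for the precise form.

```python
# Hypothetical fusion of two view-specific kernels into one kernel:
# K_c = (inv(K1 + s1^2 I) + inv(K2 + s2^2 I))^(-1).
# A larger per-view noise level s_j means that view is trusted less.
# This combination rule is an illustrative assumption, not the paper's
# verified formula.
import numpy as np

def rbf_kernel(X, gamma=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X_view1 = rng.normal(size=(6, 3))   # view 1 features
X_view2 = rng.normal(size=(6, 4))   # view 2 features

K1, K2 = rbf_kernel(X_view1), rbf_kernel(X_view2)
sigma1, sigma2 = 0.1, 1.0           # view 2 is noisier, hence trusted less

I = np.eye(6)
K_c = np.linalg.inv(np.linalg.inv(K1 + sigma1**2 * I)
                    + np.linalg.inv(K2 + sigma2**2 * I))
print(K_c.shape)  # fused kernel, usable in a standard GP classifier
```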

168 citations


Cited by
Journal ArticleDOI
TL;DR: The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.
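
As one concrete instance of the covariate-shift techniques the survey relates to transfer learning, the sketch below reweights source-domain training data by a density ratio estimated with a logistic "domain classifier". The data, model choices, and hyperparameters are illustrative only.

```python
# Covariate-shift correction by importance weighting: estimate
# w(x) = p_target(x) / p_source(x) via a classifier that separates source
# from target samples, then train the task model on reweighted source data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_src = rng.normal(loc=0.0, size=(500, 2))   # labeled source domain
X_tgt = rng.normal(loc=1.0, size=(500, 2))   # unlabeled target domain
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)

# Domain classifier: predicts P(target | x); weights = P(tgt|x) / P(src|x).
dom = LogisticRegression().fit(
    np.vstack([X_src, X_tgt]),
    np.concatenate([np.zeros(500), np.ones(500)]))
p = dom.predict_proba(X_src)[:, 1]
weights = p / (1 - p)

# Train the task model on source data, reweighted toward the target domain.
clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)
y_tgt_true = (X_tgt[:, 0] + X_tgt[:, 1] > 0).astype(int)
print(f"target-domain accuracy: {clf.score(X_tgt, y_tgt_true):.2f}")
```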

18,616 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include L1 (the lasso), L2 (ridge regression), and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
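
The core update the abstract refers to is easy to sketch: cyclical coordinate descent for the lasso applies a soft-threshold update to one coefficient at a time. The minimal version below omits glmnet's warm-started regularization path, elastic-net mixing, and GLM losses; the data and penalty value are arbitrary.

```python
# Cyclical coordinate descent for the lasso:
#   minimize (1/2n) ||y - Xw||^2 + lam * ||w||_1
# Each coordinate update solves its 1-D subproblem exactly via
# soft-thresholding of the partial residual correlation.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):                        # cycle over coordinates
            r_j = y - X @ w + X[:, j] * w[j]      # residual excluding j
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ w_true + 0.1 * rng.normal(size=200)
print(np.round(lasso_cd(X, y, lam=0.1), 2))  # recovers a sparse estimate
```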

13,656 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations