New theoretical frameworks for machine learning

TL;DR: This thesis introduces a new discriminative model (a PAC or Statistical Learning Theory style model) for semi-supervised learning that can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community.
Abstract: This thesis has two primary thrusts. The first is developing new models and algorithms for important modern and classic learning problems. The second is establishing new connections between Machine Learning and Algorithmic Game Theory. The formulation of the PAC learning model by Valiant [201] and the Statistical Learning Theory framework by Vapnik [203] have been instrumental in the development of machine learning and the design and analysis of algorithms for supervised learning. However, while extremely influential, these models do not capture or explain other important classic learning paradigms such as Clustering, nor do they capture important emerging learning paradigms such as Semi-Supervised Learning and other ways of incorporating unlabeled data in the learning process. In this thesis, we develop the first analog of these general discriminative models for the problems of Semi-Supervised Learning and Clustering, and we analyze both their algorithmic and sample-complexity implications. We also provide the first generalization of the well-established theory of learning with kernel functions to the case of general pairwise similarity functions, and in addition we provide new positive theoretical results for Active Learning. Finally, this dissertation presents new applications of techniques from Machine Learning to Algorithmic Game Theory, which has been a major area of research at the intersection of Computer Science and Economics. In machine learning, there has been growing interest in using unlabeled data together with labeled data due to the availability of large amounts of unlabeled data in many contemporary applications. As a result, a number of different semi-supervised learning methods such as Co-training, transductive SVM, or graph-based methods have been developed. However, the underlying assumptions of these methods are often quite distinct and not captured by standard theoretical models. This thesis introduces a new discriminative model (a PAC or Statistical Learning Theory style model) for semi-supervised learning that can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community. This model provides a unified framework for analyzing when and why unlabeled data can help in the semi-supervised learning setting, in which one can analyze both sample-complexity and algorithmic issues. In particular, our model allows us to address in a unified way key questions such as "Under what conditions will unlabeled data help and by how much?" and "How much data should I expect to need in order to perform well?". Another important part of this thesis is Active Learning, for which we provide several new theoretical results. In particular, this dissertation includes the first active learning algorithm that works in the presence of arbitrary forms of noise, as well as several margin-based active learning algorithms. In the context of Kernel methods (another flourishing area of machine learning research), this thesis shows how Random Projection techniques can be used to convert a given kernel function into an explicit, distribution-dependent set of features, which can then be fed into more general (not necessarily kernelizable) learning algorithms. In addition, this work shows how such methods can be extended to more general pairwise similarity functions and also gives a formal theory that matches the standard intuition that a good kernel function is one that acts as a good measure of similarity.
We thus strictly generalize and simplify the existing theory of kernel methods. Our approach brings a new perspective as well as a much simpler explanation for the effectiveness of kernel methods, which can help in the design of good kernel functions for new learning problems. We also show how we can use this perspective to think about Clustering in a novel way. While the study of clustering is centered around an intuitively compelling goal (and it has been a major tool in many different fields), reasoning about it in a generic and unified way has been difficult, in part due to the lack of a general theoretical framework along the lines we have for supervised classification. In our work we develop the first general discriminative clustering framework for analyzing accuracy without probabilistic assumptions. This dissertation also contributes new connections between Machine Learning and Mechanism Design. Specifically, this thesis presents the first general framework in which machine learning methods can be used to reduce mechanism design problems to standard algorithmic questions for a wide range of revenue maximization problems in an unlimited supply setting. Our results substantially generalize the previous work based on random sampling mechanisms, both by broadening the applicability of such mechanisms and by simplifying the analysis. From a learning perspective, these settings present several unique challenges: the loss function is discontinuous and asymmetric, and the range of bidders' valuations may be large.
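
The abstract's kernel-to-explicit-features idea can be made concrete with a small sketch. The snippet below is only a toy rendering of the general recipe, evaluating the kernel against landmark points drawn from an unlabeled sample to obtain an explicit, distribution-dependent feature map; the landmark count, the RBF kernel, and the synthetic data are illustrative assumptions, not the thesis's exact construction or guarantees.

import numpy as np

# Toy kernel-to-features sketch: draw d "landmark" points from an unlabeled
# sample and map each example x to the vector (K(x, z_1), ..., K(x, z_d)),
# which any linear (or otherwise non-kernelized) learner can then consume.

def rbf_kernel(x, z, sigma=1.0):
    # Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def kernel_feature_map(X_unlabeled, d, kernel, rng):
    # Choose d landmarks at random from the unlabeled pool.
    landmarks = X_unlabeled[rng.choice(len(X_unlabeled), size=d, replace=False)]
    return lambda x: np.array([kernel(x, z) for z in landmarks])

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 10))        # stand-in for an unlabeled sample
phi = kernel_feature_map(X_unlabeled, d=50, kernel=rbf_kernel, rng=rng)

x = rng.normal(size=10)
print(phi(x).shape)                             # (50,): explicit, distribution-dependent features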


Citations
Journal Article
TL;DR: It is shown that under the assumption that the best classifier in the learner's hypothesis class has generalization error at most β > 0, the label complexity of active learning is Ω(β²/ε² log(1/δ)), where the accuracy parameter ε measures how close to optimal within the hypothesis class the active learner has to get and δ is the confidence parameter.
Abstract: Most of the existing active learning algorithms are based on the realizability assumption: The learner's hypothesis class is assumed to contain a target function that perfectly classifies all training and test examples. This assumption can hardly ever be justified in practice. In this paper, we study how relaxing the realizability assumption affects the sample complexity of active learning. First, we extend existing results on query learning to show that any active learning algorithm for the realizable case can be transformed to tolerate random bounded rate class noise. Thus, bounded rate class noise adds little extra complication to active learning, and in particular exponential label complexity savings over passive learning are still possible. However, it is questionable whether this noise model is any more realistic in practice than assuming no noise at all. Our second result shows that if we move to the truly non-realizable model of statistical learning theory, then the label complexity of active learning has the same dependence Ω(1/ε²) on the accuracy parameter ε as the passive learning label complexity. More specifically, we show that under the assumption that the best classifier in the learner's hypothesis class has generalization error at most β > 0, the label complexity of active learning is Ω(β²/ε² log(1/δ)), where the accuracy parameter ε measures how close to optimal within the hypothesis class the active learner has to get and δ is the confidence parameter. The implication of this lower bound is that exponential savings should not be expected in realistic models of active learning, and thus the label complexity goals in active learning should be refined.
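
For readability, the lower bound quoted above can be restated in standard notation (this is only a transcription of the abstract's claim, not a new result; exact constants and conditions are in the cited paper):

\[
  \text{label complexity of agnostic active learning}
  \;=\; \Omega\!\left(\frac{\beta^{2}}{\epsilon^{2}} \log\frac{1}{\delta}\right),
\]

where $\beta$ bounds the generalization error of the best classifier in the hypothesis class, $\epsilon$ is the target excess error (so the dependence matches the $\Omega(1/\epsilon^{2})$ label complexity of passive learning), and $\delta$ is the confidence parameter.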

98 citations

Proceedings ArticleDOI
19 Oct 2012
TL;DR: A carefully selected nonparametric, semi-supervised learning algorithm applied to network intrusion detection yields dramatic performance improvements over supervised learning and anomaly detection in discriminating real, previously unseen, malicious network traffic, while generating an order of magnitude fewer false alerts than any alternative, including a signature IDS tool deployed on the same network.
Abstract: A barrier to the widespread adoption of learning-based network intrusion detection tools is the in-situ training requirements for effective discrimination of malicious traffic. Supervised learning techniques necessitate a quantity of labeled examples that is often intractable, and at best cost-prohibitive. Recent advances in semi-supervised techniques have demonstrated the ability to generalize well based on a significantly smaller set of labeled samples. In network intrusion detection, placing reasonable requirements on the number of training examples provides realistic expectations that a learning-based system can be trained in the environment where it will be deployed. This in-situ training is necessary to ensure that the assumptions associated with the learning process hold, and thereby support a reasonable belief in the generalization ability of the resulting model. In this paper, we describe the application of a carefully selected nonparametric, semi-supervised learning algorithm to the network intrusion problem, and compare the performance to other model types using feature-based data derived from an operational network. We demonstrate dramatic performance improvements over supervised learning and anomaly detection in discriminating real, previously unseen, malicious network traffic while generating an order of magnitude fewer false alerts than any alternative, including a signature IDS tool deployed on the same network.
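
The paper's specific nonparametric semi-supervised algorithm is not reproduced here; as a generic stand-in, the sketch below trains a graph-based semi-supervised classifier (scikit-learn's LabelSpreading) on a small labeled set plus a large unlabeled pool, mirroring the in-situ training setting described above. The synthetic flow features, class labels, and kernel width are assumptions for illustration only.

import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)

# A few labeled flow records plus a large unlabeled pool (all synthetic).
n_labeled, n_unlabeled = 50, 2000
X_labeled = rng.normal(size=(n_labeled, 8))                 # e.g., per-flow statistics
y_labeled = rng.integers(0, 2, size=n_labeled)              # 0 = benign, 1 = malicious
X_unlabeled = rng.normal(size=(n_unlabeled, 8))

X = np.vstack([X_labeled, X_unlabeled])
y = np.concatenate([y_labeled, -np.ones(n_unlabeled, dtype=int)])   # -1 marks unlabeled rows

# Labels propagate over an RBF similarity graph built from all samples.
model = LabelSpreading(kernel="rbf", gamma=0.5)
model.fit(X, y)

X_new = rng.normal(size=(3, 8))
print(model.predict(X_new))                                  # predicted benign/malicious labels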

36 citations

Proceedings Article
01 Jan 2010
TL;DR: The novel SSL algorithms proposed in this thesis have numerous advantages over the existing semi-supervised algorithms as they yield convex, scalable, inherently multi-class loss functions that can be kernelized naturally.
Abstract: The maximum entropy (MaxEnt) framework has been studied extensively in the supervised setting. Here, the goal is to find a distribution p that maximizes an entropy function while enforcing data constraints so that the expected values of some (pre-defined) features with respect to p match their empirical counterparts approximately. Using different entropy measures, different model spaces for p, and different approximation criteria for the data constraints yields a family of discriminative supervised learning methods (e.g., logistic regression, conditional random fields, least squares and boosting) (Dudik & Schapire, 2006; Friedlander & Gupta, 2006; Altun & Smola, 2006). This framework is known as the generalized maximum entropy framework. Semi-supervised learning (SSL) is a promising field that has increasingly attracted attention in the last decade. SSL algorithms utilize unlabeled data along with labeled data so as to increase the accuracy and robustness of inference algorithms. However, most SSL algorithms to date have had trade-offs, e.g., in terms of scalability or applicability to multi-categorical data. In this thesis, we extend the generalized MaxEnt framework to develop a family of novel SSL algorithms using two different approaches: (1) Introducing Similarity Constraints. We incorporate unlabeled data via modifications to the primal MaxEnt objective in terms of additional potential functions. A potential function stands for a closed proper convex function that can take the form of a constraint and/or a penalty representing our structural assumptions on the data geometry. Specifically, we impose similarity constraints as additional penalties based on the semi-supervised smoothness assumption, i.e., we restrict the MaxEnt problem such that similar samples have similar model outputs. The motivation is reminiscent of that of Laplacian SVM (Sindhwani et al., 2005) and manifold transductive neural networks (Karlen et al., 2008); however, instead of regularizing the loss function in the dual, we integrate our constraints directly into the primal MaxEnt problem, which has a more straightforward and natural interpretation. (2) Augmenting Constraints on Model Features. We incorporate unlabeled data to enhance the moment matching constraints of the generalized MaxEnt problem in the primal. We improve the estimates of the model and empirical expectations of the feature functions using our assumptions on the data geometry. In particular, we derive the semi-supervised formulations for three specific instances of the generalized MaxEnt framework on conditional distributions, namely logistic regression and kernel logistic regression for multi-class problems, and conditional random fields for structured output prediction problems. A thorough empirical evaluation on standard data sets that are widely used in the literature demonstrates the validity and competitiveness of the proposed algorithms. In addition to these benchmark data sets, we apply our approach to two real-life problems, vision-based robot grasping and remote sensing image classification, where the scarcity of labeled training samples is the main bottleneck in the learning process. For the particular case of grasp learning, we also propose a combination of semi-supervised learning and active learning, another sub-field of machine learning that is focused on the scarcity of labeled samples, when the problem setup is suitable for incremental labeling.
To conclude, the novel SSL algorithms proposed in this thesis have numerous advantages over the existing semi-supervised algorithms as they yield convex, scalable, inherently multi-class loss functions that can be kernelized naturally.
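
The "similarity constraints" idea, penalizing models whose outputs differ on similar samples, can be sketched for the simplest member of the MaxEnt family, binary logistic regression. The graph Laplacian penalty, its weight, and the plain gradient-descent optimizer below are illustrative assumptions, not the thesis's formulation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_ssl_logreg(X_lab, y_lab, X_all, W, lam=0.1, lr=0.05, steps=500):
    # Labeled data (y_lab in {0, 1}), all samples X_all (labeled + unlabeled),
    # and a symmetric similarity matrix W over all samples.
    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian of the similarity graph
    w = np.zeros(X_lab.shape[1])
    n_all = len(X_all)
    for _ in range(steps):
        p = sigmoid(X_lab @ w)
        grad = X_lab.T @ (p - y_lab) / len(y_lab)          # supervised log-loss term
        f = X_all @ w                                      # model outputs on all samples
        grad += lam * 2.0 * (X_all.T @ (L @ f)) / n_all    # smoothness penalty f^T L f
        w -= lr * grad
    return w

# Toy usage with an RBF similarity graph over labeled + unlabeled samples.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 5))
y_lab = rng.integers(0, 2, size=20)
X_unl = rng.normal(size=(200, 5))
X_all = np.vstack([X_lab, X_unl])
d2 = ((X_all[:, None, :] - X_all[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
w = fit_ssl_logreg(X_lab, y_lab, X_all, W)
print(sigmoid(X_lab @ w)[:5])                              # fitted probabilities on labeled points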

32 citations

01 Jan 2006
TL;DR: This chapter contains sections titled: Introduction, A Formal Framework, Sample Complexity Results, Algorithmic Results, Related Models and Discussion.
Abstract: This chapter contains sections titled: Introduction, A Formal Framework, Sample Complexity Results, Algorithmic Results, Related Models and Discussion

21 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: An architecture designed to transparently and automatically scale the performance of sequential programs as a function of the hardware resources available is presented, including a collection of state predictors and a mechanism for speculatively executing threads that explore potential states along the execution path.
Abstract: We present an architecture designed to transparently and automatically scale the performance of sequential programs as a function of the hardware resources available. The architecture is predicated on a model of computation that views program execution as a walk through the enormous state space composed of the memory and registers of a single-threaded processor. Each instruction execution in this model moves the system from its current point in state space to a deterministic subsequent point. We can parallelize such execution by predictively partitioning the complete path and speculatively executing each partition in parallel. Accurately partitioning the path is a challenging prediction problem. We have implemented our system using a functional simulator that emulates the x86 instruction set, including a collection of state predictors and a mechanism for speculatively executing threads that explore potential states along the execution path. While the overhead of our simulation makes it impractical to measure speedup relative to native x86 execution, experiments on three benchmarks show scalability of up to a factor of 256 on a 1024 core machine when executing unmodified sequential programs.
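
The predict-partition-validate loop described in the abstract can be illustrated with a toy sequential "program". The state, step function, and deliberately perfect midpoint predictor below are stand-ins; a real implementation would partition full machine state (registers and memory) and run partitions on separate cores with learned predictors.

def step(state):
    # One deterministic "instruction": state -> next state (a toy LCG update).
    return (state * 6364136223846793005 + 1442695040888963407) % 2**64

def run(state, n):
    for _ in range(n):
        state = step(state)
    return state

def speculative_run(start, n, predict_midpoint):
    half = n // 2
    guess = predict_midpoint(start, half)   # predicted state at the partition boundary
    first = run(start, half)                # partition 1 (would run on core 0)
    second = run(guess, n - half)           # partition 2, executed speculatively (core 1)
    if first == guess:                      # prediction validated: keep the speculative work
        return second
    return run(first, n - half)             # misprediction: redo partition 2 from the true state

# Here the predictor simply simulates ahead, so it is always right; the paper's
# point is that real predictors must guess this boundary state accurately.
result = speculative_run(12345, 1000, predict_midpoint=run)
assert result == run(12345, 1000)
print(hex(result))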

13 citations


Cites methods from "New theoretical frameworks for mach..."

  • ...Most instruction control flow analysis and optimization techniques can be transformed into probabilistic predictors via a technique from machine learning called “the kernel trick” [6, 17]....

    [...]

References
Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
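
A minimal usage sketch of a soft-margin support-vector classifier with a polynomial kernel (a polynomial input transformation applied implicitly) on synthetic non-separable data; the data and hyperparameters are illustrative, not the paper's OCR benchmark.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # two classes, not linearly separable

clf = SVC(kernel="poly", degree=3, C=1.0)             # soft margin handles non-separable data
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors:", int(clf.n_support_.sum()))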

37,861 citations


"New theoretical frameworks for mach..." refers background in this paper

  • ...Typical kernel functions for structured data include the polynomial kernel K(x, x′) = (1 + x · x′)^d and the Gaussian kernel K(x, x′) = e^(−||x−x′||²/(2σ²)), and a number of special-purpose kernels have been developed for sequence data, image data, and other types of data as well [55, 56, 93, 103, 118]....

    [...]
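
A direct rendering of the two kernels named in the excerpt above; the degree d and width σ are free parameters chosen per problem, and the input vectors below are arbitrary.

import numpy as np

def poly_kernel(x, xp, d=3):
    # K(x, x') = (1 + x . x')^d
    return (1.0 + x @ xp) ** d

def gaussian_kernel(x, xp, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x - xp) ** 2 / (2 * sigma ** 2))

x, xp = np.array([1.0, 0.5]), np.array([0.2, -0.3])
print(poly_kernel(x, xp), gaussian_kernel(x, xp))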

Journal Article

28,685 citations


"New theoretical frameworks for mach..." refers background in this paper

  • ...[table-of-contents excerpt: 5.2.2 The Non-realizable Case under the Uniform Distribution; 5.2.3 Discussion; 5.3 Other Results in Active Learning]....

    [...]

  • ...) At a high level, in [35, 42] we point out that traditional analyses [96] have studied the number of label requests required before an algorithm can both produce an ε-good classifier and prove that the classifier's error is no more than ε....

    [...]

  • ...Another learning setting where unlabeled data is useful and which has been increasingly popular for the past few years is Active Learning [31, 34, 35, 42, 87, 96]....

    [...]

  • ...• In Chapter 5 we analyze Active Learning and present two main results....

    [...]

  • ...(Full details of the model and results can be found in [35, 42] ....

    [...]

01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations


"New theoretical frameworks for mach..." refers methods in this paper

  • ...To obtain our bounds we use and extend sample-complexity techniques from machine learning theory (see [11, 42, 88, 121]) and to design our mechanisms we employ machine learning methods such as structural risk minimization....

    [...]
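
Structural risk minimization, mentioned in the excerpt above, picks a hypothesis from nested classes of growing complexity by minimizing empirical error plus a complexity penalty. The sketch below uses polynomial-threshold classifiers and a simple sqrt(complexity/n) penalty as illustrative stand-ins; it is not the mechanism-design application from the thesis.

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, size=n)
y = (np.sin(3 * x) > 0).astype(int)                        # unknown target concept

def empirical_error(x, y, degree):
    # Fit a degree-d polynomial score by least squares and threshold it at 0.5.
    A = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = (A @ coef > 0.5).astype(int)
    return np.mean(pred != y)

# SRM-style rule: empirical risk plus a penalty growing with class complexity.
best_degree = min(range(1, 15), key=lambda d: empirical_error(x, y, d) + np.sqrt(d / n))
print("degree chosen by the SRM-style rule:", best_degree)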

Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations


"New theoretical frameworks for mach..." refers methods in this paper

  • ...Note that any given tree has at most 2^(2k) prunings of size k [154], so this model is at least as strict as the list model....

    [...]
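
As a sanity check on the quoted bound, the sketch below counts prunings of a complete binary tree, taking "pruning" to mean replacing some internal nodes by leaves and "size" to mean the number of leaves (both are assumptions made for illustration), and verifies that the count never exceeds 2^(2k).

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def complete(depth):
    # Complete binary tree of the given depth (depth 0 = a single leaf).
    return Node() if depth == 0 else Node(complete(depth - 1), complete(depth - 1))

def prunings(node, k):
    # Number of prunings of the subtree rooted at `node` having exactly k leaves.
    if k == 1:
        return 1                      # cut here: this node becomes a leaf of the pruning
    if node.left is None:             # an original leaf admits only the size-1 pruning
        return 0
    return sum(prunings(node.left, i) * prunings(node.right, k - i) for i in range(1, k))

root = complete(6)                    # complete binary tree with 64 leaves
for k in range(1, 8):
    count = prunings(root, k)
    assert count <= 2 ** (2 * k)
    print(k, count, "<=", 2 ** (2 * k))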