Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
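The Vapnik-Chervonenkis dimension named in the abstract is a purely combinatorial quantity: the largest number of points that the concept class can label in every possible way ("shatter"). As an illustrative sketch not taken from the paper, the following Python code brute-forces the shattering check for the class of closed intervals on the real line, whose VC dimension is 2.

```python
from itertools import product

def interval_can_realize(points, labels):
    """Check whether some closed interval [a, b] labels `points` as `labels`
    (1 = inside the interval, 0 = outside)."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # an empty interval realizes the all-zero labeling
    a, b = min(positives), max(positives)
    # Realizable iff no negatively labeled point falls inside [a, b].
    return all(not (a <= x <= b) for x, y in zip(points, labels) if y == 0)

def is_shattered(points):
    """True iff intervals realize every one of the 2^n labelings of `points`."""
    return all(interval_can_realize(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# Two points can be shattered, but no set of three points can:
print(is_shattered([1.0, 2.0]))        # True
print(is_shattered([1.0, 2.0, 3.0]))   # False (labeling 1, 0, 1 is impossible)
```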


Citations
Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
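As a minimal sketch of the "many layers deep" hierarchy the blurb describes, and not code from the book, here is a tiny feedforward network in NumPy whose layers successively build representations out of the previous layer's outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A toy three-layer feedforward network: each layer forms features
# out of the previous layer's (simpler) features.
layer_sizes = [4, 16, 16, 1]            # input -> hidden -> hidden -> output
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)             # hidden layers: linear map + nonlinearity
    return h @ weights[-1] + biases[-1] # linear output layer

x = rng.normal(size=(3, 4))             # a batch of 3 four-dimensional inputs
print(forward(x).shape)                 # (3, 1)
```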

38,208 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.
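As an illustrative sketch, not taken from the book, of the radial basis function network model the blurb mentions, the following NumPy code places Gaussian basis functions on a one-dimensional input and fits the output weights by linear least squares:

```python
import numpy as np

def rbf_design_matrix(x, centers, width):
    """Gaussian basis functions: phi_j(x) = exp(-(x - c_j)^2 / (2 * width^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

# Noisy samples of a smooth target function.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=x.shape)

centers = np.linspace(0.0, 1.0, 10)       # basis function centres
Phi = rbf_design_matrix(x, centers, width=0.1)

# Output weights: linear least squares on the basis expansion.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = Phi @ w                            # network predictions at the inputs
print(float(np.mean((y_hat - y) ** 2)))    # small training error
```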

19,056 citations

Proceedings ArticleDOI
08 Feb 1999
TL;DR: An edited volume on support vector learning covering theory (generalization performance, kernels, VC entropy), implementations (quadratic programming, sequential minimal optimization), applications (time-series prediction, pairwise classification), and extensions such as support vector regression and density estimation.
Abstract: Introduction to support vector learning roadmap.
Part 1 Theory: three remarks on the support vector method of function estimation, Vladimir Vapnik; generalization performance of support vector machines and other pattern classifiers, Peter Bartlett and John Shawe-Taylor; Bayesian voting schemes and large margin classifiers, Nello Cristianini and John Shawe-Taylor; support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, Grace Wahba; geometry and invariance in kernel based methods, Christopher J.C. Burges; on the annealed VC entropy for margin classifiers - a statistical mechanics study, Manfred Opper; entropy numbers, operators and support vector kernels, Robert C. Williamson et al.
Part 2 Implementations: solving the quadratic programming problem arising in support vector classification, Linda Kaufman; making large-scale support vector machine learning practical, Thorsten Joachims; fast training of support vector machines using sequential minimal optimization, John C. Platt.
Part 3 Applications: support vector machines for dynamic reconstruction of a chaotic system, Davide Mattera and Simon Haykin; using support vector machines for time series prediction, Klaus-Robert Muller et al; pairwise classification and support vector machines, Ulrich Kressel.
Part 4 Extensions of the algorithm: reducing the run-time complexity in support vector machines, Edgar E. Osuna and Federico Girosi; support vector regression with ANOVA decomposition kernels, Mark O. Stitson et al; support vector density estimation, Jason Weston et al; combining support vector and mathematical programming methods for classification, Bernhard Scholkopf et al.
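The implementation chapters (quadratic programming, sequential minimal optimization) describe the kind of solver that modern SVM libraries package up. As a hedged usage sketch, assuming scikit-learn is available and not drawn from the book itself, the following trains an RBF-kernel support vector classifier on a synthetic dataset:

```python
# Usage sketch only; the volume presents the underlying theory and solvers.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF-kernel support vector classifier; C trades margin width against errors.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("support vectors:", clf.support_vectors_.shape[0])
print("test accuracy:  ", clf.score(X_test, y_test))
```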

5,506 citations

Journal ArticleDOI
Vladimir Vapnik
TL;DR: Demonstrates how abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Abstract: Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
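One commonly quoted form of the generalization conditions this overview discusses is the VC-style bound below; it is a standard statement from the VC theory literature (constants vary across presentations) rather than a formula transcribed from this article.

```latex
% With probability at least 1 - \eta, simultaneously for every function f
% in a class of VC dimension h, given \ell i.i.d. samples:
\[
  R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}
\]
% Finiteness of h makes the second term vanish as \ell \to \infty for every
% distribution, which is the distribution-free generalization condition
% discussed in the article.
```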

5,370 citations

Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.
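As a hedged illustration, not taken from the textbook, of the stochastic gradient descent paradigm it covers, here is a minimal SGD loop for least-squares linear regression on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data: y = X @ w_true + noise.
n, d = 1000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Stochastic gradient descent on the squared loss, one example at a time.
w = np.zeros(d)
step = 0.01
for epoch in range(20):
    for i in rng.permutation(n):
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]   # gradient of (x_i.w - y_i)^2
        w -= step * grad

print(np.linalg.norm(w - w_true))               # close to 0
```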

3,857 citations

References
Proceedings ArticleDOI
01 Jan 1987
TL;DR: Working in the distribution-free model of learning, the goals are to prove results and develop general techniques that shed light on the boundary between the classes of Boolean expressions that are learnable in polynomial time and those that apparently are not.
Abstract: We study the computational feasibility of learning boolean expressions from examples. Our goals are to prove results and develop general techniques that shed light on the boundary between the classes of expressions that are learnable in polynomial time and those that are apparently not. The elucidation of this boundary, for boolean expressions and possibly other knowledge representations, is an example of the potential contribution of complexity theory to artificial intelligence. We employ the distribution-free model of learning introduced in [10]. A more complete discussion and justification of this model can be found in [4,10,11,12]. [4] includes some discussion that is relevant more particularly to infinite representations, such as geometric ones, rather than the finite case of boolean functions. For other recent related work see [1,2,7,8,9]. The results of this paper fall into three categories: closure properties of learnable classes, negative results, and distribution-specific positive results. The closure properties are of two kinds. In section 3 we discuss closure under boolean operations on the members of the learnable classes. The assumption that the classes are learnable from positive or negative examples ...
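As a textbook illustration of the distribution-free (PAC) setting this abstract refers to, and not code from the paper, the classic elimination learner for monotone conjunctions keeps only the variables that are 1 in every positive example:

```python
def learn_monotone_conjunction(positive_examples, n_vars):
    """Classic elimination learner: retain only the variables that are 1 in
    every positive example; the hypothesis is their conjunction."""
    kept = set(range(n_vars))
    for example in positive_examples:            # each example is a 0/1 tuple
        kept = {i for i in kept if example[i] == 1}
    return kept

def predict(hypothesis, example):
    """The conjunction accepts an example iff all kept variables are 1."""
    return all(example[i] == 1 for i in hypothesis)

# Target concept: x0 AND x2 over 4 variables.
positives = [(1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 1, 0)]
h = learn_monotone_conjunction(positives, n_vars=4)
print(sorted(h))                     # [0, 2]
print(predict(h, (1, 1, 1, 1)))      # True
print(predict(h, (0, 1, 1, 1)))      # False
```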

306 citations


"Learnability and the Vapnik-Chervon..." refers background or methods in this paper

  • ...These notions of polynomial learnability, both closely related to the model introduced in [59] and elaborated in [36] and [52], are discussed in Sections 3.1 and 3.2, respectively....

  • ...The above example shows that it is not only useful to parameterize learning algorithms and learnability results by the dimension of the domain, but also by some natural measure of the syntactic complexity of the target concept, in this case the number of intervals used to define it. Both of these considerations are emphasized in [36] and [52] in the investigation into the learnability of Boolean functions....

  • ...Usually the class of target concepts and hypothesis space are the same and the same representation is used, but this is not always so (see, e.g., [36])....

  • ...The functional and oracle models of polynomial learnability are shown to be equivalent in [30], along with another variant of the oracle model in which there are two probability distributions on the domain X, and two oracles, one for positive examples of the target concept and one for negative examples (e.g., [36] and [52])....

  • ...It is also possible to allow the computation time to depend explicitly on the accuracy and confidence parameters ε and δ. Since this, and other extensions of the above model, are allowed in the definition of polynomial learnability in [52] and [59], we now introduce a second model of polynomial learnability, which we call the oracle model (see also [3] and [36])....

Journal ArticleDOI
TL;DR: The state of the art of computational geometry is surveyed, a discipline that deals with the complexity of geometric problems within the framework of the analysis of algorithms.
Abstract: We survey the state of the art of computational geometry, a discipline that deals with the complexity of geometric problems within the framework of the analysis of algorithms. This newly emerged area of activities has found numerous applications in various other disciplines, such as computer-aided design, computer graphics, operations research, pattern recognition, robotics, and statistics. Five major problem areas—convex hulls, intersections, searching, proximity, and combinatorial optimizations—are discussed. Seven algorithmic techniques—incremental construction, plane-sweep, locus, divide-and-conquer, geometric transformation, prune-and-search, and dynamization—are each illustrated with an example. A collection of problem transformations to establish lower bounds for geometric problems in the algebraic computation/decision model is also included.
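As a small self-contained example of one of the five problem areas the survey lists (convex hulls), and not code from the survey itself, here is Andrew's monotone chain algorithm, a sort-then-sweep construction:

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: O(n log n); returns hull vertices in
    counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                                   # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                         # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]                  # drop duplicated endpoints

print(convex_hull([(0, 0), (1, 1), (2, 2), (2, 0), (0, 2), (1, 0)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```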

271 citations

Proceedings ArticleDOI
01 Dec 1988
TL;DR: Training is shown to be NP-complete for many simple two-layer networks whose nodes compute linear threshold functions of their inputs; these networks thus differ fundamentally from the perceptron in a worst-case computational sense.
Abstract: We show for many simple two-layer networks whose nodes compute linear threshold functions of their inputs that training is NP-complete. For any training algorithm for one of these networks there will be some sets of training data on which it performs poorly, either by running for more than an amount of time polynomial in the input length, or by producing sub-optimal weights. Thus, these networks differ fundamentally from the perceptron in a worst-case computational sense.

252 citations

Proceedings ArticleDOI
01 Feb 1989
TL;DR: It is proved that learning Boolean formulae, finite automata, and constant-depth threshold circuits (simplified neural nets) is computationally as difficult as the quadratic residue problem, inverting the RSA function, and factoring Blum integers.

227 citations


"Learnability and the Vapnik-Chervon..." refers background in this paper

  • ...“hard to learn” classes include the class of all concepts represented by Boolean formulas of size bounded by a fixed polynomial in n [35]....

Journal ArticleDOI
TL;DR: Comparisons and equivalences are given between Valiant's model and the prediction learning models of Haussler, Littlestone, and Warmuth, and it is shown that several simplifying assumptions on polynomial learning algorithms can be made without loss of generality.
Abstract: In this paper we consider several variants of Valiant's learnability model that have appeared in the literature. We give conditions under which these models are equivalent in terms of the polynomially learnable concept classes they define. These equivalences allow comparisons of most of the existing theorems in Valiant-style learnability and show that several simplifying assumptions on polynomial learning algorithms can be made without loss of generality. We also give a useful reduction of learning problems to the problem of finding consistent hypotheses, and give comparisons and equivalences between Valiant's model and the prediction learning models of Haussler, Littlestone, and Warmuth (in “29th Annual IEEE Symposium on Foundations of Computer Science,” 1988).
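As a hedged sketch, not from this paper, of the reduction the abstract mentions from learning to finding consistent hypotheses, the following brute-force learner returns any hypothesis in a finite class that is consistent with the labeled sample; with a large enough sample, any such hypothesis generalizes well in the PAC sense:

```python
from itertools import chain, combinations

def all_monotone_conjunctions(n_vars):
    """Enumerate a finite hypothesis class: one conjunction per subset of variables."""
    subsets = chain.from_iterable(combinations(range(n_vars), r)
                                  for r in range(n_vars + 1))
    return [lambda x, v=s: all(x[i] == 1 for i in v) for s in subsets]

def consistent_hypothesis(sample, hypothesis_class):
    """The 'find a consistent hypothesis' subroutine the reduction relies on."""
    for h in hypothesis_class:
        if all(h(x) == y for x, y in sample):
            return h
    return None

# Labeled sample generated by the target conjunction x0 AND x2.
sample = [((1, 0, 1), True), ((1, 1, 0), False),
          ((0, 1, 1), False), ((1, 1, 1), True)]
h = consistent_hypothesis(sample, all_monotone_conjunctions(3))
print(h((1, 0, 1)), h((0, 0, 1)))   # True False
```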

208 citations