Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on the distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
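To make the combinatorial parameter concrete, here is a minimal Python sketch (the helper names shatters and intervals_realize are my own, not from the paper) that checks shattering for the class of closed intervals on the real line, whose VC dimension is 2.

from itertools import product

def shatters(points, realizes):
    """True if every 0/1 labeling of `points` is realized by some concept."""
    return all(realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))

def intervals_realize(points, labels):
    """Concepts are closed intervals [a, b] on the line; a point is labeled 1
    iff it lies in the interval. Returns True if some interval gives `labels`."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # the empty concept realizes the all-zero labeling
    a, b = min(positives), max(positives)
    # the tightest candidate interval must not capture any 0-labeled point
    return all(y == 1 for x, y in zip(points, labels) if a <= x <= b)

# Two points are shattered by intervals, three are not (the +,-,+ labeling
# is impossible), so the VC dimension of intervals on the line is 2.
print(shatters([1.0, 2.0], intervals_realize))        # True
print(shatters([1.0, 2.0, 3.0], intervals_realize))   # False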


Citations
Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Proceedings ArticleDOI
08 Feb 1999
TL;DR: An edited collection on support vector learning covering theory, implementations, applications (including dynamic reconstruction of a chaotic system, time series prediction, and pairwise classification), and extensions of the algorithm.
Abstract: Introduction to support vector learning roadmap. Part 1, Theory: three remarks on the support vector method of function estimation (Vladimir Vapnik); generalization performance of support vector machines and other pattern classifiers (Peter Bartlett and John Shawe-Taylor); Bayesian voting schemes and large margin classifiers (Nello Cristianini and John Shawe-Taylor); support vector machines, reproducing kernel Hilbert spaces, and randomized GACV (Grace Wahba); geometry and invariance in kernel based methods (Christopher J.C. Burges); on the annealed VC entropy for margin classifiers - a statistical mechanics study (Manfred Opper); entropy numbers, operators and support vector kernels (Robert C. Williamson et al.). Part 2, Implementations: solving the quadratic programming problem arising in support vector classification (Linda Kaufman); making large-scale support vector machine learning practical (Thorsten Joachims); fast training of support vector machines using sequential minimal optimization (John C. Platt). Part 3, Applications: support vector machines for dynamic reconstruction of a chaotic system (Davide Mattera and Simon Haykin); using support vector machines for time series prediction (Klaus-Robert Muller et al.); pairwise classification and support vector machines (Ulrich Kressel). Part 4, Extensions of the algorithm: reducing the run-time complexity in support vector machines (Edgar E. Osuna and Federico Girosi); support vector regression with ANOVA decomposition kernels (Mark O. Stitson et al.); support vector density estimation (Jason Weston et al.); combining support vector and mathematical programming methods for classification (Bernhard Scholkopf et al.).
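As a loose companion to the collection's subject matter, here is a small, hedged usage sketch (scikit-learn is my choice of library; nothing below comes from the book itself) that trains a soft-margin SVM classifier with an RBF kernel on synthetic data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data for illustration only.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C trades margin width against training errors
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))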

5,506 citations

Journal ArticleDOI
Vladimir Vapnik
TL;DR: Demonstrates how abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Abstract: Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.

5,370 citations

Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way, for an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.
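The abstract names stochastic gradient descent among the algorithmic paradigms the book covers; the following illustrative sketch (entirely my own, assuming a squared-loss linear model rather than anything taken from the book) shows the basic update that such a treatment formalizes.

import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Plain stochastic gradient descent on squared loss for a linear model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):           # one pass over the data in random order
            grad = (X[i] @ w - y[i]) * X[i]    # gradient of 0.5 * (x.w - y)^2
            w -= lr * grad
    return w

# Toy check: recover weights close to [2, -3] from noiseless data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])
print(sgd_linear_regression(X, y))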

3,857 citations

References
Book
01 Jan 1974
TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Abstract: From the Publisher: With this text, you gain an understanding of the fundamental concepts of algorithms, the very heart of computer science. It introduces the basic data structures and programming techniques often used in efficient algorithms, and covers the use of lists, push-down stacks, queues, trees, and graphs. Later chapters go into sorting, searching, and graph algorithms, string-matching algorithms, and the Schonhage-Strassen integer-multiplication algorithm. Numerous graded exercises are provided at the end of each chapter.

9,262 citations

Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
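To make the "learning from examples" protocol concrete, here is a minimal sketch (my own illustration, not Valiant's; it uses the standard textbook example of learning an interval on the line, a one-dimensional analogue of rectangle learning) in which the learner simply outputs the tightest interval around the positive examples it has seen.

import random

def learn_interval(sample):
    """Output the tightest interval containing the positive examples; with
    enough random examples this hypothesis has small error with high
    probability, which is the PAC-style guarantee."""
    positives = [x for x, label in sample if label]
    if not positives:
        return None  # no positive examples seen: predict negative everywhere
    return (min(positives), max(positives))

# Hypothetical target concept: the interval [0.3, 0.7]; examples uniform on [0, 1].
random.seed(0)
target = (0.3, 0.7)
sample = [(x, target[0] <= x <= target[1])
          for x in (random.random() for _ in range(500))]
print(learn_interval(sample))  # close to (0.3, 0.7)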

5,311 citations

Journal ArticleDOI
Narendra Karmarkar
TL;DR: It is proved that given a polytope P and a strictly interior point a ∈ P, there is a projective transformation of the space that maps P, a to P′, a′ having the following property: the ratio of the radius of the smallest sphere with center a′ containing P′ to the radius of the largest sphere with center a′ contained in P′ is O(n).
Abstract: We present a new polynomial-time algorithm for linear programming. In the worst case, the algorithm requires O(n^3.5 L) arithmetic operations on O(L)-bit numbers, where n is the number of variables and L is the number of bits in the input. The running time of this algorithm is better than the ellipsoid algorithm by a factor of O(n^2.5). We prove that given a polytope P and a strictly interior point a ∈ P, there is a projective transformation of the space that maps P, a to P′, a′ having the following property: the ratio of the radius of the smallest sphere with center a′ containing P′ to the radius of the largest sphere with center a′ contained in P′ is O(n). The algorithm consists of repeated application of such projective transformations, each followed by optimization over an inscribed sphere, to create a sequence of points which converges to the optimal solution in polynomial time.
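Reimplementing Karmarkar's projective method faithfully is beyond a short snippet, so here is only a hedged usage sketch: a small LP solved with SciPy's HiGHS interior-point solver (a modern interior-point method, not Karmarkar's original algorithm; the example problem is my own).

from scipy.optimize import linprog

# minimize -x0 - 2*x1  subject to  x0 + x1 <= 4,  x0 <= 3,  x0, x1 >= 0
res = linprog(c=[-1, -2],
              A_ub=[[1, 1], [1, 0]],
              b_ub=[4, 3],
              bounds=[(0, None), (0, None)],
              method="highs-ipm")      # HiGHS interior-point method
print(res.x, res.fun)                  # optimum at x = (0, 4), objective -8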

4,806 citations

Book ChapterDOI
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Abstract: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Theory of Probability and Its Applications 16(2), 264-279 (1971).

3,939 citations


"Learnability and the Vapnik-Chervon..." refers background or methods or result in this paper

  • ...introduced in [29] to arbitrary probability distributions on E^n. For a fixed distribution, an ε-transversal for R is a finite set of points N ⊆ E^n such that every region in R of probability at least ε contains at least one point in N. Section A2 uses the notion of an ε-transversal to provide the primary machinery for Theorem 2.1, following [29] and [62]....

  • ...A2. Proof of Theorem 2.1 (ii)(a): For the next two lemmas let R ⊆ 2^X be a fixed nonempty class of sets and P be a distribution on X such that Q_t^m and J_t^{2m} are measurable for all m ≥ 1 and t > 0. The proofs of these lemmas are analogous to those of Lemma 1 and Theorem 2 of [62]....

  • ...To show that these classes are also uniformly learnable, we use a concept first introduced in [62]....

  • ...We note only that it employs techniques similar to those used in [62], which form the basis of Lemmas A2.1 and A2.2 given above....

  • ...Since the above bound is O((1/ε)(log(1/δ) + n log(1/ε))), for small ε it is significantly better than the O((1/ε²)(log(1/δ) + n log(n/ε))) bound given in [51] (eq. 29), derived directly from [15] and [62]....

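The last excerpt above compares two sample-size bounds. The short sketch below (my own, with constants ignored since the excerpt's O-notation hides them) evaluates both expressions to show how much the 1/ε bound improves on the 1/ε² bound as ε shrinks.

from math import log

def bound_linear(eps, delta, n):
    # O((1/eps) * (log(1/delta) + n*log(1/eps))) -- the form of the bound in this paper
    return (1 / eps) * (log(1 / delta) + n * log(1 / eps))

def bound_quadratic(eps, delta, n):
    # O((1/eps**2) * (log(1/delta) + n*log(n/eps))) -- the form of the earlier bound cited as [51]
    return (1 / eps ** 2) * (log(1 / delta) + n * log(n / eps))

for eps in (0.1, 0.01, 0.001):
    lin, quad = bound_linear(eps, 0.05, 10), bound_quadratic(eps, 0.05, 10)
    print(f"eps={eps}: quadratic/linear ratio ~ {quad / lin:.1f}")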

Book
01 Jan 1984
TL;DR: This book treats functionals on stochastic processes viewed as random functions and develops the uniform convergence of empirical measures along with convergence in distribution in Euclidean and metric spaces.
Abstract: Contents: I. Functionals on Stochastic Processes (stochastic processes as random functions); II. Uniform Convergence of Empirical Measures (uniformity and consistency; direct approximation; the combinatorial method; classes of sets with polynomial discrimination; classes of functions; rates of convergence); III. Convergence in Distribution in Euclidean Spaces (the definition; the continuous mapping theorem; expectations of smooth functions; the central limit theorem; characteristic functions; quantile transformations and almost sure representations); IV. Convergence in Distribution in Metric Spaces (measurability; the continuous mapping theorem; representation by almost surely convergent sequences; coupling; weakly convergent subsequences); V. The Uniform Metric on Spaces of Cadlag Functions (approximation of stochastic processes; empirical processes; existence of Brownian bridge and Brownian motion; processes with independent increments; infinite time scales; functionals of Brownian motion and Brownian bridge); VI. The Skorohod Metric on D(0, ∞) (properties of the metric; convergence in distribution); VII. Central Limit Theorems (stochastic equicontinuity; chaining; Gaussian processes; random covering numbers; empirical central limit theorems; restricted chaining); VIII. Martingales (a central limit theorem for martingale-difference arrays; continuous time martingales; estimation from censored data); Appendices A-C (stochastic-order symbols; exponential inequalities; measurability). Each chapter includes notes and problems; the book closes with references and an author index.

2,641 citations


"Learnability and the Vapnik-Chervon..." refers background in this paper

  • ...can easily be shown to be universally separable (see exercises 4, 5 and 7 in chapter II of [54])....

  • ...This problem is addressed in [17], [18], [23], [54], and [61] from a purely statistical point of view....

  • ...countably S-coverable in [7a]; see the appendix of [54] for a discussion of other approaches)....
