Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on the distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
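To make the combinatorial parameter concrete, here is a minimal Python sketch (the helper names shatters and intervals_realize are my own, not from the paper) that checks shattering for the class of closed intervals on the real line, whose VC dimension is 2.

from itertools import product

def shatters(points, realizes):
    """True if every 0/1 labeling of `points` is realized by some concept."""
    return all(realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))

def intervals_realize(points, labels):
    """Concepts are closed intervals [a, b] on the line; a point is labeled 1
    iff it lies in the interval. Returns True if some interval gives `labels`."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # the empty concept realizes the all-zero labeling
    a, b = min(positives), max(positives)
    # the tightest candidate interval must not capture any 0-labeled point
    return all(y == 1 for x, y in zip(points, labels) if a <= x <= b)

# Two points are shattered by intervals, three are not (the +,-,+ labeling
# is impossible), so the VC dimension of intervals on the line is 2.
print(shatters([1.0, 2.0], intervals_realize))        # True
print(shatters([1.0, 2.0, 3.0], intervals_realize))   # False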


Citations
Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Proceedings ArticleDOI
08 Feb 1999
TL;DR: An edited collection on support vector learning covering theory, implementations, applications (including dynamic reconstruction of a chaotic system, time series prediction, and pairwise classification), and extensions of the algorithm.
Abstract: Introduction to support vector learning roadmap. Part 1, Theory: three remarks on the support vector method of function estimation (Vladimir Vapnik); generalization performance of support vector machines and other pattern classifiers (Peter Bartlett and John Shawe-Taylor); Bayesian voting schemes and large margin classifiers (Nello Cristianini and John Shawe-Taylor); support vector machines, reproducing kernel Hilbert spaces, and randomized GACV (Grace Wahba); geometry and invariance in kernel based methods (Christopher J.C. Burges); on the annealed VC entropy for margin classifiers - a statistical mechanics study (Manfred Opper); entropy numbers, operators and support vector kernels (Robert C. Williamson et al.). Part 2, Implementations: solving the quadratic programming problem arising in support vector classification (Linda Kaufman); making large-scale support vector machine learning practical (Thorsten Joachims); fast training of support vector machines using sequential minimal optimization (John C. Platt). Part 3, Applications: support vector machines for dynamic reconstruction of a chaotic system (Davide Mattera and Simon Haykin); using support vector machines for time series prediction (Klaus-Robert Muller et al.); pairwise classification and support vector machines (Ulrich Kressel). Part 4, Extensions of the algorithm: reducing the run-time complexity in support vector machines (Edgar E. Osuna and Federico Girosi); support vector regression with ANOVA decomposition kernels (Mark O. Stitson et al.); support vector density estimation (Jason Weston et al.); combining support vector and mathematical programming methods for classification (Bernhard Scholkopf et al.).
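As a loose companion to the collection's subject matter, here is a small, hedged usage sketch (scikit-learn is my choice of library; nothing below comes from the book itself) that trains a soft-margin SVM classifier with an RBF kernel on synthetic data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data for illustration only.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C trades margin width against training errors
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))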

5,506 citations

Journal ArticleDOI
Vladimir Vapnik
TL;DR: Demonstrates how abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Abstract: Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.

5,370 citations

Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way, for an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.
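The abstract names stochastic gradient descent among the algorithmic paradigms the book covers; the following illustrative sketch (entirely my own, assuming a squared-loss linear model rather than anything taken from the book) shows the basic update that such a treatment formalizes.

import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Plain stochastic gradient descent on squared loss for a linear model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):           # one pass over the data in random order
            grad = (X[i] @ w - y[i]) * X[i]    # gradient of 0.5 * (x.w - y)^2
            w -= lr * grad
    return w

# Toy check: recover weights close to [2, -3] from noiseless data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])
print(sgd_linear_regression(X, y))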

3,857 citations

References
Book
01 Jan 1974
TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Abstract: From the Publisher: With this text, you gain an understanding of the fundamental concepts of algorithms, the very heart of computer science. It introduces the basic data structures and programming techniques often used in efficient algorithms, and covers the use of lists, push-down stacks, queues, trees, and graphs. Later chapters go into sorting, searching, and graph algorithms, string-matching algorithms, and the Schonhage-Strassen integer-multiplication algorithm. Numerous graded exercises are provided at the end of each chapter.

9,262 citations

Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
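To make the "learning from examples" protocol concrete, here is a minimal sketch (my own illustration, not Valiant's; it uses the standard textbook example of learning an interval on the line, a one-dimensional analogue of rectangle learning) in which the learner simply outputs the tightest interval around the positive examples it has seen.

import random

def learn_interval(sample):
    """Output the tightest interval containing the positive examples; with
    enough random examples this hypothesis has small error with high
    probability, which is the PAC-style guarantee."""
    positives = [x for x, label in sample if label]
    if not positives:
        return None  # no positive examples seen: predict negative everywhere
    return (min(positives), max(positives))

# Hypothetical target concept: the interval [0.3, 0.7]; examples uniform on [0, 1].
random.seed(0)
target = (0.3, 0.7)
sample = [(x, target[0] <= x <= target[1])
          for x in (random.random() for _ in range(500))]
print(learn_interval(sample))  # close to (0.3, 0.7)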

5,311 citations

Journal ArticleDOI
Narendra Karmarkar
TL;DR: It is proved that given a polytope P and a strictly interior point a ∈ P, there is a projective transformation of the space that maps P, a to P′, a′ having the following property: the ratio of the radius of the smallest sphere with center a′ containing P′ to the radius of the largest sphere with center a′ contained in P′ is O(n).
Abstract: We present a new polynomial-time algorithm for linear programming. In the worst case, the algorithm requires O(n^3.5 L) arithmetic operations on O(L)-bit numbers, where n is the number of variables and L is the number of bits in the input. The running time of this algorithm is better than the ellipsoid algorithm by a factor of O(n^2.5). We prove that given a polytope P and a strictly interior point a ∈ P, there is a projective transformation of the space that maps P, a to P′, a′ having the following property: the ratio of the radius of the smallest sphere with center a′ containing P′ to the radius of the largest sphere with center a′ contained in P′ is O(n). The algorithm consists of repeated application of such projective transformations, each followed by optimization over an inscribed sphere, to create a sequence of points which converges to the optimal solution in polynomial time.
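Reimplementing Karmarkar's projective method faithfully is beyond a short snippet, so here is only a hedged usage sketch: a small LP solved with SciPy's HiGHS interior-point solver (a modern interior-point method, not Karmarkar's original algorithm; the example problem is my own).

from scipy.optimize import linprog

# minimize -x0 - 2*x1  subject to  x0 + x1 <= 4,  x0 <= 3,  x0, x1 >= 0
res = linprog(c=[-1, -2],
              A_ub=[[1, 1], [1, 0]],
              b_ub=[4, 3],
              bounds=[(0, None), (0, None)],
              method="highs-ipm")      # HiGHS interior-point method
print(res.x, res.fun)                  # optimum at x = (0, 4), objective -8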

4,806 citations

Book ChapterDOI
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Abstract: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian as V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Theory of Probability and Its Applications 16(2), 264-279 (1971).

3,939 citations


"Learnability and the Vapnik-Chervon..." refers background or methods or result in this paper

  • ...introduced in [29] to arbitrary probability distributions on E^n. For a fixed distribution, an ε-transversal for R is a finite set of points N ⊆ E^n such that every region in R of probability at least ε contains at least one point in N. Section A2 uses the notion of an ε-transversal to provide the primary machinery for Theorem 2.1, following [29] and [62]....

  • ...A2. Proof of Theorem 2.1 (ii)(a): For the next two lemmas let R ⊆ 2^X be a fixed nonempty class of sets and P be a distribution on X such that Q_t^m and J_t^{2m} are measurable for all m ≥ 1 and t > 0. The proofs of these lemmas are analogous to those of Lemma 1 and Theorem 2 of [62]....

  • ...To show that these classes are also uniformly learnable, we use a concept first introduced in [62]....

  • ...We note only that it employs techniques similar to those used in [62], which form the basis of Lemmas A2.1 and A2.2 given above....

  • ...Since the above bound is O((1/ε)(log(1/δ) + n log(1/ε))), for small ε it is significantly better than the O((1/ε²)(log(1/δ) + n log(n/ε))) bound given in [51] (eq. 29), derived directly from [15] and [62]....

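The last excerpt above compares two sample-size bounds. The short sketch below (my own, with constants ignored since the excerpt's O-notation hides them) evaluates both expressions to show how much the 1/ε bound improves on the 1/ε² bound as ε shrinks.

from math import log

def bound_linear(eps, delta, n):
    # O((1/eps) * (log(1/delta) + n*log(1/eps))) -- the form of the bound in this paper
    return (1 / eps) * (log(1 / delta) + n * log(1 / eps))

def bound_quadratic(eps, delta, n):
    # O((1/eps**2) * (log(1/delta) + n*log(n/eps))) -- the form of the earlier bound cited as [51]
    return (1 / eps ** 2) * (log(1 / delta) + n * log(n / eps))

for eps in (0.1, 0.01, 0.001):
    lin, quad = bound_linear(eps, 0.05, 10), bound_quadratic(eps, 0.05, 10)
    print(f"eps={eps}: quadratic/linear ratio ~ {quad / lin:.1f}")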

Book
01 Jan 1984
TL;DR: This book treats functionals on stochastic processes viewed as random functions and develops the uniform convergence of empirical measures along with convergence in distribution in Euclidean and metric spaces.
Abstract: Contents: I. Functionals on Stochastic Processes (stochastic processes as random functions); II. Uniform Convergence of Empirical Measures (uniformity and consistency; direct approximation; the combinatorial method; classes of sets with polynomial discrimination; classes of functions; rates of convergence); III. Convergence in Distribution in Euclidean Spaces (the definition; the continuous mapping theorem; expectations of smooth functions; the central limit theorem; characteristic functions; quantile transformations and almost sure representations); IV. Convergence in Distribution in Metric Spaces (measurability; the continuous mapping theorem; representation by almost surely convergent sequences; coupling; weakly convergent subsequences); V. The Uniform Metric on Spaces of Cadlag Functions (approximation of stochastic processes; empirical processes; existence of Brownian bridge and Brownian motion; processes with independent increments; infinite time scales; functionals of Brownian motion and Brownian bridge); VI. The Skorohod Metric on D(0, ∞) (properties of the metric; convergence in distribution); VII. Central Limit Theorems (stochastic equicontinuity; chaining; Gaussian processes; random covering numbers; empirical central limit theorems; restricted chaining); VIII. Martingales (a central limit theorem for martingale-difference arrays; continuous time martingales; estimation from censored data); Appendices A-C (stochastic-order symbols; exponential inequalities; measurability). Each chapter includes notes and problems; the book closes with references and an author index.

2,641 citations


"Learnability and the Vapnik-Chervon..." refers background in this paper

  • ...can easily be shown to be universally separable (see exercises 4, 5 and 7 in chapter II of [54])....

  • ...This problem is addressed in [17], [18], [23], [54], and [61] from a purely statistical point of view....

  • ...countably S-coverable in [7a]; see the appendix of [54] for a discussion of other approaches)....
