Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
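The Vapnik-Chervonenkis dimension named in the abstract is a purely combinatorial quantity: the largest number of points that the concept class can label in every possible way ("shatter"). As an illustrative sketch not taken from the paper, the following Python code brute-forces the shattering check for the class of closed intervals on the real line, whose VC dimension is 2.

```python
from itertools import product

def interval_can_realize(points, labels):
    """Check whether some closed interval [a, b] labels `points` as `labels`
    (1 = inside the interval, 0 = outside)."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # an empty interval realizes the all-zero labeling
    a, b = min(positives), max(positives)
    # Realizable iff no negatively labeled point falls inside [a, b].
    return all(not (a <= x <= b) for x, y in zip(points, labels) if y == 0)

def is_shattered(points):
    """True iff intervals realize every one of the 2^n labelings of `points`."""
    return all(interval_can_realize(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# Two points can be shattered, but no set of three points can:
print(is_shattered([1.0, 2.0]))        # True
print(is_shattered([1.0, 2.0, 3.0]))   # False (labeling 1, 0, 1 is impossible)
```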


Citations
Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
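As a minimal sketch of the "many layers deep" hierarchy the blurb describes, and not code from the book, here is a tiny feedforward network in NumPy whose layers successively build representations out of the previous layer's outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A toy three-layer feedforward network: each layer forms features
# out of the previous layer's (simpler) features.
layer_sizes = [4, 16, 16, 1]            # input -> hidden -> hidden -> output
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)             # hidden layers: linear map + nonlinearity
    return h @ weights[-1] + biases[-1] # linear output layer

x = rng.normal(size=(3, 4))             # a batch of 3 four-dimensional inputs
print(forward(x).shape)                 # (3, 1)
```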

38,208 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.
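As an illustrative sketch, not taken from the book, of the radial basis function network model the blurb mentions, the following NumPy code places Gaussian basis functions on a one-dimensional input and fits the output weights by linear least squares:

```python
import numpy as np

def rbf_design_matrix(x, centers, width):
    """Gaussian basis functions: phi_j(x) = exp(-(x - c_j)^2 / (2 * width^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

# Noisy samples of a smooth target function.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=x.shape)

centers = np.linspace(0.0, 1.0, 10)       # basis function centres
Phi = rbf_design_matrix(x, centers, width=0.1)

# Output weights: linear least squares on the basis expansion.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = Phi @ w                            # network predictions at the inputs
print(float(np.mean((y_hat - y) ** 2)))    # small training error
```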

19,056 citations

Proceedings ArticleDOI
08 Feb 1999
TL;DR: An edited volume on support vector learning covering theory (generalization performance, kernels, VC entropy), implementations (quadratic programming, sequential minimal optimization), applications (time-series prediction, pairwise classification), and extensions such as support vector regression and density estimation.
Abstract: Introduction to support vector learning roadmap.
Part 1 Theory: three remarks on the support vector method of function estimation, Vladimir Vapnik; generalization performance of support vector machines and other pattern classifiers, Peter Bartlett and John Shawe-Taylor; Bayesian voting schemes and large margin classifiers, Nello Cristianini and John Shawe-Taylor; support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, Grace Wahba; geometry and invariance in kernel based methods, Christopher J.C. Burges; on the annealed VC entropy for margin classifiers - a statistical mechanics study, Manfred Opper; entropy numbers, operators and support vector kernels, Robert C. Williamson et al.
Part 2 Implementations: solving the quadratic programming problem arising in support vector classification, Linda Kaufman; making large-scale support vector machine learning practical, Thorsten Joachims; fast training of support vector machines using sequential minimal optimization, John C. Platt.
Part 3 Applications: support vector machines for dynamic reconstruction of a chaotic system, Davide Mattera and Simon Haykin; using support vector machines for time series prediction, Klaus-Robert Muller et al; pairwise classification and support vector machines, Ulrich Kressel.
Part 4 Extensions of the algorithm: reducing the run-time complexity in support vector machines, Edgar E. Osuna and Federico Girosi; support vector regression with ANOVA decomposition kernels, Mark O. Stitson et al; support vector density estimation, Jason Weston et al; combining support vector and mathematical programming methods for classification, Bernhard Scholkopf et al.
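The implementation chapters (quadratic programming, sequential minimal optimization) describe the kind of solver that modern SVM libraries package up. As a hedged usage sketch, assuming scikit-learn is available and not drawn from the book itself, the following trains an RBF-kernel support vector classifier on a synthetic dataset:

```python
# Usage sketch only; the volume presents the underlying theory and solvers.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF-kernel support vector classifier; C trades margin width against errors.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("support vectors:", clf.support_vectors_.shape[0])
print("test accuracy:  ", clf.score(X_test, y_test))
```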

5,506 citations

Journal ArticleDOI
Vladimir Vapnik
TL;DR: Demonstrates how abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Abstract: Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
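One commonly quoted form of the generalization conditions this overview discusses is the VC-style bound below; it is a standard statement from the VC theory literature (constants vary across presentations) rather than a formula transcribed from this article.

```latex
% With probability at least 1 - \eta, simultaneously for every function f
% in a class of VC dimension h, given \ell i.i.d. samples:
\[
  R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}
\]
% Finiteness of h makes the second term vanish as \ell \to \infty for every
% distribution, which is the distribution-free generalization condition
% discussed in the article.
```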

5,370 citations

Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.
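As a hedged illustration, not taken from the textbook, of the stochastic gradient descent paradigm it covers, here is a minimal SGD loop for least-squares linear regression on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data: y = X @ w_true + noise.
n, d = 1000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Stochastic gradient descent on the squared loss, one example at a time.
w = np.zeros(d)
step = 0.01
for epoch in range(20):
    for i in rng.permutation(n):
        grad = 2.0 * (X[i] @ w - y[i]) * X[i]   # gradient of (x_i.w - y_i)^2
        w -= step * grad

print(np.linalg.norm(w - w_true))               # close to 0
```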

3,857 citations

References
Proceedings ArticleDOI
01 Jan 1987
TL;DR: Working in the distribution-free model of learning, the goals are to prove results and develop general techniques that shed light on the boundary between the classes of Boolean expressions that are learnable in polynomial time and those that apparently are not.
Abstract: We study the computational feasibility of learning boolean expressions from examples. Our goals are to prove results and develop general techniques that shed light on the boundary between the classes of expressions that are learnable in polynomial time and those that are apparently not. The elucidation of this boundary, for boolean expressions and possibly other knowledge representations, is an example of the potential contribution of complexity theory to artificial intelligence. We employ the distribution-free model of learning introduced in [10]. A more complete discussion and justification of this model can be found in [4,10,11,12]. [4] includes some discussion that is relevant more particularly to infinite representations, such as geometric ones, rather than the finite case of boolean functions. For other recent related work see [1,2,7,8,9]. The results of this paper fall into three categories: closure properties of learnable classes, negative results, and distribution-specific positive results. The closure properties are of two kinds. In section 3 we discuss closure under boolean operations on the members of the learnable classes. The assumption that the classes are learnable from positive or negative examples ...
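As a textbook illustration of the distribution-free (PAC) setting this abstract refers to, and not code from the paper, the classic elimination learner for monotone conjunctions keeps only the variables that are 1 in every positive example:

```python
def learn_monotone_conjunction(positive_examples, n_vars):
    """Classic elimination learner: retain only the variables that are 1 in
    every positive example; the hypothesis is their conjunction."""
    kept = set(range(n_vars))
    for example in positive_examples:            # each example is a 0/1 tuple
        kept = {i for i in kept if example[i] == 1}
    return kept

def predict(hypothesis, example):
    """The conjunction accepts an example iff all kept variables are 1."""
    return all(example[i] == 1 for i in hypothesis)

# Target concept: x0 AND x2 over 4 variables.
positives = [(1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 1, 0)]
h = learn_monotone_conjunction(positives, n_vars=4)
print(sorted(h))                     # [0, 2]
print(predict(h, (1, 1, 1, 1)))      # True
print(predict(h, (0, 1, 1, 1)))      # False
```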

306 citations


"Learnability and the Vapnik-Chervon..." refers background or methods in this paper

  • ...These notions of polynomial learnability, both closely related to the model introduced in [59] and elaborated in [36] and [52], are discussed in Sections 3.1 and 3.2, respectively....

  • ...The above example shows that it is not only useful to parameterize learning algorithms and learnability results by the dimension of the domain, but also by some natural measure of the syntactic complexity of the target concept, in this case the number of intervals used to define it. Both of these considerations are emphasized in [36] and [52] in the investigation into the learnability of Boolean functions....

  • ...Usually the class of target concepts and hypothesis space are the same and the same representation is used, but this is not always so (see, e.g., [36])....

  • ...The functional and oracle models of polynomial learnability are shown to be equivalent in [30], along with another variant of the oracle model in which there are two probability distributions on the domain X, and two oracles, one for positive examples of the target concept and one for negative examples (e.g., [36] and [52])....

  • ...It is also possible to allow the computation time to depend explicitly on the accuracy and confidence parameters ε and δ. Since this, and other extensions of the above model, are allowed in the definition of polynomial learnability in [52] and [59], we now introduce a second model of polynomial learnability, which we call the oracle model (see also [3] and [36])....

Journal ArticleDOI
TL;DR: The state of the art of computational geometry is surveyed, a discipline that deals with the complexity of geometric problems within the framework of the analysis of algorithms.
Abstract: We survey the state of the art of computational geometry, a discipline that deals with the complexity of geometric problems within the framework of the analysis of algorithms. This newly emerged area of activities has found numerous applications in various other disciplines, such as computer-aided design, computer graphics, operations research, pattern recognition, robotics, and statistics. Five major problem areas—convex hulls, intersections, searching, proximity, and combinatorial optimizations—are discussed. Seven algorithmic techniques—incremental construction, plane-sweep, locus, divide-and-conquer, geometric transformation, prune-and-search, and dynamization—are each illustrated with an example. A collection of problem transformations to establish lower bounds for geometric problems in the algebraic computation/decision model is also included.
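As a small self-contained example of one of the five problem areas the survey lists (convex hulls), and not code from the survey itself, here is Andrew's monotone chain algorithm, a sort-then-sweep construction:

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: O(n log n); returns hull vertices in
    counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                                   # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                         # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]                  # drop duplicated endpoints

print(convex_hull([(0, 0), (1, 1), (2, 2), (2, 0), (0, 2), (1, 0)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```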

271 citations

Proceedings ArticleDOI
01 Dec 1988
TL;DR: Training is shown to be NP-complete for many simple two-layer networks whose nodes compute linear threshold functions of their inputs; these networks thus differ fundamentally from the perceptron in a worst-case computational sense.
Abstract: We show for many simple two-layer networks whose nodes compute linear threshold functions of their inputs that training is NP-complete. For any training algorithm for one of these networks there will be some sets of training data on which it performs poorly, either by running for more than an amount of time polynomial in the input length, or by producing sub-optimal weights. Thus, these networks differ fundamentally from the perceptron in a worst-case computational sense.

252 citations

Proceedings ArticleDOI
01 Feb 1989
TL;DR: It is proved that learning Boolean formulae, finite automata, and constant-depth threshold circuits (simplified neural nets) is computationally as difficult as the quadratic residue problem, inverting the RSA function, and factoring Blum integers.

227 citations


"Learnability and the Vapnik-Chervon..." refers background in this paper

  • ...“hard to learn” classes include the class of all concepts represented by Boolean formulas of size bounded by a fixed polynomial in n [35]....

Journal ArticleDOI
TL;DR: Comparisons and equivalences are given between Valiant's model and the prediction learning models of Haussler, Littlestone, and Warmuth, and it is shown that several simplifying assumptions on polynomial learning algorithms can be made without loss of generality.
Abstract: In this paper we consider several variants of Valiant's learnability model that have appeared in the literature. We give conditions under which these models are equivalent in terms of the polynomially learnable concept classes they define. These equivalences allow comparisons of most of the existing theorems in Valiant-style learnability and show that several simplifying assumptions on polynomial learning algorithms can be made without loss of generality. We also give a useful reduction of learning problems to the problem of finding consistent hypotheses, and give comparisons and equivalences between Valiant's model and the prediction learning models of Haussler, Littlestone, and Warmuth (in “29th Annual IEEE Symposium on Foundations of Computer Science,” 1988).
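As a hedged sketch, not from this paper, of the reduction the abstract mentions from learning to finding consistent hypotheses, the following brute-force learner returns any hypothesis in a finite class that is consistent with the labeled sample; with a large enough sample, any such hypothesis generalizes well in the PAC sense:

```python
from itertools import chain, combinations

def all_monotone_conjunctions(n_vars):
    """Enumerate a finite hypothesis class: one conjunction per subset of variables."""
    subsets = chain.from_iterable(combinations(range(n_vars), r)
                                  for r in range(n_vars + 1))
    return [lambda x, v=s: all(x[i] == 1 for i in v) for s in subsets]

def consistent_hypothesis(sample, hypothesis_class):
    """The 'find a consistent hypothesis' subroutine the reduction relies on."""
    for h in hypothesis_class:
        if all(h(x) == y for x, y in sample):
            return h
    return None

# Labeled sample generated by the target conjunction x0 AND x2.
sample = [((1, 0, 1), True), ((1, 1, 0), False),
          ((0, 1, 1), False), ((1, 1, 1), True)]
h = consistent_hypothesis(sample, all_monotone_conjunctions(3))
print(h((1, 0, 1)), h((0, 0, 1)))   # True False
```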

208 citations