Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space Eⁿ. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
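The combinatorial parameter in the abstract can be illustrated concretely: a class shatters a point set if its concepts realise every possible labelling of the points, and the VC dimension is the size of the largest shattered set. A minimal sketch (function names are mine, not the paper's), using threshold concepts on the real line, whose VC dimension is 1:

```python
def is_shattered(points, concepts):
    """True iff `concepts` realises all 2^|points| labellings of `points`,
    i.e. the concept class shatters the point set."""
    realised = {tuple(c(x) for x in points) for c in concepts}
    return len(realised) == 2 ** len(points)

# Threshold concepts on the line: c_t(x) = 1 iff x >= t.
# Their VC dimension is 1: any single point is shattered, but no pair is,
# since labelling the left point 1 and the right point 0 is impossible.
thresholds = [lambda x, t=t: int(x >= t) for t in (-1.0, 0.5, 1.5, 3.0)]

print(is_shattered([1.0], thresholds))        # → True
print(is_shattered([1.0, 2.0], thresholds))   # → False
```

By the paper's main result, finiteness of this parameter is exactly what makes a class distribution-free learnable.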


Citations
Journal ArticleDOI
TL;DR: Using a deterministic analysis in a general metric space setting, this paper provides a technique for constructing a successful prediction algorithm, given a successful estimation algorithm, for the prediction of changing concepts.
Abstract: This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must provide an accurate estimate of the target function sequence. We consider a variety of restrictions on how the target function is allowed to change, including infrequent but arbitrary changes, sequences that correspond to slow walks on a graph whose nodes are functions, and changes that are small on average, as measured by the probability of disagreements between consecutive functions. We first study estimation, in which the learner sees a batch of examples and is then required to give an accurate estimate of the function sequence. Our results provide bounds on the sample complexity and allowable drift rate for these problems. We also study prediction, in which the learner must produce online a hypothesis after each labelled example and the average misclassification probability over this hypothesis sequence should be small. Using a deterministic analysis in a general metric space setting, we provide a technique for constructing a successful prediction algorithm, given a successful estimation algorithm. This leads to sample complexity and drift rate bounds for the prediction of changing concepts.
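The estimation-under-drift setting above can be illustrated with a toy sketch (all names, parameters, and the windowed learner are my illustrative choices, not the paper's algorithm): a threshold concept on [0,1] changes once, and the learner fits a threshold consistent with a sliding window of recent examples, keeping the average misclassification rate small.

```python
import random

def window_estimate(history, window):
    """Fit a threshold consistent with the last `window` examples:
    halfway between the largest negative point and the smallest positive."""
    recent = history[-window:]
    lo = max((x for x, y in recent if y == 0), default=0.0)
    hi = min((x for x, y in recent if y == 1), default=1.0)
    return (lo + hi) / 2

random.seed(1)
theta = 0.3                # current target: label(x) = 1 iff x >= theta
history, mistakes = [], 0
for t in range(2000):
    if t == 1000:
        theta = 0.7        # one infrequent, arbitrary change in the target
    x = random.random()
    y = int(x >= theta)
    if history:
        pred = int(x >= window_estimate(history, window=50))
        mistakes += int(pred != y)
    history.append((x, y))

rate = mistakes / 1999
print(rate)  # small: errors come from estimation noise plus a short burst after the change
```

The window length plays the role of the paper's trade-off: a short window recovers quickly from changes but estimates each fixed target less accurately.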

77 citations

Journal ArticleDOI
TL;DR: The problem of learning boolean functions in query and mistake-bound models in the presence of irrelevant attributes is addressed and a large class of functions, including the set of monotone functions, is described, for which learnability does imply attribute-efficient learnability in this model.

77 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a previously unnoticed universal property of stress patterns in the world's languages: they are, for small neighbourhoods, neighbourhood-distinct, a locality condition defined in automata-theoretic terms.
Abstract: This paper presents a previously unnoticed universal property of stress patterns in the world's languages: they are, for small neighbourhoods, neighbourhood-distinct. Neighbourhood-distinctness is a locality condition defined in automata-theoretic terms. This universal is established by examining stress patterns contained in two typological studies. Strikingly, many logically possible – but unattested – patterns do not have this property. Not only does neighbourhood-distinctness unite the attested patterns in a non-trivial way, it also naturally provides an inductive principle allowing learners to generalise from limited data. A learning algorithm is presented which generalises by failing to distinguish same-neighbourhood environments perceived in the learner's linguistic input – hence learning neighbourhood-distinct patterns – as well as almost every stress pattern in the typology. In this way, this work lends support to the idea that properties of the learner can explain certain properties of the attested typology, an idea not straightforwardly available in optimality-theoretic and Principle and Parameter frameworks.

77 citations

Proceedings ArticleDOI
29 May 1995
TL;DR: It is shown that an honest class is exactly polynomial-query learnable if and only if it is learnable using an oracle for Σ₄ᵖ, and a new relationship between query complexity and time complexity in exact learning is shown.
Abstract: We investigate the query complexity of exact learning in the membership and (proper) equivalence query model. We give a complete characterization of concept classes that are learnable with a polynomial number of polynomial-sized queries in this model. We give applications of this characterization, including results on learning a natural subclass of DNF formulas, and on learning with membership queries alone. Query complexity has previously been used to prove lower bounds on the time complexity of exact learning. We show a new relationship between query complexity and time complexity in exact learning: if any "honest" class is exactly and properly learnable with polynomial query complexity, but not learnable in polynomial time, then P ≠ NP. In particular, we show that an honest class is exactly polynomial-query learnable if and only if it is learnable using an oracle for Σ₄ᵖ.
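One regime mentioned above, learning with membership queries alone, can be sketched for a concrete class. The class (monotone conjunctions) and all names below are my illustrative choices, not the paper's characterization; the learner identifies the relevant variables with exactly n membership queries.

```python
def learn_monotone_conjunction(n, membership):
    """Exactly learn a monotone conjunction over n boolean variables using
    membership queries alone: variable i belongs to the conjunction iff
    setting it to 0 in the all-ones assignment flips the target to 0."""
    assert membership([1] * n) == 1, "a monotone conjunction accepts all-ones"
    relevant = []
    for i in range(n):
        x = [1] * n
        x[i] = 0               # one membership query per variable
        if membership(x) == 0:
            relevant.append(i)
    return relevant

# Hypothetical target: x0 AND x3 over 5 variables.
target = lambda x: int(x[0] == 1 and x[3] == 1)
print(learn_monotone_conjunction(5, target))  # → [0, 3]
```

This is the simplest case of the query-counting style of argument the abstract refers to: the query complexity here is linear in n, independent of computation time.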

75 citations

Journal ArticleDOI
01 Jan 1989
TL;DR: A modification of Valiant's distribution-independent protocol for learning is proposed in which the distribution and the function to be learned may be chosen by adversaries; however, these adversaries may not communicate.
Abstract: Within the context of Valiant's protocol for learning, the Perceptron algorithm is shown to learn an arbitrary half-space in time O(n²/ε³) if D, the probability distribution of examples, is taken uniform over the unit sphere Sⁿ. Here ε is the accuracy parameter. This is surprisingly fast, as "standard" approaches involve solving a linear programming problem with Ω(n/ε) constraints in n dimensions. A modification of Valiant's distribution-independent protocol for learning is proposed in which the distribution and the function to be learned may be chosen by adversaries; however, these adversaries may not communicate. It is argued that this definition is more reasonable and more applicable to real-world learning than Valiant's. Under this definition, the Perceptron algorithm is shown to be a distribution-independent learning algorithm. In an appendix we show that, for uniform distributions, some classes of infinite V-C dimension, including convex sets and a class of nested differences of convex sets, are learnable.
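The half-space setting above uses the classic Perceptron update rule. A minimal sketch (the target vector and sampling scheme are illustrative assumptions, with points near the decision boundary discarded so the standard margin-based convergence bound applies):

```python
import random

def perceptron_learn(examples, n, max_passes=1000):
    """Classic Perceptron: cycle through labelled examples (x, y) with
    y in {-1, +1}, updating w whenever sign(w.x) disagrees with y."""
    w = [0.0] * n
    for _ in range(max_passes):
        mistakes = 0
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:       # a full clean pass: all examples classified
            break
    return w

def unit(v):
    s = sum(c * c for c in v) ** 0.5
    return [c / s for c in v]

# Hypothetical target half-space; examples drawn uniformly from the unit
# sphere S^2, skipping points within margin 0.1 of the boundary.
random.seed(0)
target = [1.0, -2.0, 0.5]
examples = []
while len(examples) < 100:
    x = unit([random.gauss(0, 1) for _ in range(3)])
    m = sum(t * c for t, c in zip(target, x))
    if abs(m) < 0.1:
        continue
    examples.append((x, 1 if m > 0 else -1))

w = perceptron_learn(examples, n=3)
errors = sum(1 for x, y in examples
             if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0)
print(errors)  # → 0
```

With margin γ and examples of norm at most 1, the Perceptron makes at most (1/γ)² updates, which is what drives the polynomial running time claimed in the abstract.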

75 citations


Cites background or result from "Learnability and the Vapnik-Chervon..."

  • ...Third, the results of [Blumer et al., 1987] imply that we can only expect to learn a class of functions F if F has finite V-C dimension....


  • ...This would suffice to assure that the hypothesis half-space so generated would (with confidence 1 − δ) have error less than ε, as is seen from [Blumer et al., 1987, Theorem A3]....


  • ...In particular, if F has Vapnik-Chervonenkis (V-C) dimension d, then it has been proved [Blumer et al., 1987] that all A needs to do to be a valid learning algorithm is to call m₀(ε, δ, d) = max((4/ε) log₂(2/δ), (8d/ε) log₂(13/ε)) examples and to find in polynomial time a function g ∈ F which correctly classifies…...


  • ...Thus, for example, it is simple to show that the class H of half-spaces is Valiant learnable [Blumer et al., 1987]....


  • ...First, although the results of [Blumer et al., 1987] tell us we can gather enough information for learning in polynomial time, they say nothing about when we can actually find an algorithm A which learns in polynomial time....

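The sample-size bound quoted in the excerpts above can be evaluated directly. A minimal sketch, assuming the Blumer et al. form m₀(ε, δ, d) = max((4/ε) log₂(2/δ), (8d/ε) log₂(13/ε)) with base-2 logarithms; the function name is mine:

```python
from math import ceil, log2

def behw_sample_bound(eps, delta, d):
    """Sufficient sample size from the Blumer et al. bound: this many
    examples guarantee that any hypothesis from a class of VC dimension d
    that is consistent with the sample has error at most eps, with
    probability at least 1 - delta, under any distribution."""
    return ceil(max((4 / eps) * log2(2 / delta),
                    (8 * d / eps) * log2(13 / eps)))

# Half-spaces in R^2 have VC dimension 3:
print(behw_sample_bound(eps=0.1, delta=0.05, d=3))  # → 1686
```

Note the bound is distribution-free: it depends only on ε, δ, and the VC dimension d, never on the example distribution, which is exactly the point the surrounding excerpts make.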
