Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are provided.
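The combinatorial notions at the heart of the abstract, shattering and the Vapnik-Chervonenkis dimension, can be made concrete for finite classes. Below is a minimal brute-force sketch (ours, not from the paper; the function names are illustrative and the search is exponential in the domain size): a point set is shattered by a concept class if every subset of it arises as the intersection of the point set with some concept, and the VC dimension is the size of the largest shattered set.

    from itertools import chain, combinations

    def shattered(points, concepts):
        """True if every subset of `points` is cut out by some concept."""
        needed = {frozenset(s) for s in chain.from_iterable(
            combinations(points, r) for r in range(len(points) + 1))}
        realized = {frozenset(c & set(points)) for c in concepts}
        return needed <= realized

    def vc_dimension(domain, concepts):
        """Largest d such that some d-point subset of `domain` is shattered."""
        for d in range(len(domain), -1, -1):
            if any(shattered(s, concepts) for s in combinations(domain, d)):
                return d
        return 0

    # Discrete intervals over {0,...,4}: any 2 points are shattered, but no
    # 3 are (an interval containing the outer two must contain the middle
    # one), so the VC dimension is 2.
    domain = list(range(5))
    intervals = [set(range(i, j + 1)) for i in domain for j in domain if i <= j]
    intervals.append(set())
    print(vc_dimension(domain, intervals))  # -> 2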


Citations
Journal ArticleDOI
TL;DR: An upper bound is given for the number of minimal generators of contexts without contranominal scales larger than a given size, and this bound is interpreted in terms of the Vapnik–Chervonenkis dimension of the concept lattice.
Abstract: A unique type of subcontexts is always present in formal contexts with many concepts: the contranominal scales. We make this precise by giving an upper bound for the number of minimal generators (and thereby for the number of concepts) of contexts without contranominal scales larger than a given size. We give an interpretation of this bound in terms of the Vapnik–Chervonenkis dimension of the concept lattice. Extremal contexts are constructed which meet this bound exactly. They are completely classified.
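The interpretation mentioned in this abstract can be sketched directly: an attribute set B induces a contranominal scale exactly when, for every b in B, some object has precisely the attributes B \ {b} among B; because intents are closed under intersection, this is equivalent to B being shattered by the concept lattice. A small illustrative sketch (ours, with hypothetical names; a context is given as a list of object intents):

    from itertools import combinations

    def contranominal(B, object_intents):
        """True if attribute set B induces a contranominal scale."""
        traces = {frozenset(g & B) for g in object_intents}
        return all(frozenset(B - {b}) in traces for b in B)

    def lattice_vc_dim(attributes, object_intents):
        """VC dimension of the concept lattice = largest contranominal scale."""
        for d in range(len(attributes), 0, -1):
            if any(contranominal(set(B), object_intents)
                   for B in combinations(attributes, d)):
                return d
        return 0

    # Toy context: objects listed by their attribute sets. No single object
    # has all three attributes, yet {a,b,c} is shattered by the lattice.
    ctx = [frozenset(g) for g in ({'a', 'b'}, {'b', 'c'}, {'a', 'c'}, {'a'})]
    print(lattice_vc_dim(['a', 'b', 'c'], ctx))  # -> 3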

23 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...Such area has found many applications in computational learning (Blumer et al. 1989) and its central notion of shattered sets is a widely studied topic in extremal set theory (Jukna 2010)....

    [...]

Journal Article
TL;DR: The primary focus in this paper is on obtaining generalization error bounds that depend on the levels of separation, or margins, achieved by the successive linear classifiers.
Abstract: In this paper we consider the generalization accuracy of classification methods based on the iterative use of linear classifiers. The resulting classifiers, which we call threshold decision lists, act as follows. Some points of the data set to be classified are given a particular classification according to a linear threshold function (or hyperplane). These are then removed from consideration, and the procedure is iterated until all points are classified. Geometrically, we can imagine that at each stage, points of the same classification are successively chopped off from the data set by a hyperplane. We analyse theoretically the generalization properties of data classification techniques that are based on the use of threshold decision lists and on the special subclass of multilevel threshold functions. We present bounds on the generalization error in a standard probabilistic learning framework. The primary focus in this paper is on obtaining generalization error bounds that depend on the levels of separation, or margins, achieved by the successive linear classifiers. We also improve and extend previously published theoretical bounds on the generalization ability of perceptron decision trees.
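As an illustration of the chopping idea, here is a minimal sketch (ours, not the paper's algorithm) for the special subclass of multilevel threshold functions, i.e., decision lists over thresholds on the real line: repeatedly cut off the largest-valued run of equally labeled points and record a (threshold, label) rule.

    import numpy as np

    def fit_chop_1d(x, y):
        """Greedy 1-D 'chopping': repeatedly cut off the largest-valued run
        of equally labeled points and record (threshold, label) rules."""
        order = np.argsort(-x)          # largest x first
        x, y = x[order], y[order]
        rules, i = [], 0
        while i < len(x):
            lab, j = y[i], i
            while j < len(x) and y[j] == lab:
                j += 1
            # threshold sits between the chopped run and the remaining points
            t = (x[j - 1] + x[j]) / 2 if j < len(x) else -np.inf
            rules.append((t, lab))
            i = j
        return rules

    def predict_chop_1d(rules, v):
        for t, lab in rules:
            if v > t:               # first rule whose halfline contains v
                return lab
        return rules[-1][1]

    x = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.55])
    y = np.array([0, 0, 0, 1, 1, 1])
    rules = fit_chop_1d(x, y)
    print([predict_chop_1d(rules, v) for v in (0.2, 0.6, 0.95)])  # -> [0, 1, 1]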

23 citations


Cites background, methods, or results from "Learnability and the Vapnik-Chervon..."

  • ...The chopping procedure described above suggests that the use of threshold decision lists is fairly natural, if an iterative approach is to be taken to pattern classification....

    [...]

  • ...Following a form of the PAC model of computational learning theory (see Anthony and Biggs, 1992; Vapnik, 1998; Blumer et al., 1989), we assume that labeled data points (x,b) (where x ∈ Rn and b ∈ {0,1}) have been generated randomly (perhaps from some larger corpus of data) according to a fixed…...

    [...]

  • ...For similar results, see Vapnik and Chervonenkis (1971); Blumer et al. (1989); and Anthony and Bartlett (1999). Then, for m ≥ 8/ε, P^m(Q) ≤ 2P^{2m}(T)....

    [...]

  • ...The key probability results we employ are the following bounds, due respectively to Vapnik and Chervonenkis (1971) and Blumer et al. (1989) (see also Anthony and Bartlett, 1999): for any ε ∈ (0,1), P^m({s ∈ Z^m : there exists f ∈ H, er_P(f) ≥ er_s(f) + ε}) < 4Π_H(2m)e^{-mε²/8}, and, for m ≥ 8/ε, P^m({s…
    (These bounds are evaluated numerically in the sketch after this list.)

    [...]

  • ...Lower bounds on the VC-dimension would provide worst-case lower bounds on generalization error (see Ehrenfeucht et al., 1989; Anthony and Biggs, 1992; Anthony and Bartlett, 1999; Blumer et al., 1989)....

    [...]
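The first bound quoted above is easy to evaluate numerically. The sketch below (ours; it assumes the growth function is controlled via Sauer's lemma, Π_H(m) ≤ (em/d)^d for m ≥ d, where d is the VC dimension) searches for the smallest sample size m at which the bound drops below a target confidence δ.

    import math

    def growth(m, d):
        # Sauer's lemma: Pi_H(m) <= (e*m/d)^d for m >= d, trivially <= 2^m
        return 2.0 ** m if m < d else (math.e * m / d) ** d

    def bound(m, d, eps):
        # the quoted VC bound: 4 * Pi_H(2m) * exp(-m * eps^2 / 8)
        return 4 * growth(2 * m, d) * math.exp(-m * eps ** 2 / 8)

    def sample_size(d, eps, delta):
        """Smallest m (doubling search + bisection) with bound(m) <= delta."""
        hi = 1
        while bound(hi, d, eps) > delta:
            hi *= 2
        lo = hi // 2 + 1 if hi > 1 else 1
        while lo < hi:
            mid = (lo + hi) // 2
            if bound(mid, d, eps) <= delta:
                hi = mid
            else:
                lo = mid + 1
        return lo

    # prints a sample size on the order of 10^5 for these parameters
    print(sample_size(d=10, eps=0.1, delta=0.05))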

Proceedings ArticleDOI
05 Jul 1995
TL;DR: The power of teaching is studied through two on-line learning models, teacher-directed learning and self-directed learning; in both, the learner tries to identify an unknown concept based on examples of the concept presented one at a time.
Abstract: We explore the power of teaching by studying two on-line learning models: teacher-directed learning and self-directed learning. In both models, the learner tries to identify an unknown concept based on examples of the concept presented one at a time. The learner predicts whether each example is positive or negative with immediate feedback, and the objective is to minimize the number of prediction mistakes. The examples are selected by the teacher in teacher-directed learning and by the learner itself in self-directed learning. Roughly, teacher-directed learning represents the scenario in which a teacher teaches a class of learners, and self-directed learning represents the scenario in which a smart learner asks questions and learns by itself. For all previously studied concept classes, the minimum number of mistakes in teacher-directed learning is always larger than that in self-directed learning. This raises an interesting question of whether teaching is helpful for all learners, including the smart learner. Assuming the existence of one-way functions, we construct concept classes for which the minimum number of mistakes is linear in teacher-directed learning but superpolynomial in self-directed learning, demonstrating the power of a helpful teacher in the learning process.
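Both models in this abstract share the same mistake-bound protocol; only the choice of presentation order differs. A minimal sketch of that protocol (ours; the learner and target below are toy stand-ins):

    def run_protocol(sequence, target, learner):
        """Present examples one at a time; count prediction mistakes."""
        mistakes = 0
        for x in sequence:
            if learner.predict(x) != target(x):
                mistakes += 1
            learner.update(x, target(x))   # immediate feedback
        return mistakes

    class MemorizeLearner:
        """Predicts 'negative' until told otherwise; remembers seen labels."""
        def __init__(self):
            self.seen = {}
        def predict(self, x):
            return self.seen.get(x, 0)
        def update(self, x, label):
            self.seen[x] = label

    # Target: singleton concept {3} over domain {0,...,4}. In the
    # teacher-directed model the teacher picks the order; in the
    # self-directed model the learner would pick it.
    target = lambda x: 1 if x == 3 else 0
    teacher_order = [3, 0, 1, 2, 4]   # helpful teacher shows the positive first
    print(run_protocol(teacher_order, target, MemorizeLearner()))  # -> 1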

23 citations

Journal ArticleDOI
TL;DR: It is shown how to select, in a provably good fashion, the smallest set of points T ⊆ S such that the multiquadric interpolant of T is within δ of f over S.
Abstract: Multiquadric interpolation is a technique for interpolating nonuniform samples of multivariate functions, in order to enable a variety of operations such as data visualization. We are interested in computing sparse but approximate interpolants, i.e., approximate interpolants with few coefficients. Such interpolants are useful since (1) the cost of evaluating the interpolant scales directly with the number of nonzero coefficients, and (2) the principle of Occam's Razor suggests that the interpolant with fewer coefficients better approximates the underlying function. Since the number of coefficients in a multiquadric interpolant is, as is to be expected, equal to the number of data points in the given set, the problem can be abstracted thus: given a set S of samples of a function f : R^k → R and an error tolerance δ, find the smallest set of points T ⊆ S such that the multiquadric interpolant of T is within δ of f over S. Using some recent results on sparse solutions of linear systems, we show how T may be selected in a provably good fashion.
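The paper's selection procedure builds on sparse-solution results for linear systems; as a simpler stand-in, the sketch below greedily grows T by repeatedly adding the sample with the largest residual until the multiquadric interpolant of T, with basis φ(r) = sqrt(r² + c²), is within δ of f on all of S. The greedy rule and all names here are our illustrative assumptions, not the paper's method.

    import numpy as np

    def multiquadric(X, centers, c=1.0):
        """Matrix of multiquadric basis values phi(|x - center|)."""
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return np.sqrt(d ** 2 + c ** 2)

    def greedy_sparse_interpolant(X, f, delta, c=1.0):
        """Grow T by largest residual until max error over S is <= delta."""
        T = [int(np.argmax(np.abs(f)))]      # start from the largest sample
        while True:
            A = multiquadric(X[T], X[T], c)  # k-by-k interpolation system
            coef = np.linalg.solve(A, f[T])
            pred = multiquadric(X, X[T], c) @ coef
            resid = np.abs(pred - f)
            worst = int(np.argmax(resid))
            if resid[worst] <= delta:
                return np.array(T), coef
            T.append(worst)

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    f = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])
    T, coef = greedy_sparse_interpolant(X, f, delta=1e-2)
    print(len(T), "of", len(X), "centers kept")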

23 citations

Journal ArticleDOI
TL;DR: It is shown that the sample size for reliable learning can be bounded above by a quantity independent of the number of outputs of the network.
Abstract: This paper applies the theory of probably approximately correct (PAC) learning to multiple-output feedforward threshold networks. It is shown that the sample size for reliable learning can be bounded above by a quantity independent of the number of outputs of the network.

23 citations

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column providing a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations