Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are provided.
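A minimal sketch of the shattering idea behind this combinatorial parameter (an illustration of the standard definition, not code from the paper; the concept class and sample points are assumptions chosen for the demo): a class shatters a point set if it realizes every binary labeling of it, and the VC dimension is the size of the largest shattered set. Thresholds {x >= t} on the real line shatter any single point but no pair, so their VC dimension is 1.

```python
def shatters(points, concepts):
    """True if `concepts` realizes every binary labeling of `points`."""
    labelings = {tuple(c(x) for x in points) for c in concepts}
    return len(labelings) == 2 ** len(points)

def threshold_concepts(points):
    # Thresholds placed below, between, and above the sample points enumerate
    # every distinct behavior of the (infinite) class {x >= t} on the sample.
    ts = sorted(points)
    cuts = [ts[0] - 1.0] + [(a + b) / 2 for a, b in zip(ts, ts[1:])] + [ts[-1] + 1.0]
    return [lambda x, t=t: x >= t for t in cuts]

one = [0.0]
two = [0.0, 1.0]
print(shatters(one, threshold_concepts(one)))  # True:  VC dimension >= 1
print(shatters(two, threshold_concepts(two)))  # False: labeling (1, 0) is unrealizable
```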


Citations
Journal ArticleDOI
TL;DR: This paper reviews recent developments in the non-media applications of data watermarking, which have emerged over the last decade as an exciting new sub-domain, as well as the new challenges of digital watermarking that have arisen with the evolution of big data.
Abstract: Over the last 25 years, there has been much work on multimedia digital watermarking. In this domain, the primary limitation on watermark strength has been its visibility. For multimedia watermarks, invisibility is defined in human terms (that is, in terms of human sensory limitations). In this paper, we review recent developments in the non-media applications of data watermarking, which have emerged over the last decade as an exciting new sub-domain. Since, by definition, the intended receiver should be able to detect the watermark, we have to redefine invisibility in an acceptable way that is often application-specific and thus cannot be easily generalized. In particular, this is true when the data is not intended to be directly consumed by humans. For example, a loose definition of robustness might be in terms of the resilience of a watermark against normal host data operations, and of invisibility as resilience of the data interpretation against change introduced by the watermark. In this paper, we classify the data in terms of data mining rules on complex types of data such as time-series, symbolic sequences, data streams, and so forth. We emphasize the challenges involved in non-media watermarking in terms of common watermarking properties, including invisibility, capacity, robustness, and security. With the aid of a few examples of watermarking applications, we demonstrate these distinctions and survey the latest research in this regard. Finally, we look at the new challenges of digital watermarking that have arisen with the evolution of big data.
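As a loose illustration of these redefined properties (a toy scheme of my own, not one described in the survey; the quantization step and data are assumptions), the sketch below embeds watermark bits into a numeric time series by forcing the parity of quantized values. "Invisibility" here is interpretation-level: the per-sample distortion is bounded by 1.5 quantization steps.

```python
# Hypothetical toy scheme: watermark a time series via parity quantization.
def embed(series, bits, step=0.01):
    out = list(series)
    for i, b in enumerate(bits):
        q = round(out[i] / step)
        if q % 2 != b:      # nudge to a neighboring multiple with matching parity
            q += 1
        out[i] = q * step
    return out

def extract(series, n_bits, step=0.01):
    return [round(x / step) % 2 for x in series[:n_bits]]

ts = [1.234, 2.345, 3.456, 4.567]
wm = [1, 0, 1, 1]
marked = embed(ts, wm)
assert extract(marked, len(wm)) == wm
print(max(abs(m - x) for m, x in zip(marked, ts)))  # <= 1.5 * step
```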

64 citations


Cites background from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...The Vapnik-Chervonenkis dimension of a set V is the size of the longest index sequence I that is shattered by V [100]....


Journal Article
TL;DR: In this paper, poly(n, 1/ε)-time algorithms are given for learning origin-centered halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples.
Abstract: We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions. We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε: (1) learning origin-centered halfspaces in R^n with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε² / log(n/ε)) (the best previous result was Ω(ε / (n log(n/ε))^{1/4})); (2) learning origin-centered halfspaces with respect to any isotropic log-concave distribution on R^n with malicious noise rate η = Ω(ε³ / log²(n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise. We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on R^n in the presence of adversarial label noise at rate η = Ω(ε³ / log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not the example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε. Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with "smooth" boosting.
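A rough sketch of the PCA-based outlier-removal idea (a reconstruction under my own choice of threshold, stopping rule, and synthetic data, not the authors' procedure): malicious examples must sit far out along some direction to shift the learned halfspace, so points with abnormally large projections onto the top principal direction are iteratively discarded.

```python
import numpy as np

def remove_outliers(X, threshold=10.0, max_rounds=10):
    X = np.asarray(X, dtype=float)
    for _ in range(max_rounds):
        # top principal direction of the (uncentered) second-moment matrix
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        v = Vt[0]
        proj2 = (X @ v) ** 2
        keep = proj2 <= threshold * proj2.mean()
        if keep.all():
            break
        X = X[keep]
    return X

rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 5))          # roughly isotropic inliers
bad = np.tile([50.0, 0, 0, 0, 0], (5, 1))  # far-out malicious points
filtered = remove_outliers(np.vstack([clean, bad]))
print(len(filtered))  # ~500: planted outliers gone (a few extreme inliers may also drop)
```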

64 citations

Proceedings Article
10 Mar 2019
TL;DR: In this paper, the adversarial examples model is extended to the real-valued setting, and the sample complexity for binary classification is improved from O(1/ε⁴) to O(1/ε²), up to logarithmic factors.
Abstract: We consider a model of robust learning in an adversarial environment. The learner gets uncorrupted training data with access to possible corruptions that may be effected by the adversary during testing. The learner's goal is to build a robust classifier, which will be tested on future adversarial examples. The adversary is limited to $k$ possible corruptions for each input. We model the learner-adversary interaction as a zero-sum game. This model is closely related to the adversarial examples model of Schmidt et al. (2018); Madry et al. (2017). Our main results consist of generalization bounds for binary and multiclass classification, as well as the real-valued case (regression). For the binary classification setting, we both tighten the generalization bound of Feige, Mansour, and Schapire (2015) and handle infinite hypothesis classes. The sample complexity is improved from $O(\frac{1}{\epsilon^4}\log(\frac{|\mathcal{H}|}{\delta}))$ to $O\big(\frac{1}{\epsilon^2}\big(\sqrt{k\,\mathrm{VC}(\mathcal{H})}\log^{\frac{3}{2}+\alpha}(k\,\mathrm{VC}(\mathcal{H}))+\log(\frac{1}{\delta})\big)\big)$ for any $\alpha > 0$. Additionally, we extend the algorithm and generalization bound from the binary to the multiclass and real-valued cases. Along the way, we obtain results on fat-shattering dimension and Rademacher complexity of $k$-fold maxima over function classes; these may be of independent interest. For binary classification, the algorithm of Feige et al. (2015) uses a regret minimization algorithm and an ERM oracle as a black box; we adapt it for the multiclass and regression settings. The algorithm provides us with near-optimal policies for the players on a given training sample.
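The robust objective itself is easy to state in code. Below is a minimal sketch (the toy data, corruption set, and hypothesis class are invented for illustration) of robust ERM in this model: each example carries at most k admissible corruptions, the adversary picks the worst one, and the learner minimizes the resulting worst-case empirical error.

```python
def robust_error(h, sample, corruptions):
    # corruptions(x) yields x itself plus its <= k adversarial variants;
    # each example is scored on the worst variant.
    return sum(max(h(z) != y for z in corruptions(x)) for x, y in sample) / len(sample)

def robust_erm(hypotheses, sample, corruptions):
    return min(hypotheses, key=lambda h: robust_error(h, sample, corruptions))

# Toy instance: 1-D thresholds; the adversary may shift each point by +/- 0.3 (k = 2).
sample = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]
corruptions = lambda x: [x, x - 0.3, x + 0.3]
hypotheses = [lambda x, t=t: int(x >= t) for t in [0.5, 1.5, 2.5]]
best = robust_erm(hypotheses, sample, corruptions)
print(robust_error(best, sample, corruptions))  # 0.0 for the threshold at 1.5
```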

64 citations

Journal ArticleDOI
TL;DR: An unsupervised video object segmentation and tracking scheme based on an adaptable neural-network architecture is proposed; experimental results and comparisons indicate its good performance even in sequences with complicated content (object bending, occlusion).
Abstract: In this paper, an unsupervised video object (VO) segmentation and tracking algorithm is proposed based on an adaptable neural-network architecture. The proposed scheme comprises: 1) a VO tracking module and 2) an initial VO estimation module. Object tracking is handled as a classification problem and implemented through an adaptive network classifier, which provides better results than conventional motion-based tracking algorithms. Network adaptation is accomplished through an efficient and cost-effective weight updating algorithm, providing minimal degradation of the previous network knowledge while taking into account the current content conditions. A retraining set is constructed and used for this purpose based on initial VO estimation results. Two different scenarios are investigated. The first concerns extraction of human entities in video conferencing applications, while the second exploits depth information to identify generic VOs in stereoscopic video sequences. Human face/body detection based on Gaussian distributions is accomplished in the first scenario, while segmentation fusion is obtained using color and depth information in the second scenario. A decision mechanism is also incorporated to detect time instances for weight updating. Experimental results and comparisons indicate the good performance of the proposed scheme even in sequences with complicated content (object bending, occlusion).
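A schematic sketch of the weight-updating idea (a drastic simplification using logistic regression rather than the paper's network; the penalty weight `lam`, learning rate, and synthetic retraining set are assumptions): fit the retraining set while penalizing deviation from the previous weights, so earlier knowledge degrades as little as possible.

```python
import numpy as np

def retrain(w_old, X, y, lam=1.0, lr=0.1, steps=200):
    w = w_old.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # logistic predictions
        grad = X.T @ (p - y) / len(y) + lam * (w - w_old)
        w -= lr * grad                             # gradient step on loss + proximity penalty
    return w

rng = np.random.default_rng(1)
w_old = rng.normal(size=3)                         # previous network knowledge
X = rng.normal(size=(20, 3))                       # current-frame retraining set
y = (X @ rng.normal(size=3) > 0).astype(float)     # labels from initial VO estimation
w_new = retrain(w_old, X, y)
print(np.linalg.norm(w_new - w_old))               # bounded shift from the old weights
```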

64 citations


Cites methods from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...Taking the previously mentioned issues into consideration, we propose an unsupervised VO segmentation and tracking scheme that incorporates an adaptable neural-network architecture....


Proceedings ArticleDOI
30 Oct 1989
TL;DR: The probably approximately correct (PAC) model of learning from examples is generalized, and a distribution-independent uniform convergence result for certain classes of functions computed by feedforward neural nets is obtained.
Abstract: The probably approximately correct (PAC) model of learning from examples is generalized. The problem of learning functions from a set X into a set Y is considered, assuming only that the examples are generated by independent draws according to an unknown probability measure on X × Y. The learner's goal is to find a function in a given hypothesis space of functions from X into Y that on average gives Y values close to those observed in random examples. The discrepancy is measured by a bounded real-valued loss function. The average loss is called the error of the hypothesis. A theorem on the uniform convergence of empirical error estimates to true error rates is given for certain hypothesis spaces, and it is shown how this implies learnability. A generalized notion of VC dimension that applies to classes of real-valued functions and a notion of capacity for classes of functions that map into a bounded metric space are given. These measures are used to bound the rate of convergence of empirical error estimates to true error rates, giving bounds on the sample size needed for learning using hypotheses in these classes. As an application, a distribution-independent uniform convergence result for certain classes of functions computed by feedforward neural nets is obtained. Distribution-specific uniform convergence results for classes of functions that are uniformly continuous on average are also obtained.
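A small simulation (my illustration, not from the paper; the finite threshold class, noiseless target, and sample sizes are assumptions) of what such a uniform convergence theorem asserts: the worst-case gap over the class between empirical and true error shrinks as the sample grows, roughly like 1/sqrt(m).

```python
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(0.1, 0.9, 9)              # finite hypothesis "space"
target = lambda x: (x > 0.5).astype(float)

def worst_gap(m):
    x = rng.uniform(size=m)
    y = target(x)                                  # noiseless labels for simplicity
    gaps = []
    for t in thresholds:
        emp = np.mean(((x > t).astype(float) - y) ** 2)   # empirical error (0/1 loss)
        true = abs(t - 0.5)                        # true error = measure of disagreement
        gaps.append(abs(emp - true))
    return max(gaps)

for m in [100, 1000, 10000]:
    print(m, round(worst_gap(m), 4))               # gap decays roughly like 1/sqrt(m)
```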

63 citations

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column that provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations