Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space Eⁿ. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
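
The "simple combinatorial parameter" referred to above can be checked directly on small examples. The sketch below (illustrative only, not taken from the paper; all function names are ours) brute-forces the Vapnik-Chervonenkis dimension of the class of closed intervals [a, b] on the real line: intervals shatter some two-point sets but no three-point set, so the computed value is 2.

    from itertools import combinations

    def interval_labelings(points):
        """All labelings of `points` realizable by some interval [a, b]."""
        labelings = set()
        endpoints = sorted(points) + [min(points) - 1.0]  # extra endpoint allows the empty interval
        for a in endpoints:
            for b in endpoints:
                labelings.add(tuple(a <= x <= b for x in points))
        return labelings

    def is_shattered(points):
        """True if intervals realize all 2^n labelings of `points`."""
        return len(interval_labelings(points)) == 2 ** len(points)

    def vc_dimension_on(sample, max_d=5):
        """Largest d such that some d-point subset of `sample` is shattered."""
        best = 0
        for d in range(1, max_d + 1):
            if any(is_shattered(list(subset)) for subset in combinations(sample, d)):
                best = d
        return best

    print(vc_dimension_on([0.0, 1.0, 2.0, 3.0, 4.0]))  # prints 2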


Citations
Journal ArticleDOI
TL;DR: In this paper, a lower bound of Ω((1/ε)ln(1/δ) + VCdim(C)/ε) was shown on the number of examples required for distribution-free learning of a concept class C, where VCdim(C) is the Vapnik-Chervonenkis dimension and ε and δ are the accuracy and confidence parameters.
Abstract: We prove a lower bound of Ω((1/ε)ln(1/δ) + VCdim(C)/ε) on the number of random examples required for distribution-free learning of a concept class C, where VCdim(C) is the Vapnik-Chervonenkis dimension and ε and δ are the accuracy and confidence parameters. This improves the previous best lower bound of Ω((1/ε)ln(1/δ) + VCdim(C)) and comes close to the known general upper bound of O((1/ε)ln(1/δ) + (VCdim(C)/ε)ln(1/ε)) for consistent algorithms. We show that for many interesting concept classes, including kCNF and kDNF, our bound is actually tight to within a constant factor.
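
As a rough numeric sanity check (ours, not from the paper), the growth orders of the lower and upper bounds quoted above can be compared directly; the hidden constants are simply set to 1, so the values indicate rates of growth rather than actual sample sizes.

    import math

    def lower_bound_order(eps, delta, d):
        # Omega((1/eps) ln(1/delta) + d/eps), constants dropped
        return (1.0 / eps) * math.log(1.0 / delta) + d / eps

    def upper_bound_order(eps, delta, d):
        # O((1/eps) ln(1/delta) + (d/eps) ln(1/eps)), constants dropped
        return (1.0 / eps) * math.log(1.0 / delta) + (d / eps) * math.log(1.0 / eps)

    for eps in (0.1, 0.01):
        lo = lower_bound_order(eps, delta=0.05, d=10)
        hi = upper_bound_order(eps, delta=0.05, d=10)
        print(f"eps={eps}: lower ~ {lo:.0f}, upper ~ {hi:.0f}, gap factor ln(1/eps) = {math.log(1.0 / eps):.2f}")

The two bounds differ only by the ln(1/ε) factor on the VC-dimension term, which is the gap the abstract refers to.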

410 citations

Journal ArticleDOI
TL;DR: A theory for ψ-learning is provided, and it is shown that it essentially attains the optimal rates of convergence in two learning examples; results from simulation studies and from breast cancer classification confirm the ability of ψ-learning to outperform SVM in generalization.
Abstract: The concept of large margins has been recognized as an important principle in analyzing learning methodologies, including boosting, neural networks, and support vector machines (SVMs). However, this concept alone is not adequate for learning in nonseparable cases. We propose a learning methodology, called ψ-learning, that is derived from a direct consideration of generalization errors. We provide a theory for ψ-learning and show that it essentially attains the optimal rates of convergence in two learning examples. Finally, results from simulation studies and from breast cancer classification confirm the ability of ψ-learning to outperform SVM in generalization.

403 citations

Journal ArticleDOI
TL;DR: A characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire, and shows that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
Abstract: Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper, we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or "agnostic") framework. Furthermore, we find a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
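
For reference, the classical Sauer(-Shelah) lemma that this abstract generalizes states that a class of VC dimension d realizes at most sum_{i=0}^{d} C(m, i) <= (em/d)^d labelings on any m points, i.e. polynomially many in m rather than 2^m. A small numeric check (ours, not the paper's):

    import math

    def sauer_bound(m, d):
        """Maximum number of labelings of m points for a class of VC dimension d."""
        return sum(math.comb(m, i) for i in range(d + 1))

    m, d = 1000, 3
    print(sauer_bound(m, d))             # 166,667,501: polynomial in m
    print((math.e * m / d) ** d)         # looser closed form (e*m/d)^d, ~7.4e8
    print(sauer_bound(m, d) < 2 ** m)    # True: far fewer than all 2^m labelings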

398 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...In this case, it is well known that the learnability of ' is completely characterized by the finiteness of a simple combinatorial quantity known as the Vapnik Chervonenkis (VC) dimension of ' [Vapnik and Chervonenkis 1971; Blumer et al. 1989]....


  • ...In this case it is well known that the learnability of H is completely characterized by the finiteness of a simple combinatorial quantity known as the Vapnik-Chervonenkis (VC) dimension of H [24, 6]....


Journal ArticleDOI
17 Oct 1999
TL;DR: This work provides a novel algorithmic analysis via a model of robust concept learning (closely related to “margin classifiers”), and shows that a relatively small number of examples are sufficient to learn rich concept classes.
Abstract: We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel analysis for a model of robust concept learning (closely related to "margin classifiers"), and show that a relatively small number of examples are sufficient to learn rich concept classes (including threshold functions, Boolean formulae and polynomial surfaces). As a result, we obtain simple intuitive proofs for the generalization bounds of Support Vector Machines. In addition, the new algorithms have several advantages: they are faster, conceptually simpler, and highly resistant to noise. For example, a robust half-space can be PAC-learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that "more robust concepts are easier to learn", is supported by a multitude of psychological studies.
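
A minimal illustration of the dimension-free flavor of such margin ("robustness") guarantees, using the classical perceptron rather than the algorithm of this paper: on examples in the unit ball that clear a margin gamma, the perceptron makes at most (1/gamma)^2 mistakes no matter how many attributes there are. The construction and names below are ours.

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, gamma, n_examples = 500, 0.2, 200

    # Synthetic robust halfspace: every unit-norm example clears the margin
    # gamma with respect to the (hidden) true normal vector w_star.
    w_star = rng.normal(size=n_features)
    w_star /= np.linalg.norm(w_star)

    X, y = [], []
    for _ in range(n_examples):
        s = rng.uniform(gamma, 1.0) * rng.choice([-1.0, 1.0])   # signed margin component
        u = rng.normal(size=n_features)
        u -= (u @ w_star) * w_star                               # direction orthogonal to w_star
        u /= np.linalg.norm(u)
        X.append(s * w_star + np.sqrt(1.0 - s ** 2) * u)         # unit vector with x . w_star = s
        y.append(1 if s > 0 else -1)
    X, y = np.array(X), np.array(y)

    # Classical perceptron mistake bound: at most (1/gamma)^2 = 25 updates,
    # independent of n_features.
    w, mistakes = np.zeros(n_features), 0
    for _ in range(50):                                          # a few passes suffice
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                mistakes += 1
    print("perceptron mistakes:", mistakes, "(bound 1/gamma^2 =", int(1 / gamma ** 2), ")")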

396 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...Theorem 4 (Blumer et al. 1989)....


  • ...The following well-known theorem (see Kearns & Vazirani (1994) or Blumer et al. (1989)) gives a bound on the size of the sample so that a hypothesis that is consistent with the sample also has, with high probability, small error with respect to the entire distribution....


  • ...The theorem below is a slight variant of a similar theorem from Blumer et al. (1989)....


Proceedings ArticleDOI
01 Jul 1992
TL;DR: An investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions is initiated, providing an initial outline of the possibilities for agnostic learning.
Abstract: In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of both positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for learning in a model for problems involving hidden variables.
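
The citation snippets below mention "disagreement minimization" as the core subroutine in this agnostic setting. A toy, self-contained instance of that idea (ours, not the paper's dynamic-programming method): with no assumption that any threshold fits the data perfectly, pick the threshold on the line that minimizes the number of disagreements with the sample.

    def best_threshold(points, labels):
        """Return (threshold, errors) minimizing disagreements for the rule x >= t -> 1."""
        order = sorted(range(len(points)), key=lambda i: points[i])
        xs = [points[i] for i in order]
        ys = [labels[i] for i in order]

        # Candidate thresholds: below all points, or just above each point in turn.
        best_t = float("-inf")
        best_err = err = sum(1 for label in ys if label != 1)
        for x, label in zip(xs, ys):
            err += 1 if label == 1 else -1      # moving t just past x flips its prediction to 0
            if err < best_err:
                best_t, best_err = x + 1e-9, err
        return best_t, best_err

    # Noisy sample: no threshold is perfect, but one still minimizes errors.
    pts = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0]
    lbl = [0,   1,   0,   1,   1,   0  ]
    print(best_threshold(pts, lbl))   # threshold just above 0.1, 2 disagreements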

380 citations


Cites methods from "Learnability and the Vapnik-Chervon..."

  • ...Although we omit the details, these arguments can be made rigorous using, for instance, the randomized lower bound techniques of Blumer et al. (1989)....


  • ...It follows from standard arguments ( Blumer, Ehrenfeucht, Haussler & Warmuth, 1989 ) that if the Vapnik-Chervonenkis dimension of T is polynomially bounded by the complexity parameter n, an algorithm that efficiently solves the disagreement minimization problem for T can be used as a subroutine by an efficient algorithm for learning fir" in the agnostic PAC model....


References
Book
01 Jan 1979
TL;DR: The second installment of a quarterly column providing a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations