Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
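To make the combinatorial parameter concrete: a class shatters a point set if it can realize every labeling of that set, and the VC dimension is the size of the largest shattered set. The sketch below is an illustration only (closed intervals on the real line are a toy class, not an example taken from the paper); it brute-forces the shattering check and confirms that intervals have VC dimension 2.

```python
from itertools import product

def interval_can_realize(points, labels):
    """Check whether some closed interval [a, b] labels `points` exactly as
    `labels` (label 1 = inside the interval, 0 = outside)."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # an "empty" interval realizes the all-zero labeling
    a, b = min(positives), max(positives)
    # the tightest interval covering the positives must not contain a negative
    return all(not (a <= x <= b) for x, y in zip(points, labels) if y == 0)

def is_shattered(points):
    """A set is shattered if every one of its 2^n labelings is realizable."""
    return all(interval_can_realize(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# Every 2-point set is shattered, but no 3-point set is, so the VC dimension
# of the class of closed intervals on the real line is 2.
print(is_shattered([1.0, 2.0]))        # True
print(is_shattered([1.0, 2.0, 3.0]))   # False: labeling (1, 0, 1) is impossible
```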

Citations
Journal ArticleDOI
Luca Oneto
TL;DR: This review gives an intelligible overview of the problems of model selection and error estimation by focusing on the ideas behind the different SLT-based approaches and simplifying most of the technical aspects, so as to make them more accessible and usable in practice.
Abstract: How can we select the best performing data-driven model? How can we rigorously estimate its generalization error? Statistical learning theory (SLT) answers these questions by deriving nonasymptotic bounds on the generalization error of a model or, in other words, by delivering upper bounds on the true error of the learned model based only on quantities computed from the available data. However, for a long time, SLT has been considered only as an abstract theoretical framework, useful for inspiring new learning approaches, but with limited applicability to practical problems. The purpose of this review is to give an intelligible overview of the problems of model selection (MS) and error estimation (EE), by focusing on the ideas behind the different SLT-based approaches and simplifying most of the technical aspects so as to make them more accessible and usable in practice. We start from the seminal works of the 1980s and proceed to the most recent results, then discuss open problems and finally outline future directions of this field of research.
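For a concrete feel of the nonasymptotic bounds the review surveys, the sketch below evaluates one common textbook form of the VC generalization bound (constants differ between statements, so treat the exact expression as illustrative): with probability at least 1 - delta the true error exceeds the empirical error by at most sqrt((d(ln(2n/d) + 1) + ln(4/delta)) / n), where d is the VC dimension and n the sample size.

```python
import math

def vc_generalization_gap(n, d, delta):
    """One common textbook (Vapnik-style) form of the VC bound: with
    probability >= 1 - delta, true error <= empirical error + this gap."""
    return math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)

# The gap shrinks roughly like sqrt(d * log(n) / n): more data or a smaller
# VC dimension d gives a tighter guarantee on the learned model's true error.
for n in (1_000, 10_000, 100_000):
    print(n, round(vc_generalization_gap(n, d=10, delta=0.05), 4))
```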

22 citations

Posted Content
TL;DR: It is proved that for every nowhere dense class of graphs C and every first-order formula φ(x, y), the number of subsets of A^|y| that are of the form {v ∈ A^|y| : G ⊨ φ(u, v)} for some valuation u of x in G is bounded by O(|A|^(|x|+ε)) for every ε > 0, which provides optimal bounds on the VC-density of first-order definable set systems in nowhere dense graph classes.
Abstract: We prove that for every class of graphs $\mathcal{C}$ which is nowhere dense, as defined by Nesetril and Ossona de Mendez, and for every first order formula $\phi(\bar x,\bar y)$, whenever one draws a graph $G\in \mathcal{C}$ and a subset of its nodes $A$, the number of subsets of $A^{|\bar y|}$ which are of the form $\{\bar v\in A^{|\bar y|}\, \colon\, G\models\phi(\bar u,\bar v)\}$ for some valuation $\bar u$ of $\bar x$ in $G$ is bounded by $\mathcal{O}(|A|^{|\bar x|+\epsilon})$, for every $\epsilon>0$. This provides optimal bounds on the VC-density of first-order definable set systems in nowhere dense graph classes. We also give two new proofs of upper bounds on quantities in nowhere dense classes which are relevant for their logical treatment. Firstly, we provide a new proof of the fact that nowhere dense classes are uniformly quasi-wide, implying explicit, polynomial upper bounds on the functions relating the two notions. Secondly, we give a new combinatorial proof of the result of Adler and Adler stating that every nowhere dense class of graphs is stable. In contrast to the previous proofs of the above results, our proofs are completely finitistic and constructive, and yield explicit and computable upper bounds on quantities related to uniform quasi-wideness (margins) and stability (ladder indices).
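To illustrate the quantity being bounded, the toy sketch below (illustrative code, not from the paper) instantiates φ(x, y) as plain adjacency and counts the distinct traces {v ∈ A : G ⊨ φ(u, v)} on a small graph; the theorem says that over nowhere dense classes this count grows only like O(|A|^(1+ε)) as A gets large, for every ε > 0.

```python
def neighbourhood_traces(adj, A):
    """Distinct sets {v in A : u adjacent to v}, over all vertices u.
    This is the set system defined by phi(x, y) = "x ~ y", restricted to A."""
    A = set(A)
    return {frozenset(adj[u] & A) for u in adj}

# A 5-cycle as a toy graph (adjacency lists given as sets).
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
A = [0, 1, 2]

traces = neighbourhood_traces(adj, A)
print(len(traces), sorted(map(sorted, traces)))
```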

22 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...probably approximately correct learning (PAC learning, introduced by Valiant in [40]), the size of the random sample required as the training set is determined by the VC-density of the concept class rather than by its VC-dimension (see [6], Lemma 7)....

  • ...The motivation for finding bounds on the VC-density comes from the fact that it is this quantity, rather than VC-dimension, that is actually relevant in combinatorial and algorithmic applications [6, 8, 9, 25, 26]....

Book ChapterDOI
03 Oct 2005
TL;DR: Experimental results show that SVM with the stump kernel is usually superior to boosting, even with noisy data, and the framework can output an infinite and nonsparse ensemble.
Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of base hypotheses. However, existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. It is not clear whether we should construct an ensemble classifier with a larger or even infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on SVM. The framework can output an infinite and nonsparse ensemble, and can be used to construct new kernels for SVM as well as to interpret some existing ones. We demonstrate the framework with a concrete application, the stump kernel, which embodies infinitely many decision stumps. The stump kernel is simple, yet powerful. Experimental results show that SVM with the stump kernel is usually superior to boosting, even with noisy data.
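A minimal usage sketch under one assumption: that the stump kernel essentially reduces to K(x, x') = Δ − ‖x − x'‖₁, with the additive constant Δ cancelling in the SVM dual, so any sufficiently large Δ behaves the same. The data, the choice of Δ, and the names below are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def stump_kernel(X1, X2, delta):
    """Stump kernel up to scaling: K(x, x') = delta - ||x - x'||_1.
    The additive constant cancels in the SVM dual (the constraint
    sum_i alpha_i y_i = 0), so its exact value does not matter."""
    return delta - np.abs(X1[:, None, :] - X2[None, :, :]).sum(axis=-1)

# Toy data: two noisy clusters (illustrative only).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y_train = np.array([-1] * 20 + [+1] * 20)
X_test = np.array([[-1.0, -1.0], [1.0, 1.0]])

# Any sufficiently large constant works; the per-feature range sum is one choice.
delta = (X_train.max(axis=0) - X_train.min(axis=0)).sum()

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(stump_kernel(X_train, X_train, delta), y_train)
print(clf.predict(stump_kernel(X_test, X_train, delta)))  # expected: [-1, +1]
```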

22 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...training sets (Cover 1965; Blumer et al. 1989)....

  • ...Definition 1 (Baum and Haussler 1989; Blumer et al. 1989) Consider the set of vectors $X = \{x_i\}_{i=1}^N \in \mathcal{X}^N$. We say that $X$ is shattered by $\mathcal{G}$ if for all $(y_1, y_2, \ldots, y_N) \in \{-1, +1\}^N$, there exists $g \in \mathcal{G}$ such that $y_i = g(x_i)$ for $i = 1, 2, \ldots, N$....

Journal Article
TL;DR: The Vapnik-Chervonenkis dimension of certain types of linearly weighted neural networks is investigated; the "probably approximately correct" learning framework is described and the importance of the Vapnik-Chervonenkis dimension is illustrated.
Abstract: The Vapnik-Chervonenkis dimension has proven to be of great use in the theoretical study of generalization in artificial neural networks. The "probably approximately correct" learning framework is described and the importance of the Vapnik-Chervonenkis dimension is illustrated. We then investigate the Vapnik-Chervonenkis dimension of certain types of linearly weighted neural networks. First, we obtain bounds on the Vapnik-Chervonenkis dimension of radial basis function networks with basis functions of several types. Secondly, we calculate the Vapnik-Chervonenkis dimension of polynomial discriminant functions defined over both real and binary-valued inputs.
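A rough sanity check on the polynomial case via the standard parameter-counting argument (an upper bound only; the exact values computed in the paper, in particular over binary inputs, may differ): a degree-k polynomial discriminant over n real inputs is a linear threshold function over the C(n+k, k) monomials of degree at most k, so that count bounds its VC dimension.

```python
from math import comb

def poly_discriminant_vc_upper_bound(n, k):
    """Number of monomials of degree <= k in n variables (constant included):
    a degree-k polynomial discriminant is a linear threshold function over
    these features, so this count upper-bounds its VC dimension."""
    return comb(n + k, k)

# Degree 1 over R^n recovers the familiar value n + 1 for affine separators.
print(poly_discriminant_vc_upper_bound(n=2, k=1))   # 3
print(poly_discriminant_vc_upper_bound(n=2, k=2))   # 6
print(poly_discriminant_vc_upper_bound(n=10, k=3))  # 286
```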

22 citations

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column that provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1973
TL;DR: This book provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations