Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are provided.
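The combinatorial notions at the heart of the abstract, shattering and the Vapnik-Chervonenkis dimension, can be made concrete for finite classes. Below is a minimal brute-force sketch (ours, not from the paper; the function names are illustrative and the search is exponential in the domain size): a point set is shattered by a concept class if every subset of it arises as the intersection of the point set with some concept, and the VC dimension is the size of the largest shattered set.

    from itertools import chain, combinations

    def shattered(points, concepts):
        """True if every subset of `points` is cut out by some concept."""
        needed = {frozenset(s) for s in chain.from_iterable(
            combinations(points, r) for r in range(len(points) + 1))}
        realized = {frozenset(c & set(points)) for c in concepts}
        return needed <= realized

    def vc_dimension(domain, concepts):
        """Largest d such that some d-point subset of `domain` is shattered."""
        for d in range(len(domain), -1, -1):
            if any(shattered(s, concepts) for s in combinations(domain, d)):
                return d
        return 0

    # Discrete intervals over {0,...,4}: any 2 points are shattered, but no
    # 3 are (an interval containing the outer two must contain the middle
    # one), so the VC dimension is 2.
    domain = list(range(5))
    intervals = [set(range(i, j + 1)) for i in domain for j in domain if i <= j]
    intervals.append(set())
    print(vc_dimension(domain, intervals))  # -> 2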


Citations
Journal ArticleDOI
TL;DR: An upper bound is given for the number of minimal generators of contexts without contranominal scales larger than a given size, and this bound is interpreted in terms of the Vapnik–Chervonenkis dimension of the concept lattice.
Abstract: A unique type of subcontexts is always present in formal contexts with many concepts: the contranominal scales. We make this precise by giving an upper bound for the number of minimal generators (and thereby for the number of concepts) of contexts without contranominal scales larger than a given size. We give an interpretation of this bound in terms of the Vapnik–Chervonenkis dimension of the concept lattice. Extremal contexts are constructed which meet this bound exactly. They are completely classified.
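The interpretation mentioned in this abstract can be sketched directly: an attribute set B induces a contranominal scale exactly when, for every b in B, some object has precisely the attributes B \ {b} among B; because intents are closed under intersection, this is equivalent to B being shattered by the concept lattice. A small illustrative sketch (ours, with hypothetical names; a context is given as a list of object intents):

    from itertools import combinations

    def contranominal(B, object_intents):
        """True if attribute set B induces a contranominal scale."""
        traces = {frozenset(g & B) for g in object_intents}
        return all(frozenset(B - {b}) in traces for b in B)

    def lattice_vc_dim(attributes, object_intents):
        """VC dimension of the concept lattice = largest contranominal scale."""
        for d in range(len(attributes), 0, -1):
            if any(contranominal(set(B), object_intents)
                   for B in combinations(attributes, d)):
                return d
        return 0

    # Toy context: objects listed by their attribute sets. No single object
    # has all three attributes, yet {a,b,c} is shattered by the lattice.
    ctx = [frozenset(g) for g in ({'a', 'b'}, {'b', 'c'}, {'a', 'c'}, {'a'})]
    print(lattice_vc_dim(['a', 'b', 'c'], ctx))  # -> 3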

23 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...Such area has found many applications in computational learning (Blumer et al. 1989) and its central notion of shattered sets is a widely studied topic in extremal set theory (Jukna 2010)....

    [...]

Journal Article
TL;DR: The primary focus in this paper is on obtaining generalization error bounds that depend on the levels of separation, or margins, achieved by the successive linear classifiers.
Abstract: In this paper we consider the generalization accuracy of classification methods based on the iterative use of linear classifiers. The resulting classifiers, which we call threshold decision lists, act as follows. Some points of the data set to be classified are given a particular classification according to a linear threshold function (or hyperplane). These are then removed from consideration, and the procedure is iterated until all points are classified. Geometrically, we can imagine that at each stage, points of the same classification are successively chopped off from the data set by a hyperplane. We analyse theoretically the generalization properties of data classification techniques that are based on the use of threshold decision lists and on the special subclass of multilevel threshold functions. We present bounds on the generalization error in a standard probabilistic learning framework. The primary focus in this paper is on obtaining generalization error bounds that depend on the levels of separation, or margins, achieved by the successive linear classifiers. We also improve and extend previously published theoretical bounds on the generalization ability of perceptron decision trees.
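As an illustration of the chopping idea, here is a minimal sketch (ours, not the paper's algorithm) for the special subclass of multilevel threshold functions, i.e., decision lists over thresholds on the real line: repeatedly cut off the largest-valued run of equally labeled points and record a (threshold, label) rule.

    import numpy as np

    def fit_chop_1d(x, y):
        """Greedy 1-D 'chopping': repeatedly cut off the largest-valued run
        of equally labeled points and record (threshold, label) rules."""
        order = np.argsort(-x)          # largest x first
        x, y = x[order], y[order]
        rules, i = [], 0
        while i < len(x):
            lab, j = y[i], i
            while j < len(x) and y[j] == lab:
                j += 1
            # threshold sits between the chopped run and the remaining points
            t = (x[j - 1] + x[j]) / 2 if j < len(x) else -np.inf
            rules.append((t, lab))
            i = j
        return rules

    def predict_chop_1d(rules, v):
        for t, lab in rules:
            if v > t:               # first rule whose halfline contains v
                return lab
        return rules[-1][1]

    x = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.55])
    y = np.array([0, 0, 0, 1, 1, 1])
    rules = fit_chop_1d(x, y)
    print([predict_chop_1d(rules, v) for v in (0.2, 0.6, 0.95)])  # -> [0, 1, 1]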

23 citations


Cites background, methods, or results from "Learnability and the Vapnik-Chervon..."

  • ...The chopping procedure described above suggests that the use of threshold decision lists is fairly natural, if an iterative approach is to be taken to pattern classification....

    [...]

  • ...Following a form of the PAC model of computational learning theory (see Anthony and Biggs, 1992; Vapnik, 1998; Blumer et al., 1989), we assume that labeled data points (x,b) (where x ∈ Rn and b ∈ {0,1}) have been generated randomly (perhaps from some larger corpus of data) according to a fixed…...

    [...]

  • ...For similar results, see Vapnik and Chervonenkis (1971); Blumer et al. (1989); and Anthony and Bartlett (1999). Then, for m ≥ 8/ε, P^m(Q) ≤ 2P^{2m}(T)....

    [...]

  • ...The key probability results we employ are the following bounds, due respectively to Vapnik and Chervonenkis (1971) and Blumer et al. (1989) (see also Anthony and Bartlett, 1999): for any ε ∈ (0,1), P^m({s ∈ Z^m : there exists f ∈ H, er_P(f) ≥ er_s(f) + ε}) < 4Π_H(2m)e^{-mε²/8}, and, for m ≥ 8/ε, P^m({s…
    (These bounds are evaluated numerically in the sketch after this list.)

    [...]

  • ...Lower bounds on the VC-dimension would provide worst-case lower bounds on generalization error (see Ehrenfeucht et al., 1989; Anthony and Biggs, 1992; Anthony and Bartlett, 1999; Blumer et al., 1989)....

    [...]
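The first bound quoted above is easy to evaluate numerically. The sketch below (ours; it assumes the growth function is controlled via Sauer's lemma, Π_H(m) ≤ (em/d)^d for m ≥ d, where d is the VC dimension) searches for the smallest sample size m at which the bound drops below a target confidence δ.

    import math

    def growth(m, d):
        # Sauer's lemma: Pi_H(m) <= (e*m/d)^d for m >= d, trivially <= 2^m
        return 2.0 ** m if m < d else (math.e * m / d) ** d

    def bound(m, d, eps):
        # the quoted VC bound: 4 * Pi_H(2m) * exp(-m * eps^2 / 8)
        return 4 * growth(2 * m, d) * math.exp(-m * eps ** 2 / 8)

    def sample_size(d, eps, delta):
        """Smallest m (doubling search + bisection) with bound(m) <= delta."""
        hi = 1
        while bound(hi, d, eps) > delta:
            hi *= 2
        lo = hi // 2 + 1 if hi > 1 else 1
        while lo < hi:
            mid = (lo + hi) // 2
            if bound(mid, d, eps) <= delta:
                hi = mid
            else:
                lo = mid + 1
        return lo

    # prints a sample size on the order of 10^5 for these parameters
    print(sample_size(d=10, eps=0.1, delta=0.05))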

Proceedings ArticleDOI
05 Jul 1995
TL;DR: The power of teaching is studied through two on-line learning models, teacher-directed learning and self-directed learning; in both, the learner tries to identify an unknown concept based on examples of the concept presented one at a time.
Abstract: We explore the power of teaching by studying two on-line learning models: teacher-directed learning and self-directed learning. In both models, the learner tries to identify an unknown concept based on examples of the concept presented one at a time. The learner predicts whether each example is positive or negative with immediate feedback, and the objective is to minimize the number of prediction mistakes. The examples are selected by the teacher in teacher-directed learning and by the learner itself in self-directed learning. Roughly, teacher-directed learning represents the scenario in which a teacher teaches a class of learners, and self-directed learning represents the scenario in which a smart learner asks questions and learns by itself. For all previously studied concept classes, the minimum number of mistakes in teacher-directed learning is always larger than that in self-directed learning. This raises an interesting question of whether teaching is helpful for all learners, including the smart learner. Assuming the existence of one-way functions, we construct concept classes for which the minimum number of mistakes is linear in teacher-directed learning but superpolynomial in self-directed learning, demonstrating the power of a helpful teacher in the learning process.
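Both models in this abstract share the same mistake-bound protocol; only the choice of presentation order differs. A minimal sketch of that protocol (ours; the learner and target below are toy stand-ins):

    def run_protocol(sequence, target, learner):
        """Present examples one at a time; count prediction mistakes."""
        mistakes = 0
        for x in sequence:
            if learner.predict(x) != target(x):
                mistakes += 1
            learner.update(x, target(x))   # immediate feedback
        return mistakes

    class MemorizeLearner:
        """Predicts 'negative' until told otherwise; remembers seen labels."""
        def __init__(self):
            self.seen = {}
        def predict(self, x):
            return self.seen.get(x, 0)
        def update(self, x, label):
            self.seen[x] = label

    # Target: singleton concept {3} over domain {0,...,4}. In the
    # teacher-directed model the teacher picks the order; in the
    # self-directed model the learner would pick it.
    target = lambda x: 1 if x == 3 else 0
    teacher_order = [3, 0, 1, 2, 4]   # helpful teacher shows the positive first
    print(run_protocol(teacher_order, target, MemorizeLearner()))  # -> 1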

23 citations

Journal ArticleDOI
TL;DR: It is shown how to select, in a provably good fashion, the smallest set of points T ⊆ S such that the multiquadric interpolant of T is within δ of f over S.
Abstract: Multiquadric interpolation is a technique for interpolating nonuniform samples of multivariate functions, in order to enable a variety of operations such as data visualization. We are interested in computing sparse but approximate interpolants, i.e., approximate interpolants with few coefficients. Such interpolants are useful since (1) the cost of evaluating the interpolant scales directly with the number of nonzero coefficients, and (2) the principle of Occam's Razor suggests that the interpolant with fewer coefficients better approximates the underlying function. Since the number of coefficients in a multiquadric interpolant is, as is to be expected, equal to the number of data points in the given set, the problem can be abstracted thus: given a set S of samples of a function f : R^k → R and an error tolerance δ, find the smallest set of points T ⊆ S such that the multiquadric interpolant of T is within δ of f over S. Using some recent results on sparse solutions of linear systems, we show how T may be selected in a provably good fashion.
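The paper's selection procedure builds on sparse-solution results for linear systems; as a simpler stand-in, the sketch below greedily grows T by repeatedly adding the sample with the largest residual until the multiquadric interpolant of T, with basis φ(r) = sqrt(r² + c²), is within δ of f on all of S. The greedy rule and all names here are our illustrative assumptions, not the paper's method.

    import numpy as np

    def multiquadric(X, centers, c=1.0):
        """Matrix of multiquadric basis values phi(|x - center|)."""
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return np.sqrt(d ** 2 + c ** 2)

    def greedy_sparse_interpolant(X, f, delta, c=1.0):
        """Grow T by largest residual until max error over S is <= delta."""
        T = [int(np.argmax(np.abs(f)))]      # start from the largest sample
        while True:
            A = multiquadric(X[T], X[T], c)  # k-by-k interpolation system
            coef = np.linalg.solve(A, f[T])
            pred = multiquadric(X, X[T], c) @ coef
            resid = np.abs(pred - f)
            worst = int(np.argmax(resid))
            if resid[worst] <= delta:
                return np.array(T), coef
            T.append(worst)

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    f = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])
    T, coef = greedy_sparse_interpolant(X, f, delta=1e-2)
    print(len(T), "of", len(X), "centers kept")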

23 citations

Journal ArticleDOI
TL;DR: It is shown that the sample size for reliable learning can be bounded above by a quantity independent of the number of outputs of the network.
Abstract: This paper applies the theory of probably approximately correct (PAC) learning to multiple-output feedforward threshold networks. It is shown that the sample size for reliable learning can be bounded above by a quantity independent of the number of outputs of the network.

23 citations

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column providing a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations