Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
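To make the sample-size consequence of a finite VC dimension concrete, the sketch below (not taken from the paper's text) computes the sufficient sample size in the commonly quoted form of the bound attributed to this paper, m >= max((4/ε)·log2(2/δ), (8d/ε)·log2(13/ε)), for PAC learning a class of VC dimension d to accuracy ε with confidence 1 − δ; the example class at the end (axis-parallel rectangles, VC dimension 4) is purely illustrative.

import math

def pac_sample_size(vc_dim, epsilon, delta):
    # Sufficient sample size in the commonly quoted form of the bound
    # attributed to Blumer et al. (1989):
    #   m >= max((4/eps) * log2(2/delta), (8*d/eps) * log2(13/eps))
    m1 = (4.0 / epsilon) * math.log2(2.0 / delta)
    m2 = (8.0 * vc_dim / epsilon) * math.log2(13.0 / epsilon)
    return math.ceil(max(m1, m2))

# Illustrative example: axis-parallel rectangles in the plane (VC dimension 4).
print(pac_sample_size(4, 0.1, 0.05))  # roughly 2,250 examples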


Citations
Journal ArticleDOI
Ricardo Vilalta, Youssef Drissi
TL;DR: This paper presents the authors' perspective that the goal of meta-learning is to build self-adaptive learners that improve their bias dynamically through experience by accumulating meta-knowledge, and surveys meta-learning as reported in the machine-learning literature.
Abstract: Different researchers hold different views of what the term meta-learning exactly means. The first part of this paper provides our own perspective view in which the goal is to build self-adaptive learners (i.e. learning algorithms that improve their bias dynamically through experience by accumulating meta-knowledge). The second part provides a survey of meta-learning as reported by the machine-learning literature. We find that, despite different views and research lines, a question remains constant: how can we exploit knowledge about learning (i.e. meta-knowledge) to improve the performance of learning algorithms? Clearly the answer to this question is key to the advancement of the field and continues being the subject of intensive research.

1,052 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...The study draws an analogy between the role of the VC dimension (Blumer et al., 1989) and the size of the family of hypothesis spaces |{H}|....


Book
01 Jan 1998
TL;DR: This book gives an overview of recent efforts to deal with dataset shift, a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages, and in particular with covariate shift, the special case where only the input distribution changes.
Abstract: All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. Overview: Dataset shift is a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages. Covariate shift is a simpler particular case of dataset shift where only the input distribution changes (covariate denotes input), while the conditional distribution of the outputs given the inputs p(y|x) remains unchanged. Dataset shift is present in most practical applications for reasons ranging from the bias introduced by experimental design to the mere irreproducibility of the testing conditions at training time. For example, in an image classification task, training data might have been recorded under controlled laboratory conditions, whereas the test data may show different lighting conditions. In other applications, the process that generates the data is itself adaptive. Some of our authors consider the problem of spam email filtering: successful "spammers" will try to build spam in a form that differs from the spam the automatic filter has been built on. Dataset shift seems to have raised relatively little interest in the machine learning community until very recently. Indeed, many machine learning algorithms are based on the assumption that the training data is drawn from exactly the same distribution as the test data on which the model will later be evaluated. Semi-supervised learning and active learning, two problems that seem very similar to covariate shift, have received much more attention. How do they differ from covariate shift? Semi-supervised learning is designed to take advantage of unlabeled data present at training time, but is not conceived to be robust against changes in the input distribution. In fact, one can easily construct examples of covariate shift for which common SSL strategies such as the "cluster assumption" will lead to disaster. In active learning, the algorithm is asked to select from the available unlabeled inputs those for which obtaining the label will be most beneficial for learning. This is very relevant in contexts where labeling data is very costly, but active learning strategies are not specifically designed for dealing with covariate shift. This book attempts to give an overview of the different recent efforts that are being …
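To make the covariate-shift setting described above concrete, here is a minimal sketch (illustrative only, not code from the book) in which the input distribution changes between training and test while p(y|x) stays fixed, and an importance-weighted average of the training losses is used to estimate the test risk; the densities, predictor, and loss are made-up choices for the illustration.

import numpy as np

# Train inputs ~ N(0, 1), test inputs ~ N(1, 1); p(y|x) is the same in both.
rng = np.random.default_rng(0)

def p_train(x):  # training input density N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def p_test(x):   # test input density N(1, 1)
    return np.exp(-0.5 * (x - 1)**2) / np.sqrt(2 * np.pi)

x_tr = rng.normal(0.0, 1.0, 5000)
y_tr = (x_tr > 0.5).astype(float)   # unchanged conditional p(y|x)
loss = (y_tr - 0.0) ** 2            # squared loss of the constant predictor 0

naive_risk    = loss.mean()                    # estimates the training risk
weights       = p_test(x_tr) / p_train(x_tr)   # importance weights p_test/p_train
weighted_risk = (weights * loss).mean()        # estimates the test risk
print(naive_risk, weighted_risk)               # the weighted estimate is notably larger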

1,037 citations


Cites background or methods from "Learnability and the Vapnik-Chervon..."

  • ..., 2005) or the size of the smallest ε-cover (Benedek and Itai, 1991), rather than VC dimension (Blumer et al., 1989; Kearns and Vazirani, 1994)....


  • ...In particular, in the PAC model there is purposefully a complete disconnect between the data distribution D and the target function f being learned (Valiant, 1984; Blumer et al., 1989; Kearns and Vazirani, 1994)....


Journal ArticleDOI
TL;DR: This paper considers the question of determining whether a function f has property P or is ε-far from any function with property P, and devises algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a ρ-clique (a clique of density ρ with respect to the vertex set).
Abstract: In this paper, we consider the question of determining whether a function f has property P or is ε-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation. In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a ρ-clique (a clique of density ρ with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.
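As a rough illustration of the sampling idea behind such testers, the sketch below (not the paper's algorithm; the sample size is an arbitrary illustrative choice rather than the bound analyzed in the paper) queries only the pairs inside a small random vertex sample and accepts iff the induced subgraph is bipartite.

import random

def induced_is_bipartite(vertices, has_edge):
    # 2-color the induced subgraph by depth-first search; fail on an odd cycle.
    color = {}
    for s in vertices:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for v in vertices:
                if v != u and has_edge(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        stack.append(v)
                    elif color[v] == color[u]:
                        return False
    return True

def test_bipartite(n, has_edge, eps, sample_size=None):
    # Property-testing sketch: query only pairs inside a small random sample
    # whose size depends on eps but not on n (illustrative constant).
    if sample_size is None:
        sample_size = min(n, int(20 / eps))
    sample = random.sample(range(n), sample_size)
    return induced_is_bipartite(sample, has_edge)

# Example query oracle: a complete bipartite graph on two halves of the vertices.
n = 1000
edge = lambda u, v: (u < n // 2) != (v < n // 2)
print(test_bipartite(n, edge, eps=0.1))  # True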

1,027 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...On the other hand, the VC-dimension of g_n is 2^(n-1) (since the set {0y : y ∈ {0,1}^(n-1)} is shattered by g_n)....


  • ...1 − δ/2 over the additional sample, f is rejected. In particular, the above proposition implies that if for every n, g_n has polynomial (in n) VC-dimension [Vapnik and Chervonenkis 1971; Blumer et al. 1989], then g has a tester whose sample complexity is polynomial in n, 1/ε, and log(1/δ)....


  • ...Every class that is learnable with constant confidence using a sample of size poly(n/ε) (and thus has a poly(n) VC dimension [Blumer et al. 1989]) is testable with a poly(n/ε) · log(1/δ) sample (in at most exponential time)....


  • ...By Blumer et al. [1989], learning this class requires a sample of size Ω(2^n). The impossibility of learning the function class in Proposition 3.3.1 is due to its exponential VC-dimension (i.e., it is a pure information-theoretic consideration)....

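The shattering argument quoted in the first excerpt above can be checked exhaustively for very small domains. The following generic sketch (not tied to the specific class in the cited paper; the interval class at the end is a made-up example) computes the VC dimension of a finite concept class by brute force.

from itertools import combinations

def shatters(concepts, points):
    # True iff the concepts realize every one of the 2^|points| labelings of points.
    labelings = {frozenset(p for p in points if p in c) for c in concepts}
    return len(labelings) == 2 ** len(points)

def vc_dimension(domain, concepts):
    # Brute-force VC dimension of a finite concept class over a finite domain.
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(concepts, subset) for subset in combinations(domain, k)):
            d = k
        else:
            break
    return d

# Example: discrete intervals [a, b] over a 6-point domain have VC dimension 2.
domain = list(range(6))
intervals = [frozenset(range(a, b + 1)) for a in domain for b in domain if a <= b]
print(vc_dimension(domain, intervals))  # -> 2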

Journal ArticleDOI
TL;DR: In this article, a generalization of the PAC learning model based on statistical decision theory is described, where the learner receives randomly drawn examples, each example consisting of an instance x in X and an outcome y in Y, and tries to find a hypothesis h : X → A, where h is in H, that specifies the appropriate action a in A to take for each instance x, in order to minimize the expectation of a loss l(y,a).
Abstract: We describe a generalization of the PAC learning model that is based on statistical decision theory. In this model the learner receives randomly drawn examples, each example consisting of an instance x in X and an outcome y in Y, and tries to find a hypothesis h : X → A, where h is in H, that specifies the appropriate action a in A to take for each instance x, in order to minimize the expectation of a loss l(y,a). Here X, Y, and A are arbitrary sets, l is a real-valued function, and examples are generated according to an arbitrary joint distribution on X × Y. Special cases include the problem of learning a function from X into Y, the problem of learning the conditional probability distribution on Y given X (regression), and the problem of learning a distribution on X (density estimation). We give theorems on the uniform convergence of empirical loss estimates to true expected loss rates for certain hypothesis spaces H, and show how this implies learnability with bounded sample size, disregarding computational complexity. As an application, we give distribution-independent upper bounds on the sample size needed for learning with feedforward neural networks. Our theorems use a generalized notion of VC dimension that applies to classes of real-valued functions, adapted from Pollard's work, and a notion of *capacity* and *metric dimension* for classes of functions that map into a bounded metric space. (Supersedes 89-30 and 90-52.) [Also in "Information and Computation", Vol. 100, No. 1, September 1992]
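As a toy illustration of this decision-theoretic setup (a sketch under assumed choices of X, Y, A, the loss, and the hypothesis class, not Haussler's construction), the code below fits a threshold rule by empirical loss minimization and compares its empirical loss with an estimate of its expected loss on a fresh sample.

import numpy as np

# Assumed toy setup: X = [0, 1], A = Y = {0, 1}, loss l(y, a) = 1[y != a],
# hypotheses are threshold rules h_t(x) = 1[x >= t].
rng = np.random.default_rng(1)

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (rng.uniform(0, 1, m) < 0.1 + 0.8 * x).astype(int)  # p(y=1|x) = 0.1 + 0.8x
    return x, y

def empirical_loss(t, x, y):
    return np.mean((x >= t).astype(int) != y)

thresholds = np.linspace(0, 1, 101)
x_tr, y_tr = sample(200)
t_hat = min(thresholds, key=lambda t: empirical_loss(t, x_tr, y_tr))

# Uniform convergence in action: the empirical loss of t_hat is close to an
# estimate of its true expected loss computed on a large fresh sample.
x_te, y_te = sample(100000)
print(empirical_loss(t_hat, x_tr, y_tr), empirical_loss(t_hat, x_te, y_te))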

1,025 citations

Journal Article
TL;DR: In this paper, the authors consider the question of determining whether a function f has property P or is ε-far from any function with property P. In some cases, the testing algorithm is also allowed to query f on instances of its choice.
Abstract: In this paper, we consider the question of determining whether a function f has property P or is ε-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation. In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a ρ-clique (a clique of density ρ with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.

870 citations
