Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
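The "finiteness of the VC dimension" condition comes with a quantitative sample-size guarantee. A commonly quoted form of the bound is given below as an indicative paraphrase; the exact constants differ across presentations.

```latex
% d = VC dimension of the concept class C. Any hypothesis from C that is
% consistent with m i.i.d. examples has error at most \epsilon with
% probability at least 1-\delta, for every distribution, once
m \;=\; O\!\left(\frac{1}{\epsilon}\left(d\,\log\frac{1}{\epsilon} \;+\; \log\frac{1}{\delta}\right)\right),
% while \Omega(d/\epsilon) examples are necessary, so finite d is both
% sufficient and necessary for distribution-free learnability.
```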


Citations
Dissertation
01 Jan 1989
TL;DR: This thesis begins by presenting a new learning algorithm for a particular problem within the pac learning model: learning submodules of the free Z-module Z^k, and proves that this algorithm achieves probable approximate correctness, and indeed, that it is within a log log factor of optimal in a related, but more stringent model of learning, on-line mistake-bounded learning.
Abstract: In the past several years, there has been a surge of interest in computational learning theory: the formal (as opposed to empirical) study of learning algorithms. One major cause for this interest was the model of probably approximately correct learning, or pac learning, introduced by Valiant in 1984. This thesis begins by presenting a new learning algorithm for a particular problem within that model: learning submodules of the free Z-module Z^k. We prove that this algorithm achieves probable approximate correctness, and indeed, that it is within a log log factor of optimal in a related, but more stringent model of learning, on-line mistake-bounded learning. We then proceed to examine the influence of noisy data on pac learning algorithms in general. Previously it has been shown that it is possible to tolerate large amounts of random classification noise, but only a very small amount of a very malicious sort of noise. We show that similar results can be obtained for models of noise in between the previously studied models: a large amount of malicious classification noise can be tolerated, but only a small amount of random attribute noise. Next, we overcome a major limitation of the pac learning model by introducing a variant model with a more powerful teacher. We show how to learn any concept representable as a boolean function, with the help of a teacher who breaks the concept into subconcepts and teaches one subconcept per lesson. The learner outputs not the unknown boolean circuit, but rather a program which, on a given input, either produces the same answer as the unknown boolean circuit would, or else says "I don't know." Thus, unlike many learning programs, the output of this learning procedure is reliable. Furthermore, with high probability the output program is nearly always useful in that it says "I don't know" on only a small fraction of the domain. Finally, we look at a new model for an older learning problem, inductive inference. This new model combines certain features of the traditional model of Gold for inductive inference together with the concern of the Valiant model for efficient computation and also with notions of Bayesianism. The result is a model that captures certain qualitative aspects of the classic scientific method.
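The on-line mistake-bound model mentioned above scores a learner by the number of prediction errors it makes over an arbitrary sequence of examples. As a generic illustration of that model (not the thesis's submodule-learning algorithm), the sketch below implements the classical halving algorithm, which makes at most log2|C| mistakes on any sequence labeled consistently with some concept in a finite class C; all names here are illustrative assumptions.

```python
def halving_learner(concepts, stream):
    """Generic mistake-bounded learner for a finite concept class.

    concepts: list of hypotheses, each a callable x -> bool
    stream:   iterable of (x, label) pairs revealed one at a time
    Returns the number of mistakes made; if some concept labels the whole
    stream correctly, this is at most log2(len(concepts)).
    """
    version_space = list(concepts)
    mistakes = 0
    for x, label in stream:
        # Predict by majority vote of the hypotheses still consistent so far.
        votes_true = sum(1 for c in version_space if c(x))
        prediction = 2 * votes_true >= len(version_space)
        if prediction != label:
            mistakes += 1
        # Discard every hypothesis that disagrees with the revealed label.
        version_space = [c for c in version_space if c(x) == label]
    return mistakes
```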

12 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...[10] showed that the sample complexity of C is bounded by...

    [...]

01 Jan 1994
TL;DR: Two types of algorithms for selecting relevant examples, developed in the context of computational learning theory, are discussed; some of their proven properties are highlighted and possible future implications are suggested.
Abstract: We discuss two types of algorithms for selecting relevant examples that have been developed in the context of computational learning theory. The examples are selected out of a stream of examples that are generated independently at random. The first type comprises the so-called "boosting" algorithms of Schapire [Schapire, 1990] and Freund [Freund, 1990]; the second is the Query-by-Committee algorithm of Seung [Seung et al., 1992]. We describe the algorithms and some of their proven properties, point to some of their commonalities, and suggest some possible future implications.
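To make the Query-by-Committee idea concrete: a small committee of hypotheses consistent with the labels seen so far is drawn, and a label is requested only when the committee members disagree, so queries concentrate on informative examples. The sketch below is a simplified finite-hypothesis-pool illustration, not the Gibbs-sampling formulation of Seung et al.; the function and parameter names are assumptions.

```python
import random

def query_by_committee(hypotheses, unlabeled, oracle, committee_size=2):
    """Simplified Query-by-Committee selective sampling.

    hypotheses: finite pool of candidate functions x -> bool
    unlabeled:  iterable of unlabeled examples, in arrival order
    oracle:     function x -> bool returning the true label (queried sparingly)
    Returns the list of (x, label) pairs that were actually queried.
    """
    queried = []
    for x in unlabeled:
        # Version space: hypotheses consistent with every label queried so far.
        consistent = [h for h in hypotheses
                      if all(h(xq) == yq for xq, yq in queried)]
        if len(consistent) <= 1:
            break  # nothing left to disambiguate
        committee = random.sample(consistent, min(committee_size, len(consistent)))
        if len({h(x) for h in committee}) > 1:   # committee disagrees
            queried.append((x, oracle(x)))
    return queried
```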

12 citations

Book ChapterDOI
01 Jan 2015
TL;DR: It is shown that maximum classes can be characterised by a local-connectivity property of the graph obtained by viewing the class as a cubical complex, and a negative embedding result is proved, exhibiting classes of VC dimension d that cannot be embedded in any maximum class of VC dimension below 2d.
Abstract: One of the earliest conjectures in computational learning theory—the Sample Compression conjecture—asserts that concept classes (equivalently set systems) admit compression schemes of size linear in their VC dimension.
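For context, a sample compression scheme of size k can be stated as follows (a standard paraphrase of the definition, not text from the chapter):

```latex
% A compression scheme of size k for a concept class C is a pair of maps
% (\kappa, \rho) such that, for every finite sample S labeled consistently
% with some c \in C:
\kappa(S) \subseteq S, \qquad |\kappa(S)| \le k, \qquad
\rho(\kappa(S)) \ \text{is consistent with all of } S.
% The Sample Compression conjecture asserts that k = O(\mathrm{VCdim}(C))
% suffices for every class of finite VC dimension.
```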

12 citations


Cites background from "Learnability and the Vapnik-Chervon..."

  • ...The lemma has found many applications in such diverse fields as computational learning theory and empirical process theory [5, 31, 3, 12, 2, 7, 30], coding theory [11], computational geometry [15, 22, 6, 16], road network routing [1], and automatic verification [6]; in the former it is the avenue through which the VC dimension enters into generalisation error bounds and the theoretical foundations of learnability....

    [...]
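The lemma referred to in the excerpt above is the Sauer-Shelah lemma, which is also the route by which the VC dimension enters generalisation bounds; its standard statement is:

```latex
% If C has VC dimension d, then the number of distinct labelings C induces
% on any m points (the growth function) satisfies
\Pi_C(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{e\,m}{d}\right)^{d} \quad (m \ge d),
% i.e. it grows only polynomially in m rather than as 2^m.
```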

Dissertation
01 Jan 2006
TL;DR: This dissertation proposes a novel rectangle-based and graph-based rule learning approach that finds rule sets with small cardinality and considers Nearest Rectangle learning to explore the data classification capacity of generalized rectangles.
Abstract: The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-based discriminative data generalization in the context of several useful data mining applications: cluster description, rule learning, and Nearest Rectangle classification. Clustering is one of the most important data mining tasks. However, most clustering methods output sets of points as clusters and do not generalize them into interpretable patterns. We perform a systematic study of cluster description, where we propose novel description formats leading to enhanced expressive power and introduce novel description problems specifying different trade-offs between interpretability and accuracy. We also present efficient heuristic algorithms for the introduced problems in the proposed formats. If-then rules are known to be the most expressive and human-comprehensible representation of knowledge. Rectangles are essentially a special type of rules with all the attributional conditions specified whereas normal rules appear more compact. Decision rules can be used for both data classification and data description depending on whether the focus is on future data or existing data. For either scenario, smaller rule sets are desirable. We propose a novel rectangle-based and graph-based rule learning approach that finds rule sets with small cardinality. We also consider Nearest Rectangle learning to explore the data classification capacity of generalized rectangles. We show that by enforcing the so-called "right of inference", Nearest Rectangle learning can potentially become an interpretable hybrid inductive learning method with competitive accuracy. Keywords. discriminative generalization; hyper-rectangle; cluster description; Minimum Rule Set; Minimum Consistent Subset Cover; Nearest Rectangle learning.
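As a concrete illustration of Nearest Rectangle classification (an illustrative baseline, not the dissertation's rule-learning algorithm; all names are assumptions): each class is summarised by labeled axis-parallel hyper-rectangles, and a query point receives the label of the rectangle at the smallest axis-parallel distance, with distance zero when the point lies inside a rectangle.

```python
def rectangle_distance(point, low, high):
    """Euclidean distance from a point to an axis-parallel hyper-rectangle
    defined by per-dimension lower/upper bounds; 0.0 if the point is inside."""
    gap_sq = 0.0
    for p, lo, hi in zip(point, low, high):
        if p < lo:
            gap_sq += (lo - p) ** 2
        elif p > hi:
            gap_sq += (p - hi) ** 2
    return gap_sq ** 0.5

def nearest_rectangle_label(point, rectangles):
    """rectangles: list of (low, high, label) tuples; returns the label of
    the rectangle nearest to `point` (ties broken by list order)."""
    low, high, label = min(rectangles,
                           key=lambda r: rectangle_distance(point, r[0], r[1]))
    return label
```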

12 citations

Journal ArticleDOI
TL;DR: A simple proof is given that the perceptron learning algorithm for finding a linearly separable boolean function consistent with a sample of such a function is not efficient.
Abstract: The perceptron learning algorithm yields quite naturally an algorithm for finding a linearly separable boolean function consistent with a sample of such a function. Using the idea of a specifying sample, we give a simple proof that this algorithm is not efficient, in general.
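For context, the perceptron-based consistent-hypothesis-finder cycles through the sample and adjusts the weight vector whenever an example is misclassified; on a linearly separable sample it eventually halts with a consistent linear threshold function, but, as the paper shows, the number of updates can be super-polynomial in the number of variables. The sketch below is a minimal illustration under that separability assumption, not the paper's construction.

```python
def perceptron_consistent(sample, learning_rate=1.0, max_updates=10**6):
    """Seek (w, b) with sign(w.x + b) matching every label in `sample`.

    sample: list of (x, y) pairs, x a tuple of numbers (e.g. +1/-1 encodings
    of boolean inputs) and y in {+1, -1}. Assumes linear separability; the
    update cap only keeps the sketch safe, since the cited result is that
    the number of updates required can be enormous.
    """
    n = len(sample[0][0])
    w, b = [0.0] * n, 0.0
    updates = 0
    while updates < max_updates:
        mistakes = 0
        for x, y in sample:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified (or on the boundary): move the hyperplane.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
                mistakes += 1
                updates += 1
        if mistakes == 0:
            return w, b          # consistent with the entire sample
    return w, b                  # update budget exhausted
```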

12 citations


Cites background or methods from "Learnability and the Vapnik-Chervon..."

  • ...We remark that there is a polynomial time consistent-hypothesis-finder for BPn: rephrase the problem as a linear programme and use Karmarkar's algorithm (see [3])....

    [...]

  • ...Thus, the perceptron algorithm (for any learning constant v) can be used as a consistent-hypothesis-finder (using terminology from [3])....

    [...]

  • ...As indicated in [3], given t ∈ BPn and a sample x = (x1, x2,....

    [...]
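The linear-programming reformulation mentioned in the first excerpt above can be sketched as a feasibility problem: each labeled example contributes one linear inequality on (w, b). The sketch below uses SciPy's HiGHS solver rather than Karmarkar's algorithm and is an assumed illustration of the reformulation, not the construction in [3].

```python
import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(sample):
    """Find (w, b) with y * (w.x + b) >= 1 for every (x, y) in `sample`,
    y in {+1, -1}, by solving a feasibility linear program.

    Returns (w, b), or None if the LP is infeasible (sample not separable).
    """
    X = np.array([x for x, _ in sample], dtype=float)
    y = np.array([label for _, label in sample], dtype=float)
    m, n = X.shape
    c = np.zeros(n + 1)                       # feasibility only: zero objective
    # y*(w.x + b) >= 1  rewritten as  -y*[x, 1] . [w, b] <= -1
    A_ub = -y[:, None] * np.hstack([X, np.ones((m, 1))])
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1), method="highs")
    if not res.success:
        return None
    return res.x[:n], res.x[n]
```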

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column providing a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1968

17,939 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations