Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are provided.
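
The combinatorial parameter in question can be made concrete with a small brute-force check: a concept class shatters a finite point set if every labelling of the points is cut out by some concept, and the VC dimension is the size of the largest shattered set. The Python sketch below is an illustration only (not code from the paper); it checks shattering for axis-parallel rectangles in the plane, confirming that a 4-point "diamond" is shattered while the same points plus their centre are not.

    from itertools import product

    def shattered_by_rectangles(points):
        """Return True if axis-parallel rectangles realize every labelling of `points`."""
        for labels in product([False, True], repeat=len(points)):
            pos = [p for p, keep in zip(points, labels) if keep]
            neg = [p for p, keep in zip(points, labels) if not keep]
            if not pos:
                continue  # the all-negative labelling is realized by a rectangle placed away from the points
            # The bounding box of the positives is the minimal consistent rectangle;
            # if even it contains a negative point, no axis-parallel rectangle separates.
            x_lo, x_hi = min(x for x, _ in pos), max(x for x, _ in pos)
            y_lo, y_hi = min(y for _, y in pos), max(y for _, y in pos)
            if any(x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in neg):
                return False
        return True

    diamond = [(0, 1), (1, 0), (2, 1), (1, 2)]
    print(shattered_by_rectangles(diamond))             # True: the VC dimension of rectangles is at least 4
    print(shattered_by_rectangles(diamond + [(1, 1)]))  # False: this 5-point set is not shattered

The second check only rules out this particular 5-point set; the classical argument that no 5-point set is shattered (so the VC dimension of axis-parallel rectangles is exactly 4) labels the four extremal points positive and the remaining point negative.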


Citations
01 Jan 1994
TL;DR: This work examines the complexity of testing different program constructs by defining a measure of testing complexity known as VCP-dimension, which is similar to the Vapnik-Chervonenkis dimension, and applying it to classes of programs, where all programs in a class share the same syntactic structure.
Abstract: We examine the complexity of testing different program constructs. We do this by defining a measure of testing complexity known as VCP-dimension, which is similar to the Vapnik-Chervonenkis dimension, and applying it to classes of programs, where all programs in a class share the same syntactic structure. VCP-dimension gives bounds on the number of test points needed to determine that a program is approximately correct, so by studying it for a class of programs we gain insight into the difficulty of testing the program construct represented by the class. We investigate the VCP-dimension of straight line code, if-then-else statements, and for loops. We also compare the VCP-dimension of nested and sequential if-then-else statements as well as that of two types of for loops with embedded if-then-else statements. Finally, we perform an empirical study to estimate the expected complexity of straight line code.
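
One way to see how a VC-style parameter bounds the number of test points is to count the distinct pass/fail patterns a program class can produce on a test set; if the count is far below 2^m for m test points, the test set already pins the class down. The sketch below is a hypothetical illustration of that growth-function idea, not the paper's definition of VCP-dimension; the straight-line program class x -> a*x + b and the reference program are made up for the example.

    # Hypothetical class of straight-line programs: x -> a*x + b with small integer coefficients.
    PROGRAMS = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]

    def reference(x):
        return x  # the (unknown) correct program, chosen arbitrarily for this illustration

    def error_pattern(prog, test_points):
        a, b = prog
        return tuple((a * x + b) != reference(x) for x in test_points)

    def num_patterns(test_points):
        """Growth-function-style count: distinct pass/fail patterns over the program class."""
        return len({error_pattern(p, test_points) for p in PROGRAMS})

    for pts in [(0,), (0, 1), (0, 1, 2)]:
        print(len(pts), "test points:", num_patterns(pts), "of", 2 ** len(pts), "possible patterns")

With three test points this toy class realizes only 5 of the 8 possible pass/fail patterns, so even a tiny test set cannot be shattered by it; that kind of gap is what a VC-style dimension quantifies when bounding how many test points are needed.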

9 citations

Journal ArticleDOI
TL;DR: The first proper-learning algorithm for constant-dimensional decision trees is given, along with the first negative results on proper learning from membership and equivalence queries for many classes.
Abstract: We study the proper learnability of axis-parallel concept classes in the PAC-learning and exact-learning models. These classes include unions of boxes, DNF, decision trees and multivariate polynomials. For constant-dimensional axis-parallel concept classes C we show that the following problems have time complexities that are within a polynomial factor of each other: (1) C is α-properly exactly learnable (with hypotheses of size at most α times the target size) from membership and equivalence queries; (2) C is α-properly PAC learnable (without membership queries) under any product distribution; (3) there is an α-approximation algorithm for the MINEQUIC problem (given a g ∈ C, find a minimal-size f ∈ C that is logically equivalent to g). In particular, if one of them has polynomial time complexity, they all do. Using this, we give the first proper-learning algorithm for constant-dimensional decision trees and the first negative results on proper learning from membership and equivalence queries for many classes. For axis-parallel concepts over a nonconstant dimension we show that, with the equivalence oracle, (1) ⇒ (3). We use this to show that (binary) decision trees are not properly learnable in polynomial time (assuming P ≠ NP) and that DNF is not s^ε-properly learnable (ε < 1) in polynomial time even with an NP-oracle (assuming Σ_2^P ≠ P^NP).
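
For intuition about the axis-parallel setting, the textbook proper PAC learner for a single axis-parallel box simply outputs the tightest box around the positively labelled examples; it is consistent with the sample and the class has finite VC dimension, so its error shrinks as the sample grows. The sketch below is that standard construction, not one of this paper's algorithms; the target box and product distribution are made up for the example.

    import numpy as np

    def tightest_box(samples):
        """Proper hypothesis: the smallest axis-parallel box containing all positive examples."""
        positives = np.array([x for x, label in samples if label])
        if len(positives) == 0:
            return None  # empty concept when no positives are seen
        return positives.min(axis=0), positives.max(axis=0)

    def predict(box, x):
        if box is None:
            return False
        lo, hi = box
        return bool(np.all(lo <= x) and np.all(x <= hi))

    def target(p):
        return bool(np.all((0.0 <= p) & (p <= 1.0)))   # hypothetical target box [0,1]^2

    rng = np.random.default_rng(1)
    points = rng.uniform(-1.0, 2.0, size=(200, 2))      # samples from a product distribution
    box = tightest_box([(p, target(p)) for p in points])
    print(box)  # corners close to (0, 0) and (1, 1); the gap narrows as the sample grows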

9 citations

Posted Content
TL;DR: A new de-randomized PAC-Bayes margin bound is presented for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets; the bound depends on a trade-off between the $L_2$-norm of the weights and the effective curvature of the predictor, avoids any dependency on the Lipschitz constant, and yields meaningful bounds as the training set size increases.
Abstract: In spite of several notable efforts, explaining the generalization of deterministic non-smooth deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such deep nets, but such bounds are quite large and may even increase with the training set size, yielding vacuous generalization bounds. In this paper, we present a new family of de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. Unlike PAC-Bayes, which applies to Bayesian predictors, the de-randomized bounds apply to deterministic predictors like ReLU-nets. A specific instantiation of the bound depends on a trade-off between the (weighted) distance of the trained weights from the initialization and the effective curvature (`flatness') of the trained predictor. To get to these bounds, we first develop a de-randomization argument for non-convex but smooth predictors, e.g., linear deep networks (LDNs), which connects the performance of the deterministic predictor with a Bayesian predictor. We then consider non-smooth predictors, which for any given input are realized as smooth predictors, e.g., a ReLU-net becomes some LDN for a given input, but the realized smooth predictor can be different for different inputs. For such non-smooth predictors, we introduce a new PAC-Bayes analysis that takes advantage of the smoothness of the realized predictor, e.g., an LDN, for a given input, and avoids dependency on the Lipschitz constant of the non-smooth predictor. After careful de-randomization, we get a bound for the deterministic non-smooth predictor. We also establish non-uniform sample complexity results based on such bounds. Finally, we present extensive empirical results of our bounds over changing training set size and randomness in labels.
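
The step where a non-smooth net is "realized as a smooth predictor for a given input" has a simple concrete reading for ReLU networks: once an input is fixed, each ReLU either passes or blocks its pre-activation, so the network computes a product of weight matrices and 0/1 diagonal masks, i.e., a linear deep network at that input. The numpy sketch below illustrates this; the toy weights and sizes are made up, and this is not the paper's code.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy 2-hidden-layer ReLU net with arbitrary random weights (illustration only).
    W1, W2, W3 = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(1, 8))

    def relu_net(x):
        h1 = np.maximum(W1 @ x, 0.0)
        h2 = np.maximum(W2 @ h1, 0.0)
        return W3 @ h2

    def realized_linear_map(x):
        """Freeze the ReLU on/off pattern at x into diagonal 0/1 masks: the network
        then acts as a product of matrices (a linear deep network) at that input."""
        d1 = np.diag((W1 @ x > 0).astype(float))
        d2 = np.diag((W2 @ (d1 @ (W1 @ x)) > 0).astype(float))
        return W3 @ d2 @ W2 @ d1 @ W1   # depends on x only through the masks

    x = rng.normal(size=4)
    print(np.allclose(relu_net(x), realized_linear_map(x) @ x))  # True at this input

Different inputs can switch different ReLUs on or off, so the realized linear network varies with the input; this is why the analysis above works with per-input smooth realizations rather than a single smooth surrogate.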

9 citations


Cites background from "Learnability and the Vapnik-Chervonenkis dimension"

  • ... the bound in Theorem 4 is that the result holds with probability (1 − δ) for any y, but the actual bound is different for different y, i.e., Theorem 4 is a non-uniform bound [Benedek and Itai, 1994, Blumer et al., 1989, Shalev-Shwartz and Ben-David, 2014]. At a high level, recall that uniform bounds take the form: with probability at least (1 − δ), for any predictor y in a hypothesis class H, i.e., y ∈ H, we have ℓ0( ...

    [...]
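
Schematically, the uniform/non-uniform distinction in the excerpt above can be written as follows; this is a generic textbook form, not the exact statement of Theorem 4 in the citing paper, and the symbols (ℓ0 for the population loss, ℓ̂_n for the empirical loss on n samples, ε for the slack) follow the excerpt's notation only loosely.

    % Uniform: one probability-(1 - \delta) event, one slack term for the whole class.
    \Pr\Big[\ \forall\, y \in \mathcal{H}:\ \ell_0(y) \le \hat{\ell}_n(y) + \epsilon(\mathcal{H}, n, \delta)\ \Big] \ge 1 - \delta

    % Non-uniform: still holds for all y simultaneously, but the slack may depend on y.
    \Pr\Big[\ \forall\, y \in \mathcal{H}:\ \ell_0(y) \le \hat{\ell}_n(y) + \epsilon(y, n, \delta)\ \Big] \ge 1 - \delta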

Journal ArticleDOI
TL;DR: Inspired by the way Quantified Boolean Formulas extend SAT formulas to model problems beyond NP, an extension of ASP is proposed that introduces quantifiers over stable models of programs; the new language is named ASP with Quantifiers (ASP(Q)).
Abstract: Answer Set Programming (ASP) is a logic programming paradigm featuring a purely declarative language with comparatively high modeling capabilities. Indeed, ASP can model problems in NP in a compact and elegant way. However, modeling problems beyond NP with ASP is known to be complicated, on the one hand, and limited to problems in $\Sigma_2^P$ on the other. Inspired by the way Quantified Boolean Formulas extend SAT formulas to model problems beyond NP, we propose an extension of ASP that introduces quantifiers over stable models of programs. We name the new language ASP with Quantifiers (ASP(Q)). In the paper we identify computational properties of ASP(Q); we highlight its modeling capabilities by reporting natural encodings of several complex problems with applications in artificial intelligence and number theory; and we compare ASP(Q) with related languages. Arguably, ASP(Q) allows one to model problems in the Polynomial Hierarchy in a direct way, providing an elegant expansion of ASP beyond the class NP.

9 citations


Cites background from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...The VC dimension is a measure of the capacity of a space of functions that can be learned by a statistical classification algorithm (Blumer et al. 1989)....

    [...]

Proceedings ArticleDOI
08 Apr 2019
TL;DR: A sampling procedure requiring only linear pre-processing time O(|X|) is developed, and it is proved that the sampling procedure can estimate the correlation loss of all clusterings in F using only a small number of labelled examples.
Abstract: We view data de-duplication as a clustering problem. Recently, [1] introduced a framework called restricted correlation clustering (RCC) to model de-duplication problems. Given a set X, an unknown target clustering C* of X and a class F of clusterings of X, the goal is to find a clustering C from the set F which minimizes the correlation loss. The clustering algorithm is allowed to interact with a domain expert by asking whether a pair of records correspond to the same entity or not. The main drawback of the algorithm developed by [1] is that its pre-processing step has a time complexity of Θ(|X|^2) (where X is the input set). In this paper, we make the following contributions. We develop a sampling procedure (based on locality-sensitive hashing) which requires only linear pre-processing time O(|X|). We prove that our sampling procedure can estimate the correlation loss of all clusterings in F using only a small number of labelled examples. In fact, the number of labelled examples is independent of |X| and depends only on the complexity of the class F. Further, we show that to sample one pair, with high probability our procedure makes a constant number of queries to the domain expert. We then perform an extensive empirical evaluation of our approach which shows the efficiency of our method.
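
The linear pre-processing claim rests on the usual locality-sensitive-hashing pattern: one pass hashes every record into a few buckets, and candidate same-entity pairs are then drawn from within buckets instead of scanning all |X|^2 pairs. The sketch below is a minimal illustration of that pattern with a made-up banding hash; it is not the authors' sampling procedure and it ignores the careful weighting needed for unbiased loss estimates.

    import random
    from collections import defaultdict

    def band_buckets(records, num_bands=2):
        """One O(|X|) pass: hash each record into a few bands; near-duplicates tend to collide."""
        buckets = defaultdict(list)
        for idx, text in enumerate(records):
            tokens = sorted(set(text.lower().split()))
            for band in range(num_bands):
                key = (band, hash(tuple(tokens[band::num_bands])))  # crude per-band signature
                buckets[key].append(idx)
        return buckets

    def sample_candidate_pair(buckets, rng=random):
        """Draw a pair from a random bucket with at least two members."""
        crowded = [members for members in buckets.values() if len(members) >= 2]
        if not crowded:
            return None
        return tuple(rng.sample(rng.choice(crowded), 2))

    records = ["john a smith", "jon a smith", "mary jones", "mary  jones", "alice brown"]
    print(sample_candidate_pair(band_buckets(records)))  # likely an index pair for the same entity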

9 citations

References
Book
01 Jan 1979
TL;DR: This is the second edition of a quarterly column whose purpose is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations