Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are provided.
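
The "simple combinatorial parameter" can be made concrete in a few lines. Below is a minimal sketch (our illustration, not code from the paper) that brute-forces the VC dimension of a toy concept class, closed intervals over a small grid; intervals shatter every 2-point set but no 3-point set, so the computed dimension is 2.

```python
from itertools import combinations

def shatters(points, concepts):
    """Check whether the concept class (a list of sets) shatters `points`:
    every subset of `points` must arise as points ∩ c for some concept c."""
    realized = {frozenset(p for p in points if p in c) for c in concepts}
    return len(realized) == 2 ** len(points)

def vc_dimension(domain, concepts):
    """Brute-force VC dimension over a finite domain. Since every subset of
    a shattered set is shattered, we can stop at the first size k for which
    no k-point set is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(s, concepts) for s in combinations(domain, k)):
            d = k
        else:
            break
    return d

# Concept class: closed intervals [a, b] over a small grid.
domain = list(range(8))
intervals = [set(range(a, b + 1)) for a in domain for b in domain if a <= b]
print(vc_dimension(domain, intervals))  # prints 2: any 2 points are
                                        # shattered, but for 3 points x<y<z
                                        # no interval keeps x,z and drops y
```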


Citations
Journal ArticleDOI
TL;DR: It is shown that any class of functions that can be inferred from examples with probability exceeding 1/2 can be inferred deterministically, and that for probabilities p there is a discrete hierarchy of inferability parameterized by p.
Abstract: Inductive inference machines construct programs for total recursive functions given only example values of the functions. Probabilistic inductive inference machines are defined, and for various criteria of successful inference, it is asked whether a probabilistic inductive inference machine can infer larger classes of functions if the inference criterion is relaxed to allow inference with probability at least p (0 < p ≤ 1).

107 citations

Journal ArticleDOI
TL;DR: The DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation, is described; it includes several algorithm implementations and usage examples, and applications have been built on top of the framework.

105 citations


Cites methods from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...Whereas early approaches of applying machine learning techniques to Description Logics focused on the Probably Approximately Correct (PAC) [31] learnability of concept description languages, later several supervised and unsupervised methods arose....

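The general-to-specific refinement search that such concept learners perform over OWL class expressions can be sketched in miniature. The toy Python below is our illustration, not DL-Learner's API; the attribute names and data are hypothetical, and real systems refine description logic expressions rather than flat attribute sets.

```python
def learn_conjunction(positives, negatives, attributes):
    """Greedy top-down refinement: start from the most general concept
    (the empty conjunction, which covers every example) and repeatedly
    add the attribute test that rules out the most negative examples,
    while never losing a positive example."""
    hypothesis = set()  # conjunction of required attributes
    while any(hypothesis <= neg for neg in negatives):  # a negative covered
        candidates = [a for a in attributes
                      if a not in hypothesis
                      and all(a in pos for pos in positives)]
        if not candidates:
            return None  # no consistent conjunction exists in this language
        best = max(candidates,
                   key=lambda a: sum(a not in neg for neg in negatives))
        hypothesis.add(best)
    return hypothesis

# Toy examples, each described as a set of atomic attributes.
pos = [{"parent", "male"}, {"parent", "male", "rich"}]
neg = [{"parent"}, {"male", "rich"}]
print(learn_conjunction(pos, neg, {"parent", "male", "rich"}))
# -> {'parent', 'male'} (set printing order may vary)
```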

Journal ArticleDOI
TL;DR: It is shown that if for any m-point subset Y ⊆ X the number of distinct subsets induced by ℛ on Y is bounded by O(m^d) for a fixed integer d, then there are improved upper bounds on the size of ε-approximations for (X, ℛ).
Abstract: Let (X, ℛ) be a set system on an n-point set X. For a two-coloring on X, its discrepancy is defined as the maximum number by which the occurrences of the two colors differ in any set in ℛ. We show that if for any m-point subset Y ⊆ X the number of distinct subsets induced by ℛ on Y is bounded by O(m^d) for a fixed integer d, then there is a coloring with discrepancy bounded by O(n^{1/2 − 1/(2d)} (log n)^{1 + 1/(2d)}). Also if any subcollection of m sets of ℛ partitions the points into at most O(m^d) classes, then there is a coloring with discrepancy at most O(n^{1/2 − 1/(2d)} log n). These bounds imply improved upper bounds on the size of ε-approximations for (X, ℛ). All the bounds are tight up to polylogarithmic factors in the worst case. Our results allow us to generalize several results of Beck bounding the discrepancy in certain geometric settings to the case when the discrepancy is taken relative to an arbitrary measure.

105 citations
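
The discrepancy notion used above is easy to state in code. A brute-force sketch (ours, exponential in the number of points, for illustration only): intervals on a line have primal shatter function O(m^2), i.e. d = 2 in the notation above, and admit a coloring of discrepancy 1, achieved by simply alternating colors along the line.

```python
from itertools import product

def discrepancy(n_points, sets):
    """Discrepancy of a set system over points 0..n_points-1: the best
    two-coloring chi: X -> {-1, +1} minimizing the worst imbalance
    |sum_{x in S} chi(x)| over all sets S in the system."""
    best = float("inf")
    for chi in product((-1, 1), repeat=n_points):
        worst = max(abs(sum(chi[x] for x in s)) for s in sets)
        best = min(best, worst)
    return best

# All intervals {a, ..., b} over 6 points on a line.
n = 6
intervals = [set(range(a, b + 1)) for a in range(n) for b in range(a, n)]
print(discrepancy(n, intervals))  # 1: alternating colors leaves every
                                  # interval with imbalance at most 1, and
                                  # singleton intervals force at least 1
```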

Proceedings Article
01 Jan 1996
TL;DR: This paper performs experiments suggested by the formal results for Adaboost and C4.5 within the weak learning framework, and argues through experimental results that the theory must be understood in terms of a measure of a boosting algorithm's behavior called its advantage sequence.
Abstract: There has long been a chasm between theoretical models of machine learning and practical machine learning algorithms. For instance, empirically successful algorithms such as C4.5 and backpropagation have not met the criteria of the PAC model and its variants. Conversely, the algorithms suggested by computational learning theory are usually too limited in various ways to find wide application. The theoretical status of decision tree learning algorithms is a case in point: while it has been proven that C4.5 (and all reasonable variants of it) fails to meet the PAC model criteria [2], other recently proposed decision tree algorithms that do have non-trivial performance guarantees unfortunately require membership queries [6, 13]. Two recent developments have narrowed this gap between theory and practice, not for the PAC model, but for the related model known as weak learning or boosting. First, an algorithm called Adaboost was proposed that meets the formal criteria of the boosting model and is also competitive in practice [10]. Second, the basic algorithms underlying the popular C4.5 and CART programs have also very recently been shown to meet the formal criteria of the boosting model [12]. Thus, it seems plausible that the weak learning framework may provide a setting for interaction between formal analysis and machine learning practice that is lacking in other theoretical models. Our aim in this paper is to push this interaction further in light of these recent developments. In particular, we perform experiments suggested by the formal results for Adaboost and C4.5 within the weak learning framework. We concentrate on two particularly intriguing issues. First, the theoretical boosting results for top-down decision tree algorithms such as C4.5 [12] suggest that a new splitting criterion may result in trees that are smaller and more accurate than those obtained using the usual information gain. We confirm this suggestion experimentally. Second, a superficial interpretation of the theoretical results suggests that Adaboost should vastly outperform C4.5. This is not the case in practice, and we argue through experimental results that the theory must be understood in terms of a measure of a boosting algorithm's behavior called its advantage sequence. We compare the advantage sequences for C4.5 and Adaboost in a number of experiments. We find that these sequences have qualitatively different behavior that explains in large part the discrepancies between empirical performance and the theoretical results. Briefly, we find that although C4.5 and Adaboost are both boosting algorithms, Adaboost creates successively "harder" filtered distributions, while C4.5 creates successively "easier" ones, in a sense that will be made precise.

104 citations
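
A minimal AdaBoost implementation makes the "harder filtered distributions" point tangible: the reweighting step concentrates mass on examples the current weak hypothesis gets wrong. This is our sketch with one-dimensional threshold stumps standing in for the weak learner, not the paper's experimental code (which used C4.5).

```python
import math

def adaboost(X, y, rounds):
    """Minimal AdaBoost with decision stumps on 1-D data.
    X: list of floats, y: list of +1/-1 labels."""
    n = len(X)
    w = [1.0 / n] * n                     # distribution over examples
    ensemble = []                         # (alpha, threshold, sign) triples
    for _ in range(rounds):
        # Weak learner: best stump h(x) = sign * (+1 if x > t else -1).
        best = None
        for t in sorted(set(X)):
            for sign in (+1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if sign * (1 if xi > t else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = max(err, 1e-12)             # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Reweight: the "harder" filtered distribution for the next round.
        w = [wi * math.exp(-alpha * yi * sign * (1 if xi > t else -1))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * s * (1 if x > t else -1)
                              for a, t, s in ensemble) > 0 else -1

X = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
y = [-1, -1, 1, -1, 1, 1]       # not separable by any single threshold
h = adaboost(X, y, rounds=5)
print([h(x) for x in X])        # [-1, -1, 1, -1, 1, 1]: the ensemble of
                                # stumps fits the non-threshold labeling
```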

Satyen Kale
01 Jan 2007
TL;DR: A single meta-algorithm is presented which unifies all known applications of the Multiplicative Weights method in the design of efficient algorithms for various optimization problems and derives the following algorithmic applications: fast algorithms for approximately solving several families of semidefinite programs which beat interior point methods.
Abstract: Algorithms based on convex optimization, especially linear and semidefinite programming, are ubiquitous in Computer Science. While there are polynomial time algorithms known to solve such problems, quite often the running time of these algorithms is very high. Designing simpler and more efficient algorithms is important for practical impact. In this thesis, we explore applications of the Multiplicative Weights method in the design of efficient algorithms for various optimization problems. This method, which was repeatedly discovered in quite diverse fields, is an algorithmic technique which maintains a distribution on a certain set of interest, and updates it iteratively by multiplying the probability mass of elements by suitably chosen factors based on feedback obtained by running another algorithm on the distribution. We present a single meta-algorithm which unifies all known applications of this method in a common framework. Next, we generalize the method to the setting of symmetric matrices rather than real numbers. We derive the following applications of the resulting Matrix Multiplicative Weights algorithm: (1) The first truly general, combinatorial, primal-dual method for designing efficient algorithms for semidefinite programming. Using these techniques, we obtain significantly faster algorithms for obtaining O(√log n) approximations to various graph partitioning problems, such as SPARSEST CUT and BALANCED SEPARATOR in both directed and undirected weighted graphs, and constraint satisfaction problems such as MIN UNCUT and MIN 2CNF DELETION. (2) An O(n^3) time derandomization of the Alon-Roichman construction of expanders using Cayley graphs. The algorithm yields a set of O(log n) elements which generates an expanding Cayley graph in any group of n elements. (3) An O(n^3) time deterministic O(log n) approximation algorithm for the quantum hypergraph covering problem. (4) An alternative proof of a result of Aaronson that the γ-fat-shattering dimension of quantum states on n qubits is O(n/γ^2). Using our framework for the classical Multiplicative Weights Update method, we derive the following algorithmic applications: (1) Fast algorithms for approximately solving several families of semidefinite programs which beat interior point methods. Our algorithms rely on eigenvector computations, which are very efficient in practice compared to the Cholesky decompositions needed by interior point methods. We also give a matrix sparsification algorithm to speed up the eigenvector computation using the Lanczos iteration. (2) An O(√log n) approximation to the SPARSEST CUT and the BALANCED SEPARATOR problems in undirected weighted graphs in O(n^2) time by embedding expander flows in the graph. This improves upon the previous O(n^4.5) time algorithm of Arora, Rao, and Vazirani, which was based on semidefinite programming.

103 citations
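
The meta-algorithm the thesis unifies fits in a few lines in its classical "experts" form. The sketch below is our illustration of the standard Multiplicative Weights update, not the matrix generalization; the guarantee stated in the docstring is the standard one for losses in [0, 1].

```python
import math

def multiplicative_weights(losses, eta=0.5):
    """Basic Multiplicative Weights over n 'experts'.
    losses: a list of rounds, each a list of per-expert losses in [0, 1].
    Each round we play the current weight distribution, then multiply each
    expert's weight by (1 - eta)^loss; the total expected loss is at most
    (1 + eta) * (best expert's total loss) + ln(n) / eta."""
    n = len(losses[0])
    w = [1.0] * n
    expected = 0.0
    for round_losses in losses:
        total = sum(w)
        p = [wi / total for wi in w]               # current distribution
        expected += sum(pi * li for pi, li in zip(p, round_losses))
        w = [wi * (1 - eta) ** li for wi, li in zip(w, round_losses)]
    best = min(sum(l[i] for l in losses) for i in range(n))
    return expected, best

# Three experts, ten rounds; expert 2 is consistently the best.
rounds = [[1.0, 0.5, 0.1]] * 10
exp_loss, best_loss = multiplicative_weights(rounds)
print(exp_loss, best_loss)  # ~2.7 vs 1.0: within the guarantee
                            # (1 + eta) * 1.0 + ln(3) / eta ~ 3.7
```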


Additional excerpts

  • ...Just as the VC dimension enables us to bound sample complexity in standard PAC learning (see Blumer et al [25]), the fat-shattering dimension enables us to bound the sample complexity of learning p-concepts, by the results of Kearns and Schapire [61], Anthony and Bartlett [11], and Bartlett and Long [20]....

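For reference, the sample-size bound the excerpt attributes to Blumer et al. is usually quoted in the following form; the exact constants vary between presentations, so treat them as indicative rather than as a verbatim statement of the theorem.

```latex
% Sufficient sample size for PAC learning a class C of VC dimension d:
% with probability at least 1 - \delta, every hypothesis in C consistent
% with m random examples has error at most \epsilon, provided
m \;\ge\; \max\!\left(\frac{4}{\epsilon}\log_2\frac{2}{\delta},\;
                      \frac{8d}{\epsilon}\log_2\frac{13}{\epsilon}\right).
```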

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column that provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1973
TL;DR: This book provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations