Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
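To make the paper's central parameter concrete, here is a small self-contained Python sketch (an illustration of mine, not code from the paper) that checks shattering by brute force for the class of closed intervals on the real line. Intervals shatter every two-point set but no three-point set, so their VC dimension is 2.

from itertools import combinations

def intervals_shatter(points):
    """Check whether the class {[a, b] : a <= b} shatters the given points,
    i.e. every 0/1 labeling of the points is realized by some interval."""
    points = sorted(points)
    n = len(points)
    for mask in range(2 ** n):
        labeling = [(mask >> i) & 1 for i in range(n)]
        positives = [p for p, y in zip(points, labeling) if y == 1]
        if not positives:
            continue  # the all-negative labeling is realized by an interval avoiding all points
        a, b = min(positives), max(positives)
        # any interval containing all positives contains [a, b],
        # so [a, b] must not capture a negative point
        if any(a <= p <= b for p, y in zip(points, labeling) if y == 0):
            return False
    return True

print(intervals_shatter([0.0, 1.0]))       # True: every pair is shattered
print(intervals_shatter([0.0, 1.0, 2.0]))  # False: {0, 2} cannot be cut out without 1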


Citations
Proceedings Article
26 Jun 2015
TL;DR: The Recursive Teaching Dimension of a concept class C is a complexity parameter referring to the worst-case number of labelled examples needed to learn any target concept in C from a teacher following the recursive teaching model, and it is the first teaching complexity notion for which interesting relationships to the VC dimension (VCD) have been established.
Abstract: The Recursive Teaching Dimension (RTD) of a concept class C is a complexity parameter referring to the worst-case number of labelled examples needed to learn any target concept in C from a teacher following the recursive teaching model. It is the first teaching complexity notion for which interesting relationships to the VC dimension (VCD) have been established. In particular, for finite maximum classes of a given VCD d, the RTD equals d. To date, there is no concept class known for which the ratio of RTD over VCD exceeds 3/2. However, the only known upper bound on RTD in terms of VCD is exponential in the VCD and depends on the size of the concept class. We pose the following question: is the RTD upper-bounded by a function that grows only linearly in the VCD? Answering this question would further our understanding of the relationships between the complexity of teaching and the complexity of learning from randomly chosen examples. In addition, the answer to this question, whether positive or negative, is known to have implications for the study of the long-standing open sample compression conjecture, which claims that every concept class of VCD d has a sample compression scheme in which samples for concepts in the class are compressed to subsets of size no larger than d.
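As an illustration of the recursive teaching model described above, the following Python sketch (a toy construction, not taken from the paper; the helper names are mine) computes the RTD of a small finite class by repeatedly peeling off the concepts that are cheapest to teach.

from itertools import combinations

def teaching_dim(c, concepts, domain):
    """Smallest number of labelled examples that distinguishes concept c
    (a tuple of 0/1 labels over `domain`) from every other concept in `concepts`."""
    others = [h for h in concepts if h != c]
    for k in range(len(domain) + 1):
        for S in combinations(range(len(domain)), k):
            if all(any(h[i] != c[i] for i in S) for h in others):
                return k
    return len(domain)

def recursive_teaching_dim(concepts, domain):
    remaining = list(concepts)
    rtd = 0
    while remaining:
        tds = {c: teaching_dim(c, remaining, domain) for c in remaining}
        d = min(tds.values())
        rtd = max(rtd, d)
        # peel off all concepts that are easiest to teach at this stage
        remaining = [c for c in remaining if tds[c] > d]
    return rtd

# Example: the singletons plus the empty concept over a 3-point domain.
# The empty concept alone needs 3 examples, but after the singletons are
# removed it is taught for free, so the RTD is 1.
domain = [0, 1, 2]
concepts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(recursive_teaching_dim(concepts, domain))  # 1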

19 citations


Cites background from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...In PAC-learning, for example, the information complexity is upper-bounded and lower-bounded by functions that are linear in the VC dimension (VCD) of the concept class (Blumer et al., 1989)....


Journal ArticleDOI
09 Jul 2021
TL;DR: In this article, the authors perform simulations and theoretical analysis of the quantum circuit learning problem with a hardware-efficient ansatz and show that the expressibility and generalization-error scaling of the ansatz saturate as the circuit depth increases.
Abstract: Applying quantum processors to model a high-dimensional function approximator is a typical method in quantum machine learning with potential advantage. It is conjectured that the unitarity of quantum circuits provides a possible regularization that avoids overfitting. However, it is not clear how this regularization interplays with expressibility under the limitations of current Noisy Intermediate-Scale Quantum (NISQ) devices. In this article, we perform simulations and theoretical analysis of the quantum circuit learning problem with a hardware-efficient ansatz. Thorough numerical simulations show that the expressibility and generalization-error scaling of the ansatz saturate as the circuit depth increases, implying automatic regularization that avoids overfitting in the quantum circuit learning scenario. This observation is supported by the theory of PAC learnability, which proves that the VC dimension is upper-bounded due to the locality and unitarity of the hardware-efficient ansatz. Our study provides supporting evidence for automatic regularization by unitarity to suppress overfitting, and guidelines for possible performance improvement under hardware constraints.
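A hardware-efficient ansatz of this kind is easy to emulate classically at small scale. The numpy sketch below (a rough illustration of mine, not the authors' code or their expressibility measure) builds layers of random single-qubit RZ·RY rotations followed by a CZ entangler on two qubits, and uses the second moment of pairwise state fidelities as a crude expressibility proxy; the gap to the Haar value shrinks with depth and then saturates rather than improving further, echoing the saturation the paper reports.

import numpy as np

def rot(a, b):
    """Single-qubit RZ(a)·RY(b): the rotation block of one hardware-efficient layer."""
    ry = np.array([[np.cos(b / 2), -np.sin(b / 2)],
                   [np.sin(b / 2),  np.cos(b / 2)]], dtype=complex)
    rz = np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])
    return rz @ ry

CZ = np.diag([1, 1, 1, -1]).astype(complex)

def random_state(depth, rng):
    """Apply `depth` layers of random rotations + CZ to |00>."""
    psi = np.zeros(4, dtype=complex); psi[0] = 1.0
    for _ in range(depth):
        p = rng.uniform(0, 2 * np.pi, size=4)
        psi = CZ @ (np.kron(rot(p[0], p[1]), rot(p[2], p[3])) @ psi)
    return psi

rng = np.random.default_rng(0)
for depth in (1, 2, 4, 8, 16):
    fids = np.array([abs(np.vdot(random_state(depth, rng), random_state(depth, rng))) ** 2
                     for _ in range(3000)])
    # For Haar-random states in dimension N = 4, E[F^2] = 2/(N(N+1)) = 0.1.
    print(depth, round(float(np.mean(fids ** 2)), 3))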

19 citations

Book ChapterDOI
TL;DR: In this paper, the quality of sample compression schemes and of teaching sets for classes of low VC-dimension was studied, and the best known upper bounds for both parameters were log(m), while the best lower bounds are linear in d.
Abstract: In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching. Namely, the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let C be a binary concept class of size m and VC-dimension d. Prior to this work, the best known upper bounds for both parameters were log(m), while the best lower bounds are linear in d. We present significantly better upper bounds on both as follows. Set k = O(d2 d loglog | C | ).
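For intuition on what a sample compression scheme is, the toy Python sketch below (a standard textbook example, not the chapter's construction) implements the classic size-2 scheme for intervals on the line, whose VC dimension is 2: compression keeps only the extreme positive examples, and reconstruction returns the tightest interval around them.

def compress(sample):
    """Keep only the leftmost and rightmost positive examples."""
    pos = [x for x, y in sample if y == 1]
    return [] if not pos else sorted({min(pos), max(pos)})

def reconstruct(kept):
    """Decompress to a hypothesis: the tightest interval around the kept points."""
    if not kept:
        return lambda x: 0
    a, b = kept[0], kept[-1]
    return lambda x: int(a <= x <= b)

# Any sample consistent with some interval is recovered from at most 2 kept points.
sample = [(0.5, 0), (1.0, 1), (2.0, 1), (3.5, 1), (4.0, 0)]
h = reconstruct(compress(sample))
assert all(h(x) == y for x, y in sample)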

19 citations

Dissertation
01 Jan 2006
TL;DR: This thesis presents a hybrid sample-based robust optimization methodology for constructing adaptability in multi-stage optimization problems that is both tractable and flexible, offering a hierarchy of adaptability.
Abstract: Optimization under uncertainty is a central ingredient for analyzing and designing systems with incomplete information. This thesis addresses uncertainty in optimization in a dynamic framework where information is revealed sequentially and future decisions are adaptable, i.e., they depend functionally on the information revealed in the past. Such problems arise in applications where actions are repeated over a time horizon (e.g., portfolio management or dynamic scheduling problems), or that have multiple planning stages (e.g., network design).

The first part of the thesis focuses on the robust optimization approach to systems with uncertainty. Unlike the probability-driven stochastic programming approach, robust optimization is built on deterministic, set-based formulations of uncertainty. This thesis seeks to place robust optimization within a dynamic framework. In particular, we introduce the notion of finite adaptability. Using geometric results, we characterize the benefits of adaptability and use these theoretical results to design efficient algorithms for finding near-optimal protocols. Among the novel contributions of the work are the capacity to accommodate discrete variables and the development of a hierarchy of adaptability.

The second part of the thesis takes a data-driven view of uncertainty. The central questions are (a) how can we construct adaptability in multi-stage optimization problems given only data, and (b) what feasibility guarantees can we provide. Multistage stochastic optimization typically requires exponentially many data points. Robust optimization, on the other hand, has a very limited ability to address multi-stage optimization in an adaptable manner. We present a hybrid sample-based robust optimization methodology for constructing adaptability in multi-stage optimization problems that is both tractable and flexible, offering a hierarchy of adaptability. We prove polynomial upper bounds on sample complexity. We further extend our results to multi-stage problems with integer variables in the future stages. We illustrate the ideas above on several problems in network design and portfolio optimization.

The last part of the thesis applies adaptability, in particular the ideas of finite adaptability from the first part, to the problem of air traffic control. The main problem is to sequentially schedule the departures, routes, ground-holding, and air-holding for every flight over the national air space (NAS). The schedule seeks to minimize the aggregate delay incurred while satisfying capacity constraints that specify the maximum number of flights that can take off or land at a particular airport, or fly over the same sector of the NAS, at any given time. These capacities are impacted by the weather conditions. Since we receive an initial weather forecast and then updates throughout the day, we naturally have a multistage optimization problem with sequentially revealed uncertainty. We show that finite adaptability is natural, since the scheduling problem is inherently finite and, furthermore, the uncertainty set is low-dimensional. We illustrate both the applicability of finite adaptability and its effectiveness through several examples.

Thesis Supervisor: Dimitris Bertsimas. Title: Boeing Professor of Operations Research.
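The flavor of the sample-based robust approach can be conveyed in a few lines. The Python sketch below is an assumed toy formulation of mine, not the thesis's methodology: it draws scenarios for an uncertain constraint vector and solves the resulting scenario-constrained LP with scipy, so the returned solution is feasible for every sampled realization.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
a_nominal = np.array([1.0, 2.0])
# sampled uncertainty in the constraint vector a
scenarios = a_nominal + 0.2 * rng.standard_normal((200, 2))

# maximize x1 + x2 (linprog minimizes, so negate) subject to a^T x <= 1
# for every sampled scenario, with x >= 0
res = linprog(c=[-1.0, -1.0],
              A_ub=scenarios, b_ub=np.ones(len(scenarios)),
              bounds=[(0, None), (0, None)])
print(res.x)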

19 citations


Cites background from "Learnability and the Vapnik-Chervonenkis dimension"

  • ..., [37]), the growth function can be bounded by a polynomial of the VC dimension....


Dissertation
01 Jan 1991
TL;DR: This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing a dictionary of words and their pronunciations, showing that error-correcting output codes provide a domain-independent, algorithm-independent approach to multiclass learning problems.
Abstract: The task of mapping spelled English words into strings of phonemes and stresses ("reading aloud") has many practical applications. Several commercial systems perform this task by applying a knowledge base of expert-supplied letter-to-sound rules. This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing a dictionary of words and their pronunciations. Taken together, these methods provide a substantial performance improvement over the best commercial system, DECtalk from Digital Equipment Corporation. In a performance test, the learning methods were trained on a dictionary of 19,002 words. Then, human subjects were asked to compare the performance of the resulting letter-to-sound rules against the dictionary for an additional 1,000 words not used during training. In a blind procedure, the subjects rated the pronunciations of both the learned rules and the DECtalk rules according to whether they were noticeably different from the dictionary pronunciation. The error rate for the learned rules was 28.8% (288 words noticeably different), while the error rate for the DECtalk rules was 44.3% (443 words noticeably different). If, instead of using human judges, we required that the pronunciations of the letter-to-sound rules exactly match the dictionary to be counted correct, then the error rate for our learned rules is 35.2% and the error rate for DECtalk is 63.6%. Similar results were observed at the level of individual letters, phonemes, and stresses. To achieve these results, several techniques were combined. The key learning technique represents the output classes by the codewords of an error-correcting code. Boolean concept learning methods, such as the standard ID3 decision-tree algorithm, can then be applied to learn the individual bits of these codewords. This converts the multiclass learning problem into a number of Boolean concept learning problems. This method is shown to be superior to several other methods: multiclass ID3, one-tree-per-class ID3, the domain-specific distributed code employed by T. Sejnowski and C. Rosenberg in their NETtalk system, and a method developed by D. Wolpert. Similar results in the domain of isolated-letter speech recognition with the backpropagation algorithm show that error-correcting output codes provide a domain-independent, algorithm-independent approach to multiclass learning problems.
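The core encode/train/decode loop of the error-correcting output code method is short. The Python sketch below is an illustration of mine, using scikit-learn's CART trees as a stand-in for ID3, a random code rather than the dissertation's designed codewords, and a digits dataset instead of the pronunciation data: one binary tree is trained per codeword bit, and predictions are decoded by Hamming distance.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

n_classes, n_bits = 10, 15
rng = np.random.default_rng(0)
code = rng.integers(0, 2, size=(n_classes, n_bits))  # one random codeword per class

# one binary concept learner per codeword bit
trees = [DecisionTreeClassifier(random_state=0).fit(Xtr, code[ytr, j])
         for j in range(n_bits)]

bits = np.column_stack([t.predict(Xte) for t in trees])
# decode: pick the class whose codeword is nearest in Hamming distance
pred = np.argmin([[np.sum(b != cw) for cw in code] for b in bits], axis=1)
print("accuracy:", np.mean(pred == yte))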

19 citations

References
Book
01 Jan 1979
TL;DR: This quarterly column provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column whose purpose is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations


Book
01 Jan 1973
TL;DR: In this book, a unified, comprehensive, and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive, and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations