Journal ArticleDOI

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions for feasible learnability are provided.
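A minimal sketch of the shattering idea behind this combinatorial parameter (an illustration of the standard definition, not code from the paper; the concept class and sample points are assumptions chosen for the demo): a class shatters a point set if it realizes every binary labeling of it, and the VC dimension is the size of the largest shattered set. Thresholds {x >= t} on the real line shatter any single point but no pair, so their VC dimension is 1.

```python
def shatters(points, concepts):
    """True if `concepts` realizes every binary labeling of `points`."""
    labelings = {tuple(c(x) for x in points) for c in concepts}
    return len(labelings) == 2 ** len(points)

def threshold_concepts(points):
    # Thresholds placed below, between, and above the sample points enumerate
    # every distinct behavior of the (infinite) class {x >= t} on the sample.
    ts = sorted(points)
    cuts = [ts[0] - 1.0] + [(a + b) / 2 for a, b in zip(ts, ts[1:])] + [ts[-1] + 1.0]
    return [lambda x, t=t: x >= t for t in cuts]

one = [0.0]
two = [0.0, 1.0]
print(shatters(one, threshold_concepts(one)))  # True:  VC dimension >= 1
print(shatters(two, threshold_concepts(two)))  # False: labeling (1, 0) is unrealizable
```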


Citations
Journal ArticleDOI
TL;DR: This paper reviews recent developments in the non-media applications of data watermarking, which have emerged over the last decade as an exciting new sub-domain, as well as the new challenges of digital watermarking that have arisen with the evolution of big data.
Abstract: Over the last 25 years, there has been much work on multimedia digital watermarking. In this domain, the primary limitation on watermark strength has been its visibility. For multimedia watermarks, invisibility is defined in human terms (that is, in terms of human sensory limitations). In this paper, we review recent developments in the non-media applications of data watermarking, which have emerged over the last decade as an exciting new sub-domain. Since, by definition, the intended receiver should be able to detect the watermark, we have to redefine invisibility in an acceptable way that is often application-specific and thus cannot be easily generalized. In particular, this is true when the data is not intended to be directly consumed by humans. For example, a loose definition of robustness might be in terms of the resilience of a watermark against normal host data operations, and of invisibility as resilience of the data interpretation against change introduced by the watermark. In this paper, we classify the data in terms of data mining rules on complex types of data such as time-series, symbolic sequences, data streams, and so forth. We emphasize the challenges involved in non-media watermarking in terms of common watermarking properties, including invisibility, capacity, robustness, and security. With the aid of a few examples of watermarking applications, we demonstrate these distinctions and survey the latest research in this regard. Finally, we look at the new challenges of digital watermarking that have arisen with the evolution of big data.
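As a loose illustration of these redefined properties (a toy scheme of my own, not one described in the survey; the quantization step and data are assumptions), the sketch below embeds watermark bits into a numeric time series by forcing the parity of quantized values. "Invisibility" here is interpretation-level: the per-sample distortion is bounded by 1.5 quantization steps.

```python
# Hypothetical toy scheme: watermark a time series via parity quantization.
def embed(series, bits, step=0.01):
    out = list(series)
    for i, b in enumerate(bits):
        q = round(out[i] / step)
        if q % 2 != b:      # nudge to a neighboring multiple with matching parity
            q += 1
        out[i] = q * step
    return out

def extract(series, n_bits, step=0.01):
    return [round(x / step) % 2 for x in series[:n_bits]]

ts = [1.234, 2.345, 3.456, 4.567]
wm = [1, 0, 1, 1]
marked = embed(ts, wm)
assert extract(marked, len(wm)) == wm
print(max(abs(m - x) for m, x in zip(marked, ts)))  # <= 1.5 * step
```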

64 citations


Cites background from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...The Vapnik-Chervonenkis dimension of a set V is the size of the longest index sequence I that is shattered by V [100]....


Journal Article
TL;DR: In this paper, poly(n, 1/ε)-time algorithms are given for learning origin-centered halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples.
Abstract: We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions. We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε: (1) learning origin-centered halfspaces in R^n with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε² / log(n/ε)) (the best previous result was Ω(ε / (n log(n/ε))^{1/4})); (2) learning origin-centered halfspaces with respect to any isotropic log-concave distribution on R^n with malicious noise rate η = Ω(ε³ / log²(n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise. We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on R^n in the presence of adversarial label noise at rate η = Ω(ε³ / log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not the example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε. Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with "smooth" boosting.
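A rough sketch of the PCA-based outlier-removal idea (a reconstruction under my own choice of threshold, stopping rule, and synthetic data, not the authors' procedure): malicious examples must sit far out along some direction to shift the learned halfspace, so points with abnormally large projections onto the top principal direction are iteratively discarded.

```python
import numpy as np

def remove_outliers(X, threshold=10.0, max_rounds=10):
    X = np.asarray(X, dtype=float)
    for _ in range(max_rounds):
        # top principal direction of the (uncentered) second-moment matrix
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        v = Vt[0]
        proj2 = (X @ v) ** 2
        keep = proj2 <= threshold * proj2.mean()
        if keep.all():
            break
        X = X[keep]
    return X

rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 5))          # roughly isotropic inliers
bad = np.tile([50.0, 0, 0, 0, 0], (5, 1))  # far-out malicious points
filtered = remove_outliers(np.vstack([clean, bad]))
print(len(filtered))  # ~500: planted outliers gone (a few extreme inliers may also drop)
```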

64 citations

Proceedings Article
10 Mar 2019
TL;DR: In this paper, the adversarial examples model is extended to the real-valued setting, and the sample complexity for binary classification is improved from O(1/ε⁴) to O(1/ε²), up to logarithmic factors.
Abstract: We consider a model of robust learning in an adversarial environment. The learner gets uncorrupted training data with access to possible corruptions that may be effected by the adversary during testing. The learner's goal is to build a robust classifier, which will be tested on future adversarial examples. The adversary is limited to $k$ possible corruptions for each input. We model the learner-adversary interaction as a zero-sum game. This model is closely related to the adversarial examples model of Schmidt et al. (2018); Madry et al. (2017). Our main results consist of generalization bounds for binary and multiclass classification, as well as the real-valued case (regression). For the binary classification setting, we both tighten the generalization bound of Feige, Mansour, and Schapire (2015) and handle infinite hypothesis classes. The sample complexity is improved from $O(\frac{1}{\epsilon^4}\log(\frac{|\mathcal{H}|}{\delta}))$ to $O\big(\frac{1}{\epsilon^2}\big(\sqrt{k\,\mathrm{VC}(\mathcal{H})}\log^{\frac{3}{2}+\alpha}(k\,\mathrm{VC}(\mathcal{H}))+\log(\frac{1}{\delta})\big)\big)$ for any $\alpha > 0$. Additionally, we extend the algorithm and generalization bound from the binary to the multiclass and real-valued cases. Along the way, we obtain results on fat-shattering dimension and Rademacher complexity of $k$-fold maxima over function classes; these may be of independent interest. For binary classification, the algorithm of Feige et al. (2015) uses a regret minimization algorithm and an ERM oracle as a black box; we adapt it for the multiclass and regression settings. The algorithm provides us with near-optimal policies for the players on a given training sample.
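The robust objective itself is easy to state in code. Below is a minimal sketch (the toy data, corruption set, and hypothesis class are invented for illustration) of robust ERM in this model: each example carries at most k admissible corruptions, the adversary picks the worst one, and the learner minimizes the resulting worst-case empirical error.

```python
def robust_error(h, sample, corruptions):
    # corruptions(x) yields x itself plus its <= k adversarial variants;
    # each example is scored on the worst variant.
    return sum(max(h(z) != y for z in corruptions(x)) for x, y in sample) / len(sample)

def robust_erm(hypotheses, sample, corruptions):
    return min(hypotheses, key=lambda h: robust_error(h, sample, corruptions))

# Toy instance: 1-D thresholds; the adversary may shift each point by +/- 0.3 (k = 2).
sample = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]
corruptions = lambda x: [x, x - 0.3, x + 0.3]
hypotheses = [lambda x, t=t: int(x >= t) for t in [0.5, 1.5, 2.5]]
best = robust_erm(hypotheses, sample, corruptions)
print(robust_error(best, sample, corruptions))  # 0.0 for the threshold at 1.5
```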

64 citations

Journal ArticleDOI
TL;DR: An unsupervised video object segmentation and tracking scheme based on an adaptable neural-network architecture is proposed; experimental results and comparisons indicate its good performance even in sequences with complicated content (object bending, occlusion).
Abstract: In this paper, an unsupervised video object (VO) segmentation and tracking algorithm is proposed based on an adaptable neural-network architecture. The proposed scheme comprises: 1) a VO tracking module and 2) an initial VO estimation module. Object tracking is handled as a classification problem and implemented through an adaptive network classifier, which provides better results than conventional motion-based tracking algorithms. Network adaptation is accomplished through an efficient and cost-effective weight updating algorithm, providing minimal degradation of the previous network knowledge while taking into account the current content conditions. A retraining set is constructed and used for this purpose based on initial VO estimation results. Two different scenarios are investigated. The first concerns extraction of human entities in video conferencing applications, while the second exploits depth information to identify generic VOs in stereoscopic video sequences. Human face/body detection based on Gaussian distributions is accomplished in the first scenario, while segmentation fusion is obtained using color and depth information in the second scenario. A decision mechanism is also incorporated to detect time instances for weight updating. Experimental results and comparisons indicate the good performance of the proposed scheme even in sequences with complicated content (object bending, occlusion).
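A schematic sketch of the weight-updating idea (a drastic simplification using logistic regression rather than the paper's network; the penalty weight `lam`, learning rate, and synthetic retraining set are assumptions): fit the retraining set while penalizing deviation from the previous weights, so earlier knowledge degrades as little as possible.

```python
import numpy as np

def retrain(w_old, X, y, lam=1.0, lr=0.1, steps=200):
    w = w_old.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # logistic predictions
        grad = X.T @ (p - y) / len(y) + lam * (w - w_old)
        w -= lr * grad                             # gradient step on loss + proximity penalty
    return w

rng = np.random.default_rng(1)
w_old = rng.normal(size=3)                         # previous network knowledge
X = rng.normal(size=(20, 3))                       # current-frame retraining set
y = (X @ rng.normal(size=3) > 0).astype(float)     # labels from initial VO estimation
w_new = retrain(w_old, X, y)
print(np.linalg.norm(w_new - w_old))               # bounded shift from the old weights
```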

64 citations


Cites methods from "Learnability and the Vapnik-Chervonenkis dimension"

  • ...Taking the previously mentioned issues into consideration, we propose an unsupervised VO segmentation and tracking scheme that incorporates an adaptable neural-network architecture....


Proceedings ArticleDOI
30 Oct 1989
TL;DR: The probably approximately correct (PAC) model of learning from examples is generalized, and a distribution-independent uniform convergence result for certain classes of functions computed by feedforward neural nets is obtained.
Abstract: The probably approximately correct (PAC) model of learning from examples is generalized. The problem of learning functions from a set X into a set Y is considered, assuming only that the examples are generated by independent draws according to an unknown probability measure on X × Y. The learner's goal is to find a function in a given hypothesis space of functions from X into Y that on average gives Y values close to those observed in random examples. The discrepancy is measured by a bounded real-valued loss function. The average loss is called the error of the hypothesis. A theorem on the uniform convergence of empirical error estimates to true error rates is given for certain hypothesis spaces, and it is shown how this implies learnability. A generalized notion of VC dimension that applies to classes of real-valued functions and a notion of capacity for classes of functions that map into a bounded metric space are given. These measures are used to bound the rate of convergence of empirical error estimates to true error rates, giving bounds on the sample size needed for learning using hypotheses in these classes. As an application, a distribution-independent uniform convergence result for certain classes of functions computed by feedforward neural nets is obtained. Distribution-specific uniform convergence results for classes of functions that are uniformly continuous on average are also obtained.
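A small simulation (my illustration, not from the paper; the finite threshold class, noiseless target, and sample sizes are assumptions) of what such a uniform convergence theorem asserts: the worst-case gap over the class between empirical and true error shrinks as the sample grows, roughly like 1/sqrt(m).

```python
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(0.1, 0.9, 9)              # finite hypothesis "space"
target = lambda x: (x > 0.5).astype(float)

def worst_gap(m):
    x = rng.uniform(size=m)
    y = target(x)                                  # noiseless labels for simplicity
    gaps = []
    for t in thresholds:
        emp = np.mean(((x > t).astype(float) - y) ** 2)   # empirical error (0/1 loss)
        true = abs(t - 0.5)                        # true error = measure of disagreement
        gaps.append(abs(emp - true))
    return max(gaps)

for m in [100, 1000, 10000]:
    print(m, round(worst_gap(m), 4))               # gap decays roughly like 1/sqrt(m)
```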

63 citations

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column that provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and D. S. Johnson in their book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations

Book
01 Jan 1973
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

13,647 citations