Open Access Proceedings Article
Function learning from interpolation
Martin Anthony, Peter L. Bartlett, et al.
pp. 211–221
TL;DR: A characterization of function classes that admit approximation from interpolated examples is derived in terms of their "fat-shattering function"; this property is central to the problem of learning real-valued functions from random examples.
Abstract: In this paper, we study a statistical property of classes of real-valued functions that we call approximation from interpolated examples. We derive a characterization of function classes that have this property, in terms of their 'fat-shattering function', a notion that has proved useful in computational learning theory. The property is central to a problem of learning real-valued functions from random examples in which we require satisfactory performance from every algorithm that returns a function which approximately interpolates the training examples.
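The fat-shattering notion used in the abstract can be illustrated concretely: a class γ-shatters a set of points if there are "witness" levels such that every sign pattern is realized by some function staying at least γ above or below its witness. The sketch below is a brute-force check over a small finite class; the affine family, points, and witnesses are hypothetical illustrations, not taken from the paper.

```python
from itertools import product

def fat_shatters(functions, points, witnesses, gamma):
    """Check whether the finite class `functions` gamma-shatters `points`
    with the given witness levels: every sign pattern must be realized by
    some f with margin at least gamma around the corresponding witness."""
    for pattern in product([+1, -1], repeat=len(points)):
        realized = any(
            all(
                (f(x) >= r + gamma) if b == +1 else (f(x) <= r - gamma)
                for x, r, b in zip(points, witnesses, pattern)
            )
            for f in functions
        )
        if not realized:
            return False
    return True

# A small family of affine functions gamma-shatters two points
# with witnesses at zero, for gamma = 0.5:
affine = [lambda x, a=a, b=b: a * x + b
          for a in (-2, -1, 0, 1, 2) for b in (-1, 0, 1)]
print(fat_shatters(affine, [0.0, 1.0], [0.0, 0.0], gamma=0.5))  # True
```

By contrast, a class of constant functions cannot realize the sign pattern (+1, −1) on two points with the same witness, so the same check returns False for it; the fat-shattering function counts the largest number of points shattered at each scale γ.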
Citations
Journal Article
The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network
TL;DR: Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Journal Article
A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling
TL;DR: It is argued that for many common machine learning problems, although in general the authors do not know the true (objective) prior for the problem, they do have some idea of a set of possible priors to which the true prior belongs.
Proceedings Article
A PAC analysis of a Bayesian estimator
TL;DR: The paper uses these techniques to give the first PAC-style analysis of a Bayesian-inspired estimator of generalisation (the size of a ball that can be placed in the consistent region of parameter space); the resulting bounds are independent of the complexity of the function class, though they depend linearly on the dimensionality of the parameter space.
Proceedings Article
Fat-shattering and the learnability of real-valued functions
TL;DR: It is shown that, given some restrictions on the noise, a function class is learnable in this model if and only if its fat-shattering function is finite; analogous results hold in an agnostic setting, where there is no assumption of an underlying function class.
Journal Article
Enlarging the Margins in Perceptron Decision Trees
TL;DR: It is proved that quantities other than tree size are relevant to reducing flexibility and combating overfitting; an upper bound on the generalization error is given that depends both on the size of the tree and on the margin of the decision nodes.
References
Book Chapter
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Book
Convergence of stochastic processes
TL;DR: In this paper, the authors treat stochastic processes as random functions and develop a theory of uniform convergence of empirical measures in Euclidean spaces, based on the notion of convergence in distribution.
Journal Article
Learnability and the Vapnik-Chervonenkis dimension
TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
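Shattering in the VC sense, which underlies the result above, is directly checkable for small hypothesis classes: a set of points is shattered when the class realizes every binary labelling of those points. The threshold family and the points below are a hypothetical illustration, not drawn from the paper.

```python
def shatters(hypotheses, points):
    """True if the hypothesis class realizes every binary labelling of `points`."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

# Threshold classifiers 1[x >= t] shatter any single point...
thresholds = [lambda x, t=t: int(x >= t) for t in (-1.5, -0.5, 0.5, 1.5)]
print(shatters(thresholds, [0.0]))        # True
# ...but no pair of points: with x1 < x2 the labelling (1, 0) is
# unrealizable, so the VC dimension of thresholds is 1.
print(shatters(thresholds, [0.0, 1.0]))   # False
```

The VC dimension is the largest size of a shattered set; the cited result says distribution-free learnability of a concept class is equivalent to this quantity being finite.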
Journal Article
Decision theoretic generalizations of the PAC model for neural net and other learning applications
TL;DR: In this article, a generalization of the PAC learning model based on statistical decision theory is described: the learner receives randomly drawn examples, each consisting of an instance x in X and an outcome y in Y, and tries to find a hypothesis h : X → A, with h in H, that specifies the appropriate action a in A to take for each instance x, in order to minimize the expectation of a loss l(y,a).
Journal Article
Efficient distribution-free learning of probabilistic concepts
TL;DR: A model of machine learning in which the concept to be learned may exhibit uncertain or probabilistic behavior is investigated, and an underlying theory of learning p-concepts is developed in detail.