Open Access · Proceedings Article
Progressive mixture rules are deviation suboptimal
Jean-Yves Audibert
Vol. 20, pp. 41–48
TL;DR: This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n.
Abstract:
We consider the learning task of predicting as well as the best function in a finite reference set G, up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, then under reasonable assumptions on the loss function (typically satisfied by the least squares loss when the output is bounded), it is known that the progressive mixture rule ĝ satisfies
E R(ĝ) ≤ min_{g ∈ G} R(g) + Cst log|G|/n, (1)
where n denotes the size of the training set, and E denotes the expectation w.r.t. the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n. We also provide an algorithm which does not suffer from this drawback, and which is optimal in both deviation and expectation convergence rates.
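The progressive mixture rule analyzed in the abstract can be sketched as follows: run exponential weights over the reference set G on growing prefixes of the training sample, and predict with the Cesàro average of the resulting mixtures. This is a minimal illustrative sketch, not the paper's exact construction; the squared loss, the temperature `lam`, and all variable names are assumptions for the example.

```python
import numpy as np

def progressive_mixture(preds, y, lam=1.0):
    """Progressive mixture rule: average of exponential-weights mixtures
    computed on the prefixes of the sample (sketch, squared loss assumed).

    preds : (n, K) array, preds[j, k] = g_k(x_j) for reference functions g_1..g_K
    y     : (n,) array of observed outputs
    lam   : temperature parameter (assumed tuned to the loss/output range)

    Returns (weights, avg_weights): row i of `weights` gives the mixture
    weights of the exponential-weights predictor fitted on the first i
    examples; the final rule predicts with `avg_weights @ g(x)`.
    """
    n, K = preds.shape
    cum_loss = np.zeros(K)            # cumulative loss of each reference function
    weights = np.zeros((n + 1, K))
    for i in range(n + 1):
        # Exponential weights on the prefix (x_1, y_1), ..., (x_i, y_i);
        # subtracting the min loss stabilizes the exponentials numerically.
        w = np.exp(-lam * (cum_loss - cum_loss.min()))
        weights[i] = w / w.sum()
        if i < n:
            cum_loss += (preds[i] - y[i]) ** 2   # squared loss on example i
    return weights, weights.mean(axis=0)
```

At a new point x with reference predictions gx = (g_1(x), ..., g_K(x)), the rule outputs `avg_weights @ gx`. Averaging over prefixes, rather than using only the final exponential-weights mixture, is what yields the log|G|/n bound in expectation, and it is this rule whose deviations the paper shows are only of order 1/√n.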
Citations
Journal Article (DOI)
Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it
Peter Grünwald, Thijs van Ommen, +1 more
TL;DR: It is empirically shown that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection setting and in a Bayesian ridge regression setting.
Posted Content
Orthogonal Statistical Learning.
TL;DR: By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.
Journal Article (DOI)
Fast learning rates in statistical inference through aggregation
TL;DR: This work develops minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G up to the smallest possible additive term, called the convergence rate.
Journal Article (DOI)
Fast rates in statistical and online learning
TL;DR: The central condition enables a direct proof of fast rates, and its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, is proved; both conditions have played a central role in obtaining fast rates in statistical learning.
References
Book
A Probabilistic Theory of Pattern Recognition
TL;DR: The Bayes error and Vapnik–Chervonenkis theory are applied as a guide for empirical classifier selection, on the basis of explicit specification and explicit enforcement of the maximum likelihood principle.
Journal Article (DOI)
Information-theoretic determination of minimax rates of convergence
Yuhong Yang, Andrew R. Barron, +1 more
TL;DR: Some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations are presented, which depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Journal Article (DOI)
On the generalization ability of on-line learning algorithms
TL;DR: This paper proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M/sub n/ associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
Proceedings Article (DOI)
A game of prediction with expert advice
TL;DR: The following problem is considered: at each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts, and each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss.