Open Access · Proceedings Article
Progressive mixture rules are deviation suboptimal
Jean-Yves Audibert
Vol. 20, pp. 41–48
TL;DR: This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n.
Abstract:
We consider the learning task of predicting as well as the best function in a finite reference set G, up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, then under reasonable assumptions on the loss function (typically satisfied by the least squares loss when the output is bounded), it is known that the progressive mixture rule ĝ satisfies
E R(ĝ) ≤ min_{g ∈ G} R(g) + Cst log|G|/n, (1)
where n denotes the size of the training set, and E denotes the expectation w.r.t. the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n. We also provide an algorithm which does not suffer from this drawback, and which is optimal in both deviation and expectation convergence rates.
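The progressive mixture rule analyzed in the abstract can be sketched as follows: run exponential weights over the reference set G on growing prefixes of the training sample, and predict with the Cesàro average of the resulting mixtures. This is a minimal illustrative sketch, not the paper's exact construction; the squared loss, the temperature `lam`, and all variable names are assumptions for the example.

```python
import numpy as np

def progressive_mixture(preds, y, lam=1.0):
    """Progressive mixture rule: average of exponential-weights mixtures
    computed on the prefixes of the sample (sketch, squared loss assumed).

    preds : (n, K) array, preds[j, k] = g_k(x_j) for reference functions g_1..g_K
    y     : (n,) array of observed outputs
    lam   : temperature parameter (assumed tuned to the loss/output range)

    Returns (weights, avg_weights): row i of `weights` gives the mixture
    weights of the exponential-weights predictor fitted on the first i
    examples; the final rule predicts with `avg_weights @ g(x)`.
    """
    n, K = preds.shape
    cum_loss = np.zeros(K)            # cumulative loss of each reference function
    weights = np.zeros((n + 1, K))
    for i in range(n + 1):
        # Exponential weights on the prefix (x_1, y_1), ..., (x_i, y_i);
        # subtracting the min loss stabilizes the exponentials numerically.
        w = np.exp(-lam * (cum_loss - cum_loss.min()))
        weights[i] = w / w.sum()
        if i < n:
            cum_loss += (preds[i] - y[i]) ** 2   # squared loss on example i
    return weights, weights.mean(axis=0)
```

At a new point x with reference predictions gx = (g_1(x), ..., g_K(x)), the rule outputs `avg_weights @ gx`. Averaging over prefixes, rather than using only the final exponential-weights mixture, is what yields the log|G|/n bound in expectation, and it is this rule whose deviations the paper shows are only of order 1/√n.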
Citations
Journal Article (DOI)
Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it
Peter Grünwald, Thijs van Ommen, +1 more
TL;DR: It is empirically shown that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection setting and in a Bayesian ridge regression setting.
Posted Content
Orthogonal Statistical Learning.
TL;DR: By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.
Journal Article (DOI)
Fast learning rates in statistical inference through aggregation
TL;DR: This work develops minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G up to the smallest possible additive term, called the convergence rate.
Journal Article (DOI)
Fast rates in statistical and online learning
TL;DR: The central condition enables a direct proof of fast rates, and its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, is proved; both conditions have played a central role in obtaining fast rates in statistical learning.
References
Book
A Probabilistic Theory of Pattern Recognition
TL;DR: The Bayes error and Vapnik–Chervonenkis theory are applied as a guide for empirical classifier selection, on the basis of explicit specification and explicit enforcement of the maximum likelihood principle.
Journal Article (DOI)
Information-theoretic determination of minimax rates of convergence
Yuhong Yang, Andrew R. Barron, +1 more
TL;DR: Some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations are presented, which depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Journal Article (DOI)
On the generalization ability of on-line learning algorithms
TL;DR: This paper proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M/sub n/ associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
Proceedings Article (DOI)
A game of prediction with expert advice
TL;DR: The following problem is considered: at each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts, and each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss.