Open Access · Proceedings Article

Progressive mixture rules are deviation suboptimal

Jean-Yves Audibert
Vol. 20, pp. 41-48
TL;DR
This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n.
Abstract
We consider the learning task of predicting as well as the best function in a finite reference set G, up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, then under reasonable assumptions on the loss function (typically satisfied by the least squares loss when the output is bounded), it is known that the progressive mixture rule ĝ satisfies

E R(ĝ) ≤ min_{g∈G} R(g) + Cst · log|G| / n,   (1)

where n denotes the size of the training set and E denotes the expectation with respect to the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n. We also provide an algorithm which does not suffer from this drawback and which is optimal in both deviation and expectation convergence rates.
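For readers unfamiliar with the rule itself, the bound (1) refers to the standard progressive mixture construction: one averages the exponential-weights (Gibbs) mixtures computed on growing prefixes of the training sample. A minimal sketch in LaTeX, assuming a uniform prior over G, an inverse temperature λ > 0, a loss ℓ, and i.i.d. observations Z_1, …, Z_n (this notation is illustrative, not taken from the paper):

\[
\hat h_i \;=\; \sum_{g \in G} \frac{\exp\!\big(-\lambda \sum_{j=1}^{i} \ell(g, Z_j)\big)}{\sum_{g' \in G} \exp\!\big(-\lambda \sum_{j=1}^{i} \ell(g', Z_j)\big)}\; g ,
\qquad i = 0, \dots, n,
\qquad
\hat g \;=\; \frac{1}{n+1} \sum_{i=0}^{n} \hat h_i .
\]

Under this construction, (1) controls the excess risk in expectation at rate log|G|/n; the paper's point is that the corresponding deviation statement fails, i.e. for suitably chosen G the excess risk R(ĝ) − min_{g∈G} R(g) exceeds a constant times 1/√n with non-vanishing probability.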



Citations
Journal Article (DOI)

Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it

TL;DR: It is empirically shown that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection setting and in a Bayesian ridge regression setting.
Posted Content

Orthogonal Statistical Learning.

TL;DR: By focusing on excess risk rather than parameter estimation, this work can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class.
Journal Article (DOI)

Fast learning rates in statistical inference through aggregation

TL;DR: This work develops minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G up to the smallest possible additive term, called the convergence rate.
Journal Article (DOI)

Fast rates in statistical and online learning

TL;DR: The central condition enables a direct proof of fast rates, and its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, is proved; both conditions have played a central role in obtaining fast rates in statistical learning.
References
Book

A Probabilistic Theory of Pattern Recognition

TL;DR: The Bayes error and Vapnik-Chervonenkis theory are applied as a guide for empirical classifier selection, on the basis of explicit specification and explicit enforcement of the maximum likelihood principle.
Proceedings Article (DOI)

Aggregating strategies

Journal Article (DOI)

Information-theoretic determination of minimax rates of convergence

TL;DR: General results are presented that determine minimax bounds on statistical risk for density estimation from information-theoretic considerations; the bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Journal Article (DOI)

On the generalization ability of on-line learning algorithms

TL;DR: This paper proves tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
Proceedings Article (DOI)

A game of prediction with expert advice

V. G. Vovk
TL;DR: The following problem is considered: at each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts, and each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss.