Mixing Strategies for Density Estimation
TL;DR
Without knowing which strategy works best for the underlying density, a single strategy can be constructed by mixing the proposed ones so that it is adaptive in terms of statistical risks.

Abstract
General results on adaptive density estimation are obtained with respect to any countable collection of estimation strategies under Kullback-Leibler and squared $L_2$ losses. It is shown that, without knowing which strategy works best for the underlying density, a single strategy can be constructed by mixing the proposed ones so that it is adaptive in terms of statistical risks. A consequence is that, under some mild conditions, an asymptotically minimax-rate adaptive estimator exists for a given countable collection of density classes; that is, a single estimator can be constructed to be simultaneously minimax-rate optimal for all the function classes being considered. A demonstration is given for high-dimensional density estimation on $[0,1]^d$, where the constructed estimator adapts to smoothness and interaction order over some piecewise Besov classes and is consistent for all densities with finite entropy.
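The mixing idea in the abstract can be sketched numerically. The snippet below is a simplified illustration, not the paper's construction: it uses Gaussian kernel estimators indexed by bandwidth as the candidate strategies, a single half/half data split instead of the paper's progressive averaging, and held-out likelihood to form exponential mixing weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kde(train, h):
    """Return a kernel density estimate built from `train` with bandwidth h."""
    def pdf(x):
        x = np.atleast_1d(x)
        # average of Gaussian kernels centred at the training points
        z = (x[:, None] - train[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))
    return pdf

def mix_strategies(data, bandwidths):
    """Combine candidate estimators by likelihood weights on a held-out half."""
    half = len(data) // 2
    fit, held = data[:half], data[half:]
    candidates = [gaussian_kde(fit, h) for h in bandwidths]
    # log-likelihood of each candidate on the held-out half
    loglik = np.array([np.log(p(held)).sum() for p in candidates])
    w = np.exp(loglik - loglik.max())
    w /= w.sum()  # posterior-style mixing weights
    mixture = lambda x: sum(wi * p(x) for wi, p in zip(w, candidates))
    return mixture, w

data = rng.normal(0.0, 1.0, size=400)
mixture, weights = mix_strategies(data, bandwidths=[0.05, 0.3, 1.5])
```

The mixture is itself a proper density (a convex combination of densities), which is what allows risk bounds of the "best candidate plus a small penalty" form.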
Citations
Book
Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d'Été de Probabilités de Saint-Flour XXXVIII-2008
TL;DR: The purpose of these lecture notes is to provide an introduction to the general theory of empirical risk minimization with an emphasis on excess risk bounds and oracle inequalities in penalized problems.
Journal ArticleDOI
Adaptive Regression by Mixing
TL;DR: Under mild conditions, it is shown that the squared $L_2$ risk of the estimator based on ARM is bounded above by the risk of each candidate procedure plus a small penalty term of order $1/n$, giving ARM the automatically optimal rate of convergence.
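The ARM risk bound concerns combining candidate regression procedures by data splitting and exponentially weighted validation likelihood. A minimal sketch, assuming Gaussian noise with known variance and using polynomial fits as illustrative candidates (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy data: y = x^2 + Gaussian noise
x = rng.uniform(-1.0, 1.0, 200)
y = x**2 + rng.normal(0.0, 0.1, 200)

# split the sample: fit candidates on one half, weight them on the other
half = len(x) // 2
x_fit, y_fit = x[:half], y[:half]
x_val, y_val = x[half:], y[half:]

def poly_fit(deg):
    """Candidate procedure: least-squares polynomial of a given degree."""
    coef = np.polyfit(x_fit, y_fit, deg)
    return lambda t: np.polyval(coef, t)

candidates = [poly_fit(d) for d in (1, 2, 5)]

# ARM-style weights: exponential in the negative validation loss
sigma2 = 0.1**2  # noise variance, assumed known here for simplicity
losses = np.array([((f(x_val) - y_val) ** 2).sum() for f in candidates])
w = np.exp(-(losses - losses.min()) / (2 * sigma2))
w /= w.sum()

def combined(t):
    """Convex combination of the candidate predictions."""
    return sum(wi * f(t) for wi, f in zip(w, candidates))
```

Because the weights are a convex combination, the combined predictor's risk tracks the best candidate up to the penalty term the TL;DR describes.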
BookDOI
Oracle inequalities in empirical risk minimization and sparse recovery problems
TL;DR: The main tools involved in the analysis of these problems are Talagrand's concentration and deviation inequalities, together with other methods from empirical process theory (symmetrization inequalities, the contraction inequality for Rademacher sums, entropy and generic chaining bounds).
Journal ArticleDOI
PAC-Bayesian Stochastic Model Selection
TL;DR: A PAC-Bayesian performance guarantee is given for stochastic model selection that is superior to analogous guarantees for deterministic model selection, and it is shown that the posterior optimizing the performance guarantee is a Gibbs distribution.
Journal ArticleDOI
Combining forecasting procedures: Some theoretical results
TL;DR: In this paper, statistical risk bounds under squared error loss are obtained under distributional assumptions on the future given the current outside information and the past observations; the combined forecast automatically achieves the best performance among the candidate procedures up to a constant factor and an additive penalty term.
References
Book
Elements of information theory
Thomas M. Cover, Joy A. Thomas, et al.
TL;DR: The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.
Book
A Probabilistic Theory of Pattern Recognition
TL;DR: The Bayes error and Vapnik-Chervonenkis theory are applied as guides for empirical classifier selection, on the basis of explicit specification and explicit enforcement of the maximum likelihood principle.
Journal ArticleDOI
The context-tree weighting method: basic properties
TL;DR: The authors derive a natural upper bound on the cumulative redundancy of the method for individual sequences, showing that the proposed context-tree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.
Journal ArticleDOI
The Intrinsic Bayes Factor for Model Selection and Prediction
James O. Berger, Luis R. Pericchi, et al.
TL;DR: This article introduces a new criterion, the intrinsic Bayes factor, which is fully automatic in the sense of requiring only standard noninformative priors for its computation, and yet seems to correspond to very reasonable actual Bayes factors.