Journal ArticleDOI
An Unrestricted Learning Procedure
Abstract:
We study learning problems involving arbitrary classes of functions F, underlying measures μ, and targets Y. Because proper learning procedures, i.e., procedures that are only allowed to select functions in F, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that F is convex), we consider unrestricted learning procedures that are free to choose functions outside the given class. We present a new unrestricted procedure whose sample complexity is almost the best that one can hope for and holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with what one could expect if F were convex, even when F is not. And if F is convex, then the unrestricted procedure turns out to be proper.
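The proper/unrestricted distinction can be made concrete with a minimal sketch. The finite class of constant predictors, the squared loss, and the pointwise-averaging step below are illustrative assumptions for exposition only, not the procedure proposed in the paper.

```python
# Illustrative sketch (not the paper's procedure): a proper learner must
# return a member of the class F, while an unrestricted (improper) learner
# may return any function, e.g. an average of class members.

def squared_risk(f, data):
    """Empirical squared loss of predictor f on (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

# A toy finite class F of constant predictors (c=c pins each constant).
F = [lambda x, c=c: c for c in (0.0, 0.5, 1.0)]

def proper_erm(F, data):
    """Proper learning: select the empirical risk minimizer within F."""
    return min(F, key=lambda f: squared_risk(f, data))

def improper_average(F, data):
    """Improper learning: output the pointwise average of class members,
    a function that need not belong to F unless F is convex."""
    return lambda x: sum(f(x) for f in F) / len(F)

data = [(0, 0.4), (1, 0.6)]
f_proper = proper_erm(F, data)        # one of the constants in F
f_improper = improper_average(F, data)  # the constant 0.5 here
```

When F is convex, the average of class members stays inside F, which mirrors the abstract's remark that the unrestricted procedure becomes proper in the convex case.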
Citations
Journal ArticleDOI
Fast classification rates without standard margin assumptions
TL;DR: This paper considers Chow's reject option model and shows that, by lowering the impact of a small fraction of hard instances, a fast learning rate is achievable in the agnostic model by a specific learning algorithm; it provides the first setup in which an improper learning algorithm may significantly improve the learning rates for non-convex losses.
Posted Content
On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors.
TL;DR: It is shown that the interplay between the moment assumptions on the error, the metric entropy of the class of functions involved, and the "local" structure of the function class around the truth drives the rate of convergence of the least squares estimator (LSE).
Journal ArticleDOI
Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
TL;DR: A dimension-free Bai–Yin-type theorem is proved in the regime p > 4, and despite requiring the existence of only a few moments, the estimator achieves the same tail estimates as if the underlying distribution were Gaussian.
Posted Content
Fast Rates for Online Prediction with Abstention
Gergely Neu, Nikita Zhivotovskiy +1 more
TL;DR: It is shown that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than 1/2 (say, 0.49), it is possible to achieve expected regret bounds that are independent of the time horizon.
Posted Content
Distribution-Free Robust Linear Regression.
TL;DR: In this article, a non-linear estimator for linear regression with a heavy-tailed response variable is proposed, and boundedness of the conditional second moment of the response is established as a necessary and sufficient condition for achieving a deviation-optimal rate of convergence of the excess risk.
References
Book
Neural Network Learning: Theoretical Foundations
Martin Anthony, Peter L. Bartlett +1 more
TL;DR: The authors explain the role of scale-sensitive versions of the Vapnik–Chervonenkis dimension in large-margin classification and in real prediction, and discuss the computational complexity of neural network learning.
Journal ArticleDOI
Local Rademacher complexities
TL;DR: New bounds on the error of learning algorithms are proposed in terms of a data-dependent notion of complexity, and applications to classification and prediction with convex function classes, and with kernel classes in particular, are presented.
Journal ArticleDOI
Sharper Bounds for Gaussian and Empirical Processes
TL;DR: In this paper, the tail probability of the supremum of a Gaussian process is studied under natural conditions on a class of functions on a probability space, and near-optimal bounds are given for these probabilities.
Journal ArticleDOI
Learning without Concentration
TL;DR: Sharp bounds are obtained on the estimation error of the Empirical Risk Minimization procedure, performed in a convex class and with respect to the squared loss, without assuming that class members and the target are bounded functions or have rapidly decaying tails.
Journal ArticleDOI
Learning by mirror averaging
TL;DR: Given a family of estimators or classifiers, this work defines a new one, called the aggregate, that is nearly as good as the best member of the family with respect to a given risk criterion, and shows that the aggregate satisfies sharp oracle inequalities under general assumptions.