Journal ArticleDOI

An Unrestricted Learning Procedure

Shahar Mendelson
- 23 Nov 2019 - 
- Vol. 66, Iss: 6, pp 1-42
TLDR
This work studies learning problems involving arbitrary classes of functions F, underlying measures μ, and targets Y; because proper learning procedures, i.e., procedures that are only allowed to select functions in F, tend to perform poorly in general, it presents an unrestricted procedure, free to choose functions outside F, whose sample complexity is almost the best one can hope for.
Abstract
We study learning problems involving arbitrary classes of functions F, underlying measures μ, and targets Y. Because proper learning procedures, i.e., procedures that are only allowed to select functions in F, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that F is convex), we consider unrestricted learning procedures that are free to choose functions outside the given class. We present a new unrestricted procedure whose sample complexity is almost the best that one can hope for and holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with what one could expect if F were convex, even when F is not. And if F is convex, then the unrestricted procedure turns out to be proper.
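To fix notation (a standard formulation of the setting, assumed here rather than quoted from the paper): given an i.i.d. sample $(X_1, Y_1), \dots, (X_n, Y_n)$ with $X_i \sim \mu$, the squared-loss risk of a function $f$ is
$R(f) = \mathbb{E}\,(f(X) - Y)^2$.
A proper procedure must return some $\hat f \in F$, while an unrestricted (improper) procedure may return any measurable function; in both cases the quantity of interest is the excess risk
$R(\hat f) - \inf_{f \in F} R(f)$,
and the sample complexity is the number of observations needed to make it small with high probability.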


Citations
Journal ArticleDOI

Fast classification rates without standard margin assumptions

TL;DR: This paper considers Chow's reject option model and shows that, by lowering the impact of a small fraction of hard instances, a fast learning rate is achievable in the agnostic model by a specific learning algorithm; it also provides the first setup in which an improper learning algorithm may significantly improve the learning rates for non-convex losses.
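For orientation, a standard formulation of Chow's reject option model (assumed here, not quoted from the paper) lets the classifier output a label or abstain, with loss
$\ell(\hat y, y) = \mathbf{1}\{\hat y \neq y\}$ when a label is predicted, and $\ell(\hat y, y) = c$ when $\hat y = \mathrm{reject}$,
for an abstention cost $c < 1/2$; abstaining on a small fraction of hard instances is what makes rates faster than the agnostic $n^{-1/2}$ possible without standard margin assumptions.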
Posted Content

On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors.

TL;DR: It is shown that the interplay between the moment assumptions on the error, the metric entropy of the class of functions involved, and the "local" structure of the function class around the truth drives the rate of convergence of the LSE.
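As a point of reference, the least squares estimator over a class $F$ (the standard definition, assumed here) is
$\hat f_{LS} \in \arg\min_{f \in F} \frac{1}{n} \sum_{i=1}^{n} (Y_i - f(X_i))^2$,
and the cited result ties its rate of convergence to the moments of the errors, the metric entropy of $F$, and the local geometry of $F$ around the true regression function.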
Journal ArticleDOI

Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails

Pedro Abdalla, +1 more
- 17 May 2022 - 
TL;DR: A dimension-free Bai-Yin type theorem is proved in the regime p > 4; despite requiring the existence of only a few moments, the proposed estimator achieves the same tail estimates as if the underlying distribution were Gaussian.
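For context (a standard formulation, not quoted from the paper): given i.i.d. centered vectors $X_1, \dots, X_n \in \mathbb{R}^d$ with covariance $\Sigma = \mathbb{E}\, X X^\top$, a dimension-free Bai-Yin type guarantee bounds the operator-norm error of an estimator $\hat\Sigma$ roughly as
$\|\hat\Sigma - \Sigma\|_{op} \lesssim \|\Sigma\|_{op} \big( \sqrt{r(\Sigma)/n} + r(\Sigma)/n \big)$ (up to confidence-dependent terms),
where $r(\Sigma) = \mathrm{tr}(\Sigma)/\|\Sigma\|_{op}$ is the effective rank, so the ambient dimension $d$ never appears explicitly.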
Posted Content

Fast Rates for Online Prediction with Abstention

TL;DR: It is shown that by allowing the learner to abstain from predicting at a cost marginally smaller than $\frac{1}{2}$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon.
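In a standard formalization of this setting (assumed here), at each round $t$ the learner either predicts $\hat y_t$ or abstains and pays a fixed cost $c = \frac{1}{2} - \gamma$ for some small $\gamma > 0$, and the regret is
$\sum_{t=1}^{T} \ell_t(\hat y_t) - \min_{f \in F} \sum_{t=1}^{T} \ell_t(f(x_t))$,
where an abstention contributes $c$ to the learner's cumulative loss; the cited bound on the expected regret does not grow with the horizon $T$.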
Posted Content

Distribution-Free Robust Linear Regression.

TL;DR: In this article, a non-linear estimator for linear regression with a heavy-tailed response variable is proposed, and boundedness of the conditional second moment of the response is established as a necessary and sufficient condition for achieving a deviation-optimal excess risk rate of convergence.
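In this setting (a standard formulation, assumed here), one observes $(X_i, Y_i)_{i=1}^{n}$ and seeks a predictor $\hat f$, not necessarily linear, whose excess risk over the best linear predictor,
$\mathbb{E}\,(\hat f(X) - Y)^2 - \min_{w} \mathbb{E}\,(\langle w, X \rangle - Y)^2$,
decays at a deviation-optimal rate; the condition identified as necessary and sufficient is that $\mathbb{E}[Y^2 \mid X]$ is bounded.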
References
Book

Neural Network Learning: Theoretical Foundations

TL;DR: The authors explain the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large margin classification and in real-valued prediction, and discuss the computational complexity of neural network learning.
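One standard scale-sensitive version of the Vapnik-Chervonenkis dimension (the usual definition, assumed here) is the fat-shattering dimension: $\mathrm{fat}_\gamma(F)$ is the largest $d$ for which there exist points $x_1, \dots, x_d$ and witnesses $r_1, \dots, r_d$ such that for every $S \subseteq \{1, \dots, d\}$ some $f \in F$ satisfies $f(x_i) \ge r_i + \gamma$ for $i \in S$ and $f(x_i) \le r_i - \gamma$ for $i \notin S$; finiteness of $\mathrm{fat}_\gamma(F)$ at every scale $\gamma > 0$ characterizes learnability for real-valued prediction.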
Journal ArticleDOI

Local Rademacher complexities

TL;DR: New bounds on the error of learning algorithms are proposed in terms of a data-dependent notion of complexity, and applications to classification and prediction with convex function classes, and with kernel classes in particular, are presented.
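For reference (a standard formulation, assumed here), the Rademacher complexity of a class $F$ on a sample $X_1, \dots, X_n$ is
$\mathcal{R}_n(F) = \mathbb{E}_\sigma \sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i)$,
with i.i.d. signs $\sigma_i \in \{\pm 1\}$; the local version restricts the supremum to $\{f \in F : \mathbb{E} f^2 \le r\}$, and the fixed point of $r \mapsto \mathcal{R}_n(\{f \in F : \mathbb{E} f^2 \le r\})$ is what drives the faster, data-dependent error bounds.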
Journal ArticleDOI

Sharper Bounds for Gaussian and Empirical Processes

TL;DR: In this paper, the tail probability of the supremum of a Gaussian process is studied under natural conditions on a class of functions on a probability space, and near-optimal bounds on these probabilities are given.
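The object controlled here (standard definition, assumed) is the supremum of the empirical process indexed by a class $F$,
$\sup_{f \in F} \Big| \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb{E}\, f(X) \Big|$,
together with the supremum of the associated Gaussian process; the bounds concern the tail probabilities of these suprema under natural conditions on $F$.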
Journal ArticleDOI

Learning without Concentration

TL;DR: Sharp bounds are obtained on the estimation error of the Empirical Risk Minimization procedure, performed in a convex class and with respect to the squared loss, without assuming that class members and the target are bounded functions or have rapidly decaying tails.
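Concretely (standard definitions, assumed here), the procedure analyzed is empirical risk minimization over a convex class $F$ with the squared loss,
$\hat f \in \arg\min_{f \in F} \frac{1}{n} \sum_{i=1}^{n} (f(X_i) - Y_i)^2$,
and the bounds control the estimation error $\mathbb{E}\,(\hat f(X) - f^*(X))^2$, where $f^*$ minimizes the risk in $F$, without boundedness or rapid tail-decay assumptions on class members or on the target.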
Journal ArticleDOI

Learning by mirror averaging

TL;DR: This work defines a new estimator or classifier, called the aggregate, which is nearly as good as the best estimator in a given family with respect to a given risk criterion, and shows that the aggregate satisfies sharp oracle inequalities under some general assumptions.
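The guarantee takes the form of a sharp oracle inequality (standard form, assumed here rather than quoted): given a finite family $f_1, \dots, f_M$ and a risk $R$, the aggregate $\tilde f$ satisfies
$\mathbb{E}\, R(\tilde f) \le \min_{1 \le j \le M} R(f_j) + C \frac{\log M}{n}$
under suitable assumptions, with leading constant one in front of the minimum, which is what makes the inequality sharp.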