Journal ArticleDOI
An Unrestricted Learning Procedure
Abstract:
We study learning problems involving arbitrary classes of functions F, underlying measures μ, and targets Y. Because proper learning procedures, i.e., procedures that are only allowed to select functions in F, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that F is convex), we consider unrestricted learning procedures that are free to choose functions outside the given class. We present a new unrestricted procedure whose sample complexity is almost the best that one can hope for and holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with what one could expect if F were convex, even when F is not. And if F is convex, then the unrestricted procedure turns out to be proper.
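The proper/unrestricted distinction can be made concrete with a minimal sketch. The finite class of constant predictors, the squared loss, and the pointwise-averaging step below are illustrative assumptions for exposition only, not the procedure proposed in the paper.

```python
# Illustrative sketch (not the paper's procedure): a proper learner must
# return a member of the class F, while an unrestricted (improper) learner
# may return any function, e.g. an average of class members.

def squared_risk(f, data):
    """Empirical squared loss of predictor f on (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

# A toy finite class F of constant predictors (c=c pins each constant).
F = [lambda x, c=c: c for c in (0.0, 0.5, 1.0)]

def proper_erm(F, data):
    """Proper learning: select the empirical risk minimizer within F."""
    return min(F, key=lambda f: squared_risk(f, data))

def improper_average(F, data):
    """Improper learning: output the pointwise average of class members,
    a function that need not belong to F unless F is convex."""
    return lambda x: sum(f(x) for f in F) / len(F)

data = [(0, 0.4), (1, 0.6)]
f_proper = proper_erm(F, data)        # one of the constants in F
f_improper = improper_average(F, data)  # the constant 0.5 here
```

When F is convex, the average of class members stays inside F, which mirrors the abstract's remark that the unrestricted procedure becomes proper in the convex case.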
Citations
Journal ArticleDOI
Fast classification rates without standard margin assumptions
TL;DR: This paper considers Chow's reject option model and shows that, by lowering the impact of a small fraction of hard instances, a fast learning rate is achievable in the agnostic model by a specific learning algorithm; it provides the first setup in which an improper learning algorithm may significantly improve the learning rates for non-convex losses.
Posted Content
On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors.
TL;DR: It is shown that the interplay between the moment assumptions on the error, the metric entropy of the class of functions involved, and the "local" structure of the function class around the truth drives the rate of convergence of the least squares estimator (LSE).
Journal ArticleDOI
Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
TL;DR: A dimension-free Bai–Yin-type theorem is proved in the regime p > 4, and despite requiring the existence of only a few moments, the estimator achieves the same tail estimates as if the underlying distribution were Gaussian.
Posted Content
Fast Rates for Online Prediction with Abstention
Gergely Neu, Nikita Zhivotovskiy +1 more
TL;DR: It is shown that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than 1/2 (say, 0.49), it is possible to achieve expected regret bounds that are independent of the time horizon.
Posted Content
Distribution-Free Robust Linear Regression.
TL;DR: In this article, a non-linear estimator for linear regression with a heavy-tailed response variable is proposed, and boundedness of the conditional second moment of the response is established as a necessary and sufficient condition for achieving a deviation-optimal rate of convergence of the excess risk.
References
Book
Neural Network Learning: Theoretical Foundations
Martin Anthony, Peter L. Bartlett +1 more
TL;DR: The authors explain the role of scale-sensitive versions of the Vapnik–Chervonenkis dimension in large-margin classification and in real prediction, and discuss the computational complexity of neural network learning.
Journal ArticleDOI
Local Rademacher complexities
TL;DR: New bounds on the error of learning algorithms are proposed in terms of a data-dependent notion of complexity, and applications to classification and prediction with convex function classes, and with kernel classes in particular, are presented.
Journal ArticleDOI
Sharper Bounds for Gaussian and Empirical Processes
TL;DR: In this paper, the tail probability of the supremum of a Gaussian process is studied under natural conditions on a class of functions on a probability space, and near-optimal bounds are given for these probabilities.
Journal ArticleDOI
Learning without Concentration
TL;DR: Sharp bounds are obtained on the estimation error of the Empirical Risk Minimization procedure, performed in a convex class and with respect to the squared loss, without assuming that class members and the target are bounded functions or have rapidly decaying tails.
Journal ArticleDOI
Learning by mirror averaging
TL;DR: Given a family of estimators or classifiers, this work defines a new one, called the aggregate, that is nearly as good as the best member of the family with respect to a given risk criterion, and shows that the aggregate satisfies sharp oracle inequalities under general assumptions.