Open Access Posted Content

High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification

TLDR
This paper provides a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model, in a high-dimensional asymptotic regime, and finds that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix.
Abstract
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where $p, n \to \infty$ and $p/n \to \gamma \in (0, \, \infty)$, and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength, and the aspect ratio $\gamma$. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover several qualitative insights about both methods: for example, with ridge regression, there is an exact inverse relation between the limiting predictive risk and the limiting estimation risk given a fixed signal strength. Our analysis builds on recent advances in random matrix theory.
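The limiting-risk formulas themselves require the paper's random-matrix machinery, but the regime is easy to see in simulation. Below is a minimal sketch (not the authors' code; the parameter names and the identity feature covariance are illustrative choices) showing that the out-of-sample risk of ridge regression stabilizes as $p, n \to \infty$ with $p/n = \gamma$ fixed, under a dense random-effects signal $\beta \sim N(0, (\alpha^2/p) I)$.

```python
# Minimal simulation sketch of ridge prediction risk in the proportional
# regime p/n -> gamma, under a dense random-effects model. Illustrative
# choices: identity feature covariance, alpha^2 = sigma^2 = lambda = 1.
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha2, sigma2, lam = 2.0, 1.0, 1.0, 1.0  # aspect ratio, signal, noise, ridge

def ridge_pred_risk(n, n_test=2000):
    p = int(gamma * n)
    beta = rng.normal(scale=np.sqrt(alpha2 / p), size=p)   # dense random effects
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    # ridge estimate: (X'X + n*lam*I)^{-1} X'y
    beta_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
    X_test = rng.normal(size=(n_test, p))
    y_test = X_test @ beta + rng.normal(scale=np.sqrt(sigma2), size=n_test)
    return np.mean((y_test - X_test @ beta_hat) ** 2)

# The empirical risk stabilizes as n, p grow with p/n = gamma fixed,
# as the limiting-risk result predicts.
for n in (100, 200, 400, 800):
    print(n, ridge_pred_risk(n))
```

Replacing the standard-normal features with draws from an arbitrary covariance $\Sigma$ is the natural way to probe the spectrum dependence the abstract describes.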


Citations
Posted Content

Surprises in High-Dimensional Ridgeless Least Squares Interpolation.

TL;DR: This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
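As a hedged illustration (my sketch, not the paper's setup; the signal and noise levels are arbitrary choices), the double-descent curve can be reproduced by sweeping the model size of a minimum-norm least-squares interpolant past the interpolation threshold $p = n$:

```python
# Sketch of "double descent" for ridgeless (minimum-norm) least squares:
# excess test risk spikes near p/n = 1 and falls again for p >> n.
import numpy as np

rng = np.random.default_rng(1)
n, n_test, sigma = 100, 1000, 0.5

for p in (20, 50, 90, 100, 110, 200, 400, 1000):
    beta = rng.normal(size=p) / np.sqrt(p)      # dense signal, unit strength
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.pinv(X) @ y            # min-norm least squares fit
    X_test = rng.normal(size=(n_test, p))
    err = np.mean((X_test @ (beta_hat - beta)) ** 2)
    print(f"p/n = {p/n:5.2f}   excess risk = {err:.3f}")
```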
Journal ArticleDOI

High-dimensional regression adjustments in randomized experiments

TL;DR: This work studies the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and shows that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect.
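A sketch of the idea under stated assumptions: in a Bernoulli(1/2)-randomized experiment, a risk-consistent fit (here a cross-validated lasso, purely as an example of such an adjustment, not the paper's specific estimator) is plugged into an augmented regression-adjusted estimate of the average treatment effect. All names and parameter values below are illustrative.

```python
# Illustrative regression-adjusted ATE estimate in a randomized experiment
# with high-dimensional covariates (p > n), known propensity 1/2.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p, tau = 400, 600, 1.0                     # tau = true average treatment effect
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:10] = 1.0           # sparse baseline response
W = rng.integers(0, 2, size=n)                # Bernoulli(1/2) treatment assignment
Y = X @ beta + tau * W + rng.normal(size=n)

mu1 = LassoCV(cv=5).fit(X[W == 1], Y[W == 1])  # fitted response under treatment
mu0 = LassoCV(cv=5).fit(X[W == 0], Y[W == 0])  # fitted response under control
m1, m0 = mu1.predict(X), mu0.predict(X)
# Augmented (doubly-robust style) estimator; 2 = 1/propensity = 1/(1/2):
tau_hat = np.mean(m1 - m0) \
    + np.mean(np.where(W == 1, 2 * (Y - m1), 0)) \
    - np.mean(np.where(W == 0, 2 * (Y - m0), 0))
print("adjusted ATE estimate:", tau_hat)
```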
Posted Content

Optimal Regularization Can Mitigate Double Descent

TL;DR: This work proves that for certain linear regression models with an isotropic data distribution, optimally tuned $\ell_2$ regularization achieves monotonic test performance as either the sample size or the model size grows, and demonstrates empirically that optimally tuned regularization can mitigate double descent for more general models, including neural networks.
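A minimal sketch of the claim (assumptions mine: isotropic Gaussian design, dense signal, a small $\lambda$ grid): at each model size, the best-tuned ridge risk stays roughly flat where the ridgeless risk spikes near $p = n$.

```python
# Sketch: tuning the ridge penalty removes the double-descent spike.
import numpy as np

rng = np.random.default_rng(3)
n, n_test, sigma = 100, 1000, 0.5
lams = [0.0, 1e-3, 1e-2, 1e-1, 1.0, 10.0]     # illustrative lambda grid

def test_risk(p, lam):
    beta = rng.normal(size=p) / np.sqrt(p)
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)
    if lam == 0.0:
        b = np.linalg.pinv(X) @ y              # ridgeless (min-norm) limit
    else:
        b = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
    X_test = rng.normal(size=(n_test, p))
    return np.mean((X_test @ (b - beta)) ** 2)

for p in (50, 90, 100, 110, 200, 400):
    risks = [test_risk(p, lam) for lam in lams]
    print(f"p={p:4d}  ridgeless={risks[0]:8.3f}  best-tuned={min(risks):.3f}")
```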
Posted Content

Benign overfitting in ridge regression

TL;DR: This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.
Posted Content

lassopack: Model selection and prediction with regularized regression in Stata

TL;DR: lassopack is a suite of Stata programs for regularized regression that implements the lasso, square-root lasso, elastic net, ridge regression, and post-estimation OLS.
References
Journal Article

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Book

An Introduction to Multivariate Statistical Analysis

TL;DR: This book develops the distribution of the mean vector and the covariance matrix and of the generalized T2-statistic, and treats testing the independence of sets of variates.
Journal Article

Ridge regression: biased estimation for nonorthogonal problems

TL;DR: This paper proposes ridge regression, an estimation procedure based on adding small positive quantities to the diagonal of X′X, together with the ridge trace, a method for showing in two dimensions the effects of nonorthogonality.
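A minimal sketch of the estimator described above, with illustrative data: the ridge estimate is obtained by adding a positive constant k to the diagonal of X′X, and sweeping k traces out the coefficient paths (the ridge trace).

```python
# Ridge estimation by augmenting the diagonal of X'X, plus a tiny
# demonstration on nearly collinear (nonorthogonal) columns.
import numpy as np

def ridge(X, y, k):
    """Solve (X'X + kI) beta = X'y for a given ridge constant k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=50)])  # nearly collinear
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)
for k in (0.0, 0.01, 0.1, 1.0):
    print(k, ridge(X, y, k))   # coefficients stabilize as k grows (ridge trace)
```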
Book Chapter

On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities

TL;DR: This chapter reproduces B. Seckler's English translation of the paper in which Vapnik and Chervonenkis gave proofs for the innovative results they had obtained in draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Journal Article

The Dantzig selector: Statistical estimation when p is much larger than n

TL;DR: In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n; this paper shows that the proposed Dantzig selector can nevertheless estimate β reliably from the noisy data y.
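The Dantzig selector minimizes $\|\beta\|_1$ subject to $\|X^\top(y - X\beta)\|_\infty \le \lambda$, which is a linear program. Below is a hedged sketch (not the authors' implementation; problem sizes and $\lambda$ are illustrative) using the standard split $\beta = u - v$ with $u, v \ge 0$.

```python
# LP sketch of the Dantzig selector via scipy's linprog.
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    n, p = X.shape
    G, g = X.T @ X, X.T @ y
    c = np.ones(2 * p)                         # objective: sum(u) + sum(v)
    A_ub = np.vstack([np.hstack([G, -G]),      #  G(u - v) <= lam + X'y
                      np.hstack([-G, G])])     # -G(u - v) <= lam - X'y
    b_ub = np.concatenate([lam + g, lam - g])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    assert res.success
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(5)
n, p = 50, 200                                 # p much larger than n
beta = np.zeros(p); beta[:5] = 3.0             # sparse ground truth
X = rng.normal(size=(n, p)) / np.sqrt(n)       # roughly unit-norm columns
y = X @ beta + 0.1 * rng.normal(size=n)
beta_hat = dantzig_selector(X, y, lam=0.4)
print("largest recovered coefficients:", np.sort(np.abs(beta_hat))[-5:])
```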