Open Access · Posted Content
High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification
Edgar Dobriban, Stefan Wager +1 more
TLDR
This paper gives a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model, in a high-dimensional asymptotic regime, and finds that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix.
Abstract:
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where $p, n \to \infty$ and $p/n \to \gamma \in (0, \, \infty)$, and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength, and the aspect ratio $\gamma$. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover several qualitative insights about both methods: for example, with ridge regression, there is an exact inverse relation between the limiting predictive risk and the limiting estimation risk given a fixed signal strength. Our analysis builds on recent advances in random matrix theory.
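As a rough numerical illustration of this setting, the limiting behavior can be probed by simulation. The sketch below (Python, assuming an isotropic covariance $\Sigma = I$ and hypothetical parameter choices) draws a dense random-effects signal, fits ridge regression at the penalty that is Bayes-optimal for that isotropic prior, and estimates the predictive risk by Monte Carlo. It is a sanity check of the setup, not the paper's closed-form risk expressions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_predictive_risk(n=400, gamma=2.0, alpha2=1.0, sigma2=1.0,
                          lam=2.0, reps=20):
    """Monte Carlo estimate of the out-of-sample (predictive) risk of ridge
    regression in a dense random effects model with isotropic features.
    Parameter names here are illustrative choices, not the paper's notation.
    The default lam = gamma * sigma2 / alpha2 is the Bayes-optimal penalty
    for this isotropic prior under the n*lam*||b||^2 penalty scaling."""
    p = int(gamma * n)  # aspect ratio gamma = p / n
    risks = []
    for _ in range(reps):
        # dense random effects: every coefficient is small but nonzero
        beta = rng.normal(scale=np.sqrt(alpha2 / p), size=p)
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
        # ridge estimator: argmin ||y - Xb||^2 + n * lam * ||b||^2
        bhat = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
        # predictive risk at a fresh point x ~ N(0, I):
        # E[(x' bhat - x' beta - eps)^2] = ||bhat - beta||^2 + sigma2
        risks.append(np.sum((bhat - beta) ** 2) + sigma2)
    return float(np.mean(risks))
```

For these settings the estimated risk lands between the noise floor $\sigma^2 = 1$ and the null risk $\alpha^2 + \sigma^2 = 2$ (the risk of predicting zero, approached as the penalty grows and the estimator shrinks to zero).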
Citations
Posted Content
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
TL;DR: This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
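The double-descent shape is easy to reproduce in a toy version of the same dense model. The hypothetical sketch below (Python, isotropic features, minimum-norm least squares via the pseudoinverse) estimates test risk at a few aspect ratios and exhibits the spike near the interpolation threshold $p \approx n$; it is an illustration, not the cited paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def minnorm_risk(gamma, n=200, alpha2=1.0, sigma2=1.0, reps=30):
    """Monte Carlo test risk of the minimum-norm ("ridgeless") least-squares
    estimator at aspect ratio gamma = p/n, with isotropic features and a
    dense random-effects signal. Illustrative parameter choices."""
    p = max(1, int(gamma * n))
    out = []
    for _ in range(reps):
        beta = rng.normal(scale=np.sqrt(alpha2 / p), size=p)
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
        # pinv gives OLS when p < n and the min-norm interpolator when p >= n
        bhat = np.linalg.pinv(X) @ y
        out.append(np.sum((bhat - beta) ** 2) + sigma2)
    return float(np.mean(out))
```

Risk is modest for $p \ll n$, blows up near $\gamma = 1$ where the design matrix becomes barely invertible, and descends again in the overparametrized regime.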
Journal Article
High-dimensional regression adjustments in randomized experiments
TL;DR: This work studies the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and shows that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect.
Posted Content
Optimal Regularization Can Mitigate Double Descent
TL;DR: This work proves that for certain linear regression models with isotropic data distribution, optimally-tuned $\ell_2$ regularization achieves monotonic test performance as either the sample size or the model size grows, and demonstrates empirically that optimally-tuned regularization can mitigate double descent for more general models, including neural networks.
Posted Content
Benign overfitting in ridge regression
TL;DR: This work provides non-asymptotic generalization bounds for overparametrized ridge regression that depend on the arbitrary covariance structure of the data, and shows that those bounds are tight for a range of regularization parameter values.
Posted Content
lassopack: Model selection and prediction with regularized regression in Stata
TL;DR: lassopack is a suite of programs for regularized regression in Stata; it implements the lasso, square-root lasso, elastic net, ridge regression, and post-estimation OLS.
References
Journal Article
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei +11 more
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Book
An Introduction to Multivariate Statistical Analysis
TL;DR: This book covers the distribution of the mean vector and the covariance matrix, the generalized $T^2$-statistic, and testing independence of sets of variates.
Journal Article
Ridge regression: biased estimation for nonorthogonal problems
TL;DR: This paper proposes an estimation procedure based on adding small positive quantities to the diagonal of X′X, together with the ridge trace, a method for showing in two dimensions the effects of nonorthogonality.
Book Chapter
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
Journal Article
The Dantzig selector: Statistical estimation when p is much larger than n
Emmanuel J. Candès, Terence Tao +1 more
TL;DR: In many important statistical applications the number of variables or parameters p is much larger than the number of observations n; this paper shows that it is nevertheless possible to estimate β reliably from the noisy data y.
Related Papers (5)
The Effect of Spatial Autocorrelation on the Error Rates of the Linear Discriminant Functions
K. Dučinskas, J. Šaltytė +1 more