
Showing papers by "Jerome H. Friedman published in 2016"


Journal ArticleDOI
TL;DR: In this paper, the authors review several variance estimators and perform a reasonably extensive simulation study to compare their finite-sample performance; the results suggest that variance estimators with adaptively chosen regularisation parameters perform admirably over a broad range of sparsity and signal strength settings.
Abstract: Variance estimation in the linear model when p > n is a difficult problem. Standard least squares estimation techniques do not apply. Several variance estimators have been proposed in the literature, all with accompanying asymptotic results proving consistency and asymptotic normality under a variety of assumptions. It is found, however, that most of these estimators suffer large biases in finite samples when true underlying signals become less sparse with larger per element signal strength. One estimator seems to merit more attention than it has received in the literature: a residual sum of squares based estimator using Lasso coefficients with regularisation parameter selected adaptively (via cross-validation). In this paper, we review several variance estimators and perform a reasonably extensive simulation study in an attempt to compare their finite sample performance. It would seem from the results that variance estimators with adaptively chosen regularisation parameters perform admirably over a broad range of sparsity and signal strength settings. Finally, some initial theoretical analyses pertaining to these types of estimators are proposed and developed.
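The estimator singled out in the abstract can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' code: it forms the residual sum of squares from a Lasso fit with cross-validated regularisation parameter, and divides by n minus the number of selected coefficients. The data-generating settings (n, p, sparsity, signal strength) are arbitrary choices for the demo.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated p > n regression with a sparse true signal (illustrative settings).
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                              # first s coefficients carry signal
sigma = 1.5
y = X @ beta + sigma * rng.standard_normal(n)

# Lasso with regularisation parameter chosen adaptively via cross-validation.
lasso = LassoCV(cv=5).fit(X, y)
df = np.count_nonzero(lasso.coef_)          # number of selected predictors
rss = np.sum((y - lasso.predict(X)) ** 2)   # residual sum of squares
sigma2_hat = rss / (n - df)                 # RSS-based variance estimate
print(sigma2_hat)
```

Here sigma2_hat targets the noise variance sigma^2 = 2.25; the degrees-of-freedom correction n - df is the standard adjustment for the Lasso in this type of estimator.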

127 citations


Journal ArticleDOI
TL;DR: Wavelet-based gradient boosting takes advantage of the approximate $\ell_1$ penalization induced by gradient boosting to give appropriate penalized additive fits.
Abstract: A new data science tool named wavelet-based gradient boosting is proposed and tested. The approach is a special case of componentwise linear least squares gradient boosting, and involves wavelet functions of the original predictors. Wavelet-based gradient boosting takes advantage of the approximate $\ell_1$ penalization induced by gradient boosting to give appropriate penalized additive fits. The method is readily implemented in R and produces parsimonious and interpretable regression fits and classifiers.
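The base procedure the paper builds on, componentwise linear least squares gradient boosting, can be sketched as follows. This is a hedged illustration on raw predictors rather than a wavelet expansion of them; the function name and settings are hypothetical, and the paper's actual implementation is in R.

```python
import numpy as np

def componentwise_boost(X, y, n_steps=200, nu=0.1):
    """At each step, fit every single-predictor least squares model to the
    current residuals, keep the best one, and take a shrunken step toward it.
    The shrinkage (nu < 1) is what induces the approximate l1-type path."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    r = y - intercept                        # current residuals
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        b = X.T @ r / col_ss                 # univariate LS coefficient per column
        sse = ((r[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(sse)                   # best single component this step
        coef[j] += nu * b[j]                 # shrunken update
        r -= nu * b[j] * X[:, j]
    return intercept, coef

# Demo: two active predictors out of twenty (illustrative data).
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 20))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(80)
b0, b = componentwise_boost(X, y)
print(np.nonzero(np.abs(b) > 0.1)[0])        # indices of selected components
```

In the paper's method the columns of X would be wavelet basis evaluations of each original predictor, so the selected components form a parsimonious additive fit.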

8 citations


Posted Content
TL;DR: rCOSA is a software package interfaced to the R language that implements statistical techniques for clustering objects on subsets of attributes in multivariate data; the main output of COSA is a dissimilarity matrix that one can subsequently analyze with a variety of proximity analysis methods.
Abstract: rCOSA is a software package interfaced to the R language. It implements statistical techniques for clustering objects on subsets of attributes in multivariate data. The main output of COSA is a dissimilarity matrix that one can subsequently analyze with a variety of proximity analysis methods. Our package extends the original COSA software (Friedman and Meulman, 2004) by adding functions for hierarchical clustering methods, least squares multidimensional scaling, partitional clustering, and data visualization. In the many publications that cite the COSA paper by Friedman and Meulman (2004), the COSA program is actually used only a small number of times. This can be attributed to the fact that the original implementation is not very easy to install and use. Moreover, the available software is out-of-date. Here, we introduce an up-to-date software package and clear guidance for this advanced technique. The software package and related links are available for free at: \url{this https URL}
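The downstream workflow the abstract describes, taking a dissimilarity matrix and feeding it to a proximity analysis method such as hierarchical clustering, can be sketched generically. This is not rCOSA (which is an R package); the dissimilarity matrix here is an ordinary Euclidean stand-in for the attribute-subset dissimilarities COSA would produce.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of objects (illustrative data).
rng = np.random.default_rng(2)
A = np.vstack([rng.normal(0, 0.3, (10, 4)), rng.normal(3, 0.3, (10, 4))])

# Stand-in for a COSA output: a symmetric object-by-object dissimilarity matrix.
D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Proximity analysis on the dissimilarity matrix: average-linkage hierarchy,
# then cut the dendrogram into two clusters.
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

rCOSA additionally wraps multidimensional scaling and partitional clustering around the same dissimilarity matrix; the point of the sketch is only that the matrix, not the raw data, is the interface to all of them.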