
Showing papers by "Robert Tibshirani published in 1996"


Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
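The constrained form described above is equivalent to an L1 penalty on the coefficients, and the key computational fact is that the one-dimensional lasso solution is a soft-thresholding step, which sets small coefficients exactly to zero. A minimal coordinate-descent sketch of the penalized form (numpy only, synthetic data; this is a modern solver, not the quadratic-programming algorithm of the original paper):

```python
import numpy as np

def soft_threshold(z, t):
    # Scalar lasso solution: shrink z toward zero, clipping at zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cycling over
    coordinates, applying the soft-threshold update to each in turn."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta
```

On data generated from a sparse model, the irrelevant coefficients come back exactly zero, which is the interpretability property the abstract emphasizes.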

40,785 citations


Journal ArticleDOI
TL;DR: A locally adaptive form of nearest neighbor classification is proposed to ameliorate the curse of dimensionality, along with a method for global dimension reduction that combines local dimension information.
Abstract: Nearest neighbour classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbour classification to try to ameliorate this curse of dimensionality. We use a local linear discriminant analysis to estimate an effective metric for computing neighbourhoods. We determine the local decision boundaries from centroid information, and then shrink neighbourhoods in directions orthogonal to these local decision boundaries, and elongate them parallel to the boundaries. Thereafter, any neighbourhood-based classifier can be employed, using the modified neighbourhoods. The posterior probabilities tend to be more homogeneous in the modified neighbourhoods. We also propose a method for global dimension reduction, that combines local dimension information. In a number of examples, the methods demonstrate the potential for substantial improvements over nearest neighbour classification.
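The local metric step can be sketched as follows (numpy only; the function name `dann_metric`, the `eps` regularizer, and the test data are illustrative, not the paper's exact estimator). Within-class scatter W and between-class scatter B are estimated from the k nearest points, and the resulting metric stretches distances in directions orthogonal to the local decision boundary while leaving directions parallel to it relatively unchanged:

```python
import numpy as np

def dann_metric(X, y, x0, k=50, eps=1.0):
    """Local metric around x0 from its k nearest points: whiten by the
    within-class scatter W, add the whitened between-class scatter plus
    eps * I, and map back.  Distances in this metric grow across the
    local decision boundary and shrink along it."""
    order = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    Xn, yn = X[order], y[order]
    d = X.shape[1]
    mu = Xn.mean(axis=0)
    W = np.zeros((d, d))
    B = np.zeros((d, d))
    for c in np.unique(yn):
        Xc = Xn[yn == c]
        muc = Xc.mean(axis=0)
        W += (Xc - muc).T @ (Xc - muc)
        B += len(Xc) * np.outer(muc - mu, muc - mu)
    W /= k
    B /= k
    # W^(-1/2) via its eigendecomposition (W is symmetric PSD).
    evals, evecs = np.linalg.eigh(W)
    W_ih = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-8))) @ evecs.T
    B_star = W_ih @ B @ W_ih  # between-class scatter in whitened space
    return W_ih @ (B_star + eps * np.eye(d)) @ W_ih
```

Any neighbourhood-based classifier can then rank points by the quadratic distance `(x - x0) @ Sigma @ (x - x0)` instead of the Euclidean one.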

908 citations


Journal ArticleDOI
TL;DR: This paper fits Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered.
Abstract: Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered. Low dimensional views are an important by-product of LDA-our new techniques inherit this feature. We can control the within-class spread of the subclass centres relative to the between-class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA.
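Once a mixture has been fitted to each class, the classification rule is a straight likelihood comparison. A sketch (numpy only; spherical components with a shared variance are a simplification for illustration; the paper fits the mixtures with an EM-type algorithm and allows richer covariance structure):

```python
import numpy as np

def mixture_density(x, weights, means, var):
    """Density at x of a spherical Gaussian mixture with shared variance."""
    d = len(x)
    sq = ((means - x) ** 2).sum(axis=1)          # squared distance to each centre
    comp = np.exp(-0.5 * sq / var) / (2.0 * np.pi * var) ** (d / 2.0)
    return float(weights @ comp)

def mda_classify(x, class_mixtures, priors):
    """Assign x to the class with the largest prior * mixture likelihood."""
    scores = [p * mixture_density(x, *mix)
              for mix, p in zip(class_mixtures, priors)]
    return int(np.argmax(scores))
```

Because each class can have several subclass centres, a class whose data form two clusters can "surround" another class, which a single-Gaussian LDA rule cannot represent.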

791 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of combining a collection of general regression fit vectors to obtain a better predictive model and develop a general framework for this problem and examine a cross-validation-based proposal called "model mix" or "stacking" in this context.
Abstract: We consider the problem of how to combine a collection of general regression fit vectors to obtain a better predictive model. The individual fits may be from subset linear regression, ridge regression, or something more complex like a neural network. We develop a general framework for this problem and examine a cross-validation-based proposal called “model mix” or “stacking” in this context. We also derive combination methods based on the bootstrap and analytic methods and compare them in examples. Finally, we apply these ideas to classification problems where the estimated combination weights can yield insight into the structure of the problem.
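The cross-validation step behind "model mix"/stacking can be sketched generically (numpy only; the two fitting procedures and the unconstrained least-squares combination are illustrative; stacking as usually recommended constrains the weights, e.g. to be nonnegative):

```python
import numpy as np

def stacking_weights(fits, X, y, n_folds=5):
    """Build out-of-fold predictions Z[i, m] for each fitting procedure,
    then regress y on Z to obtain the combination weights."""
    n = len(y)
    idx = np.arange(n)
    Z = np.zeros((n, len(fits)))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        for m, fit in enumerate(fits):
            predict = fit(X[train], y[train])   # fit without the held-out fold
            Z[fold, m] = predict(X[fold])       # predict the held-out fold
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w

def fit_mean(Xtr, ytr):
    """Baseline: predict the training mean everywhere."""
    mu = ytr.mean()
    return lambda Xte: np.full(len(Xte), mu)

def fit_ols(Xtr, ytr):
    """Ordinary least squares without intercept."""
    b, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return lambda Xte: Xte @ b
```

Using out-of-fold rather than in-sample predictions is the point of the construction: it stops the combination from simply rewarding whichever fit overfits the training data most.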

318 citations


Journal ArticleDOI
TL;DR: A number of methods for estimating the standard error of predicted values from a multilayer perceptron are discussed, including the delta method based on the Hessian, bootstrap estimators, and the sandwich estimator.
Abstract: We discuss a number of methods for estimating the standard error of predicted values from a multilayer perceptron. These methods include the delta method based on the Hessian, bootstrap estimators, and the “sandwich” estimator. The methods are described and compared in a number of examples. We find that the bootstrap methods perform best, partly because they capture variability due to the choice of starting weights.
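The bootstrap estimator the abstract favours can be sketched generically (numpy only; the `fit` callable is a stand-in for any training procedure; wrapping a multilayer perceptron whose training re-randomizes the starting weights would also capture that source of variability, as the paper notes):

```python
import numpy as np

def bootstrap_pred_se(fit, X, y, x_new, n_boot=200, seed=0):
    """Bootstrap standard error of a model's prediction at x_new:
    refit on resampled (x_i, y_i) pairs and take the standard
    deviation of the resulting predictions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)        # resample pairs with replacement
        predict = fit(X[idx], y[idx])
        preds[b] = predict(x_new)
    return preds.std(ddof=1)
```

Unlike the Hessian-based delta method, this treats the training procedure as a black box, which is what lets it pick up starting-weight variability.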

287 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose hybrid density estimators that combine maximum likelihood fitting within a parametric family (such as the normal) with nonparametric methods (such as kernel density estimation), by putting an exponential family "through" a nonparametric estimator.
Abstract: Most of the examples in this paper have Y being portions of the real line or of the plane, but the methodology applies just as well to higher dimensionalities and to more complicated spaces. Estimates of g(y) are traditionally constructed in two quite different ways: by maximum likelihood fitting within some parametric family, such as the normal, or by nonparametric methods, such as kernel density estimation. These two methods can be combined by putting an exponential family "through" a nonparametric estimator. The resulting hybrid estimators are the specially designed exponential families of the title. Figure 1 shows a simple example of this methodology. The y_i are pain scores for n = 67 women, each obtained by averaging the results from a questionnaire administered after an operation. The scale runs from y = 0 …
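The hybrid construction can be written out as an exponential tilt of the nonparametric estimate (a sketch in standard exponential-family notation, with g0 the nonparametric carrier and t(y) a vector of sufficient statistics; the symbols are the usual conventions rather than the paper's exact notation):

```latex
g_\theta(y) = g_0(y)\,\exp\{t(y)^\top \theta - \psi(\theta)\},
\qquad
\psi(\theta) = \log \int g_0(y)\,\exp\{t(y)^\top \theta\}\,dy .
```

Setting θ = 0 recovers the nonparametric estimate g0 (for example, a kernel density), while a choice such as t(y) = (y, y²) tilts g0 toward a normal-like parametric correction; ψ(θ) is the normalizer that keeps g_θ a density.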

174 citations


16 Sep 1996
TL;DR: This manual describes the preliminary release of the DELVE environment, and recommends that you exercise caution when using this version of DELVE for real work, as it is possible that bugs remain in the software.
Abstract: This manual describes the preliminary release of the DELVE environment. Some features described here have not yet been implemented, as noted. Support for regression tasks is presently somewhat more developed than that for classification tasks. We recommend that you exercise caution when using this version of DELVE for real work, as it is possible that bugs remain in the software. We hope that you will send us reports of any problems you encounter, as well as any other comments you may have on the software or manual, at the e-mail address below. Please mention the version number of the manual and/or the software with any comments you send.

All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for non-commercial purposes only is hereby granted without fee, provided that the above copyright notice appears in all copies and that both the copyright notice and this permission notice appear in supporting documentation, and that the name of The University of Toronto not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. The University of Toronto makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. The University of Toronto disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall the University of Toronto be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.

If you publish results obtained using DELVE, please cite this manual, and mention the version number of the software that you used.

79 citations


Journal ArticleDOI
TL;DR: In this article, a BCa-type bootstrap procedure for setting approximate prediction intervals for an efficient estimator θm of a scalar parameter θ, based on a future sample of size m, is investigated.
Abstract: We investigate the construction of a BCa-type bootstrap procedure for setting approximate prediction intervals for an efficient estimator θm of a scalar parameter θ, based on a future sample of size m. The results are also extended to nonparametric situations, which can be used to form bootstrap prediction intervals for a large class of statistics. These intervals are transformation-respecting and range-preserving. The asymptotic performance of our procedure is assessed by allowing both the past and future sample sizes to tend to infinity. The resulting intervals are then shown to be second-order correct and second-order accurate. These second-order properties are established in terms of min(m, n), and not the past sample size n alone.
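The simpler percentile version of such an interval can be sketched as follows (numpy only; this is a plain percentile construction for the mean of a future sample, not the BCa-corrected, transformation-respecting procedure the paper develops):

```python
import numpy as np

def pred_interval_mean(past, m, alpha=0.1, n_boot=2000, seed=0):
    """Percentile-bootstrap prediction interval for the mean of a future
    sample of size m: resample the past sample to mimic estimation error,
    then draw a bootstrap future sample from each resample and record the
    difference of the two means."""
    rng = np.random.default_rng(seed)
    n = len(past)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        star = past[rng.integers(0, n, n)]      # bootstrap past sample
        future = star[rng.integers(0, n, m)]    # bootstrap future sample
        diffs[b] = future.mean() - star.mean()
    lo, hi = np.quantile(diffs, [alpha / 2.0, 1.0 - alpha / 2.0])
    theta_n = past.mean()
    return theta_n + lo, theta_n + hi
```

The two resampling stages are what make the interval widen in both 1/m and 1/n, echoing the paper's point that accuracy is governed by min(m, n) rather than by n alone.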

15 citations