
Showing papers by "Trevor Hastie published in 2004"


Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

7,828 citations
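
As a quick illustration of the path computation described above, here is a minimal sketch using scikit-learn's lars_path, an independent open-source implementation of the LARS/Lasso modification; the synthetic data and all settings are illustrative assumptions rather than anything from the paper.

# Full Lasso coefficient path via the LARS modification, computed with
# scikit-learn's lars_path on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

# method="lasso" applies the simple LARS modification that yields every
# Lasso solution; alphas are the breakpoints of the piecewise-linear path.
alphas, active, coefs = lars_path(X, y, method="lasso")

print("number of path breakpoints:", len(alphas))
print("order in which variables entered the active set:", active)
print("coefficients at the least-regularized end:", np.round(coefs[:, -1], 2))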


Journal ArticleDOI
TL;DR: The authors' rejoinder to "Least angle regression" by Efron et al. [math.ST/0406456] is presented.
Abstract: Rejoinder to "Least angle regression" by Efron et al. [math.ST/0406456]

1,237 citations


Journal Article
TL;DR: An algorithm is derived that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.

699 citations
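
The exact path-following algorithm is not sketched here; the code below merely sweeps a grid of cost values with cross-validation to make the paper's motivating point, that relying on a single default value of the cost parameter can be misleading. The data and settings are synthetic assumptions.

# Illustrative sketch only: approximate the idea of exploring all cost values
# with a coarse grid of C values and 5-fold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

for C in np.logspace(-3, 3, 7):          # cost (regularization) parameter
    scores = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5)
    print(f"C = {C:9.3f}   5-fold CV accuracy = {scores.mean():.3f}")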


Journal ArticleDOI
TL;DR: Least Angle Regression (LARS), as discussed by the authors, is a new model selection algorithm serving the same purpose as All Subsets, Forward Selection and Backward Elimination, and is a useful and less greedy version of traditional forward selection methods.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method;

547 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem and showed that when using the same set of genes, PLR and SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability.
Abstract: Classification of patient samples is an important aspect of cancer diagnosis and treatment. The support vector machine (SVM) has been successfully applied to microarray cancer diagnosis problems. However, one weakness of the SVM is that given a tumor sample, it only predicts a cancer class label but does not provide any estimate of the underlying probability. We propose penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem. We show that when using the same set of genes, PLR and the SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability. Often a primary goal in microarray cancer diagnosis is to identify the genes responsible for the classification, rather than class prediction. We consider two gene selection methods in this paper, univariate ranking (UR) and recursive feature elimination (RFE). Empirical results indicate that PLR combined with RFE tends to select fewer genes than other methods and also performs well in both cross-validation and test samples. A fast algorithm for solving PLR is also described.

383 citations
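
A minimal sketch of the PLR-plus-RFE combination using off-the-shelf tools is shown below: L2-penalized (ridge) logistic regression wrapped in recursive feature elimination. The synthetic expression matrix, penalty strength and number of retained genes are assumptions for illustration, not the paper's data or exact algorithm.

# Penalized (L2) logistic regression with recursive feature elimination.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# 100 "samples" x 1000 "genes", only a handful informative
X, y = make_classification(n_samples=100, n_features=1000, n_informative=10,
                           random_state=0)

plr = LogisticRegression(penalty="l2", C=0.1, max_iter=5000)
selector = RFE(plr, n_features_to_select=20, step=0.1).fit(X, y)

print("selected feature indices:", np.flatnonzero(selector.support_))
# Unlike an SVM decision value, the fitted PLR provides class probabilities:
print("P(class 1) for the first 5 samples:",
      np.round(selector.predict_proba(X[:5])[:, 1], 3))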


Journal Article
TL;DR: Building on recent work by Efron et al., it is shown that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector, and that as the constraint is relaxed the solution converges (in the separable case) to an "l1-optimal" separating hyper-plane.
Abstract: In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector. This helps understand the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed---or equivalently as the boosting iterations proceed---the solution converges (in the separable case) to an "l1-optimal" separating hyper-plane. We prove that this l1-optimal separating hyper-plane has the property of maximizing the minimal l1-margin of the training data, as defined in the boosting literature. An interesting fundamental similarity between boosting and kernel support vector machines emerges, as both can be described as methods for regularized optimization in high-dimensional predictor space, using a computational trick to make the calculation practical, and converging to margin-maximizing solutions. While this statement describes SVMs exactly, it applies to boosting only approximately.

289 citations
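
The sketch below illustrates the connection in its simplest setting, squared-error loss with coordinate "weak learners": an epsilon-step forward-stagewise procedure (boosting with tiny steps and early stopping) produces coefficients close to the Lasso solution of matching l1 norm. The paper's analysis concerns the exponential and binomial losses; this squared-error toy example and its settings are simplifying assumptions.

# Epsilon forward stagewise (boosting-like) vs. the Lasso path on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=200, n_features=8, n_informative=5,
                       noise=10.0, random_state=1)
X = (X - X.mean(0)) / X.std(0)
y = y - y.mean()

# Repeatedly nudge the coefficient of the predictor most correlated with the
# current residual; stopping early corresponds to an l1 constraint.
eps, n_steps = 0.01, 4000
beta = np.zeros(X.shape[1])
for _ in range(n_steps):
    corr = X.T @ (y - X @ beta)
    j = np.argmax(np.abs(corr))
    beta[j] += eps * np.sign(corr[j])

# Compare against the Lasso solution whose l1 norm matches the stagewise fit.
_, _, coefs = lars_path(X, y, method="lasso")
k = np.argmin(np.abs(np.abs(coefs).sum(0) - np.abs(beta).sum()))
print("stagewise (boosting-like):", np.round(beta, 2))
print("lasso at matching l1 norm:", np.round(coefs[:, k], 2))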


Journal ArticleDOI
TL;DR: The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data and performs as well or better than several methods that require the full spectra, rather than just labelled peaks.
Abstract: Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and ultimately higher cure rates with less treatment-related morbidities. Protein mass spectrometry is a potentially powerful tool for early cancer detection. We propose a novel method for sample classification from protein mass spectrometry data. When applied to spectra from both diseased and healthy patients, the 'peak probability contrast' technique provides a list of all common peaks among the spectra, their statistical significance and their relative importance in discriminating between the two groups. We illustrate the method on matrix-assisted laser desorption and ionization mass spectrometry data from a study of ovarian cancers. Results: Compared to other statistical approaches for class prediction, the peak probability contrast method performs as well or better than several methods that require the full spectra, rather than just labelled peaks. It is also much more interpretable biologically. The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data. Supplementary Information: http://www.stat.stanford.edu/~tibs/ppc

218 citations
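
To convey the flavor of the peak probability contrast, the toy sketch below compares, for each labelled peak, the probability that its height exceeds a cut-off in the diseased versus healthy groups; the cut-off rule, significance assessment and classification step of the actual method are not reproduced here, and the data are simulated.

# Toy version of a per-peak probability contrast on simulated peak heights.
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_peaks = 40, 30
healthy = rng.lognormal(mean=0.0, sigma=1.0, size=(n_per_class, n_peaks))
diseased = rng.lognormal(mean=0.0, sigma=1.0, size=(n_per_class, n_peaks))
diseased[:, :3] *= 2.5            # three peaks are truly elevated in disease

heights = np.vstack([healthy, diseased])
labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]

def peak_contrast(x, y, quantiles=(0.25, 0.5, 0.75)):
    """Max over cut-offs of |P(peak > cut | diseased) - P(peak > cut | healthy)|."""
    cuts = np.quantile(x, quantiles)
    p1 = np.array([(x[y == 1] > c).mean() for c in cuts])
    p0 = np.array([(x[y == 0] > c).mean() for c in cuts])
    return np.max(np.abs(p1 - p0))

contrasts = np.array([peak_contrast(heights[:, j], labels) for j in range(n_peaks)])
print("top 5 peaks by probability contrast:", np.argsort(contrasts)[::-1][:5])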


Journal ArticleDOI
TL;DR: In this article, the authors provide improvements in semiparametric time series models of air pollution and health, whose findings were a critical component of the evidence used in the PM Criteria Document.
Abstract: In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the U. S. Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiologic evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S–PLUS implementation of generalized additive models (GAMs) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed completion of the PM Criteria Document prepared as part of the review of the U. S. National Ambient Air Quality Standard, because the time series findings represented a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations. In this article we provide improvements in semipara...

180 citations
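
One remedy recommended in this literature is to replace nonparametric smoothers fitted by backfitting with fully parametric spline bases inside an ordinary Poisson GLM, which sidesteps the convergence and standard-error problems. The sketch below illustrates that style of model with natural cubic regression splines (patsy's cr()) in statsmodels; the simulated mortality series, spline degrees of freedom and variable names are assumptions, not the authors' analysis.

# Poisson time series regression with parametric spline terms for time and
# temperature, as an alternative to nonparametric GAM smoothers.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000                                  # days of follow-up
daily = pd.DataFrame({
    "time": np.arange(n, dtype=float),
    "temp": 15 + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 2, n),
    "pm10": rng.gamma(shape=4.0, scale=8.0, size=n),
})
log_mu = 3.0 + 0.1 * np.sin(2 * np.pi * daily["time"] / 365) + 0.0005 * daily["pm10"]
daily["deaths"] = rng.poisson(np.exp(log_mu))

# Smooth terms enter as parametric natural cubic spline bases, so ordinary GLM
# fitting applies: no backfitting, and the usual GLM standard errors.
fit = smf.glm("deaths ~ cr(time, df=8) + cr(temp, df=5) + pm10",
              data=daily, family=sm.families.Poisson()).fit()
print(fit.summary().tables[1])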


01 Jan 2004
TL;DR: These SCRDA methods generalize the idea of the nearest shrunken centroids of Prediction Analysis of Microarrays (PAM) into classical discriminant analysis, perform uniformly well in multivariate classification problems, and in particular outperform the currently popular PAM.
Abstract: In this paper, we introduce a family of modified versions of linear discriminant analysis, called "shrunken centroids regularized discriminant analysis" (SCRDA). These methods generalize the idea of the nearest shrunken centroids of Prediction Analysis of Microarrays (PAM) into classical discriminant analysis. The SCRDA methods are specially designed for classification problems in high-dimension, low-sample-size situations, for example microarray data. Through both simulation studies and real-life data, we show that the SCRDA methods perform uniformly well in multivariate classification problems, and in particular outperform the currently popular PAM. Some of them are also suitable for feature elimination and can be used as gene selection methods. Open-source R code for these methods is available and will be added to the R libraries in the near future.

119 citations
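
SCRDA itself is distributed as R code; as a rough illustration of the two ingredients it combines, the sketch below compares PAM-style nearest shrunken centroids (NearestCentroid with a shrink threshold) and covariance-regularized LDA on synthetic high-dimension, low-sample-size data. This is not the authors' algorithm, only its two building blocks, and all settings are assumptions.

# The two ingredients of SCRDA, run separately on synthetic "microarray" data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid

X, y = make_classification(n_samples=80, n_features=2000, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

pam_like = NearestCentroid(shrink_threshold=0.5)               # shrunken centroids
rda_like = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # regularized LDA

for name, clf in [("shrunken centroids", pam_like), ("regularized LDA", rda_like)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:20s} 5-fold CV accuracy: {acc:.3f}")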


Journal ArticleDOI
TL;DR: This article exposes a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks, and shows that dramatic computational savings are possible over naive implementations.
Abstract: Gene expression arrays typically have 50 to 100 samples and 1000 to 20 000 variables (genes). There have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks. For all of these models, we show that dramatic computational savings are possible over naive implementations, using standard transformations in numerical linear algebra.

119 citations
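
The heart of the computational savings is that, with a quadratic penalty and many more variables than samples, the fit can be carried out in the n-dimensional space spanned by the samples. The sketch below demonstrates this for ridge regression: the naive p-dimensional solve and the reduced n-dimensional solve via the singular value decomposition give the same coefficients. The problem sizes are illustrative only.

# Ridge regression with p >> n: naive p x p solve vs. reduced n x n solve.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 2000, 1.0
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=n)

# naive ridge: solve a p x p linear system (expensive when p is large)
beta_naive = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# reduced ridge: with X = U D V', work with R = U D, an n x n problem,
# then map the n-dimensional solution back to p dimensions via V
U, d, Vt = np.linalg.svd(X, full_matrices=False)
R = U * d
theta = np.linalg.solve(R.T @ R + lam * np.eye(n), R.T @ y)
beta_svd = Vt.T @ theta

print("max |difference| between the two solutions:",
      np.max(np.abs(beta_naive - beta_svd)))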


Journal ArticleDOI
TL;DR: Evidence of somatotopic organization in the STN in patients with PD supports the current theory of highly segregated loops integrating cortex-basal ganglia connections that are preserved in chronic degenerative diseases such as PD, but may subserve a distorted body map.
Abstract: Object. The subthalamic nucleus (STN) is a key structure for motor control through the basal ganglia. The aim of this study was to show that the STN in patients with Parkinson disease (PD) has a somatotopic organization similar to that in nonhuman primates. Methods. A functional map of the STN was obtained using electrophysiological microrecording during placement of deep brain stimulation (DBS) electrodes in patients with PD. Magnetic resonance imaging was combined with ventriculography and intraoperative x-ray film to assess the position of the electrodes and the STN units, which were activated by limb movements to map the sensorimotor region of the STN. Each activated cell was located relative to the anterior commissure—posterior commissure line. Three-dimensional coordinates of the cells were analyzed statistically to determine whether those cells activated by movements of the arm and leg were segregated spatially. Three hundred seventy-nine microelectrode tracks were created during placement of 71 DB...

Proceedings ArticleDOI
21 Jul 2004
TL;DR: This work extends an existing procedure by re-interpreting it as a Naive Bayes model for document sentiment, incorporating additional derived features into the model and, where possible, using labeled data to estimate their relative influence.
Abstract: Sentiment classification is the task of labeling a review document according to the polarity of its prevailing opinion (favorable or unfavorable). In approaching this problem, a model builder often has three sources of information available: a small collection of labeled documents, a large collection of unlabeled documents, and human understanding of language. Ideally, a learning method will utilize all three sources. To accomplish this goal, we generalize an existing procedure that uses the latter two. We extend this procedure by re-interpreting it as a Naive Bayes model for document sentiment. Viewed as such, it can also be seen to extract a pair of derived features that are linearly combined to predict sentiment. This perspective allows us to improve upon previous methods, primarily through two strategies: incorporating additional derived features into the model and, where possible, using labeled data to estimate their relative influence.
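
A much-simplified sketch of the general idea is shown below: two derived sentiment features (counts of words from positive and negative seed lists, which a larger unlabeled corpus could be used to expand) are linearly combined, with the weights estimated from the labeled documents. The seed lists, documents and labels are toy assumptions, not the paper's data or exact model.

# Derived sentiment features linearly combined, weights fit on labeled data.
import numpy as np
from sklearn.linear_model import LogisticRegression

positive_seeds = {"good", "great", "excellent", "enjoyable"}
negative_seeds = {"bad", "poor", "boring", "awful"}

labeled_docs = [
    ("a great and enjoyable film with excellent acting", 1),
    ("good fun, great pacing", 1),
    ("boring plot and poor dialogue", 0),
    ("an awful, boring mess", 0),
]

def derived_features(text):
    words = text.lower().split()
    return [sum(w in positive_seeds for w in words),
            sum(w in negative_seeds for w in words)]

X = np.array([derived_features(doc) for doc, _ in labeled_docs])
y = np.array([label for _, label in labeled_docs])

# the labeled data estimate the relative influence of the two derived features
clf = LogisticRegression().fit(X, y)
print("weights (positive-count, negative-count):", clf.coef_[0])
print("P(favorable) for 'a good but boring film':",
      clf.predict_proba([derived_features("a good but boring film")])[0, 1])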

Proceedings Article
01 Dec 2004
TL;DR: In semi-supervised learning where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features), a method of moments is proposed for estimating this stochastic dependence using the unlabeled data.
Abstract: We consider the situation in semi-supervised learning, where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: a. As an input to a supervised learning procedure which can be used to "de-bias" its results using labeled data only and b. As a potentially interesting learning task in itself. We present several examples to illustrate the practical usefulness of our method.
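
A toy illustration of the idea follows: when the chance of being labeled depends on the true class, class-conditional feature means can still be estimated from the labeled data, and matching them to the overall (labeled plus unlabeled) feature mean gives moment estimates of the true class proportions, and hence of the biased label-sampling rates. All numbers below are simulated and not from the paper.

# Method-of-moments estimate of class proportions under biased label sampling.
import numpy as np

rng = np.random.default_rng(0)
n, pi1 = 20000, 0.30                       # true P(y = 1)
mu0, mu1 = 0.0, 2.0                        # class-conditional feature means
q = {0: 0.05, 1: 0.40}                     # P(labeled | y): depends on y

y = rng.binomial(1, pi1, size=n)
x = rng.normal(np.where(y == 1, mu1, mu0), 1.0)
labeled = rng.binomial(1, np.where(y == 1, q[1], q[0])).astype(bool)

# class-conditional means from labeled data (valid given sampling depends on y),
# overall mean from everyone (this is where the unlabeled data enter)
m0_hat = x[labeled & (y == 0)].mean()
m1_hat = x[labeled & (y == 1)].mean()
m_all = x.mean()

# moment equation: m_all = (1 - pi1) * m0 + pi1 * m1  ->  solve for pi1
pi1_hat = (m_all - m0_hat) / (m1_hat - m0_hat)
pi1_naive = y[labeled].mean()              # labeled-only estimate is badly biased
ratio_hat = (pi1_naive / (1 - pi1_naive)) * ((1 - pi1_hat) / pi1_hat)

print(f"true P(y=1) = {pi1:.2f}, moment estimate = {pi1_hat:.2f}, "
      f"naive labeled-only estimate = {pi1_naive:.2f}")
print(f"estimated labeling-rate ratio q1/q0 = {ratio_hat:.2f} (true {q[1]/q[0]:.2f})")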

Proceedings Article
01 Dec 2004
TL;DR: In this article, the authors argue that the choice of the SVM cost parameter can be critical and derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.

Journal Article
TL;DR: Tension on the leaflets in the edge-to-edge repair is determined primarily by MA SL size, and paradoxically is lower when the contractile state is enhanced, indicating that annular and/or LV dilatation increase stitch tension and may adversely affect durability of the repair if concomitant ring annuloplasty is not performed.
Abstract: Background and aim of the study Whilst increased 'Alfieri stitch' tension may reduce the durability of 'edge-to-edge' mitral repair, the factors affecting suture tension are unknown. In order to study hemodynamics and left ventricular (LV) and annular dynamics that determine suture tension, the central edge of the mitral leaflets was approximated with a miniature force transducer to measure leaflet tension (T) at the leaflet approximation point. Methods Eight sheep were studied under open-chest conditions immediately after surgical placement of a force transducer and implantation of radiopaque markers on the left ventricle and mitral annulus (MA). Hemodynamic variables were altered by two caval occlusion steps (deltaV1 and deltaV2) and dobutamine infusion. Three-dimensional marker coordinates were obtained by simultaneous biplane videofluoroscopy to measure LV volume, MA area (MAA) and septal-lateral (SL) annular dimension throughout the cardiac cycle. Results At baseline, peak Alfieri stitch tension (0.30 +/- 0.18 N) was observed 96 +/- 61 ms prior to end-diastole coincident with peak annular SL diameter (98 +/- 58 ms before end-diastole). Dobutamine infusion decreased suture tension (from 0.30 +/- 0.18 N to 0.20 +/- 0.12 N, p = 0.01), although peak systolic pressure increased significantly (138 +/- 19 versus 115 +/- 14 mmHg; p = 0.03). A regression model was fitted with the goal of interpreting the hemodynamic and geometric predictors of tension as their influence varied with time: Tt (N) = 0.1916 + 0.2115 x SL (cm) - 0.1996 x MAA/SL (cm2/cm) + ft x LVP (mmHg), where Tt is tension at any time during the cardiac cycle and ft is the time-varying coefficient of LVP. Conclusion Tension on the leaflets in the edge-to-edge repair is determined primarily by MA SL size, and paradoxically is lower when the contractile state is enhanced. This indicates that annular and/or LV dilatation increase stitch tension and may adversely affect durability of the repair if concomitant ring annuloplasty is not performed.
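
For concreteness, the snippet below simply evaluates the regression model quoted in the abstract to show how the geometric and hemodynamic terms combine. The abstract does not tabulate the time-varying coefficient ft, so the value used for it (and the SL, MAA and LVP inputs) are illustrative placeholders, not measurements from the study.

# Worked evaluation of T_t = 0.1916 + 0.2115*SL - 0.1996*(MAA/SL) + f_t*LVP.
def stitch_tension(sl_cm, maa_cm2, lvp_mmhg, f_t):
    """Predicted Alfieri stitch tension (N) at one instant in the cardiac cycle."""
    return 0.1916 + 0.2115 * sl_cm - 0.1996 * (maa_cm2 / sl_cm) + f_t * lvp_mmhg

# hypothetical instant: SL = 2.5 cm, MAA = 7.0 cm^2, LVP = 100 mmHg, f_t = 0.0005
print(f"T_t = {stitch_tension(2.5, 7.0, 100.0, 0.0005):.3f} N")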

Journal ArticleDOI
TL;DR: In this article, the authors discuss "Process consistency for AdaBoost" by Jiang, the Bayes-risk consistency of regularized boosting methods studied by Lugosi and Vayatis, and the statistical behavior and consistency of classification methods based on convex risk minimization studied by Zhang.
Abstract: Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and N. Vayatis; and "Statistical behavior and consistency of classification methods based on convex risk minimization" [ibid., 56-85] by T. Zhang. Includes rejoinders by the authors.