
Showing papers by "Trevor Hastie published in 2004"


Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

7,828 citations
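
As a quick illustration of the path computation described above, here is a minimal sketch using scikit-learn's lars_path, an independent open-source implementation of the LARS/Lasso modification; the synthetic data and all settings are illustrative assumptions rather than anything from the paper.

# Full Lasso coefficient path via the LARS modification, computed with
# scikit-learn's lars_path on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

# method="lasso" applies the simple LARS modification that yields every
# Lasso solution; alphas are the breakpoints of the piecewise-linear path.
alphas, active, coefs = lars_path(X, y, method="lasso")

print("number of path breakpoints:", len(alphas))
print("order in which variables entered the active set:", active)
print("coefficients at the least-regularized end:", np.round(coefs[:, -1], 2))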


Journal ArticleDOI
TL;DR: The authors' rejoinder to "Least angle regression" by Efron et al. [math.ST/0406456] is presented.
Abstract: Rejoinder to "Least angle regression" by Efron et al. [math.ST/0406456]

1,237 citations


Journal Article
TL;DR: An algorithm is derived that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.

699 citations
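
The exact path-following algorithm is not sketched here; the code below merely sweeps a grid of cost values with cross-validation to make the paper's motivating point, that relying on a single default value of the cost parameter can be misleading. The data and settings are synthetic assumptions.

# Illustrative sketch only: approximate the idea of exploring all cost values
# with a coarse grid of C values and 5-fold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

for C in np.logspace(-3, 3, 7):          # cost (regularization) parameter
    scores = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5)
    print(f"C = {C:9.3f}   5-fold CV accuracy = {scores.mean():.3f}")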


Journal ArticleDOI
TL;DR: Least Angle Regression (LARS), as discussed by the authors, is a new model selection algorithm serving the same purpose as All Subsets, Forward Selection and Backward Elimination, and is a useful and less greedy version of traditional forward selection methods.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method;

547 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem and showed that when using the same set of genes, PLR and SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability.
Abstract: Classification of patient samples is an important aspect of cancer diagnosis and treatment. The support vector machine (SVM) has been successfully applied to microarray cancer diagnosis problems. However, one weakness of the SVM is that given a tumor sample, it only predicts a cancer class label but does not provide any estimate of the underlying probability. We propose penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem. We show that when using the same set of genes, PLR and the SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability. Often a primary goal in microarray cancer diagnosis is to identify the genes responsible for the classification, rather than class prediction. We consider two gene selection methods in this paper, univariate ranking (UR) and recursive feature elimination (RFE). Empirical results indicate that PLR combined with RFE tends to select fewer genes than other methods and also performs well in both cross-validation and test samples. A fast algorithm for solving PLR is also described.

383 citations
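
A minimal sketch of the PLR-plus-RFE combination using off-the-shelf tools is shown below: L2-penalized (ridge) logistic regression wrapped in recursive feature elimination. The synthetic expression matrix, penalty strength and number of retained genes are assumptions for illustration, not the paper's data or exact algorithm.

# Penalized (L2) logistic regression with recursive feature elimination.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# 100 "samples" x 1000 "genes", only a handful informative
X, y = make_classification(n_samples=100, n_features=1000, n_informative=10,
                           random_state=0)

plr = LogisticRegression(penalty="l2", C=0.1, max_iter=5000)
selector = RFE(plr, n_features_to_select=20, step=0.1).fit(X, y)

print("selected feature indices:", np.flatnonzero(selector.support_))
# Unlike an SVM decision value, the fitted PLR provides class probabilities:
print("P(class 1) for the first 5 samples:",
      np.round(selector.predict_proba(X[:5])[:, 1], 3))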


Journal Article
TL;DR: Building on recent work by Efron et al., it is shown that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector, and that as the constraint is relaxed the solution converges (in the separable case) to an "l1-optimal" separating hyper-plane.
Abstract: In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector. This helps understand the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed---or equivalently as the boosting iterations proceed---the solution converges (in the separable case) to an "l1-optimal" separating hyper-plane. We prove that this l1-optimal separating hyper-plane has the property of maximizing the minimal l1-margin of the training data, as defined in the boosting literature. An interesting fundamental similarity between boosting and kernel support vector machines emerges, as both can be described as methods for regularized optimization in high-dimensional predictor space, using a computational trick to make the calculation practical, and converging to margin-maximizing solutions. While this statement describes SVMs exactly, it applies to boosting only approximately.

289 citations
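
The sketch below illustrates the connection in its simplest setting, squared-error loss with coordinate "weak learners": an epsilon-step forward-stagewise procedure (boosting with tiny steps and early stopping) produces coefficients close to the Lasso solution of matching l1 norm. The paper's analysis concerns the exponential and binomial losses; this squared-error toy example and its settings are simplifying assumptions.

# Epsilon forward stagewise (boosting-like) vs. the Lasso path on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=200, n_features=8, n_informative=5,
                       noise=10.0, random_state=1)
X = (X - X.mean(0)) / X.std(0)
y = y - y.mean()

# Repeatedly nudge the coefficient of the predictor most correlated with the
# current residual; stopping early corresponds to an l1 constraint.
eps, n_steps = 0.01, 4000
beta = np.zeros(X.shape[1])
for _ in range(n_steps):
    corr = X.T @ (y - X @ beta)
    j = np.argmax(np.abs(corr))
    beta[j] += eps * np.sign(corr[j])

# Compare against the Lasso solution whose l1 norm matches the stagewise fit.
_, _, coefs = lars_path(X, y, method="lasso")
k = np.argmin(np.abs(np.abs(coefs).sum(0) - np.abs(beta).sum()))
print("stagewise (boosting-like):", np.round(beta, 2))
print("lasso at matching l1 norm:", np.round(coefs[:, k], 2))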


Journal ArticleDOI
TL;DR: The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data and performs as well or better than several methods that require the full spectra, rather than just labelled peaks.
Abstract: Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and ultimately higher cure rates with less treatment-related morbidities. Protein mass spectrometry is a potentially powerful tool for early cancer detection. We propose a novel method for sample classification from protein mass spectrometry data. When applied to spectra from both diseased and healthy patients, the 'peak probability contrast' technique provides a list of all common peaks among the spectra, their statistical significance and their relative importance in discriminating between the two groups. We illustrate the method on matrix-assisted laser desorption and ionization mass spectrometry data from a study of ovarian cancers. Results: Compared to other statistical approaches for class prediction, the peak probability contrast method performs as well or better than several methods that require the full spectra, rather than just labelled peaks. It is also much more interpretable biologically. The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data. Supplementary Information: http://www.stat.stanford.edu/~tibs/ppc

218 citations
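
To convey the flavor of the peak probability contrast, the toy sketch below compares, for each labelled peak, the probability that its height exceeds a cut-off in the diseased versus healthy groups; the cut-off rule, significance assessment and classification step of the actual method are not reproduced here, and the data are simulated.

# Toy version of a per-peak probability contrast on simulated peak heights.
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_peaks = 40, 30
healthy = rng.lognormal(mean=0.0, sigma=1.0, size=(n_per_class, n_peaks))
diseased = rng.lognormal(mean=0.0, sigma=1.0, size=(n_per_class, n_peaks))
diseased[:, :3] *= 2.5            # three peaks are truly elevated in disease

heights = np.vstack([healthy, diseased])
labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]

def peak_contrast(x, y, quantiles=(0.25, 0.5, 0.75)):
    """Max over cut-offs of |P(peak > cut | diseased) - P(peak > cut | healthy)|."""
    cuts = np.quantile(x, quantiles)
    p1 = np.array([(x[y == 1] > c).mean() for c in cuts])
    p0 = np.array([(x[y == 0] > c).mean() for c in cuts])
    return np.max(np.abs(p1 - p0))

contrasts = np.array([peak_contrast(heights[:, j], labels) for j in range(n_peaks)])
print("top 5 peaks by probability contrast:", np.argsort(contrasts)[::-1][:5])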


Journal ArticleDOI
TL;DR: In this article, the authors provide improvements in semiparametric time series models of air pollution and health, whose findings were a critical component of the evidence used in the PM Criteria Document.
Abstract: In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the U. S. Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiologic evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S–PLUS implementation of generalized additive models (GAMs) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed completion of the PM Criteria Document prepared as part of the review of the U. S. National Ambient Air Quality Standard, because the time series findings represented a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations. In this article we provide improvements in semipara...

180 citations
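
One remedy recommended in this literature is to replace nonparametric smoothers fitted by backfitting with fully parametric spline bases inside an ordinary Poisson GLM, which sidesteps the convergence and standard-error problems. The sketch below illustrates that style of model with natural cubic regression splines (patsy's cr()) in statsmodels; the simulated mortality series, spline degrees of freedom and variable names are assumptions, not the authors' analysis.

# Poisson time series regression with parametric spline terms for time and
# temperature, as an alternative to nonparametric GAM smoothers.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000                                  # days of follow-up
daily = pd.DataFrame({
    "time": np.arange(n, dtype=float),
    "temp": 15 + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 2, n),
    "pm10": rng.gamma(shape=4.0, scale=8.0, size=n),
})
log_mu = 3.0 + 0.1 * np.sin(2 * np.pi * daily["time"] / 365) + 0.0005 * daily["pm10"]
daily["deaths"] = rng.poisson(np.exp(log_mu))

# Smooth terms enter as parametric natural cubic spline bases, so ordinary GLM
# fitting applies: no backfitting, and the usual GLM standard errors.
fit = smf.glm("deaths ~ cr(time, df=8) + cr(temp, df=5) + pm10",
              data=daily, family=sm.families.Poisson()).fit()
print(fit.summary().tables[1])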


01 Jan 2004
TL;DR: These SCRDA methods generalize the idea of the nearest shrunken centroids of Prediction Analysis of Microarrays (PAM) into classical discriminant analysis, perform uniformly well in multivariate classification problems, and in particular outperform the currently popular PAM.
Abstract: In this paper, we introduce a family of modified versions of linear discriminant analysis, called "shrunken centroids regularized discriminant analysis" (SCRDA). These methods generalize the idea of the nearest shrunken centroids of Prediction Analysis of Microarrays (PAM) into classical discriminant analysis. The SCRDA methods are specially designed for classification problems in high-dimension, low-sample-size situations, for example microarray data. Through both simulation studies and real-life data, we show that the SCRDA methods perform uniformly well in multivariate classification problems, and in particular outperform the currently popular PAM. Some of them are also suitable for feature elimination and can be used as gene selection methods. Open-source R code for these methods is available and will be added to the R libraries in the near future.

119 citations
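
SCRDA itself is distributed as R code; as a rough illustration of the two ingredients it combines, the sketch below compares PAM-style nearest shrunken centroids (NearestCentroid with a shrink threshold) and covariance-regularized LDA on synthetic high-dimension, low-sample-size data. This is not the authors' algorithm, only its two building blocks, and all settings are assumptions.

# The two ingredients of SCRDA, run separately on synthetic "microarray" data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid

X, y = make_classification(n_samples=80, n_features=2000, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

pam_like = NearestCentroid(shrink_threshold=0.5)               # shrunken centroids
rda_like = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # regularized LDA

for name, clf in [("shrunken centroids", pam_like), ("regularized LDA", rda_like)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:20s} 5-fold CV accuracy: {acc:.3f}")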


Journal ArticleDOI
TL;DR: This article exposes a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks, and shows that dramatic computational savings are possible over naive implementations.
Abstract: Gene expression arrays typically have 50 to 100 samples and 1000 to 20 000 variables (genes). There have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks. For all of these models, we show that dramatic computational savings are possible over naive implementations, using standard transformations in numerical linear algebra.

119 citations
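
The heart of the computational savings is that, with a quadratic penalty and many more variables than samples, the fit can be carried out in the n-dimensional space spanned by the samples. The sketch below demonstrates this for ridge regression: the naive p-dimensional solve and the reduced n-dimensional solve via the singular value decomposition give the same coefficients. The problem sizes are illustrative only.

# Ridge regression with p >> n: naive p x p solve vs. reduced n x n solve.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 2000, 1.0
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=n)

# naive ridge: solve a p x p linear system (expensive when p is large)
beta_naive = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# reduced ridge: with X = U D V', work with R = U D, an n x n problem,
# then map the n-dimensional solution back to p dimensions via V
U, d, Vt = np.linalg.svd(X, full_matrices=False)
R = U * d
theta = np.linalg.solve(R.T @ R + lam * np.eye(n), R.T @ y)
beta_svd = Vt.T @ theta

print("max |difference| between the two solutions:",
      np.max(np.abs(beta_naive - beta_svd)))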


Journal ArticleDOI
TL;DR: Evidence of somatotopic organization in the STN in patients with PD supports the current theory of highly segregated loops integrating cortex-basal ganglia connections that are preserved in chronic degenerative diseases such as PD, but may subserve a distorted body map.
Abstract: Object. The subthalamic nucleus (STN) is a key structure for motor control through the basal ganglia. The aim of this study was to show that the STN in patients with Parkinson disease (PD) has a somatotopic organization similar to that in nonhuman primates. Methods. A functional map of the STN was obtained using electrophysiological microrecording during placement of deep brain stimulation (DBS) electrodes in patients with PD. Magnetic resonance imaging was combined with ventriculography and intraoperative x-ray film to assess the position of the electrodes and the STN units, which were activated by limb movements to map the sensorimotor region of the STN. Each activated cell was located relative to the anterior commissure—posterior commissure line. Three-dimensional coordinates of the cells were analyzed statistically to determine whether those cells activated by movements of the arm and leg were segregated spatially. Three hundred seventy-nine microelectrode tracks were created during placement of 71 DB...

Proceedings ArticleDOI
21 Jul 2004
TL;DR: This work extends an existing procedure by re-interpreting it as a Naive Bayes model for document sentiment, incorporating additional derived features into the model and, where possible, using labeled data to estimate their relative influence.
Abstract: Sentiment classification is the task of labeling a review document according to the polarity of its prevailing opinion (favorable or unfavorable). In approaching this problem, a model builder often has three sources of information available: a small collection of labeled documents, a large collection of unlabeled documents, and human understanding of language. Ideally, a learning method will utilize all three sources. To accomplish this goal, we generalize an existing procedure that uses the latter two. We extend this procedure by re-interpreting it as a Naive Bayes model for document sentiment. Viewed as such, it can also be seen to extract a pair of derived features that are linearly combined to predict sentiment. This perspective allows us to improve upon previous methods, primarily through two strategies: incorporating additional derived features into the model and, where possible, using labeled data to estimate their relative influence.
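
A much-simplified sketch of the general idea is shown below: two derived sentiment features (counts of words from positive and negative seed lists, which a larger unlabeled corpus could be used to expand) are linearly combined, with the weights estimated from the labeled documents. The seed lists, documents and labels are toy assumptions, not the paper's data or exact model.

# Derived sentiment features linearly combined, weights fit on labeled data.
import numpy as np
from sklearn.linear_model import LogisticRegression

positive_seeds = {"good", "great", "excellent", "enjoyable"}
negative_seeds = {"bad", "poor", "boring", "awful"}

labeled_docs = [
    ("a great and enjoyable film with excellent acting", 1),
    ("good fun, great pacing", 1),
    ("boring plot and poor dialogue", 0),
    ("an awful, boring mess", 0),
]

def derived_features(text):
    words = text.lower().split()
    return [sum(w in positive_seeds for w in words),
            sum(w in negative_seeds for w in words)]

X = np.array([derived_features(doc) for doc, _ in labeled_docs])
y = np.array([label for _, label in labeled_docs])

# the labeled data estimate the relative influence of the two derived features
clf = LogisticRegression().fit(X, y)
print("weights (positive-count, negative-count):", clf.coef_[0])
print("P(favorable) for 'a good but boring film':",
      clf.predict_proba([derived_features("a good but boring film")])[0, 1])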

Proceedings Article
01 Dec 2004
TL;DR: In semi-supervised learning where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features), a method of moments is proposed for estimating this stochastic dependence using the unlabeled data.
Abstract: We consider the situation in semi-supervised learning, where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: a. As an input to a supervised learning procedure which can be used to "de-bias" its results using labeled data only and b. As a potentially interesting learning task in itself. We present several examples to illustrate the practical usefulness of our method.
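
A toy illustration of the idea follows: when the chance of being labeled depends on the true class, class-conditional feature means can still be estimated from the labeled data, and matching them to the overall (labeled plus unlabeled) feature mean gives moment estimates of the true class proportions, and hence of the biased label-sampling rates. All numbers below are simulated and not from the paper.

# Method-of-moments estimate of class proportions under biased label sampling.
import numpy as np

rng = np.random.default_rng(0)
n, pi1 = 20000, 0.30                       # true P(y = 1)
mu0, mu1 = 0.0, 2.0                        # class-conditional feature means
q = {0: 0.05, 1: 0.40}                     # P(labeled | y): depends on y

y = rng.binomial(1, pi1, size=n)
x = rng.normal(np.where(y == 1, mu1, mu0), 1.0)
labeled = rng.binomial(1, np.where(y == 1, q[1], q[0])).astype(bool)

# class-conditional means from labeled data (valid given sampling depends on y),
# overall mean from everyone (this is where the unlabeled data enter)
m0_hat = x[labeled & (y == 0)].mean()
m1_hat = x[labeled & (y == 1)].mean()
m_all = x.mean()

# moment equation: m_all = (1 - pi1) * m0 + pi1 * m1  ->  solve for pi1
pi1_hat = (m_all - m0_hat) / (m1_hat - m0_hat)
pi1_naive = y[labeled].mean()              # labeled-only estimate is badly biased
ratio_hat = (pi1_naive / (1 - pi1_naive)) * ((1 - pi1_hat) / pi1_hat)

print(f"true P(y=1) = {pi1:.2f}, moment estimate = {pi1_hat:.2f}, "
      f"naive labeled-only estimate = {pi1_naive:.2f}")
print(f"estimated labeling-rate ratio q1/q0 = {ratio_hat:.2f} (true {q[1]/q[0]:.2f})")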

Proceedings Article
01 Dec 2004
TL;DR: In this article, the authors argue that the choice of the SVM cost parameter can be critical and derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.

Journal Article
TL;DR: Tension on the leaflets in the edge-to-edge repair is determined primarily by MA SL size, and paradoxically is lower when the contractile state is enhanced, indicating that annular and/or LV dilatation increase stitch tension and may adversely affect durability of the repair if concomitant ring annuloplasty is not performed.
Abstract: Background and aim of the study Whilst increased 'Alfieri stitch' tension may reduce the durability of 'edge-to-edge' mitral repair, the factors affecting suture tension are unknown. In order to study hemodynamics and left ventricular (LV) and annular dynamics that determine suture tension, the central edge of the mitral leaflets was approximated with a miniature force transducer to measure leaflet tension (T) at the leaflet approximation point. Methods Eight sheep were studied under open-chest conditions immediately after surgical placement of a force transducer and implantation of radiopaque markers on the left ventricle and mitral annulus (MA). Hemodynamic variables were altered by two caval occlusion steps (deltaV1 and deltaV2) and dobutamine infusion. Three-dimensional marker coordinates were obtained by simultaneous biplane videofluoroscopy to measure LV volume, MA area (MAA) and septal-lateral (SL) annular dimension throughout the cardiac cycle. Results At baseline, peak Alfieri stitch tension (0.30 +/- 0.18 N) was observed 96 +/- 61 ms prior to end-diastole coincident with peak annular SL diameter (98 +/- 58 ms before end-diastole). Dobutamine infusion decreased suture tension (from 0.30 +/- 0.18 N to 0.20 +/- 0.12 N, p = 0.01), although peak systolic pressure increased significantly (138 +/- 19 versus 115 +/- 14 mmHg; p = 0.03). A regression model was fitted with the goal of interpreting the hemodynamic and geometric predictors of tension as their influence varied with time: Tt (N) = 0.1916 + 0.2115 x SL (cm) - 0.1996 x MAA/SL (cm2/cm) + ft x LVP (mmHg), where Tt is tension at any time during the cardiac cycle and ft is the time-varying coefficient of LVP. Conclusion Tension on the leaflets in the edge-to-edge repair is determined primarily by MA SL size, and paradoxically is lower when the contractile state is enhanced. This indicates that annular and/or LV dilatation increase stitch tension and may adversely affect durability of the repair if concomitant ring annuloplasty is not performed.
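
For concreteness, the snippet below simply evaluates the regression model quoted in the abstract to show how the geometric and hemodynamic terms combine. The abstract does not tabulate the time-varying coefficient ft, so the value used for it (and the SL, MAA and LVP inputs) are illustrative placeholders, not measurements from the study.

# Worked evaluation of T_t = 0.1916 + 0.2115*SL - 0.1996*(MAA/SL) + f_t*LVP.
def stitch_tension(sl_cm, maa_cm2, lvp_mmhg, f_t):
    """Predicted Alfieri stitch tension (N) at one instant in the cardiac cycle."""
    return 0.1916 + 0.2115 * sl_cm - 0.1996 * (maa_cm2 / sl_cm) + f_t * lvp_mmhg

# hypothetical instant: SL = 2.5 cm, MAA = 7.0 cm^2, LVP = 100 mmHg, f_t = 0.0005
print(f"T_t = {stitch_tension(2.5, 7.0, 100.0, 0.0005):.3f} N")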

Journal ArticleDOI
TL;DR: In this article, the authors discuss "Process consistency for AdaBoost" by Jiang, the Bayes-risk consistency of regularized boosting methods studied by Lugosi and Vayatis, and the statistical behavior and consistency of classification methods based on convex risk minimization studied by Zhang.
Abstract: Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and N. Vayatis; and "Statistical behavior and consistency of classification methods based on convex risk minimization" [ibid., 56-85] by T. Zhang. Includes rejoinders by the authors.