
Showing papers by "Robert Tibshirani published in 2011"


Journal ArticleDOI
TL;DR: In this article, the authors give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Abstract: In the paper I give a brief review of the basic idea and some history, and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
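
For reference, the basic idea under review is the lasso criterion itself, which in Lagrangian form is usually written as follows (a standard statement, included here only for orientation):

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \; \frac{1}{2} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
    \;+\; \lambda \sum_{j=1}^{p} |\beta_j| .
```

The ℓ1 penalty shrinks coefficients toward zero and sets some of them exactly to zero, which is the variable-selection behavior the review traces through later developments.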

3,054 citations


Journal ArticleDOI
TL;DR: This work introduces a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of ℓ1 and ℓ2 penalties (elastic net), and employs warm starts to find a solution along a regularization path.
Abstract: We introduce a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of ℓ1 and ℓ2 penalties (elastic net). Our algorithm fits via cyclical coordinate descent and employs warm starts to find a solution along a regularization path. We demonstrate the efficacy of our algorithm on real and simulated data sets, and find considerable speedup over competing methods.
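
The paper's own algorithm handles the Cox partial likelihood (implemented, as I understand it, in the authors' R package glmnet). The sketch below only illustrates the generic pathwise, warm-started coordinate-descent structure on the simpler Gaussian elastic-net problem; the function names, the fixed number of sweeps, and the lambda grid are illustrative choices, not the paper's implementation.

```python
import numpy as np

def enet_coord_descent(X, y, lam, alpha, beta, n_sweeps=200):
    """Cyclical coordinate descent for the Gaussian elastic-net objective
    (1/2n)||y - X beta||^2 + lam*(alpha*||beta||_1 + (1-alpha)/2*||beta||_2^2),
    assuming X has standardized columns and y is centered (no intercept)."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ beta                      # residual, kept up to date
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]        # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            num = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)   # soft-threshold
            beta[j] = num / (col_sq[j] + lam * (1.0 - alpha))
            r -= X[:, j] * beta[j]        # restore with the updated coefficient
    return beta

def enet_path(X, y, alpha=0.5, n_lambda=50):
    """Warm starts: solve on a decreasing log-spaced lambda grid, using each
    solution to initialize the next, which is what makes the full path cheap."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / (n * alpha)   # above this, beta is all zero
    lambdas = np.logspace(np.log10(lam_max), np.log10(1e-3 * lam_max), n_lambda)
    beta, path = np.zeros(p), []
    for lam in lambdas:
        beta = enet_coord_descent(X, y, lam, alpha, beta)   # warm start
        path.append(beta.copy())
    return lambdas, np.array(path)
```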

1,579 citations


Journal ArticleDOI
TL;DR: This work proposes penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability, and uses a minorization-maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors.
Abstract: We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high-dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach in order to efficiently optimize it when convex penalties are applied to the discriminant vectors. In particular, we consider the use of ℓ1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study and on three gene expression data sets. We also survey past methods for extending LDA to the high-dimensional setting, and explore their relationships with our proposal.
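
For orientation, Fisher's discriminant problem and the penalized version described above can be written schematically as follows; this is a sketch of the general form, with the exact choice of the positive-definite within-class estimate and of the penalties left to the paper.

```latex
\text{Fisher:}\quad
  \max_{\beta}\; \beta^{\top}\hat{\Sigma}_b\,\beta
  \;\;\text{s.t.}\;\; \beta^{\top}\hat{\Sigma}_w\,\beta \le 1
\qquad\qquad
\text{Penalized:}\quad
  \max_{\beta}\; \beta^{\top}\hat{\Sigma}_b\,\beta - \lambda\,P(\beta)
  \;\;\text{s.t.}\;\; \beta^{\top}\tilde{\Sigma}_w\,\beta \le 1
```

Here \hat{\Sigma}_b and \hat{\Sigma}_w are the between-class and within-class covariance estimates, \tilde{\Sigma}_w is a positive-definite modification of \hat{\Sigma}_w usable when p ≫ n, and P is a convex penalty such as the ℓ1 or fused lasso penalty applied to each discriminant vector in turn.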

405 citations


Journal ArticleDOI
TL;DR: The proposed penalized maximum likelihood problem is not convex, so a majorize-minimize approach is used in which convex approximations to the original nonconvex problem are solved iteratively; the method can also solve a previously studied special case in which a desired sparsity pattern is prespecified.
Abstract: We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method’s close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified.
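
For concreteness, the elementwise-thresholding baseline mentioned at the end of the abstract (the comparison method, not the penalized-likelihood majorize-minimize algorithm itself) can be written in a few lines. Unlike the penalized maximum likelihood estimate, this simple estimate is not guaranteed to be positive definite.

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Elementwise soft-thresholding of the empirical covariance matrix,
    leaving the diagonal untouched. X is an (n, p) data matrix and lam >= 0
    controls how many off-diagonal entries are set exactly to zero."""
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)   # shrink and sparsify
    np.fill_diagonal(T, np.diag(S))                      # keep the variances
    return T
```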

307 citations


Journal ArticleDOI
TL;DR: It is proved that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid linkage), and the linkage is robust against certain perturbations of a dataset.
Abstract: Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a dataset. The nature of the clustering depends on the choice of linkage, that is, on how one measures the distance between clusters. In this article we investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original dataset with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid linkage), and the linkage is robust against certain perturbations of a dataset. We provide an efficient implementation and illustrate minimax linkage's strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a dataset of images of human faces.
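
A minimal sketch of the linkage itself, assuming only a precomputed pairwise distance matrix: the distance between two clusters is the smallest radius at which a single member of their union covers every other member, and that member is the prototype attached to the merged node.

```python
import numpy as np

def minimax_linkage(D, G, H):
    """Minimax linkage between clusters G and H (iterables of indices into the
    pairwise distance matrix D). Returns the linkage value and the prototype:
    the member of G ∪ H whose maximum distance to the other members is smallest."""
    members = np.array(list(G) + list(H))
    radii = D[np.ix_(members, members)].max(axis=1)   # each candidate's covering radius
    best = int(radii.argmin())
    return radii[best], int(members[best])
```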

157 citations


Journal ArticleDOI
TL;DR: The expression of specific miRNAs may be useful for DLBCL survival prediction and their role in the pathogenesis of this disease should be examined further.
Abstract: Purpose: Diffuse large B-cell lymphoma (DLBCL) heterogeneity has prompted investigations for new biomarkers that can accurately predict survival. A previously reported 6-gene model combined with the International Prognostic Index (IPI) could predict patients' outcome. However, even these predictors are not capable of unambiguously identifying outcome, suggesting that additional biomarkers might improve their predictive power. Experimental Design: We studied expression of 11 microRNAs (miRNA) that had previously been reported to have variable expression in DLBCL tumors. We measured the expression of each miRNA by quantitative real-time PCR analyses in 176 samples from uniformly treated DLBCL patients and correlated the results to survival. Results: In a univariate analysis, the expression of miR-18a correlated with overall survival (OS), whereas the expression of miR-181a and miR-222 correlated with progression-free survival (PFS). A multivariate Cox regression analysis including the IPI, the 6-gene model–derived mortality predictor score, and expression of miR-18a, miR-181a, and miR-222 revealed that all variables were independent predictors of survival except the expression of miR-222 for OS and the expression of miR-18a for PFS. Conclusion: The expression of specific miRNAs may be useful for DLBCL survival prediction and their role in the pathogenesis of this disease should be examined further. Clin Cancer Res; 17(12); 4125–35. ©2011 AACR.
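
For readers who want to run this kind of analysis on their own data, a multivariate Cox model of the sort described can be fit with the lifelines package; the file name and column names below are hypothetical placeholders, not the study's data.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical cohort: one row per patient, with follow-up time, an event
# indicator (1 = death), the IPI, a gene-model score, and miRNA expression.
df = pd.read_csv("dlbcl_cohort.csv")   # placeholder file

cph = CoxPHFitter()
cph.fit(
    df[["os_months", "death", "ipi", "six_gene_score", "mir18a", "mir181a", "mir222"]],
    duration_col="os_months",
    event_col="death",
)
cph.print_summary()   # hazard ratios, confidence intervals, p-values
```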

138 citations


Journal ArticleDOI
TL;DR: In this article, a method for selecting prototypes in the classification setting (in which the samples fall into known discrete categories) is discussed, based on three basic properties that a good prototype set should satisfy.
Abstract: Prototype methods seek a minimal subset of samples that can serve as a distillation or condensed view of a data set. As the size of modern data sets grows, being able to present a domain specialist with a short list of “representative” samples chosen from the data set is of increasing interpretative value. While much recent statistical research has been focused on producing sparse-in-the-variables methods, this paper aims at achieving sparsity in the samples. We discuss a method for selecting prototypes in the classification setting (in which the samples fall into known discrete categories). Our method of focus is derived from three basic properties that we believe a good prototype set should satisfy. This intuition is translated into a set cover optimization problem, which we solve approximately using standard approaches. While prototype selection is usually viewed as purely a means toward building an efficient classifier, in this paper we emphasize the inherent value of having a set of prototypical elements. That said, by using the nearest-neighbor rule on the set of prototypes, we can of course discuss our method as a classifier as well. We demonstrate the interpretative value of producing prototypes on the well-known USPS ZIP code digits data set and show that as a classifier it performs reasonably well. We apply the method to a proteomics data set in which the samples are strings and therefore not naturally embedded in a vector space. Our method is compatible with any dissimilarity measure, making it amenable to situations in which using a non-Euclidean metric is desirable or even necessary.
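
A simplified greedy stand-in for the set-cover idea, under stated assumptions: every point is a candidate prototype, a candidate "covers" same-class points within a single radius eps, and each step picks the candidate whose net coverage (newly covered same-class points minus nearby other-class points) is largest. The scoring rule and the single radius are illustrative choices, not the paper's exact objective.

```python
import numpy as np

def greedy_prototypes(D, y, eps):
    """Greedy prototype selection sketch. D is an (n, n) pairwise dissimilarity
    matrix (any metric or non-metric dissimilarity), y the class labels."""
    n = len(y)
    uncovered = np.ones(n, dtype=bool)
    prototypes = []
    while uncovered.any():
        best_j, best_score, best_cover = None, 0, None
        for j in range(n):
            within = D[j] <= eps
            new_cover = within & uncovered & (y == y[j])   # same-class points newly covered
            wrong = within & (y != y[j])                   # other-class points swept in
            score = int(new_cover.sum()) - int(wrong.sum())
            if score > best_score:
                best_j, best_score, best_cover = j, score, new_cover
        if best_j is None:          # no candidate improves coverage; stop early
            break
        prototypes.append(best_j)
        uncovered &= ~best_cover
    return prototypes
```

Classifying a new point by its nearest selected prototype then gives the nearest-neighbor classifier discussed in the abstract.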

126 citations


Journal ArticleDOI
TL;DR: A simple algorithm is devised to solve for the path of solutions, which can be viewed as a modified version of the well-known pool adjacent violators algorithm and computes the entire path in O(n log n) operations (n being the number of data points).
Abstract: We consider the problem of approximating a sequence of data points with a “nearly-isotonic,” or nearly-monotone function. This is formulated as a convex optimization problem that yields a family of solutions, with one extreme member being the standard isotonic regression fit. We devise a simple algorithm to solve for the path of solutions, which can be viewed as a modified version of the well-known pool adjacent violators algorithm, and computes the entire path in O(n log n) operations (n being the number of data points). In practice, the intermediate fits can be used to examine the assumption of monotonicity. Nearly-isotonic regression admits a nice property in terms of its degrees of freedom: at any point along the path, the number of joined pieces in the solution is an unbiased estimate of its degrees of freedom. We also extend the ideas to provide “nearly-convex” approximations.
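
For reference, the nearly-isotonic criterion has the form of a least-squares fit with a one-sided penalty on violations of monotonicity:

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta \in \mathbb{R}^n}
    \; \frac{1}{2}\sum_{i=1}^{n} (y_i - \beta_i)^2
    \;+\; \lambda \sum_{i=1}^{n-1} \big(\beta_i - \beta_{i+1}\big)_+ ,
\qquad (u)_+ = \max(u, 0).
```

At λ = 0 the fit interpolates the data; as λ grows, adjacent pieces merge, and in the limit the constraint β_1 ≤ ... ≤ β_n is enforced exactly, recovering the ordinary isotonic regression fit.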

118 citations


Journal ArticleDOI
TL;DR: A procedure called the Fused Lasso Latent Feature Model (FLLat) is proposed that provides a statistical framework for modeling multi-sample aCGH data and identifying regions of copy number variation (CNV) and a method for estimating the false discovery rate.
Abstract: Array-based comparative genomic hybridization (aCGH) enables the measurement of DNA copy number across thousands of locations in a genome. The main goals of analyzing aCGH data are to identify the regions of copy number variation (CNV) and to quantify the amount of CNV. Although there are many methods for analyzing single-sample aCGH data, the analysis of multi-sample aCGH data is a relatively new area of research. Further, many of the current approaches for analyzing multi-sample aCGH data do not appropriately utilize the additional information present in the multiple samples. We propose a procedure called the Fused Lasso Latent Feature Model (FLLat) that provides a statistical framework for modeling multi-sample aCGH data and identifying regions of CNV. The procedure involves modeling each sample of aCGH data as a weighted sum of a fixed number of features. Regions of CNV are then identified through an application of the fused lasso penalty to each feature. Some simulation analyses show that FLLat outperforms single-sample methods when the simulated samples share common information. We also propose a method for estimating the false discovery rate. An analysis of an aCGH data set obtained from human breast tumors, focusing on chromosomes 8 and 17, shows that FLLat and Significance Testing of Aberrant Copy number (an alternative, existing approach) identify similar regions of CNV that are consistent with previous findings. However, through the estimated features and their corresponding weights, FLLat is further able to discern specific relationships between the samples, for example, identifying 3 distinct groups of samples based on their patterns of CNV for chromosome 17.
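
In outline, and with notation meant only to be suggestive of the model rather than to reproduce it exactly, each sample's profile of copy-number log-ratios is approximated by a weighted sum of a small number of shared features, each of which is pushed toward being sparse and piecewise constant along the genome by a fused lasso penalty:

```latex
y_s \;\approx\; \sum_{j=1}^{J} \theta_{js}\, f_j ,
\qquad
\text{penalty on each feature } f_j:\;\;
\lambda_1 \sum_{l=1}^{L} |f_{jl}|
  \;+\; \lambda_2 \sum_{l=2}^{L} |f_{jl} - f_{j,l-1}| ,
```

where y_s ∈ R^L is sample s measured at L genomic locations, the f_j ∈ R^L are the latent features, and the θ_{js} are sample-specific weights.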

74 citations


Journal ArticleDOI
TL;DR: A fast data-driven procedure for automatically constructing indices for linear, logistic, and Cox regression models and extending the procedure to create indices for detecting treatment-marker interactions is proposed.
Abstract: We use the term "index predictor" to denote a score that consists of K binary rules such as "age > 60" or "blood pressure > 120 mm Hg." The index predictor is the sum of these binary scores, yielding a value from 0 to K. Such indices as often used in clinical studies to stratify population risk: They are usually derived from subject area considerations. In this paper, we propose a fast data-driven procedure for automatically constructing such indices for linear, logistic, and Cox regression models. We also extend the procedure to create indices for detecting treatment-marker interactions. The methods are illustrated on a study with protein biomarkers as well as a large microarray gene expression study.
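
The scoring rule itself takes only a few lines; the cutoffs below are illustrative placeholders, not the data-driven ones the proposed procedure selects.

```python
import pandas as pd

# Hypothetical cutoff rules of the kind an index is built from.
rules = [("age", 60), ("systolic_bp", 120), ("biomarker_x", 1.5)]

def index_score(df: pd.DataFrame, rules) -> pd.Series:
    """Index predictor: the sum of K binary rules 1{feature > cutoff},
    giving each subject an integer score between 0 and K."""
    return sum((df[col] > cut).astype(int) for col, cut in rules)

# The resulting 0-to-K score can then be entered as a single covariate in a
# linear, logistic, or Cox regression model.
```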

41 citations


Journal ArticleDOI
TL;DR: In this article, a supervised multidimensional scaling (SMDS) method is proposed to find a set of configuration points z_1, ..., z_n ∈ R^S such that D_ij ≈ ||z_i − z_j||_2, and such that z_is > z_js for s = 1, ..., S tends to occur when y_i > y_j.
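
One way to put these two goals into a single criterion looks roughly like the following; the first term is the standard metric MDS stress, while the supervision term shown here is a hinge-style placeholder and may differ from the exact form used in the paper.

```latex
\min_{z_1,\dots,z_n \in \mathbb{R}^S}
  \; \sum_{i<j} \big( D_{ij} - \lVert z_i - z_j \rVert_2 \big)^2
  \;+\; \lambda \sum_{s=1}^{S} \sum_{\{(i,j)\,:\,y_i > y_j\}} \big( z_{js} - z_{is} \big)_+
```

where (u)_+ = max(u, 0), so the second term penalizes pairs of points whose coordinate ordering disagrees with the ordering of their outcomes.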

Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the outcome of patients with leiomyosarcoma (LMS) from a single institution according to the number of tumor-associated macrophages (TAMs) assessed through 3 CSF1-associated proteins.
Abstract: INTRODUCTION High numbers of tumor-associated macrophages (TAMs) have been associated with poor outcome in several solid tumors. In 2 previous studies, we showed that colony stimulating factor-1 (CSF1) is secreted by leiomyosarcoma (LMS) and that the increase in macrophages and CSF1-associated proteins are markers for poor prognosis in both gynecologic and nongynecologic LMS in a multicentered study. The purpose of this study is to evaluate the outcome of patients with LMS from a single institution according to the number of TAMs evaluated through 3 CSF1-associated proteins. METHODS Patients with LMS treated at Stanford University with adequate archived tissue and clinical data were eligible for this retrospective study. Data from chart reviews included tumor site, size, grade, stage, treatment, and disease status at the time of last follow-up. The 3 CSF1-associated proteins (CD163, CD16, and cathepsin L) were evaluated by immunohistochemistry on tissue microarrays. Kaplan-Meier survival curves and univariate Cox proportional hazards models were fit to assess the association of clinical predictors as well as the CSF1-associated proteins with overall survival (OS). RESULTS A total of 52 patients diagnosed from 1983 to 2007 were evaluated. Univariate Cox proportional hazards models were fit to assess the significance of grade, size, stage, and the 3 CSF1-associated proteins in predicting OS. Grade, size, and stage were not significantly associated with survival in the full patient cohort, but grade and stage were significant predictors of survival in the gynecologic (GYN) LMS samples (P = 0.038 and P = 0.0164, respectively). Increased cathepsin L was associated with a worse outcome in GYN LMS (P = 0.049). Similar findings were seen with CD16 (P < 0.0001). In addition, CSF1-response-enriched (all 3 stains positive) GYN LMS had poorer overall survival than CSF1-response-poor tumors (P = 0.001). These results were not seen in non-GYN LMS. CONCLUSIONS Our data form an independent confirmation of the prognostic significance of TAMs and the CSF1-associated proteins in LMS. More aggressive or targeted therapies could be considered in the subset of LMS patients that highly express these markers.

Journal ArticleDOI
TL;DR: This work applies Bayesian gene set analysis to a gene expression microarray data set on 50 cancer cell lines to identify pathways that are associated with the mutational status of the gene p53, and identifies several significant pathways with strong biological connections.
Abstract: We propose a hierarchical Bayesian model for analyzing gene expression data to identify pathways differentiating between two biological states (e.g., cancer vs. non-cancer and mutant vs. normal). Finding significant pathways can improve our understanding of biological processes. When the biological process of interest is related to a specific disease, eliciting a better understanding of the underlying pathways can lead to designing a more effective treatment. We apply our method to data obtained by interrogating the mutational status of p53 in 50 cancer cell lines (33 mutated and 17 normal). We identify several significant pathways with strong biological connections. We show that our approach provides a natural framework for incorporating prior biological information, and it has the best overall performance in terms of correctly identifying significant pathways compared to several alternative methods.

Posted Content
TL;DR: In this paper, L1-PDA, a regularized model which adaptively pools elements of the precision matrices, is proposed; adaptively pooling these matrices decreases the variance of the estimates without overly biasing them, and the method is shown to be effective on real and simulated datasets.
Abstract: Linear and Quadratic Discriminant analysis (LDA/QDA) are common tools for classification problems. For these methods we assume observations are normally distributed within each group. We estimate a mean and covariance matrix for each group and classify using Bayes theorem. With LDA, we estimate a single, pooled covariance matrix, while for QDA we estimate a separate covariance matrix for each group. Rarely do we believe in a homogeneous covariance structure between groups, but often there is insufficient data to separately estimate covariance matrices. We propose L1-PDA, a regularized model which adaptively pools elements of the precision matrices. Adaptively pooling these matrices decreases the variance of our estimates (as in LDA), without overly biasing them. In this paper, we propose and discuss this method, give an efficient algorithm to fit it for moderate sized problems, and show its efficacy on real and simulated datasets.
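
As background for the LDA/QDA description above (and not an implementation of the proposed L1-PDA), here is a minimal sketch of Gaussian discriminant analysis with either a pooled or a per-group covariance, classifying by the Gaussian log-density plus the log prior.

```python
import numpy as np

def fit_gaussian_da(X, y, pooled=False):
    """Estimate a mean per group plus either a pooled covariance (LDA-style)
    or a per-group covariance (QDA-style)."""
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    priors = {k: float(np.mean(y == k)) for k in classes}
    if pooled:
        scatter = sum(np.cov(X[y == k], rowvar=False) * (np.sum(y == k) - 1)
                      for k in classes)
        pooled_cov = scatter / (len(y) - len(classes))
        covs = {k: pooled_cov for k in classes}
    else:
        covs = {k: np.cov(X[y == k], rowvar=False) for k in classes}
    return classes, means, covs, priors

def predict_gaussian_da(X, model):
    """Classify by Bayes' theorem with normal class-conditionals:
    pick the group maximizing log N(x; mean_k, cov_k) + log prior_k."""
    classes, means, covs, priors = model
    scores = []
    for k in classes:
        diff = X - means[k]
        prec = np.linalg.inv(covs[k])
        _, logdet = np.linalg.slogdet(covs[k])
        log_dens = -0.5 * np.einsum('ij,jk,ik->i', diff, prec, diff) - 0.5 * logdet
        scores.append(log_dens + np.log(priors[k]))
    return classes[np.argmax(np.vstack(scores), axis=0)]
```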