scispace - formally typeset

Showing papers by Robert Tibshirani published in 2007


Journal Article
TL;DR: It is shown that coordinate descent is very competitive with the well-known LARS procedure in large lasso problems, can deliver a path of solutions efficiently, and can be applied to many other convex statistical problems such as the garotte and elastic net.
Abstract: We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the $L_1$-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.

1,785 citations
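A minimal sketch of the one-at-a-time coordinate-wise update for the lasso, assuming the objective (1/2n)||y − Xβ||² + λ||β||₁ with centered columns and no intercept. The names `soft_threshold` and `lasso_cd` are hypothetical, and the active-set and other speedups a production solver would use are omitted.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Assumes X and y are centered (no intercept is fitted).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) x_j^T x_j for each column
    resid = np.asarray(y, dtype=float).copy()  # residual y - X beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Partial-residual correlation: (1/n) x_j^T (r + x_j * beta_j)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            delta = beta[j] - old
            if delta != 0.0:
                resid -= delta * X[:, j]  # keep the residual up to date
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

With orthogonal columns scaled so that x_jᵀx_j = n, a single pass reproduces the closed form S(x_jᵀy/n, λ), which makes a convenient sanity check.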


Journal Article
TL;DR: Coordinate-wise descent is shown to be competitive for the L1-penalized regression (lasso) problem; for the fused lasso, a nonseparable problem in which coordinate-wise descent does not work, a generalized algorithm is derived.
Abstract: We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
1. Introduction. In this paper we consider statistical models that lead to convex optimization problems with inequality constraints. Typically, the optimization for these problems is carried out using a standard quadratic programming algorithm. The purpose of this paper is to explore “one-at-a-time” coordinate-wise descent algorithms for these problems. The equivalent of a coordinate descent algorithm has been proposed for the L1-penalized regression (lasso) in the literature, but it is not commonly used. Moreover, coordinate-wise algorithms seem too simple, and they are not often used in convex optimization, perhaps because they only work in specialized problems. We ourselves never appreciated the value of coordinate descent methods for convex statistical problems before working on this paper. In this paper we show that coordinate descent is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, can deliver a path of solutions efficiently, and can be applied to many other convex statistical problems such as the garotte and elastic net.
We then go on to explore a nonseparable problem in which coordinate-wise descent does not work, the “fused lasso.” We derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to

1,619 citations
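The claim that coordinate descent "can deliver a path of solutions efficiently" rests on warm starts: solve for a decreasing grid of λ values, starting each fit from the previous solution. A sketch under stated assumptions: the elastic-net objective (1/2n)||y − Xβ||² + λ[α||β||₁ + (1 − α)||β||²/2] with 0 < α ≤ 1 (α = 1 is the lasso) and a log-spaced grid from the smallest λ that zeroes every coefficient. `enet_path`, `alpha`, and the grid choice are illustrative, not the paper's code.

```python
import numpy as np


def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def enet_path(X, y, alpha=1.0, n_lambda=20, eps=1e-3, n_iter=1000, tol=1e-10):
    """Elastic-net coefficients along a decreasing lambda grid, using warm starts.

    Objective: (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2*||b||^2),
    with 0 < alpha <= 1. Returns (lambdas, coefs); coefs[k] is the fit at lambdas[k].
    """
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    lam_max = np.max(np.abs(X.T @ y)) / (n * alpha)  # smallest lam giving b = 0
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambda)
    beta = np.zeros(p)                         # carried over between lambdas
    resid = np.asarray(y, dtype=float).copy()
    coefs = []
    for lam in lambdas:
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                rho = X[:, j] @ resid / n + col_sq[j] * old
                beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
                delta = beta[j] - old
                if delta != 0.0:
                    resid -= delta * X[:, j]
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        coefs.append(beta.copy())
    return lambdas, np.array(coefs)
```

Because consecutive solutions are close, each warm-started fit typically needs only a few sweeps.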


Journal Article
TL;DR: Biological analysis of the 18 proteins found in blood plasma points to systemic dysregulation of hematopoiesis, immune responses, apoptosis and neuronal support in presymptomatic Alzheimer's disease.
Abstract: A molecular test for Alzheimer's disease could lead to better treatment and therapies. We found 18 signaling proteins in blood plasma that can be used to classify blinded samples from Alzheimer's and control subjects with close to 90% accuracy and to identify patients who had mild cognitive impairment that progressed to Alzheimer's disease 2-6 years later. Biological analysis of the 18 proteins points to systemic dysregulation of hematopoiesis, immune responses, apoptosis and neuronal support in presymptomatic Alzheimer's disease.

1,038 citations


Journal Article
TL;DR: The number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso—a conclusion that requires no special assumption on the predictors and the unbiased estimator is shown to be asymptotically consistent.
Abstract: We study the effective degrees of freedom of the lasso in the framework of Stein’s unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso—a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results on hand, various model selection criteria—Cp, AIC and BIC—are available, which, along with the LARS algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.

894 citations
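The result can be illustrated in the simplest setting, an orthonormal design (here X = I, so the lasso reduces to soft-thresholding each observation). Assuming Gaussian noise with known variance σ², the count of nonzero coefficients serves as the degrees-of-freedom estimate and plugs into a Cp/SURE-type risk estimate, ||y − μ̂||² − nσ² + 2σ²·df. The function names and the choice X = I are illustrative assumptions, not the paper's general setting.

```python
import numpy as np


def lasso_identity(y, lam):
    """Lasso with X = I: minimizer of 0.5*||y - b||^2 + lam*||b||_1 (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)


def lasso_df(beta):
    """Degrees-of-freedom estimate: the number of nonzero coefficients."""
    return int(np.count_nonzero(beta))


def cp_risk(y, mu_hat, df, sigma2):
    """Cp / SURE estimate of E||mu_hat - mu||^2 given a known noise variance sigma2."""
    n = y.size
    return float(np.sum((y - mu_hat) ** 2) - n * sigma2 + 2 * sigma2 * df)
```

Evaluating `cp_risk` over a grid of λ values and taking the minimizer gives a Cp-style choice of the lasso fit.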


Journal Article
TL;DR: This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment and proposes two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences.
Abstract: This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. [Proc. Natl. Acad. Sci. USA 102 (2005) 15545–15550]. We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas.

730 citations
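A minimal sketch of a maxmean summary for one gene set, assuming per-gene scores z (for example, t-statistics) are already computed: average the positive parts and the negative parts separately over the set, and report whichever is larger in magnitude, with its sign. The function name and the tie-breaking toward the positive side are assumptions; restandardization is not shown.

```python
import numpy as np


def maxmean(z):
    """Maxmean statistic for the scores z of the genes in one gene set."""
    z = np.asarray(z, dtype=float)
    s_pos = np.mean(np.maximum(z, 0.0))   # average positive part
    s_neg = np.mean(np.maximum(-z, 0.0))  # average negative part, as a magnitude
    return s_pos if s_pos >= s_neg else -s_neg
```

Unlike the plain mean, a set containing large scores of both signs is not averaged away: for z = [5, −5, 0, 0] the mean is 0, while maxmean is 1.25.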


Journal Article
TL;DR: Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method and can be as competitive as the support vector machines classifiers.
Abstract: In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the "nearest shrunken centroids" (NSC) (Tibshirani and others, 2003) into the classical discriminant analysis. The SCRDA method is specifically designed for classification problems in high dimension low sample size situations, for example, microarray data. Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method (using the NSC algorithm) and can be as competitive as the support vector machines classifiers. It is also suitable for feature elimination purposes and can be used as a gene selection method. The open source R package for this method (named "rda") is available on CRAN (http://www.r-project.org) for download and testing.

602 citations
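A simplified sketch of a shrunken-centroid regularized discriminant rule in this spirit, assuming (i) the pooled within-class covariance is shrunk toward the identity, Σ̃ = αΣ̂ + (1 − α)I, and (ii) the transformed centroids Σ̃⁻¹x̄_k are soft-thresholded by Δ before entering a linear discriminant score. The function names, the unstandardized scale, and the use of class proportions as priors are assumptions; the `rda` package is the reference implementation.

```python
import numpy as np


def fit_scrda(X, labels, alpha=0.5, delta=0.0):
    """Fit a simplified shrunken-centroid regularized discriminant rule."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, p = X.shape
    means = np.array([X[labels == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance.
    centered = X - means[np.searchsorted(classes, labels)]
    sigma = centered.T @ centered / (n - len(classes))
    sigma_reg = alpha * sigma + (1 - alpha) * np.eye(p)  # shrink toward I
    w = np.linalg.solve(sigma_reg, means.T).T            # rows: Sigma_reg^{-1} mean_k
    w = np.sign(w) * np.maximum(np.abs(w) - delta, 0.0)  # soft-threshold by delta
    priors = np.array([np.mean(labels == k) for k in classes])
    return classes, means, w, priors


def predict_scrda(model, X):
    classes, means, w, priors = model
    X = np.asarray(X, dtype=float)
    # Score for class k: x^T w_k - 0.5 * mean_k^T w_k + log(prior_k)
    scores = X @ w.T - 0.5 * np.sum(means * w, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]
```

Here α controls covariance regularization and Δ the centroid shrinkage; with large Δ, features whose transformed centroid entries are zeroed for every class drop out of the rule, which is how such a method can double as feature selection.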


Journal Article
01 Jul 2007-Blood
TL;DR: T-ALL cell growth was suppressed in a highly synergistic manner by simultaneous treatment with the mTOR inhibitor rapamycin and GSI, which represents a rational drug combination for treating this aggressive human malignancy.

276 citations


Journal Article
TL;DR: The least angle regression and forward stagewise algorithms for solving penalized least squares regression problems are considered and a condition under which the coefficient paths of the lasso are monotone is studied, and hence the different algorithms coincide.
Abstract: We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the least angle regression algorithm, with a small modification, solves the lasso regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it solves a version of the lasso problem that enforces monotonicity. One consequence of this is as follows: while lasso makes optimal progress in terms of reducing the residual sum-of-squares per unit increase in $L_1$-norm of the coefficient $\beta$, forward stagewise is optimal per unit $L_1$ arc-length traveled along the coefficient path. We also study a condition under which the coefficient paths of the lasso are monotone, and hence the different algorithms coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.

195 citations
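Incremental forward stagewise can be sketched in a few lines, assuming standardized predictors and a fixed step size ε: at each step, find the predictor most correlated with the current residual and move its coefficient by ε in the direction of that correlation. Each step therefore adds exactly ε of L1 arc length to the path. The names `eps` and `n_steps` are illustrative.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=500):
    """Incremental forward stagewise regression; returns the coefficient path."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    path = [beta.copy()]
    for _ in range(n_steps):
        corr = X.T @ resid                # correlations with the current residual
        j = int(np.argmax(np.abs(corr)))  # most correlated predictor
        step = eps * np.sign(corr[j])
        beta[j] += step
        resid -= step * X[:, j]
        path.append(beta.copy())
    return np.array(path)
```

With orthonormal predictors the path walks to within about ε of the least-squares coefficients and then oscillates; smaller ε gives a smoother path at the cost of more steps.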


Journal Article
TL;DR: In this paper, the authors consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems, and show that forward stagewise regression fits a monotone version of the lasso.
Abstract: We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron et al. (2004) it is proven that the least angle regression algorithm, with a small modification, solves the lasso (L1 constrained) regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it fits a monotone version of the lasso. We also study a condition under which the coefficient paths of the lasso are monotone, and hence the different algorithms all coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.

182 citations


Journal Article
TL;DR: By averaging the genes within the clusters obtained from hierarchical clustering, supergenes are defined and used to fit regression models, thereby attaining concise interpretation and accuracy in regression of DNA microarray data.
Abstract: Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of DNA microarray data, which poses the challenge of having far more features than samples. In this paper, we introduce a two-step procedure that combines (1) hierarchical clustering and (2) Lasso. By averaging the genes within the clusters obtained from hierarchical clustering, we define supergenes and use them to fit regression models, thereby attaining concise interpretation and accuracy. Our methods are supported with theoretical justifications and demonstrated on simulated and real data sets.

179 citations
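The first step can be sketched with SciPy, assuming average-linkage hierarchical clustering of genes with correlation distance, a fixed number of clusters, and a plain column mean as the supergene. The article's choices of linkage, cut level, and any sign or scale adjustments are not reproduced; the resulting matrix would then be passed to a lasso fit (not shown). `supergenes` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def supergenes(X, n_clusters):
    """Average genes (columns of X) within hierarchical clusters.

    Returns (S, labels): S has one column per cluster; labels[j] is gene j's cluster.
    """
    # Correlation distance between genes; clip tiny negative rounding errors.
    d = np.clip(pdist(X.T, metric="correlation"), 0.0, None)
    Z = linkage(d, method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    S = np.column_stack([X[:, labels == k].mean(axis=1) for k in np.unique(labels)])
    return S, labels
```

Averaging strongly correlated genes reduces variance while each supergene still maps back to a named group of genes, which is the interpretability the abstract refers to.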


Journal Article
TL;DR: A method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples, which can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples.
Abstract: We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples. In real and simulated examples, the new method often exhibits lower false discovery rates than simple t-statistic thresholding. We also compare our approach to the recent cancer profile outlier analysis proposal of Tomlins and others (2005).
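A sketch of one statistic of this kind, an "outlier sum", under stated assumptions: standardize each gene by its median and median absolute deviation (MAD), then sum the disease-group values that exceed the gene's 75th percentile plus its interquartile range, with the quantiles taken over all samples. The function name, the unscaled MAD, and the quantile convention are assumptions of this sketch, not necessarily the article's exact definition.

```python
import numpy as np


def outlier_sums(X, is_disease):
    """Outlier-sum score per gene (rows of X are genes, columns are samples)."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=1, keepdims=True)
    mad = np.median(np.abs(X - med), axis=1, keepdims=True)  # unscaled MAD (assumption)
    Z = (X - med) / mad
    q25, q75 = np.percentile(Z, [25, 75], axis=1, keepdims=True)
    cutoff = q75 + (q75 - q25)                 # 75th percentile + IQR, per gene
    D = Z[:, np.asarray(is_disease, dtype=bool)]
    return np.where(D > cutoff, D, 0.0).sum(axis=1)
```

A t-statistic averages over the whole disease group, so a gene raised in only two of five disease samples can be diluted; the outlier sum counts only the samples that stand out.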

Journal Article
TL;DR: The connection between reproducibility and prediction accuracy is used to develop a validation procedure for clusters found in datasets independent of the one in which they were characterized, and the IGP is found to be the best measure of prediction accuracy.
Abstract: In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be "reproducible" and may be biologically significant. Classifying a new datum to a previously defined cluster can be seen as predicting which of the previously defined clusters is most similar to the new datum. If the new data classified to a cluster are similar, molecularly or clinically, to the data already present in the cluster, then the cluster is reproducible and the corresponding prediction accuracy is high. Here, we take advantage of the connection between reproducibility and prediction accuracy to develop a validation procedure for clusters found in datasets independent of the one in which they were characterized. We define a cluster quality measure called the "in-group proportion" (IGP) and introduce a general procedure for individually validating clusters. Using simulations and real breast cancer datasets, the IGP is compared to four other popular cluster quality measures (homogeneity score, separation score, silhouette width, and weighted average discrepant pairs score). Moreover, simulations and the real breast cancer datasets are used to compare the four versions of the validation procedure which all use the IGP, but differ in the way in which the null distributions are generated. We find that the IGP is the best measure of prediction accuracy, and one version of the validation procedure is more widely applicable than the other three. An implementation of this algorithm is in a package called "clusterRepro" available through The Comprehensive R Archive Network (http://cran.r-project.org).
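The in-group proportion can be sketched as follows, assuming each new sample is assigned to the nearest previously defined centroid by Euclidean distance (the article's similarity measure may differ), and a cluster's IGP is the fraction of its assigned samples whose nearest neighbour in the new dataset is assigned to the same cluster. The function name is hypothetical, and the null-distribution procedures for significance are not shown.

```python
import numpy as np


def in_group_proportion(new_data, centroids):
    """IGP for each centroid, computed on the samples (rows) of new_data."""
    X = np.asarray(new_data, dtype=float)
    C = np.asarray(centroids, dtype=float)
    # Assign each new sample to its nearest centroid.
    assign = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
    # Nearest neighbour of each sample among the other new samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argmin(d, axis=1)
    igp = np.full(len(C), np.nan)  # NaN for clusters with no assigned samples
    for k in range(len(C)):
        members = assign == k
        if members.any():
            igp[k] = np.mean(assign[nn[members]] == k)
    return igp
```

A cluster that is well separated in the new data scores 1; samples sitting on a boundary between clusters pull the IGP of both clusters down.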

Journal Article
14 Sep 2007-Science
TL;DR: Although the biological methodology in Sjöblom et al. is sound, more samples are needed to achieve sufficient power, and few genes with significantly elevated mutation rates remain.
Abstract: Sjöblom et al. (Research Article, 13 October 2006, p. 268) reported nearly 200 novel cancer genes said to have a 90% probability of being involved in colon or breast cancer. However, their analysis raises two statistical concerns. When these concerns are addressed, few genes with significantly elevated mutation rates remain. Although the biological methodology in Sjöblom et al. is sound, more samples are needed to achieve sufficient power.

Journal Article
TL;DR: A new method is presented for analyzing microarray time courses that identifies genes undergoing abrupt transitions in expression level and the time at which the transitions occur; groups of genes that change in similar ways and at similar times are shown to have significant Gene Ontology annotations.
Abstract: This article presents a new method for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level, and the time at which the transitions occur. The algorithm matches the sequence of expression levels for each gene against temporal patterns having one or two transitions between two expression levels. The algorithm reports a P-value for the matching pattern of each gene, and a global false discovery rate can also be computed. After matching, genes can be sorted by the direction and time of transitions. Genes can be partitioned into sets based on the direction and time of change for further analysis, such as comparison with Gene Ontology annotations or binding site motifs. The method is evaluated on simulated and actual time-course data. On microarray data for budding yeast, it is shown that the groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations.
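A simplified sketch of matching one gene's time course against single-transition patterns: for every candidate transition time, fit a two-level step function (the mean before and the mean after) and keep the split with the smallest residual sum of squares. The function name is hypothetical; two-transition patterns, the P-value, and the false discovery rate computation described in the article are omitted.

```python
import numpy as np


def best_single_transition(x):
    """Return (t, rss): levels change between x[t-1] and x[t]; rss of the step fit."""
    x = np.asarray(x, dtype=float)
    best_t, best_rss = None, np.inf
    for t in range(1, len(x)):  # at least one time point on each side
        left, right = x[:t], x[t:]
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if rss < best_rss:
            best_t, best_rss = t, rss
    return best_t, float(best_rss)
```

The sign of `x[t:].mean() - x[:t].mean()` gives the direction of change, so genes can then be grouped by direction and transition time as the abstract describes.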

Journal Article
TL;DR: A panel of hypoxia-related tissue markers that correlates with treatment outcomes in HNSCC is identified, with a strong correlation between lysyl oxidase, ephrin A1, and galectin-1 and CA IX staining.
Abstract: Purpose: To investigate the expression pattern of hypoxia-induced proteins identified as being involved in malignant progression of head-and-neck squamous cell carcinoma (HNSCC) and to determine their relationship to tumor pO2 and prognosis. Methods and Materials: We performed immunohistochemical staining of hypoxia-induced proteins (carbonic anhydrase IX [CA IX], BNIP3L, connective tissue growth factor, osteopontin, ephrin A1, hypoxia inducible gene-2, dihydrofolate reductase, galectin-1, IκB kinase β, and lysyl oxidase) on tumor tissue arrays of 101 HNSCC patients with pretreatment pO2 measurements. Analysis of variance and Fisher's exact tests were used to evaluate the relationship between marker expression, tumor pO2, and CA IX staining. Cox proportional hazard model and log-rank tests were used to determine the relationship between markers and prognosis. Results: Osteopontin expression correlated with tumor pO2 (Eppendorf measurements) (p = 0.04). However, there was a strong correlation between lysyl oxidase, ephrin A1, and galectin-1 and CA IX staining. These markers also predicted for cancer-specific survival and overall survival on univariate analysis. A hypoxia score of 0–5 was assigned to each patient, on the basis of the presence of strong staining for these markers, whereby a higher score signifies increased marker expression. On multivariate analysis, increasing hypoxia score was an independent prognostic factor for cancer-specific survival (p = 0.015) and was borderline significant for overall survival (p = 0.057) when adjusted for other independent predictors of outcomes (hemoglobin and age). Conclusions: We identified a panel of hypoxia-related tissue markers that correlates with treatment outcomes in HNSCC. Validation of these markers will be needed to determine their utility in identifying patients for hypoxia-targeted therapy.

Journal Article
TL;DR: In this article, the authors proposed a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable, and then applies a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable.
Abstract: We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the pre-conditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the pre-conditioned response variable is consistent as the number of predictors and observations increase. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
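The preconditioning step can be sketched with a basic form of supervised principal components, assuming: features are scored by absolute univariate correlation with the outcome, the top `n_keep` are retained, and the outcome is regressed on their first principal component. The function name and `n_keep` are illustrative; the second step would apply forward stepwise selection or the LASSO to the returned `y_hat` in place of `y` (not shown).

```python
import numpy as np


def precondition_spc(X, y, n_keep):
    """Pre-conditioned response from a basic supervised principal components fit."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Univariate scores: absolute correlation of each feature with the outcome.
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.argsort(scores)[::-1][:n_keep]
    # First principal component of the retained (centered) features.
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    pc = U[:, 0] * s[0]
    # Least-squares fit of y on the component; pc has mean zero, so the intercept is y.mean().
    coef = (pc @ yc) / (pc @ pc)
    return y.mean() + coef * pc
```

Because `y_hat` averages over many correlated features, much of the observational noise in `y` is removed before variable selection, which is the intuition behind the claimed gain when noise is large.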

Journal Article
16 Nov 2007-Blood
TL;DR: It is demonstrated that the prognostic value of the 6-gene model remains significant in the era of R-CHOP treatment and that the model can be applied to routine FFPE tissue from initial diagnostic biopsies.

Posted Content
TL;DR: A simple algorithm, using a coordinate descent procedure for the lasso, is developed that solves a 1000 node problem in at most a minute, and is 30 to 4000 times faster than competing methods.
Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the Graphical Lasso, that is remarkably fast: it solves a 1000 node problem (~500,000 parameters) in at most a minute, and is 30 to 4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen & Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
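A compact sketch of block coordinate descent for the graphical lasso, assuming the penalty ρ also applies to the diagonal (so the covariance estimate W starts at S + ρI and its diagonal stays fixed). Each column of W is updated by solving the lasso problem min ½βᵀW₁₁β − s₁₂ᵀβ + ρ||β||₁ with coordinate descent, and the precision matrix Θ = W⁻¹ is recovered from the stored coefficients at the end. Names, tolerances, and iteration caps are illustrative.

```python
import numpy as np


def graphical_lasso(S, rho, max_iter=100, tol=1e-8):
    """Return (W, Theta): penalized covariance estimate and its sparse inverse."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    W = S + rho * np.eye(p)        # diagonal of W stays fixed at S_ii + rho
    B = np.zeros((p, p))           # column j holds the lasso coefficients for column j
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11, s12 = W[np.ix_(idx, idx)], S[idx, j]
            beta = B[idx, j]       # warm start (a copy, via boolean indexing)
            # Coordinate descent for: min 0.5 b'W11 b - s12'b + rho*||b||_1
            for _ in range(1000):
                max_change = 0.0
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    new = np.sign(r) * max(abs(r) - rho, 0.0) / W11[k, k]
                    max_change = max(max_change, abs(new - beta[k]))
                    beta[k] = new
                if max_change < tol:
                    break
            B[idx, j] = beta
            w12 = W11 @ beta
            W[idx, j] = w12
            W[j, idx] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover Theta = W^{-1} column by column from the block-inverse formulas.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        theta_jj = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[j, j] = theta_jj
        Theta[idx, j] = -B[idx, j] * theta_jj
    return W, Theta
```

Zeros in β for column j become zeros in column j of Θ, which is how the lasso subproblems produce a sparse graph.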

Journal Article
TL;DR: Clustering and statistical analysis supports the finding that malignant tumors generally show broad misregulation of mitotic APC/C substrates not seen in benign tumors, suggesting that a "mitotic profile" in tumors may result from misregulation of the APC/C destruction pathway.
Abstract: The fidelity of cell division is dependent on the accumulation and ordered destruction of critical protein regulators. By triggering the appropriately timed, ubiquitin-dependent proteolysis of the mitotic regulatory proteins securin, cyclin B, aurora A kinase, and polo-like kinase 1, the anaphase promoting complex/cyclosome (APC/C) ubiquitin ligase plays an essential role in maintaining genomic stability. Misexpression of these APC/C substrates, individually, has been implicated in genomic instability and cancer. However, no comprehensive survey of the extent of their misregulation in tumors has been performed. Here, we analyzed more than 1600 benign and malignant tumors by immunohistochemical staining of tissue microarrays and found frequent overexpression of securin, polo-like kinase 1, aurora A, and Skp2 in malignant tumors. Positive and negative APC/C regulators, Cdh1 and Emi1, respectively, were also more strongly expressed in malignant versus benign tumors. Clustering and statistical analysis supports the finding that malignant tumors generally show broad misregulation of mitotic APC/C substrates not seen in benign tumors, suggesting that a “mitotic profile” in tumors may result from misregulation of the APC/C destruction pathway. This profile of misregulated mitotic APC/C substrates and regulators in malignant tumors suggests that analysis of this pathway may be diagnostically useful and represent a potentially important therapeutic target.

Journal Article
TL;DR: The margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes, and is compared to the closely related "all-pairs" support vector machine, and nearest centroids on a number of cancer microarray data sets.
Abstract: We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin classifier at each split. We implement an exact greedy algorithm for this task, and compare its performance to less greedy procedures based on clustering of the matrix of pairwise margins. We compare the performance of the "margin tree" to the closely related "all-pairs" (one versus one) support vector machine, and nearest centroids on a number of cancer microarray data sets. We also develop a simple method for feature selection. We find that the margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes.
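The less greedy variant, which clusters the matrix of pairwise margins, can be sketched with SciPy's hierarchical clustering, assuming the K × K matrix of pairwise margins has already been computed (for example, from a linear support vector machine fitted to each pair of classes) and that complete linkage is used, with larger margins treated as larger distances. Reading the dendrogram from the top gives a binary tree of class groupings; the exact greedy algorithm and the classifier fitted at each split are not shown. `margin_tree` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def margin_tree(margins):
    """Nested tuples of class indices, built by clustering a pairwise-margin matrix."""
    M = np.asarray(margins, dtype=float)
    # Larger margins mean better-separated classes, so treat them as distances.
    Z = linkage(squareform(M, checks=False), method="complete")

    def build(node):
        if node.is_leaf():
            return node.id
        return (build(node.get_left()), build(node.get_right()))

    return build(to_tree(Z))
```

Classes with small mutual margins end up grouped deep in the tree, so the top split separates the most distinct groups of classes.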

Journal Article
TL;DR: Polymorphisms in HIF1A were associated with development of stable exertional angina rather than acute MI as the initial clinical presentation of CAD.

Journal Article
TL;DR: It was found that diffuse large B-cell lymphoma specimens showing higher local vascular endothelial growth factor expression showed correspondingly higher microvessel density, implying that lymphoma cells induce local tumor angiogenesis.
Abstract: Angiogenesis is known to play a major role in neoplasia, including hematolymphoid neoplasia. We assessed the relationships among angiogenesis and expression of vascular endothelial growth factor and its receptors in the context of clinically and biologically relevant subtypes of diffuse large B-cell lymphoma using immunohistochemical evaluation of tissue microarrays. We found that diffuse large B-cell lymphoma specimens showing higher local vascular endothelial growth factor expression showed correspondingly higher microvessel density, implying that lymphoma cells induce local tumor angiogenesis. In addition, local vascular endothelial growth factor expression was higher in those specimens showing higher expression of the receptors of the growth factor, suggesting an autocrine growth-promoting feedback loop. The germinal center-like and nongerminal center-like subtypes of diffuse large B-cell lymphoma were biologically and prognostically distinct. Interestingly, only in the more clinically aggressive nongerminal center-like subtype were microvessel densities significantly higher in specimens showing higher vascular endothelial growth factor expression; the same was true for the finding of higher vascular endothelial growth factor receptor-1 expression in conjunction with higher vascular endothelial growth factor expression. These differences may have important implications for the responsiveness of the two diffuse large B-cell lymphoma subtypes to anti-vascular endothelial growth factor and anti-angiogenic therapies.

Journal Article
TL;DR: In this paper, the choice of predictor variables in large-scale linear models is discussed and the relationship between the Dantzig Selector (DS) and the Lasso algorithm is explored.
Abstract: 1. Introduction. This is a fascinating paper on an important topic: the choice of predictor variables in large-scale linear models. A previous paper in these pages attacked the same problem using the “LARS” algorithm (Efron, Hastie, Johnstone and Tibshirani [3]); actually three algorithms, including the Lasso as the middle case. There are tantalizing similarities between the Dantzig Selector (DS) and the LARS methods, but they are not the same and produce somewhat different models. We explore this relationship with the Lasso and LARS here.
2. Dantzig selector and the Lasso. The definition of the Dantzig selector (DS) in (1.7) can be re-expressed as
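For reference, the two standard formulations can be set side by side (textbook forms with the noise scale absorbed into λ, not a reconstruction of the discussion's equation (1.7)). The lasso optimality conditions show that every lasso solution is feasible for the Dantzig selector at the same λ, so the Dantzig selector's solution never has a larger L1 norm:

```latex
\begin{aligned}
\hat\beta^{\mathrm{lasso}} &= \arg\min_{\beta}\ \tfrac12\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1,
\qquad
\hat\beta^{\mathrm{DS}} = \arg\min_{\beta}\ \lVert\beta\rVert_1
\ \text{ subject to }\ \lVert X^\top (y - X\beta)\rVert_\infty \le \lambda,\\[4pt]
X^\top\bigl(y - X\hat\beta^{\mathrm{lasso}}\bigr) &= \lambda s,
\qquad
s_j \in \begin{cases}\{\operatorname{sign}(\hat\beta^{\mathrm{lasso}}_j)\}, & \hat\beta^{\mathrm{lasso}}_j \neq 0,\\ [-1,\,1], & \hat\beta^{\mathrm{lasso}}_j = 0,\end{cases}\\[4pt]
&\Longrightarrow\ \lVert X^\top (y - X\hat\beta^{\mathrm{lasso}})\rVert_\infty \le \lambda
\ \Longrightarrow\ \lVert\hat\beta^{\mathrm{DS}}\rVert_1 \le \lVert\hat\beta^{\mathrm{lasso}}\rVert_1 .
\end{aligned}
```

The two methods share the same constraint on the residual correlations but optimize different criteria, which is consistent with the observation that they are similar yet produce somewhat different models.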

Journal Article
16 Nov 2007-Blood
TL;DR: It is found that achieving a complete response/complete response unconfirmed (CR/CRu) to CVP and making an anti-idiotype antibody are 2 independent factors that each correlated with longer OS at 10 years.

Journal Article
TL;DR: A method for the analysis of pathologic biology that unravels the disease characteristics of high dimensional data, disease-specific genomic analysis (DSGA), is introduced; it is intended to precede standard techniques like clustering or class prediction and to enhance their performance and ability to detect disease.
Abstract: Motivation: Genomic high-throughput technology generates massive data, providing opportunities to understand countless facets of the functioning genome. It also raises profound issues in identifying data relevant to the biology being studied. Results: We introduce a method for the analysis of pathologic biology that unravels the disease characteristics of high dimensional data. The method, disease-specific genomic analysis (DSGA), is intended to precede standard techniques like clustering or class prediction, and enhance their performance and ability to detect disease. DSGA measures the extent to which the disease deviates from a continuous range of normal phenotypes, and isolates the aberrant component of data. In several microarray cancer datasets, we show that DSGA outperforms standard methods. We then use DSGA to highlight a novel subdivision of an important class of genes in breast cancer, the estrogen receptor (ER) cluster. We also identify new markers distinguishing ductal and lobular breast cancers. Although our examples focus on microarrays, DSGA generalizes to any high dimensional genomic/proteomic data. Supplementary information: Supplementary data are available at Bioinformatics online.
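A simplified sketch of the decomposition idea, assuming the "continuous range of normal phenotypes" is modeled as the span of the top few principal components of the normal-tissue samples: each tumor profile is projected onto that span (its normal component), and the residual is taken as the disease component. The function name and `n_components` are illustrative; the article's construction of the normal model is more refined than a plain PCA and is not reproduced here.

```python
import numpy as np


def disease_component(normal, tumor, n_components=2):
    """Split tumor profiles (rows) into a normal-model fit and a residual disease component."""
    normal = np.asarray(normal, dtype=float)
    tumor = np.asarray(tumor, dtype=float)
    center = normal.mean(axis=0)
    # Orthonormal basis (rows of Vt) for the main directions of normal variation.
    _, _, Vt = np.linalg.svd(normal - center, full_matrices=False)
    V = Vt[:n_components].T                    # genes x n_components
    centered = tumor - center
    normal_part = center + centered @ V @ V.T  # projection onto the normal model
    return normal_part, tumor - normal_part
```

Clustering or class prediction would then be run on the returned disease component rather than on the raw profiles.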

Journal Article
16 Nov 2007-Blood
TL;DR: The presence of improved survival in a cohort of patients whose lymphomas potentially depend on autocrine signaling via VEGFR1 suggests that dependence on this pathway may render patients susceptible to the effects of anthracycline-based therapy.

Journal Article
TL;DR: In this article, the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE) were studied, and it was shown that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom, a conclusion that requires no special assumption on the predictors.
Abstract: We study the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso--a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results on hand, various model selection criteria--$C_p$, AIC and BIC--are available, which, along with the LARS algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.

Journal Article
16 Nov 2007-Blood
TL;DR: OS of patients with FL managed at Stanford University significantly improved in the 1986–2003 era, particularly among younger and advanced stage patients.

Journal Article
TL;DR: This correction article describes what makes the published Table 5 incorrect and presents the correct Table 5.
Abstract: Following the publication of our recent article (Kapp et al., BMC Genomics 2006, 7:231), we (the authors) regrettably found several errors in the published Table 5. This correction article not only describes what makes the published Table 5 incorrect, it also presents the correct Table 5.

Journal Article
16 Nov 2007-Blood
TL;DR: The prognostic value of LMO2 protein expression remains significant in the era of R-CHOP treatment, and LMO2 evaluation is recommended in all newly diagnosed DLBCL patients to confirm these results and eventually to optimize patient management.