Showing papers by "Robert Tibshirani" published in 2007

Journal Article•DOI•

Jerome H. Friedman, Trevor Hastie, Holger Höfling, Robert Tibshirani

10 Aug 2007-arXiv: Computation

TL;DR: It is shown that coordinate descent is very competitive with the well-known LARS procedure in large lasso problems, can deliver a path of solutions efficiently, and can be applied to many other convex statistical problems such as the garotte and elastic net.

Abstract: We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the $L_1$-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
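As an illustration of the "one-at-a-time" idea described in this abstract, the following is a minimal sketch of coordinate-wise descent for the lasso, not the authors' implementation. It assumes the objective (1/(2n))·||y − Xβ||² + λ·||β||₁ (the scaling is an assumption) and uses plain Python lists so that it needs no external packages. The function names `soft_threshold`, `lasso_coordinate_descent` and `lasso_path` are hypothetical. Each coordinate update is a soft-thresholded univariate regression on the partial residual, and the path function reuses the previous solution as a warm start for the next penalty value.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, beta_init=None, n_sweeps=500, tol=1e-12):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    The objective scaling is an assumption made for this sketch.
    """
    n, p = len(X), len(X[0])
    beta = list(beta_init) if beta_init is not None else [0.0] * p
    # Full residual y - X @ beta for the starting coefficients.
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Mean squared value of each column, (1/n) * x_j . x_j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot affect the fit
            # (1/n) * x_j . (partial residual with coordinate j removed)
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lams):
    """Fit a decreasing sequence of penalties, warm-starting each fit."""
    path, beta = [], None
    for lam in sorted(lams, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, beta_init=beta)
        path.append((lam, beta))
    return path
```

Starting the path at a large penalty and moving downward keeps each warm start close to the next solution, which is one plausible reading of how a path of solutions can be delivered cheaply.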

1,785 citations

Journal Article•DOI•

Pathwise coordinate optimization

Jerome H. Friedman, Trevor Hastie, Holger Höfling, Robert Tibshirani

01 Dec 2007-The Annals of Applied Statistics

TL;DR: Coordinate-wise descent is shown to be very competitive with LARS for L1-penalized regression (lasso), and a generalized algorithm is derived for the fused lasso, a nonseparable problem in which plain coordinate descent does not work.

Abstract: We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.

1. Introduction. In this paper we consider statistical models that lead to convex optimization problems with inequality constraints. Typically, the optimization for these problems is carried out using a standard quadratic programming algorithm. The purpose of this paper is to explore “one-at-a-time” coordinate-wise descent algorithms for these problems. The equivalent of a coordinate descent algorithm has been proposed for the L1-penalized regression (lasso) in the literature, but it is not commonly used. Moreover, coordinate-wise algorithms seem too simple, and they are not often used in convex optimization, perhaps because they only work in specialized problems. We ourselves never appreciated the value of coordinate descent methods for convex statistical problems before working on this paper. In this paper we show that coordinate descent is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, can deliver a path of solutions efficiently, and can be applied to many other convex statistical problems such as the garotte and elastic net.
We then go on to explore a nonseparable problem in which coordinate-wise descent does not work, the “fused lasso.” We derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to

1,619 citations

Journal Article•DOI•

Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins

Sandip Ray, Markus Britschgi¹, Charles Herbert, Yoshiko Takeda-Uchimura¹, Adam L. Boxer, Kaj Blennow², Leah Friedman¹, Douglas Galasko³, Marek Jutel⁴, Anna Karydas, Jeffrey Kaye⁵, Jerzy Leszek⁴, Bruce L. Miller, Lennart Minthon⁶, Joseph F. Quinn⁵, Gil D. Rabinovici, William H. Robinson¹, Marwan N. Sabbagh, Yuen T. So¹, D. Larry Sparks, Massimo Tabaton⁷, Jared R. Tinklenberg¹, Jerome A. Yesavage¹, Robert Tibshirani¹, Tony Wyss-Coray⁸, Tony Wyss-Coray¹•Institutions (8)

Stanford University¹, Sahlgrenska University Hospital², University of California, San Diego³, Wrocław Medical University⁴, Oregon Health & Science University⁵, Lund University⁶, University of Genoa⁷, Veterans Health Administration⁸

01 Nov 2007-Nature Medicine

TL;DR: Biological analysis of the 18 proteins found in blood plasma points to systemic dysregulation of hematopoiesis, immune responses, apoptosis and neuronal support in presymptomatic Alzheimer's disease.

Abstract: A molecular test for Alzheimer's disease could lead to better treatment and therapies. We found 18 signaling proteins in blood plasma that can be used to classify blinded samples from Alzheimer's and control subjects with close to 90% accuracy and to identify patients who had mild cognitive impairment that progressed to Alzheimer's disease 2-6 years later. Biological analysis of the 18 proteins points to systemic dysregulation of hematopoiesis, immune responses, apoptosis and neuronal support in presymptomatic Alzheimer's disease.

1,038 citations

Journal Article•DOI•

On the “degrees of freedom” of the lasso

Hui Zou, Trevor Hastie¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

01 Oct 2007-Annals of Statistics

TL;DR: The number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso, a conclusion that requires no special assumption on the predictors; the unbiased estimator is also shown to be asymptotically consistent.

Abstract: We study the effective degrees of freedom of the lasso in the framework of Stein’s unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso—a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results on hand, various model selection criteria—Cp, AIC and BIC—are available, which, along with the LARS algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.
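The result stated in this abstract lends itself to a very small sketch: count the nonzero coefficients of a lasso fit and plug that count into a model selection criterion. The code below is illustrative only. The function names are hypothetical, the noise variance `sigma2` is assumed to be known or estimated elsewhere, and the particular scalings of the Cp-type and BIC-type criteria are assumptions that may differ from those used in the paper.

```python
import math


def lasso_degrees_of_freedom(beta, tol=0.0):
    """Estimate the lasso degrees of freedom as the number of nonzero coefficients."""
    return sum(1 for b in beta if abs(b) > tol)


def cp_criterion(rss, n, df, sigma2):
    """Cp-type criterion: RSS/n + 2 * df * sigma2 / n (scaling is an assumption)."""
    return rss / n + 2.0 * df * sigma2 / n


def bic_criterion(rss, n, df, sigma2):
    """BIC-type criterion: RSS/(n * sigma2) + log(n) * df / n (scaling is an assumption)."""
    return rss / (n * sigma2) + math.log(n) * df / n


def select_fit(fits, n, sigma2, criterion=cp_criterion):
    """Pick the (beta, rss) pair along a lasso path that minimizes the criterion."""
    return min(
        fits,
        key=lambda fit: criterion(fit[1], n, lasso_degrees_of_freedom(fit[0]), sigma2),
    )
```

Here `fits` is a hypothetical list of `(beta, rss)` pairs, one per point on a lasso path, so selecting a fit requires only the coefficient vectors and their residual sums of squares.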

894 citations

Journal Article•DOI•

On testing the significance of sets of genes

Bradley Efron, Robert Tibshirani

01 Jun 2007-The Annals of Applied Statistics

TL;DR: This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment and proposes two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences.

Abstract: This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. [Proc. Natl. Acad. Sci. USA 102 (2005) 15545–15550]. We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas.
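The abstract names the maxmean statistic without defining it. As a hedged sketch of one common reading: for the per-gene scores in a set, average the positive parts and the negative parts separately (both averages taken over the whole set) and report whichever is larger in magnitude. The function name `maxmean` is hypothetical, and returning a signed value to keep the direction of the effect is an assumption of this sketch.

```python
def maxmean(scores):
    """Maxmean summary of per-gene scores for one gene set.

    Positive and negative parts are each averaged over the full set size,
    and the larger of the two is returned, signed by its direction.
    """
    if not scores:
        raise ValueError("gene set must contain at least one score")
    n = len(scores)
    pos_mean = sum(s for s in scores if s > 0) / n
    neg_mean = sum(-s for s in scores if s < 0) / n
    return pos_mean if pos_mean >= neg_mean else -neg_mean
```

Because both averages divide by the full set size, a few large scores in one direction are not cancelled by many small scores in the other, as they would be with a plain mean.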

730 citations

Journal Article•DOI•

Regularized linear discriminant analysis and its application in microarrays

Yaqian Guo¹, Trevor Hastie¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

01 Jan 2007-Biostatistics

TL;DR: Through both simulated data and real-life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method and can be as competitive as support vector machine classifiers.

Abstract: In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the "nearest shrunken centroids" (NSC) (Tibshirani and others, 2003) into the classical discriminant analysis. The SCRDA method is specially designed for classification problems in high-dimension, low-sample-size situations, for example, microarray data. Through both simulated data and real-life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method (using the NSC algorithm) and can be as competitive as support vector machine classifiers. It is also suitable for feature elimination and can be used as a gene selection method. The open source R package for this method (named "rda") is available on CRAN (http://www.r-project.org) for download and testing.

602 citations

Journal Article•DOI•

Notch signals positively regulate activity of the mTOR pathway in T-cell acute lymphoblastic leukemia

Steven M. Chan¹, Andrew P. Weng¹, Robert Tibshirani¹, Jon C. Aster¹, Jon C. Aster², Paul J. Utz¹•Institutions (2)

Stanford University¹, Brigham and Women's Hospital²

01 Jul 2007-Blood

TL;DR: T-ALL cell growth was suppressed in a highly synergistic manner by simultaneous treatment with the mTOR inhibitor rapamycin and GSI, which represents a rational drug combination for treating this aggressive human malignancy.

276 citations

Journal Article•DOI•

Forward stagewise regression and the monotone lasso

Trevor Hastie¹, Jonathan Taylor¹, Robert Tibshirani¹, Guenther Walther¹•Institutions (1)

Stanford University¹

02 May 2007-arXiv: Statistics Theory

TL;DR: The least angle regression and forward stagewise algorithms for solving penalized least squares regression problems are considered, and a condition is studied under which the coefficient paths of the lasso are monotone and hence the different algorithms coincide.

Abstract: We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the least angle regression algorithm, with a small modification, solves the lasso regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it solves a version of the lasso problem that enforces monotonicity. One consequence of this is as follows: while lasso makes optimal progress in terms of reducing the residual sum-of-squares per unit increase in $L_1$-norm of the coefficient $\beta$, forward stage-wise is optimal per unit $L_1$ arc-length traveled along the coefficient path. We also study a condition under which the coefficient paths of the lasso are monotone, and hence the different algorithms coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.

195 citations

Journal Article•DOI•

Forward Stagewise Regression and the Monotone Lasso

Trevor Hastie¹, Jonathan Taylor¹, Robert Tibshirani¹, Guenther Walther¹•Institutions (1)

Stanford University¹

01 Jan 2007-Electronic Journal of Statistics

TL;DR: In this paper, the authors consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems, and show that the latter is a monotone version of the lasso.

Abstract: We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron et al. (2004) it is proven that the least angle regression algorithm, with a small modification, solves the lasso (L1 constrained) regression problem. Here we give an analogous result for incremental forward stagewise regression, showing that it fits a monotone version of the lasso. We also study a condition under which the coefficient paths of the lasso are monotone, and hence the different algorithms all coincide. Finally, we compare the lasso and forward stagewise procedures in a simulation study involving a large number of correlated predictors.

182 citations

Journal Article•DOI•

Averaged gene expressions for regression

Mee Young Park¹, Trevor Hastie², Robert Tibshirani²•Institutions (2)

Google¹, Stanford University²

01 Apr 2007-Biostatistics

TL;DR: By averaging the genes within the clusters obtained from hierarchical clustering, supergenes are defined and used to fit regression models, thereby attaining concise interpretation and accuracy in regression of DNA microarray data.

Abstract: Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of the DNA microarray data, which poses the challenge of having far more features than samples. In this paper, we introduce a two-step procedure that combines (1) hierarchical clustering and (2) Lasso. By averaging the genes within the clusters obtained from hierarchical clustering, we define supergenes and use them to fit regression models, thereby attaining concise interpretation and accuracy. Our methods are supported with theoretical justifications and demonstrated on simulated and real data sets.
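The averaging step of the two-step procedure can be sketched briefly. The code below assumes the cluster labels have already been produced by some hierarchical clustering routine (not shown), and it only forms the supergenes; fitting a lasso on them is a separate step. The function name `supergenes` and the data layout (one list of per-sample values per gene) are assumptions made for illustration.

```python
def supergenes(expression, labels):
    """Average gene expression profiles within each cluster.

    expression: list of genes, each a list of per-sample expression values.
    labels: cluster id for each gene, in the same order as `expression`.
    Returns a dict mapping cluster id to the averaged profile (one value per sample).
    """
    if len(expression) != len(labels):
        raise ValueError("need exactly one cluster label per gene")
    clusters = {}
    for profile, label in zip(expression, labels):
        clusters.setdefault(label, []).append(profile)
    return {
        label: [sum(values) / len(values) for values in zip(*profiles)]
        for label, profiles in clusters.items()
    }
```

The averaged profiles can then be used as the predictor columns of a regression, so the number of features drops from the number of genes to the number of clusters.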

179 citations

Journal Article•DOI•

Outlier sums for differential gene expression analysis

Robert Tibshirani¹, Trevor Hastie¹•Institutions (1)

Stanford University¹

01 Jan 2007-Biostatistics

TL;DR: A method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples, which can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples.

Abstract: We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples. In real and simulated examples, the new method often exhibits lower false discovery rates than simple t-statistic thresholding. We also compare our approach to the recent cancer profile outlier analysis proposal of Tomlins and others (2005).

Journal Article•DOI•

Are clusters found in one dataset present in another dataset?

Amy V. Kapp¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

01 Jan 2007-Biostatistics

TL;DR: The connection between reproducibility and prediction accuracy is used to develop a validation procedure for clusters found in datasets independent of the one in which they were characterized; the in-group proportion (IGP) is found to be the best measure of prediction accuracy.

Abstract: In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be "reproducible" and may be biologically significant. Classifying a new datum to a previously defined cluster can be seen as predicting which of the previously defined clusters is most similar to the new datum. If the new data classified to a cluster are similar, molecularly or clinically, to the data already present in the cluster, then the cluster is reproducible and the corresponding prediction accuracy is high. Here, we take advantage of the connection between reproducibility and prediction accuracy to develop a validation procedure for clusters found in datasets independent of the one in which they were characterized. We define a cluster quality measure called the "in-group proportion" (IGP) and introduce a general procedure for individually validating clusters. Using simulations and real breast cancer datasets, the IGP is compared to four other popular cluster quality measures (homogeneity score, separation score, silhouette width, and weighted average discrepant pairs score). Moreover, simulations and the real breast cancer datasets are used to compare the four versions of the validation procedure, which all use the IGP but differ in the way in which the null distributions are generated. We find that the IGP is the best measure of prediction accuracy, and one version of the validation procedure is more widely applicable than the other three. An implementation of this algorithm is in a package called "clusterRepro" available through The Comprehensive R Archive Network (http://cran.r-project.org).

Journal Article•DOI•

Comment on "The consensus coding sequences of human breast and colorectal cancers".

Gad Getz¹, Holger Höfling², Jill P. Mesirov¹, Todd R. Golub, Matthew Meyerson³, Matthew Meyerson¹, Robert Tibshirani², Eric S. Lander¹, Eric S. Lander³•Institutions (3)

Massachusetts Institute of Technology¹, Stanford University², Harvard University³

14 Sep 2007-Science

TL;DR: Although the biological methodology in Sjöblom et al. is sound, more samples are needed to achieve sufficient power, and few genes with significantly elevated mutation rates remain.

Abstract: Sjöblom et al. (Research Article, 13 October 2006, p. 268) reported nearly 200 novel cancer genes said to have a 90% probability of being involved in colon or breast cancer. However, their analysis raises two statistical concerns. When these concerns are addressed, few genes with significantly elevated mutation rates remain. Although the biological methodology in Sjöblom et al. is sound, more samples are needed to achieve sufficient power.

Journal Article•DOI•

Extracting binary signals from microarray time-course data

Debashis Sahoo¹, David L. Dill¹, Robert Tibshirani¹, Sylvia K. Plevritis¹•Institutions (1)

Stanford University¹

01 Jun 2007-Nucleic Acids Research

TL;DR: A new method is presented for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level and the times at which the transitions occur; on budding yeast data, groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations.

Abstract: This article presents a new method for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level, and the time at which the transitions occur. The algorithm matches the sequence of expression levels for each gene against temporal patterns having one or two transitions between two expression levels. The algorithm reports a P-value for the matching pattern of each gene, and a global false discovery rate can also be computed. After matching, genes can be sorted by the direction and time of transitions. Genes can be partitioned into sets based on the direction and time of change for further analysis, such as comparison with Gene Ontology annotations or binding site motifs. The method is evaluated on simulated and actual time-course data. On microarray data for budding yeast, it is shown that the groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations.

Journal Article•DOI•

Expression and Prognostic Significance of a Panel of Tissue Hypoxia Markers in Head-and-Neck Squamous Cell Carcinomas

Q.T. Le¹, Christina S. Kong¹, Phillip W. Lavori¹, Kenneth J. O'Byrne², Janine T. Erler¹, Xin Huang¹, Yujin Chen¹, Hongbin Cao¹, Robert Tibshirani¹, Nick Denko¹, Amato J. Giaccia¹, Albert C. Koong¹•Institutions (2)

Stanford University¹, University Hospitals of Leicester NHS Trust²

01 Sep 2007-International Journal of Radiation Oncology Biology Physics

TL;DR: A panel of hypoxia-related tissue markers that correlates with treatment outcomes in HNSCC is identified; staining for lysyl oxidase, ephrin A1, and galectin-1 correlated strongly with CA IX staining, and further validation of the markers is needed.

Abstract: Purpose: To investigate the expression pattern of hypoxia-induced proteins identified as being involved in malignant progression of head-and-neck squamous cell carcinoma (HNSCC) and to determine their relationship to tumor pO2 and prognosis. Methods and Materials: We performed immunohistochemical staining of hypoxia-induced proteins (carbonic anhydrase IX [CA IX], BNIP3L, connective tissue growth factor, osteopontin, ephrin A1, hypoxia inducible gene-2, dihydrofolate reductase, galectin-1, IκB kinase β, and lysyl oxidase) on tumor tissue arrays of 101 HNSCC patients with pretreatment pO2 measurements. Analysis of variance and Fisher's exact tests were used to evaluate the relationship between marker expression, tumor pO2, and CA IX staining. Cox proportional hazard model and log-rank tests were used to determine the relationship between markers and prognosis. Results: Osteopontin expression correlated with tumor pO2 (Eppendorf measurements) (p = 0.04). However, there was a strong correlation between lysyl oxidase, ephrin A1, and galectin-1 and CA IX staining. These markers also predicted for cancer-specific survival and overall survival on univariate analysis. A hypoxia score of 0–5 was assigned to each patient, on the basis of the presence of strong staining for these markers, whereby a higher score signifies increased marker expression. On multivariate analysis, increasing hypoxia score was an independent prognostic factor for cancer-specific survival (p = 0.015) and was borderline significant for overall survival (p = 0.057) when adjusted for other independent predictors of outcomes (hemoglobin and age). Conclusions: We identified a panel of hypoxia-related tissue markers that correlates with treatment outcomes in HNSCC. Validation of these markers will be needed to determine their utility in identifying patients for hypoxia-targeted therapy.

Journal Article•DOI•

"Pre-conditioning" for feature selection and regression in high-dimensional problems

Debashis Paul, Eric Bair, Trevor Hastie, Robert Tibshirani

28 Mar 2007-arXiv: Statistics Theory

TL;DR: In this article, the authors proposed a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable, and then applies a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable.

Abstract: We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the pre-conditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the pre-conditioned response variable is consistent as the number of predictors and observations increases. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.

Journal Article•DOI•

Paraffin-based 6-gene model predicts outcome in diffuse large B-cell lymphoma patients treated with R-CHOP.

Raquel Malumbres¹, Jun Chen¹, Robert Tibshirani², Nathalie A. Johnson, Laurie H. Sehn, Yaso Natkunam², Javier Briones³, Ranjana H. Advani², Joseph M. Connors, Gerald E. Byrne¹, Ronald Levy², Randy D. Gascoyne, Izidore S. Lossos¹•Institutions (3)

University of Miami¹, Stanford University², Autonomous University of Barcelona³

16 Nov 2007-Blood

TL;DR: It is demonstrated that the prognostic value of the 6-gene model remains significant in the era of R-CHOP treatment and that the model can be applied to routine FFPE tissue from initial diagnostic biopsies.

Posted Content•

Sparse inverse covariance estimation with the lasso

Jerome H. Friedman¹, Trevor Hastie¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

27 Aug 2007-arXiv: Methodology

TL;DR: A simple algorithm, using a coordinate descent procedure for the lasso, is developed that solves a 1000 node problem in at most a minute, and is 30 to 4000 times faster than competing methods.

Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the Graphical Lasso, that is remarkably fast: it solves a 1000 node problem (~500,000 parameters) in at most a minute, and is 30 to 4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen & Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.

Journal Article•DOI•

Oncogenic regulators and substrates of the anaphase promoting complex/cyclosome are frequently overexpressed in malignant tumors.

Norman L. Lehman¹, Robert Tibshirani¹, Jerry Y. Hsu¹, Yasodha Natkunam¹, Brent T. Harris¹, Robert B. West¹, Marilyn Masek¹, Kelli Montgomery¹, Matt van de Rijn¹, Peter K. Jackson¹•Institutions (1)

Stanford University¹

01 May 2007-American Journal of Pathology

TL;DR: Clustering and statistical analysis supports the finding that malignant tumors generally show broad misregulation of mitotic APC/C substrates not seen in benign tumors, suggesting that a "mitotic profile" in tumors may result from misregulation of the APC/C destruction pathway.

Abstract: The fidelity of cell division is dependent on the accumulation and ordered destruction of critical protein regulators. By triggering the appropriately timed, ubiquitin-dependent proteolysis of the mitotic regulatory proteins securin, cyclin B, aurora A kinase, and polo-like kinase 1, the anaphase promoting complex/cyclosome (APC/C) ubiquitin ligase plays an essential role in maintaining genomic stability. Misexpression of these APC/C substrates, individually, has been implicated in genomic instability and cancer. However, no comprehensive survey of the extent of their misregulation in tumors has been performed. Here, we analyzed more than 1600 benign and malignant tumors by immunohistochemical staining of tissue microarrays and found frequent overexpression of securin, polo-like kinase 1, aurora A, and Skp2 in malignant tumors. Positive and negative APC/C regulators, Cdh1 and Emi1, respectively, were also more strongly expressed in malignant versus benign tumors. Clustering and statistical analysis supports the finding that malignant tumors generally show broad misregulation of mitotic APC/C substrates not seen in benign tumors, suggesting that a “mitotic profile” in tumors may result from misregulation of the APC/C destruction pathway. This profile of misregulated mitotic APC/C substrates and regulators in malignant tumors suggests that analysis of this pathway may be diagnostically useful and represent a potentially important therapeutic target.

Journal Article•

Margin Trees for High-dimensional Classification

Robert Tibshirani, Trevor Hastie

01 May 2007-Journal of Machine Learning Research

TL;DR: The margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes, and is compared to the closely related "all-pairs" support vector machine, and nearest centroids on a number of cancer microarray data sets.

Abstract: We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin classifier at each split. We implement an exact greedy algorithm for this task, and compare its performance to less greedy procedures based on clustering of the matrix of pairwise margins. We compare the performance of the "margin tree" to the closely related "all-pairs" (one versus one) support vector machine, and nearest centroids on a number of cancer microarray data sets. We also develop a simple method for feature selection. We find that the margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes.

Journal Article•DOI•

Polymorphisms in hypoxia inducible factor 1 and the initial clinical presentation of coronary disease.

Mark A. Hlatky¹, Thomas Quertermous¹, Derek B. Boothroyd¹, James R. Priest¹, Alec J. Glassford¹, Richard M. Myers¹, Stephen P. Fortmann¹, Carlos Iribarren², Holly K. Tabor¹, Themistocles L. Assimes¹, Robert Tibshirani¹, Alan S. Go², Alan S. Go³•Institutions (3)

Stanford University¹, Kaiser Permanente², University of California, San Francisco³

01 Dec 2007-American Heart Journal

TL;DR: Polymorphisms in HIF1A were associated with development of stable exertional angina rather than acute MI as the initial clinical presentation of CAD.

Journal Article•DOI•

Microvessel Density and Expression of Vascular Endothelial Growth Factor and Its Receptors in Diffuse Large B-Cell Lymphoma Subtypes

Dita Gratzinger¹, Shuchun Zhao¹, Robert J. Marinelli¹, Amy V. Kapp¹, Robert Tibshirani¹, Anne Hammer², Stephen Hamilton-Dutoit², Yasodha Natkunam¹•Institutions (2)

Stanford University¹, Aarhus University Hospital²

01 Apr 2007-American Journal of Pathology

TL;DR: It was found that diffuse large B-cell lymphoma specimens showing higher local vascular endothelial growth factor expression showed correspondingly higher microvessel density, implying that lymphoma cells induce local tumor angiogenesis.

Abstract: Angiogenesis is known to play a major role in neoplasia, including hematolymphoid neoplasia. We assessed the relationships among angiogenesis and expression of vascular endothelial growth factor and its receptors in the context of clinically and biologically relevant subtypes of diffuse large B-cell lymphoma using immunohistochemical evaluation of tissue microarrays. We found that diffuse large B-cell lymphoma specimens showing higher local vascular endothelial growth factor expression showed correspondingly higher microvessel density, implying that lymphoma cells induce local tumor angiogenesis. In addition, local vascular endothelial growth factor expression was higher in those specimens showing higher expression of the receptors of the growth factor, suggesting an autocrine growth-promoting feedback loop. The germinal center-like and nongerminal center-like subtypes of diffuse large B-cell lymphoma were biologically and prognostically distinct. Interestingly, only in the more clinically aggressive nongerminal center-like subtype were microvessel densities significantly higher in specimens showing higher vascular endothelial growth factor expression; the same was true for the finding of higher vascular endothelial growth factor receptor-1 expression in conjunction with higher vascular endothelial growth factor expression. These differences may have important implications for the responsiveness of the two diffuse large B-cell lymphoma subtypes to anti-vascular endothelial growth factor and anti-angiogenic therapies.

Journal Article•DOI•

Discussion: The Dantzig selector: Statistical estimation when p is much larger than n

Bradley Efron, Trevor Hastie, Robert Tibshirani

01 Dec 2007-Annals of Statistics

TL;DR: In this paper, the choice of predictor variables in large-scale linear models is discussed and the relationship between the Dantzig Selector (DS) and the Lasso algorithm is explored.

Abstract: 1. Introduction. This is a fascinating paper on an important topic: the choice of predictor variables in large-scale linear models. A previous paper in these pages attacked the same problem using the “LARS” algorithm (Efron, Hastie, Johnstone and Tibshirani [3]); actually three algorithms including the Lasso as middle case. There are tantalizing similarities between the Dantzig Selector (DS) and the LARS methods, but they are not the same and produce somewhat different models. We explore this relationship with the Lasso and LARS here. 2. Dantzig selector and the Lasso. The definition of the Dantzig selector (DS) in (1.7) can be re-expressed as

Journal Article•DOI•

Anti-idiotype antibody response after vaccination correlates with better overall survival in follicular lymphoma.

Weiyun Z. Ai¹, Robert Tibshirani², Behnaz Taidi², Debra K. Czerwinski², Ronald Levy²•Institutions (2)

University of California, San Francisco¹, Stanford University²

16 Nov 2007-Blood

TL;DR: It is found that achieving a complete response/complete response unconfirmed (CR/CRu) to CVP and making an anti-idiotype antibody are 2 independent factors that each correlated with longer OS at 10 years.

Journal Article•DOI•

Disease-specific genomic analysis

Monica Nicolau¹, Robert Tibshirani¹, Anne Lise Børresen-Dale, Stefanie S. Jeffrey¹•Institutions (1)

Stanford University¹

15 Mar 2007-Bioinformatics

TL;DR: Disease-specific genomic analysis (DSGA) is a method for the analysis of pathologic biology that unravels the disease characteristics of high-dimensional data; it is intended to precede standard techniques like clustering or class prediction and to enhance their performance and ability to detect disease.

Abstract: Motivation: Genomic high-throughput technology generates massive data, providing opportunities to understand countless facets of the functioning genome. It also raises profound issues in identifying data relevant to the biology being studied. Results: We introduce a method for the analysis of pathologic biology that unravels the disease characteristics of high dimensional data. The method, disease-specific genomic analysis (DSGA), is intended to precede standard techniques like clustering or class prediction, and enhance their performance and ability to detect disease. DSGA measures the extent to which the disease deviates from a continuous range of normal phenotypes, and isolates the aberrant component of data. In several microarray cancer datasets, we show that DSGA outperforms standard methods. We then use DSGA to highlight a novel subdivision of an important class of genes in breast cancer, the estrogen receptor (ER) cluster. We also identify new markers distinguishing ductal and lobular breast cancers. Although our examples focus on microarrays, DSGA generalizes to any high dimensional genomic/proteomic data. Supplementary information: Supplementary data are available at Bioinformatics online.
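
The idea of separating a disease sample into a part explained by normal phenotypes and an aberrant remainder can be illustrated with a plain least-squares projection. This is only a sketch under stated assumptions, not the DSGA procedure itself: it assumes the "range of normal phenotypes" is approximated by the linear span of the normal samples, and the function name `split_normal_and_disease` and its arguments are hypothetical.

```python
import numpy as np

def split_normal_and_disease(normal, tumor):
    """Split one tumor profile into a normal-like part and a residual.

    normal : array of shape (n_genes, n_normal_samples), one column per normal sample
    tumor  : array of shape (n_genes,), one tumor expression profile

    Assumption (not from the source): normal phenotypes are modeled as the
    linear span of the normal samples, fitted by ordinary least squares.
    """
    # Least-squares fit of the tumor profile on the normal samples.
    coef, *_ = np.linalg.lstsq(normal, tumor, rcond=None)
    normal_part = normal @ coef          # component explained by normal tissue
    disease_part = tumor - normal_part   # aberrant component (the residual)
    return normal_part, disease_part
```

Under this simplification the residual is orthogonal to every normal sample, so downstream clustering or class prediction applied to `disease_part` sees only the variation that the normal samples cannot explain.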

Journal Article•DOI•

Prognostic Significance of VEGF, VEGF Receptors, and Microvessel Density in Diffuse Large B Cell Lymphoma Treated with Anthracycline-Based Chemotherapy.

Dita Gratzinger¹, Shuchun Zhao¹, Robert Tibshirani¹, Eric D. Hsi², Christine P. Hans³, Brad Pohlman², Martin Bast³, Abraham Avigdor⁴, Ginette Schiby⁴, Arnon Nagler⁴, Gerald E. Byrne⁵, Izidore S. Lossos⁵, Yasodha Natkunam¹•Institutions (5)

Stanford University¹, Cleveland Clinic², University of Nebraska Medical Center³, Sheba Medical Center⁴, University of Miami⁵

16 Nov 2007-Blood

TL;DR: The presence of improved survival in a cohort of patients whose lymphomas potentially depend on autocrine signaling via VEGFR1 suggests that dependence on this pathway may render patients susceptible to the effects of anthracycline-based therapy.

Journal Article•DOI•

On the "degrees of freedom" of the lasso

Hui Zou, Trevor Hastie¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

06 Dec 2007-arXiv: Statistics Theory

TL;DR: In this article, the effective degrees of freedom of the lasso are studied in the framework of Stein's unbiased risk estimation (SURE), and it is shown that the number of nonzero coefficients is an unbiased estimate of the degrees of freedom, a conclusion that requires no special assumption on the predictors.

Abstract: We study the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso--a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results on hand, various model selection criteria--$C_p$, AIC and BIC--are available, which, along with the LARS algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.

Journal Article•DOI•

Survival in Follicular Lymphoma: The Stanford Experience, 1960–2003.

Daryl Tan¹, Saul A. Rosenberg¹, Ronald Levy¹, Philip W. Lavori¹, Robert Tibshirani¹, Richard T. Hoppe¹, Roger A. Warnke¹, Ranjana H. Advani¹, Yasodha Natkunam¹, Alan Yuen¹, Sandra J. Horning¹•Institutions (1)

Stanford University¹

16 Nov 2007-Blood

TL;DR: OS of patients with FL managed at Stanford University significantly improved in the 1986–2003 eras, particularly among younger and advanced stage patients.

Journal Article•DOI•

Erratum To: Discovery and validation of breast cancer subtypes

Amy V. Kapp¹, Stefanie S. Jeffrey¹, Anita Langerød², Anne Lise Børresen-Dale²,³, Wonshik Han⁴, Dong Young Noh⁴, Ida R. K. Bukholm³,⁵, Monica Nicolau¹, Patrick O. Brown¹, Robert Tibshirani¹•Institutions (5)

Stanford University¹, Rikshospitalet–Radiumhospitalet², University of Oslo³, New Generation University College⁴, Akershus University Hospital⁵

13 Apr 2007-BMC Genomics

TL;DR: This correction article not only describes what makes the published Table 5 incorrect, it also presents the correct Table 5.

Abstract: Following the publication of our recent article (Kapp et al., BMC Genomics 2006, 7:231), we (the authors) regrettably found several errors in the published Table 5. This correction article not only describes what makes the published Table 5 incorrect, it also presents the correct Table 5.

Journal Article•DOI•

LMO2 Protein Expression Predicts Survival in Patients with Diffuse Large B-Cell Lymphoma in the Pre- and Post-Rituximab Treatment Eras.

Yasodha Natkunam¹, Pedro Farinha², Eric D. Hsi³, Christine P. Hans⁴, Robert Tibshirani¹, Laurie H. Sehn², Joseph M. Connors², Shuchun Zhao¹, Brad Pohlman³, John J. Spinelli², Martin Bast⁴, Arnon Nagler⁵, Ronald Levy¹, Randy D. Gascoyne², Izidore S. Lossos⁶•Institutions (6)

Stanford University¹, BC Cancer Agency², Cleveland Clinic³, University of Nebraska Medical Center⁴, Sheba Medical Center⁵, University of Miami⁶

16 Nov 2007-Blood

TL;DR: The prognostic value of LMO2 protein expression remains significant in the era of R-CHOP treatment; its evaluation in all newly diagnosed DLBCL patients is recommended to confirm these results and eventually to optimize patient management.
