Author

Robert Tibshirani

Bio: Robert Tibshirani is an academic researcher at Stanford University. He has contributed to research on topics including Lasso (statistics) and Elastic net regularization. He has an h-index of 147 and has co-authored 593 publications receiving 326,580 citations. His previous affiliations include the University of Toronto and the University of California.


Papers
Book ChapterDOI
07 May 2015

3 citations

Posted Content
TL;DR: In this paper, a new sparse regression method called the component lasso is proposed, which uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones and then solves the subproblems separately, obtaining a coefficient vector for each one.
Abstract: We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then solves the subproblems separately, obtaining a coefficient vector for each one. Then, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery.
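
The recipe in this abstract is concrete enough to sketch in code. Below is a minimal illustration of the idea, not the authors' implementation: features are grouped by the connected components of a thresholded sample covariance matrix, a lasso is fit within each component, and non-negative least squares reweights the component fits. The threshold `tau`, the penalty `alpha`, and the use of scikit-learn's Lasso are assumptions made for the sketch.

```python
# Minimal sketch of the component-lasso idea (not the authors' implementation).
# Assumes X and y are roughly centered so intercepts can be ignored.
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.optimize import nnls
from sklearn.linear_model import Lasso

def component_lasso(X, y, tau=0.1, alpha=0.1):
    n, p = X.shape

    # Connect features whose absolute sample covariance exceeds the threshold tau,
    # then split the features into connected components.
    S = np.cov(X, rowvar=False)
    n_comp, labels = connected_components(np.abs(S) > tau, directed=False)

    # Fit a lasso separately on the features of each component.
    fits = np.zeros((n, n_comp))   # fitted values contributed by each component
    beta = np.zeros(p)
    for k in range(n_comp):
        idx = np.where(labels == k)[0]
        coef = Lasso(alpha=alpha).fit(X[:, idx], y).coef_
        beta[idx] = coef
        fits[:, k] = X[:, idx] @ coef

    # Recombine: non-negative least squares selects and reweights the components
    # according to how well each one predicts the response.
    w, _ = nnls(fits, y)
    for k in range(n_comp):
        beta[labels == k] *= w[k]
    return beta
```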

3 citations

Proceedings ArticleDOI
TL;DR: A consistent “cancer-like” signal was observed at 99% specificity for invasive cancer, supporting the promise of cfDNA assays for early cancer detection; the first learnings from multiple cfDNA assays in CCGA are reported here.
Abstract: CCGA [NCT02889978] is the largest study of cfDNA-based early cancer detection; the first CCGA learnings from multiple cfDNA assays are reported here. This prospective, multi-center, observational study has enrolled 10,012 of 15,000 demographically-balanced participants at 141 sites. Blood was collected from participants with newly diagnosed therapy-naive cancer (C, case) and participants without a diagnosis of cancer (noncancer [NC], control) as defined at enrollment. This preplanned substudy included 878 cases, 580 controls, and 169 assay controls (n=1627) across 20 tumor types and all clinical stages. All samples were analyzed by: 1) Paired cfDNA and white blood cell (WBC)-targeted sequencing (60,000X, 507 gene panel); a joint caller removed WBC-derived somatic variants and residual technical noise; 2) Paired cfDNA and WBC whole-genome sequencing (WGS; 35X); a novel machine learning algorithm generated cancer-related signal scores; joint analysis identified shared events; and 3) cfDNA whole-genome bisulfite sequencing (WGBS; 34X); normalized scores were generated using abnormally methylated fragments. In the targeted assay, non-tumor WBC-matched cfDNA somatic variants (SNVs/indels) accounted for 76% of all variants in NC and 65% in C. Consistent with somatic mosaicism (i.e., clonal hematopoiesis), WBC-matched variants increased with age; several were non-canonical loss-of-function mutations not previously reported. After WBC variant removal, canonical driver somatic variants were highly specific to C (e.g., in EGFR and PIK3CA, 0 NC had variants vs 11 and 30, respectively, of C). Similarly, of 8 NC with somatic copy number alterations (SCNAs) detected with WGS, 4 were derived from WBCs. WGBS data revealed informative hyper- and hypo-fragment level CpGs (1:2 ratio); a subset was used to calculate methylation scores. A consistent “cancer-like” signal was observed at 99% specificity for invasive cancer, supporting the promise of cfDNA assays for early cancer detection. Additional data will be presented on detected plasma:tissue variant concordance and on multi-assay modeling. Citation Format: Alexander A. Aravanis, Geoffrey R. Oxnard, Tara Maddala, Earl Hubbell, Oliver Venn, Arash Jamshidi, Ling Shen, Hamed Amini, John A. Beausang, Craig Betts, Daniel Civello, Konstantin Davydov, Saniya Fazullina, Darya Filippova, Sante Gnerre, Samuel Gross, Chenlu Hou, Roger Jiang, Byoungsok Jung, Kathryn Kurtzman, Collin Melton, Shivani Nautiyal, Jonathan Newman, Joshua Newman, Cosmos Nicolaou, Richard Rava, Onur Sakarya, Ravi Vijaya Satya, Seyedmehdi Shojaee, Kristan Steffen, Anton Valouev, Hui Xu, Jeanne Yue, Nan Zhang, Jose Baselga, Rosanna Lapham, Daron G. Davis, David Smith, Donald Richards, Michael V. Seiden, Charles Swanton, Timothy J. Yeatman, Robert Tibshirani, Christina Curtis, Sylvia K. Plevritis, Richard Williams, Eric Klein, Anne-Renee Hartman, Minetta C. Liu. Development of plasma cell-free DNA (cfDNA) assays for early cancer detection: first insights from the Circulating Cell-Free Genome Atlas Study (CCGA) [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr LB-343.

3 citations

Book
01 Jan 2005
TL;DR: Two new lasso-based methods are proposed that produce sparse, interpretable regression models relating clusters of co-expressed genes to a quantitative phenotype; the need for supervised clustering of genes, in which the phenotype influences how genes are clustered, is also discussed.
Abstract: In the past decade, DNA and oligonucleotide microarray technology has been developed, allowing gene expression levels to be measured on a genome-wide scale. Use of this massive amount of molecular information appears to be promising for discovering genetic networks. Classification based on microarray experiments has been studied extensively. In comparison, microarray gene expression data have been analyzed less frequently in a regression set-up. From a statistical point of view, the challenge with analyzing microarray gene expression data is due to the very large number of genes, which far exceeds the sample size, i.e., the so-called “large p, small n” scenario. The lasso (least absolute shrinkage and selection operator) method is a promising regression method that incorporates automatic variable selection by imposing an L1 penalty on the regression coefficients. However, the lasso method has limitations in the “large p, small n” scenario. When p > n, the lasso method can select up to n variables before it saturates, and it does not offer a “grouped selection” effect. Therefore, we propose two new methods, based on lasso, that are particularly suitable for microarray data regression analysis. The methods can produce sparse, interpretable regression models that relate clusters of co-expressed genes to a quantitative phenotype. Our methods are tested on simulated data sets as well as real microarray data sets. Besides the proposal of novel regression methods, we also propose quantitative definitions for evaluating the strength of the “grouped variable” effect in fitted regression models. The new definitions allow us to compare regression models quantitatively. We then discuss a need for supervised clustering of genes, that is, the phenotype ought to have an influence on how genes are clustered. One potential approach is to re-define the distances between pairs of genes by incorporating the phenotype into the definition of the new distance metric.
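
The two limitations noted above (saturation at n selected variables when p > n, and the absence of a grouped-selection effect) are easy to see on simulated data. The snippet below is an illustrative experiment, not taken from this work; the sample sizes, correlation structure, and penalty settings are arbitrary assumptions.

```python
# Illustrative "large p, small n" experiment (not from this work): the lasso can
# select at most n variables and tends to pick single representatives of
# correlated groups, whereas the elastic net spreads weight over whole groups.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 30, 200
z = rng.normal(size=(n, 5))                      # 5 latent signals
# 10 nearly identical copies of each signal ("groups") plus 150 noise features.
X = np.hstack([np.repeat(z, 10, axis=1) + 0.01 * rng.normal(size=(n, 50)),
               rng.normal(size=(n, 150))])
y = z.sum(axis=1) + 0.5 * rng.normal(size=n)

lasso = Lasso(alpha=0.05, max_iter=10000).fit(X, y)
enet = ElasticNet(alpha=0.05, l1_ratio=0.5, max_iter=10000).fit(X, y)
print("lasso nonzero coefficients:      ", np.sum(lasso.coef_ != 0))   # never more than n
print("elastic net nonzero coefficients:", np.sum(enet.coef_ != 0))    # spread across groups
```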

3 citations

Journal ArticleDOI
TL;DR: In this paper, the authors develop a new technique to estimate the integral of the distribution of T2 relaxation time without imposing any constraint other than the monotonicity of the underlying cumulative relaxation time distribution.
Abstract: Magnetic resonance imaging techniques can be used to measure some biophysical properties of tissue. In this context, the T2 relaxation time is an important parameter for soft-tissue contrast. The authors develop a new technique to estimate the integral of the distribution of T2 relaxation time without imposing any constraint other than the monotonicity of the underlying cumulative relaxation time distribution. They explore the properties of the estimator and its applications for the analysis of breast tissue data. As they show, an extension of linear discriminant analysis distinguishes well between two classes of breast tissue.
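
One common way to make this concrete (a rough sketch, not necessarily the authors' estimator) is to model the decay signal as a non-negative mixture of exponentials, s(t) = Σ_j w_j exp(−t/T2_j); non-negative weights make the implied cumulative T2 distribution monotone by construction. The echo times, T2 grid, and noise level below are assumptions.

```python
# Rough sketch of monotone estimation of the cumulative T2 distribution from a
# multi-exponential decay curve (not necessarily the authors' estimator).
import numpy as np
from scipy.optimize import nnls

t = np.linspace(0.01, 0.3, 32)           # echo times in seconds (assumed)
T2_grid = np.logspace(-2.5, -0.5, 60)    # candidate T2 values in seconds (assumed)

# Simulated decay from two tissue compartments plus measurement noise.
rng = np.random.default_rng(1)
signal = 0.7 * np.exp(-t / 0.04) + 0.3 * np.exp(-t / 0.15)
signal = signal + 0.01 * rng.normal(size=t.size)

# Design matrix of exponential decays; NNLS keeps the mixture weights non-negative.
A = np.exp(-np.outer(t, 1.0 / T2_grid))
w, _ = nnls(A, signal)

# Cumulative distribution of T2: a non-decreasing step function by construction.
F = np.cumsum(w) / w.sum()
```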

2 citations


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
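
A minimal usage example of the uniform fit/predict interface the abstract refers to; the dataset and model settings are illustrative assumptions.

```python
# Every scikit-learn estimator follows the same fit/predict/score pattern, so
# swapping models requires almost no code changes. Illustrative settings only.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=50, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Lasso(alpha=1.0).fit(X_train, y_train)   # any estimator works the same way
print("held-out R^2:", model.score(X_test, y_test))
```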

47,974 citations

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .
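
The shrinkage idea can be illustrated with a toy calculation. The snippet below is not DESeq2's estimator (DESeq2 works on negative-binomial counts and fits a dispersion trend); it is only a normal-normal sketch of how imprecisely measured fold changes are pulled toward zero, which is what lets shrunken estimates rank genes by the strength rather than the mere presence of differential expression. All numbers are assumptions.

```python
# Toy illustration of fold-change shrinkage (not DESeq2's actual method):
# noisy per-gene log2 fold changes are pulled toward a zero-centered prior,
# and the genes measured least precisely are shrunk the most.
import numpy as np

rng = np.random.default_rng(0)
true_lfc = np.concatenate([np.zeros(900), rng.normal(0.0, 2.0, 100)])  # most genes unchanged
se = rng.uniform(0.2, 2.0, size=1000)                                  # per-gene standard errors
observed_lfc = true_lfc + se * rng.normal(size=1000)

prior_var = 1.0                                        # assumed prior variance
shrunk_lfc = observed_lfc * prior_var / (prior_var + se**2)

# The most precisely measured gene barely moves; the noisiest gene moves a lot.
shift = np.abs(observed_lfc - shrunk_lfc)
order = np.argsort(se)
print("shift for lowest-noise gene:", shift[order[0]])
print("shift for highest-noise gene:", shift[order[-1]])
```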

47,038 citations

Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
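
For reference, the criterion described here can be written out explicitly; the constrained and penalized forms below are equivalent, with a one-to-one correspondence between the bound t and the penalty λ.

```latex
% Lasso, constrained form: least squares subject to an L1 bound on the coefficients.
\hat{\beta} = \arg\min_{\beta}\, \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .

% Equivalent penalized (Lagrangian) form, with penalty parameter \lambda \ge 0:
\hat{\beta} = \arg\min_{\beta}\, \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert .
```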

40,785 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieves a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
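
The building block described here (parallel 1×1, 3×3, and 5×5 convolutions plus a pooled branch, with 1×1 convolutions used as cheap dimension reductions, concatenated along the channel axis) can be sketched compactly. The PyTorch code below is a simplified illustration without batch normalization or the auxiliary classifiers, and the channel counts are illustrative.

```python
# Simplified sketch of one Inception module (illustrative channel counts).
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),        # 1x1 reduction
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),        # 1x1 reduction
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All branches preserve spatial size; outputs are concatenated on channels.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

# Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels.
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))    # shape: [1, 256, 28, 28]
```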

40,257 citations

Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations