Author

Robert Tibshirani

Bio: Robert Tibshirani is an academic researcher from Stanford University. The author has contributed to research in topics: Lasso (statistics) & Elastic net regularization. The author has an h-index of 147 and has co-authored 593 publications receiving 326,580 citations. Previous affiliations of Robert Tibshirani include the University of Toronto and the University of California.


Papers
Journal ArticleDOI
TL;DR: Genistein produces diverse effects on gene expression that are dose-dependent and this has important implications in developing genistein as a putative prostate cancer preventive agent.
Abstract: Epidemiological evidence suggests that soy consumption is associated with a decreased risk of prostate cancer. The isoflavone genistein is found at high levels in soy and a large body of evidence suggests it is important in mediating the cancer preventive effects of soy. The mechanisms through which genistein acts in prostate cancer cells have not been fully defined. We used gene expression profiling to identify genes significantly modulated by low and high doses of genistein in LNCaP cells. Significant genes were identified using StepMiner analysis, and significantly altered pathways with Ingenuity Pathways analysis. Genistein significantly altered expression of transcripts involved in cell growth, carcinogen defenses and steroid signaling pathways. The effects of genistein on these pathways were confirmed by directly assessing dose-related effects on LNCaP cell growth, NQO-1 enzymatic activity and PSA protein expression. Genistein produces diverse effects on gene expression that are dose-dependent and this has important implications in developing genistein as a putative prostate cancer preventive agent.

11 citations

Posted Content
TL;DR: In this paper, a regularized model which adaptively pools elements of the precision matrices is proposed, which decreases the variance of our estimates without overly biasing them, and is shown to be effective on real and simulated datasets.
Abstract: Linear and quadratic discriminant analysis (LDA/QDA) are common tools for classification problems. For these methods we assume observations are normally distributed within each group. We estimate a mean and covariance matrix for each group and classify using Bayes' theorem. With LDA, we estimate a single, pooled covariance matrix, while for QDA we estimate a separate covariance matrix for each group. Rarely do we believe in a homogeneous covariance structure between groups, but often there is insufficient data to estimate covariance matrices separately. We propose L1-PDA, a regularized model which adaptively pools elements of the precision matrices. Adaptively pooling these matrices decreases the variance of our estimates (as in LDA) without overly biasing them. In this paper, we propose and discuss this method, give an efficient algorithm to fit it for moderate-sized problems, and show its efficacy on real and simulated datasets.
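The pooled-versus-separate covariance distinction the abstract describes can be illustrated with a minimal one-dimensional Gaussian classifier (a sketch of plain LDA/QDA mechanics, not the proposed L1-PDA method; all data and labels below are made up):

```python
import math

def gaussian_logpdf(x, mu, var):
    # Log density of N(mu, var) at x
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit(groups):
    # groups: dict mapping label -> list of 1-D observations
    params = {}
    for label, xs in groups.items():
        n = len(xs)
        mu = sum(xs) / n
        var = sum((x - mu) ** 2 for x in xs) / n
        params[label] = (mu, var, n)
    return params

def classify(x, params, pool=False):
    total = sum(n for _, _, n in params.values())
    if pool:
        # LDA-style: a single variance pooled across all groups
        pooled = sum(var * n for _, var, n in params.values()) / total
    best, best_score = None, -math.inf
    for label, (mu, var, n) in params.items():
        v = pooled if pool else var  # QDA-style keeps per-group variance
        # Bayes' theorem: log-likelihood plus log prior (group frequency)
        score = gaussian_logpdf(x, mu, v) + math.log(n / total)
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up 1-D data for two groups
params = fit({"A": [1.0, 1.2, 0.8, 1.1, 0.9],
              "B": [3.0, 3.5, 2.5, 3.2, 2.8]})
```

With full covariance matrices the same trade-off appears in higher dimensions: pooling stabilizes the estimate when per-group data are scarce, at the cost of bias when the groups' covariances truly differ.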

11 citations

Journal ArticleDOI
TL;DR: It is proved that when log-transformed signal is used as the input for signal reconstruction, it will always yield an underestimation of the true signal.
Abstract: gene-expression values1. The authors mixed complementary RNA from the tissues and observed similar off-diagonal effects. They concluded that the off-diagonal effects are due to technical reasons, such as nonlinear sample amplification or probe cross-hybridization, rather than statistical deconvolution. We found that this deviation of signal reconstruction was the result of data transformation. In microarray studies, expression data are logarithm-transformed for variance stabilization or for approximation of a normal distribution2. However, we argue that in the context of expression-profile deconvolution, the log transformation will produce biased estimation. Deconvolution is modeled by a linear equation O = S × W, where O is the expression data for mixed tissue samples, S is the tissue-specific expression profile, and W is the cell-type frequency matrix. If the signal is log-transformed, the linearity will no longer be preserved. The concavity of the log function will induce a downward bias in the reconstructed signal (Fig. 1a and Supplementary Fig. 1). Mathematically, it can be shown that the deconvolution model used on log-transformed signals is log(O′) = log(S) × W, where O′ is the csSAM estimate of gene-expression profiles. As W is a frequency matrix and its column values sum to 1, the following is true by the properties of concave functions3: log(S × W) > log(S) × W. Taking these two equations together, we can conclude that log(O′) < log(S × W) = log(O). Thus, we proved that when log-transformed signal is used as the input for signal reconstruction, it will always yield an underestimation of the true signal. By taking an anti-log transformation, we obtained an unbiased reconstruction of the mixed tissue samples (Fig. 1b and Supplementary Fig. 2). The log transformation also introduced a large bias to the results of deconvolution (Fig. 1c and Supplementary Fig. 3).
A substantial portion of the genes were off diagonal in the deconvolved cell type–specific gene-expression profiles. By performing the deconvolution in linear space, we achieved a considerably more accurate result (Fig. 1d and Supplementary Fig. 3). In summary, an incorrect transformation of data can greatly bias the final results of deconvolution. In the context of gene-expression deconvolution, a linear model achieves better accuracy. Accurate deconvolution of expression profiles is important for downstream analysis, such as gene-expression analysis and pathway-enrichment analysis. We urge caution in selecting data-transformation functions and any preprocessing steps in model-based statistical analysis.
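The concavity argument is easy to check numerically. The sketch below uses a hypothetical one-gene, two-cell-type mixture (invented values, not data from the paper) to show that averaging log-signals with mixing weights underestimates the log of the true linear mixture, exactly as Jensen's inequality predicts:

```python
import math

# Hypothetical example: S holds the expression of one gene in two
# cell types; w holds the mixing fractions (a column of W, summing to 1).
S = [10.0, 100.0]
w = [0.5, 0.5]

true_mixed = sum(s * wi for s, wi in zip(S, w))             # S x W = 55
log_of_mix = math.log(true_mixed)                           # log(S x W)
mix_of_logs = sum(math.log(s) * wi for s, wi in zip(S, w))  # log(S) x W

# Concavity of log: mixing log-signals underestimates the true mixture.
assert mix_of_logs < log_of_mix

# Anti-logging the biased estimate gives the geometric mean (~31.6),
# well below the true linear mixture of 55.
biased = math.exp(mix_of_logs)
```

Here the downward bias is large (31.6 versus 55) because the two cell-type signals differ by an order of magnitude; the gap shrinks as the signals become more similar, which matches the intuition that the bias is driven by the curvature of the log over the spread of the mixed values.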

11 citations

01 Jan 2013
TL;DR: As mentioned in this paper, the field of machine learning, discussed in this volume by the author's friend Larry Wasserman, has exploded and brought with it the computational side of statistical research; statistics itself is a thriving discipline, more and more an essential part of science, business and societal activities.
Abstract: When asked to reflect on an anniversary of their field, scientists in most fields would sing the praises of their subject. As a statistician, I will do the same. However, here the praise is justified! Statistics is a thriving discipline, more and more an essential part of science, business and societal activities. Class enrollments are up — it seems that everyone wants to be a statistician — and there are jobs everywhere. The field of machine learning, discussed in this volume by my friend Larry Wasserman, has exploded and brought along with it the computational side of statistical research. Hal Varian, Chief Economist at Google, said “I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding.” Nate Silver, creator of the New York Times political forecasting blog “538”, was constantly in the news and on talk shows in the run-up to the 2012 US election. Using careful statistical modelling, he forecast the election with near 100% accuracy (in contrast to many others). Although his training is in economics, he (proudly?) calls himself a statistician. When meeting people at a party, the label “Statistician” used to kill one’s chances of making a new friend. But no longer! In the midst of all this excitement about the growing importance of statistics, there are fascinating developments within the field itself. Here I will discuss one that has been the focus of my research and that of many other statisticians.

11 citations


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .
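The general idea behind shrinkage estimation of fold changes can be sketched in a few lines. This is a toy posterior-mean pull toward zero under invented prior and noise variances, purely illustrative and not the DESeq2 estimator:

```python
# Toy illustration of shrinkage (NOT the DESeq2 method): pull noisy
# per-gene log2 fold-change estimates toward zero, weighting by how
# reliable each raw estimate is relative to an assumed prior spread
# of true effects. Both variances below are made-up values.
prior_var = 1.0   # assumed variance of true log2 fold changes
noise_var = 4.0   # sampling variance of a raw estimate (few replicates)

def shrink(raw_lfc):
    # Posterior-mean-style weight under normal prior/noise assumptions:
    # the noisier the raw estimate, the harder it is pulled toward 0.
    w = prior_var / (prior_var + noise_var)
    return w * raw_lfc

raw = [0.1, -0.3, 6.0]   # third gene: large but unreliable raw estimate
shrunk = [shrink(x) for x in raw]
```

The payoff is the one the abstract describes: extreme estimates from poorly measured genes are moderated, so downstream ranking reflects the strength of evidence rather than sampling noise. DESeq2 does this with empirical-Bayes machinery fitted to the data rather than fixed variances.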

47,038 citations

Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
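The "exactly 0" behaviour of the constraint can be seen with a minimal coordinate-descent sketch (a standard way to fit the lasso, shown here on made-up data; this is not code from the paper). With a response that depends only on the first feature, the second coefficient is set to exactly zero rather than merely shrunk:

```python
def soft_threshold(a, t):
    # The key lasso operation: shrink toward 0, and clip to exactly 0
    # whenever |a| <= t. This is what produces exact zeros.
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    # Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual leaving out feature j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b

# Made-up data: y = 3 * x1 exactly; x2 is an irrelevant feature.
X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 6], [6, 3]]
y = [3, 6, 9, 12, 15, 18]
b = lasso_cd(X, y, lam=0.5)
# b[0] is shrunk slightly below 3; b[1] is exactly 0.
```

This is the interpretability property the abstract highlights: unlike ridge regression, which would leave both coefficients small but nonzero, the lasso's constraint zeroes out the irrelevant feature entirely, performing variable selection and estimation in one step.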

40,785 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations