Author

Robert Tibshirani

Bio: Robert Tibshirani is an academic researcher at Stanford University. The author has contributed to research on topics including Lasso (statistics) and Gene expression profiling. The author has an h-index of 147 and has co-authored 593 publications receiving 326,580 citations. Previous affiliations of Robert Tibshirani include the University of Toronto and the University of California.


Papers

Open access | Book
Bradley Efron, Robert Tibshirani
01 Jan 1993
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.
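The bootstrap idea summarized in this abstract can be illustrated with a minimal Python sketch. It is a stand-in for illustration only, not the Minitab macros the abstract mentions; the function name `bootstrap_se`, the sample values, and the number of resamples are all assumptions made here.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(sample))
    # The spread of the replicated statistics approximates its standard error.
    return statistics.stdev(replicates)


# Hypothetical sample, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
se = bootstrap_se(data)
print(round(se, 3))
```

For the sample mean, the result should land near the plug-in value (the population-style standard deviation divided by the square root of n, roughly 0.39 for these numbers). The same function accepts other statistics, such as `statistics.median`, for which no simple formula is available.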


Topics: Bootstrap aggregating (57%)

36,497 Citations


Journal Article | DOI: 10.1111/J.2517-6161.1996.TB02080.X
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
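To make the "exactly 0" behaviour concrete, here is a small Python sketch. It solves the penalized form of the problem, minimizing (1/(2n)) times the residual sum of squares plus `lam` times the sum of absolute coefficients, which corresponds to the constrained form in the abstract for a suitable constant. Coordinate descent with soft thresholding is used purely as an illustrative solver and is not claimed to be the algorithm of the paper. The names `soft_threshold` and `lasso_cd`, the toy data, and the assumption of centered variables with no intercept are all choices made here.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * RSS + lam * sum(|beta_j|), no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy orthogonal design: y depends strongly on column 0 and barely on column 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

ols = lasso_cd(X, y, lam=0.0)   # no penalty: approximately [2.0, 0.1]
beta = lasso_cd(X, y, lam=0.5)  # penalized: approximately [1.5, 0.0]
print(ols, beta)
```

With the penalty switched on, the weak coefficient is set exactly to zero while the strong one is shrunk, which is the mix of subset-selection and ridge-like behaviour the abstract describes.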


Topics: Lasso (statistics) (70%), Elastic net regularization (68%), Residual sum of squares (58%)

36,018 Citations


Open access | Book
28 Jul 2013
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.


Topics: Least-angle regression (59%), Lasso (statistics) (55%), Ensemble learning (54%)

18,981 Citations


Open access | Book
01 Jan 2001
Topics: Algorithmic learning theory (60%), Semi-supervised learning (55%), Ensemble learning (54%)

18,681 Citations


Open access | Journal Article | DOI: 10.1073/PNAS.091062498
Abstract: Microarrays can measure the expression of thousands of genes to identify changes in expression between different biological states. Methods are needed to determine the significance of these changes while accounting for the enormous number of genes. We describe a method, Significance Analysis of Microarrays (SAM), that assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements. For genes with scores greater than an adjustable threshold, SAM uses permutations of the repeated measurements to estimate the percentage of genes identified by chance, the false discovery rate (FDR). When the transcriptional response of human cells to ionizing radiation was measured by microarrays, SAM identified 34 genes that changed at least 1.5-fold with an estimated FDR of 12%, compared with FDRs of 60 and 84% by using conventional methods of analysis. Of the 34 genes, 19 were involved in cell cycle regulation and 3 in apoptosis. Surprisingly, four nucleotide excision repair genes were induced, suggesting that this repair pathway for UV-damaged DNA might play a previously unrecognized role in repairing DNA damaged by ionizing radiation.
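The two steps described in this abstract, a per-gene score and a permutation-based estimate of the false discovery rate, can be sketched in Python as follows. This is a simplified illustration and not the published SAM procedure: the pooled standard error, the small constant `s0` added to the denominator, the fixed cutoff on the absolute score, and the synthetic data are all assumptions made here, and the function names are hypothetical.

```python
import random
import statistics


def gene_score(a, b, s0=0.1):
    """Difference in group means relative to a pooled standard error plus s0."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / (std_err + s0)


def count_called(genes, order, n_a, threshold):
    """Count genes whose absolute score exceeds the threshold for a given labelling."""
    called = 0
    for values in genes:
        a = [values[k] for k in order[:n_a]]
        b = [values[k] for k in order[n_a:]]
        if abs(gene_score(a, b)) > threshold:
            called += 1
    return called


def permutation_fdr(genes, n_a, threshold, n_perm=100, seed=0):
    """Return (number of genes called, estimated FDR) using label permutations."""
    rng = random.Random(seed)
    identity = list(range(len(genes[0])))
    observed = count_called(genes, identity, n_a, threshold)
    if observed == 0:
        return 0, 0.0
    chance_counts = []
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # scramble which measurements belong to which group
        chance_counts.append(count_called(genes, order, n_a, threshold))
    # Average number called by chance, as a fraction of the number actually called.
    fdr = min(1.0, statistics.mean(chance_counts) / observed)
    return observed, fdr


# Synthetic data: 50 genes, 4 measurements per group, first 5 genes truly shifted.
data_rng = random.Random(1)
genes = []
for g in range(50):
    shift = 10.0 if g < 5 else 0.0
    group_a = [data_rng.gauss(shift, 1.0) for _ in range(4)]
    group_b = [data_rng.gauss(0.0, 1.0) for _ in range(4)]
    genes.append(group_a + group_b)

observed, fdr = permutation_fdr(genes, n_a=4, threshold=4.0)
print(observed, round(fdr, 3))
```

Raising the threshold calls fewer genes and usually lowers the estimated FDR, which mirrors the "adjustable threshold" trade-off mentioned in the abstract.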


11,833 Citations


Cited by

Open access | Journal Article
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.


33,540 Citations


Open access | Journal Article | DOI: 10.1186/S13059-014-0550-8
05 Dec 2014 | Genome Biology
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .


Topics: MRNA Sequencing (54%), Integrator complex (51%), Count data (50%)

29,675 Citations


Open access | Proceedings Article | DOI: 10.1109/CVPR.2015.7298594
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, and 5 more authors
07 Jun 2015
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.


29,453 Citations


Performance
Metrics

Author's H-index: 147

No. of papers from the Author in previous years
Year    Papers
2021    30
2020    32
2019    24
2018    30
2017    21
2016    19

Top Attributes

Author's top 5 most impactful journals

arXiv: Methodology

36 papers, 1.1K citations

Blood

26 papers, 1.9K citations

Annals of Statistics

22 papers, 18.8K citations

Biostatistics

19 papers, 8.5K citations

Network Information
Related Authors (5)
Wenfei Du

10 papers, 199 citations

86% related
Holger Höfling

9 papers, 3.5K citations

84% related
Ash A. Alizadeh

225 papers, 36.3K citations

84% related
Chih Long Liu

88 papers, 18K citations

83% related
Andrew J. Gentles

119 papers, 15.3K citations

83% related