Author

Robert Tibshirani

Bio: Robert Tibshirani is an academic researcher from Stanford University. The author has contributed to research in topics: Lasso (statistics) & Elastic net regularization. The author has an h-index of 147, co-authored 593 publications receiving 326580 citations. Previous affiliations of Robert Tibshirani include University of Toronto & University of California.


Papers
Posted Content
TL;DR: Using the nuclear norm as a regularizer, the convex relaxation techniques are used to provide a sequence of solutions to the matrix completion problem, and an algorithm iteratively replaces the missing elements with those obtained from a thresholded SVD.
Abstract: We use convex relaxation techniques to provide a sequence of solutions to the matrix completion problem. Using the nuclear norm as a regularizer, we provide simple and very efficient algorithms for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm iteratively replaces the missing elements with those obtained from a thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions. 1 Introduction In many applications measured data can be represented in a matrix X m×n, for which only a relatively small number of entries are observed. The problem is to "complete" the matrix based on the observed entries, and has been dubbed the matrix completion problem [CCS08, CR08, RFP07, CT09, KOM09]. The "Netflix" competition is a primary example, where the data is the basis for a recommender system. The rows correspond to viewers and the columns to movies, with the entry X ij being the rating ∈ {1, ..., 5} by viewer i for movie j. There are 480K viewers and 18K movies, and hence 8.6 billion (8.6 × 10^9) potential entries.
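The iterative SVD-thresholding step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the general idea (replace missing entries, soft-threshold the singular values, repeat), not the authors' implementation; the function name and defaults here are invented for illustration.

```python
import numpy as np

def soft_impute(X, mask, lam, n_iters=100):
    """Fill missing entries of X (where mask is False) by repeatedly
    replacing them with values from a soft-thresholded SVD."""
    Z = np.where(mask, X, 0.0)  # start with missing entries set to 0
    for _ in range(n_iters):
        # SVD of the matrix with observed entries kept, missing ones imputed
        U, s, Vt = np.linalg.svd(np.where(mask, X, Z), full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        Z = (U * s) @ Vt              # low-rank reconstruction
    return Z
```

With a small threshold and enough observed entries of a low-rank matrix, the missing entries are recovered with error well below the raw signal variance.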

10 citations

Posted Content
TL;DR: This paper uses a "stacking" idea that collects the features and outcomes of the survival data in a large data frame and then treats it as a classification problem, and shows, through both theoretical analysis and simulation studies, that this approach is approximately equivalent to the Cox proportional hazards model.
Abstract: In this paper, we explore a method for treating survival analysis as a classification problem. The method uses a "stacking" idea that collects the features and outcomes of the survival data in a large data frame, and then treats it as a classification problem. In this framework, various statistical learning algorithms (including logistic regression, random forests, gradient boosting machines and neural networks) can be applied to estimate the parameters and make predictions. For stacking with logistic regression, we show that this approach is approximately equivalent to the Cox proportional hazards model with both theoretical analysis and simulation studies. For stacking with other machine learning algorithms, we show through simulation studies that our method can outperform the Cox proportional hazards model in terms of estimated survival curves. This idea is not new, but we believe that it should be better known by statisticians and other data scientists.
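The stacking transformation itself can be sketched as follows: at each observed event time, every subject still at risk contributes one classification row. This is a minimal NumPy sketch of that general construction (the function name and the exact row layout are assumptions for illustration, not the paper's code).

```python
import numpy as np

def stack_survival(X, time, event):
    """Stack (X, time, event) survival data into a classification set:
    at each distinct event time t, every subject with time >= t (the
    risk set) contributes a row, labelled 1 if their event occurs at t."""
    rows, labels = [], []
    for t in np.unique(time[event == 1]):
        for i in np.where(time >= t)[0]:          # risk set at time t
            y = 1 if (event[i] == 1 and time[i] == t) else 0
            rows.append(np.append(X[i], t))       # features plus time covariate
            labels.append(y)
    return np.asarray(rows), np.asarray(labels)
```

Any binary classifier (logistic regression, gradient boosting, a neural network) can then be fit to the stacked rows and labels.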

10 citations

Journal Article
TL;DR: LassoNet as discussed by the authors enforces a hierarchy: a feature can participate in a hidden unit only if its linear representative is active, and it integrates feature selection directly with parameter learning; as a result, it delivers an entire regularization path of solutions with a range of feature sparsity.
Abstract: Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However the Lasso only applies to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach enforces a hierarchy: specifically a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so integrates feature selection with the parameter learning directly. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. On systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.

9 citations

Journal ArticleDOI
TL;DR: The procedure is shown to be more effective than CART in uncovering “main effects” and it can lead to a simpler description of the data, although it is not as effective in terms of classification error.
Abstract: We describe a procedure for the classification and description of binary response data. The model is a special case of the multivariate adaptive regression splines (MARS) model; the emphasis is on piecewise constant basis functions, as in the classification and regression trees (CART) approach of Breiman, Friedman, Olshen, and Stone. The procedure is based on the logistic model for binary data. A binary logistic model is built up as the sum of products of indicator functions, as in MARS. The model is then pruned in a backward stepwise manner, using cross-validation as a guide. The pruning is strictly hierarchical (as in CART) to preserve interpretability of the final model. Through simulated and real examples, the procedure is shown to be more effective than CART in uncovering “main effects” and it can lead to a simpler description of the data. On the other hand, it is not as effective as CART in terms of classification error.
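The "sum of products of indicator functions" basis described above can be made concrete with a small sketch. This is an illustrative NumPy construction of piecewise-constant basis functions and their pairwise products (the helper name and split format are invented here), to which an ordinary logistic regression could then be fit.

```python
import numpy as np

def indicator_basis(X, splits):
    """Build piecewise-constant basis functions I(x_j <= t) for each
    (feature index, threshold) pair in `splits`, plus their pairwise
    products, as in a MARS-style model with indicator basis functions."""
    cols = [np.ones(len(X))]  # intercept
    singles = [(X[:, j] <= t).astype(float) for j, t in splits]
    cols += singles
    # products of pairs of indicators give piecewise-constant interactions
    for a in range(len(singles)):
        for b in range(a + 1, len(singles)):
            cols.append(singles[a] * singles[b])
    return np.column_stack(cols)
```

In the procedure described, terms like these would then be pruned in a backward stepwise, hierarchical manner guided by cross-validation.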

9 citations

Posted ContentDOI
06 Sep 2021-medRxiv
TL;DR: In this article, a systematic assessment of polygenic risk score (PRS) prediction across more than 1,600 traits using genetic and phenotype data in the UK Biobank is presented.
Abstract: We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,600 traits using genetic and phenotype data in the UK Biobank. We report 428 sparse PRS models with significant (p < 2.5 × 10^-5) incremental predictive performance when compared against the covariate-only model that considers age, sex, and the genotype principal components. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance in quantitative traits (Spearman's ρ = 0.54, p = 1.4 × 10^-15), but not in binary traits (ρ = 0.059, p = 0.35). The sparse PRS model trained on European individuals showed limited transferability when evaluated on individuals of non-European ancestry in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
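At its core, a polygenic risk score is a weighted sum of allele counts over the variants a model retains; in a sparse model most weights are exactly zero. A toy sketch (the genotype matrix and weights below are made-up illustrative numbers, not values from the paper):

```python
import numpy as np

# Hypothetical allele counts (0/1/2) for 5 individuals x 4 variants.
genotypes = np.array([[0, 1, 2, 0],
                      [1, 0, 2, 1],
                      [2, 2, 0, 0],
                      [0, 1, 1, 2],
                      [1, 1, 0, 1]], dtype=float)

# Sparse weight vector: zeros mark variants dropped by the sparse model.
weights = np.array([0.3, 0.0, -0.2, 0.0])

prs = genotypes @ weights  # one score per individual
```

The incremental predictive performance in the abstract is then measured by comparing a model with covariates plus this score against the covariate-only baseline.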

9 citations


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
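The "ease of use and API consistency" the abstract emphasizes is visible in a few lines: every estimator exposes the same fit/predict/score interface. A minimal usage sketch, assuming scikit-learn is installed (the specific dataset and hyperparameters are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a bundled toy dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Every estimator follows the same fit / predict / score convention.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

Swapping in a different model (random forest, gradient boosting, SVM) changes only the constructor line, which is the API consistency the paper describes.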

47,974 citations

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
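The constraint described above tends to set some coefficients exactly to zero. One standard way to compute the lasso solution (in its equivalent penalized form, not the method of the original paper) is cyclic coordinate descent with soft-thresholding; a minimal NumPy sketch:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iters=200):
    """Lasso via cyclic coordinate descent: minimizes
    (1/2n)||y - Xb||^2 + lam * ||b||_1. Soft-thresholding sets
    small coefficients exactly to zero, giving a sparse model."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]  # partial residual excluding feature j
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            # soft-threshold: shrink toward zero, clip small values to exactly 0
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return b
```

On data where only the first feature matters, the fit recovers a shrunken nonzero first coefficient and zeros out the rest, which is the interpretability property the abstract highlights.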

40,785 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations