scispace - formally typeset
Author

Sanford Weisberg

Other affiliations: Ford Motor Company
Bio: Sanford Weisberg is an academic researcher at the University of Minnesota. His research focuses on linear regression and regression analysis. He has an h-index of 42 and has co-authored 138 publications receiving 34,431 citations. Previous affiliations of Sanford Weisberg include Ford Motor Company.


Papers
Book
29 Nov 2010
TL;DR: This tutorial jumps right into the power of R without dragging you through the basic concepts of the programming language.
Abstract (table of contents):
Preface.
1. Getting Started With R.
2. Reading and Manipulating Data.
3. Exploring and Transforming Data.
4. Fitting Linear Models.
5. Fitting Generalized Linear Models.
6. Diagnosing Problems in Linear and Generalized Linear Models.
7. Drawing Graphs.
8. Writing Programs.
References. Author Index. Subject Index. Command Index. Data Set Index. Package Index. About the Authors.

9,947 citations

Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

7,828 citations
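The abstract above connects LARS to Forward Stagewise regression: taken with a vanishing step size, stagewise traces coefficient paths very close to the LARS/Lasso paths. A minimal numpy sketch of incremental forward stagewise (not the actual LARS implementation; the step size `eps` and the toy data are assumptions for illustration):

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    """Incremental forward stagewise regression.

    At each step, find the predictor most correlated with the
    current residual and nudge its coefficient by +/- eps.
    As eps -> 0, the coefficient paths closely track the
    LARS/Lasso paths described in the abstract.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        r = y - X @ beta                   # current residual
        corr = X.T @ r                     # correlation of each predictor with r
        j = int(np.argmax(np.abs(corr)))   # most correlated predictor
        beta[j] += eps * np.sign(corr[j])  # tiny step in that direction
    return beta

# Toy data: the response depends on only the first two of five predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(200)
beta = forward_stagewise(X, y)
```

Because each update is a tiny step rather than a full least-squares fit, the procedure is "less greedy" than classical forward selection, which is the property the paper exploits.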

Book
01 Jan 1980
TL;DR: In this book, the author presents an applied introduction to linear regression, from scatterplots and simple linear regression through multiple regression, diagnostics, transformations, outliers and influence, variable selection, and extensions to nonlinear and logistic regression.
Abstract (table of contents):
Preface.
1. Scatterplots and Regression. 1.1 Scatterplots. 1.2 Mean Functions. 1.3 Variance Functions. 1.4 Summary Graph. 1.5 Tools for Looking at Scatterplots. 1.5.1 Size. 1.5.2 Transformations. 1.5.3 Smoothers for the Mean Function. 1.6 Scatterplot Matrices. Problems.
2. Simple Linear Regression. 2.1 Ordinary Least Squares Estimation. 2.2 Least Squares Criterion. 2.3 Estimating σ². 2.4 Properties of Least Squares Estimates. 2.5 Estimated Variances. 2.6 Comparing Models: The Analysis of Variance. 2.6.1 The F-Test for Regression. 2.6.2 Interpreting p-values. 2.6.3 Power of Tests. 2.7 The Coefficient of Determination, R². 2.8 Confidence Intervals and Tests. 2.8.1 The Intercept. 2.8.2 Slope. 2.8.3 Prediction. 2.8.4 Fitted Values. 2.9 The Residuals. Problems.
3. Multiple Regression. 3.1 Adding a Term to a Simple Linear Regression Model. 3.1.1 Explaining Variability. 3.1.2 Added-Variable Plots. 3.2 The Multiple Linear Regression Model. 3.3 Terms and Predictors. 3.4 Ordinary Least Squares. 3.4.1 Data and Matrix Notation. 3.4.2 Variance-Covariance Matrix of e. 3.4.3 Ordinary Least Squares Estimators. 3.4.4 Properties of the Estimates. 3.4.5 Simple Regression in Matrix Terms. 3.5 The Analysis of Variance. 3.5.1 The Coefficient of Determination. 3.5.2 Hypotheses Concerning One of the Terms. 3.5.3 Relationship to the t-Statistic. 3.5.4 t-Tests and Added-Variable Plots. 3.5.5 Other Tests of Hypotheses. 3.5.6 Sequential Analysis of Variance Tables. 3.6 Predictions and Fitted Values. Problems.
4. Drawing Conclusions. 4.1 Understanding Parameter Estimates. 4.1.1 Rate of Change. 4.1.2 Signs of Estimates. 4.1.3 Interpretation Depends on Other Terms in the Mean Function. 4.1.4 Rank Deficient and Over-Parameterized Mean Functions. 4.1.5 Tests. 4.1.6 Dropping Terms. 4.1.7 Logarithms. 4.2 Experimentation Versus Observation. 4.3 Sampling from a Normal Population. 4.4 More on R². 4.4.1 Simple Linear Regression and R². 4.4.2 Multiple Linear Regression. 4.4.3 Regression through the Origin. 4.5 Missing Data. 4.5.1 Missing at Random. 4.5.2 Alternatives. 4.6 Computationally Intensive Methods. 4.6.1 Regression Inference without Normality. 4.6.2 Nonlinear Functions of Parameters. 4.6.3 Predictors Measured with Error. Problems.
5. Weights, Lack of Fit, and More. 5.1 Weighted Least Squares. 5.1.1 Applications of Weighted Least Squares. 5.1.2 Additional Comments. 5.2 Testing for Lack of Fit, Variance Known. 5.3 Testing for Lack of Fit, Variance Unknown. 5.4 General F Testing. 5.4.1 Non-null Distributions. 5.4.2 Additional Comments. 5.5 Joint Confidence Regions. Problems.
6. Polynomials and Factors. 6.1 Polynomial Regression. 6.1.1 Polynomials with Several Predictors. 6.1.2 Using the Delta Method to Estimate a Minimum or a Maximum. 6.1.3 Fractional Polynomials. 6.2 Factors. 6.2.1 No Other Predictors. 6.2.2 Adding a Predictor: Comparing Regression Lines. 6.2.3 Additional Comments. 6.3 Many Factors. 6.4 Partial One-Dimensional Mean Functions. 6.5 Random Coefficient Models. Problems.
7. Transformations. 7.1 Transformations and Scatterplots. 7.1.1 Power Transformations. 7.1.2 Transforming Only the Predictor Variable. 7.1.3 Transforming the Response Only. 7.1.4 The Box and Cox Method. 7.2 Transformations and Scatterplot Matrices. 7.2.1 The 1D Estimation Result and Linearly Related Predictors. 7.2.2 Automatic Choice of Transformation of Predictors. 7.3 Transforming the Response. 7.4 Transformations of Nonpositive Variables. Problems.
8. Regression Diagnostics: Residuals. 8.1 The Residuals. 8.1.1 Difference Between ê and e. 8.1.2 The Hat Matrix. 8.1.3 Residuals and the Hat Matrix with Weights. 8.1.4 The Residuals When the Model Is Correct. 8.1.5 The Residuals When the Model Is Not Correct. 8.1.6 Fuel Consumption Data. 8.2 Testing for Curvature. 8.3 Nonconstant Variance. 8.3.1 Variance Stabilizing Transformations. 8.3.2 A Diagnostic for Nonconstant Variance. 8.3.3 Additional Comments. 8.4 Graphs for Model Assessment. 8.4.1 Checking Mean Functions. 8.4.2 Checking Variance Functions. Problems.
9. Outliers and Influence. 9.1 Outliers. 9.1.1 An Outlier Test. 9.1.2 Weighted Least Squares. 9.1.3 Significance Levels for the Outlier Test. 9.1.4 Additional Comments. 9.2 Influence of Cases. 9.2.1 Cook's Distance. 9.2.2 Magnitude of Di. 9.2.3 Computing Di. 9.2.4 Other Measures of Influence. 9.3 Normality Assumption. Problems.
10. Variable Selection. 10.1 The Active Terms. 10.1.1 Collinearity. 10.1.2 Collinearity and Variances. 10.2 Variable Selection. 10.2.1 Information Criteria. 10.2.2 Computationally Intensive Criteria. 10.2.3 Using Subject-Matter Knowledge. 10.3 Computational Methods. 10.3.1 Subset Selection Overstates Significance. 10.4 Windmills. 10.4.1 Six Mean Functions. 10.4.2 A Computationally Intensive Approach. Problems.
11. Nonlinear Regression. 11.1 Estimation for Nonlinear Mean Functions. 11.2 Inference Assuming Large Samples. 11.3 Bootstrap Inference. 11.4 References. Problems.
12. Logistic Regression. 12.1 Binomial Regression. 12.1.1 Mean Functions for Binomial Regression. 12.2 Fitting Logistic Regression. 12.2.1 One-Predictor Example. 12.2.2 Many Terms. 12.2.3 Deviance. 12.2.4 Goodness-of-Fit Tests. 12.3 Binomial Random Variables. 12.3.1 Maximum Likelihood Estimation. 12.3.2 The Log-Likelihood for Logistic Regression. 12.4 Generalized Linear Models. Problems.
Appendix. A.1 Web Site. A.2 Means and Variances of Random Variables. A.2.1 E Notation. A.2.2 Var Notation. A.2.3 Cov Notation. A.2.4 Conditional Moments. A.3 Least Squares for Simple Regression. A.4 Means and Variances of Least Squares Estimates. A.5 Estimating E(Y|X) Using a Smoother. A.6 A Brief Introduction to Matrices and Vectors. A.6.1 Addition and Subtraction. A.6.2 Multiplication by a Scalar. A.6.3 Matrix Multiplication. A.6.4 Transpose of a Matrix. A.6.5 Inverse of a Matrix. A.6.6 Orthogonality. A.6.7 Linear Dependence and Rank of a Matrix. A.7 Random Vectors. A.8 Least Squares Using Matrices. A.8.1 Properties of Estimates. A.8.2 The Residual Sum of Squares. A.8.3 Estimate of Variance. A.9 The QR Factorization. A.10 Maximum Likelihood Estimates. A.11 The Box-Cox Method for Transformations. A.11.1 Univariate Case. A.11.2 Multivariate Case. A.12 Case Deletion in Linear Regression.
References. Author Index. Subject Index.

3,215 citations

Book
21 Oct 1982

2,660 citations

Journal ArticleDOI

2,537 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms; the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.

50,607 citations
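The quantity lmer optimizes is a criterion over the variance parameters, with the fixed effects profiled out. A numpy sketch for a simple random-intercept model makes the idea concrete (this is the plain marginal log-likelihood with a grid search, an assumption for illustration; lme4's actual implementation uses a sparse Cholesky-based profiled formulation, not a dense inverse):

```python
import numpy as np

def marginal_loglik(y, X, Z, sigma_b2, sigma2):
    """Log-likelihood of y = X beta + Z b + e, with b ~ N(0, sigma_b2 I)
    and e ~ N(0, sigma2 I), after profiling out beta via generalized
    least squares at the given variance parameters."""
    n = len(y)
    V = sigma_b2 * (Z @ Z.T) + sigma2 * np.eye(n)           # marginal covariance
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # profiled (GLS) beta
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ Vinv @ r)

# Simulate 30 groups of 5 observations with a random intercept per group.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(30), 5)
Z = np.eye(30)[groups]                       # group-indicator matrix
X = np.column_stack([np.ones(150), rng.standard_normal(150)])
b = rng.standard_normal(30)                  # true sigma_b2 = 1
y = X @ np.array([2.0, 0.5]) + Z @ b + 0.5 * rng.standard_normal(150)

# Crude grid search over the two variance parameters.
grid = [(sb2, s2) for sb2 in (0.1, 0.5, 1.0, 2.0) for s2 in (0.1, 0.25, 0.5)]
best = max(grid, key=lambda p: marginal_loglik(y, X, Z, *p))
```

In lme4 the same criterion (or its REML counterpart) is handed to a constrained optimizer rather than a grid, and the linear algebra is organized so that only the variance parameters remain in the objective.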

Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations
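The API consistency the abstract emphasizes means every scikit-learn estimator exposes the same fit/predict surface. A toy nearest-centroid classifier in plain numpy (the class, data, and names here are illustrative and not part of scikit-learn itself) shows the shape of that interface:

```python
import numpy as np

class NearestCentroid:
    """Minimal classifier following the scikit-learn fit/predict
    convention: fit() learns one centroid per class, predict()
    assigns each point to the nearest centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self  # returning self allows chaining, as scikit-learn does

    def predict(self, X):
        # Distance from every point to every centroid, then argmin per point.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]

# Two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(4, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
clf = NearestCentroid().fit(X, y)
acc = (clf.predict(X) == y).mean()
```

Because every estimator shares this surface, utilities like cross-validation and pipelines can treat models interchangeably, which is the design choice the paper credits for bringing machine learning to non-specialists.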

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.

47,038 citations
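The core idea of the fold-change shrinkage above is empirical-Bayes: noisy per-gene estimates are pulled toward a shared prior, precise ones barely move. A stripped-down normal-normal sketch in numpy (this is not DESeq2's actual estimator, which models counts with a negative binomial; the simulated genes and the fixed prior variance are assumptions):

```python
import numpy as np

def shrink_toward_zero(lfc_hat, se, prior_var):
    """Posterior mean of a true log fold change lfc under
    lfc_hat ~ N(lfc, se**2) with zero-centered prior lfc ~ N(0, prior_var).
    Genes with large standard errors are shrunk strongly toward 0;
    precisely estimated genes keep most of their observed effect."""
    weight = prior_var / (prior_var + se**2)
    return weight * lfc_hat

# Simulate 1000 genes: 10% truly differential, the rest null.
rng = np.random.default_rng(0)
true_lfc = np.where(rng.random(1000) < 0.1, rng.normal(0, 2, 1000), 0.0)
se = rng.uniform(0.2, 2.0, 1000)             # per-gene standard errors
lfc_hat = true_lfc + se * rng.standard_normal(1000)
lfc_shrunk = shrink_toward_zero(lfc_hat, se, prior_var=0.5)

# Shrinkage reduces overall estimation error on this null-heavy set.
raw_err = np.mean((lfc_hat - true_lfc) ** 2)
shrunk_err = np.mean((lfc_shrunk - true_lfc) ** 2)
```

This is what lets the analysis focus on the strength of differential expression: shrunken fold changes are directly comparable across genes with very different sequencing depths.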

Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.

16,538 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include l(1) (the lasso), l(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.

13,656 citations
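The cyclical coordinate descent the abstract describes solves each one-coefficient subproblem in closed form via soft-thresholding. A bare-bones numpy sketch for the elastic net penalty (standardized predictors and the toy data are assumptions; glmnet's real implementation adds warm starts over a regularization path, covariance updates, screening rules, and observation weights):

```python
import numpy as np

def soft_threshold(z, g):
    # S(z, g) = sign(z) * max(|z| - g, 0)
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclical coordinate descent for
        (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1-alpha)/2 * ||b||_2^2).
    Assumes the columns of X are standardized (mean 0, variance 1), so the
    one-dimensional update has the closed form below."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            z = X[:, j] @ r_j / n
            b[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1 - alpha))
    return b

# Toy data: sparse truth with two active predictors out of ten.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X = (X - X.mean(0)) / X.std(0)
y = X @ np.r_[2.0, -1.5, np.zeros(8)] + 0.1 * rng.standard_normal(200)
b = elastic_net_cd(X, y, lam=0.1, alpha=0.9)
```

The l1 part of the penalty enters only through the soft-threshold, which sets small coefficients exactly to zero; the l2 part enters only through the denominator. That separation is why one loop handles the lasso (alpha=1), ridge (alpha=0), and every mixture in between.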