Linear model

About: Linear model is a research topic. Over its lifetime, 19,008 publications have been published within this topic, receiving 1,054,229 citations. The topic is also known as: linear models.


Papers
Journal Article
TL;DR: This article discusses an easily interpretable index of predictive discrimination and methods for assessing the calibration of predicted survival probabilities, measures that are particularly needed for binary, ordinal, and time-to-event outcomes.
Abstract: Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross-validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.

7,879 citations
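As a rough illustration of the kind of discrimination index discussed above, here is a minimal NumPy sketch of Harrell's concordance index (c-index) for right-censored survival data. It is a simplified version that ignores tied event times; the function name and toy data are illustrative, not taken from the paper.

```python
# Simplified Harrell's c-index: fraction of usable pairs in which the subject
# with the higher predicted risk experiences the event first.
import numpy as np

def concordance_index(time, event, risk_score):
    """time: follow-up times; event: 1 = event, 0 = censored;
    risk_score: predicted risk (higher = worse prognosis)."""
    time, event, risk_score = map(np.asarray, (time, event, risk_score))
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is usable only if the shorter time is an event
        for j in range(n):
            if time[j] > time[i]:              # subject i failed first
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5          # ties in risk count as half
    return concordant / comparable

# Toy example: perfectly ordered risks give c = 1.0
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))
```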

Journal Article
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

7,828 citations
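A short sketch of computing the full Lasso coefficient path with the LARS-based solver in scikit-learn, applied to the diabetes data used in the paper. This illustrates the algorithm as exposed by a standard library; it is not the authors' original implementation.

```python
# Full Lasso path via LARS: the entire piecewise-linear coefficient profile
# comes back in a single pass, at roughly the cost of one OLS fit.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import lars_path

X, y = load_diabetes(return_X_y=True)       # 442 patients, 10 covariates

# method="lasso" activates the Lasso modification of LARS described in the paper.
alphas, active, coef_path = lars_path(X, y, method="lasso")

print(coef_path.shape)   # (n_features, n_steps along the regularization path)
print(active)            # order in which covariates enter the active set
```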

Journal Article
TL;DR: The authors make a case for reporting variance explained (R2) as a summary statistic for mixed-effects models, a practice that remains rare even though R2 is routinely reported for linear models (LMs) and generalized linear models (GLMs).
Abstract: The use of both linear and generalized linear mixed-effects models (LMMs and GLMMs) has become popular not only in social and medical sciences, but also in biological sciences, especially in the field of ecology and evolution. Information criteria, such as Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models. The presentation of ‘variance explained’ (R2) as a relevant summarizing statistic of mixed-effects models, however, is rare, even though R2 is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R2 has the extremely useful property of providing an absolute value for the goodness-of-fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R2 can also be a quantity of biological interest. One reason for the under-appreciation of R2 for mixed-effects models lies in the fact that R2 can be defined in a number of ways. Furthermore, most definitions of R2 for mixed-effects models have theoretical problems (e.g. decreased or negative R2 values in larger models) and/or their use is hindered by practical difficulties (e.g. implementation). Here, we make a case for the importance of reporting R2 for mixed-effects models. We first provide the common definitions of R2 for LMs and GLMs and discuss the key problems associated with calculating R2 for mixed-effects models. We then recommend a general and simple method for calculating two types of R2 (marginal and conditional R2) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any field of research, regardless of the software package used for fitting mixed-effects models. The proposed method has the potential to facilitate the presentation of R2 for a wide range of circumstances.

7,749 citations
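For the LMM case, the two quantities recommended in the abstract can be written as follows; the GLMM versions add a distribution-specific variance term to the denominator. The notation here is a paraphrase for orientation, not a quotation of the paper.

```latex
% sigma^2_f : variance of the fixed-effect predictions
% sigma^2_l : variance of the l-th random effect
% sigma^2_e : residual variance
R^2_{\mathrm{marginal}}    = \frac{\sigma^2_f}{\sigma^2_f + \sum_l \sigma^2_l + \sigma^2_\varepsilon},
\qquad
R^2_{\mathrm{conditional}} = \frac{\sigma^2_f + \sum_l \sigma^2_l}{\sigma^2_f + \sum_l \sigma^2_l + \sigma^2_\varepsilon}
```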

Book
01 Jan 2001
TL;DR: This book presents general regression modeling strategies, including a case study in least squares fitting and interpretation of a linear model and models that use nonparametric transformations of X and Y.
Abstract: Introduction * General Aspects of Fitting Regression Models * Missing Data * Multivariable Modeling Strategies * Resampling, Validating, Describing, and Simplifying the Model * S-PLUS Software * Case Study in Least Squares Fitting and Interpretation of a Linear Model * Case Study in Imputation and Data Reduction * Overview of Maximum Likelihood Estimation * Binary Logistic Regression * Logistic Model Case Study 1: Predicting Cause of Death * Logistic Model Case Study 2: Survival of Titanic Passengers * Ordinal Logistic Regression * Case Study in Ordinal Regression, Data Reduction, and Penalization * Models Using Nonparametric Transformations of X and Y * Introduction to Survival Analysis * Parametric Survival Models * Case Study in Parametric Survival Modeling and Model Approximation * Cox Proportional Hazards Regression Model * Case Study in Cox Regression

7,264 citations
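As a small companion to the survival-analysis and Cox regression chapters listed above, here is a sketch of a proportional hazards fit using the lifelines package and its bundled Rossi recidivism data; the package and dataset are chosen for illustration and are not the book's S-PLUS code.

```python
# Minimal Cox proportional hazards fit on the Rossi recidivism data.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()                       # columns: week, arrest, fin, age, ...
cph = CoxPHFitter()
cph.fit(rossi, duration_col="week", event_col="arrest")

cph.print_summary()                        # hazard ratios and confidence intervals
print(cph.concordance_index_)              # discrimination of the fitted model
```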

Book
01 Jan 1982
TL;DR: This book covers the design and analysis of experiments, from single-factor and factorial designs (main effects, simple effects, and interaction components) through within-subject and mixed designs, including treatments of effect size and power.
Abstract: I. INTRODUCTION. 1. Experimental Design. II. SINGLE FACTOR EXPERIMENTS. 2. Sources of Variability and Sums of Squares. 3. Variance Estimates and F Ratio. 4. Analytical Comparisons Among Means. 5. Analysis of Trend. 6. Simultaneous Comparisons. 7. The Linear Model and Its Assumptions. 8. Effect Size and Power. 9. Using Statistical Software. III. FACTORIAL EXPERIMENTS WITH TWO FACTORS. 10. Introduction to the Factorial Design. 11. The Principal Two-Factor Effects. 12. Main Effects and Simple Effects. 13. The Analysis of Interaction Components. IV. NONORTHOGONALITY AND THE GENERAL LINEAR MODEL. 14. General Linear Model. 15. The Analysis of Covariance. V. WITHIN-SUBJECT DESIGNS. 16. The Single-Factor Within-Subject Design. 17. Further Within-Subject Topics. 18. The Two-Factor Within-Subject Design. 19. The Mixed Design: Overall Analysis. 20. The Mixed Design: Analytical Analyses. VI. HIGHER FACTORIAL DESIGNS AND OTHER EXTENSIONS. 21. The Overall Three-Factor Design. 22. The Three-Way Analytical Analysis. 23. Within-Subject and Mixed Designs. 24. Random Factors and Generalization. 25. Nested Factors. 26. Higher-Order Designs. Appendix A: Statistical Tables.

6,216 citations
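The book's general-linear-model view of ANOVA can be sketched in a few lines with statsmodels: a one-way design fit by ordinary least squares and summarized as an ANOVA table. The toy data and column names are made up for illustration.

```python
# One-way ANOVA expressed as a linear model: a dummy-coded group factor
# fit by OLS, then summarized with sums of squares and an F ratio.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "score": [3, 4, 5, 4, 6, 7, 6, 7, 9, 10, 9, 11],
})

model = smf.ols("score ~ C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))     # sums of squares, F ratio, p-value
```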


Network Information
Related Topics (5)
- Estimator: 97.3K papers, 2.6M citations (92% related)
- Regression analysis: 31K papers, 1.7M citations (89% related)
- Sampling (statistics): 65.3K papers, 1.2M citations (86% related)
- Inference: 36.8K papers, 1.3M citations (85% related)
- Markov chain: 51.9K papers, 1.3M citations (85% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    135
2022    253
2021    679
2020    751
2019    759
2018    695