Journal ArticleDOI

A general and simple method for obtaining R2 from generalized linear mixed-effects models

01 Feb 2013-Methods in Ecology and Evolution (Wiley/Blackwell (10.1111))-Vol. 4, Iss: 2, pp 133-142
TL;DR: The authors argue that variance explained (R2) should be reported as a summary statistic for mixed-effects models; this is rarely done, even though R2 is routinely reported for linear models (LMs) and generalized linear models (GLMs).
Abstract: The use of both linear and generalized linear mixed-effects models (LMMs and GLMMs) has become popular not only in social and medical sciences, but also in biological sciences, especially in the field of ecology and evolution. Information criteria, such as Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models. The presentation of ‘variance explained’ (R2) as a relevant summarizing statistic of mixed-effects models, however, is rare, even though R2 is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R2 has the extremely useful property of providing an absolute value for the goodness-of-fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R2 can also be a quantity of biological interest. One reason for the under-appreciation of R2 for mixed-effects models lies in the fact that R2 can be defined in a number of ways. Furthermore, most definitions of R2 for mixed-effects models have theoretical problems (e.g. decreased or negative R2 values in larger models) and/or their use is hindered by practical difficulties (e.g. implementation). Here, we make a case for the importance of reporting R2 for mixed-effects models. We first provide the common definitions of R2 for LMs and GLMs and discuss the key problems associated with calculating R2 for mixed-effects models. We then recommend a general and simple method for calculating two types of R2 (marginal and conditional R2) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any field of research, regardless of software packages used for fitting mixed-effects models. The proposed method has the potential to facilitate the presentation of R2 for a wide range of circumstances.
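As a minimal sketch of the two quantities named in the abstract, the Python function below computes marginal and conditional R2 from the variance components of a Gaussian mixed model. It assumes the commonly cited formulation in which marginal R2 is the fixed-effects variance divided by the total variance, and conditional R2 adds the random-effect variances to the numerator. The abstract itself does not spell out formulas, and the function name `r2_from_variance_components` and the numeric values are hypothetical.

```python
def r2_from_variance_components(var_fixed, var_random, var_residual):
    """Return (marginal_r2, conditional_r2) for a Gaussian mixed model.

    var_fixed    -- variance of the fixed-effect predictions
    var_random   -- iterable of variances, one per random effect
    var_residual -- residual (within-group) variance

    Assumed formulation (not quoted from the abstract):
      marginal    = var_fixed / total
      conditional = (var_fixed + sum(var_random)) / total
    """
    var_random_total = sum(var_random)
    total = var_fixed + var_random_total + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total
    conditional = (var_fixed + var_random_total) / total
    return marginal, conditional


# Hypothetical variance components, chosen only for illustration.
marginal_r2, conditional_r2 = r2_from_variance_components(
    var_fixed=2.0, var_random=[1.0, 0.5], var_residual=1.5
)
print(f"marginal R2 = {marginal_r2:.2f}, conditional R2 = {conditional_r2:.2f}")
```

Because the function takes only variance components as input, it does not depend on which software fitted the model, which matches the abstract's point that the method can be used regardless of the software package.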
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors present an open-source implementation of structural equation models (SEM), a form of path analysis that resolves complex multivariate relationships among a suite of interrelated variables.
Abstract: Ecologists and evolutionary biologists rely on an increasingly sophisticated set of statistical tools to describe complex natural systems. One such tool that has gained significant traction in the biological sciences is structural equation models (SEM), a form of path analysis that resolves complex multivariate relationships among a suite of interrelated variables. Evaluation of SEMs has historically relied on covariances among variables, rather than the values of the data points themselves. While this approach permits a wide variety of model forms, it limits the incorporation of detailed specifications. Recent developments have allowed for the simultaneous implementation of non-normal distributions, random effects and different correlation structures using local estimation, but this process is not yet automated and consequently, evaluation can be prohibitive with complex models. Here, I present a fully documented, open-source package piecewiseSEM, a practical implementation of confirmatory path analysis for the R programming language. The package extends this method to all current (generalized) linear, (phylogenetic) least-square, and mixed effects models, relying on familiar R syntax. I also provide two worked examples: one involving random effects and temporal autocorrelation, and a second involving phylogenetically independent contrasts. My goal is to provide a user-friendly and tractable implementation of SEM that also reflects the ecological and methodological processes generating data.

2,194 citations

Journal ArticleDOI
21 Jan 2014-eLife
TL;DR: In this article, the authors present the first systematic analysis of threat for a globally distributed lineage of 1,041 chondrichthyan fishes (sharks, rays, and chimaeras).
Abstract: The rapid expansion of human activities threatens ocean-wide biodiversity. Numerous marine animal populations have declined, yet it remains unclear whether these trends are symptomatic of a chronic accumulation of global marine extinction risk. We present the first systematic analysis of threat for a globally distributed lineage of 1,041 chondrichthyan fishes—sharks, rays, and chimaeras. We estimate that one-quarter are threatened according to IUCN Red List criteria due to overfishing (targeted and incidental). Large-bodied, shallow-water species are at greatest risk and five out of the seven most threatened families are rays. Overall chondrichthyan extinction risk is substantially higher than for most other vertebrates, and only one-third of species are considered safe. Population depletion has occurred throughout the world's ice-free waters, but is particularly prevalent in the Indo-Pacific Biodiversity Triangle and Mediterranean Sea. Improved management of fisheries and trade is urgently needed to avoid extinctions and promote population recovery.

1,467 citations

Journal ArticleDOI
TL;DR: This paper generalizes the R2 methods developed for Poisson and binomial GLMMs to all other non-Gaussian distributions, in particular the negative binomial and gamma distributions that are commonly used for modelling biological data; the approach can be used across disciplines and regardless of statistical environment.
Abstract: The coefficient of determination R2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R2 for g...

1,389 citations


Cites background or methods from "A general and simple method for obt..."

  • ...In the following, we present a worked example by expanding the beetle dataset that was generated for previous work [3]....

    [...]

  • ...We have reviewed methods for estimating R2 and ICC in the past, with a particular focus on non-Gaussian response variables in the context of biological data [2,3]....

    [...]

  • ...Research papers in the field of ecology and evolution often report only regression coefficients but not variance components of GLMMs [3]....

    [...]

  • ...Furthermore, we refer to some special considerations when obtaining R2GLMM and ICCGLMM from binomial GLMMs for binary and proportion data, which we did not discuss in the past [2,3]....

    [...]

  • ...Each of these distributions has a theoretical variance, namely, π²/3, 1 and π²/6, respectively, which we previously referred to as distribution-specific variances [2,3] (table 2)....

    [...]
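The distribution-specific variances quoted in the excerpt above can be sketched in Python as follows. The excerpt does not name the three distributions; this sketch assumes they are the standard logistic, standard normal and Gumbel distributions that underlie the logit, probit and complementary log-log links, whose variances are π²/3, 1 and π²/6. The function name `r2_binomial_glmm`, the dictionary name and the input values are hypothetical, and additive overdispersion is deliberately left out.

```python
import math

# Assumed mapping from link function to the variance of its latent
# distribution (logistic, standard normal, Gumbel).
DISTRIBUTION_SPECIFIC_VARIANCE = {
    "logit": math.pi ** 2 / 3,
    "probit": 1.0,
    "cloglog": math.pi ** 2 / 6,
}


def r2_binomial_glmm(var_fixed, var_random, link="logit"):
    """Return (marginal_r2, conditional_r2) on the latent (link) scale.

    The distribution-specific variance takes the place that the
    residual variance has in a Gaussian model. Additive overdispersion
    is not modelled in this sketch.
    """
    var_dist = DISTRIBUTION_SPECIFIC_VARIANCE[link]
    var_random_total = sum(var_random)
    total = var_fixed + var_random_total + var_dist
    marginal = var_fixed / total
    conditional = (var_fixed + var_random_total) / total
    return marginal, conditional


# Hypothetical variance components on the logit scale.
marginal_r2, conditional_r2 = r2_binomial_glmm(
    var_fixed=0.8, var_random=[1.2], link="logit"
)
print(f"marginal R2 = {marginal_r2:.3f}, conditional R2 = {conditional_r2:.3f}")
```

Keeping the link-specific constants in a lookup table makes the dependence on the link function explicit: the same fixed and random variances give different R2 values under a logit, probit or complementary log-log link.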

Journal ArticleDOI
23 May 2018-PeerJ
TL;DR: This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.
Abstract: The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.

1,210 citations


Cites background or methods from "A general and simple method for obt..."

  • ...…for GLMMs is that it returns two complementary R2 values: the marginal R2 encompassing variance explained by only the fixed effects, and the conditional R2 comprising variance explained by both fixed and random effects i.e. the variance explained by the whole model (Nakagawa & Schielzeth, 2013)....

    [...]

  • ...The Nakagawa & Schielzeth (2013) R2 functions have been incorporated into several packages, including ‘MuMIn’ (Bartoń, 2016) and ‘piecewiseSEM’ (Lefcheck, 2015), and Johnson (2014) has developed an extension of the functions for random slope models....

    [...]

  • ...The method that has gained the most support over recent years is that of Nakagawa & Schielzeth (2013)....

    [...]

  • ...Further reading: Nakagawa & Schielzeth (2013) provide an excellent and accessible description of the problems with, and solutions to, generalising R2 metrics to GLMMs....

    [...]

  • ...Diverse methods have been proposed to account for this in GLMMs, including multiple so-called ‘pseudo-r2’ measures of explained variance (Nagelkerke, 1991; Cox & Snell, 1989), but their performance is often unstable for mixed models and can return negative values (Nakagawa & Schielzeth, 2013)....

    [...]

Journal ArticleDOI
TL;DR: The authors introduce the R package rptR for estimating ICC and R for Gaussian, binomial and Poisson-distributed data; the package also allows the quantification of coefficients of determination R2 as well as of raw variance components.
Abstract: Intra-class correlations (ICC) and repeatabilities (R) are fundamental statistics for quantifying the reproducibility of measurements and for understanding the structure of biological variation. Linear mixed effects models offer a versatile framework for estimating ICC and R. However, while point estimation and significance testing by likelihood ratio tests is straightforward, the quantification of uncertainty is not as easily achieved. A further complication arises when the analysis is conducted on data with non-Gaussian distributions because the separation of the mean and the variance is less clear-cut for non-Gaussian than for Gaussian models. Nonetheless, there are solutions to approximate repeatability for the most widely used families of generalized linear mixed models (GLMMs). Here, we introduce the R package rptR for the estimation of ICC and R for Gaussian, binomial and Poisson-distributed data. Uncertainty in estimators is quantified by parametric bootstrapping and significance testing is implemented by likelihood ratio tests and through permutation of residuals. The package allows control for fixed effects and thus the estimation of adjusted repeatabilities (that remove fixed effect variance from the estimate) and enhanced agreement repeatabilities (that add fixed effect variance to the denominator). Furthermore, repeatability can be estimated from random-slope models. The package features convenient summary and plotting functions. Besides repeatabilities, the package also allows the quantification of coefficients of determination R2 as well as of raw variance components. We present an example analysis to demonstrate the core features and discuss some of the limitations of rptR.

1,044 citations


Cites background or methods from "A general and simple method for obt..."

  • ...The coefficient of determination R2 is a similar statistic that quantifies the proportion of variance explained by fixed effects (marginal R2 sensu Nakagawa & Schielzeth 2013)....

    [...]

  • ...However, we have previously reviewed the equations for estimating repeatabilities and R2 from generalized linear mixed effects models (GLMMs) (Nakagawa & Schielzeth 2010, 2013)....

    [...]

  • ...Nonetheless, there are solutions to approximate repeatability for the most widely used families of generalized linear mixed models (GLMMs)....

    [...]

  • ...We will illustrate the features of rptR by estimating adjusted repeatabilities for Poisson data with log link for a dataset that was generated for estimating R2 in GLMMs (Nakagawa & Schielzeth 2013)....

    [...]

References
Journal ArticleDOI
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Abstract: The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.

38,681 citations

Book
19 Jun 2013
TL;DR: The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference).
Abstract: Introduction * Information and Likelihood Theory: A Basis for Model Selection and Inference * Basic Use of the Information-Theoretic Approach * Formal Inference From More Than One Model: Multi-Model Inference (MMI) * Monte Carlo Insights and Extended Examples * Statistical Theory and Numerical Results * Summary

36,993 citations


"A general and simple method for obt..." refers background or methods in this paper

  • ...Information criteria are used to select the ‘best’ or ‘better’ models, and they are indeed useful for selecting the most parsimonious models from a candidate model set (Burnham & Anderson 2002)....

    [...]

  • ...…provide an estimate of the relative fit of alternative models, they do not tell us anything about the absolute model fit (cf. evidence ratio; Burnham & Anderson 2002), (ii) information criteria do not provide any information on variance explained by a model (Orelien & Edwards 2008), and…...

    [...]

01 Jan 2005
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Abstract: The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.

36,760 citations

Book
01 Jan 1966
TL;DR: This textbook covers regression analysis, from fitting and checking a straight line by least squares through multiple regression, generalized linear models, nonlinear estimation and robust regression.
Abstract: Basic Prerequisite Knowledge. Fitting a Straight Line by Least Squares. Checking the Straight Line Fit. Fitting Straight Lines: Special Topics. Regression in Matrix Terms: Straight Line Case. The General Regression Situation. Extra Sums of Squares and Tests for Several Parameters Being Zero. Serial Correlation in the Residuals and the Durbin-Watson Test. More of Checking Fitted Models. Multiple Regression: Special Topics. Bias in Regression Estimates, and Expected Values of Mean Squares and Sums of Squares. On Worthwhile Regressions, Big F's, and R2. Models Containing Functions of the Predictors, Including Polynomial Models. Transformation of the Response Variable. "Dummy" Variables. Selecting the "Best" Regression Equation. Ill-Conditioning in Regression Data. Ridge Regression. Generalized Linear Models (GLIM). Mixture Ingredients as Predictor Variables. The Geometry of Least Squares. More Geometry of Least Squares. Orthogonal Polynomials and Summary Data. Multiple Regression Applied to Analysis of Variance Problems. An Introduction to Nonlinear Estimation. Robust Regression. Resampling Procedures (Bootstrapping). Bibliography. True/False Questions. Answers to Exercises. Tables. Indexes.

18,952 citations

Proceedings Article
01 Jan 1973
TL;DR: The classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion to provide answers to many practical problems of statistical model fitting.
Abstract: In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.

18,539 citations


"A general and simple method for obt..." refers methods in this paper

  • ...Commonly used information criteria include Akaike Information Criterion (AIC) (Akaike 1973), Bayesian information criterion (BIC), (Schwarz 1978) and the more recently proposed deviance information criterion (DIC), (Spiegelhalter et al. 2002; reviewed in Claeskens & Hjort 2009; Grueber et al. 2011; Hamaker et al. 2011)....

    [...]

  • ...Information criteria, such as Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models....

    [...]

  • ...Commonly used information criteria include Akaike Information Criterion (AIC) (Akaike 1973), Bayesian information criterion (BIC), (Schwarz 1978) and the more recently proposed deviance information criterion (DIC), (Spiegelhalter et al. 2002; reviewed in Claeskens & Hjort 2009; Grueber et al. 2011;…...

    [...]

  • ...Despite these limitations, when used along with other statistics such as AIC and PCV, R2GLMM will be a useful summary statistic of mixed-effects models for both biologists and other scientists alike....

    [...]
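The information criteria named in the excerpts above can be illustrated with a short Python sketch. It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The log-likelihoods, parameter counts and sample size below are hypothetical values, not results from any of the listed papers.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Two hypothetical candidate models fitted to the same 100 observations.
candidates = {
    "simple": {"log_likelihood": -120.0, "n_params": 3},
    "complex": {"log_likelihood": -118.5, "n_params": 5},
}
n_obs = 100

for name, fit in candidates.items():
    a = aic(fit["log_likelihood"], fit["n_params"])
    b = bic(fit["log_likelihood"], fit["n_params"], n_obs)
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")
```

Both criteria only rank the candidates against each other. Neither value says how much variance a model explains, which is why the excerpts recommend reporting them alongside R2.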