scispace - formally typeset
Search or ask a question
Book

Ecological Models and Data in R

21 Jul 2008-
TL;DR: In step-by-step detail, Benjamin Bolker teaches ecology graduate students and researchers everything they need to know in order to use maximum likelihood, information-theoretic, and Bayesian techniques to analyze their own data using the programming language R.
Abstract: Ecological Models and Data in R is the first truly practical introduction to modern statistical methods for ecology. In step-by-step detail, the book teaches ecology graduate students and researchers everything they need to know in order to use maximum likelihood, information-theoretic, and Bayesian techniques to analyze their own data using the programming language R. Drawing on extensive experience teaching these techniques to graduate students in ecology, Benjamin Bolker shows how to choose among and construct statistical models for data, estimate their parameters and confidence limits, and interpret the results. The book also covers statistical frameworks, the philosophy of statistical modeling, and critical mathematical functions and probability distributions. It requires no programming background--only basic calculus and statistics.
Citations
More filters
01 Jan 2016
TL;DR: The modern applied statistics with s is universally compatible with any devices to read, and is available in the digital library an online access to it is set as public so you can download it instantly.
Abstract: Thank you very much for downloading modern applied statistics with s. As you may know, people have search hundreds times for their favorite readings like this modern applied statistics with s, but end up in harmful downloads. Rather than reading a good book with a cup of coffee in the afternoon, instead they cope with some harmful virus inside their laptop. modern applied statistics with s is available in our digital library an online access to it is set as public so you can download it instantly. Our digital library saves in multiple countries, allowing you to get the most less latency time to download any of our books like this one. Kindly say, the modern applied statistics with s is universally compatible with any devices to read.

5,249 citations

Journal ArticleDOI
TL;DR: Akaike’s information criterion is provided, using recent examples from the behavioural ecology literature, a simple introductory guide to AIC: what it is, how and when to apply it and what it achieves.
Abstract: Akaike’s information criterion (AIC) is increasingly being used in analyses in the field of ecology. This measure allows one to compare and rank multiple competing models and to estimate which of them best approximates the “true” process underlying the biological phenomenon under study. Behavioural ecologists have been slow to adopt this statistical tool, perhaps because of unfounded fears regarding the complexity of the technique. Here, we provide, using recent examples from the behavioural ecology literature, a simple introductory guide to AIC: what it is, how and when to apply it and what it achieves. We discuss multimodel inference using AIC—a procedure which should be used where no one model is strongly supported. Finally, we highlight a few of the pitfalls and problems that can be encountered by novice practitioners.

1,946 citations


Cites background from "Ecological Models and Data in R"

  • ...Although it is not our intention here to cover in detail the criticisms of the IT-AIC approach, one aspect that has been identified as a weakness is that, despite explicitly taking into account the number of predictors, AIC still tends to favour overly complex models (Kass and Raftery 1995; Link and Barker 2006)....

    [...]

  • ...…0 and 1, with the sum of Akaike weights of all models in the candidate set being 1, and can be considered as analogous to the probability that a given model is the best approximating model (although there are some who disagree with this, e.g. Link and Barker 2006; Bolker 2008; Richards 2005)....

    [...]

  • ...The Akaike weight is a value between 0 and 1, with the sum of Akaike weights of all models in the candidate set being 1, and can be considered as analogous to the probability that a given model is the best approximating model (although there are some who disagree with this, e.g. Link and Barker 2006; Bolker 2008; Richards 2005)....

    [...]

  • ...Considerable literature exists discussing the origin, philosophy and application of AIC (e.g. Burnham and Anderson 2001, 2004; Burnham et al. 2010), and criticism of AIC is likewise prevalent (e.g. Guthery et al. 2005; Richards 2005; Stephens et al. 2005; Link and Barker 2006)....

    [...]

  • ...The Akaike weight for a given model, i, is calculated from Δi values as: wi ¼ exp 12 Δi PR r¼1 exp 12 Δr The Akaike weight is a value between 0 and 1, with the sum of Akaike weights of all models in the candidate set being 1, and can be considered as analogous to the probability that a given model is the best approximating model (although there are some who disagree with this, e.g. Link and Barker 2006; Bolker 2008; Richards 2005)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the authors present a study material for the participants of the course named Multivariate Analysis of Ecological Data that we teach at our university for the third year, which provides an easy-to-read supplement for the more exact and detailed publications like the collection of the Dr. Ter Braak' papers and the Canoco for Windows 4.0 manual.
Abstract: 2 Foreword This textbook provides study materials for the participants of the course named Multivariate Analysis of Ecological Data that we teach at our university for the third year. Material provided here should serve both for the introductory and the advanced versions of the course. We admit that some parts of the text would profit from further polishing, they are quite rough but we hope in further improvement of this text. We hope that this book provides an easy-to-read supplement for the more exact and detailed publications like the collection of the Dr. Ter Braak' papers and the Canoco for Windows 4.0 manual. In addition to the scope of these publications, this textbook adds information on the classification methods of the multivariate data analysis and introduces some of the modern regression methods most useful in the ecological research. Wherever we refer to some commercial software products, these are covered by trademarks or registered marks of their respective producers. This publication is far from being final and this is seen on its quality: some issues appear repeatedly through the book, but we hope this provides, at least, an opportunity to the reader to see the same topic expressed in different words.

1,870 citations

Journal ArticleDOI
TL;DR: This paper generalizes the methods called for Poisson and binomial GLMMs to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data and can be used across disciplines and regardless of statistical environments.
Abstract: The coefficient of determination R2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R2 for g...

1,389 citations

Journal ArticleDOI
TL;DR: Comparisons demonstrate that, if the use of loops is avoided, R code can efficiently integrate problems comprising several thousands of state variables, and the same problem may be solved from 2 to more than 50 times faster by using compiled code compared to an implementation using only R code.
Abstract: In this paper we present the R package deSolve to solve initial value problems (IVP) written as ordinary differential equations (ODE), differential algebraic equations (DAE) of index 0 or 1 and partial differential equations (PDE), the latter solved using the method of lines approach. The differential equations can be represented in R code or as compiled code. In the latter case, R is used as a tool to trigger the integration and post-process the results, which facilitates model development and application, whilst the compiled code significantly increases simulation speed. The methods implemented are efficient, robust, and well documented public-domain Fortran routines. They include four integrators from the ODEPACK package (LSODE, LSODES, LSODA, LSODAR), DVODE and DASPK2.0. In addition, a suite of Runge-Kutta integrators and special-purpose solvers to efficiently integrate 1-, 2- and 3-dimensional partial differential equations are available. The routines solve both stiff and non-stiff systems, and include many options, e.g., to deal in an efficient way with the sparsity of the Jacobian matrix, or finding the root of equations. In this article, our objectives are threefold: (1) to demonstrate the potential of using R for dynamic modeling, (2) to highlight typical uses of the different methods implemented and (3) to compare the performance of models specified in R code and in compiled code for a number of test cases. These comparisons demonstrate that, if the use of loops is avoided, R code can efficiently integrate problems comprising several thousands of state variables. Nevertheless, the same problem may be solved from 2 to more than 50 times faster by using compiled code compared to an implementation using only R code. Still, amongst the benefits of R are a more flexible and interactive implementation, better readability of the code, and access to R’s high-level procedures. deSolve is the successor of package odesolve which will be deprecated in the future; it is free software and distributed under the GNU General Public License, as part of the R software project.

1,264 citations


Cites background from "Ecological Models and Data in R"

  • ...An increasing number of textbooks deal with the subject (Ellner and Guckenheimer 2006; Bolker 2008; Soetaert and Herman 2009; Stevens 2009)....

    [...]

References
More filters
Journal ArticleDOI
13 May 1983-Science
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.

41,772 citations

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined and derive a measure pD for the effective number in a model as the difference between the posterior mean of the deviances and the deviance at the posterior means of the parameters of interest, which is related to other information criteria and has an approximate decision theoretic justification.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.

11,691 citations

Journal ArticleDOI

11,285 citations


"Ecological Models and Data in R" refers methods in this paper

  • ...Various modifications of Newton’s method mitigate some of these problems (Press et al., 1994), and other methods called“quasi-Newton”methods use the general idea of calculating derivatives to iteratively approximate the root of the derivatives....

    [...]

  • ...The classic stochastic optimization algorithm is the Metropolis algorithm (or simulated annealing (Kirkpatrick et al., 1983; Press et al., 1994)....

    [...]

  • ...method, which is a combination of golden section search and parabolic interpolation (Press et al., 1994)....

    [...]

Book
29 Mar 2013
TL;DR: Linear Mixed-Effects and Nonlinear Mixed-effects (NLME) models have been studied in the literature as mentioned in this paper, where the structure of grouped data has been used for fitting LME models.
Abstract: Linear Mixed-Effects * Theory and Computational Methods for LME Models * Structure of Grouped Data * Fitting LME Models * Extending the Basic LME Model * Nonlinear Mixed-Effects * Theory and Computational Methods for NLME Models * Fitting NLME Models

10,715 citations


"Ecological Models and Data in R" refers background in this paper

  • ...There are a growing number of introductory books using R (Dalgaard, 2003; Verzani, 2005; Crawley, 2005), books of examples (Maindonald and Braun, 2003; Heiberger and Holland, 2004; Everitt and Hothorn, 2006), more advanced and encyclopedic books covering a range of statistical approaches (Venables and Ripley, 2002; Crawley, 2002), and books on specific topics such as regression analysis Fox (2002); Faraway (2004), mixed-effect models (Pinheiro and Bates, 2000), phylogenetics (Paradis, 2006), generalized additive models (Wood, 2006), etc....

    [...]

  • ...ˆ The distribution of deviances may not be an equal mixture of χ(2)n and χ(2)n−1 (Pinheiro and Bates, 2000)....

    [...]

  • ...) as well as fixed effects (effects of covariates) (Pinheiro and Bates, 2000)....

    [...]

  • ...The other limitation of the LRT that frequently arises, although it is often ignored, is that it only works when the best estimate of the parameter is not on the edge of its allowable range (Pinheiro and Bates, 2000)....

    [...]

Journal ArticleDOI
TL;DR: This is the Ž rst book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the Ž rst book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The Ž rst author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The Ž rst two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining (2001), reported by Gray (2002), and the Ž rst author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more speciŽ c focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagologic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated. Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more speciŽ c background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models. They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS. The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogenous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematic level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The Ž rst half of the chapter presents the methodology, and the second half demonstrates its application through Ž ve different examples. The basis for the general situation is Ž rst established using the case with a normal distribution for the response and an identity link. The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefŽ cients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies. The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992). That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations