Ecological Models and Data in R

Book•

Ecological Models and Data in R

21 Jul 2008-

TL;DR: In step-by-step detail, Benjamin Bolker teaches ecology graduate students and researchers everything they need to know in order to use maximum likelihood, information-theoretic, and Bayesian techniques to analyze their own data using the programming language R.

Abstract: Ecological Models and Data in R is the first truly practical introduction to modern statistical methods for ecology. In step-by-step detail, the book teaches ecology graduate students and researchers everything they need to know in order to use maximum likelihood, information-theoretic, and Bayesian techniques to analyze their own data using the programming language R. Drawing on extensive experience teaching these techniques to graduate students in ecology, Benjamin Bolker shows how to choose among and construct statistical models for data, estimate their parameters and confidence limits, and interpret the results. The book also covers statistical frameworks, the philosophy of statistical modeling, and critical mathematical functions and probability distributions. It requires no programming background--only basic calculus and statistics.

Citations

Modern Applied Statistics With S

Christina Gloeckner

01 Jan 2016

TL;DR: Modern Applied Statistics with S is compatible with any reading device and is available in a digital library with public online access, so it can be downloaded instantly.

Abstract: Thank you very much for downloading Modern Applied Statistics with S. As you may know, people have searched hundreds of times for their favorite readings like this one, but ended up with harmful downloads. Rather than reading a good book with a cup of coffee in the afternoon, they instead cope with a harmful virus on their laptop. Modern Applied Statistics with S is available in our digital library; online access to it is set as public, so you can download it instantly. Our digital library is hosted in multiple countries, giving you the lowest latency when downloading any of our books like this one. In short, Modern Applied Statistics with S is compatible with any reading device.

5,249 citations

Journal Article•DOI•

A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion.

Matthew R. E. Symonds¹, Adnan Moussalli²•Institutions (2)

University of Melbourne¹, Museum Victoria²

01 Jan 2011-Behavioral Ecology and Sociobiology

TL;DR: Using recent examples from the behavioural ecology literature, this paper provides a simple introductory guide to Akaike’s information criterion (AIC): what it is, how and when to apply it and what it achieves.

Abstract: Akaike’s information criterion (AIC) is increasingly being used in analyses in the field of ecology. This measure allows one to compare and rank multiple competing models and to estimate which of them best approximates the “true” process underlying the biological phenomenon under study. Behavioural ecologists have been slow to adopt this statistical tool, perhaps because of unfounded fears regarding the complexity of the technique. Here, we provide, using recent examples from the behavioural ecology literature, a simple introductory guide to AIC: what it is, how and when to apply it and what it achieves. We discuss multimodel inference using AIC—a procedure which should be used where no one model is strongly supported. Finally, we highlight a few of the pitfalls and problems that can be encountered by novice practitioners.

1,946 citations

Cites background from "Ecological Models and Data in R"

...Although it is not our intention here to cover in detail the criticisms of the IT-AIC approach, one aspect that has been identified as a weakness is that, despite explicitly taking into account the number of predictors, AIC still tends to favour overly complex models (Kass and Raftery 1995; Link and Barker 2006)....
[...]
...Considerable literature exists discussing the origin, philosophy and application of AIC (e.g. Burnham and Anderson 2001, 2004; Burnham et al. 2010), and criticism of AIC is likewise prevalent (e.g. Guthery et al. 2005; Richards 2005; Stephens et al. 2005; Link and Barker 2006)....
[...]
...The Akaike weight for a given model, i, is calculated from Δ_i values as: $w_i = \frac{\exp\left(-\frac{1}{2}\Delta_i\right)}{\sum_{r=1}^{R} \exp\left(-\frac{1}{2}\Delta_r\right)}$. The Akaike weight is a value between 0 and 1, with the sum of Akaike weights of all models in the candidate set being 1, and can be considered as analogous to the probability that a given model is the best approximating model (although there are some who disagree with this, e.g. Link and Barker 2006; Bolker 2008; Richards 2005)....
[...]
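
The Akaike weight described in the excerpts above can be sketched in a few lines of Python. This is only an illustration: the function name `akaike_weights` and the three AIC scores are hypothetical and do not come from the cited paper.

```python
import math

def akaike_weights(aic_values):
    """Return Akaike weights for a list of AIC scores, one per candidate model."""
    best = min(aic_values)
    # Delta_i: difference between each model's AIC and the lowest AIC in the set.
    deltas = [a - best for a in aic_values]
    # Relative likelihood of each model, exp(-Delta_i / 2).
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_lik)
    # Normalise so that the weights sum to 1 across the candidate set.
    return [r / total for r in rel_lik]

# Hypothetical AIC scores for three candidate models.
aics = [100.0, 102.0, 110.0]
weights = akaike_weights(aics)
for aic, w in zip(aics, weights):
    print(f"AIC = {aic:6.1f}  weight = {w:.3f}")
```

Subtracting the lowest AIC first changes nothing mathematically, but it keeps the exponentials well scaled when the raw AIC values are large.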

Journal Article•DOI•

Multivariate Analysis of Ecological Data

Jan Lepš¹, Petr Šmilauer¹•Institutions (1)

Sewanee: The University of the South¹

01 Jul 2006-Bulletin of The Ecological Society of America

TL;DR: In this paper, the authors present study material for the participants of the course Multivariate Analysis of Ecological Data, taught at their university for the third year, which provides an easy-to-read supplement to more exact and detailed publications such as Dr. Ter Braak's collected papers and the Canoco for Windows 4.0 manual.

Abstract: Foreword. This textbook provides study materials for the participants of the course Multivariate Analysis of Ecological Data, which we are teaching at our university for the third year. The material provided here should serve both the introductory and the advanced versions of the course. We admit that some parts of the text would profit from further polishing; they are quite rough, but we hope to improve the text further. We hope that this book provides an easy-to-read supplement to more exact and detailed publications such as the collection of Dr. Ter Braak's papers and the Canoco for Windows 4.0 manual. Beyond the scope of these publications, this textbook adds information on classification methods in multivariate data analysis and introduces some of the modern regression methods most useful in ecological research. Wherever we refer to commercial software products, these are covered by trademarks or registered marks of their respective producers. This publication is far from final, and this shows in its quality: some issues appear repeatedly throughout the book, but we hope this at least gives the reader an opportunity to see the same topic expressed in different words.

1,870 citations

Journal Article•DOI•

The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

Shinichi Nakagawa¹,², Paul C. D. Johnson³, Holger Schielzeth⁴•Institutions (4)

University of New South Wales¹, Garvan Institute of Medical Research², University of Glasgow³, University of Jena⁴

01 Sep 2017-Journal of the Royal Society Interface

TL;DR: This paper generalizes the methods called for Poisson and binomial GLMMs to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data and can be used across disciplines and regardless of statistical environments.

Abstract: The coefficient of determination R² quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R² for g...

1,389 citations

Journal Article•DOI•

Solving Differential Equations in R: Package deSolve

Karline Soetaert, Thomas Petzoldt, R. Woodrow Setzer

23 Feb 2010-Journal of Statistical Software

TL;DR: Comparisons demonstrate that, if the use of loops is avoided, R code can efficiently integrate problems comprising several thousands of state variables, and the same problem may be solved from 2 to more than 50 times faster by using compiled code compared to an implementation using only R code.

Abstract: In this paper we present the R package deSolve to solve initial value problems (IVP) written as ordinary differential equations (ODE), differential algebraic equations (DAE) of index 0 or 1 and partial differential equations (PDE), the latter solved using the method of lines approach. The differential equations can be represented in R code or as compiled code. In the latter case, R is used as a tool to trigger the integration and post-process the results, which facilitates model development and application, whilst the compiled code significantly increases simulation speed. The methods implemented are efficient, robust, and well documented public-domain Fortran routines. They include four integrators from the ODEPACK package (LSODE, LSODES, LSODA, LSODAR), DVODE and DASPK2.0. In addition, a suite of Runge-Kutta integrators and special-purpose solvers to efficiently integrate 1-, 2- and 3-dimensional partial differential equations are available. The routines solve both stiff and non-stiff systems, and include many options, e.g., to deal in an efficient way with the sparsity of the Jacobian matrix, or finding the root of equations. In this article, our objectives are threefold: (1) to demonstrate the potential of using R for dynamic modeling, (2) to highlight typical uses of the different methods implemented and (3) to compare the performance of models specified in R code and in compiled code for a number of test cases. These comparisons demonstrate that, if the use of loops is avoided, R code can efficiently integrate problems comprising several thousands of state variables. Nevertheless, the same problem may be solved from 2 to more than 50 times faster by using compiled code compared to an implementation using only R code. Still, amongst the benefits of R are a more flexible and interactive implementation, better readability of the code, and access to R’s high-level procedures. 
deSolve is the successor of package odesolve which will be deprecated in the future; it is free software and distributed under the GNU General Public License, as part of the R software project.
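
To make the idea of an initial value problem concrete, here is a minimal Python sketch of a classical fourth-order Runge-Kutta integrator applied to logistic growth. It is not deSolve code and does not reproduce any of the Fortran routines listed in the abstract; the function `rk4`, the model and the parameter values `r`, `K` and `n0` are assumptions chosen for illustration.

```python
import math

def rk4(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step classical RK4."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# Hypothetical logistic growth model: dN/dt = r * N * (1 - N / K).
r, K, n0 = 0.5, 100.0, 10.0

def logistic(t, n):
    return r * n * (1 - n / K)

numeric = rk4(logistic, n0, 0.0, 10.0, 1000)
# Closed-form solution of the logistic equation, used as a check.
exact = K / (1 + (K / n0 - 1) * math.exp(-r * 10.0))
print(numeric, exact)
```

A fixed-step explicit scheme like this is adequate for a smooth, non-stiff problem; stiff systems call for implicit, adaptive solvers of the kind the abstract lists.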

1,264 citations

Cites background from "Ecological Models and Data in R"

...An increasing number of textbooks deal with the subject (Ellner and Guckenheimer 2006; Bolker 2008; Soetaert and Herman 2009; Stevens 2009)....
[...]

References

Journal Article•DOI•

Optimization by Simulated Annealing

Scott Kirkpatrick¹, C. D. Gelatt¹, Mario P. Vecchi²•Institutions (2)

IBM¹, Venezuelan Institute for Scientific Research²

13 May 1983-Science

TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.

Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
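
As a rough illustration of the annealing analogy, the following Python sketch minimises a one-dimensional function with two minima using a Metropolis acceptance rule and a geometric cooling schedule. The objective function, the cooling schedule, the step size and all parameter defaults are assumptions made for this example, not details taken from the paper.

```python
import math
import random

def simulated_annealing(f, x0, n_iter=20000, t_start=1.0, t_end=1e-3,
                        step=0.5, seed=0):
    """Minimise f from x0; return the best point found and its value."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for i in range(n_iter):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (i / (n_iter - 1))
        cand = x + rng.gauss(0.0, step)
        fc = f(cand)
        # Metropolis rule: always accept improvements; accept a worse move
        # with probability exp(-(increase in f) / temperature).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

def objective(x):
    # Hypothetical test function with a local and a global minimum.
    return x * x + 10.0 * math.sin(x)

best_x, best_f = simulated_annealing(objective, x0=5.0)
print(best_x, best_f)
```

At high temperature the search accepts many uphill moves and can leave a local minimum; as the temperature falls it behaves more and more like a plain descent.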

41,772 citations

Journal Article•DOI•

Bayesian measures of model complexity and fit

David Spiegelhalter¹, Nicola G. Best², Bradley P. Carlin³, Angelika van der Linde⁴•Institutions (4)

Medical Research Council¹, Imperial College London², University of Minnesota³, University of Bremen⁴

01 Oct 2002-Journal of The Royal Statistical Society Series B-statistical Methodology

TL;DR: In this paper, the authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined, and derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. Adding pD to the posterior mean deviance gives a deviance information criterion, which is related to other information criteria and has an approximate decision theoretic justification.

Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
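
The two quantities defined in the abstract, pD and the deviance information criterion, can be computed directly from posterior draws. The Python sketch below assumes a deliberately simple setting that is not from the paper: normal data with known standard deviation 1, a flat prior on the mean, and posterior draws simulated from the known posterior instead of produced by MCMC. The function names are hypothetical.

```python
import math
import random

def deviance(mu, data, sigma=1.0):
    """Deviance, -2 * log-likelihood, of a normal model with known sigma."""
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    return ss / sigma ** 2 + n * math.log(2 * math.pi * sigma ** 2)

def dic(samples, data):
    """Return (DIC, pD) computed from posterior draws of mu."""
    d_bar = sum(deviance(m, data) for m in samples) / len(samples)  # posterior mean deviance
    mu_bar = sum(samples) / len(samples)                            # posterior mean of mu
    d_hat = deviance(mu_bar, data)                                  # deviance at posterior mean
    p_d = d_bar - d_hat                                             # effective number of parameters
    return d_bar + p_d, p_d

rng = random.Random(1)
# Simulated data (assumption): 20 observations from N(3, 1).
data = [rng.gauss(3.0, 1.0) for _ in range(20)]
y_bar = sum(data) / len(data)
# With a flat prior and sigma = 1, the posterior of mu is N(y_bar, 1/n).
post_sd = 1.0 / math.sqrt(len(data))
samples = [rng.gauss(y_bar, post_sd) for _ in range(20000)]

dic_value, p_d = dic(samples, data)
print(f"pD = {p_d:.3f}, DIC = {dic_value:.3f}")
```

In this one-parameter model pD should come out close to 1, which is a quick sanity check on the definition.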

11,691 citations

Journal Article•DOI•

Numerical Recipes in C: The Art of Scientific Computing

Mary C. Seiler, Fritz A. Seiler

01 Sep 1989-Risk Analysis

11,285 citations

"Ecological Models and Data in R" refers methods in this paper

...Various modifications of Newton’s method mitigate some of these problems (Press et al., 1994), and other methods called “quasi-Newton” methods use the general idea of calculating derivatives to iteratively approximate the root of the derivatives....
[...]
...The classic stochastic optimization algorithm is the Metropolis algorithm, or simulated annealing (Kirkpatrick et al., 1983; Press et al., 1994)....
[...]
...method, which is a combination of golden section search and parabolic interpolation (Press et al., 1994)....
[...]
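
The last excerpt mentions golden section search as one ingredient of a one-dimensional optimizer. A minimal Python sketch of that ingredient alone follows (no parabolic interpolation, so this is not the combined method the excerpt describes). The Poisson example, the count data and the search bracket are hypothetical.

```python
import math

def golden_section_min(f, a, b, tol=1e-8):
    """Locate the minimum of a unimodal function f on the interval [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Hypothetical count data; the Poisson maximum likelihood estimate is the mean.
counts = [2, 4, 3, 5, 1, 3]

def neg_log_lik(lam):
    # Poisson negative log-likelihood, dropping the constant log(y!) terms.
    return len(counts) * lam - sum(counts) * math.log(lam)

lam_hat = golden_section_min(neg_log_lik, 0.1, 10.0)
print(lam_hat, sum(counts) / len(counts))
```

Each iteration reuses one of the two interior function values, so only one new evaluation is needed per step.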

Book•

Mixed-Effects Models in S and S-PLUS

José C. Pinheiro, Douglas M. Bates

29 Mar 2013

TL;DR: The book covers linear mixed-effects (LME) and nonlinear mixed-effects (NLME) models: theory and computational methods, the structure of grouped data, fitting both kinds of model, and extending the basic LME model.

Abstract: Linear Mixed-Effects * Theory and Computational Methods for LME Models * Structure of Grouped Data * Fitting LME Models * Extending the Basic LME Model * Nonlinear Mixed-Effects * Theory and Computational Methods for NLME Models * Fitting NLME Models

10,715 citations

"Ecological Models and Data in R" refers background in this paper

...There are a growing number of introductory books using R (Dalgaard, 2003; Verzani, 2005; Crawley, 2005), books of examples (Maindonald and Braun, 2003; Heiberger and Holland, 2004; Everitt and Hothorn, 2006), more advanced and encyclopedic books covering a range of statistical approaches (Venables and Ripley, 2002; Crawley, 2002), and books on specific topics such as regression analysis (Fox, 2002; Faraway, 2004), mixed-effect models (Pinheiro and Bates, 2000), phylogenetics (Paradis, 2006), generalized additive models (Wood, 2006), etc....
[...]
...The distribution of deviances may not be an equal mixture of $\chi^2_n$ and $\chi^2_{n-1}$ (Pinheiro and Bates, 2000)....
[...]
...) as well as fixed effects (effects of covariates) (Pinheiro and Bates, 2000)....
[...]
...The other limitation of the LRT that frequently arises, although it is often ignored, is that it only works when the best estimate of the parameter is not on the edge of its allowable range (Pinheiro and Bates, 2000)....
[...]

Journal Article•DOI•

Generalized Linear Models

Eric R. Ziegel

01 Aug 2002-Technometrics

TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.

Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated. 
Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models. They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS. 
The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematic level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link. 
The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies. The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992). 
That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations