Methods to account for spatial autocorrelation in the analysis of species distributional data: a review
Summary
Introduction
- The users of these methods are usually not statisticians, and the authors attempt to explain sometimes rather sophisticated methodologies to the desperate analyst.
- What the authors do attempt is a) a decision tree about which spatial autocorrelation modelling method to use when, and b) software implementation aids for these methods.
- The following pages cannot be understood without some advanced statistical knowledge or without the paper this code accompanies.
- Most details on the methods are provided in the main paper, while these pages are primarily for implementing the methods.
Decision tree
- More methods are available for data derived from a normal distribution (data whose residuals are normally distributed) than for data of alternative distributional form (e.g. binomial, Poisson).
- Typical examples of ecological data with normally distributed errors include abundance, species richness, or functional diversity per unit area, crop yield and catch per unit effort.
- The second partition refers to computational efficiency.
- Overview of methods (method: supported residual distributions; computational intensity):
- GAM: normal, Poisson, binomial; low
- autoregressive models (SAR/CAR): normal; medium-high
- GLS: normal; medium-high
- GEE: normal, Poisson, binomial; low
- autocovariate regression: normal, Poisson, binomial; low
- spatial GLMM: normal, Poisson, binomial; very high
- Spatial Eigenvector Mapping: normal, Poisson, binomial; very high
- All following analyses are illustrated using data organized as an XYZ-table (or, in R nomenclature, a data.frame).
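The method overview above can also be written down as a small lookup table. The following base-R sketch transcribes the listed methods; the column names and the helper `candidate_methods` are hypothetical and only illustrate the first partition of the decision tree (normal versus non-normal residuals).

```r
# Method table transcribed from the overview above.
# non_normal = TRUE means Poisson and binomial residuals are also supported.
methods <- data.frame(
  method = c("GAM", "autoregressive models (SAR/CAR)", "GLS", "GEE",
             "autocovariate regression", "spatial GLMM",
             "Spatial Eigenvector Mapping"),
  non_normal = c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  intensity = c("low", "medium-high", "medium-high", "low", "low",
                "very high", "very high"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the methods usable for a given error distribution.
candidate_methods <- function(family = c("normal", "poisson", "binomial")) {
  family <- match.arg(family)
  keep <- if (family == "normal") rep(TRUE, nrow(methods)) else methods$non_normal
  methods[keep, c("method", "intensity")]
}

candidate_methods("poisson")
```

Sorting or filtering the result by `intensity` would correspond to the second partition, computational efficiency.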
Plotting/calculating spatial autocorrelation
- The neighbourhood object is first turned into a weighted list: snouter.listw <- nb2listw(snouter.nb)
- The next step often takes several minutes to run: GlobMT1.1 <- moran.test(residuals(model), listw=snouter.listw)
- Now the authors are set to start with the spatial analysis.
- The best established methods here are autoregressive models and Generalized Least Squares.
- For details on GEE and autocovariate regression models see the next section.
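To make the Moran's I step more concrete without extra packages, the following base-R sketch computes global Moran's I directly from a weights matrix. The toy grid data, the 1.5 distance cut-off, and the helper name `morans_i` are assumptions for illustration; the permutation test is a simple stand-in and not the analytical test that moran.test performs.

```r
# Global Moran's I for a numeric vector z and a spatial weights matrix w.
morans_i <- function(z, w) {
  zc <- z - mean(z)
  n <- length(z)
  (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# Toy XYZ data on a 10 x 10 grid (invented for illustration).
set.seed(1)
xy <- expand.grid(x = 1:10, y = 1:10)
d <- as.matrix(dist(xy))

# Binary neighbourhood: cells within distance 1.5 are neighbours (assumed cut-off).
w <- (d > 0 & d <= 1.5) * 1
w <- w / rowSums(w)  # row-standardise the weights

trend <- xy$x + xy$y      # smooth gradient: strongly autocorrelated
noise <- rnorm(nrow(xy))  # independent values: little autocorrelation expected

i_trend <- morans_i(trend, w)
i_noise <- morans_i(noise, w)

# Permutation test: how often does a shuffled map reach the observed I?
perm <- replicate(999, morans_i(sample(trend), w))
p_value <- (sum(perm >= i_trend) + 1) / (999 + 1)
```

In a real analysis `z` would be the model residuals, as in the moran.test call, rather than a raw variable.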
Autoregressive Models in R
- Several functions can be invoked for the regression itself, depending on which assumptions are made about the cause of spatial autocorrelation (errorsarlm, lagsarlm, spautolm).
- A comparison of these different autoregressive models is very advisable, either using model selection procedures (e.g. Kissling & Carl 2007) or the Lagrange multiplier test (see SAR below).
Generalized Least Squares Models in R (Björn Reineking)
- GLS are fitted using the function gls {nlme} or gls.fit {MASS}.
- Internally, the SAR and CAR methods also call one of them.
- gls {nlme} allows the expected form of autocorrelation to be specified in the correlation argument.
- The remaining methods also work for non-normally distributed residuals (e.g. Poisson or binomial).
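As a sketch of the correlation argument of gls {nlme} described in this section, the example below fits a non-spatial model and a model with an exponential spatial correlation structure to simulated data. The data set, the predictor name `rain`, the simulated range of 2, and the choice of corExp are assumptions for illustration, not the authors' code.

```r
library(nlme)

# Simulated XYZ-style data: coordinates x and y, one predictor, response z.
set.seed(42)
n <- 100
dat <- data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10), rain = rnorm(n))

# Spatially correlated errors with exponential decay (range 2 is arbitrary).
d <- as.matrix(dist(dat[, c("x", "y")]))
sigma <- exp(-d / 2)
err <- as.vector(t(chol(sigma)) %*% rnorm(n))
dat$z <- 1 + 0.5 * dat$rain + err

# Non-spatial reference fit versus GLS with exponential spatial correlation.
fit_ols <- gls(z ~ rain, data = dat)
fit_exp <- gls(z ~ rain, data = dat, correlation = corExp(form = ~ x + y))

# Both models share the same fixed effects, so their REML AICs are comparable.
AIC(fit_ols, fit_exp)
```

Other correlation classes in nlme, such as corGaus or corSpher, can be swapped in the same way and compared by AIC.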
Generalized Estimating Equations in R (Gudrun Carl)
- Two different GEE packages are available in R: gee {gee} and geese {geepack}.
- The following code and helper functions shall aid with the data preparation.
Spatial Generalized Linear Mixed Model in R (Frank Schurr)
- This is an unofficial abuse of a Generalized Linear Mixed Model function (glmmPQL {MASS}), which is a wrapper function for lme {nlme}, which in turn internally calls gls {nlme}.
- This code produces results identical to those of an official spatial GLMM in SAS (proc glimmix) and can hence be trusted.
- For some reason, the data have to be attached AND specified in the formula.
Citations
Cites background or methods from "Methods to account for spatial auto..."
...Erroneous use of geographic terms to correct for either missing environmental predictors or wrongly specified models is likely to result in poor predictive ability, especially when extrapolating to new regions or times (Dormann et al. 2007, and see below)...
...Such data have prompted use of mixed models or other methods for dealing with pseudoreplication and spatial autocorrelation (Dormann et al. 2007, and Supplemental Literature Cited)...
...residual geographic patterning generally indicates that either key environmental predictors are missing (Leathwick & Whitehead 2001), the model is mis-specified (e.g., only linear terms where nonlinear are required), or geographic factors are influential (Dormann et al. 2007, Miller et al. 2007)...
...variables to an environmental model to test for residual spatial structure, or use of LISA (local indicator of spatial autocorrelation) to estimate the contribution of each sampling unit to the overall measure of spatial autocorrelation (Dormann et al. 2007, Miller et al. 2007, Rangel et al. 2006)...
Cites background from "Methods to account for spatial auto..."
...It is however, also possible to find evidence supporting interactions among species by considering the spatial structure of the residuals in a single species’ model, although this by itself is not enough to indicate the presence of biotic interactions (Dormann et al., 2007)....
References
"Methods to account for spatial auto..." refers background in this paper
...Liang and Zeger (1986) developed the generalised estimating equation (GEE) approach which is an extension of generalised linear models (GLMs)....
Frequently Asked Questions (12)
Q2. What are the typical examples of ecological data with normally distributed errors?
Typical examples of ecological data with normally distributed errors include abundance, species richness, or functional diversity per unit area, crop yield and catch per unit effort.
Q3. What can be done to achieve the prediction of values within the parameter and spatial range?
Interpolation, i.e. the prediction of values within the parameter and spatial range, can be achieved by several of the presented methods.
Q4. What is the way to decompose a connectivity matrix?
Either binary or distance-based connectivity matrices can be decomposed, offering a great deal of flexibility regarding topology and transformations.
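Decomposing a connectivity matrix can be sketched in base R as follows. The toy grid, the rook-neighbour threshold, the double-centring step, and the 1e-6 eigenvalue cut-off are assumptions chosen for illustration in the spirit of spatial eigenvector mapping; they are not necessarily the exact procedure used in the paper.

```r
# Toy 6 x 6 grid of sites (invented for illustration).
xy <- expand.grid(x = 1:6, y = 1:6)
n <- nrow(xy)
d <- as.matrix(dist(xy))

# Binary connectivity: sites at distance 1 are connected (assumed threshold).
conn <- (d > 0 & d < 1.1) * 1

# Double-centre the matrix so the eigenvectors are orthogonal to the intercept.
centre <- diag(n) - matrix(1 / n, n, n)
eig <- eigen(centre %*% conn %*% centre, symmetric = TRUE)

# Eigenvectors with positive eigenvalues describe positive spatial
# autocorrelation and can be offered to a model as extra predictors.
keep <- eig$values > 1e-6
sev <- eig$vectors[, keep, drop = FALSE]
```

A distance-based connectivity matrix could replace `conn` without changing the remaining steps.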
Q5. What are the main reasons for the use of Bayesian methods?
Bayesian methods are also a generally more suitable tool for inference in data sets with many missing values, or when accounting for detection probabilities (Gelfand et al. 2005, Kühn et al. 2006).
Q6. Why is the eigenvector extraction limited to 7000 observations?
Due to limits of numerical precision in the eigenvector extraction of large matrices (Bai et al. 1996), the method is limited to ca. 7000 observations, depending on platform and software (but see Griffith 2000a for solutions based on large binary connectivity matrices).
Q7. What was the weight matrix used to simulate the spatially correlated errors ε_i?
A weight matrix W was used to simulate the spatially correlated errors ε_i using weights according to the distance between data points.
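One common way to simulate spatially correlated errors from distance-based weights is sketched below in base R. The exponential distance decay, the decay rate of 0.3, and the use of a Cholesky factor as the weight matrix are assumptions for illustration; the summary does not state the exact construction.

```r
set.seed(7)
xy <- expand.grid(x = 1:12, y = 1:12)
d <- as.matrix(dist(xy))

# Assumed distance decay: correlation falls off exponentially with distance.
decay <- 0.3
omega <- exp(-decay * d)

# Weight matrix as the Cholesky factor, so that t(w) %*% w equals omega.
w <- chol(omega)

# Turn independent standard-normal draws into spatially correlated errors.
errors <- as.vector(t(w) %*% rnorm(nrow(xy)))

# Over many draws, the correlation between two sites approaches omega.
draws <- replicate(2000, as.vector(t(w) %*% rnorm(nrow(xy))))
near <- cor(draws[1, ], draws[2, ])    # sites (1,1) and (2,1), distance 1
far  <- cor(draws[1, ], draws[144, ])  # sites (1,1) and (12,12)
```

Neighbouring sites end up with similar errors, while sites at opposite corners of the grid are nearly independent.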
Q8. What are the two models that are used to model the error generating process?
CAR and SAR, on the other hand, model the error generating process and operate with weight matrices that specify the strength of interaction between neighbouring sites.
Q9. What are the advantages of Bayesian methods for the analyses of species distribution data?
Bayesian methods for the analyses of species distribution data are more flexible; they can be more easily extended to include more complex structures (Latimer et al. 2006).
Q10. What is the main argument for spatial autocorrelation in species distribution models?
While Lennon (2000) and others (Tognelli and Kelt 2004, Jetz et al. 2005, Dormann 2007b, Kühn 2007) argue that spatial autocorrelation in species distribution models may well bias coefficient estimation, Diniz-Filho et al. (2003) and Hawkins et al. (2007) found non-spatial models to be robust and unbiased for several data sets.
Q11. What are the constraints placed on the variance-covariance matrix?
Some restrictions are placed upon the resulting variance-covariance matrix Σ: a) it must be symmetric, and b) it must be positive definite.
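Both restrictions can be checked numerically. The base-R sketch below uses a hypothetical helper, `is_valid_covariance`, with an assumed tolerance of 1e-8, and tests it on one valid and two invalid example matrices invented for illustration.

```r
# TRUE when m is symmetric and positive definite (all eigenvalues > tol).
is_valid_covariance <- function(m, tol = 1e-8) {
  if (max(abs(m - t(m))) > tol) return(FALSE)
  min(eigen(m, symmetric = TRUE, only.values = TRUE)$values) > tol
}

xy <- expand.grid(x = 1:5, y = 1:5)
d <- as.matrix(dist(xy))

# Exponential decay with distance: symmetric and positive definite.
valid <- exp(-d / 2)

# Overwriting a single entry breaks symmetry.
asymmetric <- valid
asymmetric[1, 2] <- 0

# Symmetric but impossible correlation pattern: not positive definite.
impossible <- matrix(c( 1.0, 0.9, -0.9,
                        0.9, 1.0,  0.9,
                       -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

c(valid = is_valid_covariance(valid),
  asymmetric = is_valid_covariance(asymmetric),
  impossible = is_valid_covariance(impossible))
```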
Q12. What is the argument that the use of spatial parameters at least helps to derive better models?
One might therefore argue that, while taking the autocorrelation structure as constant adds one more assumption, the use of spatial parameters at least helps to derive better models.