
Methods to account for spatial autocorrelation in the analysis of species distributional data: a review

TL;DR: In this paper, the authors describe six different statistical approaches to infer correlates of species distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models; and generalised estimating equations.

Summary (2 min read)

Introduction

  • These users are usually not statisticians, and the authors attempt to relate sometimes rather sophisticated methodologies to the desperate analyst.
  • What the authors do attempt is a) a decision tree about which spatial autocorrelation modelling method to use when, and b) software implementation aids for these methods.
  • The following pages cannot be understood without some advanced statistical knowledge or without the paper this code accompanies.
  • Most details on the methods are provided in the main paper, while these pages are primarily for implementing the methods.

Decision tree

  • More methods are available for data derived from a normal distribution (data whose residuals are normally distributed) than data of alternative distributional form (e.g. binomial, Poisson).
  • Typical examples of ecological data with normally distributed errors include abundance, species richness, or functional diversity per unit area, crop yield and catch per unit effort.
  • The second partition refers to computational efficiency.
  • Methods, supported residual distributions and computational intensity:

      Method                             Residuals                    Computational intensity
      GAM                                normal, Poisson, binomial    low
      Autoregressive models (SAR/CAR)    normal                       medium-high
      GLS                                normal                       medium-high
      GEE                                normal, Poisson, binomial    low
      Autocovariate regression           normal, Poisson, binomial    low
      Spatial GLMM                       normal, Poisson, binomial    very high
      Spatial eigenvector mapping        normal, Poisson, binomial    very high
  • All following analyses are illustrated using data organized as an XYZ-table (or, in the R nomenclature), data.

Plotting/calculating spatial autocorrelation

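  • The neighbourhood object is first turned into a weighted list, and Moran's I is then tested on the model residuals (this step often takes several minutes to run):
      snouter.listw <- nb2listw(snouter.nb)  # turns neighbourhood object into a weighted list
      GlobMT1.1 <- moran.test(residuals(model), listw = snouter.listw)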
  • Now the authors are set to start with the spatial analysis.
  • The best established methods here are autoregressive models and Generalized Linear Models.
  • For details on GEE and autocovariate regression models see next section.

Autoregressive Models in R

  • Several functions can be invoked for the regression itself, depending on which assumptions are made about the cause of spatial autocorrelation (errorsarlm, lagsarlm, spautolm).
  • A comparison of these different autoregressive models is very advisable, either using model selection procedures (e.g. Kissling & Carl 2007) or the Lagrange multiplier test (see SAR below).

Generalized Least Square Models in R (Björn Reineking)

  • GLS are fitted using the function gls {nlme} or gls.fit {MASS}.
  • Internally, the SAR and CAR methods also call one of them.
  • Gls {nlme} offers to specify the expected form of autocorrelation in the correlation argument.
  • 3. Methods also for non-normally distributed residuals (e.g. Poisson or binomial).

Generalized Estimation Equations in R (Gudrun Carl)

  • Two different GEE packages are available in R: gee {gee} and geese {geepack}.
  • The following code and helper functions shall aid with the data preparation.

Spatial Generalized Linear Mixed Model in R (Frank Schurr)

  • This is an unofficial abuse of a Generalized Linear Mixed Model function (glmmPQL {MASS}), which is a wrapper function for lme {nlme}, which in turn internally calls gls {nlme}.
  • This code produces the identical results as an official spatial GLMM in SAS (proc glimmix) and can hence be trusted.
  • #For some reason, the data have to be attached AND specified in the formula!.


Methods to account for spatial autocorrelation in the analysis
of species distributional data: a review
Carsten F. Dormann, Jana M. McPherson, Miguel B. Araújo, Roger Bivand, Janine Bolliger,
Gudrun Carl, Richard G. Davies, Alexandre Hirzel, Walter Jetz, W. Daniel Kissling,
Ingolf Kühn, Ralf Ohlemüller, Pedro R. Peres-Neto, Björn Reineking, Boris Schröder,
Frank M. Schurr and Robert Wilson
C. F. Dormann (carsten.dormann@ufz.de), Dept of Computational Landscape Ecology, UFZ Helmholtz Centre for Environmental
Research, Permoserstr. 15, DE-04318 Leipzig, Germany. J. M. McPherson, Dept of Biology, Dalhousie Univ., 1355 Oxford Street
Halifax NS, B3H 4J1 Canada. M. B. Araújo, Dept de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales,
CSIC, C/ Gutiérrez Abascal, 2, ES-28006 Madrid, Spain, and Centre for Macroecology, Inst. of Biology, Universitetsparken 15,
DK-2100 Copenhagen Ø, Denmark. R. Bivand, Economic Geography Section, Dept of Economics, Norwegian School of Economics and
Business Administration, Helleveien 30, NO-5045 Bergen, Norway. J. Bolliger, Swiss Federal Research Inst. WSL, Zürcherstrasse
111, CH-8903 Birmensdorf, Switzerland. G. Carl and I. Kühn, Dept of Community Ecology (BZF), UFZ Helmholtz Centre for
Environmental Research, Theodor-Lieser-Strasse 4, DE-06120 Halle, Germany, and Virtual Inst. Macroecology,
Theodor-Lieser-Strasse 4, DE-06120 Halle, Germany. R. G. Davies, Biodiversity and Macroecology Group, Dept of Animal and Plant
Sciences, Univ. of Sheffield, Sheffield S10 2TN, U.K. A. Hirzel, Ecology and Evolution Dept, Univ. de Lausanne, Biophore
Building, CH-1015 Lausanne, Switzerland. W. Jetz, Ecology Behavior and Evolution Section, Div. of Biological Sciences, Univ. of
California, San Diego, 9500 Gilman Drive, MC 0116, La Jolla, CA 92093-0116, USA. W. D. Kissling, Community and Macroecology
Group, Inst. of Zoology, Dept of Ecology, Johannes Gutenberg Univ. of Mainz, DE-55099 Mainz, Germany, and Virtual Inst.
Macroecology, Theodor-Lieser-Strasse 4, DE-06120 Halle, Germany. R. Ohlemüller, Dept of Biology, Univ. of York, PO Box 373,
York YO10 5YW, U.K. P. R. Peres-Neto, Dept of Biology, Univ. of Regina, SK, S4S 0A2 Canada, present address: Dept of
Biological Sciences, Univ. of Quebec at Montreal, CP 8888, Succ. Centre Ville, Montreal, QC, H3C 3P8, Canada. B. Reineking,
Forest Ecology, ETH Zurich CHN G 75.3, Universitätstr. 16, CH-8092 Zürich, Switzerland. B. Schröder, Inst. for Geoecology,
Univ. of Potsdam, Karl-Liebknecht-Strasse 24-25, DE-14476 Potsdam, Germany. F. M. Schurr, Plant Ecology and Nature
Conservation, Inst. of Biochemistry and Biology, Univ. of Potsdam, Maulbeerallee 2, DE-14469 Potsdam, Germany. R. Wilson,
Área de Biodiversidad y Conservación, Escuela Superior de Ciencias Experimentales y Tecnología, Univ. Rey Juan Carlos,
Tulipán s/n, Móstoles, ES-28933 Madrid, Spain.
Species distributional or trait data based on range map (extent-of-occurrence) or atlas survey data often display
spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If
this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions
of standard statistical analyses, that residuals are independent and identically distributed (i.i.d.), is violated. The
violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates
(falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing
species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial
statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we
describe six different statistical approaches to infer correlates of species’ distributions, for both presence/absence
(binary response) and species abundance data (Poisson or normally distributed response), while accounting for
spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised
least squares; (conditional and simultaneous) autoregressive models and generalised estimating equations. A
comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To
demonstrate each method’s implementation, however, we undertook preliminary tests based on simulated data.
These preliminary tests verified that most of the spatial modeling techniques we examined showed good type I
error control and precise parameter estimates, at least when confronted with simplistic simulated data containing
spatial autocorrelation in the errors. However, we found that for presence/absence data the results and
conclusions were very variable between the different methods. This is likely due to the low information content
of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently
underestimated the effects of environmental controls of species distributions. Given their widespread use, in
particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this
warrants further study and caution in their use. To aid other ecologists in making use of the methods described,
code to implement them in freely available software is provided in an electronic appendix.
Ecography 30: 609–628, 2007
doi: 10.1111/j.2007.0906-7590.05171.x
© 2007 The Authors. Journal compilation © 2007 Ecography
Subject Editor: Carsten Rahbek. Accepted 3 August 2007
Species distributional data such as species range maps
(extent-of-occurrence), breeding bird surveys and
biodiversity atlases are a common source for analyses of
species-environment relationships. These, in turn, form
the basis for conservation and management plans for
endangered species, for calculating distributions under
future climate and land-use scenarios and other forms
of environmental risk assessment.
The analysis of spatial data is complicated by a
phenomenon known as spatial autocorrelation. Spatial
autocorrelation (SAC) occurs when the values of
variables sampled at nearby locations are not independent
from each other (Tobler 1970). The causes of spatial
autocorrelation are manifold, but three factors are
particularly common (Legendre and Fortin 1989,
Legendre 1993, Legendre and Legendre 1998): 1)
biological processes such as speciation, extinction,
dispersal or species interactions are distance-related; 2)
non-linear relationships between environment and
species are modelled erroneously as linear; 3) the statistical
model fails to account for an important environmental
determinant that in itself is spatially structured and thus
causes spatial structuring in the response (Besag 1974).
The second and third points are not always referred to as
spatial autocorrelation, but rather spatial dependency
(Legendre et al. 2002). Since they also lead to
autocorrelated residuals, these are equally problematic. A
fourth source of spatial autocorrelation relates to spatial
resolution, because coarser grains lead to a spatial
smoothing of data. In all of these cases, SAC may
confound the analysis of species distribution data.
Spatial autocorrelation may be seen as both an
opportunity and a challenge for spatial analysis. It is an
opportunity when it provides useful information for
inference of process from pattern (Palma et al. 1999)
by, for example, increasing our understanding of
contagious biotic processes such as population growth,
geographic dispersal, differential mortality, social
organization or competition dynamics (Griffith and
Peres-Neto 2006). In most cases, however, the presence
of spatial autocorrelation is seen as posing a serious
shortcoming for hypothesis testing and prediction
(Lennon 2000, Dormann 2007b), because it violates
the assumption of independently and identically
distributed (i.i.d.) errors of most standard statistical
procedures (Anselin 2002) and hence inflates type I
errors, occasionally even inverting the slope of
relationships from non-spatial analysis (Kühn 2007).
A variety of methods have consequently been developed
to correct for the effects of spatial autocorrelation
(partially reviewed in Keitt et al. 2002, Miller et al. 2007,
see below), but only a few have made it into the
ecological literature. The aims of this paper are to 1)
present and explain methods that account for spatial
autocorrelation in analyses of spatial data; the
approaches considered are: autocovariate regression, spatial
eigenvector mapping (SEVM), generalised least squares
(GLS), conditional autoregressive models (CAR),
simultaneous autoregressive models (SAR), generalised linear
mixed models (GLMM) and generalised estimation
equations (GEE); 2) describe which of these methods
can be used for which error distribution, and discuss
potential problems with implementation; 3) illustrate
how to implement these methods using simulated data
sets and by providing computing code (Anon. 2005).
Methods for dealing with spatial autocorrelation
Detecting and quantifying spatial autocorrelation
Before considering the use of modelling methods that
account for spatial autocorrelation, it is a sensible first
step to check whether spatial autocorrelation is in fact
likely to impact the planned analyses, i.e. if model
residuals indeed display spatial autocorrelation.
Checking for spatial autocorrelation (SAC) has become a
commonplace exercise in geography and ecology (Sokal
and Oden 1978a, b, Fortin and Dale 2005). Established
procedures include (Isaaks and Shrivastava 1989, Perry
et al. 2002): Moran’s I plots (also termed Moran’s I
correlogram by Legendre and Legendre 1998), Geary’s
c correlograms and semi-variograms. In all three cases a
measure of similarity (Moran’s I, Geary’s c) or variance
(variogram) of data points (i and j) is plotted as a
function of the distance between them ($d_{ij}$). Distances
are usually grouped into bins. Moran’s I-based
correlograms typically show a decrease from some level of SAC
to a value of 0 (or below; expected value in the absence
of SAC: $E(I) = -1/(n-1)$, where $n$ = sample size),
indicating no SAC at some distance between locations.
Variograms depict the opposite, with the variance
between pairs of points increasing up to a certain
distance, where variance levels off. Variograms are more
commonly employed in descriptive geostatistics, while
correlograms are the prevalent graphical presentation in
ecology (Fortin and Dale 2005).
Values of Moran’s I are assessed by a test statistic
(the Moran’s I standard deviate) which indicates the
statistical significance of SAC in e.g. model residuals.
Additionally, model residuals may be plotted as a map
that more explicitly reveals particular patterns of spatial
autocorrelation (e.g. anisotropy or non-stationarity of
spatial autocorrelation). For further details and
formulae see e.g. Isaaks and Shrivastava (1989) or Fortin
and Dale (2005).
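
To make the quantity behind a correlogram concrete, the following is a minimal sketch of global Moran's I for a single distance class. It is written in plain Python for illustration only (the electronic appendix of the paper provides its own code). The function name `morans_i`, the binary weighting (weight 1 for pairs within `max_dist`, 0 otherwise) and the two toy grids are assumptions made for this example.

```python
from math import hypot


def morans_i(values, coords, max_dist):
    """Global Moran's I with binary weights (an illustrative assumption):
    w_ij = 1 if 0 < d_ij <= max_dist, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum over i != j of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * (cross / sum(d * d for d in dev))


# Toy data on a 4 x 4 regular grid (hypothetical values, not from the paper).
coords = [(x, y) for y in range(4) for x in range(4)]
gradient = [float(x) for (x, y) in coords]            # smooth west-east trend
checker = [float((x + y) % 2) for (x, y) in coords]   # alternating pattern

i_gradient = morans_i(gradient, coords, max_dist=1.0)  # positive SAC
i_checker = morans_i(checker, coords, max_dist=1.0)    # negative SAC
expected_null = -1.0 / (len(coords) - 1)               # E(I) = -1/(n-1)
```

Repeating the calculation for successive distance bins, rather than the single bin used here, would give the points of a correlogram.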
Assumptions common to all modelling approaches considered
All methods assume spatial stationarity, i.e. spatial
autocorrelation and effects of environmental correlates
to be constant across the region, and there are very few
methods to deal with non-stationarity (Osborne et al.
2007). Stationarity may or may not be a reasonable
assumption, depending, among other things, on the
spatial extent of the study. If the main cause of spatial
autocorrelation is dispersal (for example in research on
animal distributions), stationarity is likely to be
violated, for example when moving from a floodplain
to the mountains, where movement may be more
restricted. One method able to accommodate spatial
variation in autocorrelation is geographically weighted
regression (Fotheringham et al. 2002), a method not
considered here because of its limited use for hypothesis
testing (coefficient estimates depend on spatial position)
and because it was not designed to remove spatial
autocorrelation (see e.g. Kupfer and Farris 2007, for a
GWR correlogram).
Another assumption is that of isotropic spatial
autocorrelation. This means that the process causing
the spatial autocorrelation acts in the same way in all
directions. Environmental factors that may cause
anisotropic spatial autocorrelation are wind (giving a
wind-dispersed organism a preferential direction), water
currents (e.g. carrying plankton), or directionality in
soil transport (carrying seeds) from mountains to plains.
He et al. (2003) as well as Worm et al. (2005) provide
examples of analyses accounting for anisotropy in
ecological data, and several of the methods described
below can be adapted for such circumstances.
Description of spatial statistical modelling methods
The methods we describe in the following fall broadly
into three groups. 1) Autocovariate regression and
spatial eigenvector mapping seek to capture the spatial
configuration in additional covariates, which are then
added into a generalised linear model (GLM). 2)
Generalised least squares (GLS) methods fit a
variance-covariance matrix based on the non-independence
of spatial observations. Simultaneous autoregressive
models (SAR) and conditional autoregressive models
(CAR) do the same but in different ways to GLS, and
the generalised linear mixed models (GLMM) we
employ for non-normal data are a generalisation of
GLS. 3) Generalised estimating equations (GEE) split
the data into smaller clusters before also modelling the
variance-covariance relationship. For comparison, the
following non-spatial models were also employed:
simple GLM and trend-surface generalised additive
models (GAM: Hastie and Tibshirani 1990, Wood
2006), in which geographical location was fitted using
splines as a trend-surface (as a two-dimensional spline
on geographical coordinates). Trend surface GAM does
not address the problem of spatial autocorrelation, but
merely accounts for trends in the data across larger
geographical distances (Cressie 1993). A promising tool
which became available only recently is the use of
wavelets to remove spatial autocorrelation (Carl and
Kühn 2007b). However, the method was published too
recently to be included here and hence awaits further
testing.
We also did not include Bayesian spatial models in
this review. Several recent publications have employed
this method and provide a good coverage of its
implementation (Osborne et al. 2001, Hooten et al.
2003, Thogmartin et al. 2004, Gelfand et al. 2005,
Kühn et al. 2006, Latimer et al. 2006). The Bayesian
approach to spatial models used in these studies is based
either on a CAR or an autologistic implementation
similar to the one we used as a frequentist method. The
Bayesian framework allows for a more flexible
incorporation of other complications (observer bias, missing
data, different error distributions) but is much more
computer-intensive than any of the methods presented
here.
Beyond the methods mentioned above, there are
also those which correct test statistics for spatial
autocorrelation. These include Dutilleul’s modified t-test
(Dutilleul 1993) or the CRH-correction for
correlations (Clifford et al. 1989), randomisation tests such as
partial Mantel tests (Legendre and Legendre 1998), or
strategies employed by Lennon (2000), Liebhold and
Gurevitch (2002) and Segurado et al. (2006) which are
all useful as a robust assessment of correlation between
environmental and response variables. As these methods
do not allow a correction of the parameter estimates,
however, they are not considered further in this study.
In the following sections we present a detailed
description of all methods employed here.

1. Autocovariate models
Autocovariate models address spatial autocorrelation by
estimating how much the response variable at any one
site reflects response values at surrounding sites. This is
achieved through a simple extension of generalised
linear models by adding a distance-weighted function of
neighbouring response values to the model’s
explanatory variables. This extra parameter is known as the
autocovariate. The autocovariate is intended to capture
spatial autocorrelation originating from endogenous
processes such as conspecific attraction, limited
dispersal, contagious population growth, and movement
of censused individuals between sampling sites (Smith
1994, Keitt et al. 2002, Yamaguchi et al. 2003).
Adding the autocovariate transforms the linear
predictor of a generalised linear model from its usual
form, $y = X\beta + \varepsilon$, to $y = X\beta + \rho A + \varepsilon$,
where $\beta$ is a vector of coefficients for intercept and
explanatory variables $X$; and $\rho$ is the coefficient of the
autocovariate $A$.
$A$ at any site $i$ may be calculated as:
$$A_i = \sum_{j \in k_i} w_{ij} y_j \quad \text{(the weighted sum) or}$$
$$A_i = \frac{\sum_{j \in k_i} w_{ij} y_j}{\sum_{j \in k_i} w_{ij}} \quad \text{(the weighted average);}$$
where $y_j$ is the response value of $y$ at site $j$ among site $i$’s
set of $k_i$ neighbours; and $w_{ij}$ is the weight given to site
$j$’s influence over site $i$ (Augustin et al. 1996, Gumpertz
et al. 1997). Usually, weight functions are related to
geographical distance between data points (Augustin
et al. 1996, Araújo and Williams 2000, Osborne et al.
2001, Brownstein et al. 2003) or environmental
distance (Augustin et al. 1998, Ferrier et al. 2002).
The weighting scheme and neighbourhood size (k) are
often chosen arbitrarily, but may be optimised (by trial
and error) to best capture spatial autocorrelation
(Augustin et al. 1996). Alternatively, if the cause of
spatial autocorrelation is known (or at least suspected),
the choice of neighbourhood configuration may be
informed by biological parameters, such as the species’
dispersal capacity (Knapp et al. 2003).
Autocovariate models can be applied to binomial
data (‘‘autologistic regression’’, Smith 1994, Augustin
et al. 1996, Klute et al. 2002, Knapp et al. 2003), as
well as normally and Poisson-distributed data (Luoto
et al. 2001, Kaboli et al. 2006).
Where spatial autocorrelation is thought to be
anisotropic (e.g. because seed dispersal follows
prevailing winds or downstream run-off), multiple
autocovariates can be used to capture spatial autocorrelation in
different geographic directions (He et al. 2003).
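
The weighted-sum and weighted-average forms of the autocovariate described in this section can be sketched in a few lines. This is only an illustration in plain Python under stated assumptions: the helper name `autocovariate`, the inverse-distance weights ($w_{ij} = 1/d_{ij}$), the neighbourhood radius and the toy presence/absence grid are all hypothetical choices, not the configuration used in the paper.

```python
from math import hypot


def autocovariate(y, coords, radius, weighted_average=True):
    """Autocovariate A_i from all neighbours j within `radius` of site i.

    Weights are inverse distances, w_ij = 1 / d_ij (an assumption made
    for this sketch; other weight functions are equally possible).
    """
    result = []
    for i, (xi, yi) in enumerate(coords):
        w_sum = 0.0    # sum of w_ij over the neighbourhood
        wy_sum = 0.0   # sum of w_ij * y_j over the neighbourhood
        for j, (xj, yj) in enumerate(coords):
            if i == j:
                continue
            d = hypot(xi - xj, yi - yj)
            if d <= radius:
                w = 1.0 / d
                w_sum += w
                wy_sum += w * y[j]
        if weighted_average:
            # Sites without neighbours get 0 by convention in this sketch.
            result.append(wy_sum / w_sum if w_sum > 0 else 0.0)
        else:
            result.append(wy_sum)
    return result


# Hypothetical presence/absence data on a 3 x 3 grid (row by row).
coords = [(x, y) for y in range(3) for x in range(3)]
presence = [1, 1, 0,
            1, 0, 0,
            0, 0, 0]

a_avg = autocovariate(presence, coords, radius=1.0)
a_sum = autocovariate(presence, coords, radius=1.0, weighted_average=False)
```

The resulting vector would then enter the model as one additional explanatory variable alongside the environmental predictors.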
2. Spatial eigenvector mapping (SEVM)
Spatial eigenvector mapping is based on the idea that
the spatial arrangement of data points can be translated
into explanatory variables, which capture spatial effects
at different spatial resolutions. During the analysis,
those eigenvectors that reduce spatial autocorrelation in
the residuals best are chosen explicitly as spatial
predictors. Since each eigenvector represents a
particular spatial patterning, SAC is effectively allowed to vary
in space, relaxing the assumption of both spatial
isotropy and stationarity. Plotting these eigenvectors
reveals the spatial patterning of the spatial
autocorrelation (see Diniz-Filho and Bini 2005, for an example).
This method could thus be very useful for data with
SAC stemming from larger scale observation bias.
The method is based on the eigenfunction
decomposition of spatial connectivity matrices, a relatively
new and still unfamiliar method for describing spatial
patterns in complex data (Griffith 2000b, Borcard and
Legendre 2002, Griffith and Peres-Neto 2006, Dray
et al. 2006). A very similar approach, called eigenvector
filtering, was presented by Diniz-Filho and Bini (2005)
based on their method to account for phylogenetic
non-independence in biological data (Diniz-Filho et al.
1998). Eigenvectors from these matrices represent the
decompositions of Moran’s I statistic into all mutually
orthogonal maps that can be generated from a given
connectivity matrix (Griffith and Peres-Neto 2006).
Either binary or distance-based connectivity matrices
can be decomposed, offering a great deal of flexibility
regarding topology and transformations. Given the
non-Euclidean nature of the spatial connectivity
matrices (i.e. not all sampling units are connected), both
positive and negative eigenvalues are produced. The
non-Euclidean part is introduced by the fact that only
certain connections among sampling units, and not all,
are considered. Eigenvectors with positive eigenvalues
represent positive autocorrelation, whereas eigenvectors
with negative eigenvalues represent negative
autocorrelation. For the sake of presenting a general method that
will work for either binary or distance matrices, we used
a distance-based eigenvector procedure (after Dray
et al. 2006) which can be summarized as follows:
1) compute a pairwise Euclidean (geographic) distance
matrix among sampling units: $D = [d_{ij}]$; 2) choose a
threshold value $t$ and construct a connectivity matrix
using the following rule:
$$W = [w_{ij}] = \begin{cases} 0 & \text{if } i = j \\ 0 & \text{if } d_{ij} > t \\ 1 - (d_{ij}/4t)^2 & \text{if } d_{ij} \le t \end{cases}$$
where $t$ is chosen as the maximum distance that
maintains connections among all sampling units being
connected using a minimum spanning tree algorithm
(Legendre and Legendre 1998). Because the example
data we use represent a regular grid (see below), $t = 1$ and
thus $w_{ij}$ is either 0 or $1 - (1/4)^2 = 0.9375$ in our analysis.
Note that we can change 0.9375 to 1 without affecting
eigenvector extraction. This would make the matrix fully
compatible with a binary matrix which is the case for a
regular grid. 3) Compute the eigenvectors of the centred
similarity matrix: $(I - \mathbf{1}\mathbf{1}^T/n)\,W\,(I - \mathbf{1}\mathbf{1}^T/n)$,
where $I$ is the identity matrix and $\mathbf{1}$ is a vector of ones.
Due to numerical precision regarding
the eigenvector extraction of large matrices (Bai et al.
1996) the method is limited to ca 7000 observations
depending on platform and software (but see Griffith
2000a, for solutions based on large binary connectivity
matrices). 4) Select eigenvectors to be included as spatial
predictors in a linear or generalised linear model. Here, a
model selection procedure that minimizes the amount of
spatial autocorrelation in residuals was used (see Griffith
and Peres-Neto 2006 and Appendix for computational
details). In this approach, eigenvectors are added to a
model until the spatial autocorrelation in the residuals,
measured by Moran’s I, is non-significant. Our selection
algorithm considered global Moran’s I (i.e.
autocorrelation across all residuals), but could be easily amended to
target spatial autocorrelation within certain distance
classes. The significance of Moran’s I was tested using a
permutation test as implemented in Lichstein et al.
(2002). This potentially renders the selection procedure
computationally intensive for large data sets (200 or
more observations), because a permutation test must be
performed for each new eigenvector entered into the
model. Once the location-dependent, but
data-independent eigenvectors are selected, they are incorporated
into the ordinary regression model (i.e. linear or
generalized linear model) as covariates. Since their
relevance has been assessed during the filtering process
model simplification is not indicated (although some
eigenvectors will not be significant).
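
Steps 1) to 3) of the procedure above can be illustrated with a short sketch that builds the truncated connectivity matrix for a small regular grid and double-centres it. It is plain Python for illustration only; the function names `connectivity_matrix` and `double_centre` and the 3 x 3 grid are assumptions. The eigen-decomposition itself is deliberately left out, since in practice it would be delegated to a linear algebra library.

```python
from math import hypot


def connectivity_matrix(coords, t):
    """W = [w_ij]: 0 on the diagonal, 0 beyond the threshold t,
    and 1 - (d_ij / (4 t))^2 for pairs with d_ij <= t."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d <= t:
                w[i][j] = 1.0 - (d / (4.0 * t)) ** 2
    return w


def double_centre(w):
    """Return (I - 11'/n) W (I - 11'/n), computed element-wise as
    w_ij - row mean_i - column mean_j + grand mean."""
    n = len(w)
    row_means = [sum(row) / n for row in w]
    col_means = [sum(w[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[w[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


# Hypothetical 3 x 3 regular grid with unit spacing, so t = 1.
coords = [(x, y) for y in range(3) for x in range(3)]
W = connectivity_matrix(coords, t=1.0)
C = double_centre(W)
```

On this grid every connected pair receives the weight 0.9375 and diagonal neighbours remain unconnected; the eigenvectors of `C` would be the candidate spatial predictors for step 4).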
3. Spatial models based on generalised least squares regression
In linear models of normally distributed data, spatial
autocorrelation can be addressed by the related
approaches of generalised least squares (GLS) and
autoregressive models (conditional autoregressive models
(CAR) and simultaneous autoregressive models (SAR)).
GLS directly models the spatial covariance structure in
the variance-covariance matrix $\Sigma$, using parametric
functions. CAR and SAR, on the other hand, model
the error generating process and operate with weight
matrices that specify the strength of interaction between
neighbouring sites.
Although models based on generalised least squares
have been known in the statistical literature for
decades (Besag 1974, Cliff and Ord 1981), their
application in ecology has been very limited so far
(Jetz and Rahbek 2002, Keitt et al. 2002, Lichstein
et al. 2002, Dark 2004, Tognelli and Kelt 2004). This
is most likely due to the limited availability of
appropriate software that easily facilitates the
application of these kinds of models (Lichstein et al. 2002).
With the recent development of programs that fit a
variety of GLS (Littell et al. 1996, Pinheiro and Bates
2000, Venables and Ripley 2002) and autoregressive
models (Kaluzny et al. 1998, Bivand 2005, Rangel
et al. 2006), however, the range of available tools for
ecologists to analyse spatially autocorrelated normal
data has been greatly expanded.
Generalised least squares (GLS)
As before, the underlying model is $Y = X\beta + \varepsilon$, with the
error vector $\varepsilon \sim N(0, \Sigma)$. $\Sigma$ is called the
variance-covariance matrix. Instead of fitting individual values
for the variance-covariance matrix $\Sigma$, a parametric
correlation function is assumed. Correlation functions
are isotropic, i.e. they depend only on the distance $s_{ij}$
between locations $i$ and $j$, but not on the direction.
Three frequently used examples of correlation functions
$C(s)$ also used in this study are exponential
($C(s) = \sigma^2 \exp(-r/s)$), Gaussian
($C(s) = \sigma^2 \exp(-(r/s)^2)$) and spherical
($C(s) = \sigma^2 \left(1 - \frac{2}{\pi}\left(\frac{r}{s}\sqrt{1 - r^2/s^2} + \sin^{-1}(r/s)\right)\right)$),
where $r$ is a scaling factor that is estimated from the
data.
Some restrictions are placed upon the resulting
variance-covariance matrix $\Sigma$: a) it must be symmetric,
and b) it must be positive definite. This guarantees that
the matrix is invertible, which is necessary for
the fitting process (see below). The choice of correlation
function is commonly based on a visual investigation of
the semi-variogram or correlogram of the residuals.
Parameter estimation is a two-step process. First, the
parameters of the correlation function (i.e. scaling
factor r in the examples used here) are found by
optimizing the so-called profiled log-likelihood, which
is the log-likelihood where the unknown values for $\beta$
and $\sigma^2$ are replaced by their algebraic maximum
likelihood estimators. Secondly, given the parameterization
of the variance-covariance matrix, the values for
$\beta$ and $\sigma^2$ are found by solving a weighted ordinary least
squares problem:
$$(\Sigma^{-1/2})^T y = (\Sigma^{-1/2})^T X\beta + (\Sigma^{-1/2})^T \varepsilon$$
where the error term $(\Sigma^{-1/2})^T \varepsilon$ is now normally
distributed with mean 0 and variance $\sigma^2 I$.
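The two-step procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the paper. It assumes an exponential correlation function with distance divided by the scaling factor r, uses the inverse Cholesky factor as one possible choice of inverse square root for whitening, and replaces a numerical optimiser with a simple grid search over r. All function names and the simulated data are hypothetical.

```python
import numpy as np

def exp_corr(S, r):
    """Exponential correlation matrix (assumes distance divided by r)."""
    return np.exp(-S / r)

def whitened_ols(y, X, R):
    """Whiten y and X with the Cholesky factor of R, then solve OLS."""
    L = np.linalg.cholesky(R)            # R = L L^T
    y_w = np.linalg.solve(L, y)          # L^{-1} y
    X_w = np.linalg.solve(L, X)          # L^{-1} X
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    resid = y_w - X_w @ beta
    sigma2 = resid @ resid / len(y)      # maximum likelihood estimator
    logdet = 2.0 * np.log(np.diag(L)).sum()  # log|R|
    return beta, sigma2, logdet

def profile_loglik(r, y, X, S):
    """Log-likelihood with beta and sigma^2 replaced by their ML estimators."""
    n = len(y)
    _, sigma2, logdet = whitened_ols(y, X, exp_corr(S, r))
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)

def fit_gls(y, X, S, r_grid):
    # Step 1: choose r by maximising the profiled log-likelihood on a grid.
    ll = [profile_loglik(r, y, X, S) for r in r_grid]
    r_hat = r_grid[int(np.argmax(ll))]
    # Step 2: given r, estimate beta and sigma^2 from the whitened problem.
    beta, sigma2, _ = whitened_ols(y, X, exp_corr(S, r_hat))
    return r_hat, beta, sigma2

# Simulated example with spatially correlated normal errors.
rng = np.random.default_rng(0)
n = 150
coords = rng.uniform(0, 10, size=(n, 2))
S = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
L_true = np.linalg.cholesky(exp_corr(S, 2.0))
y = X @ beta_true + L_true @ rng.normal(size=n)

r_hat, beta_hat, sigma2_hat = fit_gls(y, X, S, np.linspace(0.2, 6.0, 30))
```

A grid search keeps the sketch short; in practice a bounded one-dimensional optimiser could replace it without changing the structure of the two steps.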
Autoregressive models
Both CAR and SAR incorporate spatial autocorrelation
using neighbourhood matrices which specify the

Citations
Journal ArticleDOI

6,278 citations

Journal ArticleDOI
TL;DR: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates, and are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time.
Abstract: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates. They are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time. SDMs are now widely used across terrestrial, freshwater, and marine realms. Differences in methods between disciplines reflect both differences in species mobility and in “established use.” Model realism and robustness is influenced by selection of relevant predictors and modeling method, consideration of scale, how the interplay between environmental and geographic factors is handled, and the extent of extrapolation. Current linkages between SDM practice and ecological theory are often weak, hindering progress. Remaining challenges include: improvement of methods for modeling presence-only data and for model selection and evaluation; accounting for biotic interactions; and assessing model uncertainty.

5,076 citations


Cites background or methods from "Methods to account for spatial auto..."

  • ...Erroneous use of geographic terms to correct for either missing environmental predictors or wrongly specified models is likely to result in poor predictive ability, especially when extrapolating to new regions or times (Dormann et al. 2007, and see below)....

    [...]

  • ...Such data have prompted use of mixed models or other methods for dealing with pseudoreplication and spatial autocorrelation (Dormann et al. 2007, and Supplemental Literature Cited)....

    [...]

  • ...…residual geographic patterning generally indicates that either key environmental predictors are missing (Leathwick & Whitehead 2001), the model is mis-specified (e.g., only linear terms where nonlinear are required), or geographic factors are influential (Dormann et al. 2007, Miller et al. 2007)....

    [...]

  • ...…variables to an environmental model to test for residual spatial structure, or use of LISA (local indicator of spatial autocorrelation) to estimate the contribution of each sampling unit to the overall measure of spatial autocorrelation (Dormann et al. 2007, Miller et al. 2007, Rangel et al. 2006)....

    [...]

Journal ArticleDOI
TL;DR: It is shown that biotic interactions have clearly left their mark on species distributions and realised assemblages of species across all spatial extents, and accelerated collection of spatially and temporally explicit species data is called for.
Abstract: Predicting which species will occur together in the future, and where, remains one of the greatest challenges in ecology, and requires a sound understanding of how the abiotic and biotic environments interact with dispersal processes and history across scales. Biotic interactions and their dynamics influence species' relationships to climate, and this also has important implications for predicting future distributions of species. It is already well accepted that biotic interactions shape species' spatial distributions at local spatial extents, but the role of these interactions beyond local extents (e.g. 10 km2 to global extents) are usually dismissed as unimportant. In this review we consolidate evidence for how biotic interactions shape species distributions beyond local extents and review methods for integrating biotic interactions into species distribution modelling tools. Drawing upon evidence from contemporary and palaeoecological studies of individual species ranges, functional groups, and species richness patterns, we show that biotic interactions have clearly left their mark on species distributions and realised assemblages of species across all spatial extents. We demonstrate this with examples from within and across trophic groups. A range of species distribution modelling tools is available to quantify species environmental relationships and predict species occurrence, such as: (i) integrating pairwise dependencies, (ii) using integrative predictors, and (iii) hybridising species distribution models (SDMs) with dynamic models. These methods have typically only been applied to interacting pairs of species at a single time, require a priori ecological knowledge about which species interact, and due to data paucity must assume that biotic interactions are constant in space and time. To better inform the future development of these models across spatial scales, we call for accelerated collection of spatially and temporally explicit species data. 
Ideally, these data should be sampled to reflect variation in the underlying environment across large spatial extents, and at fine spatial resolution. Simplified ecosystems where there are relatively few interacting species and sometimes a wealth of existing ecosystem monitoring data (e.g. arctic, alpine or island habitats) offer settings where the development of modelling tools that account for biotic interactions may be less difficult than elsewhere.

1,297 citations


Cites background from "Methods to account for spatial auto..."

  • ...It is however, also possible to find evidence supporting interactions among species by considering the spatial structure of the residuals in a single species’ model, although this by itself is not enough to indicate the presence of biotic interactions (Dormann et al., 2007)....

    [...]

Journal ArticleDOI
TL;DR: SAM (Spatial Analysis in Macroecology) is a freeware application that offers a comprehensive array of spatial statistical methods, focused primarily on surface pattern spatial analysis.
Abstract: SAM (Spatial Analysis in Macroecology) is a freeware application that offers a comprehensive array of spatial statistical methods, focused primarily on surface pattern spatial analysis. SAM is a compact but powerful stand-alone software, with a user-friendly, menu-driven graphical interface. The methods available in SAM are the most commonly used in macroecology and geographical ecology, and range from simple tools for exploratory graphical analysis (e.g. mapping and graphing) and descriptive statistics of spatial patterns (e.g. autocorrelation metrics), to advanced spatial regression models (e.g. autoregression and eigenvector filtering). The software, along with the user manual, can be downloaded online at the SAM website: (permanent URL at ).

1,123 citations

References
Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).

23,215 citations

Book
28 Jul 2013
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for ``wide'' data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. 
Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

BookDOI
01 Dec 2010
TL;DR: A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods.
Abstract: A guide to using S environments to perform statistical analyses, providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets.

18,346 citations

Journal ArticleDOI
TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

17,111 citations


"Methods to account for spatial auto..." refers background in this paper

  • ...Liang and Zeger (1986) developed the generalised estimating equation (GEE) approach which is an extension of generalised linear models (GLMs)....

    [...]


Book
29 Mar 2013
TL;DR: Covers theory, computational methods and fitting of linear mixed-effects (LME) and nonlinear mixed-effects (NLME) models, including the structure of grouped data and extensions of the basic LME model.
Abstract: Linear Mixed-Effects * Theory and Computational Methods for LME Models * Structure of Grouped Data * Fitting LME Models * Extending the Basic LME Model * Nonlinear Mixed-Effects * Theory and Computational Methods for NLME Models * Fitting NLME Models

10,715 citations

Frequently Asked Questions (12)
Q1. What contributions have the authors mentioned in the paper "Methods to account for spatial autocorrelation in the analysis of species distributional data: a review" ?

In this paper, Dormann et al. present a set of statistical methods for species distribution analysis. 

Typical examples of ecological data with normally distributed errors include abundance, species richness, or functional diversity per unit area, crop yield and catch per unit effort. 

i.e. the prediction of values within the parameter and spatial range, can be achieved by several of the presented methods. 

Either binary or distance-based connectivity matrices can be decomposed, offering a great deal of flexibility regarding topology and transformations. 

Bayesian methods are also a generally more suitable tool for inference in data sets with many missing values, or when accounting for detection probabilities (Gelfand et al. 2005, Kühn et al. 2006). 

Due to numerical precision regarding the eigenvector extraction of large matrices (Bai et al. 1996) the method is limited to ca 7000 observations depending on platform and software (but see Griffith 2000a, for solutions based on large binary connectivity matrices). 

A weight matrix W was used to simulate the spatially correlated errors $\varepsilon_i$ using weights according to the distance between data points. 

CAR and SAR, on the other hand, model the error generating process and operate with weight matrices that specify the strength of interaction between neighbouring sites. 

Bayesian methods for the analyses of species distribution data are more flexible; they can be more easily extended to include more complex structures (Latimer et al. 2006). 

While Lennon (2000) and others (Tognelli and Kelt 2004, Jetz et al. 2005, Dormann 2007b, Kühn 2007) argue that spatial autocorrelation in species distribution models may well bias coefficient estimation, Diniz-Filho et al. (2003) and Hawkins et al. (2007) found non-spatial models to be robust and unbiased for several data sets. 

Some restrictions are placed upon the resulting variance-covariance matrix $\Sigma$: a) it must be symmetric, and b) it must be positive definite. 

One might therefore argue that, while taking the autocorrelation structure as constant adds one more assumption, the use of spatial parameters at least helps to derive better models.