
Validation of species–climate impact models under
climate change
MIGUEL B. ARAÚJO*†, RICHARD G. PEARSON*‡, WILFRIED THUILLER§ and MARKUS ERHARD¶
*Biodiversity Research Group, School of Geography and Environment, University of Oxford, Mansfield Road, Oxford OX1 3TD, UK, †Biogeography and Conservation Laboratory, Natural History Museum, Cromwell Road, London SW7 5BD, UK, ‡Macroecology and Conservation Unit, University of Évora, Estrada dos Leões, 7000-730 Évora, Portugal, §Climate Change Research Group, Kirstenbosch Research Centre, South African National Biodiversity Institute, Private Bag x7, Claremont 7735, Cape Town, South Africa, ¶Institute for Meteorology and Climate Research, Forschungszentrum Karlsruhe, Postfach 3640, 76021 Karlsruhe, Germany
Abstract
Increasing concern over the implications of climate change for biodiversity has led to the
use of species–climate envelope models to project species extinction risk under climate-
change scenarios. However, recent studies have demonstrated significant variability in
model predictions and there remains a pressing need to validate models and to reduce
uncertainties. Model validation is problematic as predictions are made for events that
have not yet occurred. Resubstituition and data partitioning of present-day data sets are,
therefore, commonly used to test the predictive performance of models. However, these
approaches suffer from the problems of spatial and temporal autocorrelation in the
calibration and validation sets. Using observed distribution shifts among 116 British
breeding-bird species over the past 20 years, we are able to provide a first
independent validation of four envelope modelling techniques under climate change.
Results showed good to fair predictive performance on independent validation,
although rules used to assess model performance are difficult to interpret in a
decision-planning context. We also showed that measures of performance on
nonindependent data provided optimistic estimates of models’ predictive ability on
independent data. Artificial neural networks and generalized additive models provided
generally more accurate predictions of species range shifts than generalized linear
models or classification tree analysis. Data for independent model validation and
replication of this study are rare and we argue that perfect validation may not in fact be
conceptually possible. We also note that the usefulness of models is contingent on both the
questions being asked and the techniques used. Implementations of species–climate
envelope models for testing hypotheses and predicting future events may prove wrong,
while being potentially useful if put into appropriate context.
Keywords: bioclimatic-envelope models, breeding birds, Britain, climate change, model accuracy,
uncertainty, validation
Received 3 November 2004; revised version received 24 January 2005; accepted 8 March 2005
Introduction
Attempts to predict climate-change impacts on biodi-
versity have often relied on the species–climate
‘envelope’ modelling approach (also known as ecolo-
gical niche models), whereby present day distributions
of species are combined with environmental variables
to project distributions of species under future climates
(for review, see Pearson & Dawson, 2003). In spite of the
Correspondence: Miguel B. Araújo, Departamento de Biodiversidad y Biologia Evolutiva, Museo Nacional de Ciencias Naturales, CSIC, C/Jose Gutierrez Abascal, 2, 28006 Madrid, Spain, tel. +34 91411328, fax +34 915645078, e-mail: maraujo@mncn.csic.es
Global Change Biology (2005) 11, 1504–1513, doi: 10.1111/j.1365-2486.2005.001000.x
1504 © 2005 Blackwell Publishing Ltd

inherent limitations of correlative models (for review,
see Guisan & Zimmermann, 2000), projections arising
from species–climate envelope models have been used
to support estimates of species’ extinction risk under
climate change for a variety of taxa and parts of the
world (e.g. Bakkenes et al., 2002; Erasmus et al., 2002;
Midgley et al., 2002; Peterson et al., 2002; Thomas et al.,
2004a). The impact of these estimates within political
and public debate is potentially high, yet there is a great deal of scope for misrepresenting the science behind
such studies (Ladle et al., 2004). Recent studies have
reported that projections arising from species–climate
models may be highly sensitive to the assumptions,
algorithms and parameterizations of different methods
(e.g. Thuiller, 2004; Thuiller et al., 2004a; Pearson et al.,
2005). These studies have raised a number of metho-
dological issues that lead to a degree of uncertainty
which has been underestimated, or simply overlooked,
in previous assessments of climate impacts on biodi-
versity. We argue that when results of a particular
analysis contribute to the discussion of the weight of
evidence required to support important societal deci-
sions, the demand that models’ predictive accuracy be
assessed is eminently reasonable.
Nevertheless, validation (also referred to as evalua-
tion) of species–climate envelope models under climate
change remains poorly explored. The reason is that the events being predicted have either been poorly documented or have not yet occurred. Consequently,
assessments of accuracy are usually limited to a process
of ‘resubstitution’, in which the data used to calibrate
(or train) models are also used to validate (test) them
(Fig. 1a; for review, see Table 1). A problem with the
resubstituition approach is that models may overfit to
the calibration data, leaving users unable to judge
whether high accuracy on nonindependent data reflects
good predictive accuracy on independent data sets.
Some authors also caution against possible bias in
estimates of model-prediction errors as the models are
optimized to deal with the ‘noise’ in the data and might
consequently lose generality outside the original data
(for discussion, see Olden & Jackson, 2000; Olden et al.,
2002). To address these problems, a growing number of
studies have used data partitioning methods for the
allocation of cases to calibration and validation data
sets. The most familiar technique is one-time data-
splitting, whereby data are split into calibration and
validation samples by random process (Fig. 1b, Table 1).
There are alternative techniques including grouped
cross-validation (also known as k-fold partitioning, hold-out, or external method), bootstrapping, and jackknifing (also known as leave-one-out) (for discussion,
see Harrell, 2001), but they all share the assumption
that randomly selected samples from original data
constitute independent observations, hence suitable for
model validation. Although these validation strategies
have generally been accepted to provide more robust
measures of predictive success than resubstitution (e.g.
Fielding & Bell, 1997), they may not avoid two of the
most important pitfalls of correlative models. The first
is that of spatial autocorrelation in the distribution of
species and environmental variables (e.g. Hampe,
2004). This is a problem because modelling techniques
assume that modelled events are independent, which is
not true in the case of spatially autocorrelated data. This
problem is not overridden by resampling the original
data randomly, nor is it by carrying additional field
sampling for testing models within the modelled
region, because any of these validation strategies would
use test data that is spatially autocorrelated with data to
calibrate models. The second is that of temporal
correlation in biological and environmental phenom-
ena. This is another form of autocorrelation in the data,
and implies that observations in time series are
[Figure: calibration, evaluation and projection flow of the environmental envelope, projected to the same region, a new region, a new resolution, or a new time.]
Fig. 1 Species–climate envelope modelling framework under three calibration and validation strategies: (a) resubstitution; (b) data splitting; and (c) independent validation.

nonrandom because of lack of independence between
data points that are adjacent in time. Consequently,
projections of observed current distributions closer in
time are likely to be more similar than projections made
further apart. The interplay of spatial and temporal autocorrelation makes it conceptually difficult to discard the possibility that models’ goodness-of-fit to the data represents an over-optimistic estimate of their predictive
ability outside the initial spatial and temporal condi-
tions defining the training set (e.g. Beutel et al., 1999).
Thus, the number of degrees of freedom is over-
estimated, causing unrealistically small estimates of
the standard errors of the model outputs. In addition,
as temporal autocorrelation can introduce slow changes
(i.e. low-frequency variability) in the time series, it can
affect the estimate of the degree of estimated changes.
It may be argued that the predictive accuracy of
species–climate envelope models can only be fully
tested by means of validation studies using direct
comparison of model predictions with independent
empirical observations (Fig. 1c). Attempts to perform
such tests are relatively rare. A limited number of
studies have attempted independent validation using
known distributions in different regions (Beerling et al.,
1995; Fielding & Haworth, 1995; Peterson, 2003a), data
at different resolutions (Pearson et al., 2004; Araújo
et al., 2005a), field observations in previously un-
sampled regions where species’ occurrences are pre-
dicted (Raxworthy et al., 2003), fossil records of
mammal distributions under Pleistocene climates
(Martinez-Meyer et al., 2004), and visual comparison
between simulated and observed range changes for
butterflies in the UK over the 20th century (Hill et al.,
1999). However, statistical validation using indepen-
dent data describing range shifts under recent climate
change has not previously been undertaken.
As models projecting species’ distributional shifts
under future climate change are unlikely to be
validated in most circumstances because of data
limitations, it is important to improve understanding
Table 1 Four approaches used to validate species–climate envelope models under climate change

Reference (each marked + under one of: Resubstitution, Bootstrap, Data-splitting, Independent validation)
Araújo et al. (2004) +
Bakkenes et al. (2002) +
Beaumont & Hughes (2002) *
Berry et al. (2002) +
Burns et al. (2003) +
Erasmus et al. (2002) +
Guisan & Theurillat (2000) +
Huntley (1995) +
Huntley et al. (1995) +
Huntley et al. (2004) +
Iverson & Prasad (1998) +
Iverson et al. (1999) +
Martinez-Meyer et al. (2004) +
Midgley et al. (2002) +
Midgley et al. (2003) +
Miles et al. (2004) +
Pearson et al. (2002) +
Pearson et al. (2005) +
Peterson (2003b) +
Peterson et al. (2002) +
Peterson et al. (2001) +
Saetersdal et al. (1998) *
Skov & Svenning (2004) +
Sykes et al. (1996) +
Teixeira & Arntzen (2002) +
Thuiller (2003) +
Thuiller (2004) +
Thuiller et al. (2004a) +
Thuiller et al. (2004b) +

Few studies (*) have not attempted to validate the predictive accuracy of their models.

of the underlying characteristics of data and methods
that contribute uncertainty to predictions. Because most
model evaluations assess accuracy against the calibration, or nonindependent validation data (also referred to as
verification), it is important to investigate the degree to
which these measures correlate with proper validations
on independent data sets. These questions can be
addressed only when independent data adequate for
model validation are available and this is a rare
circumstance for climate-change impact assessments.
We make a first attempt to address these problems
using British-breeding bird distributional records in
two periods between the 1960s and the 1990s. We
assume these are independent events, although we
acknowledge that some degree of nonindependence
may arise given that data were recorded in the same
region and in two periods of time only ~20 years apart.
However, they do constitute a rare record of observed
range shifts, and one of the few examples of species
range-shift data that allow direct comparison between
observations in each recording period, without the need
to correct for sampling bias. Furthermore, they also
have the advantage of including species reported to
shift northward in apparent response to recent regional
climate changes (Thomas & Lennon, 1999). The
unprecedented quality of these data allows researchers
to explore issues of bioclimate envelope model valida-
tion that have not yet been addressed in the literature.
In particular, we ask: (1) how well do models perform
on an independent validation dataset? (2) does valida-
tion using nonindependent distribution data provide a
good surrogate for accuracy on independent data? (3)
do particular modelling techniques perform consis-
tently better than others?
Data and methods
Species data
We used distributional records in Britain for 116 native
breeding-bird species recorded during the periods
1968–1972 (t1) and 1988–1991 (t2) (Sharrock, 1976; Gibbons et al., 1993). Volunteer recorders achieved 100% cover of the 2831 British 10 km squares, with the total number of nonduplicate 10 km square records received for the second period being within 1% of the 217 615 records received for 1968–1972.
This has allowed researchers to make comparisons
between occupancy of squares in each recording
period, without the need to correct for sampling bias
(e.g. Thomas & Lennon, 1999; Thomas et al., 2004b). Our
analyses of bird distributions did not include marine,
waterfowl, and aquatic shorebirds. Species with fewer than 20 records in the first recording period were also
excluded from analysis to avoid problems related to
modelling data with excessively small sample sizes
(e.g. Stockwell & Peterson, 2002). The minimum
number of records for a species in this period was 25,
the median number was 1560, and the maximum was
2405.
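As a minimal illustration of this filtering step (the species names and counts below are invented; the study used the full atlas data):

```python
# Sketch of the record-count filter described above. `records_t1` maps each
# species to its number of occupied 10 km squares in 1968-1972 (hypothetical
# values chosen to echo the minimum and median reported in the text).
records_t1 = {
    "species A": 25,     # the study's reported minimum retained count
    "species B": 1560,   # the study's reported median count
    "species C": 12,     # would be excluded (< 20 records)
}

# Exclude species with fewer than 20 records to avoid modelling with
# excessively small sample sizes (Stockwell & Peterson, 2002).
modelled_species = sorted(sp for sp, n in records_t1.items() if n >= 20)
```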
Climate data
A set of aggregated climate parameters were derived
from an updated version of the CRU (Climate Research
Unit at the University of East Anglia, UK) monthly
climate data (New et al., 2000). The updated data set
provides monthly values for the years 1901–2000 at 10′ × 10′ spatial resolution (Mitchell et al., 2004).
Average monthly temperature, precipitation and cloud cover of 1416 grid cells covering the area of the UK (7°30′E–11°40′W and 50°N–61°N) were used to calculate
mean values of six different climate parameters in two
different time slices (1967–1972, 1987–1991). Variables
include mean annual temperature within time slices (°C), mean temperature of the coldest month (°C), mean temperature of the warmest month (°C), mean annual summed precipitation (mm), mean summed precipitation between July and September (mm), and growing season, defined as the temperature sum of all consecutive days with mean temperature greater than 5°C. The six variables were selected on the basis that
they are known to impose constraints upon species
distributions as a result of widely shared physiological
limitations (Crick, 2004).
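As an illustration of this aggregation step, the six parameters can be sketched as follows (a simplified assumption, not the authors' code: inputs are monthly series already averaged over a time slice, and the growing-season degree sum, defined in the text from daily temperatures, is approximated here by weighting each month by its length):

```python
# Derive the six climate parameters from 12 monthly values per grid cell.
# `monthly_temp`: mean temperatures (deg C), `monthly_prec`: precipitation
# sums (mm), January to December, averaged over a slice such as 1967-1972.

MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def climate_parameters(monthly_temp, monthly_prec):
    return {
        "mean_annual_temp": sum(monthly_temp) / 12.0,
        "temp_coldest_month": min(monthly_temp),
        "temp_warmest_month": max(monthly_temp),
        "annual_precipitation": sum(monthly_prec),
        "summer_precipitation": sum(monthly_prec[6:9]),  # July-September
        # Growing season: temperature sum of days with mean temperature
        # above 5 deg C, approximated here at monthly resolution.
        "growing_season": sum(t * d for t, d in zip(monthly_temp, MONTH_DAYS)
                              if t > 5.0),
    }
```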
Species–climate modelling
Breeding bird species distribution records in Britain
were modelled using S-PLUS-based BIOMOD (Thuiller,
2003). Modelling procedures included (1) generalized
linear models (GLM) with linear, quadratic and poly-
nomial terms (second and third order). A stepwise
procedure using the AIC criterion was used to select the
most significant variables (Akaike, 1974); (2) general-
ized additive models (GAM) with cubic-smooth
splines. The degree of smoothness was bounded to
four for each variable. As for GLM, a stepwise
procedure was used to select the most parsimonious
model; (3) classification tree analysis (CTA) using a 10-
fold cross-validation to select the best trade-off between
the number of leaves of the tree and the explained
deviance; and (4) feed-forward artificial neural net-
works (ANN) with seven hidden units in a single layer
and with weight decay equal to 0.03. Because of the
heuristic nature of ANNs, models were run 10 times and the mean prediction was used. This procedure of
averaging predictions over the collection of networks

is often preferred to using the solution giving the
lowest error (Ripley, 1996).
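The averaging step is easy to reproduce in outline. In the sketch below, `fit_network` is a hypothetical stand-in for one BIOMOD feed-forward network fit (seven hidden units, weight decay 0.03); its seed-dependent perturbation merely mimics the stochastic training that motivates averaging:

```python
import random

def fit_network(X, y, seed):
    """Illustrative stub for a single stochastic ANN fit: returns a predictor
    whose output is perturbed by a seed-dependent bias, standing in for the
    run-to-run variability of real network training."""
    rng = random.Random(seed)
    bias = rng.uniform(-0.05, 0.05)
    return lambda x: min(1.0, max(0.0, sum(x) / len(x) + bias))

def ensemble_prediction(X, y, x_new, n_runs=10):
    """Run the model 10 times and average the predictions, as in the text,
    rather than keeping only the single lowest-error run."""
    networks = [fit_network(X, y, seed) for seed in range(n_runs)]
    return sum(net(x_new) for net in networks) / n_runs
```

Averaging over the collection of runs damps the idiosyncrasies of any single fit, which is why it is often preferred to picking the lowest-error network.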
Two runs were made with each modelling technique. In the first run, models were calibrated on a 70% random sample of the original time t1 data and predictive accuracy was evaluated on the remaining 30% of the data (Fig. 1b). The size of the calibration set was determined by application of a commonly used heuristic for identifying the ratio of training and cross-validation sets in presence and absence models: [1 + (p − 1)^(1/2)]^(−1), where p is the number of predictor (here climate) variables (Fielding & Bell, 1997). In the second run, models were calibrated using 100% of the original time t1 data and evaluated on the original time t2 data (Fig. 1c). In each run, we tested agreement between observed and projected distributions by calculating Cohen’s kappa statistic of similarity (κ) and the area under the curve (AUC) of the receiver operating characteristic (ROC) approach (Fielding & Bell, 1997).
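For the six climate variables used here, the Fielding & Bell heuristic yields a hold-out fraction of roughly 30%, which is where the 70/30 split comes from:

```python
import math

def validation_fraction(p):
    """Fielding & Bell (1997) heuristic: hold out 1 / (1 + sqrt(p - 1))
    of the cases for validation, where p is the number of predictors."""
    return 1.0 / (1.0 + math.sqrt(p - 1))

f = validation_fraction(6)  # six climate predictors, as in this study
# f is about 0.31, i.e. ~30% validation / ~70% calibration
```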
We used the κ approach after maximising the statistic over a range of thresholds above which model outputs are considered to represent species’ presence. We calculated AUC using the nonparametric method based on the derivation of the Wilcoxon statistic (Fielding & Bell, 1997). Values of AUC range from 0.5 for models with no predictive ability to 1.0 for models giving perfect predictions. κ values range from 0.0 (no predictive ability) to 1.0 (perfect predictive ability). There are a number of rules-of-thumb available to help interpret measures of agreement between observed and projected events. For example, when using the κ statistic approach, Landis & Koch (1977) suggest the following ranges of agreement: excellent κ > 0.75; good 0.40 < κ < 0.75; and poor κ < 0.40. When using the ROC procedure, Swets (1988) recommends interpreting range values as: excellent AUC > 0.90; good 0.80 < AUC < 0.90; fair 0.70 < AUC < 0.80; poor 0.60 < AUC < 0.70; fail 0.50 < AUC < 0.60.
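Both statistics can be computed from first principles. A minimal sketch (an illustration, not BIOMOD's implementation), assuming binary presence/absence observations and continuous model outputs:

```python
def auc_wilcoxon(observed, scores):
    """AUC via the Wilcoxon/Mann-Whitney statistic: the probability that a
    randomly chosen presence receives a higher score than a randomly chosen
    absence (ties count one half)."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cohen_kappa(observed, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement
    (assumes both classes are present, so chance agreement < 1)."""
    n = len(observed)
    p_obs = sum(o == p for o, p in zip(observed, predicted)) / n
    p1_obs, p1_pred = sum(observed) / n, sum(predicted) / n
    p_chance = p1_obs * p1_pred + (1 - p1_obs) * (1 - p1_pred)
    return (p_obs - p_chance) / (1 - p_chance)

def max_kappa(observed, scores, thresholds):
    """Maximise kappa over candidate presence thresholds, as in the text."""
    return max(cohen_kappa(observed, [int(s >= t) for s in scores])
               for t in thresholds)
```

With a perfectly discriminating model (every presence outscoring every absence) both statistics reach 1.0.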
Results
How well do models perform on an independent validation
dataset?
Our results demonstrate that models’ predictive accuracy on independent validation was good around median values with the AUC assessment (i.e. 0.80 < AUC < 0.90, except for CTA and GLM), but only fair near the lower quartile of the distribution of accuracy values (i.e. 0.70 < AUC < 0.80, Table 2). With the κ assessment, models also provided good agreement around median values (i.e. 0.40 < κ < 0.75, except for GLM), while lower quartile values of accuracy were classified as poor (i.e. κ < 0.40). In both cases, upper quartile accuracy values were below ‘excellent’ threshold values (i.e. AUC < 0.90 and κ < 0.75).
Does validation on nonindependent distribution data
provide a good surrogate for accuracy on independent
data?
As most assessments of model accuracy use nonindependent data, it is useful to estimate the degree to which predictive accuracy measured with nonindependent t1 distribution data provides a good surrogate for accuracy on t2 independent data. Our results show that model accuracy evaluated on the nonindependent 30% subset of t1 data was always higher than accuracy on
Table 2 Predictive accuracy of different modelling techniques (ANN, CTA, GAM and GLM), calibrated with 70% data from time t1 and verified against the remaining 30% of time t1 data (Fig. 1b), or calibrated with 100% of time t1 data and validated against 100% of time t2 data (Fig. 1c)

British breeding birds
       Calibration 70% t1   Validation 30% t1   Δ       Calibration 100% t1   Validation 100% t2   Δ
κ
ANN    0.59 (0.48, 0.70)    0.59 (0.43, 0.69)    0.00   0.60 (0.47, 0.69)     0.46 (0.26, 0.56)    -0.14
CTA    0.57 (0.47, 0.67)    0.53 (0.38, 0.62)   -0.04   0.57 (0.45, 0.66)     0.40 (0.25, 0.53)    -0.17
GAM    0.53 (0.41, 0.66)    0.58 (0.40, 0.67)    0.05   0.53 (0.42, 0.66)     0.43 (0.29, 0.54)    -0.10
GLM    0.53 (0.42, 0.66)    0.57 (0.41, 0.67)    0.04   0.54 (0.42, 0.66)     0.37 (0.22, 0.50)    -0.17
AUC
ANN    0.92 (0.87, 0.94)    0.90 (0.85, 0.93)   -0.02   0.92 (0.87, 0.94)     0.84 (0.78, 0.88)    -0.08
CTA    0.88 (0.82, 0.91)    0.86 (0.78, 0.89)   -0.02   0.87 (0.81, 0.91)     0.77 (0.70, 0.83)    -0.10
GAM    0.91 (0.85, 0.94)    0.90 (0.85, 0.93)   -0.01   0.91 (0.85, 0.94)     0.82 (0.75, 0.89)    -0.09
GLM    0.91 (0.85, 0.93)    0.90 (0.85, 0.93)   -0.01   0.91 (0.86, 0.93)     0.78 (0.68, 0.85)    -0.13

Values correspond to median (lower quartile, upper quartile) accuracy measures (κ and AUC) obtained for selected British breeding birds (n = 116); Δ values correspond to the difference between median accuracy measured on the 30% randomly chosen t1 data or 100% time t2 validation sets and median accuracy measured on calibration sets.

References

Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning. Springer-Verlag, New York.
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Swets JA (1988) Measuring the accuracy of diagnostic systems. Science, 240, 1285–1293.
Frequently Asked Questions

Q1. What are the contributions mentioned in the paper "Validation of species–climate impact models under climate change"?

Increasing concern over the implications of climate change for biodiversity has led to the use of species–climate envelope models to project species extinction risk under climate-change scenarios. Using observed distribution shifts among 116 British breeding-bird species over the past ~20 years, the authors are able to provide a first independent validation of four envelope modelling techniques under climate change. The authors also showed that measures of performance on nonindependent data provided optimistic estimates of models’ predictive ability on independent data. Data for independent model validation and replication of this study are rare and the authors argue that perfect validation may not in fact be conceptually possible. Implementations of species–climate envelope models for testing hypotheses and predicting future events may prove wrong, while being potentially useful if put into appropriate context.

The high performance of complex nonlinear techniques suggests that relatively unexplored methodologies such as multivariate adaptive regression splines, adaptive logistic regression (boosting) and generalized multiplicative models (for review see Hastie et al., 2001) might deserve future testing. Many studies have used good model fits on nonindependent validation data to support results pertaining to the potential impacts of future climate change on biodiversity (see references in Table 1). There are many reasons, in addition to the effects of autocorrelation in the data, why good model fits on present-day distribution data (i.e. nonindependent validation data) do not necessarily translate into good predictions of future ranges. There are clearly limits to the ability of any model to predict the future distribution of species under climate change, and model validation thus becomes a conceptually difficult problem.

Such factors may include the presence of spurious correlations between response (i.e. species) and predictor (i.e. climate) variables, which may translate into poor predictions on independent validation data (e.g. Guisan & Zimmermann, 2000).

This pattern of performance across modelling techniques is consistent with previous assessments of performance of species–climate envelope models with nonindependent data (for reviews see Olden & Jackson, 2002; Segurado & Araújo, 2004), and suggests that modelling techniques capable of summarising complex nonlinear relationships are more likely to provide useful projections of species responses to climate change.

This is because the effect of inflated performance arising from modelling spatially and temporally autocorrelated data should decrease as observed and modelled events become increasingly independent from each other.