
Correspondence Analysis of Incidence and Abundance Data: Properties in Terms of a Unimodal Response Model

C.J.F. ter Braak
- 01 Dec 1985 - 
- Vol. 41, Iss: 4, pp 859-873
Abstract
Correspondence analysis is commonly used by ecologists to analyze data on the incidence or abundance of species in samples. The first few axes are interpreted as latent variables and are presumed to relate to underlying environmental variables. In this paper correspondence analysis is shown to approximate the maximum likelihood solution of explicit unimodal response models in one latent variable. These models are logistic-linear for presence/absence data and loglinear for Poisson counts, with predictors that are quadratic in the latent variable. The approximation is best when the maxima and tolerances (widths) of the response curves are equal and the species' optima and the sample values of the latent variable are equally spaced. It is still fairly good for uniformly distributed optima and sample values, as shown by simulation. For the models extended to two latent variables, the approximation is often bad because of the horseshoe effect in correspondence analysis, but improves considerably in the simulations when this effect is removed as it is in detrended correspondence analysis.


Cajo J. F. ter Braak
Biometrics, Vol. 41, No. 4 (Dec., 1985), 859-873.

BIOMETRICS 41, 859-873
December 1985

Correspondence Analysis of Incidence and Abundance Data: Properties in Terms of a Unimodal Response Model

Cajo J. F. ter Braak
TNO Institute of Mathematics, Information Processing and Statistics, P.O. Box 100, 6700 AC Wageningen, The Netherlands

SUMMARY
Correspondence analysis is commonly used by ecologists to analyze data on the incidence or abundance of species in samples. The first few axes are interpreted as latent variables and are presumed to relate to underlying environmental variables. In this paper correspondence analysis is shown to approximate the maximum likelihood solution of explicit unimodal response models in one latent variable. These models are logistic-linear for presence/absence data and loglinear for Poisson counts, with predictors that are quadratic in the latent variable. The approximation is best when the maxima and tolerances (widths) of the response curves are equal and the species' optima and the sample values of the latent variable are equally spaced. It is still fairly good for uniformly distributed optima and sample values, as shown by simulation. For the models extended to two latent variables, the approximation is often bad because of the horseshoe effect in correspondence analysis, but improves considerably in the simulations when this effect is removed as it is in detrended correspondence analysis.

1. Introduction
Correspondence analysis is a multivariate technique primarily developed for the analysis of contingency table data (Nishisato, 1980; Greenacre, 1984). However, in ecology and archaeology, correspondence analysis is commonly applied to incidence or abundance matrices (Gauch, 1982). In ecology these matrices typically record the presence/absence or abundance of species in samples, e.g., plant species in quadrats or animal species in areas. Such matrices are not transformed to m-way contingency tables "on the grounds that the data are essentially asymmetric and the absences indicate little" (Hill, 1974). Clearly a different rationale is needed for the application of correspondence analysis to incidence or abundance data. A pertinent result concerns so-called Petrie matrices (a Petrie matrix is an incidence matrix which has a block of consecutive 1's in every row and in every column, the block of the first row starting in the first column and the block of the last row ending in the last column). The result says that if a matrix can be rearranged to a Petrie matrix by a permutation of rows and columns, then this permutation is generated by the first nontrivial solution of correspondence analysis (see Hill, 1974).

Key words: Correspondence analysis; Detrended correspondence analysis; Dual scaling; Ecology; Generalized linear models; Joint plot; Reciprocal averaging; Species packing model; Unfolding; Unimodal response model.

Hill (1973) introduced correspondence analysis to ecology, under the name of "reciprocal averaging." He suggested the technique as a natural extension of the method of weighted averaging used in Whittaker's (1956) "direct gradient analysis." Whittaker, among others, observed that species typically show unimodal (bell-shaped) response curves with respect to environmental gradients. For example, a plant species may prefer a particular soil moisture content, and not grow at all in places where the soil is either too dry or too wet. Each species is therefore largely confined to a specific interval along an environmental variable. The value most preferred by a species was termed its "indicator value" or optimum. In Whittaker's method, the indicator value of a species is estimated by taking the average of the values of the environmental variable in those samples in which the species occurs. (For quantitative data, the average is weighted by species abundance.) Conversely, with known indicator values of species, weighted averaging is used to estimate the value of an environmental variable in a sample from the species that it contained [see e.g., Kovacs (1969) for an application]. Hill (1973) showed that if iterated, this process of "reciprocal averaging" converges to a solution independent of initial indicator values, namely the first nontrivial axis of correspondence analysis (see also Greenacre, 1984, §4.2). Hill's method therefore amounts to arranging samples and species along a latent variable, an activity Whittaker (1967) termed "indirect gradient analysis." After such analysis, attempts are made to identify the latent variable by comparison with known variation in the environment (Gauch, 1982). The Petrie matrix provides a deterministic example of a response model wherein the response curves are (weakly) unimodal "block functions." Unimodal models also play an important role in unfolding theory (Coombs, 1964).

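The two weighted averaging steps of Whittaker's method described above can be sketched in a few lines of Python. This is only an illustration: the helper name `weighted_average`, the soil moisture values, and the abundances are hypothetical and do not come from the paper.

```python
def weighted_average(values, weights):
    """Abundance-weighted mean of `values` (weights must not all be zero)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no occurrences: the weighted average is undefined")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical data: soil moisture (%) in five quadrats and the abundance
# of one plant species in each quadrat.
moisture = [10.0, 20.0, 30.0, 40.0, 50.0]
abundance = [0, 1, 4, 1, 0]

# Indicator value (optimum) of the species: weighted average of the
# environmental values over the samples in which the species occurs.
indicator_value = weighted_average(moisture, abundance)

# Conversely: estimate the moisture of a sample from the (assumed known)
# indicator values of the species it contains, weighted by their counts.
indicator_values = [30.0, 42.0, 18.0]
counts_in_sample = [2, 1, 1]
estimated_moisture = weighted_average(indicator_values, counts_in_sample)
```

Alternating these two steps, each time feeding the output of one into the other, is the iteration that Hill (1973) called reciprocal averaging.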
In this paper, correspondence analysis is regarded as an estimation method for latent variable models and is compared with maximum likelihood under parametric unimodal response models with respect to one or two latent variables. The models considered are loglinear and logistic-linear models with predictors that are quadratic in the latent variable(s). Ter Braak and Barendregt (in press) showed that these are the only models with Poisson and binomial error, respectively, for which the weighted average of indicator values can achieve unit asymptotic efficiency with respect to maximum likelihood. The comparison gives some idea about the model that is implicitly invoked when correspondence analysis is applied to incidence or abundance data. This comparison is important because the maximum likelihood approach may be computationally too demanding for the numbers of species and samples commonly encountered in ecological research. Moreover, when the maximum likelihood approach is considered worthwhile, the results suggest that good initial estimates can be derived from correspondence analysis or, for two latent variables, from detrended correspondence analysis (Hill and Gauch, 1980).

2. Correspondence Analysis
Nishisato (1980) takes the view that correspondence analysis, alias dual scaling, assigns real numbers or "scores" to rows and columns of a table so as to optimize a particular criterion. Consider a species-by-sample matrix $Y = [y_{ki}]$ ($k = 1, \ldots, m$; $i = 1, \ldots, n$) of nonnegative real numbers, denoting the presence/absence ($y_{ki} = 1$ or 0) or count of individuals of each of $m$ species in $n$ samples. Let $u = [u_k]$ ($k = 1, \ldots, m$) and $x = [x_i]$ ($i = 1, \ldots, n$) contain the scores for species (rows) and samples (columns), respectively. In correspondence analysis these scores are chosen so that the weighted sum of squares of the sample scores is maximum with respect to the weighted sum of squares of the sample scores within species, i.e., the criterion maximized is

$$\delta^2 = \frac{\sum_i y_{+i}(x_i - z)^2}{\sum_k \sum_i y_{ki}(x_i - u_k)^2} \tag{2.1}$$

where $z = \sum_i y_{+i} x_i / y_{++}$ and the subscript + denotes summation over that subscript. Maximization of $\delta^2$ will give each species a score close to the scores of those samples in which it is abundant. (An alternative interpretation of this criterion is given in Section 4.3.)

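As a rough numerical illustration of this criterion, the sketch below reads it as the ratio of the total weighted sum of squares of the sample scores to the weighted sum of squares within species. That reading, the function name `delta_squared`, and the small incidence matrix are assumptions made for the example. For given sample scores the species scores are set to weighted averages of the sample scores, which is the choice that minimizes the within-species sum of squares.

```python
def delta_squared(Y, x):
    """Ratio of total to within-species weighted sum of squares of the
    sample scores x, for a species-by-sample matrix Y (list of lists).
    Species scores u are taken as weighted averages of the sample scores."""
    m, n = len(Y), len(x)
    row = [sum(Y[k]) for k in range(m)]                        # y_{k+}
    col = [sum(Y[k][i] for k in range(m)) for i in range(n)]   # y_{+i}
    total = float(sum(row))                                    # y_{++}
    z = sum(col[i] * x[i] for i in range(n)) / total           # weighted mean
    u = [sum(Y[k][i] * x[i] for i in range(n)) / row[k] for k in range(m)]
    total_ss = sum(col[i] * (x[i] - z) ** 2 for i in range(n))
    within_ss = sum(Y[k][i] * (x[i] - u[k]) ** 2
                    for k in range(m) for i in range(n))
    return total_ss / within_ss


# Hypothetical incidence matrix: 3 species (rows) by 4 samples (columns).
Y = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]

ordered = delta_squared(Y, [1.0, 2.0, 3.0, 4.0])    # scores follow the gradient
scrambled = delta_squared(Y, [1.0, 3.0, 2.0, 4.0])  # two samples swapped
```

In this toy case the scores that follow the underlying order of the samples give the larger ratio, because each species then occupies a narrower interval of sample scores.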
With the Lagrange method of multipliers and the sample scores centred so that $z = 0$, we obtain after some rearrangement the transition formulae of correspondence analysis (with $\alpha = 0$):

$$\lambda^{1-\alpha} x_i = \sum_k y_{ki} u_k / y_{+i} \qquad (i = 1, \ldots, n), \tag{2.2}$$

$$\lambda^{\alpha} u_k = \sum_i y_{ki} x_i / y_{k+} \qquad (k = 1, \ldots, m), \tag{2.3}$$

where $\lambda$ is a real number ($0 \le \lambda \le 1$). The extra parameter $\alpha$ governs the scaling of the species scores and the sample scores with respect to one another. There are three choices of $\alpha$ in common usage, namely $\alpha = 0$, 1, or $\tfrac{1}{2}$. Criterion (2.1) leads to $\alpha = 0$. With $\alpha = 0$, the species scores $u_k$ are weighted averages of the sample scores $x_i$ [equation (2.3)] and the sample scores are proportional to the weighted averages of the species scores [equation (2.2)]. With $\alpha = 1$, the role of species and samples is interchanged, also in the criterion being maximized. The third choice, $\alpha = \tfrac{1}{2}$, is a compromise in that it treats species and sample scores in a symmetric way.
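A minimal sketch of how the transition formulae with $\alpha = 0$ can be iterated, in the spirit of Hill's reciprocal averaging, is given below. The function name `reciprocal_averaging`, the fixed iteration count, the starting scores, and the small column-shuffled Petrie matrix are all assumptions chosen for illustration; this is not the paper's own algorithm or code.

```python
def reciprocal_averaging(Y, n_iter=200):
    """First nontrivial axis by iterating (2.3) and (2.2) with alpha = 0.

    Y is a species-by-sample list of lists with positive row and column
    totals. Returns (lam, u, x): eigenvalue estimate, species scores and
    sample scores (centred, unit weighted mean square).
    """
    m, n = len(Y), len(Y[0])
    row = [sum(Y[k]) for k in range(m)]                        # y_{k+}
    col = [sum(Y[k][i] for k in range(m)) for i in range(n)]   # y_{+i}
    total = float(sum(row))                                    # y_{++}

    x = [float(i) for i in range(n)]   # arbitrary initial sample scores
    x_prev, u, lam = x, [0.0] * m, 0.0
    for _ in range(n_iter):
        # Centre (removes the trivial solution) and rescale the sample scores.
        z = sum(col[i] * x[i] for i in range(n)) / total
        x = [xi - z for xi in x]
        scale = (sum(col[i] * x[i] ** 2 for i in range(n)) / total) ** 0.5
        x = [xi / scale for xi in x]
        # (2.3), alpha = 0: species scores are weighted averages of sample scores.
        u = [sum(Y[k][i] * x[i] for i in range(n)) / row[k] for k in range(m)]
        # (2.2), alpha = 0: weighted averages of species scores; at convergence
        # these equal lam * x, so their weighted root mean square estimates lam.
        x_next = [sum(Y[k][i] * u[k] for k in range(m)) / col[i] for i in range(n)]
        lam = (sum(col[i] * x_next[i] ** 2 for i in range(n)) / total) ** 0.5
        x_prev, x = x, x_next
    return lam, u, x_prev


# Hypothetical 4-species by 5-sample Petrie matrix with shuffled columns.
petrie = [[1, 1, 0, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 0, 1, 1, 0],
          [0, 0, 0, 1, 1]]
shuffle = [2, 0, 4, 1, 3]   # new column j holds original column shuffle[j]
Y = [[r[j] for j in shuffle] for r in petrie]

lam, u, x = reciprocal_averaging(Y)
order = sorted(range(len(x)), key=lambda i: x[i])
recovered = [shuffle[i] for i in order]   # original column labels, by score
```

Sorting the samples by their scores should return the original column order of the Petrie matrix (possibly reversed), in line with the result attributed to Hill (1974). The shrinkage factor `lam` plays the role of the eigenvalue $\lambda$ of the axis.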
The transition formulae have more than one solution. All solutions can be obtained from the singular value decomposition of $R^{-1/2} Y C^{-1/2}$ (see Hill, 1974) with $R = \operatorname{diag}(y_{k+})$ and $C = \operatorname{diag}(y_{+i})$. When the left and right normalized singular vectors in this decomposition are denoted by $q_s$ and $r_s$, corresponding to singular value $\rho_s = \sqrt{\lambda_s}$ ($s = 0, 1, 2, \ldots$), then the solutions are $u_s = \rho_s R^{-1/2} q_s y_{++}^{1/2}$ and $x_s = C^{-1/2} r_s y_{++}^{1/2}$. The solutions are the "axes" of correspondence analysis and $\lambda_s$ is termed the eigenvalue of the $s$th axis. The maximum singular value is always 1, corresponding to the trivial solution in which all sample and species scores equal 1. The first nontrivial solution ($s = 1$) is orthogonal to the trivial solution, hence satisfies the previously applied centering $z = 0$, and maximizes the criterion $\delta^2$ with $u = u_1$, $x = x_1$, and $\delta^2 = 1/(1 - \lambda_1)$.
Moreover, the singular value decomposition implies that the species and sample scores, $u$ and $x$, approximate the data in a weighted least squares sense by the bilinear model (see Nishisato, 1980)

$$y_{ki} \approx e_{ki}(1 + u_k x_i) \tag{2.4}$$

with $e_{ki} = y_{k+} y_{+i} / y_{++}$, the expectation under the assumption of row/column independence in contingency tables.
3. A Unimodal Response Model
From now on the species-by-sample matrix $Y$ will be assumed to consist either of counts $y_{ki}$ that are independent Poisson variables with expected value $\mu_{ki}$, or of presence/absence (1/0) data that are independent Bernoulli variables with probability $\mu_{ki}$ that the $k$th species is present in the $i$th sample. The models assumed for $\mu_{ki}$ are loglinear and logistic-linear models (Nelder and Wedderburn, 1972) in which the linear predictor is a quadratic polynomial in the latent variable $x$. It is convenient to write these models in the form

$$\operatorname{link} \mu_{ki} = a_k - \tfrac{1}{2}(x_i - u_k)^2 / t_k^2 \tag{3.1}$$

where link is the logarithmic function for counts and the logistic function for the 1/0 data. In (3.1) the parameters for the $k$th species are $a_k$, the maximum on log or logit scale; $u_k$, the mode or optimum (i.e., the value of $x$ for which the maximum is attained); and $t_k$, the tolerance, a measure of ecological amplitude. The value of the latent variable in the $i$th sample is $x_i$, which is treated as a fixed incidental parameter. Figure 1 displays an example for 1/0 data. The loglinear model is precisely the "Gaussian" response curve that is put forward by ecologists as an ideal for species responses along a gradient [see Austin (1976) and Gauch (1982) for reviews].
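The response model can be sketched as a small Python function. The sketch assumes that the quadratic predictor has the Gaussian form $a_k - \tfrac{1}{2}(x - u_k)^2 / t_k^2$, which matches the parameter descriptions above (maximum, optimum, tolerance). The function name `expected_response`, the `kind` argument, and the parameter values are hypothetical.

```python
import math


def expected_response(x, a, u, t, kind="count"):
    """Expected value mu at latent value x for one species.

    a: maximum on the log or logit scale, u: optimum, t: tolerance.
    kind="count" uses the log link (Poisson counts); kind="presence"
    uses the logit link (probability of occurrence).
    """
    eta = a - 0.5 * ((x - u) / t) ** 2     # assumed quadratic predictor
    if kind == "count":
        return math.exp(eta)
    if kind == "presence":
        return 1.0 / (1.0 + math.exp(-eta))
    raise ValueError("kind must be 'count' or 'presence'")


# Hypothetical species: optimum 5, tolerance 1, maximum 2 on the link scale.
at_optimum = expected_response(5.0, a=2.0, u=5.0, t=1.0)
one_tolerance_away = expected_response(6.0, a=2.0, u=5.0, t=1.0)
prob_at_optimum = expected_response(5.0, a=2.0, u=5.0, t=1.0, kind="presence")
```

Under this form the curve is symmetric about the optimum, and for counts the expected abundance one tolerance unit away from the optimum is the maximum multiplied by exp(-1/2).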

Figure 1. Unimodal response curves (3.1) for the probability (P) of occurrence along a latent variable (x), fitted by correspondence analysis to Table 2. The species optima and sample points are indicated by ticks below and above the abscissa. The length of a tick is proportional to the number of sample points. The numbers below the optima correspond to row numbers in Table 2. The horizontal bar is 1 tolerance unit.

The arbitrariness in the scale of the latent variable can be resolved, for example by centering as in correspondence analysis ($\sum_i y_{+i} x_i = 0$) and by setting the mean square of the tolerances to unity ($\sum_k t_k^2 / m = 1$), so that the latent variable can be measured in (mean) tolerance units. Then, the maximum likelihood equations for the parameters $x = [x_i]$ ($i = 1, \ldots, n$) and $u = [u_k]$ ($k = 1, \ldots, m$) become, after some rearrangement,

$$x_i = \sum_k \frac{y_{ki} u_k}{t_k^2} \Big/ \sum_k \frac{y_{ki}}{t_k^2} \; - \; \left[ \sum_k \frac{(u_k - x_i)\,\mu_{ki}}{t_k^2} \Big/ \sum_k \frac{y_{ki}}{t_k^2} \right], \tag{3.2}$$

$$u_k = \sum_i y_{ki} x_i / y_{k+} \; - \; \left[ \sum_i (x_i - u_k)\,\mu_{ki} / y_{k+} \right]. \tag{3.3}$$

These (implicit) equations could be simplified further by using the maximum likelihood equations for the parameters $a = [a_k]$ ($k = 1, \ldots, m$), but for the comparison with correspondence analysis, (3.2) and (3.3) are sufficient.
4. Theoretical Comparisons
Hill's approach to correspondence analysis makes it plausible that the species scores and sample scores in Section 2 play a role similar to the species optima and sample values in Section 3; that is why similar symbols are used in Sections 2 and 3. Our aim is to show that the terms between square brackets in (3.2) and (3.3) are negligible in certain cases, so that the maximum likelihood equations reduce effectively to the transition formulae (2.2) and (2.3) of correspondence analysis. These cases are as follows: either $\mu_{ki}$ is small or $\mu_{ki}$ is symmetric around $x_i$ and around $u_k$.
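The symmetry argument can be checked numerically with a short sketch. It assumes that the bracketed term in (3.3) is $\sum_i (x_i - u_k)\mu_{ki} / y_{k+}$, uses the loglinear (Gaussian) response curve for $\mu_{ki}$, and, as a simplification introduced here, replaces the observed total $y_{k+}$ by its expectation $\sum_i \mu_{ki}$. The function name `correction_term` and the sample layout are hypothetical.

```python
import math


def correction_term(samples, u, t, a=0.0):
    """Assumed bracketed term of (3.3) for one species under the loglinear
    model, with y_{k+} replaced by its expected value (a simplification)."""
    mu = [math.exp(a - 0.5 * ((xi - u) / t) ** 2) for xi in samples]
    return sum((xi - u) * m for xi, m in zip(samples, mu)) / sum(mu)


# Hypothetical equally spaced sample values on the interval [0, 10].
samples = [0.5 * i for i in range(21)]

# Optimum in the middle: the expected values are symmetric around the
# optimum, so positive and negative deviations cancel.
central = correction_term(samples, u=5.0, t=1.0)

# Optimum near the edge of the sampled range: the curve is truncated on
# one side, the symmetry is lost and the term no longer vanishes.
edge = correction_term(samples, u=0.5, t=1.0)
```

In this toy setting the term is essentially zero for the central species and clearly positive for the species at the edge, which suggests why the weighted average can be expected to work best for optima that lie well inside the sampled interval.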

References

Generalized Linear Models
TL;DR: In this paper, the authors used iterative weighted linear regression to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation.

The Limiting Similarity, Convergence, and Divergence of Coexisting Species
TL;DR: The total number of species is proportional to the total range of the environment divided by the niche breadth of the species, which is reduced by unequal abundance of resources but increased by adding to the dimensionality of the niche.

Detrended correspondence analysis: an improved ordination technique
TL;DR: DCA consistently gives the most interpretable ordination results, but as always the interpretation of results remains a matter of ecological insight and is improved by field experience and by integration of supplementary environmental data for the vegetation sample sites.