©INTERNATIONAL REGIONAL SCIENCE REVIEW 20, 1 & 2: 103–111 (1997)
ESTIMATION OF SPATIAL REGRESSION MODELS
WITH AUTOREGRESSIVE ERRORS BY TWO-
STAGE LEAST SQUARES PROCEDURES:
A SERIOUS PROBLEM
HARRY H. KELEJIAN
Department of Economics, University of Maryland, College Park, MD 20742 USA
(kelejian@econ.umd.edu)
INGMAR R. PRUCHA
Department of Economics, University of Maryland, College Park, MD 20742 USA
(prucha@econ.umd.edu)
Time series regression models that have autoregressive errors are often estimated by two-stage proce-
dures which are based on the Cochrane-Orcutt (1949) transformation. It seems natural to also attempt
the estimation of spatial regression models whose error terms are autoregressive in terms of an analo-
gous transformation. Various two-stage least squares procedures suggest themselves in this context,
including an analog to Durbin’s (1960) procedure. Indeed, these procedures are so suggestive and
computationally convenient that they are quite “tempting.” Unfortunately, however, as shown in this
paper, these two-stage least squares procedures are generally, in a typical cross-sectional spatial con-
text, not consistent and therefore should not be used.
INTRODUCTION
The spatial autoregressive model studied by Cliff and Ord (1973, 1981),
which is a variant of the model considered by Whittle (1954), is widely used to
describe the properties of the error terms in spatial regressions. As typically
specified, the error terms of a spatial autoregressive model depend on two
unknown parameters. One is an autoregressive parameter, say ρ, and the other is
a variance, say σ². Interest often focuses on ρ as a measure of spatial dependence,
and also because it is a component of the generalized least squares estimator of the
regression parameters. However, consistent estimation of both ρ and σ² is
important for making inferences based on the regression model.
Based on an analogy with the Cochrane-Orcutt (1949) transformation in a
linear time series model with autocorrelated error terms, one might think that, in
a spatial context, the parameter ρ can be estimated consistently by two-stage least
squares (2SLS) procedures. In particular, one might consider the estimation of
the parameter ρ by a procedure that is analogous to that suggested by Durbin
(1960) for linear time series models, referred to in the spatial literature as the spa-
tial Durbin procedure. Unfortunately, however, as shown below, under typical
Luc Anselin and Serge Rey provided helpful comments.
Received January 1997; revised April 1997.
assumptions these procedures are, in general, not consistent. This point is impor-
tant, especially since these 2SLS procedures are computationally convenient and
therefore their use is “tempting.”
In this paper, the basic model is first specified, then results concerning the
inconsistency of the 2SLS procedures are presented, and finally some concluding
remarks are given in the last section. Technical details are relegated to the
Appendix.
THE MODEL
In this section, the regression model is specified, along with its assumptions.
Those assumptions are then discussed. The following concept will be needed for
the discussion. Let a_{ij} denote the (i, j)-th element of an n by n matrix A. Then, the
row and column sums of A are said to be uniformly bounded in absolute value if

∑_{j=1}^{n} |a_{ij}| ≤ c_a for all i = 1, …, n; n ≥ 1,

∑_{i=1}^{n} |a_{ij}| ≤ c_a for all j = 1, …, n; n ≥ 1,

where c_a is a finite constant.¹
The model considered is

y = Xβ + ε, (1)

ε = ρWε + u, (2)
where y is the n by 1 vector of observations on the dependent variable, X is the n
by k matrix of observations on k exogenous regressors, β is the k by 1 vector of
regression parameters, ε is the n by 1 vector of regression disturbances, ρ is the
scalar autoregressive parameter, W is an n by n weights matrix, and u is an n by
1 vector of innovation error terms.
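To fix ideas, the data generating process in (1) and (2) can be simulated directly, since (2) can be solved as ε = (I − ρW)⁻¹u. The following sketch is purely illustrative: the sample size, the values of ρ and β, and the row-normalized circular weights matrix are hypothetical choices, not values taken from the paper.

```python
# Minimal simulation of y = X beta + eps, eps = rho W eps + u.
# All numerical choices (n, rho, beta, the circular W) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 200
rho = 0.4
beta = np.array([1.0, 0.5])

# Row-normalized "nearest neighbors on a circle" weights matrix.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n)                        # i.i.d. innovations

# Solve (2) for eps: eps = (I - rho W)^{-1} u.
eps = np.linalg.solve(np.eye(n) - rho * W, u)
y = X @ beta + eps                            # equation (1)
```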
Let u_i be the i-th element of u, let Z be an n by q, q ≥ k, matrix of instruments, and let P = (Z, W′Z). Then, assume the following:
ASSUMPTION 1: The u_i’s are i.i.d. with mean 0 and finite variance σ².

ASSUMPTION 2: The elements of the weights matrix W are known constants,
and rank(I − ρW) = n for all |ρ| < 1.
1
It can be shown that if two matrices, say A and B, are conformable for multiplication and their
row and column sums are uniformly bounded in absolute value, then the row and column sums of the
product matrix AB are also uniformly bounded in absolute value (see, e.g., Kelejian and Prucha
1995). Of course, if the row or column sums of a matrix are uniformly bounded in absolute value,
then this is also the case for each element.
ASSUMPTION 3: The row and column sums of W and of (I − ρW)⁻¹(I − ρW′)⁻¹
are uniformly bounded in absolute value.
ASSUMPTION 4: The elements of the regressor matrix X are nonstochastic,
and X has full column rank.
ASSUMPTION 5: The elements of the instrument matrix Z are nonstochastic
and bounded in absolute value, and Z has full column rank.
ASSUMPTION 6: lim_{n→∞} n⁻¹X′X = Q_x and lim_{n→∞} n⁻¹P′P = Q_p, where Q_x
and Q_p are finite and nonsingular. Furthermore, lim_{n→∞} n⁻¹Z′X and
lim_{n→∞} n⁻¹Z′WX are finite.
Assumptions 1 and 2 imply that ε = (I − ρW)⁻¹u and furthermore that
E(εε′) = Ω_ε, where

Ω_ε = σ²(I − ρW)⁻¹(I − ρW′)⁻¹. (3)

These two assumptions are typical in spatial autoregressive models unless special
complications are considered² (e.g., Cliff and Ord 1981: 198–9). Assumption 3 is
reasonable and should hold for most weights matrix specifications. For example,
the row and column sums of W will be uniformly bounded if W becomes a suffi-
ciently sparse matrix as n → ∞. Another example where this condition is satisfied
is the case in which the elements of W are row normalized and the maximum
number of nonzero elements in any given column remains bounded as n → ∞.
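The second example is easy to verify numerically: a row-normalized weights matrix with a bounded number of nonzero entries per column has row and column sums that remain bounded however large n becomes. The circular specification below is a hypothetical stand-in for such a matrix.

```python
# Check uniform boundedness of row and column sums for a row-normalized
# circular weights matrix (a hypothetical example, not from the paper).
import numpy as np

def circular_W(n):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

for n in (50, 200, 800):
    W = circular_W(n)
    # Both sums stay at 1 no matter how large n gets, so c_a = 1 works.
    assert np.abs(W).sum(axis=1).max() <= 1.0 + 1e-12   # row sums
    assert np.abs(W).sum(axis=0).max() <= 1.0 + 1e-12   # column sums
```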
Next observe from (3) that, except for the scale factor σ², (I − ρW)⁻¹(I − ρW′)⁻¹
is the variance-covariance matrix of ε. The assumption that the row and column
sums of this matrix are uniformly bounded therefore restricts the extent of correlations
relating to the elements of ε. In particular, the assumption implies, as is
easily seen, that there exists some finite constant, say c_ω, such that

n⁻¹ ∑_{i=1}^{n} ∑_{j=1}^{n} |corr(ε_i, ε_j)| ≤ c_ω < ∞ for all n ≥ 1,

where corr(ε_i, ε_j) denotes the correlation between ε_i and ε_j. Virtually all large
sample analyses restrict the extent of correlations in some way (see, e.g.,
Amemiya 1985, Ch. 3, 4; Pötscher and Prucha 1997, Ch. 5, 6; Anselin and Kele-
jian 1997). Assumption 4 is a standard condition in the context of the general lin-
ear regression model. Essentially, Assumption 4 rules out perfect
multicollinearity. Assumption 5 maintains that the instruments are nonstochastic.
One interpretation of this assumption is that the instruments are exogenous vari-
ables, and that the analysis is conditional upon their realized values. Assumption
6 relates to second order sample moments and is similar to those typically made
2
Among other things, these complications could relate to heteroskedasticity concerning the
innovation error terms, more general patterns of spatial correlation, and parametric specifications of
the weights matrix (see, e.g., Case 1991; Anselin 1990; Dubin 1988).
in large sample analyses involving instrumental variable estimators (e.g., Judge
et al. 1985: 167–9).
TWO-STAGE LEAST SQUARES PROCEDURES
Applying the analog of a Cochrane-Orcutt (1949) transformation to (1) and
(2) and rearranging terms in analogy to Durbin’s (1960) approach yields

y = ρWy + (X − ρWX)β + u, (4)

which can also be written in an over-parameterized form as

y = ρWy + Xβ + WXγ + u, (5)

where the restriction γ = −ρβ is not imposed. Note that the model formulations
(4) and (5) have been called the spatial Durbin model (see, e.g., Anselin 1988).³
The model in (1) implies that Wy = WXβ + Wε. It then follows from (2) and
Assumptions 1 and 2 that

E[(Wy)u′] = σ²W(I − ρW)⁻¹ ≠ 0.
Therefore, as noted in Anselin (1988: 58), the spatially lagged regressor, , is
correlated with the error term, u. One implication of this is that the parameters of
(5) cannot be consistently estimated by ordinary least squares, nor can the param-
eters of (4) be consistently estimated by nonlinear least squares.
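This correlation can be seen in a small Monte Carlo experiment. With σ² = 1, the expectation above implies E[n⁻¹(Wy)′u] = n⁻¹tr[W(I − ρW)⁻¹], which is nonzero in the hypothetical design sketched below; the sample size, parameter values, and circular weights matrix are illustrative assumptions, not choices made in the paper.

```python
# Monte Carlo check (hypothetical design) that Wy is correlated with u:
# the average of n^{-1}(Wy)'u approaches n^{-1} tr[W (I - rho W)^{-1}] != 0.
import numpy as np

rng = np.random.default_rng(1)
n = 100
rho = 0.4
beta = np.array([1.0, 0.5])

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

A_inv = np.linalg.inv(np.eye(n) - rho * W)    # (I - rho W)^{-1}
X = np.column_stack([np.ones(n), rng.normal(size=n)])

reps = 2000
acc = 0.0
for _ in range(reps):
    u = rng.normal(size=n)                    # sigma^2 = 1
    y = X @ beta + A_inv @ u                  # model (1)-(2)
    acc += (W @ y) @ u / n
mc_mean = acc / reps

theory = np.trace(W @ A_inv) / n              # value of E[n^{-1}(Wy)'u]
```

Because this expectation does not vanish as n grows, least squares applied to (5) projects part of u onto Wy, which is the source of the inconsistency just noted.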
In light of the correlation between Wy and u, one might think of estimating
(4) by nonlinear 2SLS, or (5) by (linear) 2SLS. However, as will be demonstrated,
these procedures are, in general, not consistent. For this discussion, it
proves convenient to denote with θ = (ρ, β′)′ the stacked vector of the true
model parameters in (4). Furthermore, let θ̄ = (ρ̄, β̄′)′ denote some arbitrary a
priori permissible parameter vector (of corresponding dimensions). Rewrite (4)
as y = f(θ) + u with

f(θ) = ρWy + (X − ρWX)β. (6)

The function f(θ) is often referred to as the response function. The nonlinear
2SLS estimator of θ, say θ̂ = (ρ̂, β̂′)′, based on the instruments Z is
now defined as the minimizer of

R_n(θ̄) = n⁻¹[y − f(θ̄)]′Z(Z′Z)⁻¹Z′[y − f(θ̄)]. (7)
3
These model formulations have also been considered by Burridge (1981) and Blommestein
(1983) and have been referred to in the spatial literature as the spatial common factor model.
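The objective function in (7) is simple to compute, which is part of its appeal. The sketch below sets up hypothetical data satisfying (1) and (2) and evaluates R_n; the design (n, ρ, β, the circular W) and the instrument set Z, consisting of X and the spatial lag of its nonconstant column, are illustrative assumptions rather than prescriptions from the paper.

```python
# Evaluate the nonlinear 2SLS objective (7) with response function (6).
# Data design and the instrument set Z are hypothetical choices.
import numpy as np

rng = np.random.default_rng(2)
n = 100
rho_true = 0.4
beta_true = np.array([1.0, 0.5])

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + np.linalg.solve(np.eye(n) - rho_true * W,
                                    rng.normal(size=n))

Z = np.column_stack([X, W @ X[:, 1]])     # X plus spatial lag of its
                                          # nonconstant column
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection Z (Z'Z)^{-1} Z'

def f(rho, beta):
    """Response function (6): rho W y + (X - rho W X) beta."""
    return rho * (W @ y) + (X - rho * (W @ X)) @ beta

def R_n(rho, beta):
    """Objective (7): n^{-1} [y - f]' Z (Z'Z)^{-1} Z' [y - f]."""
    r = y - f(rho, beta)
    return (r @ P_Z @ r) / n
```

Minimizing this function over (ρ, β) is computationally trivial, which is exactly what makes the procedure “tempting”; as shown next, however, the minimizer is generally not consistent.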
Amemiya (1985: 246) gives conditions under which the nonlinear 2SLS estimator is consistent. In terms of the model presented in this paper, one of Amemiya’s conditions for the consistency of θ̂ = (ρ̂, β̂′)′ is that the matrix

H = plim_{n→∞} n⁻¹Z′[∂f(θ̄)/∂θ̄]_{θ̄=θ} (8)

has full column rank. For purposes of interpretation, if a model were linear in the parameters, then the derivative of the response function with respect to the parameters would be the regressor matrix, say S. In this case, H would then correspond to the probability limit of n⁻¹Z′S.⁴
From (6),

[∂f(θ̄)/∂θ̄]_{θ̄=θ} = [W(y − Xβ), (X − ρWX)] = [Wε, (X − ρWX)]. (9)

Note that the expected value of the first column of the n by k+1 matrix in (9) is a vector of zeroes. Given this and the maintained assumptions, it is shown in the appendix that the first column of H is also a vector of zeroes. It follows that H does not have full column rank. The violation of Amemiya’s rank condition implies that his proof of consistency does not apply to the nonlinear 2SLS estimator corresponding to (4). It also suggests that there may be a fundamental “identification problem” in the sense that the objective function R_n(θ̄) = R_n(ρ̄, β̄) becomes flat in the direction of ρ̄ as n tends toward infinity. That is, it suggests that in the limit the minimum of R_n(ρ̄, β̄) is not associated with a unique value of ρ̄. That this is indeed the case for β̄ = β is now demonstrated.
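The vanishing first column of H can also be checked numerically: since the first column of the matrix in (9) is Wε, the corresponding column of n⁻¹Z′[∂f/∂θ̄] is n⁻¹Z′Wε, whose entries shrink toward zero as n grows. The design below (circular W, instruments built from X and its spatial lag, ρ = 0.4) is once more a hypothetical illustration.

```python
# Numerical illustration of the rank failure in (8): the first column of
# n^{-1} Z'[Weps, X - rho W X] is n^{-1} Z'(W eps), which drifts toward
# zero as n grows. Design choices (W, Z, rho) are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
rho = 0.4

def first_column_norm(n):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Z = np.column_stack([X, W @ X[:, 1]])
    eps = np.linalg.solve(np.eye(n) - rho * W, rng.normal(size=n))
    # Largest entry of the first column of the matrix inside (8).
    return np.abs(Z.T @ (W @ eps) / n).max()

norms = [first_column_norm(n) for n in (100, 400, 1600)]
# In the limit the first column of H is zero, so H cannot have full
# column rank.
```

Relatedly, profiling R_n in ρ̄ with β̄ held at the true β in such designs produces an increasingly flat objective, consistent with the identification problem the paper describes.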
The nonlinear 2SLS estimator can be viewed as a special case of an M-estimator. A basic condition maintained in the general literature on M-estimators is that the parameters be identifiably unique (see, e.g., Gallant and White 1988; Pötscher and Prucha 1991, 1997). For the problem at hand, this translates into the requirement that the limiting objective function

R̄(ρ̄, β̄) = plim_{n→∞} R_n(ρ̄, β̄)

has a unique minimum at the true parameter value, i.e., R̄(ρ̄, β̄) > R̄(ρ, β) for all (ρ̄, β̄) ≠ (ρ, β). Now observe that, for any given value of ρ̄,
4 In somewhat more detail, consider for a moment the classical case of a linear model, say y = f(θ) + u with f(θ) = Sθ, where S is the regressor matrix. In this case, the minimizer of (7), i.e., the 2SLS estimator, can be expressed explicitly (in terms of the usual formula) as θ̂ = [S′Z(Z′Z)⁻¹Z′S]⁻¹S′Z(Z′Z)⁻¹Z′y. Furthermore, observe that in this case, ∂f(θ)/∂θ = S. Thus, in the linear case, Amemiya’s condition reduces to the standard requirement that plim_{n→∞} n⁻¹Z′S has full column rank.