COMPUTING NONPARAMETRIC FUNCTIONAL ESTIMATES
IN SEMIPARAMETRIC PROBLEMS
Miguel A. Delgado
Universidad Carlos III de Madrid
Getafe, Madrid 28903, SPAIN
Key Words and Phrases: nonparametric functional estimation; semiparametric models; Fortran routines.
ABSTRACT
The purpose of this note is to provide a brief account of available FORTRAN routines for computing nonparametric functional estimates, frequently used in semiparametric problems, evaluated at each data point. Semiparametric estimates can then be computed with the user's preferred econometric software.
1. INTRODUCTION
The nonparametric functionals most frequently used in semiparametric estimation of econometric models are density functions and their derivatives, and regression curves. Different semiparametric problems require different nonparametric estimation methods. Densities and their derivatives are usually estimated by the kernel method. Regression curves are estimated by either kernels or nearest neighbors.
Given $n$ observations $\{(X_i, Y_i), i = 1, \ldots, n\}$ of a random variable $(X, Y)$, where $Y$ is scalar and $X$ is a $p$-dimensional random vector, nonparametric estimates of the regression function $m(x) = E(Y \mid X = x)$ can be defined as $\hat{m}(x) = \sum_{i=1}^{n} Y_i W_i(x)$, where $\{W_i(x), i = 1, \ldots, n\}$ is a sequence of weights.
Kernel weights are defined as $W_i(x) = n^{-1} K_H(x - X_i) / f_H(x)$, where $f_H(x) = n^{-1} \sum_{i=1}^{n} K_H(x - X_i)$ is
the density function estimate of $X$ evaluated at $x$, and $K_H(u) = \det(H)^{-1} K(H^{-1} u)$ is the kernel with scale smoothing matrix $H$. The function $K(\cdot)$ integrates to one.
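The kernel weights above can be illustrated with a short Python sketch. It is not the FORTRAN code described in this note. It assumes a diagonal smoothing matrix $H = \mathrm{diag}(h_1, \ldots, h_p)$ and a Gaussian product kernel, and the function names (`gaussian_product_kernel`, `kernel_estimates`) and the `leave_one_out` flag are hypothetical. When the own observation is left out, the sketch averages over the $n - 1$ remaining terms, which is also an assumption.

```python
import math


def gaussian_product_kernel(u):
    """Product of standard normal densities; integrates to one."""
    return math.exp(-0.5 * sum(v * v for v in u)) / (2.0 * math.pi) ** (len(u) / 2.0)


def kernel_estimates(x, y, h, leave_one_out=False):
    """Return (m, f, p): regression, density, and p = m * f at each X_i.

    x: list of n points, each a list of p coordinates
    y: list of n scalar responses
    h: list of p positive bandwidths, i.e. H = diag(h) (an assumption)
    """
    n = len(x)
    count = n - 1 if leave_one_out else n  # number of terms averaged
    if count < 1:
        raise ValueError("need at least one observation to average over")
    det_h = math.prod(h)
    m, f, p = [], [], []
    for i in range(n):
        dens = 0.0
        num = 0.0
        for j in range(n):
            if leave_one_out and i == j:
                continue
            u = [(a - b) / s for a, b, s in zip(x[i], x[j], h)]
            k = gaussian_product_kernel(u) / det_h  # K_H(X_i - X_j)
            dens += k
            num += y[j] * k
        f.append(dens / count)
        p.append(num / count)
        m.append(num / dens if dens > 0.0 else float("nan"))
    return m, f, p
```

The normalising factor cancels in the ratio, so the regression estimate does not depend on whether the sums are divided by $n$ or by $n - 1$.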
Nearest neighbor weights are defined as $W_i(x) = c_{L_i}(x, k)$, where $\sum_{i=1}^{n} c_i(x, k) = 1$, $c_i(x, k) > 0$ when $i \leq k$, and $L_i$ is the rank of $X_i$ according to increasing distances $\rho(X_i, x)$, where $\rho(\cdot, \cdot)$ is some distance function (if $\rho(X_i, x) = \rho(X_j, x)$, then we arbitrarily call $X_i$ closer to $x$ if $i < j$). This tie-breaking rule was suggested by Devroye (1978) and is computationally convenient.
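The nearest neighbor weights and the tie-breaking rule can be sketched as follows. This is an illustration under stated assumptions, not the routine described in this note: it uses uniform weights $c_i(x, k) = 1/k$ for $i \leq k$ and zero otherwise, the Euclidean distance, and a brute-force search. The name `knn_regression` and its `leave_one_out` flag are hypothetical.

```python
import math


def knn_regression(x, y, k, leave_one_out=False):
    """Uniform k-nearest-neighbour regression estimate at each X_i.

    Ties in distance are broken by the smaller index, so sorting on the
    pair (distance, index) reproduces the rule described above.
    """
    n = len(x)
    estimates = []
    for i in range(n):
        candidates = [
            (math.dist(x[i], x[j]), j)
            for j in range(n)
            if not (leave_one_out and j == i)
        ]
        candidates.sort()  # increasing distance, then increasing index
        nearest = candidates[:k]
        estimates.append(sum(y[j] for _, j in nearest) / len(nearest))
    return estimates
```

Sorting all candidates costs on the order of $n \log n$ per evaluation point, which is the brute-force baseline that a faster search is meant to improve on.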
The routines described below compute regression estimates and many other related functionals, such as density estimates, robust conditional M-estimates, and conditional quantile estimates. Soft copies of the code and detailed documentation on the routines can be obtained by e-mail from DELGADO@ECO.UC3M.ES, or by sending a formatted floppy disk and a self-addressed stamped envelope to the author.
2. ROUTINES FOR NONPARAMETRIC FUNCTIONAL ESTIMATION
The output of the routines consists of a vector containing $\hat{m}(X_i)$, $i = 1, \ldots, n$, or other related functionals (conditional robust regression and conditional quantile estimates) when required. Kernel routines also provide $f_H(X_i)$ and $P_H(X_i) = \hat{m}(X_i) f_H(X_i)$, $i = 1, \ldots, n$, by default.
The input for kernel estimation consists of the kernel function and the bandwidth matrix. Option parameters allow the user to choose $H$ diagonal or $H = h \hat{\Sigma}$, where $h$ is scalar and $\hat{\Sigma}$ is the sample covariance matrix of $X$. The user can choose whether or not to include the own observation when computing the kernels. If kernel derivatives are provided instead of the kernel function, the output will consist of the density derivative estimates evaluated at each data point. An algorithm that is efficient with respect to storage space and time is employed when the kernel is symmetric.
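As an illustration of the last two points, the Python sketch below estimates the density derivative at each data point for scalar $X$ with a scalar bandwidth $h$, using $n^{-1} h^{-2} \sum_{j} K'((X_i - X_j)/h)$ with a Gaussian kernel. It also shows one way symmetry can save work: the derivative of a symmetric kernel is odd, so each pair of observations needs only one kernel evaluation. Whether the FORTRAN routines use this particular device is not stated in this note, so it should be read as an assumption. The function names are hypothetical.

```python
import math


def gaussian_kernel_derivative(u):
    """Derivative of the standard normal density, K'(u) = -u * phi(u)."""
    return -u * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density_derivative_estimates(x, h):
    """Estimate f'(X_i) for scalar data x with scalar bandwidth h.

    Each pair (i, j) with i < j is visited once: because K' is odd, the
    term for (j, i) is minus the term for (i, j).
    """
    n = len(x)
    total = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = gaussian_kernel_derivative((x[i] - x[j]) / h)
            total[i] += d
            total[j] -= d
    # K'(0) = 0, so the own observation contributes nothing here.
    return [t / (n * h * h) for t in total]
```

The loop performs $n(n-1)/2$ kernel evaluations instead of $n^2$ and stores only one running total per data point.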
The input for nearest neighbor estimation consists of the number of nearest neighbors $k$ and the weight function. The user can choose whether or not to include the own observation when computing the weights, and can also choose the distance function. The nearest neighbors are found using the algorithm proposed by Friedman et al. (1975), which is considerably faster than a brute-force search.
3. APPLICATIONS TO SEMIPARAMETRIC PROBLEMS
Many semiparametric estimates appearing in the recent econometric literature can be computed using these routines and standard econometric software. We mention only a few.
Generalized least squares (GLS) estimators in the presence of heteroskedasticity of unknown form, proposed by Carroll (1982) and Robinson (1987) among others, are straightforward to compute using standard econometric software (TSP, SAS, PC-GIVE, etc.). Once ordinary least squares (OLS) residuals are obtained, their squares and the regressor observations are the input of the routines, and the output is the vector of weights in the GLS procedure.
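The three steps just described (OLS, nonparametric smoothing of the squared residuals, reweighted least squares) can be sketched in Python for a single regressor plus an intercept. This is a simplified illustration, not the procedure of the cited papers: it assumes a Gaussian kernel with scalar bandwidth `h`, takes the GLS weights to be the reciprocals of the smoothed squared residuals, and uses hypothetical function names throughout.

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def kernel_smooth(x, z, h):
    """Gaussian kernel regression of z on scalar x, evaluated at each x_i."""
    fitted = []
    for xi in x:
        k = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        fitted.append(sum(kj * zj for kj, zj in zip(k, z)) / sum(k))
    return fitted


def feasible_gls(x, y, h):
    """Return ((a, b), weights) from the three-step procedure."""
    # Step 1: ordinary least squares (all weights equal to one).
    a0, b0 = weighted_line_fit(x, y, [1.0] * len(x))
    resid_sq = [(yi - a0 - b0 * xi) ** 2 for xi, yi in zip(x, y)]
    # Step 2: estimate the conditional variance at each data point.
    sigma_sq = kernel_smooth(x, resid_sq, h)
    # Step 3: reweight by the inverse variance (fails if an estimate is zero).
    weights = [1.0 / s for s in sigma_sq]
    return weighted_line_fit(x, y, weights), weights
```

In practice only step 2 would come from the nonparametric routines. Steps 1 and 3 are ordinary and weighted regressions that any econometric package provides.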
The kernel routines are suitable for computing semiparametric estimates of the parameter vector $\beta$ in the partial linear regression model $E(Y \mid X_1, X_2) = X_1' \beta + \theta(X_2)$. Once $E(Y \mid X_2)$ and $E(X_1 \mid X_2)$, evaluated at each data point, are estimated, $\beta$ can be estimated by linear regression, as suggested by Robinson (1988).
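A minimal Python sketch of this two-stage idea follows, for scalar $X_1$ and scalar $X_2$. It regresses $Y - \hat{E}(Y \mid X_2)$ on $X_1 - \hat{E}(X_1 \mid X_2)$ without an intercept. The Gaussian kernel, the leave-one-out smoothing, and the function names are assumptions made for illustration, and any refinements in the cited paper are omitted.

```python
import math


def kernel_smooth(x, z, h):
    """Gaussian kernel regression of z on scalar x, omitting the own observation."""
    fitted = []
    for i, xi in enumerate(x):
        num = 0.0
        den = 0.0
        for j, (xj, zj) in enumerate(zip(x, z)):
            if j == i:
                continue
            k = math.exp(-0.5 * ((xi - xj) / h) ** 2)
            num += k * zj
            den += k
        fitted.append(num / den)
    return fitted


def partial_linear_beta(x1, x2, y, h):
    """Estimate beta in E(Y | X1, X2) = X1 * beta + theta(X2)."""
    # Remove the estimated conditional means given X2 from Y and from X1.
    y_res = [yi - mi for yi, mi in zip(y, kernel_smooth(x2, y, h))]
    x_res = [xi - mi for xi, mi in zip(x1, kernel_smooth(x2, x1, h))]
    # Least squares slope of the residualised Y on the residualised X1.
    return sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
```

Because the smoother reproduces constants, a constant $\theta$ is removed exactly. A smooth non-constant $\theta$ is removed only up to the smoothing bias.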
Optimal instrumental variables (IV) estimates in nonlinear equation systems, proposed by Newey (1990), are obtained by computing the optimal instruments using our routines. The input, in this case, is the vector of derivatives of the error function evaluated at some root-n-consistent preliminary estimate, together with the regressor observations. The output is the vector of optimal instruments. Our program also includes a routine for computing optimal instruments by resampling, as proposed by Robinson (1990). Once optimal instruments are available, instrumental variables estimates can be computed using TSP.
Density derivative estimates are needed when computing average derivatives of regression functions, as suggested by Powell et al. (1989). Some nonparametric and semiparametric test procedures, e.g. Robinson (1989), require $f_H(X_i)$, $P_H(X_i)$, and their derivatives.
ACKNOWLEDGEMENTS
These routines were written in the summer of 1990, when I was visiting the
London School of Economics. This work was supported by the Economic and Social Research Council (ESRC), reference number: R00023411.
REFERENCES
Carroll, R.J. (1982): "Adapting for heteroscedasticity in linear models", Annals of Statistics 10, 1224-1233.
Devroye, L. (1978): "Uniform convergence of nearest neighbor regression function estimators and their application in optimization", IEEE Transactions on Information Theory IT-24, 142-151.
Friedman, J.H., F. Baskett and L.J. Shustek (1975): "An algorithm for finding nearest neighbors", IEEE Transactions on Computers C-24, 1149-1158.
Robinson, P.M. (1988): "Root-n-consistent semiparametric regression", Econometrica 56, 931-954.
Robinson, P.M. (1989): "Hypothesis testing in semiparametric and nonparametric models for econometric time series", Review of Economic Studies 56, 511-534.
Robinson, P.M. (1990): "Best nonlinear three-stage least squares of certain econometric models", Econometrica.
Newey, W. (1990): "Efficient instrumental variable estimation of nonlinear models", Econometrica 58, 809-837.
Powell, J.L., J.H. Stock and T.M. Stoker (1989): "Semiparametric estimation of index coefficients", Econometrica 57, 1403-1430.