Computing nonparametric functional estimates in semiparametric problems

doi:10.1080/07474939308800256


COMPUTING NONPARAMETRIC FUNCTIONAL ESTIMATES IN SEMIPARAMETRIC PROBLEMS

Miguel A. Delgado

Universidad Carlos III de Madrid
Getafe, Madrid 28903, SPAIN

Key Words and Phrases: nonparametric functional estimation; semiparametric models; Fortran routines.

ABSTRACT

The purpose of this note is to provide a brief account of available FORTRAN routines for computing nonparametric functional estimates, frequently used in semiparametric problems, evaluated at each data point. Semiparametric estimates can then be computed with the user's preferred econometric software.

1. INTRODUCTION

The nonparametric functionals most frequently used in semiparametric estimation of econometric models are density functions and their derivatives, and regression curves. Different semiparametric problems require different nonparametric estimation methods. Densities and their derivatives are usually estimated by the kernel method. Regression curves are estimated by either kernels or nearest neighbors.

Given $n$ observations $\{(X_i, Y_i),\ i = 1, \ldots, n\}$ of a random variable $(X, Y)$, where $Y$ is scalar and $X$ is a $p$-dimensional random vector, nonparametric estimates of the regression function $m(x) = E(Y \mid X = x)$ can be defined as $\hat{m}(x) = \sum_{i=1}^{n} Y_i W_i(x)$, where $\{W_i(x),\ i = 1, \ldots, n\}$ is a sequence of weights. Kernel weights are defined as $W_i(x) = n^{-1} K_H(x - X_i) / f_H(x)$, where $f_H(x) = n^{-1} \sum_{i=1}^{n} K_H(x - X_i)$ is the density function estimate of $X$ evaluated at $x$, and $K_H(u) = \det(H)^{-1} K(H^{-1} u)$ is the kernel with scale smoothing matrix $H$. The function $K(\cdot)$ integrates to one.
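As a rough illustration of the kernel weights defined above, the following Python sketch evaluates $f_H(X_i)$ and $\hat{m}(X_i)$ at each data point. It is not the author's FORTRAN code: the function names, the Gaussian product kernel, and the restriction to a diagonal $H$ (passed as the list `h` of its diagonal entries) are assumptions made for the example.

```python
import math

def gaussian_kernel(u):
    """Product Gaussian kernel on R^p (assumed choice); it integrates to one."""
    p = len(u)
    return math.exp(-0.5 * sum(v * v for v in u)) / (2.0 * math.pi) ** (p / 2.0)

def kernel_estimates(X, Y, h, kernel=gaussian_kernel):
    """Return (m_hat, f_hat) evaluated at every data point, for H = diag(h)."""
    n = len(X)
    det_h = math.prod(h)  # det(H) for a diagonal H
    m_hat, f_hat = [], []
    for x in X:
        # K_H(x - X_j) = det(H)^{-1} K(H^{-1}(x - X_j))
        k = [kernel([(a - b) / s for a, b, s in zip(x, xj, h)]) / det_h
             for xj in X]
        f = sum(k) / n                      # f_H(x)
        f_hat.append(f)
        # sum_j Y_j W_j(x) with W_j(x) = n^{-1} K_H(x - X_j) / f_H(x)
        m_hat.append(sum(kj * yj for kj, yj in zip(k, Y)) / (n * f))
    return m_hat, f_hat
```

Because the weights sum to one at every evaluation point, a constant response is reproduced exactly, which gives a quick sanity check of any implementation.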

Nearest neighbor weights are defined as $W_i(x) = c_{L_i}(x, k)$, where $\sum_{i=1}^{n} c_i(x, k) = 1$, $c_i(x, k) > 0$ when $i \le k$, and $L_i$ is the rank of $X_i$ according to increasing distances $\rho(X_i, x)$, where $\rho(\cdot, \cdot)$ is some distance function (if $\rho(X_i, x) = \rho(X_j, x)$, then we arbitrarily call $X_i$ closer to $x$ if $i < j$). This tie-breaking rule was suggested by Devroye (1978) and is computationally convenient.
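The ranking and tie-breaking rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the distance is Euclidean, and the weights are uniform ($c_i(x, k) = 1/k$ for $i \le k$ and zero otherwise), which is one common choice rather than the only one the routines allow. The search is brute force.

```python
import math

def knn_ranks(X, x):
    """Rank L_i of each X_i by increasing distance to x.

    Ties are broken in favor of the lower index, as in the rule above.
    """
    order = sorted(range(len(X)), key=lambda i: (math.dist(X[i], x), i))
    ranks = [0] * len(X)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def knn_regression(X, Y, x, k):
    """Uniform k-NN estimate of m(x): average of Y_i over the k nearest X_i."""
    ranks = knn_ranks(X, x)
    return sum(y for y, r in zip(Y, ranks) if r <= k) / k
```

For example, with `X = [[0.0], [2.0], [1.0], [-1.0]]` and `x = [0.0]`, the third and fourth points are equidistant from `x`, and the third one receives the smaller rank because its index is lower.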

The routines described below compute regression estimates and many others, such as density estimates, robust conditional M-estimates, and conditional quantile estimates. Soft copies of the code and detailed documentation on the routines can be obtained by e-mail from DELGADO@ECO.UC3M.ES, or by sending a formatted floppy disk and a self-addressed stamped envelope to the author.

2. ROUTINES FOR NONPARAMETRIC FUNCTIONAL ESTIMATION

The output of the routines consists of a vector containing $\hat{m}(X_i)$, $i = 1, \ldots, n$, or other estimates (conditional robust regression and conditional quantile estimates) when required. Kernel routines also provide $f_H(X_i)$ and $P_H(X_i) = \hat{m}(X_i) f_H(X_i)$, $i = 1, \ldots, n$, by default.

The input for kernel estimation consists of the kernel function and the bandwidth matrix. Option parameters allow the user to choose $H$ diagonal or $H = h\hat{\Sigma}$, where $h$ is scalar and $\hat{\Sigma}$ is the sample covariance matrix of $X$. The user can choose whether or not to include the own observation when computing the kernels. If kernel derivatives are provided instead of the kernel function, the output consists of the density derivative estimates evaluated at each data point. An algorithm that is efficient with respect to storage space and time is employed when the kernel is symmetric.
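The note does not spell out the algorithm used for symmetric kernels. One natural way to save work, assumed here purely for illustration, is to use $K(u) = K(-u)$ so that each pair $(i, j)$ needs a single kernel evaluation, accumulated into both running sums without storing an $n \times n$ matrix. The Python sketch below does this for a scalar regressor with a Gaussian kernel; the function name, the `leave_one_out` flag, and the division by $n - 1$ when the own observation is dropped are all assumptions of the example.

```python
import math

def density_at_data(X, h, leave_one_out=False):
    """Gaussian-kernel density estimates at each X_i (scalar X, bandwidth h).

    Each unordered pair (i, j) is visited once; symmetry of the kernel lets
    the same value be added to the sums for both i and j.
    """
    n = len(X)
    norm = math.sqrt(2.0 * math.pi)
    k0 = 1.0 / norm                               # K(0), the own-observation term
    sums = [0.0 if leave_one_out else k0] * n
    for i in range(n):
        for j in range(i + 1, n):
            kij = math.exp(-0.5 * ((X[i] - X[j]) / h) ** 2) / norm
            sums[i] += kij
            sums[j] += kij
    denom = (n - 1 if leave_one_out else n) * h   # assumed normalization
    return [s / denom for s in sums]
```

This roughly halves the number of kernel evaluations and needs only a vector of length $n$ as working storage.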

The input for nearest neighbor estimation consists of the number of nearest neighbors $k$ and the weight function. The user can choose the distance function and whether or not to include the own observation when computing the weights. The nearest neighbors are found using the algorithm proposed by Friedman et al (1975). This algorithm is considerably faster than a brute force search.

3. APPLICATIONS TO SEMIPARAMETRIC PROBLEMS

Many semiparametric estimates appearing in the recent econometric literature can be computed using these routines and standard econometric software. We mention only a few.

Generalized least squares (GLS) estimators in the presence of heteroskedasticity of unknown form, proposed by Carroll (1982) and Robinson (1987) among others, are straightforward to compute using standard econometric software (TSP, SAS, PC-GIVE, etc.). Once ordinary least squares (OLS) residuals are obtained, their squares and the observations of the regressors are the input of the routines, and the output is the vector of weights in the GLS procedure.
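The sequence of steps just described can be sketched as follows. This is a minimal Python illustration, not the author's routines: it assumes a single regressor plus intercept, a uniform $k$-nearest-neighbor estimate of the conditional variance that includes the own observation, and GLS weights taken as the reciprocals of the variance estimates. All function names are hypothetical.

```python
def wls_line(x, y, w=None):
    """Weighted least squares fit of y = a + b*x (unit weights give OLS)."""
    w = w or [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

def knn_mean(x, z, k):
    """Uniform k-NN regression of z on scalar x, evaluated at each data point."""
    out = []
    for x0 in x:
        order = sorted(range(len(x)), key=lambda i: (abs(x[i] - x0), i))
        out.append(sum(z[i] for i in order[:k]) / k)
    return out

def feasible_gls(x, y, k):
    a, b = wls_line(x, y)                                     # step 1: OLS
    res2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]   # step 2: squared residuals
    sigma2 = knn_mean(x, res2, k)                             # step 3: variance estimates
    weights = [1.0 / s for s in sigma2]                       # step 4: weights (fails if s == 0)
    return wls_line(x, y, weights), weights                   # step 5: weighted fit
```

In practice only steps 2 to 4 would be delegated to the nonparametric routines; the two regressions can be run in whatever econometric package the user prefers.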

The kernel routines are suitable for computing semiparametric estimates of the parameter vector $\beta$ in the partial linear regression model $E(Y \mid X_1, X_2) = X_1'\beta + g(X_2)$, where $g$ is an unknown function. Once $E(Y \mid X_2)$ and $E(X_1 \mid X_2)$ evaluated at each data point are estimated, $\beta$ can be estimated by linear regression, as suggested by Robinson (1988).
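To make the two-step procedure concrete, here is a hedged Python sketch for scalar $X_1$ and $X_2$. It estimates $E(Y \mid X_2)$ and $E(X_1 \mid X_2)$ at each data point with a Gaussian-kernel regression and then regresses the residuals of $Y$ on the residuals of $X_1$ without an intercept. The function names and the kernel are assumptions, and the sketch omits any trimming of observations with small estimated density.

```python
import math

def nw_at_data(x2, z, h):
    """Kernel regression estimates of E(z | x2) at each data point (scalar x2)."""
    out = []
    for x0 in x2:
        k = [math.exp(-0.5 * ((x0 - xj) / h) ** 2) for xj in x2]
        out.append(sum(kj * zj for kj, zj in zip(k, z)) / sum(k))
    return out

def partial_linear_beta(y, x1, x2, h):
    """Estimate beta by regressing Y - E(Y|X2) on X1 - E(X1|X2)."""
    ey = nw_at_data(x2, y, h)
    ex1 = nw_at_data(x2, x1, h)
    ry = [a - b for a, b in zip(y, ey)]
    rx = [a - b for a, b in zip(x1, ex1)]
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
```

When the nonparametric component is constant, the residual of $Y$ is exactly $\beta$ times the residual of $X_1$, so the function returns $\beta$ up to rounding error; this is a convenient check before applying it to real data.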

Optimal instrumental variables (IV) estimates in nonlinear equation systems, proposed by Newey (1990), are obtained by computing the optimal instruments using our routines. The input, in this case, is the vector of derivatives of the error function evaluated at some root-n-consistent preliminary estimate, together with the observations of the regressors. The output is the vector of optimal instruments. Our program also includes a routine for computing optimal instruments by resampling, as proposed by Robinson (1990). Once optimal instruments are available, instrumental variables estimates can be computed using TSP.

Density derivative estimates are needed when computing average derivatives of regression functions, as suggested by Powell et al (1989). Some nonparametric and semiparametric test procedures, e.g. Robinson (1989), require $f_H(X_i)$, $P_H(X_i)$, and their derivatives.
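As an illustration of how density derivative estimates feed into an average derivative, the Python sketch below computes a density-weighted average derivative of the form $\hat{\delta} = -(2/n) \sum_{i=1}^{n} Y_i \hat{f}'_{-i}(X_i)$, in the spirit of Powell et al (1989). The scalar regressor, the Gaussian kernel, the leave-one-out normalization by $(n-1)h^2$, and the function names are assumptions of the example rather than a description of the routines.

```python
import math

def gaussian_kernel_derivative(u):
    """Derivative of the standard normal density: K'(u) = -u * phi(u)."""
    return -u * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def density_derivative_at_data(x, h):
    """Leave-one-out estimates of f'(X_i) for a scalar regressor."""
    n = len(x)
    return [
        sum(gaussian_kernel_derivative((xi - xj) / h)
            for j, xj in enumerate(x) if j != i) / ((n - 1) * h * h)
        for i, xi in enumerate(x)
    ]

def density_weighted_average_derivative(x, y, h):
    """-(2/n) * sum_i Y_i * f'_{-i}(X_i)."""
    fprime = density_derivative_at_data(x, h)
    return -2.0 * sum(yi * fi for yi, fi in zip(y, fprime)) / len(x)
```

Since the kernel derivative is an odd function, the estimated derivatives sum to zero over the sample, so a constant response yields an estimate of zero, as it should for a flat regression curve.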

ACKNOWLEDGEMENTS

These routines were written in the summer of 1990, when I was visiting the London School of Economics. This work was supported by the Economic and Social Research Council (ESRC), reference number: R00023411.

REFERENCES

Carroll, R.J. (1982): "Adapting for heteroscedasticity in linear models", Annals of Statistics 10, 1224-1233.

Devroye, L. (1978): "Uniform convergence of nearest neighbor regression function estimators and their application in optimization", IEEE Transactions on Information Theory IT-24, 142-151.

Friedman, J.H., F. Baskett and L.J. Shustek (1975): "An algorithm for finding nearest neighbors", IEEE Transactions on Computers C-24, 1149-1158.

Robinson, P.M. (1988): "Root-n-consistent semiparametric regression", Econometrica 56, 931-954.

Robinson, P.M. (1989): "Hypothesis testing in semiparametric and nonparametric models for econometric time series", Review of Economic Studies 56, 511-534.

Robinson, P.M. (1990): "Best nonlinear three-stage least squares of certain econometric models", Econometrica.

Newey, W. (1990): "Efficient instrumental variable estimation of nonlinear models", Econometrica 58, 809-837.

Powell, J.L., J.H. Stock and T.M. Stoker (1989): "Semiparametric estimation of index coefficients", Econometrica 57, 1403-1430.

