A Necessary Condition for Semiparametric

Efficiency of Experimental Designs

Hisatoshi Tanaka

Waseda Institute of Political Economy (WINPEC)

Waseda University

Tokyo, Japan

WINPEC Working Paper Series No.E2024

March 2021

A Necessary Condition for Semiparametric

Efficiency of Experimental Designs

Hisatoshi Tanaka

School of Political Science and Economics, Waseda University

Shinjuku, Tokyo 169-8050, Japan

hstnk@waseda.jp

Abstract. Efficiency of estimation depends not only on the method of estimation, but also on the distribution of the data. In statistical experiments, statisticians can at least partially design the data generating process to achieve high estimation performance. In this paper, a necessary condition for a semiparametrically efficient experimental design is proposed. A formula to determine the efficient distribution of the input variables is derived. An application to the optimal bid design problem of contingent valuation survey experiments is presented.

Keywords: Optimal Design · Semiparametric Efficiency · Binary Response Model · Contingent Valuation Survey Experiments

1 Introduction

In this paper, a class of simple statistical experiments described by a 4-tuple,

    E = {(µ, ν, ρ, φ) : µ ∈ M, ν ∈ N},    (1)

is investigated, where M is a set of probability measures on (W, A), N is a set of probability measures on (X, B), ρ is a measurable map from W × X to (Y, C), and φ is a functional on M. In every experiment (µ, ν, ρ, φ) ∈ E, an input x is drawn from ν, the output y = ρ(ω, x) with ω ∼ µ is observed, and the value of φ(µ) is estimated from (x, y).

For example, imagine that there exist n light bulbs, whose lifetimes ω_1, . . . , ω_n are i.i.d. random variables distributed according to µ. In order to estimate the expected lifetime φ(µ) = Eω, the following experiment is conducted. First, all n bulbs are turned on at time 0. Second, one of the bulbs is sampled without replacement at time x and its status is observed. If the sampled bulb is alive, y is set to 1; otherwise, y is set to 0. The procedure is repeated n times until all of the bulbs are sampled. Finally, data of n independent pairs (x_1, y_1), . . . , (x_n, y_n) are obtained, and Eω or any other moment of ω can be consistently estimated by existing efficient estimation methods, such as nonparametric maximum likelihood estimation.
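The current-status experiment above can be simulated in a few lines. The following sketch assumes, purely for illustration, exponential lifetimes with Eω = 2 and uniform inspection times on [0, T]; the crude estimator T·ȳ is not the efficient one, but it shows that the pairs (x_i, y_i) do identify Eω.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 1000, 6.0
omega = rng.exponential(scale=2.0, size=n)   # lifetimes ω_i ~ µ (illustrative: Eω = 2)
x = rng.uniform(0.0, T, size=n)              # inspection times x_i ~ ν (illustrative: uniform)
y = (omega > x).astype(int)                  # y_i = 1 iff bulb i is still alive at time x_i

# With ν uniform on [0, T], E[y] = (1/T) ∫₀ᵀ P(ω > t) dt, so T·ȳ estimates
# ∫₀ᵀ P(ω > t) dt, which is close to Eω = ∫₀^∞ P(ω > t) dt when T is large.
est = T * y.mean()
print(est)                                   # ≈ 2, up to finite-T bias and sampling noise
```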

To be noted here is that the efficiency of the estimation depends not only on the estimation method, but also on the distribution ν of x_1, . . . , x_n. In the extreme case where x_1 = · · · = x_n = 0, the trivial outcomes y_1 = · · · = y_n = 1 will be obtained unless some of the bulbs have initial failures. In the opposite extreme case where x_1 = · · · = x_n = +∞, y_1 = · · · = y_n = 0 will occur with probability one. In both cases, the data are so poorly informative that consistent estimation of Eω is impossible. To find the best distribution ν of x, with which the experiment produces the most informative data, is therefore an interesting problem.
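The loss of information at the degenerate design is easy to see in simulation. A minimal sketch, again assuming exponential lifetimes for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
omega = rng.exponential(scale=2.0, size=5000)  # lifetimes (illustrative exponential µ)

# Degenerate design ν = δ_0: every bulb is inspected at time 0.
y_zero = (omega > 0.0).astype(int)
print(y_zero.mean())     # 1.0 — constant outputs carry no information about µ

# Spread-out design: inspection times over [0, 6] produce a mix of 0s and 1s.
x = rng.uniform(0.0, 6.0, size=5000)
y = (omega > x).astype(int)
print(y.mean())          # strictly between 0 and 1 — the data vary with µ
```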

The paper is organized as follows. In Section 2, the problem of the paper is formally stated. For this purpose, the geometric theory of semiparametric estimation is introduced. In this theory, every statistical model is considered as a point on an infinite-dimensional manifold, and the efficient design is formulated as a minimizer of the Fisher-information norm of the gradient of a functional on the manifold. In Section 3, a necessary condition for the efficient design is proposed. In Section 4, application examples of the main theorem are given. In particular, the optimal bid design problem of contingent valuation survey experiments is solved. In Section 5, results from small Monte Carlo simulations are reported. It is numerically confirmed that efficiently designed estimations outperform the alternatives even with small samples.

2 The Model

2.1 The tangent space of a statistical manifold

In this section, the geometric theory of semiparametric estimation is introduced to formulate the efficient design problem. The terms and definitions given in the following are according to [12]. Equivalent definitions are also found in [1], [2], [3], and [11].

Let µ be a probability measure on (W, A). Let M be a set of probability measures which are absolutely continuous with respect to µ. A map t ↦ µ_t from (−ϵ, ϵ) ⊂ R to M such that µ_0 = µ is differentiable in quadratic mean at t = 0 if there exists α ∈ L_2(µ) such that

    lim_{t→0} ∫ [ (√(dµ_t) − √(dµ)) / t − (1/2) α √(dµ) ]² = 0.    (1)

Proposition 1. A map t ↦ µ_t is differentiable in quadratic mean at t = 0 if (i) the map t ↦ ℓ_t(ω) := (dµ_t/dµ)(ω) is continuously differentiable on (−ϵ, ϵ) and if (ii) the map t ↦ ∫ (ℓ̇_t/ℓ_t)² dµ_t is continuous on (−ϵ, ϵ), where ℓ̇_t(ω) = (dℓ_t/dt)(ω). Under conditions (i) and (ii), ℓ̇_0 becomes a tangent vector of M at µ.

Proof. See, e.g., Proposition 1 on page 13 of [3]. ⊓⊔

A collection of those differentiable maps t ↦ µ_t is denoted by M(µ). A tangent space T_µM of M at µ is a set of tangent vectors α as in (1). A tangent bundle TM relates each µ with T_µM. A pair (M, TM) is a statistical manifold, which is an infinite-dimensional analog of a standard finite-dimensional manifold. On (M, TM), the Fisher-information metric µ ↦ ⟨·, ·⟩_µ is defined by

    ⟨α, α′⟩_µ = ∫_W α α′ dµ    (2)

for every α and α′ in T_µM. The Fisher-information norm ∥·∥_µ is also given by ∥α∥_µ = ⟨α, α⟩_µ^{1/2}. The following proposition characterizes T_µM.

Proposition 2 ([10], [13]). Let T̄_µP(W) be the closure of a tangent space T_µP(W) with respect to ∥·∥_µ. Then

    T̄_µP(W) = L_2^0(µ) := { α ∈ L_2(µ) : ∫ α dµ = 0 }.    (3)

Proof. Choose an arbitrary α ∈ L_2^0(µ) and M > 0. Let α_M^0 = α_M − ∫ α_M dµ, where α_M = α · 1{|α| ≤ M}. Define a map t ↦ µ_t by

    ℓ_t = dµ_t/dµ = exp( t α_M^0 − γ_t ),   γ_t = log ∫ exp( t α_M^0 ) dµ.    (4)

Since α_M^0 is bounded (|α_M^0| ≤ 2M), (4) is well-defined and t ↦ ℓ_t(ω) becomes continuously differentiable with derivative

    ℓ̇_t(ω) = (d/dt)(dµ_t/dµ)(ω) = [ α_M^0(ω) − ( ∫ α_M^0 exp(t α_M^0) dµ ) / ( ∫ exp(t α_M^0) dµ ) ] exp( t α_M^0(ω) − γ_t )    (5)

at every ω ∈ W. The map t ↦ ∫ (ℓ̇_t/ℓ_t)² dµ_t is also well-defined and continuous in t ∈ (−ϵ, ϵ), hence t ↦ µ_t is differentiable in quadratic mean at t = 0 with derivative ℓ̇_0 = α_M^0. Let M ↑ ∞; then ∥α − α_M^0∥_µ → 0. Thus, α ∈ T̄_µM is shown.
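The exponential-tilt construction in (4) can be checked numerically on a discrete µ: the path stays inside the set of probability measures, and its derivative at t = 0 recovers the centered direction α. A minimal sketch; the five-point measure and the direction α are arbitrary choices for illustration:

```python
import numpy as np

# A five-point measure µ and a bounded direction α, centered so that ∫ α dµ = 0.
w = np.array([0.1, 0.2, 0.4, 0.2, 0.1])       # µ-weights on five atoms
alpha = np.array([1.0, -0.5, 0.0, 0.25, -0.3])
alpha = alpha - np.dot(w, alpha)              # center: now ∫ α dµ = 0

def ell(t):
    """ℓ_t = dµ_t/dµ = exp(tα − γ_t) with γ_t = log ∫ exp(tα) dµ, as in (4)."""
    gamma_t = np.log(np.dot(w, np.exp(t * alpha)))
    return np.exp(t * alpha - gamma_t)

# Each µ_t is a probability measure: ∫ ℓ_t dµ = 1 for any t.
print(np.dot(w, ell(0.3)))                    # 1.0 up to rounding

# A central difference at t = 0 recovers the score ℓ̇_0 = α.
h = 1e-6
ell_dot0 = (ell(h) - ell(-h)) / (2 * h)
print(np.max(np.abs(ell_dot0 - alpha)))       # ≈ 0
```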

On the other hand, for every (µ_t)_{t∈(−ϵ,ϵ)} ∈ M(µ) with tangent vector α ∈ L_2(µ), let ξ_k = k( √(dµ_{1/k}) − √(dµ) ) − (α/2)√(dµ) for k ∈ N. Then, as k → ∞, ∫ ξ_k² → 0, and, since ∫ dµ_{1/k} − ∫ dµ = 0 implies ∫ (√(dµ_{1/k}) − √(dµ)) · 2√(dµ) = −∫ (√(dµ_{1/k}) − √(dµ))²,

    | ∫ α dµ | = | ∫ (α/2)√(dµ) · 2√(dµ) |
               ≤ ( ∫ ξ_k² )^{1/2} · 2( ∫ dµ )^{1/2} + (1/k) ∫ ( ξ_k + (α/2)√(dµ) )²
               ≤ o(1) + (1/k)( o(1) + (1/2)∥α∥_µ )² → 0,

which implies T_µM ⊂ L_2^0(µ). ⊓⊔


2.2 The score operator

Let N be a class of probability measures on (X, B), and let P be a class of probability measures on (X × Y, σ(B × C)). For every P ∈ P, let P(P) be a collection of differentiable maps t ∈ (−ϵ, ϵ) ↦ P_t ∈ P such that P_0 = P. Let T_P P be the tangent space of P at P. The tangent bundle TP relates each P with T_P P. The Fisher-information norm on (P, TP) is ∥·∥_P such that ∥β∥_P = ( ∫ β(x, y)² dP(x, y) )^{1/2} for every β ∈ T_P P. The closure of T_P P with respect to ∥·∥_P is L_2^0(P), as shown in Proposition 2.

Given a measurable map ρ : W × X → Y, at every ν ∈ N, a map ρ_ν : M → P defined by

    ρ_ν(µ)(D) = ∫∫ 1{(x, ρ(ω, x)) ∈ D} µ(dω) ν(dx),   D ∈ σ(B × C),    (6)

is a differentiable map between (M, TM) and (P, TP). To see this, note that

    (dρ_ν(µ_t)/dρ_ν(µ))(x, y) = E_{µ,ν}[ (dµ_t/dµ)(ω) | x, y ]    (7)

because

    ρ_ν(µ_t)(D) = ∫∫ (dµ_t/dµ)(ω) 1{(x, ρ(ω, x)) ∈ D} µ(dω) ν(dx)
                = E_{µ,ν}[ E_{µ,ν}[ (dµ_t/dµ)(ω) | x, y ] 1{(x, y) ∈ D} ].

Particularly, when dµ_t/dµ = exp(tα − γ_t), where α ∈ L_2^0(µ) is bounded and γ_t = log ∫ exp(tα) dµ, the map t ↦ ℓ_t^ρ(x, y) := (dρ_ν(µ_t)/dρ_ν(µ))(x, y) is continuously differentiable with derivative

    ℓ̇_t^ρ(x, y) := (d/dt)(dρ_ν(µ_t)/dρ_ν(µ))(x, y)
                 = E_{µ,ν}[ ( α − ( ∫ α exp(tα) dµ ) / ( ∫ exp(tα) dµ ) ) exp( tα − γ_t ) | x, y ].    (8)

Since t ↦ ∫ (ℓ̇_t^ρ/ℓ_t^ρ)² dρ_ν(µ_t) is continuous, t ↦ ρ_ν(µ_t) is a differentiable path on P with tangent vector ℓ̇_0^ρ(x, y) = E_{µ,ν}(α | x, y).

The derivative of ρ_ν : M → P at µ is the score operator (dρ_ν)_µ : T_µM → L_2(ρ_ν(µ)), which maps every α ∈ T_µM to

    ((dρ_ν)_µ α)(x, y) = E_{µ,ν}(α | x, y),   (x, y) ∈ X × Y.    (9)
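For the light-bulb experiment of Section 1, where y = 1{ω > x}, the score operator (9) is simply the conditional expectation of α(ω) given which side of x the lifetime fell on. A minimal Monte Carlo sketch; the exponential µ with Eω = 2 and the direction α(ω) = ω − 2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
omega = rng.exponential(scale=2.0, size=200_000)  # draws from µ (illustrative choice)

def score(alpha, x, y):
    """Monte Carlo estimate of ((dρ_ν)_µ α)(x, y) = E_µ[α(ω) | 1{ω > x} = y]."""
    mask = (omega > x) if y == 1 else (omega <= x)
    return alpha(omega[mask]).mean()

alpha = lambda v: v - 2.0          # a direction with ∫ α dµ = 0 (since Eω = 2 here)

# By memorylessness of the exponential, E[ω − 2 | ω > 1] = (1 + 2) − 2 = 1.
print(score(alpha, 1.0, 1))        # ≈ 1

# The image (dρ_ν)_µ α again has mean zero under ρ_ν(µ): at any fixed x,
# P(y=1)·E[α | y=1] + P(y=0)·E[α | y=0] = ∫ α dµ = 0.
p = (omega > 1.0).mean()
print(p * score(alpha, 1.0, 1) + (1 - p) * score(alpha, 1.0, 0))  # ≈ 0
```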

Then, a tangent space of the submanifold ρ_ν(M) := { ρ_ν(µ) ∈ P : µ ∈ M }, which is the set of statistical models to be estimated in experiment (µ, ν, ρ, φ), is the range of the score operator; that is,

    T_{ρ_ν(µ)} ρ_ν(M) = (dρ_ν)_µ(T_µM) = R((dρ_ν)_µ),    (10)
