Journal ArticleDOI

Optimal simultaneous detection and estimation under a false alarm constraint

01 May 1995-IEEE Transactions on Information Theory (IEEE)-Vol. 41, Iss: 3, pp 688-703
TL;DR: A multihypothesis testing framework for studying the tradeoffs between detection and parameter estimation (classification) for a finite discrete parameter set is developed and it is observed that Rissanen's order selection penalty method is nearly min-max optimal in some nonasymptotic regimes.
Abstract: This paper addresses the problem of finite sample simultaneous detection and estimation which arises when estimation of signal parameters is desired but signal presence is uncertain. In general, a joint detection and estimation algorithm cannot simultaneously achieve optimal detection and optimal estimation performance. We develop a multihypothesis testing framework for studying the tradeoffs between detection and parameter estimation (classification) for a finite discrete parameter set. Our multihypothesis testing problem is based on the worst case detection and worst case classification error probabilities of the class of joint detection and classification algorithms which are subject to a false alarm constraint. This framework leads to the evaluation of greatest lower bounds on the worst case decision error probabilities and a construction of decision rules which achieve these lower bounds. For illustration, we apply these methods to signal detection, order selection, and signal classification for a multicomponent signal in noise model. For two or fewer signals, an SNR of 3 dB, and signal space dimension of N=10, numerical results are obtained which establish the existence of fundamental tradeoffs between three performance criteria: probability of signal detection, probability of correct order selection, and probability of correct classification. Furthermore, based on numerical performance comparisons between our optimal decision rule and other suboptimal penalty function methods, we observe that Rissanen's (1978) order selection penalty method is nearly min-max optimal in some nonasymptotic regimes.

Summary (3 min read)

I. INTRODUCTION

  • Many statistical decision problems in engineering applications fall into one of two categories: detection and point estimation.
  • The first is the simple uncoupled design strategy, where detection performance is optimized under the false alarm constraint and the estimator is gated by this optimal detector.
  • This gives the form of the optimal estimator and optimal detector, and yields tight lower bounds on the worst case estimation and detection error probabilities which can be used to study tradeoffs.
  • The authors show that the optimal constrained classifier in the multiple-component signal example (1) has an equivalent form: compare the maximum of the sum of the log-likelihood function and an optimal penalty function of p to a threshold; if the threshold is exceeded, use this penalized log-likelihood to perform maximum-likelihood estimation.
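The penalized-likelihood structure in the last bullet can be sketched in a few lines. The snippet below is a toy illustration, not the paper's optimal rule: the linear penalty `beta * p`, the unit noise variance, and the projection-based log-likelihood are assumptions made for the sketch.

```python
import numpy as np

def penalized_detect_classify(y, signals, beta=1.0, thresh=5.0):
    """Toy penalized log-likelihood rule (illustrative, not the optimal penalty).

    y       : observation vector
    signals : list of candidate signal matrices, one per hypothesized order p
              (columns are the hypothesized components)
    beta    : hypothetical linear penalty weight on the order p
    thresh  : detection threshold
    Returns (detected, best_p), where best_p is the winning order, or None.
    """
    sigma2 = 1.0                                  # known noise variance (assumption)
    scores = []
    for p, S in enumerate(signals, start=1):
        # log-likelihood of y under "components S present", up to constants:
        # project y onto span(S) and measure the energy captured
        proj = S @ np.linalg.lstsq(S, y, rcond=None)[0]
        scores.append(np.dot(proj, proj) / (2 * sigma2) - beta * p)  # penalize order
    best = int(np.argmax(scores))
    if scores[best] > thresh:                     # gate: detect only above threshold
        return True, best + 1
    return False, None
```

A strong observation aligned with one component yields a detection with order 1; a zero observation is rejected and no classification is produced.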

A. Relation to Previous Work

  • Optimal coupled design strategies for detection and estimation have been studied by only a few authors.
  • Pioneering works along the lines of coupled design in simultaneous detection and estimation include the papers by Middleton and Esposito [14], [15], Fredriksen et al. [7], and Birdsall and Gobien [3].
  • Kelly et al. noted that the combined GLRT/maximum-likelihood strategy is optimal only for certain cases; the authors' work reinforces this point by specifying conditions for optimality of that strategy.
  • The min-max multiple hypothesis testing strategy presented in the paper can also be interpreted as a sequence of binary composite hypothesis tests, thereby providing a link to Stuller's paper and establishing the structure of optimal sequential binary tests.

probability of miss P_θ(M); and iii) probability of erroneous classification P_θ(EC)

  • Since P_θ(M) and P_θ(EC) generally vary as a function of θ, θ-uniform minimization of these probabilities is in general impossible and a different approach must be taken.
  • The weights {b_θ}_{θ∈Θ₀} can be regarded as unit normalized weights on the null states of nature θ ∈ Θ₀; the weights {q_j}_{j=1}^J can be regarded as unit normalized weights on the composite states of nature {Θ_j}_{j=1}^J; and the weights {c_θ/q_j}_{θ∈Θ_j} can be regarded as unit normalized weights on the states of nature θ ∈ Θ_j.

H₀^(b): X ~ f₀^(b)

  • The condition (18) says that for a specific b* the level-α constrained min-max test φ^(b*) for the reduced hypotheses (16) must also be of level α for the original hypotheses (2).
  • Under this condition, Theorem 1 states that the composite null hypothesis can be reduced to a simple null hypothesis.


  • Once such a reduction is achieved, the authors need merely consider constrained min-max tests for the hypotheses (16) with simple null hypothesis H₀^(b) and then select an appropriate b* to satisfy condition (18).
  • The following theorem specifies the form of constrained min-max tests for the set of hypotheses.
  • It is useful to compare the min-max optimal detector to the popular [26] ad hoc generalized likelihood ratio test (GLRT). The GLRT (41) is not a min-max optimal detector except in the unlikely event that the ratio of max_{θ∈Θ₀} f_θ and max_{θ∈Θ₁} f_θ is equivalent to the ratio of weighted average densities in (40).
  • More specifically, these conditions state that φ^(b,c) is min-max optimal if it equalizes the decision error probabilities over all of the alternatives.
  • In some cases, equalization of all of the decision error probabilities is not possible.
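The equalization idea can be illustrated numerically. The setup below is a hypothetical toy problem (two Gaussian alternatives against a standard normal null), not the paper's example: a grid search over the alternative weights shows that the min-max weight vector over-weights the weaker alternative so as to approximately equalize the two decision error probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
mus = np.array([1.0, 2.0])   # two simple alternatives N(mu_j, 1); the null is N(0, 1)
alpha = 0.1                  # false alarm level
n = 200_000
x0 = rng.normal(0.0, 1.0, n)               # Monte Carlo samples under H0
xs = [rng.normal(m, 1.0, n) for m in mus]  # samples under each alternative

def pdf(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

def worst_error(c):
    """Worst-case decision error of the weighted rule
    'decide argmax_j c_j f_j(x) whenever max_j c_j f_j(x) > lam f_0(x)',
    with lam calibrated empirically to meet the level-alpha constraint."""
    stat = lambda x: np.maximum(c[0] * pdf(x, mus[0]), c[1] * pdf(x, mus[1])) / pdf(x, 0.0)
    lam = np.quantile(stat(x0), 1.0 - alpha)
    errs = []
    for j, x in enumerate(xs):
        f = [c[0] * pdf(x, mus[0]), c[1] * pdf(x, mus[1])]
        correct = (stat(x) > lam) & (f[j] >= f[1 - j])  # detected and classified as j
        errs.append(1.0 - correct.mean())               # miss + misclassification
    return max(errs)

grid = [(w, 1.0 - w) for w in np.linspace(0.05, 0.95, 19)]
best = min(grid, key=worst_error)
# the weaker alternative (mu = 1) receives the larger weight under the min-max criterion
```

With equal weights the error at the weak alternative dominates; tilting the weights toward it lowers the worst-case error, which is the equalization behavior the bullet describes.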

Remark 6:

  • The dimension of the weight space over which a search must be performed to determine the optimal weights is the sum of the number of simple alternative hypotheses plus the number of the simple hypotheses composing the null hypothesis.
  • For a composite null hypothesis, this latter number can be very large, which severely complicates the computation of the value function.
  • Such use of invariance principles was described in previous work [1].
  • In some applications it is possible to efficiently parameterize the weights and significantly reduce the number of unknowns in the weight space, facilitating the search for optimal weights satisfying the conditions of Corollary 2.
  • One important case where such a reduction is possible is the case where the decision problem is permutation-invariant [2], in which case the distribution of the likelihood ratio is invariant to permutations in the indices of the hypotheses.

A. Detection and Classification of Changes in a Distribution

  • The objective is to detect and identify any outliers.
  • It would be very interesting to compare the error performance of the algorithms proposed in these papers to the achievable lower bounds specified by the finite sample min-max decision rules described below.
  • We will assume that the likelihood ratios have continuous distributions under H₀, so that randomization is not needed to achieve the false alarm constraint.
  • Furthermore, since the above decision rules equalize P_θ(M) and P_θ(EC), respectively, Corollary 2 asserts that these two rules are in fact the DO and CO rules of level α.
  • The DO rule in (50) is a weighted average likelihood ratio test and is not equivalent to the generalized likelihood ratio test (GLRT).
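The distinction in the last bullet can be seen on a toy problem. The two-alternative Gaussian setup and the weights below are assumptions for illustration: with asymmetric weights, the weighted-average likelihood ratio test and the GLRT carve out different acceptance regions, and the weighted test has higher power on the heavily weighted alternative.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1
mus = np.array([-2.0, 2.0])      # two simple alternatives N(mu_j, 1); null is N(0, 1)
q = np.array([0.7, 0.3])         # hypothetical weights favoring theta = -2
n = 500_000
x0 = rng.normal(size=n)          # samples under H0
x1 = rng.normal(mus[0], 1.0, n)  # samples under the heavily weighted alternative

def stats(x):
    f = np.exp(-0.5 * (x[:, None] - mus) ** 2)  # likelihood of each alternative
    f0 = np.exp(-0.5 * x ** 2)
    return (f @ q) / f0, f.max(axis=1) / f0     # weighted-average LR, GLRT statistic

a0, g0 = stats(x0)
t_avg = np.quantile(a0, 1.0 - alpha)  # level-alpha threshold for the averaged test
t_glr = np.quantile(g0, 1.0 - alpha)  # level-alpha threshold for the GLRT
a1, g1 = stats(x1)
power_avg = (a1 > t_avg).mean()
power_glr = (g1 > t_glr).mean()
```

Both tests are calibrated to the same false alarm level, yet the weighted-average rule detects the favored alternative more often, so the two rules are genuinely different.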

Proof of Theorem 2

  • But given assertion i), assertion ii) (and thus the existence of $*) follows directly due to the fact that the convex set Chf-is compact.
  • Hence it remains to justify assertion i).
  • The authors must show that there exists a decision rule -4* E D, that achieves the infimum value incurred by using the test function.
  • To conclude that the min-max problem on the right-hand side of (76) admits a solution, the authors will need to use the min-max theorem [6, sec.
  • Hence the convexity and compactness of the constrained risk set S,.


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 41, NO. 3, MAY 1995

Optimal Simultaneous Detection and Estimation Under a False Alarm Constraint

Bulent Baygun, Member, IEEE, and Alfred O. Hero III, Member, IEEE

Abstract- This paper addresses the problem of finite sample simultaneous detection and estimation which arises when estimation of signal parameters is desired but signal presence is uncertain. In general, a joint detection and estimation algorithm cannot simultaneously achieve optimal detection and optimal estimation performance. In this paper we develop a multihypothesis testing framework for studying the tradeoffs between detection and parameter estimation (classification) for a finite discrete parameter set. Our multihypothesis testing problem is based on the worst case detection and worst case classification error probabilities of the class of joint detection and classification algorithms which are subject to a false alarm constraint. This framework leads to the evaluation of greatest lower bounds on the worst case decision error probabilities and a construction of decision rules which achieve these lower bounds. For illustration, we apply these methods to signal detection, order selection, and signal classification for a multicomponent signal in noise model. For two or fewer signals, an SNR of 3 dB, and signal space dimension of N = 10, numerical results are obtained which establish the existence of fundamental tradeoffs between three performance criteria: probability of signal detection, probability of correct order selection, and probability of correct classification. Furthermore, based on numerical performance comparisons between our optimal decision rule and other suboptimal penalty function methods, we observe that Rissanen's order selection penalty method is nearly min-max optimal in some nonasymptotic regimes.

Index Terms- Simultaneous decisions, fundamental tradeoffs, min-max criterion, order selection, signal classification, signal detection, likelihood ratio.
I. INTRODUCTION

Many statistical decision problems in engineering applications fall into one of two categories: detection and point estimation. In the detection problem an observed random quantity may consist of "noise alone" or "signal masked by noise;" the objective is to decide if there is a signal in the observation subject to a constraint on false alarm. In the point estimation problem a signal which is known to be present in the observations has an unknown feature represented by a parameter; the objective is to decide on the parameter value. However, one frequently encounters applications where estimation has to be performed under uncertainty of signal presence. These include applications such as fault detection and diagnosis in dynamical system control [24], target detection and direction finding with an array of sensors [27], image and speech segmentation [13], and digital communications [18]. The associated decision problem is called simultaneous or joint detection and estimation.

If we constrain the probability of false alarm to be equal to α, one can consider two approaches to the design of decision rules for joint detection and estimation. The first is the simple uncoupled design strategy where detection performance is optimized under the false alarm constraint and the estimator is gated by this optimal detector. In this case, one can implement a conditionally optimal estimator which produces an estimate only if the optimal detector decides that the signal is present. While this uncoupled strategy guarantees optimal detection performance, in general there is no guarantee that the gated estimation performance will be acceptable. The second approach is the coupled design strategy where estimation performance is directly optimized under the false alarm constraint. As in the uncoupled design, the false alarm constraint prescribes a gated estimator. However, while this gating is optimal for estimation, unlike the uncoupled design it is generally not optimal for detection. Note that under both the coupled and uncoupled strategies the false alarm probabilities are identical. However, while in the uncoupled case the false alarms are generated in such a way as to minimize their impact on detection performance, in the coupled case these false alarms are generated to minimize their impact on estimation performance. The uncoupled strategy provides an upper bound on the detection performance while the coupled strategy provides an upper bound on estimation performance. By comparing the detection/estimation performance of the uncoupled detection-optimal strategy to the detection/estimation performance of the coupled estimation-optimal strategy we can study the fundamental tradeoff between optimal detection and optimal estimation subject to a false alarm constraint.

This paper provides a framework for studying the tradeoffs between detection and estimation based on the worst case detection and worst case estimation error probabilities of the class of simultaneous detection and estimation rules for a finite discrete parameter space. We then formulate and solve a constrained min-max multihypothesis testing problem with nonstandard cost structure. This gives the form for the optimal estimator and optimal detector and gives tight lower bounds on the worst case estimation and detection error probabilities which can be used to study tradeoffs.

Manuscript received August 16, 1993; revised November 30, 1994. The work of one of the authors (B. Baygun) was supported in part by a graduate fellowship from Mikes, Inc., throughout this research. The material in this paper was presented in part at ICASSP-92, San Francisco, CA, March 23-26, 1992. B. Baygun is with Schlumberger-Doll Research, Ridgefield, CT 06877 USA. A. O. Hero III is with the Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109 USA. IEEE Log Number 9410399.
0018-9448/95$04.00 © 1995 IEEE
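The uncoupled "detect, then estimate" strategy described above can be sketched as follows. The scalar Gaussian model, the one-sided alternative, and the level are assumptions for the sketch, not the paper's signal model: a level-α Neyman-Pearson detector gates a maximum-likelihood estimate, so an estimate is released only when the detector fires.

```python
import math

def norm_ppf(p):
    """Standard normal quantile via bisection (stdlib only)."""
    lo, hi = -10.0, 10.0
    while hi - lo > 1e-9:
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return lo

def gated_estimate(x, alpha=0.05):
    """Uncoupled rule for X ~ N(theta, 1), H0: theta = 0 vs theta > 0:
    detect with the one-sided level-alpha test, and only then return the
    (gated) maximum-likelihood estimate theta_hat = x."""
    threshold = norm_ppf(1.0 - alpha)  # z_{1-alpha}
    if x > threshold:
        return True, x                 # signal declared present: estimate released
    return False, None                 # no detection: no estimate produced
```

For example, `gated_estimate(3.0)` returns `(True, 3.0)`, while `gated_estimate(0.5)` suppresses the estimate, which is exactly the gating behavior whose estimation quality the paper shows can be suboptimal.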

To illustrate our results, we focus on the following multicomponent signal in noise model. A measured waveform Y consists of either a compound signal in additive noise, or noise alone. If present, the signal is the sum of p randomly scaled waveforms (components), out of a possible N equal-power orthogonal waveforms {S₁, ..., S_N} which are known a priori. That is, the signal is known to lie in an N-dimensional subspace, called the signal space, whose basis is {S₁, ..., S_N}. Hence the observation model has the form

  Y = Σ_{l=1}^{p} a_l S_{i_l} + W    (1)

where W denotes the additive noise and the a_l are the random scale factors. Here both the number p and the identity (indices) of the p signal components {S_{i₁}, ..., S_{i_p}} are unknown. Assume that it is known a priori that p is upper-bounded by some given constant P, P ≤ N. We define three related objectives: i) signal detection, which is to decide if p > 0; ii) signal power estimation (order selection) which, if p > 0, is to specify the actual number p ∈ {1, ..., P} of signal components; and iii) signal component estimation (classification) which, if p > 0, is to identify the p signal components present. These objectives arise in a number of applications including telecommunications, harmonic retrieval, surveillance, and air-traffic control.

In the context of the multicomponent signal model (1), our results yield the following structure for the optimal constrained rules. The optimal constrained classifier uses a set of

  M = Σ_{p=1}^{P} (N choose p)

likelihood ratios (one for each hypothesized set {S_{i₁}, ..., S_{i_p}} of signal components, i₁, ..., i_p ∈ {1, ..., N}, p = 1, ..., P) to implement a weighted generalized-likelihood ratio test, with randomized threshold, followed by a weighted maximum-likelihood estimator. The optimal constrained order selector uses a set of P weighted averages of (N choose p) likelihood ratios, p = 1, ..., P, each average corresponding to a fixed number p of signal components. The optimal constrained detector compares a weighted average of all M likelihood ratios to a threshold. In each of the above three cases the weights and the detection threshold are determined by 1) the solution to a related nonlinear optimization problem; and 2) the false alarm constraint α.

We show that the optimal constrained classifier in the multiple-component signal example (1) has an equivalent form: compare the maximum of the sum of the log-likelihood function and an optimal penalty function of p to a threshold, and if the threshold is exceeded use this penalized log-likelihood to perform maximum-likelihood estimation. This penalized likelihood structure is closely related to Akaike's AIC [27] and Rissanen's MDL [19] order selection criteria. The common feature is that the optimal constrained classifier, AIC, and MDL all penalize the log-likelihood for overestimation of p. Unlike the AIC and MDL penalties, the penalty associated with the optimal constrained classifier ensures optimal worst case estimation performance in the finite sample regime. Furthermore, this "optimal penalty" takes specific account of a false alarm constraint. We perform a numerical study in which we construct the optimal weight functions for optimal detection, order selection, and classification, implement the optimal likelihood ratio tests, and analyze the relative performances for the case of p = 2 or fewer signal components. In this manner, we establish the existence of significant tradeoffs between optimal detection, optimal estimation, and optimal order selection. This study also establishes the remarkable result that the MDL order selection penalty is nearly optimal, in the sense of achieving the finite sample min-max constrained classification performance attained with our optimal penalty function, when SNR is 3 dB, signal space dimension is N = 10, and the number of independent snapshots is between 18 and 26.

A. Relation to Previous Work

Optimal coupled design strategies for detection and estimation have been studied by only a few authors. Pioneering works along the lines of coupled design in simultaneous detection and estimation include the papers by Middleton and Esposito [14], [15], Fredriksen et al. [7], and Birdsall and Gobien [3]. The common ground in each of these studies is the Bayesian viewpoint; that is, the parameters are assigned prior probabilities so that average performance can be optimized. Kelly et al. [10], [11] studied the problem of simultaneous detection and estimation using a combination of a generalized-likelihood ratio test and a maximum-likelihood classifier. They noted that this strategy is optimal only for certain cases; our work reinforces this point by specifying conditions for optimality of their strategy. Stuller [23] extended the generalized-likelihood ratio test approach to multiple composite hypothesis testing, by breaking the problem into a sequence of binary composite hypothesis tests. He provided rather stringent sufficient conditions for min-max optimality of this strategy, pointing out that the question of min-max optimality in the general case is yet to be investigated. The min-max multiple hypothesis testing strategy presented in our paper can also be interpreted as a sequence of binary composite hypothesis tests, thereby providing a link to Stuller's paper and establishing the structure of optimal sequential binary tests.

An outline of the paper is as follows. Section II introduces the statistical framework that will be used in this paper. Section III provides theoretical results whose proofs are contained in the Appendix. In Section V, we specialize the theory to three different problems: outlier detection and identification, detection and classification of a step change, and detection and parameter estimation of a multicomponent signal in noise.

II. PROBLEM STATEMENT

A parametric statistical experiment [9] is defined as the indexed probability space (Ω, 𝒜, P_θ), where θ is a parameter lying in a parameter space Θ, Ω is the set of possible outcomes of the experiment, 𝒜 is a sigma algebra consisting of subsets of Ω, and P_θ is a probability measure defined on 𝒜. The parameter space Θ summarizes all of the uncertainty in the probability model P_θ for the experiment. It is important to emphasize that θ is a fixed nonrandom parameter.
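The number M of candidate component sets in the paper's numerical example can be checked directly; for N = 10 and P = 2 the optimal constrained detector averages M = 55 likelihood ratios.

```python
from math import comb
from itertools import combinations

N, P = 10, 2  # signal space dimension and maximum order from the paper's example

# one likelihood ratio per hypothesized component set {S_i1, ..., S_ip}
M = sum(comb(N, p) for p in range(1, P + 1))

# equivalently, enumerate the hypothesized sets themselves
sets = [s for p in range(1, P + 1) for s in combinations(range(N), p)]
assert len(sets) == M  # 10 singletons + 45 pairs = 55 hypothesized sets
```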

Define the finite partition, called a (J+1)-ary partition, {Θ₀, ..., Θ_J} of Θ. For fixed θ = θ_true, denoted the "true θ," let X be a random variable defined on Ω and taking values in a set 𝒳 called the observation space. We assume that X has a probability density function f_θ(x) with respect to some dominating measure μ. Let θ_true be contained in partition element Θ_j for a particular j ∈ {0, ..., J}. The objective is to correctly decide on the partition element Θ_j containing θ_true based on a realization X(ω) = x of X. We can express this classification problem in terms of testing between the J+1 exhaustive and mutually exclusive hypotheses [25]

  H₀: X ~ f_θ, θ ∈ Θ₀
  ⋮
  H_J: X ~ f_θ, θ ∈ Θ_J.    (2)

When θ_true is contained in Θ_j, the hypothesis H_j is said to be true and the other hypotheses are said to be false. In this case, H_j is said to be the "true state of nature." If the partition elements Θ₀, ..., Θ_J are single-point sets, then the hypotheses (2) are called simple hypotheses. Otherwise, if a partition element Θ_l consists of more than one point θ, then specification of H_l does not specify a unique distribution P_θ and H_l is called a composite hypothesis. In the paper's notation a simple hypothesis is identified by the absence of an underscore, e.g., H_l.

We specialize our treatment to the case of a discrete parameter space Θ with K+1 elements denoted by indices {0, ..., K}. We will assume that Θ₀ corresponds to the set of K−M+1 parameters Θ₀ = {0, ..., K−M}, where M is a positive integer less than or equal to K. We identify two special partitions which will play an important role in the sequel. The binary partition {Θ₀, Θ₁}, where Θ₁ = {K−M+1, ..., K}, specifies a composite detection problem

  H₀: X ~ f_θ, θ ∈ {0, ..., K−M}
  H₁: X ~ f_θ, θ ∈ {K−M+1, ..., K}    (3)

where H₀ is called the null hypothesis and H₁ is called the alternative hypothesis. The (M+1)-ary partition {Θ₀, Θ₁, ..., Θ_M}, where Θ₁, ..., Θ_M are the single-point sets {K−M+1}, ..., {K}, respectively, specifies a joint detection-classification problem with simple alternatives:

  H₀: X ~ f_θ, θ ∈ {0, ..., K−M}
  H₁: X ~ f_θ, θ = K−M+1
  ⋮
  H_M: X ~ f_θ, θ = K.    (4)

The primary difference between detection (3) and joint detection-classification (4) is that decision strategies for detection can only be penalized for erroneously deciding on the composite alternative H₁, while decision strategies for joint detection-classification can bear an additional penalty for erroneous classification among the alternatives H₁, ..., H_M.

The set of decision strategies for the general (J+1)-ary hypothesis testing problem (2) is specified by the set of test functions [25].

Definition 1: A test function φ = [φ₀, ..., φ_J]ᵀ for the multiple hypotheses H₀, ..., H_J is a (J+1)-dimensional vector function on 𝒳 such that

  φ(x) ∈ [0,1]^(J+1) and Σ_{j=0}^{J} φ_j(x) = 1, ∀x ∈ 𝒳.

For a given realization X = x, φ_j(x) is the conditional probability of deciding H_j. Consequently, 1 − φ_j(x) is the conditional probability of not deciding H_j, and φ_j(x) + φ_i(x) is the conditional probability of deciding either H_j or H_i. The summation condition Σ_{j=0}^{J} φ_j(x) = 1 ensures that exactly one of H₀, ..., H_J must be decided.

Let φ = [φ₀, φ₁, ..., φ_M]ᵀ be an arbitrary test function for testing among the hypotheses H₀, H₁, ..., H_M. This test function defines a simultaneous detection-classification rule. Specifically, since detection is a binary decision between H₀: θ ∈ Θ₀ and H₁: θ ∈ Θ − Θ₀, where

  Θ − Θ₀ = ∪_{k=1}^{M} Θ_k

the first element φ₀ of φ specifies a binary test function φ^D for detection

  φ^D = [φ₀, 1 − φ₀]ᵀ.

On the other hand, define 𝒳_{H₁} as the set of x for which φ₀(x) ≠ 1; that is, for X = x ∈ 𝒳_{H₁} the decision H₁ occurs with nonzero probability. Then φ specifies an M-ary test function φ^C on 𝒳_{H₁} for classification, where φ_j(x)/(1 − φ₀(x)) is the conditional probability of classifying θ into Θ_j = {θ_j} given that X = x ∈ 𝒳_{H₁}. Conversely, if a test function φ^D = [φ₀^D, 1 − φ₀^D]ᵀ for detection and a test function φ^C = [φ₁^C, ..., φ_M^C]ᵀ for classification are available, a simultaneous detection-classification rule φ = [φ₀, ..., φ_M]ᵀ is easily constructed via the identification

  φ = [φ₀^D, (1 − φ₀^D)φ₁^C, ..., (1 − φ₀^D)φ_M^C]ᵀ.    (7)

We call φ a "gated" classification rule since the classification rule φ^C is enabled by the detection rule φ^D when 1 − φ₀^D ≠ 0, i.e., when signal detection can occur with nonzero probability.
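The identification (7) is easy to state in code. The sketch below builds a simultaneous detection-classification test function from a binary detection rule and an M-ary classification rule; the numeric values are arbitrary placeholders, not quantities from the paper.

```python
import numpy as np

def gate(phi0_D, phi_C):
    """Construct phi = [phi0_D, (1 - phi0_D) phi1_C, ..., (1 - phi0_D) phiM_C]
    as in (7).

    phi0_D : conditional probability of deciding H0 under the detection rule
    phi_C  : length-M vector of classification probabilities (sums to 1)
    """
    phi_C = np.asarray(phi_C, dtype=float)
    return np.concatenate(([phi0_D], (1.0 - phi0_D) * phi_C))

phi = gate(0.25, [0.5, 0.3, 0.2])
# phi is a valid test function: entries lie in [0, 1] and sum to 1, and
# classification is "enabled" only with probability 1 - phi0_D = 0.75
```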
The average performance of a particular test function φ is determined by i) probability of false alarm P_θ(FA); ii) probability of miss P_θ(M); and iii) probability of erroneous classification P_θ(EC):

  P_θ(FA) = E_θ[1 − φ₀], θ ∈ Θ₀
  P_θ(M) = E_θ[φ₀], θ ∉ Θ₀
  P_θ(EC) = E_θ[1 − φ_{r_J(θ)}], θ ∉ Θ₀    (8)

where r_J(θ) ∈ {0, ..., J} is the set partition function which takes the value j if θ ∈ Θ_j.

We will be interested in those test functions whose false alarm probability P_θ(FA) is less than or equal to a prespecified constant α ∈ [0,1] [25].

Definition 2: A test function φ is of level α if

  max_{θ∈Θ₀} E_θ[1 − φ₀] ≤ α    (9)

for a specified α ∈ [0,1].

The classical Neyman-Pearson criterion of signal detection [12] states that it is desirable to minimize the miss probability P_θ(M), θ ∉ Θ₀, subject to the constraint (9). On the other hand, in terms of signal classification, minimizing P_θ(EC), θ ∉ Θ₀, is desirable. However, since P_θ(M) and P_θ(EC) generally vary as a function of θ, θ-uniform minimization of these probabilities is in general impossible and a different approach must be taken.

III. CONSTRAINED MIN-MAX TESTS

For the purposes of establishing θ-uniform lower bounds on P_θ(M) and P_θ(EC) it makes sense to consider the form and performance of constrained min-max test functions of level α. Define the set D_α of all test functions φ = [φ₀, ..., φ_J]ᵀ of level α

  D_α = {φ: 𝒳 → [0,1]^(J+1), Σ_{j=0}^{J} φ_j = 1, max_{θ∈Θ₀} E_θ[1 − φ₀] ≤ α}.

Definition 3: A test function φ* = [φ₀*, ..., φ_J*]ᵀ is a constrained min-max test of level α between the hypotheses H₀, ..., H_J if φ* ∈ D_α, and if for any other test function φ = [φ₀, ..., φ_J]ᵀ ∈ D_α

  max_{θ∉Θ₀} E_θ[1 − φ*_{r_J(θ)}] ≤ max_{θ∉Θ₀} E_θ[1 − φ_{r_J(θ)}].    (12)

Observe that, if a constrained min-max test φ* of level α can be found, the left-hand side of (12) provides an achievable lower bound on the maximum error probability max_{θ∉Θ₀} E_θ[1 − φ_{r_J(θ)}] of any level-α test.

The first step in deriving the form of constrained min-max tests φ* for the hypotheses in (2) is to show that a composite null hypothesis H₀ can be reduced to an equivalent simple null hypothesis. Define the K-dimensional unit simplex

  C_K = {p ∈ [0,1]^K: Σ_j p_j = 1}.

The weights {b_θ}_{θ∈Θ₀} can be regarded as unit normalized weights on the null states of nature θ ∈ Θ₀; the weights {q_j}_{j=1}^J can be regarded as unit normalized weights on the composite states of nature {Θ_j}_{j=1}^J; and the weights {c_θ/q_j}_{θ∈Θ_j} can be regarded as unit normalized weights on the states of nature θ ∈ Θ_j. Consider the following reduced hypotheses:

  H₀^(b): X ~ f₀^(b)
  H₁: X ~ f_θ, θ ∈ Θ₁
  ⋮
  H_J: X ~ f_θ, θ ∈ Θ_J.    (16)

Note that relative to (2) the null hypothesis in (16) has been reduced to a simple null hypothesis. Define the expectation E₀^(b)[g(X)] of g(X) under the simple hypothesis H₀^(b). The following theorem is proven in the Appendix.

Theorem 1: For arbitrary b ∈ C_{K−M+1}, let φ^(b) be a constrained min-max test of level α for testing among the hypotheses (16) with simple null hypothesis H₀^(b). If there exists a weight vector b = b* such that

  max_{θ∈Θ₀} E_θ[1 − φ₀^(b*)] = α    (18)

then φ* ≜ φ^(b*) is a constrained min-max test of level α for testing among the hypotheses (2) with composite null hypothesis H₀. Furthermore, such a b* exists if

  E₀^(b*)[1 − φ₀^(b*)] = α    (19)
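Theorem 1's reduction replaces the composite null by the b-weighted mixture f₀^(b) = Σ_{θ∈Θ₀} b_θ f_θ, and condition (18) is exactly what can fail for a poorly chosen b. The Gaussian null set and the weights below are assumptions for illustration: calibrating a threshold under an arbitrary mixture leaves the worst null state above level α, whereas concentrating the weight on the worst state (the least favorable choice in this toy case) would not.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.1
null_means = [0.0, 0.25, 0.5]   # hypothetical composite null Theta_0 (Gaussian means)
b = np.array([0.2, 0.3, 0.5])   # an arbitrary weight vector on Theta_0
n = 400_000

# samples from the reduced simple null f0^(b) = sum_theta b_theta f_theta
mix = rng.choice(null_means, size=n, p=b) + rng.normal(0.0, 1.0, n)
t = np.quantile(mix, 1.0 - alpha)  # threshold of the test "declare signal if x > t"

# per-state false alarm rates over the original composite null
fa = {m: (rng.normal(m, 1.0, n) > t).mean() for m in null_means}
# the worst state (mean 0.5) exceeds level alpha, so this b violates (18);
# putting all weight on the worst state restores max_theta P_theta(FA) = alpha
```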

and b* is a "least favorable prior distribution" in the sense that for any other b ∈ C_{K−M+1}

  ∫ [1 − φ₀^(b*)(x)] f₀^(b)(x) dμ(x) ≤ ∫ [1 − φ₀^(b*)(x)] f₀^(b*)(x) dμ(x).    (20)

The condition (18) says that for a specific b* the level-α constrained min-max test φ^(b*) for the reduced hypotheses (16) must also be of level α for the original hypotheses (2). Under this condition Theorem 1 states that the composite null hypothesis H₀ can be reduced to a simple null hypothesis H₀^(b) by a b-weighting of the f_θ over θ ∈ Θ₀. Once such a reduction is achieved we need merely consider constrained min-max tests for the hypotheses (16) with simple null hypothesis H₀^(b) and then select an appropriate b* to satisfy condition (18). The existence of a weight vector b* which satisfies the sufficient conditions (19), (20) is related to the existence of a detector having constant false alarm rate (CFAR) [21].

The following theorem, proven in the Appendix, specifies the form of constrained min-max tests for the set of hypotheses

  H₀: X ~ f₀
  H₁: X ~ f_θ, θ ∈ Θ₁
  ⋮
  H_J: X ~ f_θ, θ ∈ Θ_J    (21)

where f₀ is an arbitrary pdf, e.g., f₀ = f₀^(b*).

Theorem 2: Fix the level α ∈ [0,1]. For arbitrary c ∈ C_M, define γ_j and f_j^(c) as in (14), (15), let

  j_max = arg max_{j>0} γ_j f_j^(c)(x)

and define the test function φ^(c) by

  φ₀(x) = 1 if max_{j>0} γ_j f_j^(c)(x) < λ f₀(x); ξ if equal; 0 otherwise    (22)

and, for j = 1, ..., J,

  φ_j(x) = 1 − φ₀(x) if j = j_max; 0 else    (23)

where λ ≥ 0 and ξ ∈ [0,1] are functions of c selected to satisfy the constraint on the false alarm probability

  E₀[1 − φ₀] = α.    (24)

Then there exists a weight vector c = c*, called the "optimal weight vector," satisfying (25), and φ* ≜ φ^(c*) defined by (22)-(25) is a constrained min-max test of level α for testing among the hypotheses H₀, H₁, ..., H_J.

Next we give a corollary which specifies the form of the constrained min-max tests for the composite hypotheses H₀, H₁, ..., H_J by combining the results of Theorems 1 and 2.

Corollary 1: Fix the level α ∈ [0,1]. For arbitrary c ∈ C_M and b ∈ C_{K−M+1}, let f₀^(b), γ_j, and f_j^(c), j = 1, ..., J, be as defined in (13)-(15). Let

  j_max = arg max_{j>0} γ_j f_j^(c)(x)

and define the test function φ^(b,c) by the assignments (26)-(30). Then φ^(b,c*) is a constrained min-max test of level α for testing among the hypotheses (16) with simple null hypothesis H₀^(b). Furthermore, if there exists a weight vector b = b* for which (18) holds, then φ* ≜ φ^(b*,c*) defined by (26)-(30) is a constrained min-max test of level α for testing among the hypotheses (2) with composite null hypothesis H₀.
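The randomization pair (λ, ξ) in (22)-(24) matters when the test statistic has atoms under the null. The helper below is a generic sketch (not code from the paper): it picks λ as the smallest threshold whose strict exceedance probability is at most α and then randomizes on the atom at λ so that the false alarm probability equals α exactly.

```python
import numpy as np

def randomized_threshold(stat0, alpha):
    """Given values (or an empirical sample) of the statistic
    max_j gamma_j f_j^(c)(X) / f0(X) under H0, return (lam, xi) such that
    P0(stat > lam) + xi * P0(stat == lam) = alpha, as required by (24)."""
    stat0 = np.sort(np.asarray(stat0, dtype=float))
    n = stat0.size
    lam = stat0[int(np.ceil((1.0 - alpha) * n)) - 1]  # smallest lam with P0(stat > lam) <= alpha
    p_gt = (stat0 > lam).mean()    # mass strictly above the threshold
    p_eq = (stat0 == lam).mean()   # atom at the threshold
    xi = 0.0 if p_eq == 0.0 else (alpha - p_gt) / p_eq
    return lam, xi

lam, xi = randomized_threshold([0, 0, 1, 1, 1, 2, 2, 2, 2, 3], 0.25)
# here lam = 2 and xi = 0.375: rejecting for stat > 2, and with probability
# 0.375 when stat == 2, gives false alarm 0.1 + 0.375 * 0.4 = 0.25 exactly
```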

Citations
Journal ArticleDOI
TL;DR: Intensive simulations in the MIMO radar example demonstrate that by using jointly optimum schemes, the authors can achieve significant improvement in estimation quality, as compared to the generalized likelihood ratio test or the test that treats the two subproblems separately, with only small sacrifices in detection power.
Abstract: We consider a well-defined joint detection and parameter estimation problem. By combining the Bayesian formulation of the estimation subproblem with suitable constraints on the detection subproblem, we develop optimum one- and two-step tests for the joint detection/estimation setup. The proposed combined strategies have the very desirable characteristic of allowing a trade-off between detection power and estimation quality. Our theoretical developments are then applied to the problems of retrospective changepoint detection and multiple-input multiple-output (MIMO) radar. In the former case, we are interested in detecting a change in the statistics of a set of available data and provide an estimate for the time of change, while in the latter in detecting a target and estimating its location. Intensive simulations in the MIMO radar example demonstrate that by using jointly optimum schemes, we can achieve significant improvement in estimation quality, as compared to the generalized likelihood ratio test or the test that treats the two subproblems separately, with only small sacrifices in detection power.

109 citations
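The GLRT baseline that the entry above compares against can be sketched for a toy model. This assumes a linear signal with unknown amplitude in white Gaussian noise (not the cited paper's MIMO radar setup):

```python
# Illustrative GLRT sketch under an assumed model:
# H1: x = A*phi + w  vs  H0: x = w,  w ~ N(0, I), amplitude A unknown.
import numpy as np

def glrt_statistic(x, phi):
    """Maximizing the likelihood over A gives A_hat = <x,phi>/||phi||^2,
    and 2*log GLR = <x,phi>^2 / ||phi||^2 (chi-squared(1) under H0)."""
    return np.dot(x, phi)**2 / np.dot(phi, phi)

rng = np.random.default_rng(0)
phi = np.ones(10)                          # assumed known signal shape
x0 = rng.standard_normal(10)               # noise only
x1 = 2.0 * phi + rng.standard_normal(10)   # signal present, A = 2
print(glrt_statistic(x1, phi) > glrt_statistic(x0, phi))
```

Because the maximization over A is in closed form here, the GLRT reduces to an energy detector on the matched-filter output; joint schemes trade some of this detection power for estimation quality.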

Journal ArticleDOI
TL;DR: This article represents an endeavor by the members of the SSAP-TC to review all the significant developments in the field of SSAP and introduces the recent reorganization of three technical committees of the Signal Processing Society.
Abstract: The Statistical Signal and Array Processing Technical Committee (SSAP-TC) deals with signals that are random and with processing an array of signals simultaneously. The field of SSAP represents both solid theory and practical applications. Starting with research in spectrum estimation and statistical modeling, study in this field has always been full of elegant mathematical tools such as statistical analysis and matrix theory. The area of statistical signal processing expands into estimation and detection algorithms, time-frequency domain analysis, system identification, and channel modeling and equalization. The area of array signal processing also extends into multichannel filtering, source localization and separation, and so on. This article represents an endeavor by the members of the SSAP-TC to review all the significant developments in the field of SSAP. To provide readers with pointers for further study of the field, this article includes a very impressive bibliography; close to 500 references are cited. This is just one of the indications that the field of statistical signals has been an extremely active one in the signal processing community. The article also introduces the recent reorganization of three technical committees of the Signal Processing Society.

84 citations

Proceedings ArticleDOI
09 Jul 2007
TL;DR: Theoretical results of the optimal joint decision and estimation that minimizes the new Bayes risk are presented and the power of the new approach is illustrated by applications in target tracking and classification.
Abstract: Many problems involve joint decision and estimation, where qualities of decision and estimation affect each other. This paper proposes an integrated approach based on a new Bayes risk, which is a generalization of those for decision and estimation separately. Theoretical results of the optimal joint decision and estimation that minimizes the new Bayes risk are presented. The power of the new approach is illustrated by applications in target tracking and classification.

75 citations


Cites background from "Optimal simultaneous detection and ..."


Posted Content
TL;DR: The spectral scan statistic, a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, is proposed; its performance as a testing procedure depends directly on the spectrum of the graph, and this result is used to explicitly derive its asymptotic properties on a few significant graph topologies.
Abstract: We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two connected induced subgraphs of relatively low cut size. We analyze the corresponding generalized likelihood ratio (GLR) statistic and relate it to the problem of finding a sparsest cut in a graph. We develop a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, which we call the spectral scan statistic, and analyze its properties. We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this result to explicitly derive its asymptotic properties on a few significant graph topologies. Finally, we demonstrate both theoretically and by simulations that the spectral scan statistic can outperform naive testing procedures based on edge thresholding and $\chi^2$ testing.

66 citations
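A minimal sketch of the combinatorial Laplacian on which the spectral scan statistic above is built (toy path graph for illustration; the scan statistic itself is not implemented here):

```python
# Combinatorial Laplacian L = D - A for a small undirected graph.
# Its spectrum (eigenvalues of L) is what governs the spectral scan statistic.
import numpy as np

def combinatorial_laplacian(adj):
    """Degree matrix minus adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

# Path graph on 4 vertices: 0 - 1 - 2 - 3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
L = combinatorial_laplacian(A)
eigvals = np.linalg.eigvalsh(L)
print(bool(np.allclose(L.sum(axis=1), 0)))  # rows of L sum to zero
print(bool(np.isclose(eigvals[0], 0.0)))    # smallest eigenvalue is 0
```

For a connected graph the zero eigenvalue is simple, and the second-smallest eigenvalue (the algebraic connectivity) controls how hard low-cut-size changes are to detect.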

Journal ArticleDOI
TL;DR: A statistical model of such an ensemble is employed and the majority voting rule is replaced with a likelihood ratio test, allowing the ensemble to be trained to guarantee desired statistical properties, such as the false-alarm probability and the detection power, while preserving the high detection accuracy of the original ensemble classifier.
Abstract: The machine learning paradigm currently predominantly used for steganalysis of digital images works on the principle of fusing the decisions of many weak base learners. In this paper, we employ a statistical model of such an ensemble and replace the majority voting rule with a likelihood ratio test. This allows us to train the ensemble to guarantee desired statistical properties, such as the false-alarm probability and the detection power, while preserving the high detection accuracy of the original ensemble classifier. It also turns out that the proposed test is linear. Moreover, by replacing the conventional total probability of error with an alternative criterion of optimality, the ensemble can be extended to detect messages of an unknown length to address composite hypotheses. Finally, the proposed well-founded statistical formulation allows us to extend the ensemble to multi-class classification with an appropriate criterion of optimality and an optimal associated decision rule. This is useful when a digital image is tested for the presence of secret data hidden by more than one steganographic method. Numerical results on real images show the sharpness of the theoretically established results and the relevance of the proposed methodology.

59 citations
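The vote-count likelihood-ratio idea above can be sketched under a simple assumed model: base-learner votes as i.i.d. Bernoulli, so the LRT on the vote count is monotone in the count and reduces to a threshold chosen for a target false-alarm rate (all parameters here are illustrative, not from the cited paper):

```python
# Assumed model: k "stego" votes out of L learners is Binomial(L, p0)
# under cover and Binomial(L, p1) under stego. The LRT is then linear in k,
# so it reduces to the rule "decide stego iff k >= k_star".
from math import comb, log

def lr_vote_count(k, L, p0, p1):
    """Log-likelihood ratio of k positive votes out of L learners."""
    return k * log(p1 / p0) + (L - k) * log((1 - p1) / (1 - p0))

L_learners, p0, p1 = 21, 0.2, 0.8
alpha = 0.01  # target false-alarm probability

# Majority voting would fix k_star = 11 regardless of alpha; the LRT
# instead picks the largest upper tail of Binomial(L, p0) not exceeding alpha.
pmf0 = [comb(L_learners, k) * p0**k * (1 - p0)**(L_learners - k)
        for k in range(L_learners + 1)]
tail = 0.0
k_star = L_learners + 1
for k in range(L_learners, -1, -1):
    if tail + pmf0[k] > alpha:
        break
    tail += pmf0[k]
    k_star = k
print(k_star)
```

The point of the construction is the guarantee: the false-alarm probability of the resulting rule is at most alpha by design, instead of being an uncontrolled byproduct of majority voting.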

References
Book
01 Jan 1983

25,017 citations

01 Nov 1985
TL;DR: This month's guest columnist, Steve Bible, N7HPR, is completing a master’s degree in computer science at the Naval Postgraduate School in Monterey, California, and his research area closely follows his interest in amateur radio.
Abstract: Spread Spectrum It’s not just for breakfast anymore! Don't blame me, the title is the work of this month's guest columnist, Steve Bible, N7HPR (n7hpr@tapr.org). While cruising the net recently, I noticed a sudden bump in the number of times Spread Spectrum (SS) techniques were mentioned in the amateur digital areas. While QEX has discussed SS in the past, we haven't touched on it in this forum. Steve was a frequent cogent contributor, so I asked him to give us some background. Steve enlisted in the Navy in 1977 and became a Data Systems Technician, a repairman of shipboard computer systems. In 1985 he was accepted into the Navy’s Enlisted Commissioning Program and attended the University of Utah where he studied computer science. Upon graduation in 1988 he was commissioned an Ensign and entered Nuclear Power School. His subsequent assignment was onboard the USS Georgia, a trident submarine stationed in Bangor, Washington. Today Steve is a Lieutenant and he is completing a master’s degree in computer science at the Naval Postgraduate School in Monterey, California. His areas of interest are digital communications, amateur satellites, VHF/UHF contesting, and QRP. His research area closely follows his interest in amateur radio. His thesis topic is Multihop Packet Radio Routing Protocol Using Dynamic Power Control. Steve is also the AMSAT Area Coordinator for the Monterey Bay area. Here's Steve, I'll have some additional comments at the end.

8,781 citations

Book
01 Jan 1959
TL;DR: The General Decision Problem, the Probability Background, Uniformly Most Powerful Tests, Unbiasedness: Theory and First Applications, Unbiasedness: Applications to Normal Distributions, Invariance, and Linear Hypotheses, as discussed by the authors.
Abstract: The General Decision Problem.- The Probability Background.- Uniformly Most Powerful Tests.- Unbiasedness: Theory and First Applications.- Unbiasedness: Applications to Normal Distributions.- Invariance.- Linear Hypotheses.- The Minimax Principle.- Multiple Testing and Simultaneous Inference.- Conditional Inference.- Basic Large Sample Theory.- Quadratic Mean Differentiable Families.- Large Sample Optimality.- Testing Goodness of Fit.- General Large Sample Methods.

6,480 citations

Journal ArticleDOI
Jorma Rissanen
TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.

6,254 citations
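Rissanen's description-length penalty, noted in the abstract of this page as nearly min-max optimal for order selection in some nonasymptotic regimes, amounts to minimizing the negative log-likelihood plus roughly (k/2)·log N over model orders k. A hedged sketch of that criterion (the Gaussian polynomial-fit model and the synthetic data here are assumed for illustration):

```python
# MDL-style order selection sketch: pick the order k minimizing
#   -log-likelihood + (number of parameters)/2 * log(N).
# Model and data are illustrative, not from Rissanen's paper.
import numpy as np

def mdl_order(x, y, max_order=5):
    N = len(y)
    best_k, best_cost = None, np.inf
    for k in range(max_order + 1):
        coeffs = np.polyfit(x, y, k)
        resid = y - np.polyval(coeffs, x)
        sigma2 = max(float(np.mean(resid**2)), 1e-12)
        # Gaussian negative log-likelihood at the ML noise variance
        neg_loglik = 0.5 * N * np.log(2 * np.pi * sigma2) + 0.5 * N
        cost = neg_loglik + 0.5 * (k + 1) * np.log(N)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(200)  # true order is 1
print(mdl_order(x, y))
```

The log N penalty grows with the sample size, which is what keeps the criterion from overfitting the way an unpenalized likelihood comparison would.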

Journal ArticleDOI

4,805 citations