Stability and performance analysis of pitch filters in speech coders

doi:10.1109/TASSP.1987.1165238

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-35, NO. 7, JULY 1987

Stability and Performance Analysis of Pitch Filters in Speech Coders

RAVI P. RAMACHANDRAN AND PETER KABAL

Abstract-This paper analyzes the stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction. A computationally simple stability test based on a sufficient condition is formulated for pitch synthesis filters. For typical orders of pitch filters, this sufficient test is very tight. Based on the test, a simple stabilization technique that minimizes the loss in prediction gain of the pitch predictor is employed to generate stable synthesis filters. Finally, it is observed that the quality of decoded speech improves significantly when stable synthesis filters are employed.

I. INTRODUCTION

In the speech coders considered in this paper, two nonrecursive prediction error filters are used to process the incoming speech signal. The first, which removes near-sample redundancies, is referred to here as the formant predictor. It is followed by the pitch predictor, which removes distant-sample based redundancies. The resulting residual signal after both formant and pitch prediction is then coded for transmission. An adaptive predictive coder (APC) places these predictors in a feedback loop around the residual quantizer. An additional quantization noise shaping filter can also be employed to reduce the perceptual distortion in the decoded speech [1], [2]. An alternate description of an APC coder uses an open-loop predictor configuration and a noise feedback filter [3]. A block diagram of such a configuration is shown in Fig. 1. This type of open-loop arrangement is also used in code-excited linear prediction (CELP) [4]. In CELP, the coding is accomplished by selecting the candidate waveform (from a dictionary) that best represents the residual. Also, noise shaping is accomplished implicitly in the process of choosing a representational residual signal.

In both APC and CELP, the residual signal or the selected codeword (after scaling by the gain factor) is passed through a pitch synthesis and a formant synthesis filter to reproduce the decoded speech. The filtering in the synthesis phase can be viewed in the frequency domain as first inserting the fine pitch structure and then inserting the spectral envelope (formant structure). The synthesis filters are recursive and potentially may be unstable. For the formant filter, the autocorrelation [5], modified covariance [1], [6], or Burg [7] methods can be used to determine filter coefficients which ensure stability of the formant synthesis filter.

Fig. 1. Block diagram of an APC coder with noise feedback. (a) Analysis phase. (b) Synthesis phase.

Manuscript received June 13, 1986; revised January 12, 1987. This work was supported by the Natural Sciences and Engineering Research Council of Canada.
R. P. Ramachandran is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada, H3A 2A7.
P. Kabal is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada, H3A 2A7, and INRS-Télécommunications, Université du Québec, Verdun, P.Q., Canada, H3E 1H6.
IEEE Log Number 8714483.

The procedure to determine a set of pitch predictor coefficients can result in an unstable pitch synthesis filter. This paper addresses the stability and performance issues of the pitch filter by formulating a computationally simple stability test based on a tight sufficient condition, introducing a stabilization technique, and evaluating the performance of the resulting suboptimum predictor. The effect of unstable pitch synthesis filters on decoded speech is also examined for a CELP system.

It should be noted that the filters in the speech coder are updated frame by frame and hence form a time-varying system. Conventional notions of stability are in essence asymptotic properties of systems. In speech coding, an "unstable" filter may persist for a few frames (often corresponding to an interval with increasing energy; see the experimental results cited later), but eventually periods of stable filters are encountered. This means that, in practice, the output does not continue to increase in amplitude with time.

Consider the canonical case of an all-zero prediction error filter in cascade with a quantizer, followed by an all-pole synthesis filter. The quantizer can be modeled as adding noise (possibly correlated with the signal) to the residual signal. As long as the synthesis filter is the inverse to the prediction error filter and the filter coefficients are updated in step, the signal component emerges unaltered. For the signal component, stability is not a problem because of pole/zero cancellation. However, the quantization noise passes through only the synthesis filter. An "unstable" synthesis filter can cause the output noise to build up during the period of instability and can lead to degraded speech quality.

0096-3518/87/0700-0937$01.00 © 1987 IEEE

The effect of the quantization noise may be measured in a number of different ways. If the quantization noise is modeled as white noise, the output noise power can be expressed as the input noise power multiplied by the power gain of the filter. The power gain is the sum of the squares of the filter coefficients. The sufficient stability test introduced later results in a power gain for the pitch synthesis filter that is less than unity.

II. FORMANT AND PITCH PREDICTORS

The formant predictor has a transfer function

$$F(z) = \sum_{k=1}^{N_f} a_k z^{-k}. \tag{1}$$

The order $N_f$ is typically between 8 and 16. The system function of the noise feedback filter is related to that of the formant predictor and is expressed as $N(z) = F(z/\gamma)$, where $0 < \gamma < 1$.

The pitch predictor has a small number of taps (typically 1-3) centered around a large delay $M$ corresponding to the estimated pitch period in samples. The system function for the predictor is

$$
P(z) =
\begin{cases}
\beta_1 z^{-M} & \text{1 tap} \\
\beta_1 z^{-M} + \beta_2 z^{-(M+1)} & \text{2 tap} \\
\beta_1 z^{-(M-1)} + \beta_2 z^{-M} + \beta_3 z^{-(M+1)} & \text{3 tap.}
\end{cases} \tag{2}
$$

At the receiver, the formant and pitch synthesis filters have transfer functions $H_F(z) = 1/(1 - F(z))$ and $H_P(z) = 1/(1 - P(z))$, respectively.
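As an illustration of the recursion implied by $H_P(z) = 1/(1 - P(z))$, the following minimal sketch filters a signal sample by sample. The function names `tap_lags` and `pitch_synthesis`, the zero initial state, and the use of NumPy are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def tap_lags(num_taps, M):
    """Delays used by P(z) in (2) for a 1, 2, or 3 tap predictor."""
    if num_taps == 1:
        return [M]
    if num_taps == 2:
        return [M, M + 1]
    if num_taps == 3:
        return [M - 1, M, M + 1]
    raise ValueError("only 1, 2, or 3 taps are covered by (2)")

def pitch_synthesis(x, betas, M):
    """Run x through H_P(z) = 1 / (1 - P(z)), assuming zero initial state."""
    x = np.asarray(x, dtype=float)
    lags = tap_lags(len(betas), M)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = x[n]
        for beta, lag in zip(betas, lags):
            if n - lag >= 0:          # samples before n = 0 are taken as zero
                acc += beta * y[n - lag]
        y[n] = acc
    return y

# Impulse response of a 1 tap filter: echoes spaced M samples apart.
impulse = np.zeros(10)
impulse[0] = 1.0
response = pitch_synthesis(impulse, [0.5], M=3)
```

With a single tap of 0.5 and $M = 3$, the impulse response consists of samples 1, 0.5, 0.25, ... at multiples of the lag, which decays because the tap magnitude is below one.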

In the case of a pitch predictor, both the value of $M$ (pitch lag) and the predictor coefficients have to be determined. The conventional strategy to determine the pitch lag is to search for the lag corresponding to the peak value of the correlation of the input signal to the pitch predictor [referred to as $d(n)$] [1]. The search range is often limited to those pitch values encountered in speech. After the value of $M$ is determined, the coefficients of $P(z)$ are found by minimizing the mean-square value of the residual over a frame size of $N$ samples [1], [6]. This covariance-type formulation results in a linear system of equations $\Phi\beta = \alpha$, where $\Phi$ is a matrix of correlation terms, $\beta$ is the vector of predictor coefficients, and $\alpha$ is a vector of correlation terms. Specifically for a 3 tap predictor, the system of equations is

$$
\begin{bmatrix}
\phi(M-1, M-1) & \phi(M-1, M) & \phi(M-1, M+1) \\
\phi(M, M-1) & \phi(M, M) & \phi(M, M+1) \\
\phi(M+1, M-1) & \phi(M+1, M) & \phi(M+1, M+1)
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
=
\begin{bmatrix} \phi(0, M-1) \\ \phi(0, M) \\ \phi(0, M+1) \end{bmatrix}, \tag{3}
$$

where

$$\phi(i, j) = \sum_{n=0}^{N-1} d(n - i)\, d(n - j). \tag{4}$$
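The covariance-type solution for a 3 tap predictor can be sketched as follows: the correlation terms $\phi(i, j)$ are accumulated over one frame and the linear system $\Phi\beta = \alpha$ is solved directly. The helper names `phi` and `solve_three_tap`, the `start` offset (which places the frame inside a longer buffer so that past samples exist), and the synthetic test signal are all assumptions made for this illustration.

```python
import numpy as np

def phi(d, i, j, start, N):
    """phi(i, j) of (4), with the frame occupying d[start : start + N]."""
    n = np.arange(start, start + N)
    return float(np.dot(d[n - i], d[n - j]))

def solve_three_tap(d, M, start, N):
    """Solve Phi beta = alpha for the lags M-1, M, M+1."""
    if start < M + 1:
        raise ValueError("need M + 1 samples of history before the frame")
    lags = [M - 1, M, M + 1]
    Phi = np.array([[phi(d, i, j, start, N) for j in lags] for i in lags])
    alpha = np.array([phi(d, 0, i, start, N) for i in lags])
    beta = np.linalg.solve(Phi, alpha)   # raises LinAlgError if Phi is singular
    return beta, Phi, alpha

# Synthetic signal (an assumption for illustration): d(n) = 0.8 d(n - M).
rng = np.random.default_rng(0)
M = 20
d = np.zeros(200)
d[:M] = rng.standard_normal(M)
for n in range(M, len(d)):
    d[n] = 0.8 * d[n - M]

beta, Phi, alpha = solve_three_tap(d, M, start=100, N=80)
```

Because the synthetic signal is an exactly scaled repetition of itself at lag $M$, the solution should place essentially all of the weight on the middle tap. Real speech frames do not behave this cleanly, and nothing in this procedure constrains the resulting synthesis filter to be stable.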

Given any vector of predictor coefficients $\beta$, the energy of the prediction residual is

$$\epsilon^2 = \phi(0, 0) - 2\beta^T\alpha + \beta^T\Phi\beta. \tag{5}$$

The residual energy for the optimum predictor is

$$\epsilon^2_{\min} = \phi(0, 0) - \beta^T\alpha. \tag{6}$$

A normalized quantity related to $\epsilon^2$ will be used in the sequel as the performance measure. The prediction gain is defined to be the ratio (in decibels) of the energy of the signal at the input to the predictor to the energy of the prediction residual.
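The residual energy expression and the prediction gain can be evaluated directly from the correlation terms. The sketch below assumes that the input energy over the frame is $\phi(0, 0)$; the function names and the numerical values are made up for illustration.

```python
import numpy as np

def residual_energy(phi00, alpha, Phi, beta):
    """Energy of the prediction residual for any coefficient vector, as in (5)."""
    alpha = np.asarray(alpha, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return float(phi00 - 2.0 * beta @ alpha + beta @ Phi @ beta)

def prediction_gain_db(phi00, eps2):
    """Ratio of input energy phi(0, 0) to residual energy, in decibels."""
    return 10.0 * np.log10(phi00 / eps2)

# Tiny 1 tap example with made-up correlation values.
phi00, Phi, alpha = 1.0, [[2.0]], [1.0]
beta_opt = np.linalg.solve(Phi, alpha)
eps2_opt = residual_energy(phi00, alpha, Phi, beta_opt)
eps2_min = phi00 - float(beta_opt @ np.asarray(alpha))   # shortcut (6)
gain_db = prediction_gain_db(phi00, eps2_opt)
```

For the optimum coefficients the general expression (5) and the shortcut (6) agree. For any other coefficient vector, such as one modified to obtain stability, only (5) applies, and the drop in `prediction_gain_db` measures the cost of the modification.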

III. STABILITY TEST FOR PITCH SYNTHESIS FILTERS

The covariance formulation does not guarantee that $H_P(z)$ is a stable function. To ensure stability, the denominator polynomial $D(z)$ of $H_P(z)$ must have all its zeros within the unit circle in the z-plane. The polynomial $D(z)$ is sparse in that it is of high order but has few nonzero coefficients. The Schur-Cohn procedure is a necessary and sufficient stability test [8]. Furthermore, an implementation can take into account the sparse nature of the characteristic polynomial of a pitch synthesis filter. Appendix A gives the general form of this test and shows how it can be applied to pitch synthesis filters. The Schur-Cohn test will be used later to evaluate the tightness of the test developed in this paper.

The Schur-Cohn test specialized for the case of pitch synthesis filters has a computational complexity which is proportional to the order (approximately equal to the pitch lag). The complexity of such a test is still large for pitch lags encountered in practice. In the following sections, a simple alternative test based on an asymptotically tight sufficient condition is derived. In addition, the new test will allow for the simple stabilization of unstable pitch synthesis filters.

A. Simple Sufficient Test

Two different sufficient tests will be developed. The first is a simple sufficient test, which will also serve to introduce the notation. The second is the final asymptotically tight sufficient test.

Consider a general denominator polynomial $D(z)$ of the form

$$D(z) = z^n - B(z), \tag{7}$$

where

$$B(z) = \sum_{i=0}^{n-1} b_i z^i. \tag{8}$$

Then, $D(z) = z^n - B(z) = z^n (1 - z^{-n}B(z))$. The conditions for stability are that $1 - z^{-n}B(z) \neq 0$, or equivalently that $z^{-n}B(z) \neq 1$, on and outside the unit circle $z = e^{j\theta}$. By the maximum modulus theorem [9], $z^{-n}B(z)$ has its maximum modulus on the contour surrounding any region in which it is analytic. The expression $z^{-n}B(z)$, being a polynomial in $z^{-1}$, is analytic on and outside the unit circle in the z-plane. Therefore, a sufficient condition for stability is that $|z^{-n}B(z)| < 1$ on the unit circle in the z-plane. This condition can be expressed as

$$|B(e^{j\theta})| < 1. \tag{9}$$

The left-hand side can be upper bounded,

$$|B(e^{j\theta})| \le |b_0| + |b_1| + \cdots + |b_{n-1}|. \tag{10}$$

Then, a simple sufficient condition for stability is that the sum of the moduli of the coefficients be less than one. This simple test is well known (e.g., [10, p. 225]) and can be applied to any filter. For pitch filters, $B(z) = z^n P(z)$, where $n$ is the highest power of $z^{-1}$ in $P(z)$. The sufficient condition for stability becomes

$$
\begin{aligned}
|\beta_1| &< 1 && \text{1 tap} \\
|\beta_1| + |\beta_2| &< 1 && \text{2 tap} \\
|\beta_1| + |\beta_2| + |\beta_3| &< 1 && \text{3 tap.}
\end{aligned} \tag{11}
$$

This test is both necessary and sufficient for a 1 tap filter. As will be shown later, this test applied to 2 tap filters also becomes asymptotically necessary and sufficient as $n$ increases.
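The simple sufficient test amounts to one comparison, as the following sketch shows. The function names are assumptions for this example. The second helper illustrates why the test is also necessary in the 1 tap case: the roots of $D(z) = z^M - \beta_1$ all have modulus $|\beta_1|^{1/M}$, which is below one exactly when $|\beta_1| < 1$.

```python
def simple_sufficient_test(betas):
    """True if the sum of the coefficient moduli is below one."""
    return sum(abs(b) for b in betas) < 1.0

def one_tap_pole_radius(beta1, M):
    """Common modulus of the M roots of D(z) = z^M - beta1."""
    return abs(beta1) ** (1.0 / M)

# Illustrative coefficient sets (made-up values).
examples = {
    "1 tap, 0.9": [0.9],
    "1 tap, 1.1": [1.1],
    "2 tap, 0.4 and 0.5": [0.4, 0.5],
    "3 tap, 0.3, 0.5, -0.3": [0.3, 0.5, -0.3],
}
results = {name: simple_sufficient_test(b) for name, b in examples.items()}
```

The last coefficient set fails the simple test because its moduli sum to more than one, although this alone does not show that the filter is unstable, since the condition is only sufficient for 2 and 3 tap filters.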

B. Tight Sufficient Test

A further examination of the expression for $|B(e^{j\theta})|$ will lead to a tight sufficient test for a 3 tap pitch filter. The 1 and 2 tap pitch filters will be special cases of the 3 tap filter.

For a 3 tap filter, $B(z) = \beta_1 z^2 + \beta_2 z + \beta_3$. For convenience, define

$$a = \beta_1 + \beta_3 \quad \text{and} \quad b = \beta_1 - \beta_3. \tag{12}$$

Then, $B(z)$ evaluated on the unit circle becomes

$$B(e^{j\theta}) = e^{j\theta}\,[\beta_2 + a\cos\theta + jb\sin\theta]. \tag{13}$$

The bracketed term in (13) defines an ellipse in the complex plane with center $\beta_2$. The major axis is horizontal, with half-length $|a|$, if $\beta_1$ and $\beta_3$ have the same signs, or vertical, with half-length $|b|$, if $\beta_1$ and $\beta_3$ have opposite signs. The two cases are illustrated in Fig. 2. A sufficient condition for stability is that the ellipse lie entirely within the unit circle. Since the cases $\beta_2 > 0$ and $\beta_2 < 0$ are symmetrical, the analysis proceeds by using $|\beta_2|$. Also, the cases for $a > 0$ and $b > 0$ are symmetrical with the cases $a < 0$ and $b < 0$, respectively.

Fig. 2. Illustration of the stability ellipse for a 3 tap filter. (a) The horizontal axis is the major axis. (b) Tangency when the vertical axis is the major axis.

If $\beta_1$ and $\beta_3$ have the same signs ($|a| \ge |b|$), the ellipse lies entirely within the unit circle if

$$|\beta_2| + |a| < 1, \tag{14}$$

or equivalently if

$$|\beta_1| + |\beta_2| + |\beta_3| < 1. \tag{15}$$

If $\beta_1$ and $\beta_3$ have opposite signs ($|a| < |b|$), the analysis is more complicated. The condition $|\beta_2| + |a| < 1$ ensures that no point on the minor axis lies outside the circle. This is a necessary condition for the ellipse to lie within the unit circle and will be assumed to be satisfied for the following discussion.

The aim is to establish the critical conditions on $|a|$, $|b|$, and $|\beta_2|$ which cause the ellipse to be tangent to the circle. Such a case is shown in Fig. 2(b). Let $\theta_c$ be the value of the angle $\theta$ which gives tangency. Since the case for $|\beta_2|$ is being considered, it suffices to consider $0 \le \theta_c \le \pi/2$. A point of tangency occurs at $X$ when the length of $OX$ is equal to unity,

$$(|\beta_2| + |a|\cos\theta_c)^2 + (b\sin\theta_c)^2 = 1. \tag{16}$$

In addition, tangency requires that $X$ be the point on the ellipse that is furthest away from the origin, which in turn requires that the derivative of the left-hand side of (16) with respect to $\theta_c$ be zero. This condition gives

$$|a|\sin\theta_c\,(|\beta_2| + |a|\cos\theta_c) = b^2\sin\theta_c\cos\theta_c. \tag{17}$$

Solving this equation for $\cos\theta_c$ gives

$$\cos\theta_c = \frac{|a|\,|\beta_2|}{b^2 - a^2}, \tag{18}$$

where $\sin\theta_c$ is assumed to be nonzero. Tangency can also occur if $\theta_c = 0$ ($\sin\theta_c = 0$), giving $|\beta_2| + |a| = 1$. However, this case is precluded by the assumption that the minor axis lies within the unit circle ($|\beta_2| + |a| < 1$). If tangency occurs for $\theta_c \neq 0$, from (16) and (18), it can be shown that the critical values of $|a|$, $|b|$, and $|\beta_2|$ are those which satisfy $f(b^2, a^2, \beta_2^2) = 0$, where

$$f(b^2, a^2, \beta_2^2) = b^2\beta_2^2 - (1 - b^2)(b^2 - a^2). \tag{19}$$

The analysis proceeds by assuming that two of the three parameters $a$, $b$, and $\beta_2$ are given, and then finding the critical value of the third. Consider $a$ and $b$ given. This determines the shape factor for the ellipse. Varying $\beta_2$ slides the ellipse along the horizontal axis. For $|a| < 1$ and $|b| < 1$, if $|\beta_2|$ is less than a critical value $\beta_{2c}$, where $0 \le \beta_{2c} < 1$, the ellipse lies entirely within the unit circle. The critical value is that which results in tangency of the ellipse with the unit circle. Depending on the relative values of $a$ and $b$, this can occur in one of two ways. For $b^2 \le |a|$, the point of tangency occurs for $\theta_c = 0$, giving $|a| + \beta_{2c} = 1$. Since $|\beta_2| < \beta_{2c}$, this condition is equivalent to having the minor axis within the unit circle. For $b^2 > |a|$, the point of tangency occurs for $0 < \theta_c \le \pi/2$. Then the equation $f(b^2, a^2, \beta_{2c}^2) = 0$ can be solved for the critical value $\beta_{2c}$. Since this function is monotonic in $\beta_2^2$, the check that $|\beta_2| < \beta_{2c}$ is equivalent to the check that $f(b^2, a^2, \beta_2^2) < 0$.

For $|a| \ge b^2$, having the horizontal axis of the ellipse lie inside the unit circle is sufficient for stability. Otherwise, the function $f(b^2, a^2, \beta_2^2)$ must be tested. It can be shown that this test along with the minor axis requirement is sufficient for stability. The stability test for a 3 tap pitch synthesis filter is summarized below.

Stability Test: Let $a = \beta_1 + \beta_3$ and $b = \beta_1 - \beta_3$.

1) If $|a| \ge |b|$, the following is sufficient for stability:
   a) $|\beta_1| + |\beta_2| + |\beta_3| < 1$.
2) If $|a| < |b|$, the satisfaction of the two following conditions is sufficient for stability:
   a) $|\beta_2| + |a| < 1$
   b) i) $b^2 \le |a|$ or
      ii) $b^2\beta_2^2 - (1 - b^2)(b^2 - a^2) < 0$.
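The summarized test translates almost line for line into code. The following is a minimal sketch under the assumption that the three coefficients are real numbers; the function name `tight_sufficient_test` is chosen for this example only.

```python
def tight_sufficient_test(beta1, beta2, beta3):
    """Sufficient stability test for a 3 tap pitch synthesis filter."""
    a = beta1 + beta3
    b = beta1 - beta3
    if abs(a) >= abs(b):
        # Part 1: horizontal major axis, sum-of-moduli condition.
        return abs(beta1) + abs(beta2) + abs(beta3) < 1.0
    # Part 2a: the minor (horizontal) axis must lie inside the unit circle.
    if abs(beta2) + abs(a) >= 1.0:
        return False
    # Part 2b(i): the critical point of tangency is on the horizontal axis.
    if b * b <= abs(a):
        return True
    # Part 2b(ii): tangency away from the horizontal axis, function f of (19).
    f = b * b * beta2 * beta2 - (1.0 - b * b) * (b * b - a * a)
    return f < 0.0

# Made-up coefficient sets exercising each branch.
cases = [
    (0.3, 0.3, 0.3),     # part 1
    (0.5, 0.55, -0.1),   # part 2, branch b(i)
    (0.3, 0.5, -0.3),    # part 2, branch b(ii)
    (0.3, 0.9, -0.3),    # part 2, branch b(ii), fails
]
outcomes = [tight_sufficient_test(*c) for c in cases]
```

The second and third coefficient sets have moduli summing to more than one, so the simple sum-of-moduli test would reject them, while the tight test accepts them. A 2 tap filter can be checked by passing zero for the unused outer coefficient.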

Part 2 of this stability test is tighter than the simple sufficient test given earlier. This part of the test is invoked when $|a| < |b|$, or equivalently when $\beta_1$ and $\beta_3$ have opposite signs. Experiments show that in voiced speech,¹ $\beta_2$ is greater than zero in about 90 percent of the frames. Given that $\beta_2 > 0$, the number of voiced frames in which $\beta_1$ and $\beta_3$ have opposite signs is about 3.7 times the number of voiced frames in which they have the same signs. Therefore, the presence of a tighter test when $|a| < |b|$ is important for speech coders.

¹A frame was considered to be voiced when the pitch prediction gain was greater than 1 dB.

The test for 3 tap filters subsumes the test for 2 tap filters. By setting $\beta_1 = 0$, a 3 tap filter becomes a 2 tap filter. Then $|a| = |b|$ and the test involves checking that the sum of the moduli of the coefficients is less than 1. This is equivalent to the simple sufficient test given earlier.

C. Further Examination of the Sufficient Condition

The sufficient test defines a stability region for 3 tap filters that is independent of the order $n$. Consider first the simple sufficient test ($|\beta_1| + |\beta_2| + |\beta_3| < 1$). The stability region can be viewed in $(\beta_1, \beta_2, \beta_3)$ space as a region bounded by 8 flat surfaces. The volume enclosed is 4/3 units.

The stability region described by the tight sufficient test has two types of surfaces. Four of the surfaces are flat and coincide with the flat surfaces of the simple sufficient test. The other four surfaces bulge out significantly beyond the flat surfaces. The volume enclosed can be determined in closed form and is calculated to be 16/9 units. Fig. 3 shows a contour plot of the stability region defined by the tight sufficient test for $\beta_3 \ge 0$. A plot for $\beta_3 \le 0$ is a mirror image reflected about the vertical axis. It can be shown that this stability region is enclosed by a unit sphere and hence, the sum of the squares of the coefficients (power gain) is less than unity.
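The two volumes quoted above can be checked numerically. The sketch below is a Monte Carlo estimate, not the closed-form calculation referred to in the text: points are drawn uniformly in the cube $[-1, 1]^3$ and the fraction accepted by each test is scaled by the cube volume. The sample count, the seed, and the function name `region_masks` are assumptions for this example.

```python
import numpy as np

def region_masks(b1, b2, b3):
    """Vectorized simple and tight sufficient tests for arrays of coefficients."""
    a = b1 + b3
    b = b1 - b3
    simple = np.abs(b1) + np.abs(b2) + np.abs(b3) < 1.0
    minor_ok = np.abs(b2) + np.abs(a) < 1.0
    f = b**2 * b2**2 - (1.0 - b**2) * (b**2 - a**2)
    part2 = minor_ok & ((b**2 <= np.abs(a)) | (f < 0.0))
    tight = np.where(np.abs(a) >= np.abs(b), simple, part2)
    return simple, tight

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(400_000, 3))
simple, tight = region_masks(*pts.T)

cube_volume = 8.0
vol_simple = cube_volume * simple.mean()   # compare with 4/3
vol_tight = cube_volume * tight.mean()     # compare with 16/9

# Largest sum of squares among accepted points (unit sphere claim).
max_power_gain = float(np.max(np.sum(pts[tight] ** 2, axis=1)))
```

The estimates fluctuate with the random draw, so agreement with 4/3 and 16/9 should only be expected to within about a percent at this sample size. The quantity `max_power_gain` gives a numerical check of the statement that the tight region is enclosed by the unit sphere.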

Both the magnitude and phase of $B(e^{j\theta})$ determine the necessary and sufficient conditions. If the stability ellipse lies entirely inside the circle of unit radius, all of the roots of $D(z)$ are inside the unit circle in the z-plane. As some combination of the parameters $|a|$, $|b|$, or $|\beta_2|$ is increased, the stability ellipse will emerge outside the circle of unit radius and the roots of $D(z)$ will eventually cross the unit circle. The critical combination of parameters which cause roots to lie on the unit circle can be determined from

$$B(e^{j\theta}) = e^{jn\theta}. \tag{20}$$

The points at which the ellipse crosses the unit circle correspond to points which satisfy the above equation in magnitude. For a given $n$, these intersection points correspond to phase angles which may or may not satisfy the above equation. However, as $n$ increases, the phase angles which satisfy the above equation become increasingly dense. In the limit of large $n$, as soon as the stability ellipse crosses the unit circle, at least one root of $D(z)$ crosses the unit circle. This indicates that the stability ellipse must lie entirely within the unit circle. The stability test given earlier becomes both necessary and sufficient in the limit of large $n$.
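For a specific order $n$, the location of the roots of $D(z) = z^n - \beta_1 z^2 - \beta_2 z - \beta_3$ can be examined numerically and compared with the outcome of the sufficient test. The following sketch uses a general-purpose polynomial root finder rather than the Schur-Cohn procedure; the function name `max_root_radius` and the coefficient values are assumptions for this illustration.

```python
import numpy as np

def max_root_radius(betas, n):
    """Largest root modulus of D(z) = z^n - beta1 z^2 - beta2 z - beta3."""
    if n < 3:
        raise ValueError("n must be at least 3 for a 3 tap filter")
    beta1, beta2, beta3 = betas
    coeffs = np.zeros(n + 1)   # coeffs[k] multiplies z^(n - k)
    coeffs[0] = 1.0
    coeffs[n - 2] = -beta1
    coeffs[n - 1] = -beta2
    coeffs[n] = -beta3
    return float(np.max(np.abs(np.roots(coeffs))))

n = 41   # corresponds to a pitch lag M = 40, since n = M + 1

# Coefficients whose stability ellipse lies inside the unit circle.
radius_inside = max_root_radius((0.3, 0.5, -0.3), n)

# A single coefficient above one in modulus; the roots then lie outside.
radius_outside = max_root_radius((0.0, 1.1, 0.0), n)
```

In the first case the ellipse of (13) is centered at 0.5 with a vertical half-length of 0.6 and no horizontal extent, so it stays within the unit circle and the largest root modulus comes out below one. Root finding of this kind has a cost that grows quickly with $n$ and is shown only as a cross-check.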

The necessary and sufficient conditions as determined by the Schur-Cohn test define a region which depends on the order $n$. This region is bounded by four types of surfaces. Two surfaces are flat and coincide with the flat surfaces for the regions described above. Two other surfaces bulge slightly but become flat in the limit of large $n$. Two other pairs of surfaces bulge significantly and coincide with the bulging surfaces of the tight sufficient test in the limit of large $n$. The symmetries of the surfaces switch depending on whether $n$ is even or odd. Fig. 4 shows the stability region determined by the Schur-Cohn test for $n = 7$ and $\beta_3 \ge 0$. This rather small value of $n$ is used to accentuate the differences between this region and that determined by the sufficient test. Note also that the contour with $\beta_3 = 0$ is the stability region for the 2 tap filter with $n = 6$.

Fig. 3. Region described by the tight sufficient test. Equal value contours are shown for $\beta_3 = 0, 0.1, \ldots, 0.9$. The dotted lines represent lines on the stability surface with $b^2 = |a|$.

Fig. 4. Necessary and sufficient stability region ($n = 7$). Equal value contours are shown for $\beta_3 = 0, 0.1, \ldots, 0.9$.

It remains to ascertain how tight the sufficient test is for finite $n$. The area and volume enclosed by the true stability regions for 2 and 3 tap filters were computed using numerical integration techniques for various values of $n$. The boundaries of the true stability regions were computed using the simplified Schur-Cohn procedure described in Appendix A. The region defined by the sufficient test is contained in that defined by the Schur-Cohn test. Fig. 5 shows the percent differences between the areas and volumes enclosed by the sufficient test and the Schur-Cohn test for 2 and 3 tap filters. It is observed that the percent difference decreases rapidly as $n$ increases. The lowest order of a pitch filter is typically around 20. Even at this low order, the difference in volume is below 1 percent. For higher orders, the sufficient test is very tight and involves much less computation than the Schur-Cohn test.

D. Extension to Circles of Arbitrary Radius

The sufficient condition can be extended in order to determine whether or not all the roots of $D(z) = z^n - B(z)$ [$B(z)$ defined in (8)] are within a circle of radius $r$ centered at the origin in the z-plane. Just as before, the maximum modulus theorem is used to derive the condition

$$|B(re^{j\theta})| < r^n. \tag{21}$$

Expanding $|B(re^{j\theta})|$ yields

$$
\begin{aligned}
|B(re^{j\theta})| &= |b_0 + b_1 r e^{j\theta} + \cdots + b_{n-1} r^{n-1} e^{j(n-1)\theta}| \\
&\le |b_0| + |b_1| r + \cdots + |b_{n-1}| r^{n-1}.
\end{aligned} \tag{22}
$$

A simple sufficient condition that ensures that all the roots of $D(z)$ are within the circle $|z| = r$ is

$$|b_0| + |b_1| r + \cdots + |b_{n-1}| r^{n-1} < r^n. \tag{23}$$

The condition $|\beta_1| < r^n$ is necessary and sufficient for 1 tap filters. For 3 tap filters, a more detailed examination of $|B(re^{j\theta})|$ is done in the same fashion as for $|B(e^{j\theta})|$. The conditions are given below. The 3 tap case subsumes the 2 tap case. Hence, the condition for 2 tap filters is merely $|\beta_1| r + |\beta_2| < r^n$.

Fig. 5. Percent differences between areas and volumes enclosed by the sufficient test and the Schur-Cohn test (horizontal axis: order $n$).
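The simple condition (23) can be evaluated for any radius $r$ with a short helper. In the sketch below, the coefficients are passed in ascending powers of $z$, so a 2 tap filter with $B(z) = \beta_1 z + \beta_2$ is given as `[beta2, beta1]`. The function name and the numerical values are assumptions for this example.

```python
def radius_sufficient_test(b, n, r):
    """Check |b_0| + |b_1| r + ... < r^n, with b listed in ascending powers of z."""
    if r <= 0.0:
        raise ValueError("the radius r must be positive")
    bound = sum(abs(bi) * r**i for i, bi in enumerate(b))
    return bound < r**n

# 1 tap filter with beta1 = 0.5 and n = 10: roots have modulus 0.5 ** 0.1.
one_tap_r095 = radius_sufficient_test([0.5], n=10, r=0.95)
one_tap_r090 = radius_sufficient_test([0.5], n=10, r=0.90)

# 2 tap filter with beta1 = 0.3, beta2 = 0.2 and n = 21.
two_tap_r100 = radius_sufficient_test([0.2, 0.3], n=21, r=1.0)
two_tap_r095 = radius_sufficient_test([0.2, 0.3], n=21, r=0.95)
```

For the 1 tap example the roots have modulus of about 0.933, so the condition holds for $r = 0.95$ and fails for $r = 0.90$, in line with the statement that the 1 tap condition is necessary and sufficient. For 2 and 3 tap filters a failed check is inconclusive, because (23) is only a sufficient condition.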
