Stability and performance analysis of pitch filters in speech coders

Abstract
This paper analyzes the stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction. A computationally simple stability test based on a sufficient condition is formulated for pitch synthesis filters. For typical orders of pitch filters, this sufficient test is very tight. Based on the test, a simple stabilization technique that minimizes the loss in prediction gain of the pitch predictor is employed to generate stable synthesis filters. Finally, it is observed that the quality of decoded speech improves significantly when stable synthesis filters are employed.


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-35, NO. 7, JULY 1987
RAVI P. RAMACHANDRAN AND PETER KABAL
I. INTRODUCTION
In the speech coder considered in this paper, two nonrecursive prediction error filters are used to process the incoming speech signal. The first, which removes near-sample redundancies, is referred to here as the formant predictor. It is followed by the pitch predictor, which removes distant-sample based redundancies. The resulting residual signal after both formant and pitch prediction is then coded for transmission. An adaptive predictive coder (APC) places these predictors in a feedback loop around the residual quantizer. An additional quantization noise shaping filter can also be employed to reduce the perceptual distortion in the decoded speech [1], [2]. An alternate description of an APC coder uses an open-loop predictor configuration and a noise feedback filter [3]. A block diagram of such a configuration is shown in Fig. 1. This type of open-loop arrangement is also used in code-excited linear prediction (CELP) [4]. In CELP, the coding is accomplished by selecting the candidate waveform (from a dictionary) that best represents the residual. Also, noise shaping is accomplished implicitly in the process of choosing a representational residual signal.
In both APC and CELP, the residual signal or the selected codeword (after scaling by the gain factor) is passed through a pitch synthesis and a formant synthesis filter to reproduce the decoded speech. The filtering in the synthesis phase can be viewed in the frequency domain as first inserting the fine pitch structure and then inserting the spectral envelope (formant structure). The synthesis filters are recursive and potentially may be unstable. For the formant filter, the autocorrelation [5], modified covariance [1], [6], or Burg [7] methods can be used to determine filter coefficients which ensure stability of the formant synthesis filter.

Fig. 1. Block diagram of an APC coder with noise feedback. (a) Analysis phase. (b) Synthesis phase.

Manuscript received June 13, 1986; revised January 12, 1987. This work was supported by the Natural Sciences and Engineering Research Council of Canada.
R. P. Ramachandran is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada, H3A 2A7.
P. Kabal is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada, H3A 2A7, and INRS-Télécommunications, Université du Québec, Verdun, P.Q., Canada, H3E 1H6.
IEEE Log Number 8714483.

The procedure to determine a set of pitch predictor coefficients can result in an unstable pitch synthesis filter. This paper addresses the stability and performance issues of the pitch filter by formulating a computationally simple stability test based on a tight sufficient condition, introducing a stabilization technique, and evaluating the performance of the resulting suboptimum predictor. The effect of unstable pitch synthesis filters on decoded speech is also examined for a CELP system.
It should be noted that the filters in the speech coder are updated frame by frame and hence form a time varying system. Conventional notions of stability are in essence asymptotic properties of systems. In speech coding, an “unstable” filter may persist for a few frames (often corresponding to an interval with increasing energy; see the experimental results cited later), but eventually periods of stable filters are encountered. This means that, in practice, the output does not continue to increase in amplitude with time.
Consider the canonical case of an all-zero prediction error filter in cascade with a quantizer, followed by an all-pole synthesis filter. The quantizer can be modeled as adding noise (possibly correlated with the signal) to the residual signal. As long as the synthesis filter is the inverse to the prediction error filter and the filter coefficients are updated in step, the signal component emerges unaltered. For the signal component, stability is not a problem because of pole/zero cancellation. However, the quantization noise passes through only the synthesis filter. An “unstable” synthesis filter can cause the output noise to build up during the period of instability and can lead to degraded speech quality.
The effect of the quantization noise may be measured in a number of different ways. If the quantization noise is modeled as white noise, the output noise power can be expressed as the input noise power multiplied by the power gain of the filter. The power gain is the sum of the squares of the filter coefficients. The sufficient stability test introduced later results in a power gain for the pitch synthesis filter that is less than unity.

0096-3518/87/0700-0937$01.00 © 1987 IEEE

II. FORMANT AND PITCH PREDICTORS
The formant predictor has a transfer function
$$F(z) = \sum_{k=1}^{N_f} a_k z^{-k}. \tag{1}$$
The order $N_f$ is typically between 8 and 16. The system function of the noise feedback filter is related to that of the formant predictor and is expressed as $N(z) = F(z/\gamma)$, where $0 < \gamma < 1$.
The pitch predictor has a small number of taps (typically 1-3) centered around a large delay $M$ corresponding to the estimated pitch period in samples. The system function for the predictor is
$$P(z) = \begin{cases} \beta_1 z^{-M} & \text{1 tap} \\ \beta_1 z^{-M} + \beta_2 z^{-(M+1)} & \text{2 tap} \\ \beta_1 z^{-(M-1)} + \beta_2 z^{-M} + \beta_3 z^{-(M+1)} & \text{3 tap.} \end{cases} \tag{2}$$
At the receiver, the formant and pitch synthesis filters have transfer functions $H_F(z) = 1/(1 - F(z))$ and $H_P(z) = 1/(1 - P(z))$, respectively.
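As an illustration of the recursion implied by $H_P(z) = 1/(1 - P(z))$, the following Python sketch passes an excitation through a pitch synthesis filter. The function names `pitch_lags` and `pitch_synthesis` are hypothetical, and a zero initial filter state is assumed; this is not the authors' implementation.

```python
def pitch_lags(M, taps):
    """Delays used by P(z) in (2) for a 1, 2, or 3 tap predictor."""
    if taps == 1:
        return [M]
    if taps == 2:
        return [M, M + 1]
    if taps == 3:
        return [M - 1, M, M + 1]
    raise ValueError("taps must be 1, 2, or 3")


def pitch_synthesis(excitation, betas, M):
    """Filter `excitation` through 1 / (1 - P(z)), assuming zero initial state."""
    lags = pitch_lags(M, len(betas))
    out = []
    for n, x in enumerate(excitation):
        acc = x
        # Recursive part: add the weighted past outputs selected by P(z).
        for beta, lag in zip(betas, lags):
            if n - lag >= 0:
                acc += beta * out[n - lag]
        out.append(acc)
    return out


# Impulse response of a 1 tap filter with beta_1 = 0.5 and M = 3.
impulse = [1.0] + [0.0] * 6
response = pitch_synthesis(impulse, [0.5], 3)
```

For a 1 tap filter the impulse response repeats every $M$ samples, scaled by $\beta_1$ each time, so a coefficient with modulus above one makes the same recursion grow from period to period.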
In the case of a pitch predictor, both the value of $M$ (pitch lag) and the predictor coefficients have to be determined. The conventional strategy to determine the pitch lag is to search for the lag corresponding to the peak value of the correlation of the input signal [referred to as $d(n)$] to the pitch predictor [1]. The search range is often limited to those pitch values encountered in speech. After the value of $M$ is determined, the coefficients of $P(z)$ are found by minimizing the mean-square value of the residual over a frame size of $N$ samples [1], [6]. This covariance-type formulation results in a linear system of equations $\Phi\beta = \alpha$, where $\Phi$ is a matrix of correlation terms, $\beta$ is the vector of predictor coefficients, and $\alpha$ is a vector of correlation terms. Specifically for a 3 tap predictor, the system of equations is
$$\begin{bmatrix} \phi(M-1,M-1) & \phi(M-1,M) & \phi(M-1,M+1) \\ \phi(M,M-1) & \phi(M,M) & \phi(M,M+1) \\ \phi(M+1,M-1) & \phi(M+1,M) & \phi(M+1,M+1) \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \begin{bmatrix} \phi(0,M-1) \\ \phi(0,M) \\ \phi(0,M+1) \end{bmatrix} \tag{3}$$
where
$$\phi(i,j) = \sum_{n=0}^{N-1} d(n-i)\,d(n-j). \tag{4}$$
Given any vector of predictor coefficients $\beta$, the energy of the prediction residual is
$$\varepsilon^2 = \phi(0,0) - 2\alpha^T\beta + \beta^T\Phi\beta. \tag{5}$$
The residual energy for the optimum predictor is
$$\varepsilon^2_{\min} = \phi(0,0) - \beta^T\alpha. \tag{6}$$
A normalized quantity related to $\varepsilon^2$ will be used in the sequel as the performance measure. The prediction gain is defined to be the ratio (in decibels) of the energy of the signal at the input to the predictor to the energy of the prediction residual.
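The covariance formulation can be sketched in Python as follows. This is a minimal illustration, assuming the pitch lag has already been chosen and that the list `d` holds enough past samples before the frame start; the names `phi`, `solve_linear`, and `pitch_coefficients` are hypothetical, and the small Gaussian elimination routine merely stands in for whatever linear solver a real coder would use.

```python
import math


def phi(d, start, N, i, j):
    """Correlation term (4); d[start] is sample n = 0 of the current frame."""
    return sum(d[start + n - i] * d[start + n - j] for n in range(N))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular correlation matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def pitch_coefficients(d, start, N, lags):
    """Return (beta, prediction gain in dB) for the given predictor delays.

    Requires start >= max(lags) so that d(n - lag) exists for n = 0.
    """
    Phi = [[phi(d, start, N, i, j) for j in lags] for i in lags]
    alpha = [phi(d, start, N, 0, i) for i in lags]
    beta = solve_linear(Phi, alpha)
    energy_in = phi(d, start, N, 0, 0)
    # Residual energy of the optimum predictor, as in (6).
    energy_res = energy_in - sum(b * a for b, a in zip(beta, alpha))
    if energy_res <= 0.0:
        gain_db = math.inf
    else:
        gain_db = 10.0 * math.log10(energy_in / energy_res)
    return beta, gain_db


# Synthetic signal that repeats every 10 samples with a decay of 0.8.
d = [float(k + 1) for k in range(10)]
for n in range(10, 60):
    d.append(0.8 * d[n - 10])
beta, gain_db = pitch_coefficients(d, start=20, N=30, lags=[10])
```

On this synthetic signal a 1 tap predictor at lag 10 recovers the decay factor, since every sample is an exact scaled copy of the sample one period earlier.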
III. STABILITY TEST FOR PITCH SYNTHESIS FILTERS
The covariance formulation does not guarantee that $H_P(z)$ is a stable function. To ensure stability, the denominator polynomial $D(z)$ of $H_P(z)$ must have all its zeros within the unit circle in the z-plane. The polynomial $D(z)$ is sparse in that it is of high order but has few nonzero coefficients. The Schur-Cohn procedure is a necessary and sufficient stability test [8]. Furthermore, an implementation can take into account the sparse nature of the characteristic polynomial of a pitch synthesis filter. Appendix A gives the general form of this test and shows how it can be applied to pitch synthesis filters. The Schur-Cohn test will be used later to evaluate the tightness of the test developed in this paper.
The Schur-Cohn test specialized for the case of pitch synthesis filters has a computational complexity which is proportional to the order (approximately equal to the pitch lag). The complexity of such a test is still large for pitch lags encountered in practice. In the following sections, a simple alternative test based on an asymptotically tight sufficient condition is derived. In addition, the new test will allow for the simple stabilization of unstable pitch synthesis filters.
A. Simple Sufficient Test
Two different sufficient tests will be developed. The first is a simple sufficient test, which will also serve to introduce the notation. The second is the final asymptotically tight sufficient test.
Consider a general denominator polynomial $D(z)$ of the form
$$D(z) = z^n - B(z), \tag{7}$$
where
$$B(z) = \sum_{i=0}^{n-1} b_i z^i. \tag{8}$$
Then, $D(z) = z^n - B(z) = z^n(1 - z^{-n}B(z))$. The conditions for stability are that $1 - z^{-n}B(z) \neq 0$, or equivalently that $z^{-n}B(z) \neq 1$, on and outside the unit circle $z = e^{j\theta}$. By the maximum modulus theorem [9], $z^{-n}B(z)$ has its maximum modulus on the contour surrounding any region in which it is analytic. The expression $z^{-n}B(z)$, being a polynomial in $z^{-1}$, is analytic on and outside the unit circle in the z-plane. Therefore, a sufficient condition for stability is that $|z^{-n}B(z)| < 1$ on the unit circle in the z-plane. This condition can be expressed as
$$|B(e^{j\theta})| < 1. \tag{9}$$
The left-hand side can be upper bounded,
$$|B(e^{j\theta})| \le |b_0| + |b_1| + \cdots + |b_{n-1}|. \tag{10}$$
Then, a simple sufficient condition for stability is that the sum of the moduli of the coefficients be less than one.
This simple test is well known (e.g., [10, p. 225]) and can be applied to any filter. For pitch filters, $B(z) = z^n P(z)$, where $n$ is the highest power of $z^{-1}$ in $P(z)$. The sufficient condition for stability becomes
$$\begin{aligned} |\beta_1| &< 1 && \text{1 tap} \\ |\beta_1| + |\beta_2| &< 1 && \text{2 tap} \\ |\beta_1| + |\beta_2| + |\beta_3| &< 1 && \text{3 tap.} \end{aligned} \tag{11}$$
This test is both necessary and sufficient for a 1 tap filter. As will be shown later, this test applied to 2 tap filters also becomes asymptotically necessary and sufficient as $n$ increases.
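Condition (11) amounts to a single comparison. A minimal Python sketch, with the hypothetical function name `simple_sufficient_test`, could look like this:

```python
def simple_sufficient_test(betas):
    """Condition (11): the sum of the coefficient moduli is below one.

    True guarantees a stable pitch synthesis filter; for 2 and 3 tap
    filters False is inconclusive, because the condition is only sufficient.
    """
    return sum(abs(beta) for beta in betas) < 1.0


passes = simple_sufficient_test([0.5, 0.3, 0.1])   # moduli sum to 0.9
fails = simple_sufficient_test([0.4, 0.5, -0.4])   # moduli sum to 1.3
```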
B. Tight Sufficient Test
A further examination of the expression for $|B(e^{j\theta})|$ will lead to a tight sufficient test for a 3 tap pitch filter. The 1 and 2 tap pitch filters will be special cases of the 3 tap filter.
For a 3 tap filter, $B(z) = \beta_1 z^2 + \beta_2 z + \beta_3$. For convenience, define
$$a = \beta_1 + \beta_3 \quad \text{and} \quad b = \beta_1 - \beta_3. \tag{12}$$
Then, $B(z)$ evaluated on the unit circle becomes
$$B(e^{j\theta}) = e^{j\theta}\,[\beta_2 + a\cos\theta + jb\sin\theta]. \tag{13}$$
The bracketed term in (13) defines an ellipse in the complex plane with center $\beta_2$. The major axis is $|a|$ if $\beta_1$ and $\beta_3$ have the same signs, or $|b|$ if $\beta_1$ and $\beta_3$ have opposite signs. The two cases are illustrated in Fig. 2. A sufficient condition for stability is that the ellipse lie entirely within the unit circle. Since the cases $\beta_2 > 0$ and $\beta_2 < 0$ are symmetrical, the analysis proceeds by using $|\beta_2|$. Also, the cases for $a > 0$ and $b > 0$ are symmetrical with the cases $a < 0$ and $b < 0$, respectively.

Fig. 2. Illustration of the stability ellipse for a 3 tap filter. (a) The horizontal axis is the major axis. (b) Tangency when the vertical axis is the major axis.

If $\beta_1$ and $\beta_3$ have the same signs ($|a| > |b|$), the ellipse lies entirely within the unit circle if
$$|\beta_2| + |a| < 1, \tag{14}$$
or equivalently if
$$|\beta_1| + |\beta_2| + |\beta_3| < 1. \tag{15}$$
If $\beta_1$ and $\beta_3$ have opposite signs ($|a| < |b|$), the analysis is more complicated. The condition $|\beta_2| + |a| < 1$ ensures that no point on the minor axis lies outside the circle. This is a necessary condition for the ellipse to lie within the unit circle and will be assumed to be satisfied for the following discussion.
The aim is to establish the critical conditions on $|a|$, $|b|$, and $|\beta_2|$ which cause the ellipse to be tangent to the circle. Such a case is shown in Fig. 2(b). Let $\theta_c$ be the value of the angle $\theta$ which gives tangency. Since the case for $|\beta_2|$ is being considered, it suffices to consider $0 \le \theta_c \le \pi/2$. A point of tangency occurs at $X$ when the length of $OX$ is equal to unity,
$$(|\beta_2| + |a|\cos\theta_c)^2 + (b\sin\theta_c)^2 = 1. \tag{16}$$
In addition, tangency requires that $X$ be the point on the ellipse that is furthest away from the origin, which in turn requires that the derivative of the left-hand side of (16) be zero. This condition gives
$$|a|\sin\theta_c\,(|\beta_2| + |a|\cos\theta_c) = b^2\sin\theta_c\cos\theta_c. \tag{17}$$
Solving this equation for $\cos\theta_c$ gives
$$\cos\theta_c = \frac{|a|\,|\beta_2|}{b^2 - a^2}, \tag{18}$$
where $\sin\theta_c$ is assumed to be nonzero. Tangency can also occur if $\theta_c = 0$ ($\sin\theta_c = 0$), giving $|\beta_2| + |a| = 1$. However, this case is precluded by the assumption that the minor axis lies within the unit circle ($|\beta_2| + |a| < 1$). If tangency occurs for $\theta_c \neq 0$, from (16) and (18), it can be shown that the critical values of $|a|$, $|b|$, and $|\beta_2|$ are those which satisfy $f(b^2, a^2, \beta_2^2) = 0$ where
$$f(b^2, a^2, \beta_2^2) = b^2\beta_2^2 - (1 - b^2)(b^2 - a^2). \tag{19}$$
The analysis proceeds by assuming that two of the three parameters $a$, $b$, and $\beta_2$ are given, and then finding the critical value of the third. Consider $a$ and $b$ given. This determines the shape factor for the ellipse. Varying $\beta_2$ slides the ellipse along the horizontal axis. For $|a| < 1$ and $|b| < 1$, if $|\beta_2|$ is less than a critical value $\beta_{2c}$, where $0 \le \beta_{2c} < 1$, the ellipse lies entirely within the unit circle. The critical value is that which results in tangency of the ellipse with the unit circle. Depending on the relative values of $a$ and $b$, this can occur in one of two ways.
For $b^2 \le |a|$, the point of tangency occurs for $\theta_c = 0$, giving $|a| + \beta_{2c} = 1$. Since $|\beta_2| < \beta_{2c}$, this condition is equivalent to having the minor axis within the unit circle. For $b^2 > |a|$, the point of tangency occurs for $0 < \theta_c \le \pi/2$. Then the equation $f(b^2, a^2, \beta_{2c}^2) = 0$ can be solved for the critical value $\beta_{2c}$. Since this function is monotonic in $\beta_{2c}$, the check that $|\beta_2| < \beta_{2c}$ is equivalent to the check that $f(b^2, a^2, \beta_2^2) < 0$.
For $|a| \ge b^2$, having the horizontal axis of the ellipse lie inside the unit circle is sufficient for stability. Otherwise, the function $f(b^2, a^2, \beta_2^2)$ must be tested. It can be shown that this test along with the minor axis requirement is sufficient for stability. The stability test for a 3 tap pitch synthesis filter is summarized below.
Stability Test: Let $a = \beta_1 + \beta_3$ and $b = \beta_1 - \beta_3$.
1) If $|a| \ge |b|$, the following is sufficient for stability:
   a) $|\beta_1| + |\beta_2| + |\beta_3| < 1$.
2) If $|a| < |b|$, the satisfaction of the two following conditions is sufficient for stability:
   a) $|\beta_2| + |a| < 1$
   b) i) $b^2 \le |a|$ or
      ii) $b^2\beta_2^2 - (1 - b^2)(b^2 - a^2) < 0$.
Part 2 of this stability test is tighter than the simple sufficient test given earlier. This part of the test is invoked when $|a| < |b|$, or equivalently when $\beta_1$ and $\beta_3$ have opposite signs. Experiments show that in voiced speech,¹ $\beta_2$ is greater than zero in about 90 percent of the frames. Given that $\beta_2 > 0$, the number of voiced frames in which $\beta_1$ and $\beta_3$ have opposite signs is about 3.7 times the number of voiced frames in which they have the same signs. Therefore, the presence of a tighter test when $|a| < |b|$ is important for speech coders.
¹A frame was considered to be voiced when the pitch prediction gain was greater than 1 dB.
The test for 3 tap filters subsumes the test for 2 tap filters. By setting $\beta_1 = 0$, a 3 tap filter becomes a 2 tap filter. Then $|a| = |b|$ and the test involves checking that the sum of the moduli of the coefficients is less than 1. This is equivalent to the simple sufficient test given earlier.
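The summarized stability test translates almost line for line into code. The following Python sketch is one possible transcription; the function name `tight_sufficient_test` is hypothetical, and 1 or 2 tap filters are assumed to be handled by passing zeros for the unused coefficients.

```python
def tight_sufficient_test(beta1, beta2, beta3):
    """Tight sufficient stability test for a 3 tap pitch synthesis filter.

    True guarantees stability; False means the test is inconclusive.
    """
    a = beta1 + beta3
    b = beta1 - beta3
    # Part 1: horizontal axis of the ellipse is the major axis.
    if abs(a) >= abs(b):
        return abs(beta1) + abs(beta2) + abs(beta3) < 1.0
    # Part 2 a): the minor (horizontal) axis must lie inside the unit circle.
    if abs(beta2) + abs(a) >= 1.0:
        return False
    # Part 2 b) i): tangency could only occur at theta_c = 0.
    if b * b <= abs(a):
        return True
    # Part 2 b) ii): check the sign of f(b^2, a^2, beta2^2) from (19).
    return b * b * beta2 * beta2 - (1.0 - b * b) * (b * b - a * a) < 0.0


# Opposite-sign outer taps: the sum of moduli is 1.3, so the simple test
# is inconclusive, yet the tight test still certifies stability.
certified = tight_sufficient_test(0.4, 0.5, -0.4)
```

The example shows the practical benefit of Part 2: a coefficient set rejected by the sum-of-moduli condition can still be accepted without any stabilization.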
C. Further Examination of the Sufficient Condition
The sufficient test defines a stability region for 3 tap filters that is independent of the order $n$. Consider first the simple sufficient test ($|\beta_1| + |\beta_2| + |\beta_3| < 1$). The stability region can be viewed in $(\beta_1, \beta_2, \beta_3)$ space as a region bounded by 8 flat surfaces. The volume enclosed is 4/3 units.
The stability region described by the tight sufficient test has two types of surfaces. Four of the surfaces are flat and coincide with the flat surfaces of the simple sufficient test. The other four surfaces bulge out significantly beyond the flat surfaces. The volume enclosed can be determined in closed form and is calculated to be 16/9 units. Fig. 3 shows a contour plot of the stability region defined by the tight sufficient test for $\beta_3 \ge 0$. A plot for $\beta_3 \le 0$ is a mirror image reflected about the vertical axis. It can be shown that this stability region is enclosed by a unit sphere and hence, the sum of the squares of the coefficients (power gain) is less than unity.
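The two volumes quoted above can be checked numerically. The sketch below counts the cells of a midpoint grid over the cube $[-1, 1]^3$ that fall inside each region; the cube suffices because both regions lie within the unit sphere. The function names and the grid resolution are illustrative assumptions, not part of the original analysis.

```python
def simple_test(b1, b2, b3):
    """Simple sufficient test: sum of coefficient moduli below one."""
    return abs(b1) + abs(b2) + abs(b3) < 1.0


def tight_test(b1, b2, b3):
    """Tight sufficient test for a 3 tap filter (Parts 1 and 2)."""
    a = b1 + b3
    b = b1 - b3
    if abs(a) >= abs(b):
        return abs(b1) + abs(b2) + abs(b3) < 1.0
    if abs(b2) + abs(a) >= 1.0:
        return False
    if b * b <= abs(a):
        return True
    return b * b * b2 * b2 - (1.0 - b * b) * (b * b - a * a) < 0.0


def region_volume(test, steps=60):
    """Estimate the volume of {test is True} on a midpoint grid over [-1, 1]^3."""
    h = 2.0 / steps
    points = [-1.0 + (k + 0.5) * h for k in range(steps)]
    count = 0
    for x in points:
        for y in points:
            for z in points:
                if test(x, y, z):
                    count += 1
    return count * h ** 3


vol_simple = region_volume(simple_test)  # compare with 4/3
vol_tight = region_volume(tight_test)    # compare with 16/9
```

With a grid of this size the estimates should land close to 4/3 and 16/9; a finer grid tightens the agreement at the cost of run time.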
Both the magnitude and phase of $B(e^{j\theta})$ determine the necessary and sufficient conditions. If the stability ellipse lies entirely inside the circle of unit radius, all of the roots of $D(z)$ are inside the unit circle in the z-plane. As some combination of the parameters $|a|$, $|b|$, or $|\beta_2|$ is increased, the stability ellipse will emerge outside the circle of unit radius and the roots of $D(z)$ will eventually cross the unit circle. The critical combination of parameters which cause roots to lie on the unit circle can be determined from
$$B(e^{j\theta}) = e^{jn\theta}. \tag{20}$$
The points at which the ellipse crosses the unit circle correspond to points which satisfy the above equation in magnitude. For a given $n$, these intersection points correspond to phase angles which may or may not satisfy the above equation. However, as $n$ increases, the phase angles which satisfy the above equation become increasingly dense. In the limit of large $n$, as soon as the stability ellipse crosses the unit circle, at least one root of $D(z)$ crosses the unit circle. This indicates that the stability ellipse must lie entirely within the unit circle. The stability test given earlier becomes both necessary and sufficient in the limit of large $n$.
The necessary and sufficient conditions as determined by the Schur-Cohn test define a region which depends on the order $n$. This region is bounded by four types of surfaces. Two surfaces are flat and coincide with the flat surfaces for the regions described above. Two other surfaces bulge slightly but become flat in the limit of large $n$. Two other pairs of surfaces bulge significantly and coincide with the bulging surfaces of the tight sufficient test in the limit of large $n$. The symmetries of the surfaces switch depending on whether $n$ is even or odd. Fig. 4 shows the stability region determined by the Schur-Cohn test for $n = 7$ and $\beta_3 \ge 0$. This rather small value of $n$ is used to accentuate the differences between this region and that determined by the sufficient test. Note also that the contour with $\beta_3 = 0$ is the stability region for the 2 tap filter with $n = 6$.

Fig. 3. Region described by the tight sufficient test. Equal value contours are shown for $\beta_3 = 0, 0.1, \ldots, 0.9$. The dotted lines represent lines on the stability surface with $b^2 = |a|$.
Fig. 4. Necessary and sufficient stability region ($n = 7$). Equal value contours are shown for $\beta_3 = 0, 0.1, \ldots, 0.9$.

It remains to ascertain how tight the sufficient test is for finite $n$. The area and volume enclosed by the true stability regions for 2 and 3 tap filters were computed using numerical integration techniques for various values of $n$. The boundaries of the true stability regions were computed using the simplified Schur-Cohn procedure described in Appendix A. The region defined by the sufficient test is contained in that defined by the Schur-Cohn test. Fig. 5 shows the percent differences between the areas and volumes enclosed by the sufficient test and the Schur-Cohn test for 2 and 3 tap filters. It is observed that the percent difference decreases rapidly as $n$ increases. The lowest order of a pitch filter is typically around 20. Even at this low order, the difference in volume is below 1 percent. For higher orders, the sufficient test is very tight and involves much less computation than the Schur-Cohn test.
D. Extension to Circles of Arbitrary Radius
The sufficient condition can be extended in order to determine whether or not all the roots of $D(z) = z^n - B(z)$ [$B(z)$ defined in (8)] are within a circle of radius $r$ centered at the origin in the z-plane. Just as before, the maximum modulus theorem is used to derive the condition
$$|B(re^{j\theta})| < r^n. \tag{21}$$
Expanding $|B(re^{j\theta})|$ yields
$$|B(re^{j\theta})| = |b_0 + b_1 re^{j\theta} + \cdots + b_{n-1}r^{n-1}e^{j(n-1)\theta}| \le |b_0| + |b_1|r + \cdots + |b_{n-1}|r^{n-1}. \tag{22}$$
A simple sufficient condition that ensures that all the roots of $D(z)$ are within the circle $|z| = r$ is
$$|b_0| + |b_1|r + \cdots + |b_{n-1}|r^{n-1} < r^n. \tag{23}$$
The condition $|\beta_1| < r^n$ is necessary and sufficient for 1 tap filters. For 3 tap filters, a more detailed examination of $|B(re^{j\theta})|$ is done in the same fashion as for $|B(e^{j\theta})|$. The conditions are given below. The 3 tap case subsumes the 2 tap case. Hence, the condition for 2 tap filters is merely $|\beta_1| r + |\beta_2| < r^n$.

Fig. 5. Percent differences between areas and volumes enclosed by the sufficient test and the Schur-Cohn test.
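Condition (23) can be evaluated directly from the coefficients of $B(z)$. The Python sketch below is a minimal illustration; the helper names `roots_within_radius` and `pitch_b_coefficients` are hypothetical, and the mapping from pitch coefficients to $b_i$ follows $B(z) = z^n P(z)$ with $n$ the highest power of $z^{-1}$ in $P(z)$.

```python
def roots_within_radius(b, r):
    """Simple sufficient check (23) for D(z) = z^n - B(z), with n = len(b).

    `b` lists [b_0, ..., b_{n-1}]. True guarantees that every root of D(z)
    lies inside the circle of radius r; False is inconclusive.
    """
    n = len(b)
    bound = sum(abs(bi) * r ** i for i, bi in enumerate(b))
    return bound < r ** n


def pitch_b_coefficients(betas, M):
    """Coefficients [b_0, ..., b_{n-1}] of B(z) = z^n P(z) for P(z) in (2)."""
    n = M if len(betas) == 1 else M + 1
    b = [0.0] * n
    # B(z) carries the betas in descending powers of z, ending at z^0.
    for power, beta in enumerate(reversed(betas)):
        b[power] = beta
    return b


# 2 tap filter with M = 20, so n = 21 and B(z) = 0.3 z + 0.2.
b = pitch_b_coefficients([0.3, 0.2], 20)
inside_unit_circle = roots_within_radius(b, 1.0)
inside_smaller_circle = roots_within_radius(b, 0.9)
```

Because $r^n$ shrinks quickly for the large orders typical of pitch filters, the same coefficients can pass the check at $r = 1$ and fail it for a slightly smaller radius.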

References
Adaptive Filter Theory. Simon Haykin. (Book)
Digital Processing of Speech Signals. (Book)
Digital Coding of Waveforms. Peter No
Code-excited linear prediction (CELP): High-quality speech at very low bit rates. (Proceedings article)