Least squares quantization in PCM

S. P. Lloyd
01 Mar 1982
IEEE Transactions on Information Theory, Vol. 28, Iss. 2, pp. 129-137
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-28, NO. 2, MARCH 1982
Least Squares Quantization in PCM
STUART P. LLOYD
Abstract
It has long been realized that in pulse-code modulation (PCM), with a given ensemble of signals to handle, the quantum values should be spaced more closely in the voltage regions where the signal amplitude is more likely to fall. It has been shown by Panter and Dite that, in the limit as the number of quanta becomes infinite, the asymptotic fractional density of quanta per unit voltage should vary as the one-third power of the probability density per unit voltage of signal amplitudes. In this paper the corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy. The optimization criterion used is that the average quantization noise power be a minimum. It is shown that the result obtained here goes over into the Panter and Dite result as the number of quanta becomes large. The optimum quantization schemes for $2^b$ quanta, $b = 1, 2, \cdots, 7$, are given numerically for Gaussian and for Laplacian distributions of signal amplitudes.
I. INTRODUCTION
THE BASIC IDEAS in the pulse-code modulation (PCM) system [1], [2, ch. 19] are the Shannon-Nyquist sampling theorem and the notion of quantizing the sample values.
The sampling theorem asserts that a signal voltage $s(t)$, $-\infty < t < \infty$, containing only frequencies less than $W$ cycles/s can be recovered from a sequence of its sample values according to
$$s(t) = \sum_{j=-\infty}^{\infty} s(t_j) K(t - t_j), \qquad -\infty < t < \infty, \tag{1}$$
where $s(t_j)$ is the value of $s$ at the $j$th sampling instant
$$t_j = \frac{j}{2W}, \qquad -\infty < j < \infty,$$
and where
$$K(t) = \frac{\sin 2\pi W t}{2\pi W t}, \qquad -\infty < t < \infty, \tag{2}$$
is a $(\sin t)/t$ pulse of the appropriate width.
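Equations (1) and (2) can be checked numerically. The following sketch (not part of the paper; the test signal, bandwidth, and window length are illustrative assumptions) reconstructs a band-limited cosine from its samples by summing sinc pulses over a finite window:

```python
import numpy as np

def K(t, W):
    """The interpolation pulse of eq. (2): sin(2*pi*W*t) / (2*pi*W*t)."""
    return np.sinc(2 * W * t)  # numpy's sinc(x) is sin(pi*x)/(pi*x)

def reconstruct(samples, t_j, t, W):
    """Finite-window version of the sum in eq. (1)."""
    return sum(s * K(t - tj, W) for s, tj in zip(samples, t_j))

# A 3 Hz cosine is band-limited below W = 5 cycles/s; sample at t_j = j/(2W).
W = 5.0
t_j = np.arange(-200, 201) / (2 * W)
samples = np.cos(2 * np.pi * 3.0 * t_j)

t = np.linspace(-1.0, 1.0, 101)
r = reconstruct(samples, t_j, t, W)
err = np.max(np.abs(r - np.cos(2 * np.pi * 3.0 * t)))  # small truncation error
```

The reconstruction is exact at the sample instants themselves; away from them the only error comes from truncating the infinite sum to a finite window.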
The pulse-amplitude modulation (PAM) system [2, ch. 16] is based on the sampling theorem alone. One sends over the system channel, instead of the signal values $s(t)$ for all times $t$, only a sequence
$$\cdots, s(t_{-1}), s(t_0), s(t_1), \cdots \tag{3}$$
of samples of the signal. The (idealized) receiver constructs the pulses $K(t - t_j)$ and adds them together with the received amplitudes $s(t_j)$, as in (1), to produce an exact reproduction of the original band-limited signal $s$.

Manuscript received May 1, 1981. The material in this paper was presented in part at the Institute of Mathematical Statistics Meeting, Atlantic City, NJ, September 10-13, 1957.
The author is with Bell Laboratories, Whippany Road, Whippany, NJ 07981.
PCM is a modification of this. Instead of sending the
exact sample values (3), one partitions the voltage range of
the signal into a finite number of subsets and transmits to
the receiver only the information as to which subset a
sample happens to fall in. Built into the receiver there is a
source of fixed representative voltages ("quanta"), one
for each of the subsets. When the receiver is informed that
a certain sample fell in a certain subset, it uses its quantum
for that subset as an approximation to the true sample
value and constructs a band-limited signal based on these
approximate sample values.
We define the noise signal as the difference between the
receiver-output signal and the original signal and the noise
power as the average square of the noise signal. The prob-
lem we consider is the following: given the number of
quanta and certain statistical properties of the signal, de-
termine the subsets and quanta that are best in minimizing
the noise power.
II. QUANTIZATION
Let us formulate the quantization process more explicitly. A quantization scheme consists of a class of sets $\{Q_1, Q_2, \cdots, Q_\nu\}$ and a set of quanta $\{q_1, q_2, \cdots, q_\nu\}$. The $\{Q_\alpha\}$ are any $\nu$ disjoint subsets of the voltage axis which, taken together, cover the entire voltage axis. The $\{q_\alpha\}$ are any $\nu$ finite voltage values. The number $\nu$ of quanta is to be regarded throughout as a fixed finite preassigned number.

We associate with a partition $\{Q_\alpha\}$ a label function $a(x)$, $-\infty < x < \infty$, defined for all (real) voltages $x$ by
$$a(x) = 1 \ \text{if $x$ lies in } Q_1,$$
$$a(x) = 2 \ \text{if $x$ lies in } Q_2,$$
$$\cdots \tag{4}$$
$$a(x) = \nu \ \text{if $x$ lies in } Q_\nu.$$
If $s(t_j)$ is the $j$th sample of the signal $s$, as in Section I, then we denote by $a_j$ the label of the set that this sample falls in:
$$a_j = a(s(t_j)), \qquad -\infty < j < \infty.$$
In PCM the signal sent over the channel is (in some code or another) the sequence of labels
$$\cdots, a_{-1}, a_0, a_1, \cdots, \tag{5}$$
each $a_j$ being one of the integers $\{1, 2, \cdots, \nu\}$. The technology of this transmission does not concern us, except that

we assume that such a sequence can be delivered to the receiver without error.

The receiver uses the fixed voltage $q_\alpha$ as an approximation to all sample voltages in $Q_\alpha$, $\alpha = 1, 2, \cdots, \nu$. That is, the receiver, being given the value of $a_j$ in the sequence (5), proceeds as if the $j$th sample of $s$ had value $q_{a_j}$ and produces the receiver-output signal
$$r(t) = \sum_{j=-\infty}^{\infty} q_{a_j} K(t - t_j), \qquad -\infty < t < \infty.$$
To put it another way, the system mutilates an actual sample voltage value $x$ to the quantized value $y(x)$ given by
$$y(x) = q_{a(x)}, \qquad -\infty < x < \infty, \tag{6}$$
and we may express the receiver output in terms of this as
$$r(t) = \sum_{j=-\infty}^{\infty} y(s(t_j)) K(t - t_j), \qquad -\infty < t < \infty. \tag{7}$$
Hence the noise signal, defined as
$$n(t) = r(t) - s(t), \qquad -\infty < t < \infty,$$
is given by
$$n(t) = \sum_{j=-\infty}^{\infty} z(s(t_j)) K(t - t_j), \qquad -\infty < t < \infty, \tag{8}$$
where
$$z(x) = y(x) - x, \qquad -\infty < x < \infty, \tag{9}$$
may be regarded as the quantization error added to a sample which has voltage value $x$.
Note that we assume that the receiver uses the nonrealizable pulses (2). If other pulses are used (e.g., step functions or other realizable pulses) there will be sampling noise, in general, even without quantization [3]. Our noise (8) is due strictly to quantization.

Finally we must emphasize that we assume that the $\{Q_\alpha\}$ and $\{q_\alpha\}$ are constant in time. In delta modulation and its refinements the $\{Q_\alpha\}$ and $\{q_\alpha\}$ change from sampling instant to sampling instant, depending on the past behavior of the signal being handled. Such systems are very difficult to treat theoretically.
III. NOISE POWER
Instead of working with a particular band-limited signal, we assume that there is given a probabilistic family of such signals. That is, the $s$ of the preceding sections and hence the various signals derived from it are to be regarded as stochastic processes [4]. We denote the underlying probability measure by $P\{\cdot\}$ and averages with respect to this measure (expectations) by $E\{\cdot\}$.

We use the following results of the probabilistic treatment. We assume that the $s$ process is stationary, so that the cumulative probability distribution function of a sample,
$$F(x) = P\{s(t) \le x\}, \qquad -\infty < x < \infty,$$
is independent of $t$, $-\infty < t < \infty$, as indicated by the notation. Then the average power of the $s$ process, assumed to be finite, is constant in time:
$$S = E\{s^2(t)\} = \int_{-\infty}^{\infty} x^2 \, dF(x), \qquad -\infty < t < \infty. \tag{10}$$
Moreover, the $r$ and $n$ processes have this same property; the average receiver-output power $R$ is given by
$$R = E\{r^2(t)\} = \int_{-\infty}^{\infty} y^2(x) \, dF(x), \qquad -\infty < t < \infty, \tag{11}$$
where $y(x)$ is defined in (6), and the noise power $N$ is
$$N = E\{n^2(t)\} = \int_{-\infty}^{\infty} z^2(x) \, dF(x), \qquad -\infty < t < \infty, \tag{12}$$
with $z(x)$ as in (9). (Detailed proofs of these statements, together with further assumptions used, are given in Appendix A.) The stochastic process problem is thus reduced to a problem in a single real variable: choose the $\{Q_\alpha\}$ and $\{q_\alpha\}$ so that the rightmost integral in (12) is as small as possible.
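The reduction to a single real variable makes the noise easy to evaluate numerically for any trial scheme. The sketch below (illustrative, not from the paper) approximates the rightmost integral in (12) on a discretized voltage axis, for a standard Gaussian amplitude distribution and a symmetric two-level scheme:

```python
import numpy as np

def noise_power(endpoints, quanta, x, mass):
    """N = integral of z^2(x) dF(x), eq. (12), with dF discretized as point masses.

    endpoints: interior interval endpoints x_1 < ... < x_{v-1}, as in (16);
    quanta:    representative values q_1 < ... < q_v.
    """
    labels = np.searchsorted(endpoints, x)   # which Q_a each grid point falls in
    z = quanta[labels] - x                   # quantization error z(x), eq. (9)
    return np.sum(z**2 * mass)

# Standard Gaussian dF(x), discretized on a fine grid.
x = np.linspace(-8.0, 8.0, 200001)
dx = x[1] - x[0]
mass = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * dx

# Two quanta placed symmetrically at the conditional means +/- sqrt(2/pi).
q = np.sqrt(2 / np.pi)
N = noise_power(np.array([0.0]), np.array([-q, q]), x, mass)
# N is close to 1 - 2/pi = 0.3634, the v = 2 noise figure of the Gaussian tables.
```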
IV. THE BEST QUANTA
We consider first the problem of minimizing $N$ with respect to the quanta $\{q_\alpha\}$ when the $\{Q_\alpha\}$ are fixed preassigned sets.

The $dF$ integral in (12) may be written more explicitly as
$$N = \sum_{\alpha=1}^{\nu} \int_{Q_\alpha} (q_\alpha - x)^2 \, dF(x). \tag{13}$$
(The sets $\{Q_\alpha\}$ must be measurable $[dF]$ if (11)-(13) are to have meaning, and we assume always that this is the case.) If we regard the given $F$ as describing the distribution of unit probability mass on the voltage axis [5, p. 57], then (13) expresses $N$ as the total moment of inertia of the sets $\{Q_\alpha\}$ around the respective points $\{q_\alpha\}$. It is a classical result that such a moment assumes its minimum value when each $q_\alpha$ is the center of mass of the corresponding $Q_\alpha$ (see, e.g., [5, p. 175]). That is,
$$q_\alpha = \frac{\int_{Q_\alpha} x \, dF(x)}{\int_{Q_\alpha} dF(x)}, \qquad \alpha = 1, 2, \cdots, \nu, \tag{14}$$
are the uniquely determined best quanta to use with a given partition $\{Q_\alpha\}$.
To avoid the continual mention of trivial cases we assume always that $F$ increases at no fewer than $\nu + 1$ points, so that the quantization noise does not vanish. Then none of the denominators in (14) will vanish, at least in an

optimum scheme. For if $Q_\alpha$ has vanishing mass it can be combined with some set $Q_\beta$ of nonvanishing mass (discarding $q_\alpha$) to give a scheme with $\nu - 1$ quanta and the same noise. Then one of the sets of this scheme can be divided into two sets and new quanta assigned to give a scheme with $\nu$ quanta and noise less than in the original scheme. (We omit the details.)
If the expression on the right in (14) is substituted for $q_\alpha$ in (13), there results
$$N = S - \sum_{\alpha=1}^{\nu} q_\alpha^2 \int_{Q_\alpha} dF(x),$$
where the $\{q_\alpha\}$ here are the optimum ones of (14). The sum on the right is the receiver-output power $R$ from (11). Hence when the $\{q_\alpha\}$ are centers of mass of the $\{Q_\alpha\}$, optimum or not, then $S = R + N$, which implies that the noise is orthogonal to the receiver output. One expects this in a least squares approximation, of course.
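The identity $S = R + N$ is easy to verify numerically for centroid quanta, even on a deliberately non-optimal partition. A small sketch (assumptions: standard Gaussian amplitudes, an arbitrary three-set partition, grid discretization of $dF$):

```python
import numpy as np

# Discretized standard Gaussian dF(x); S = integral of x^2 dF(x) is 1.
x = np.linspace(-8.0, 8.0, 400001)
mass = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * (x[1] - x[0])

# An arbitrary, deliberately non-optimal partition into v = 3 intervals.
endpoints = np.array([-0.5, 1.0])
labels = np.searchsorted(endpoints, x)

# Eq. (14): each quantum is the center of mass of its set Q_a.
quanta = np.array([np.sum((x * mass)[labels == a]) / np.sum(mass[labels == a])
                   for a in range(3)])

y = quanta[labels]                  # quantized value y(x), eq. (6)
S = np.sum(x**2 * mass)             # signal power, eq. (10)
R = np.sum(y**2 * mass)             # receiver-output power, eq. (11)
N = np.sum((y - x)**2 * mass)       # noise power, eq. (12)
# S equals R + N up to rounding, even though the partition is not optimal.
```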
V. THE BEST PARTITION
Now we find the best sets $\{Q_\alpha\}$ to use with a fixed preassigned set of quanta $\{q_\alpha\}$. The considerations of this section are independent of those of the preceding section. In particular, the best $\{Q_\alpha\}$ for given $\{q_\alpha\}$ may not have the $\{q_\alpha\}$ as their centers of mass.

We assume that the given $\{q_\alpha\}$ are distinct, since it will never happen in an optimum scheme that $q_\alpha = q_\beta$ for some $\alpha \ne \beta$. For if $q_\alpha = q_\beta$, then $Q_\alpha$ and $Q_\beta$ are effectively one set $Q_\alpha \cup Q_\beta$ as far as the noise (13) is concerned, and this set can be redivided into two sets and these two sets can be given distinct quantum values in such a way as to reduce the noise. (We omit the details.)
Consider the probability mass in a small interval around voltage value $x$. According to (13) any of this mass which is assigned to $q_\alpha$ (i.e., which lies in $Q_\alpha$) will contribute to the noise at rate $(q_\alpha - x)^2$ per unit mass. To minimize the noise, then, any mass in the neighborhood of $x$ should be assigned to a $q_\alpha$ for which $(q_\alpha - x)^2$ is the smallest of the numbers $(q_1 - x)^2, (q_2 - x)^2, \cdots, (q_\nu - x)^2$. In other words,¹
$$Q_\alpha \supset \{x : (q_\alpha - x)^2 \le (q_\beta - x)^2 \ \text{for all } \beta \ne \alpha\}, \qquad \alpha = 1, \cdots, \nu,$$
modulo sets of measure zero $[dF]$. This simplifies to
$$Q_\alpha \supset \{x : (q_\beta - q_\alpha)(x - \tfrac{1}{2}(q_\alpha + q_\beta)) \le 0 \ \text{for all } \beta \ne \alpha\}, \qquad \alpha = 1, 2, \cdots, \nu. \tag{15}$$
It is straightforward that the best $\{Q_\alpha\}$ are determined by (15) as the intervals whose endpoints bisect the segments between successive $\{q_\alpha\}$, except that the assignment of the endpoints is not determined. To make matters definite we let the $\{Q_\alpha\}$ be left-open and right-closed, so that the best

¹If $C(x)$ is a condition on $x$, then $\{x : C(x)\}$ denotes the set of all $x$ which satisfy $C(x)$.
partition to use with the given quanta is
$$Q_1 = \{x : -\infty < x \le x_1\}$$
$$Q_2 = \{x : x_1 < x \le x_2\}$$
$$\cdots \tag{16}$$
$$Q_{\nu-1} = \{x : x_{\nu-2} < x \le x_{\nu-1}\}$$
$$Q_\nu = \{x : x_{\nu-1} < x < \infty\},$$
where the endpoints $\{x_\alpha\}$ are given by
$$x_1 = \tfrac{1}{2}(q_1 + q_2)$$
$$x_2 = \tfrac{1}{2}(q_2 + q_3)$$
$$\cdots \tag{17}$$
$$x_{\nu-1} = \tfrac{1}{2}(q_{\nu-1} + q_\nu).$$
We have assumed, as we shall hereafter, that the indexing is such that $q_1 < q_2 < \cdots < q_\nu$.
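Conditions (16) and (17) amount to nearest-quantum assignment, which can be sketched as follows (an illustrative helper, not from the paper):

```python
import numpy as np

def best_endpoints(quanta):
    """Eq. (17): endpoints midway between adjacent (sorted, distinct) quanta."""
    q = np.sort(np.asarray(quanta, dtype=float))
    return (q[:-1] + q[1:]) / 2

def quantize(x, quanta):
    """Apply the best partition (16): left-open, right-closed intervals,
    so a value equal to an endpoint x_a goes to the left set Q_a."""
    q = np.sort(np.asarray(quanta, dtype=float))
    return q[np.searchsorted(best_endpoints(q), x)]

# With quanta {-1, 0, 1} the endpoints are {-0.5, 0.5}:
#   quantize(0.6, ...)  -> 1.0  (nearest quantum)
#   quantize(0.5, ...)  -> 0.0  (tie at an endpoint resolves leftward)
```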
VI. QUANTIZATION PROCEDURES
From Sections IV and V we know that we may confine our attention to quantization schemes defined by $2\nu - 1$ numbers
$$q_1 < x_1 < q_2 < x_2 < \cdots < q_{\nu-1} < x_{\nu-1} < q_\nu, \tag{18}$$
where the $\{x_\alpha\}$ are the endpoints of the intervals $\{Q_\alpha\}$, as in (16), and the $\{q_\alpha\}$ are the corresponding quanta. We will regard such a set of numbers as the Cartesian coordinates of a point $p$ in $(2\nu - 1)$-dimensional Euclidean space $E_{2\nu - 1}$. The noise as a function of $p$ has the form
$$N(p) = \int_{-\infty}^{x_1} (q_1 - x)^2 \, dF(x) + \int_{x_1}^{x_2} (q_2 - x)^2 \, dF(x) + \cdots + \int_{x_{\nu-1}}^{\infty} (q_\nu - x)^2 \, dF(x). \tag{19}$$
In an optimum scheme the $\{q_\alpha\}$ will be centers of mass of the corresponding $\{Q_\alpha\}$, (14), and the $\{x_\alpha\}$ will lie midway between adjacent $\{q_\alpha\}$, (17). From the derivations these conditions are sufficient that $N(p)$ be a minimum with respect to variations in each coordinate separately and hence are necessary conditions at a minimum of $N(p)$. As it turns out, however, they are not sufficient conditions for a minimum of $N(p)$. Points at which (14) and (17) are satisfied, which we term stationary points, while never local maxima, may be saddle points of $N(p)$. Moreover, among the stationary points there may be several local minima, only one of which is the sought absolute minimum of $N(p)$. These complications are discussed further in Appendix B. The author has not been able to determine sufficient conditions for an absolute minimum.
The derivations suggest one trial-and-error method for finding stationary points. A trial point $p^{(1)}$ in $E_{2\nu - 1}$ is

chosen as follows. The endpoints
$$-\infty < x_1^{(1)} < x_2^{(1)} < \cdots < x_{\nu-1}^{(1)} < \infty$$
are chosen arbitrarily except that each of the resulting $\{Q_\alpha^{(1)}\}$ should have nonvanishing mass. Then the centers of mass of these sets are taken as the first trial quanta $\{q_\alpha^{(1)}\}$.

These values will not satisfy the midpoint conditions (17), in general, so that the second trial point $p^{(2)}$ is taken to be
$$q_\alpha^{(2)} = q_\alpha^{(1)}, \qquad \alpha = 1, 2, \cdots, \nu,$$
$$x_\alpha^{(2)} = \tfrac{1}{2}(q_\alpha^{(2)} + q_{\alpha+1}^{(2)}), \qquad \alpha = 1, 2, \cdots, \nu - 1,$$
with appropriate modifications if any of the resulting $\{Q_\alpha^{(2)}\}$ have vanishing mass. This step does not increase the noise, in view of the discussion in Section V; that is, $N(p^{(2)}) \le N(p^{(1)})$.

The new $\{q_\alpha^{(2)}\}$, centers of mass (c.m.) of the old $\{Q_\alpha^{(1)}\}$, will not be centers of mass of the new $\{Q_\alpha^{(2)}\}$, in general; trial point $p^{(3)}$ is determined by
$$x_\alpha^{(3)} = x_\alpha^{(2)}, \qquad \alpha = 1, 2, \cdots, \nu - 1,$$
$$q_\alpha^{(3)} = \text{c.m. of } Q_\alpha^{(3)}, \qquad \alpha = 1, 2, \cdots, \nu.$$
For the resulting noise we have $N(p^{(3)}) \le N(p^{(2)})$.

We continue in this way, imposing conditions (14) and (17) alternately. There results a sequence of trial points
$$p^{(1)}, p^{(2)}, \cdots, \tag{20}$$
such that
$$N(p^{(1)}) \ge N(p^{(2)}) \ge \cdots.$$
The noise is nonnegative, so that $\lim_m N(p^{(m)})$ will exist, and we might hope that the sequence (20) had as a limit a local minimum of $N(p)$.

If the sequence (20) has no limit points then some of the $\{x_\alpha^{(m)}\}$ must become infinite with $m$; this corresponds to quantizing into fewer than $\nu$ quanta. Since we have assumed that $F$ increases at no fewer than $\nu + 1$ points, there will obviously be quantizing schemes with $\nu$ quanta for which the resulting noise is less than the optimum noise for $\nu - 1$ quanta. If $p^{(1)}$ is such a scheme then (20) will have limit points, using the property that $N(p^{(m)})$ is a decreasing sequence.²
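The alternation of (14) and (17) described above is an iterative descent, and a minimal numerical sketch is easy to write (a discretized $dF$ on a grid stands in for the distribution; the grid sizes and quantile initialization are illustrative choices, not from the paper):

```python
import numpy as np

def method_I(x, mass, v, n_iter=500):
    """Alternately impose the center-of-mass condition (14) and the
    midpoint condition (17), as in the trial-and-error method above.

    x, mass : a discretized approximation of dF(x);
    v       : number of quanta.
    Returns (quanta, endpoints) at the final trial point.
    """
    # Initial endpoints at quantiles, so every Q_a has nonvanishing mass.
    cdf = np.cumsum(mass) / np.sum(mass)
    ends = x[np.searchsorted(cdf, np.arange(1, v) / v)]
    for _ in range(n_iter):
        labels = np.searchsorted(ends, x)
        # (14): quanta <- centers of mass of the current sets Q_a.
        quanta = np.array([np.sum((x * mass)[labels == a]) /
                           np.sum(mass[labels == a]) for a in range(v)])
        # (17): endpoints <- midpoints of adjacent quanta.
        ends = (quanta[:-1] + quanta[1:]) / 2
    return quanta, ends

# Standard Gaussian, v = 4.
x = np.linspace(-8.0, 8.0, 200001)
mass = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
quanta, ends = method_I(x, mass, 4)
# quanta is approximately [-1.510, -0.453, 0.453, 1.510],
# ends   is approximately [-0.982, 0.0, 0.982].
```

Each sweep is exactly the pair of noise-nonincreasing steps of the text, so $N(p^{(m)})$ is nonincreasing along the iterates.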
Suppose $p^{(\infty)}$ is such a limit point. If each of the coordinate values $\{x_\alpha^{(\infty)}\}$ of $p^{(\infty)}$ is a continuity point of $F$ then it is easy to see that the coordinates of $p^{(\infty)}$ will satisfy both (14) and (17). In particular, if $N(p)$ has a unique stationary point $p_0$ (which is the minimum sought), then the sequence (20), unless it diverges, will converge to $p_0$.
Note, by the way, that at a local minimum of $N(p)$ the numbers $\{x_\alpha\}$ are necessarily continuity points of $F$. Suppose to the contrary that there is a nonvanishing amount of mass concentrated at one of the endpoints $\{x_\alpha\}$, and that the adjacent sets $Q_\alpha$ and $Q_{\alpha+1}$ are as in (16), so that the mass at $x_\alpha$ belongs to $Q_\alpha$. The centers of mass $q_\alpha$ and $q_{\alpha+1}$ will lie equidistant from $x_\alpha$ (17), and from (19) the noise will not change if we reassign the mass at $x_\alpha$ to $Q_{\alpha+1}$, retaining the given $\{q_\alpha\}$ as quanta. But $q_\alpha$ and $q_{\alpha+1}$ are definitely not centers of mass of the corresponding modified sets, and the noise will strictly decrease as $q_\alpha$ and $q_{\alpha+1}$ are moved to the new centers of mass. Thus the given configuration is not a local minimum, contrary to assumption. From this result and (19) we see that $N(p)$ is continuous in a neighborhood of a local minimum. We have proved also that there is no essential loss of generality in assuming the form (16) for the $\{Q_\alpha\}$.

²It seems likely that this condition, $N(p^{(1)}) \le$ (optimum noise for $\nu - 1$ quanta), is stronger than necessary for the nondivergence of (20).
We refer to the above trial-and-error method as Method I. Another trial-and-error method is the following one, Method II. To simplify the discussion we assume for the moment that $F$ is continuous and nowhere constant. We choose a trial value $q_1$ satisfying
$$q_1 < \int_{-\infty}^{\infty} x \, dF(x).$$
The condition that $q_1$ be the center of mass of $Q_1$ determines $x_1$ as the unique solution of
$$q_1 = \frac{\int_{-\infty}^{x_1} x \, dF(x)}{\int_{-\infty}^{x_1} dF(x)}.$$
The quantities $q_1$ and $x_1$ now being known, the first of conditions (17) determines $q_2$ as
$$q_2 = 2x_1 - q_1.$$
If this $q_2$ lies to the right of the center of mass of the interval $(x_1, \infty)$ then the trial chain terminates, and we start over again with a different trial value $q_1$. Otherwise, $x_1$ and $q_2$ being known, the second of conditions (14) serves to determine $x_2$ uniquely. Now the second of conditions (17) gives
$$q_3 = 2x_2 - q_2.$$
We continue in this way, obtaining successively $q_1, x_1, \cdots, q_{\nu-1}, x_{\nu-1}, q_\nu$; the last step is the determination of $q_\nu$ according to
$$q_\nu = 2x_{\nu-1} - q_{\nu-1}. \tag{21}$$
However, in this procedure we have not used the last of conditions (14):
$$q_\nu = \frac{\int_{x_{\nu-1}}^{\infty} x \, dF(x)}{\int_{x_{\nu-1}}^{\infty} dF(x)}, \tag{22}$$
and the $q_\nu$ obtained from (21) will not satisfy (22) in general. The discrepancy between the right members of (21) and (22) will vary continuously with the starting value $q_1$, and the method consists of running through such chains

using various starting values until the discrepancy is reduced to zero.

This method is applicable to more general $F$, with some obvious modifications. When $F$ has intervals of constancy the $\{x_\alpha\}$ may not be uniquely determined by conditions (14), and a trial chain may involve several arbitrary parameters besides $q_1$. Discontinuities of $F$ will cause no real trouble; since we know that the $\{x_\alpha\}$ of an optimum scheme are continuity points of $F$, a trial chain that does not have this property is discarded. We note that Method II may be used to locate all stationary points of $N(p)$.
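A numerical sketch of Method II follows (discretized $dF$, with a simple bisection on the starting value $q_1$ driving the final discrepancy to zero; all of this scaffolding is illustrative, not from the paper):

```python
import numpy as np

def chain_discrepancy(q1, x, mass, v):
    """Run one Method II chain from the trial value q1.

    Returns q_v from (21) minus the center of mass (22) of the last
    interval, or None if the chain cannot be continued.
    """
    cdf = np.cumsum(mass)
    cx = np.cumsum(x * mass)
    q, lo = q1, -1                      # lo: grid index of the previous endpoint
    for _ in range(v - 1):
        base_m = cdf[lo] if lo >= 0 else 0.0
        base_x = cx[lo] if lo >= 0 else 0.0
        # Center-of-mass condition (14): find the first endpoint x_a such that
        # the mean of dF over (x_{a-1}, x_a] reaches q_a (this mean increases
        # as the candidate endpoint moves right).
        means = (cx[lo + 1:] - base_x) / (cdf[lo + 1:] - base_m)
        hit = np.nonzero(means >= q)[0]
        if len(hit) == 0:
            return None                 # chain terminates; try another q1
        lo = lo + 1 + hit[0]
        q = 2 * x[lo] - q               # midpoint condition (17): q_{a+1} = 2 x_a - q_a
    tail_mean = (cx[-1] - cx[lo]) / (cdf[-1] - cdf[lo])
    return q - tail_mean                # right member of (21) minus that of (22)

# Standard Gaussian, v = 2: bisect on q1 until the discrepancy vanishes.
x = np.linspace(-8.0, 8.0, 200001)
mass = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * (x[1] - x[0])
lo_q, hi_q = -1.5, -0.2                 # discrepancy is negative / positive here
for _ in range(60):
    mid = 0.5 * (lo_q + hi_q)
    if chain_discrepancy(mid, x, mass, 2) < 0:
        lo_q = mid
    else:
        hi_q = mid
# lo_q converges to q_1 of the optimum two-level Gaussian scheme, about -0.798.
```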
VII. EXAMPLES
In all of the examples we now consider, the distribution of sample values is absolutely continuous with a sample probability density $f = F'$, which is an even function. If $N(p)$ has a unique stationary point, which we assume to be the case in the examples treated, then the optimum $\{q_\alpha\}$ and $\{x_\alpha\}$ will clearly be symmetrically distributed around the origin. In applications we are usually interested in having an even number of quanta, $\nu = 2\mu$, so we renumber the positive endpoints and quanta according to
$$0 = x_0 < q_1 < x_1 < \cdots < q_{\mu-1} < x_{\mu-1} < q_\mu; \tag{23}$$
the endpoints and quanta for the negative half-axis are the negatives of these.

We normalize to unit signal power, $S = 1$. The $\{q_\alpha\}$ and $\{x_\alpha\}$ for other values of $S$ are to be obtained by multiplying the numbers in the tables by $\sqrt{S}$.

The simplest case is the uniform distribution,
$$f(x) = \frac{1}{2\sqrt{3}}, \qquad -\sqrt{3} \le x \le \sqrt{3}.$$
Method II of the preceding section shows that $N(p)$ in this case has a unique stationary point, which is necessarily an absolute minimum. The optimum scheme is the usual one with $\nu$ equal intervals of width $2\sqrt{3}/\nu$ each, the quanta being the midpoints of these intervals. The minimum value of the noise is the familiar $N = 1/\nu^2$.
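The uniform case can be checked directly (the grid and the choice $\nu = 8$ below are illustrative):

```python
import numpy as np

# Unit-power uniform density: f(x) = 1/(2*sqrt(3)) on [-sqrt(3), sqrt(3)].
v = 8
a = np.sqrt(3.0)
edges = np.linspace(-a, a, v + 1)            # v equal intervals of width 2*sqrt(3)/v
quanta = (edges[:-1] + edges[1:]) / 2        # quanta at the midpoints

# Evaluate N by discretizing the integral in (12).
x = np.linspace(-a, a, 600001)
dx = x[1] - x[0]
labels = np.clip(np.searchsorted(edges[1:-1], x), 0, v - 1)
z = quanta[labels] - x                       # quantization error z(x)
N = np.sum(z**2 / (2 * a)) * dx
# N agrees with 1/v**2: each interval of width w contributes w**2/12
# times its probability mass, and here w = 2*sqrt(3)/v.
```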
Another case of possible interest is the Gaussian,
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty.$$

TABLE I
GAUSSIAN, ν = 2

α    q_α      x_α
1    0.7979   ∞

TABLE II
GAUSSIAN, ν = 4

α    q_α      x_α
1    0.4528   0.9816
2    1.5104   ∞

TABLE III
GAUSSIAN, ν = 8

α    q_α      x_α
1    0.2451   0.5006
2    0.7560   1.0500
3    1.3439   1.7480
4    2.1520   ∞

TABLE IV
GAUSSIAN, ν = 16

α    q_α      x_α
1    0.1284   0.2582
2    0.3880   0.5224
3    0.6568   0.7996
4    0.9423   1.0993
5    1.2562   1.4371
6    1.6181   1.8435
7    2.0690   2.4008
8    2.7326   ∞

TABLE VIII
GAUSSIAN: OPTIMUM NOISE FOR VARIOUS VALUES OF ν

ν      N            ν²N      νx₁
2      0.3634       1.452
4      0.1175       1.880    3.93
8      3.455×10⁻²   2.205    4.00
16     9.500×10⁻³   2.430    4.13
32
64
128
(∞)    (0)          (2.72)   (4.34)
The optimum schemes for $\nu = 2^b$, $b = 1, 2, \cdots, 7$, are given in Tables I-VII,³ respectively. The corresponding noise values appear in Table VIII together with the quantities $\nu^2 N$ and $\nu x_1$. The behavior of these latter with increasing $\nu$ hints at the existence of asymptotic properties; we examine this question in the next section.

For speech signals a distribution which has been found useful empirically is the Laplacian,⁴
$$f(x) = \frac{1}{\sqrt{2}} e^{-\sqrt{2}|x|}, \qquad -\infty < x < \infty.$$
The optimum quantizing schemes for this distribution for $\nu = 2^b$, $b = 1, 2, \cdots, 7$, are given in Tables IX-XV, respectively. The corresponding $N$, $\nu^2 N$, and $\nu x_1$ values are given in Table XVI; again, we notice certain regularities.

VIII. ASYMPTOTIC PROPERTIES

Let us assume that the distribution $F$ is absolutely continuous with density function $f = F'$, which is itself dif-

³Since some of the tables were never completed, those tables, although mentioned in the text, are not included in this paper.
⁴The author is indebted to V. Vyssotsky of the Acoustics Research Group for this information (private communication).

References

- "The Philosophy of PCM."
- P. F. Panter and W. Dite, "Quantization Distortion in Pulse-Count Modulation with Nonuniform Spacing of Levels."
- "Instantaneous Companding of Quantized Signals."