VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)

doi:10.1109/TC.1985.1676616

N84 13406

TDA Progress Report 42-75, July-September 1983


C. Wang, T. K. Truong, H. M. Shao and L. J. Deutsch
Communications Systems Research Section

J. K. Omura
University of California, Los Angeles

I. S. Reed
University of Southern California

Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field $GF(2^m)$. With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in $GF(2^m)$. The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.

I. Introduction

Recently, Massey and Omura (Ref. 1) invented a multiplier which obtains the product of two elements in the finite field $GF(2^m)$. In their invention, they utilize a normal basis of the form $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ to represent elements of the field, where $\alpha$ is the root of an irreducible polynomial of degree $m$ over $GF(2)$. In this basis each element in the field $GF(2^m)$ can be represented by $m$ binary digits.

In the normal-basis representation the squaring of an element in $GF(2^m)$ is readily shown to be a simple cyclic shift of its binary digits. Multiplication in the normal basis representation requires for any one product digit the same logic circuitry as it does for any other product digit. Adjacent product-digit circuits differ only in their inputs, which are cyclically shifted versions of one another. In this paper, a pipeline architecture suitable for VLSI design is developed for a Massey-Omura multiplier on $GF(2^m)$.

The conventional method for finding an inverse element in a finite field uses either table look-up or Euclid's algorithm. These methods are not easily realized in a VLSI circuit. However, using a Massey-Omura multiplier, a recursive, pipeline inversion circuit is developed. This structure consists of four sets of shift registers, one parallel-type Massey-Omura multiplier and two control signals. Such a design is regular, simple and expandable and, hence, naturally suitable for VLSI implementation.

II. Squaring and Multiplying in a Normal Basis Representation

In this section, the work originally described by Massey and Omura (Ref. 1) is reviewed. It is well known that there always exists a normal basis in the finite field $GF(2^m)$ (Ref. 2) for all positive integers $m$. That is, one can find a field element $\alpha$ such that $N = \{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a basis set of $GF(2^m)$. Thus every field element $\beta \in GF(2^m)$ can be uniquely expressed as

$$\beta = b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + \cdots + b_{m-1}\alpha^{2^{m-1}} \tag{1}$$

where $b_0, b_1, b_2, \ldots, b_{m-1}$ are binary digits and addition is mod-2 addition.

Three useful properties of a finite field $GF(2^m)$ are stated here without proof (for proofs see, for example, Ref. 2). These properties are:

(1) Squaring in $GF(2^m)$ is a linear operation. That is, given any two elements $\alpha$ and $\beta$ in $GF(2^m)$,

$$(\alpha + \beta)^2 = \alpha^2 + \beta^2 \tag{2}$$

(2) For any element $\alpha$ of $GF(2^m)$,

$$\alpha^{2^m} = \alpha \tag{3}$$

(3) If $\alpha$ is a root of any irreducible polynomial $P(x)$ of degree $m$ over $GF(2)$, the powers $\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}$ are in $GF(2^m)$ and constitute a complete set of roots of $P(x)$.

With regard to property (3), Peterson and Weldon (Ref. 3) list a set of irreducible polynomials of degree $m \le 34$ over $GF(2)$ for which the roots $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ are linearly independent. These linearly independent roots clearly form a normal basis of $GF(2^m)$.

Suppose that $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a normal basis of $GF(2^m)$. By (2) and (3) the square of (1) is

$$\begin{aligned}
\beta^2 &= b_0\alpha^2 + b_1\alpha^4 + b_2\alpha^8 + \cdots + b_{m-2}\alpha^{2^{m-1}} + b_{m-1}\alpha^{2^m} \\
        &= b_{m-1}\alpha + b_0\alpha^2 + b_1\alpha^4 + \cdots + b_{m-2}\alpha^{2^{m-1}}
\end{aligned} \tag{4}$$

Thus, if $\beta$ is represented as a vector of components of the normal basis elements of $GF(2^m)$ in the form $\beta = [b_0, b_1, b_2, \ldots, b_{m-1}]$, then $\beta^2 = [b_{m-1}, b_0, b_1, \ldots, b_{m-2}]$. In the normal basis representation $\beta^2$ is a cyclic shift of $\beta$. Hence squaring in $GF(2^m)$ can be realized physically by logic circuitry which accomplishes cyclic shifts in a binary register. Such squaring circuitry is illustrated in block form in Fig. 1.

By (2) and (3) it is readily seen that $1 = \alpha + \alpha^2 + \alpha^4 + \cdots + \alpha^{2^{m-1}}$ for any element $\alpha$ that generates a normal basis of $GF(2^m)$: the sum is unchanged by squaring, so it equals 0 or 1, and the linear independence of the basis elements rules out 0. This implies that the normal basis representation of 1 is $(1, 1, 1, \ldots, 1)$.

Let $\beta = [b_0, b_1, \ldots, b_{m-1}]$ and $\gamma = [c_0, c_1, \ldots, c_{m-1}]$ be two elements of $GF(2^m)$ in a normal basis representation. Then the last term $d_{m-1}$ of the product,

$$\delta = \beta \cdot \gamma = [d_0, d_1, \ldots, d_{m-1}] \tag{5}$$

is some binary function of the components of $\beta$ and $\gamma$, i.e.,

$$d_{m-1} = f(b_0, b_1, \ldots, b_{m-1};\; c_0, c_1, \ldots, c_{m-1}) \tag{6}$$

Since squaring means a cyclic shift of an element in a normal basis representation, one has

$$\delta^2 = \beta^2 \cdot \gamma^2 = [d_{m-1}, d_0, d_1, \ldots, d_{m-2}] \tag{7}$$

Hence the last component $d_{m-2}$ of $\delta^2$ is obtained by the same function $f$ in (6) operating on the components of $\beta^2$ and $\gamma^2$. That is,

$$d_{m-2} = f(b_{m-1}, b_0, b_1, \ldots, b_{m-2};\; c_{m-1}, c_0, c_1, \ldots, c_{m-2})$$

By squaring $\delta$ repeatedly, it is evident that

$$\begin{aligned}
d_{m-1} &= f(b_0, b_1, \ldots, b_{m-1};\; c_0, c_1, \ldots, c_{m-1}) \\
d_{m-2} &= f(b_{m-1}, b_0, \ldots, b_{m-2};\; c_{m-1}, c_0, \ldots, c_{m-2}) \\
        &\;\;\vdots \\
d_0     &= f(b_1, b_2, \ldots, b_{m-1}, b_0;\; c_1, c_2, \ldots, c_{m-1}, c_0)
\end{aligned} \tag{8}$$

The equations in (8) define the Massey-Omura multiplier. In the normal basis representation this multiplier has the property that the same logic function $f$ which is used to find the last component $d_{m-1}$ of the product $\delta$ can be used to find sequentially the remaining components $d_{m-2}, d_{m-3}, \ldots, d_0$ of the product. This feature of the product operation requires only one logic function $f$ of the $2m$ components of $\beta$ and $\gamma$ to sequentially compute the $m$ components of the product.
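The structure of (8) can be sketched in software as one function $f$ evaluated $m$ times on cyclically shifted inputs. The following is a minimal illustration, not the circuit of this report: the names `cyclic_shift`, `mo_multiply` and `f_gf4` are hypothetical, and the small function `f_gf4` assumes the field $GF(2^2)$ with normal basis $\{\alpha, \alpha^2\}$ and $\alpha^2 + \alpha + 1 = 0$, an example chosen only because it is short enough to check by hand.

```python
from typing import Callable, List, Sequence

Bits = List[int]


def cyclic_shift(v: Sequence[int]) -> Bits:
    """Square in normal-basis form: [b0, ..., b_{m-1}] -> [b_{m-1}, b0, ..., b_{m-2}]."""
    v = list(v)
    return v[-1:] + v[:-1]


def mo_multiply(b: Sequence[int], c: Sequence[int],
                f: Callable[[Sequence[int], Sequence[int]], int]) -> Bits:
    """Sequential product per (8): a single f applied to m shifted input pairs."""
    m = len(b)
    d = [0] * m
    b, c = list(b), list(c)
    for k in range(m):
        d[m - 1 - k] = f(b, c)                    # k-th line of (8) gives d_{m-1-k}
        b, c = cyclic_shift(b), cyclic_shift(c)   # square both operands
    return d


def f_gf4(b: Sequence[int], c: Sequence[int]) -> int:
    """Assumed example f for GF(2^2) (not taken from the report): d1 = b0c0 + b0c1 + b1c0."""
    return (b[0] & c[0]) ^ (b[0] & c[1]) ^ (b[1] & c[0])


if __name__ == "__main__":
    alpha = [1, 0]
    print(mo_multiply(alpha, alpha, f_gf4))    # alpha * alpha = alpha^2
    print(mo_multiply(alpha, [0, 1], f_gf4))   # alpha * alpha^2 = 1
```

Passing `f` as a parameter mirrors the point made above: only the product function changes with the field, while the shifting structure stays the same.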

Figure 2 illustrates the logic diagram of the above-described sequential-type Massey-Omura multiplier on $GF(2^m)$. Alternately, for parallel operation this feature permits the use of $m$ identical logic functions $f$ for calculating simultaneously all components of the product. In the latter case, the inputs to the $m$ logic functions $f$ are connected directly to the components of $\beta$ and $\gamma$. The only difference in the connections of the components of $\beta$ or $\gamma$ to a function $f$ is that they are cyclically shifted versions of one another. Figure 3 shows the structure of the parallel-type Massey-Omura multiplier for the simple case of $m = 4$. The extension of this type of structure to a general case of $GF(2^m)$ is straightforward.

III. A Pipeline Structure for Implementing Massey-Omura Multiplier

A detailed design of a Massey-Omura multiplier is now developed for the finite field $GF(2^4)$. As illustrated in Figs. 2 and 3, the design of either the sequential-type or parallel-type Massey-Omura multiplier must focus on the product function $f$.

The design of $f$ begins with the selection of an irreducible polynomial $P(x) = x^4 + x^3 + 1$ of degree $m = 4$ over $GF(2)$. This particular polynomial has linearly independent roots, namely, $\alpha$, $\alpha^2$, $\alpha^4$ and $\alpha^8$. Hence, the set of roots $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$ constitutes a normal basis of $GF(2^4)$. Any two elements $\beta$ and $\gamma$ in $GF(2^4)$ can be expressed as

$$\begin{aligned}
\beta  &= b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + b_3\alpha^8 \\
\gamma &= c_0\alpha + c_1\alpha^2 + c_2\alpha^4 + c_3\alpha^8
\end{aligned} \tag{9}$$

By (9) the product of $\beta$ and $\gamma$ is

$$\begin{aligned}
\delta = \beta \cdot \gamma &= (b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + b_3\alpha^8) \cdot (c_0\alpha + c_1\alpha^2 + c_2\alpha^4 + c_3\alpha^8) \\
&= d_0\alpha + d_1\alpha^2 + d_2\alpha^4 + d_3\alpha^8
\end{aligned} \tag{10}$$

By (10) and the fact that $\alpha^4 = \alpha^3 + 1$, one obtains

$$\begin{aligned}
d_3 &= b_2c_2 + b_3c_2 + b_2c_3 + b_3c_1 + b_1c_3 + b_3c_0 + b_0c_3 + b_1c_0 + b_0c_1 \\
d_2 &= b_1c_1 + b_2c_1 + b_1c_2 + b_2c_0 + b_0c_2 + b_2c_3 + b_3c_2 + b_0c_3 + b_3c_0 \\
d_1 &= b_0c_0 + b_1c_0 + b_0c_1 + b_1c_3 + b_3c_1 + b_1c_2 + b_2c_1 + b_3c_2 + b_2c_3 \\
d_0 &= b_3c_3 + b_0c_3 + b_3c_0 + b_0c_2 + b_2c_0 + b_0c_1 + b_1c_0 + b_2c_1 + b_1c_2
\end{aligned} \tag{11}$$

Comparing (11) with (8), the function $f$ is given by

$$f(b_0, b_1, b_2, b_3;\; c_0, c_1, c_2, c_3) = b_2c_2 + b_3c_2 + b_2c_3 + b_3c_1 + b_1c_3 + b_3c_0 + b_0c_3 + b_1c_0 + b_0c_1 \tag{12}$$

Since the mod-2 sum in (12) can be implemented by the "exclusive or" operation (XOR), the structure of the product function $f$ can be represented by the logic circuit in Fig. 4. This circuit consists of two portions: the left half is an AND plane which computes each term of (12), while the right half is an XOR plane which computes the mod-2 sum. The inputs to the AND plane are the complements of the components of $\beta$ and $\gamma$. This is due to the fact that the AND operation in the AND plane is obtained by the NOR operation on the complements of the two digits being ANDed, i.e., $xy = \overline{\bar{x} + \bar{y}}$, where $\bar{x}$ is the complement of $x$.

A pipeline structure of a Massey-Omura multiplier for $GF(2^4)$ is shown in Fig. 5. This structure has a sequential type of operation. For each of the two inputs, corresponding to $\beta$ and $\gamma$, to the $f$ function, an inverter, two sets of shift registers, $B$ and $R$, and 11 gate transistors are utilized. Note that registers $B$ and $R$ have an identical circuit structure.

In Fig. 5, during the first three clock cycles, when signal $Ld = 0$, the complements of $b_3, b_2, b_1$ and $c_3, c_2, c_1$ are fed sequentially into three buffer flip-flops $B_k$ for $(k = 1, 2, 3)$. At the fourth clock cycle, when $Ld = 1$, the values of $\bar{b}_3, \bar{b}_2, \bar{b}_1$ and $\bar{c}_3, \bar{c}_2, \bar{c}_1$, previously stored in buffer registers $B_k$, and $\bar{b}_0$ and $\bar{c}_0$ are shifted into the second set of registers $R_k$ for $(k = 1, 2, 3, 4)$. Then the R-registers are cyclically shifted. Such a cyclic-shift operation is needed to sequentially yield the product components $d_3$, $d_2$, $d_1$ and $d_0$ of $\delta$. While the R-registers are cyclically shifting the components of $\beta$ (or $\gamma$), the components of another element in $GF(2^4)$ following $\beta$ (or $\gamma$) can be fed into the buffer B-registers. Therefore, the structure in Fig. 5 provides a pipeline operation in which no time is lost except for an initial fixed time delay. The VLSI layout of a Massey-Omura multiplier for $GF(2^4)$ is shown in Fig. 6.
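The product function in (12) can be cross-checked in software against ordinary polynomial arithmetic modulo $P(x) = x^4 + x^3 + 1$. The sketch below is such a check under stated assumptions: elements are stored as 4-bit integers in the polynomial basis $\{1, \alpha, \alpha^2, \alpha^3\}$ for the reference computation, and the helper names (`poly_mul`, `to_poly`, `check_all`) are hypothetical and describe a test harness, not the hardware.

```python
from itertools import product

MOD = 0b11001  # P(x) = x^4 + x^3 + 1


def poly_mul(x: int, y: int) -> int:
    """Reference product of two GF(2^4) elements held as 4-bit polynomials in alpha."""
    r = 0
    for i in range(4):
        if (y >> i) & 1:
            r ^= x << i
    for i in range(7, 3, -1):        # reduce bits 7..4 using alpha^4 = alpha^3 + 1
        if (r >> i) & 1:
            r ^= MOD << (i - 4)
    return r


# Normal basis alpha, alpha^2, alpha^4, alpha^8 obtained by repeated squaring.
BASIS = [0b0010]
for _ in range(3):
    BASIS.append(poly_mul(BASIS[-1], BASIS[-1]))


def to_poly(v):
    """Map normal-basis coordinates [v0, v1, v2, v3] to the polynomial form."""
    r = 0
    for bit, e in zip(v, BASIS):
        if bit:
            r ^= e
    return r


def f(b, c):
    """Product function of (12); index pairs (i, j) stand for the terms b_i c_j."""
    terms = [(2, 2), (3, 2), (2, 3), (3, 1), (1, 3), (3, 0), (0, 3), (1, 0), (0, 1)]
    acc = 0
    for i, j in terms:
        acc ^= b[i] & c[j]
    return acc


def shift(v):
    return v[-1:] + v[:-1]


def mo_multiply(b, c):
    """Apply f four times on cyclically shifted inputs, yielding d3, d2, d1, d0 in turn."""
    d = [0] * 4
    for k in range(4):
        d[3 - k] = f(b, c)
        b, c = shift(b), shift(c)
    return d


def check_all() -> bool:
    """True if the normal-basis product agrees with poly_mul for all 256 operand pairs."""
    vectors = [list(v) for v in product((0, 1), repeat=4)]
    return all(to_poly(mo_multiply(b, c)) == poly_mul(to_poly(b), to_poly(c))
               for b in vectors for c in vectors)
```

Calling `check_all()` exercises every pair of field elements, which is feasible only because $GF(2^4)$ is small; for larger $m$ a random sample of operands would be the natural substitute.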

Figure 7 illustrates a system structure of a pipelined Massey-Omura multiplier for $GF(2^m)$. For this general case over $GF(2^m)$, the buffer and the cyclic shift mechanism in Fig. 7 have $m - 1$ and $m$ stages, respectively. Each stage consists of a shift register and a gate transistor. The product function $f$ is a mod-2 sum of AND products of the components of the two inputs being multiplied. Such a circuit for function $f$ consists of an AND programmed logic array (PLA) (Ref. 4) followed by an XOR sequential-PLA. In the XOR sequential-PLA there are several levels of XORs. At each level, the inputs, pair-by-pair, are fed sequentially one-by-one into an XOR as shown in Fig. 4.

Let $n(j)$ be the number of XOR circuits at the $j$-th level of the XOR sequential-PLA. Then $n(j + 1) = \lceil n(j)/2 \rceil$, where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$ and where initially, $n(0)$ = total number of terms to be XORed in product function $f$. At the last level, there is only one XOR circuit and the output is the value of $f$. In general, if $k$ denotes the number of levels required in the XOR sequential-PLA, $k = \lceil \log_2 n(0) \rceil$. It should be noted that as $m$ gets large, the number of mod-2 sums in the function $f$ becomes large. In this case, more XORs and as a consequence more levels in the XOR sequential-PLA are required. To maximize the pipeline operation speed, shift registers are required between the XOR levels in order to store the XOR outputs of the intermediate levels.
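The level count can be illustrated with the nine-term function $f$ of (12). The short sketch below applies the recurrence $n(j+1) = \lceil n(j)/2 \rceil$ until a single XOR remains; the function name `xor_levels` is hypothetical.

```python
import math


def xor_levels(n_terms: int) -> list:
    """Return [n(0), n(1), ..., n(k)] with n(j+1) = ceil(n(j) / 2), ending at n(k) = 1."""
    counts = [n_terms]
    while counts[-1] > 1:
        counts.append(math.ceil(counts[-1] / 2))
    return counts


if __name__ == "__main__":
    levels = xor_levels(9)            # nine product terms in (12)
    print(levels)                     # counts per level
    print(len(levels) - 1)            # number of levels k
```

For $n(0) = 9$ the sequence is 9, 5, 3, 2, 1, so $k = 4$, in agreement with $\lceil \log_2 9 \rceil = 4$.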

Another approach to the realization of product function $f$ is to use a standard AND-OR PLA (Ref. 4). This is possible since $x \oplus y = \bar{x}y \vee x\bar{y}$, where $\vee$ denotes inclusive OR. In general, although the design of $f$ by the use of such a PLA is tedious, the product function $f$ can be accomplished in less than one clock cycle. One trade-off for such a design is the large chip area required. The required area for such a PLA increases dramatically with $m$. Hence, a design utilizing a standard AND-OR PLA to realize $f$ is practical only for small $m$.
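As a rough illustration of the area concern, the sketch below checks the XOR identity above and counts the input combinations for which the $GF(2^4)$ function $f$ of (12) equals 1. That count is the number of product terms in a canonical, unminimized sum-of-products form; it is an assumption of this example that no logic minimization is applied, so a real AND-OR PLA could need fewer terms. The names `xor_as_and_or` and `count_minterms` are hypothetical.

```python
from itertools import product


def xor_as_and_or(x: int, y: int) -> int:
    """x XOR y written with AND, OR and complements only."""
    return ((1 - x) & y) | (x & (1 - y))


def f(b, c):
    """Product function of (12); index pairs (i, j) stand for the terms b_i c_j."""
    terms = [(2, 2), (3, 2), (2, 3), (3, 1), (1, 3), (3, 0), (0, 3), (1, 0), (0, 1)]
    acc = 0
    for i, j in terms:
        acc ^= b[i] & c[j]
    return acc


def count_minterms() -> int:
    """Number of the 256 input combinations (b0..b3, c0..c3) for which f = 1."""
    return sum(f(v[:4], v[4:]) for v in product((0, 1), repeat=8))


if __name__ == "__main__":
    print(count_minterms())
```

Even for $m = 4$ the canonical form has 120 product terms over eight inputs, compared with the nine two-input AND terms of the AND-XOR form in (12).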

IV. A Pipeline Structure for Computing an Inverse Element in the Finite Field $GF(2^m)$

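For any $\alpha$ in the finite field $GF(2^m)$, $\alpha^{2^m} = \alpha$. Hence the inverse of $\alpha$ is $\alpha^{-1} = \alpha^{2^m - 2}$. Let $2^m - 2$ be decomposed as $2 + 2^2 + 2^3 + \cdots + 2^{m-1}$; then $\alpha^{-1}$ can be expressed as

$$\alpha^{-1} = (\alpha^{2}) \cdot (\alpha^{2^2}) \cdot (\alpha^{2^3}) \cdots (\alpha^{2^{m-1}}) \tag{13}$$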
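As a small worked instance of (13), take $m = 4$, the case used for the circuits in this report. The exponent decomposes as a geometric sum and the inverse becomes a product of three successive squarings of $\alpha$:

```latex
\begin{aligned}
2^4 - 2 &= 14 = 2 + 2^2 + 2^3, \\
\alpha^{-1} &= \alpha^{14} = \alpha^{2} \cdot \alpha^{4} \cdot \alpha^{8}, \\
\alpha \cdot \alpha^{-1} &= \alpha^{15} = \alpha^{2^4 - 1} = 1 \qquad (\alpha \neq 0).
\end{aligned}
```

The last line uses $\alpha^{2^m} = \alpha$ for nonzero $\alpha$, which gives $\alpha^{2^m - 1} = 1$.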

As discussed in Section II, if $\alpha$ is represented in a normal basis, squaring can be realized by a cyclic shift operation, so $\alpha^{2^j}$ is the $j$-th cyclic shift (CS) of $\alpha$. Thus, the inverse element $\alpha^{-1}$ can be obtained by using successive cyclic-shift operations and a Massey-Omura multiplier. The algorithm for $\alpha^{-1}$ is the following:

(1) Obtain the cyclic shift of $\alpha$, i.e., $\alpha^2 = CS(\alpha)$, where CS denotes the cyclic shift function. Let $B = CS(\alpha)$ and $C = 1$. Let $k = 0$.

(2) Multiply $B$ and $C$ to obtain the product, $D = B \cdot C$. Set $k = k + 1$.

(3) If $k = m - 1$, $\alpha^{-1} = D$. Stop. If $k < m - 1$, let $B = CS(B)$ and $C = D$.

(4) Go back to (2).

Figure 8 shows a flow chart diagram of this procedure.
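Steps (1) through (4) can be sketched in software for the $GF(2^4)$ case, reusing the product function of (12). This is a behavioral illustration only: the function names (`shift`, `multiply`, `inverse`) are hypothetical, elements are Python lists of normal-basis bits, and no attempt is made to model the clocking of the circuit.

```python
def shift(v):
    """Cyclic shift CS: squaring in the normal basis."""
    return v[-1:] + v[:-1]


def f(b, c):
    """Product function of (12) for GF(2^4) with P(x) = x^4 + x^3 + 1."""
    terms = [(2, 2), (3, 2), (2, 3), (3, 1), (1, 3), (3, 0), (0, 3), (1, 0), (0, 1)]
    acc = 0
    for i, j in terms:
        acc ^= b[i] & c[j]
    return acc


def multiply(b, c):
    """Massey-Omura product per (8): f evaluated on m cyclically shifted input pairs."""
    m = len(b)
    d = [0] * m
    for k in range(m):
        d[m - 1 - k] = f(b, c)
        b, c = shift(b), shift(c)
    return d


def inverse(a):
    """Inverse by the recursive algorithm: the product of a^2, a^4, ..., a^(2^(m-1))."""
    m = len(a)
    B = shift(a)            # step (1): B = CS(a) = a^2
    C = [1] * m             # C = 1, whose normal-basis form is (1, 1, ..., 1)
    k = 0
    while True:
        D = multiply(B, C)  # step (2)
        k += 1
        if k == m - 1:      # step (3): finished after m - 1 multiplications
            return D
        B, C = shift(B), D  # prepare the next pass, then step (4): back to (2)


if __name__ == "__main__":
    a = [1, 0, 0, 0]                    # the basis element alpha
    print(inverse(a))
    print(multiply(a, inverse(a)))      # should be the all-ones vector, i.e. 1
```

The zero element has no inverse; with this sketch an all-zero input simply returns all zeros, so a caller would need to treat that case separately.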

This recursive algorithm for computing an inverse element in $GF(2^m)$ can be realized using the circuit shown in Fig. 9. In this circuit the parallel-type Massey-Omura multiplier shown in Fig. 3, with the circuit for the product function $f$ shown in Fig. 4, is utilized. To illustrate, let $Ld_1$ and $Ld_2$ be two control signals with a period of four clock cycles as shown in Fig. 9. Also let the normal basis representation of $\alpha$ be $(a_0, a_1, a_2, a_3)$. At the end of the third clock pulse, the values $\bar{a}_3, \bar{a}_2, \bar{a}_1$ are stored in the three input buffer flip-flops $B_k$. During the fourth clock cycle, $\bar{a}_3, \bar{a}_2, \bar{a}_1$ and $\bar{a}_0$ are simultaneously shifted to the four flip-flops $R_k$. With the appropriate connections among the input buffer flip-flops $B_k$ and flip-flops $R_k$, the cyclic shift of $\alpha = (a_0, a_1, a_2, a_3)$, i.e., $\alpha^2 = (a_3, a_0, a_1, a_2)$, is obtained in $R$. At the fourth clock pulse the four registers of the second multiplier input are also fed the value "0". These four values, being the complements of "1", introduce the element 1 in $GF(2^4)$. As discussed in Section II, a parallel-type $GF(2^4)$ Massey-Omura multiplier simultaneously yields four product components $d_0, d_1, d_2, d_3$. Therefore, during the next three clocks three successive multiplications, i.e., $D_1 = 1 \cdot \alpha^2$, $D_2 = D_1 \cdot \alpha^4$ and $D_3 = D_2 \cdot \alpha^8$, are performed for the inversion. When the third multiplication is completed, $Ld_2 = 1$. Thus the output product digits, which together represent the inverse element $\alpha^{-1}$, are fed into the output buffer flip-flops. Finally these are sequentially shifted from the inversion circuit.

The above technique for computing the inverse of an element in $GF(2^4)$ takes four clock cycles. During these four clock cycles, the circuit in Fig. 9 allows the bits of the next element (following $\alpha$) to be fed into it and the bits of the previous element to be shifted out of it, simultaneously. This type of circuit provides a full pipeline capability. A VLSI layout of the pipeline inversion circuitry for $GF(2^4)$ is presented in Fig. 10. Figure 11 shows the system structure of an inversion circuit for the general finite field $GF(2^m)$.

References

1. Massey, J. L., and Omura, J. K., Patent Application of Computational Method and Apparatus for Finite Field Arithmetic, submitted in 1981.

2. MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error-Correcting Codes, North-Holland Publishing, New York, 1977.

3. Peterson, W. W., and Weldon, E. J., Jr., Error-Correcting Codes, MIT Press, Cambridge, 1972.

4. Mead, C., and Conway, L., Introduction to VLSI Systems, Addison-Wesley, Reading, 1980.
