N84 13406
TDA Progress Report 42-75
July-September 1983

VLSI Architectures for Computing Multiplications and Inverses in $GF(2^m)$

C. C. Wang, T. K. Truong, H. M. Shao and L. J. Deutsch
Communications Systems Research Section

J. K. Omura
University of California, Los Angeles

I. S. Reed
University of Southern California
Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field $GF(2^m)$. With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in $GF(2^m)$. The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
I. Introduction

Recently, Massey and Omura (Ref. 1) invented a multiplier which obtains the product of two elements in the finite field $GF(2^m)$. In their invention, they utilize a normal basis of the form $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ to represent elements of the field, where $\alpha$ is the root of an irreducible polynomial of degree $m$ over $GF(2)$. In this basis each element in the field $GF(2^m)$ can be represented by $m$ binary digits.

In the normal-basis representation the squaring of an element in $GF(2^m)$ is readily shown to be a simple cyclic shift of its binary digits. Multiplication in the normal basis representation requires for any one product digit the same logic circuitry as it does for any other product digit. Adjacent product-digit circuits differ only in their inputs, which are cyclically shifted versions of one another. In this paper, a pipeline architecture suitable for VLSI design is developed for a Massey-Omura multiplier on $GF(2^m)$.

The conventional method for finding an inverse element in a finite field uses either table look-up or Euclid's algorithm. These methods are not easily realized in a VLSI circuit. However, using a Massey-Omura multiplier, a recursive, pipeline inversion circuit is developed. This structure consists of four sets of shift registers, one parallel-type Massey-Omura multiplier and two control signals. Such a design is regular, simple and expandable and, hence, naturally suitable for VLSI implementation.
II. Squaring and Multiplying in a Normal Basis Representation
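In this section, the work originally described by Massey and Omura (Ref. 1) is reviewed. It is well known that there always exists a normal basis in the finite field $GF(2^m)$ (Ref. 2) for all positive integers $m$. That is, one can find a field element $\alpha$ such that $N = \{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a basis set of $GF(2^m)$. Thus every field element $\beta \in GF(2^m)$ can be uniquely expressed as

$$\beta = b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + \cdots + b_{m-1}\alpha^{2^{m-1}} \tag{1}$$

where $b_0, b_1, b_2, \ldots, b_{m-1}$ are binary digits and addition is mod-2 addition.

Three useful properties of a finite field $GF(2^m)$ are stated here without proof (for proofs see, for example, Ref. 2). These properties are:

(1) Squaring in $GF(2^m)$ is a linear operation. That is, given any two elements $\alpha$ and $\beta$ in $GF(2^m)$,

$$(\alpha + \beta)^2 = \alpha^2 + \beta^2 \tag{2}$$

(2) For any element $\alpha$ of $GF(2^m)$,

$$\alpha^{2^m} = \alpha \tag{3}$$

(3) If $\alpha$ is a root of any irreducible polynomial $P(x)$ of degree $m$ over $GF(2)$, the powers $\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}$ are in $GF(2^m)$ and constitute a complete set of roots of $P(x)$.

With regard to property (3), Peterson and Weldon (Ref. 3) list a set of irreducible polynomials of degree $m \le 34$ over $GF(2)$ for which the roots $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ are linearly independent. These linearly independent roots clearly form a normal basis of $GF(2^m)$.

Suppose that $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a normal basis of $GF(2^m)$. By (2) and (3) the square of (1) is

$$\beta^2 = b_0\alpha^2 + b_1\alpha^4 + b_2\alpha^8 + \cdots + b_{m-2}\alpha^{2^{m-1}} + b_{m-1}\alpha^{2^m} = b_{m-1}\alpha + b_0\alpha^2 + b_1\alpha^4 + \cdots + b_{m-2}\alpha^{2^{m-1}} \tag{4}$$

Thus, if $\beta$ is represented as a vector of components of the normal basis elements of $GF(2^m)$ in the form $\beta = [b_0, b_1, b_2, \ldots, b_{m-1}]$, then $\beta^2 = [b_{m-1}, b_0, b_1, \ldots, b_{m-2}]$. In the normal basis representation, $\beta^2$ is a cyclic shift of $\beta$. Hence squaring in $GF(2^m)$ can be realized physically by logic circuitry which accomplishes cyclic shifts in a binary register. Such squaring circuitry is illustrated in block form in Fig. 1.

By (2) and (3) it is readily seen that

$$1 = \alpha + \alpha^2 + \alpha^4 + \cdots + \alpha^{2^{m-1}}$$

for any element $\alpha$ that generates a normal basis of $GF(2^m)$: the sum is unchanged by squaring, so it lies in $GF(2)$, and it cannot be 0 because the basis elements are linearly independent. This implies that the normal basis representation of 1 is $(1, 1, 1, \ldots, 1)$.

Let $\beta = [b_0, b_1, \ldots, b_{m-1}]$ and $\gamma = [c_0, c_1, \ldots, c_{m-1}]$ be two elements of $GF(2^m)$ in a normal basis representation. Then the last term $d_{m-1}$ of the product,

$$\delta = \beta \cdot \gamma = [d_0, d_1, \ldots, d_{m-1}] \tag{5}$$

is some binary function of the components of $\beta$ and $\gamma$, i.e.,

$$d_{m-1} = f(b_0, b_1, \ldots, b_{m-1};\ c_0, c_1, \ldots, c_{m-1}) \tag{6}$$

Since squaring means a cyclic shift of an element in a normal basis representation, one has

$$\delta^2 = \beta^2 \cdot \gamma^2, \quad\text{i.e.,}\quad [d_{m-1}, d_0, \ldots, d_{m-2}] = [b_{m-1}, b_0, \ldots, b_{m-2}] \cdot [c_{m-1}, c_0, \ldots, c_{m-2}] \tag{7}$$

Hence the last component $d_{m-2}$ of $\delta^2$ is obtained by the same function $f$ in (6) operating on the components of $\beta^2$ and $\gamma^2$. That is,

$$d_{m-2} = f(b_{m-1}, b_0, b_1, \ldots, b_{m-2};\ c_{m-1}, c_0, c_1, \ldots, c_{m-2})$$

By squaring $\delta$ repeatedly, it is evident that

$$\begin{aligned}
d_{m-1} &= f(b_0, b_1, \ldots, b_{m-1};\ c_0, c_1, \ldots, c_{m-1}) \\
d_{m-2} &= f(b_{m-1}, b_0, \ldots, b_{m-2};\ c_{m-1}, c_0, \ldots, c_{m-2}) \\
&\ \ \vdots \\
d_0 &= f(b_1, b_2, \ldots, b_{m-1}, b_0;\ c_1, c_2, \ldots, c_{m-1}, c_0)
\end{aligned} \tag{8}$$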
The equations in (8) define the Massey-Omura multiplier. In the normal basis representation this multiplier has the property that the same logic function $f$ which is used to find the last component $d_{m-1}$ of the product $\delta$ can be used to find sequentially the remaining components $d_{m-2}, d_{m-3}, \ldots, d_0$ of the product. This feature of the product operation requires only one logic function $f$ of the $2m$ components of $\beta$ and $\gamma$ to sequentially compute the $m$ components of the product.
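As a rough illustration of this property, the following Python sketch computes all $m$ product digits from a single function $f$ by cyclically shifting both operands between evaluations. It is only a sketch under stated assumptions: it uses, as a concrete instance, the function that Eq. (12) gives for $GF(2^4)$, and the names `f_gf16`, `cyclic_shift` and `mo_multiply` are illustrative and do not come from the paper.

```python
def f_gf16(b, c):
    """Product function of Eq. (12) for GF(2^4); returns the digit d_3.

    b and c are sequences [b0, b1, b2, b3] and [c0, c1, c2, c3] of bits.
    """
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def cyclic_shift(v):
    """Squaring in a normal basis: [v0, ..., v_{m-1}] -> [v_{m-1}, v0, ..., v_{m-2}]."""
    return v[-1:] + v[:-1]


def mo_multiply(b, c, f=f_gf16):
    """Sequential Massey-Omura product: one f, m evaluations, as in Eq. (8)."""
    m = len(b)
    d = [0] * m
    for j in range(m):
        d[m - 1 - j] = f(b, c)          # d_{m-1}, then d_{m-2}, ..., d_0
        b, c = cyclic_shift(b), cyclic_shift(c)
    return d


ONE = [1, 1, 1, 1]                      # normal basis representation of 1
beta = [1, 0, 1, 1]
print(mo_multiply(beta, ONE))           # multiplying by 1 leaves beta unchanged
print(mo_multiply(beta, beta) == cyclic_shift(beta))
```

Two quick sanity checks follow from Section II: multiplying by $(1, 1, 1, 1)$ should return the operand, and multiplying an element by itself should give its cyclic shift.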
Figure 2 illustrates the logic diagram of the above-described sequential-type Massey-Omura multiplier on $GF(2^m)$. Alternately, for parallel operation this feature permits the use of $m$ identical logic functions $f$ for calculating simultaneously all components of the product. In the latter case, the inputs to the $m$ logic functions $f$ are connected directly to the components of $\beta$ and $\gamma$. The only difference in the connections of the components of $\beta$ or $\gamma$ to a function $f$ is that they are cyclically shifted versions of one another. Figure 3 shows the structure of the parallel-type Massey-Omura multiplier for the simple case of $m = 4$. The extension of this type of structure to a general case of $GF(2^m)$ is straightforward.

III. A Pipeline Structure for Implementing Massey-Omura Multiplier

A detailed design of a Massey-Omura multiplier is now developed for the finite field $GF(2^4)$. As illustrated in Figs. 2 and 3, the design of either the sequential-type or parallel-type Massey-Omura multiplier must focus on the product function $f$.

The design of $f$ begins with the selection of an irreducible polynomial $P(x) = x^4 + x^3 + 1$ of degree $m = 4$ over $GF(2)$. This particular polynomial has linearly independent roots, namely, $\alpha$, $\alpha^2$, $\alpha^4$ and $\alpha^8$. Hence, the set of roots $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$ constitutes a normal basis of $GF(2^4)$. Any two elements $\beta$ and $\gamma$ in $GF(2^4)$ can be expressed as

$$\begin{aligned}
\beta &= b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + b_3\alpha^8 \\
\gamma &= c_0\alpha + c_1\alpha^2 + c_2\alpha^4 + c_3\alpha^8
\end{aligned} \tag{9}$$

By (9) the product of $\beta$ and $\gamma$ is

$$\begin{aligned}
\delta = \beta \cdot \gamma &= (b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + b_3\alpha^8) \cdot (c_0\alpha + c_1\alpha^2 + c_2\alpha^4 + c_3\alpha^8) \\
&= d_0\alpha + d_1\alpha^2 + d_2\alpha^4 + d_3\alpha^8
\end{aligned} \tag{10}$$

By (10) and the fact that $\alpha^4 = \alpha^3 + 1$, one obtains

$$\begin{aligned}
d_3 &= b_2c_2 + b_3c_2 + b_2c_3 + b_3c_1 + b_1c_3 + b_3c_0 + b_0c_3 + b_1c_0 + b_0c_1 \\
d_2 &= b_1c_1 + b_2c_1 + b_1c_2 + b_2c_0 + b_0c_2 + b_2c_3 + b_3c_2 + b_0c_3 + b_3c_0 \\
d_1 &= b_0c_0 + b_1c_0 + b_0c_1 + b_1c_3 + b_3c_1 + b_1c_2 + b_2c_1 + b_3c_2 + b_2c_3 \\
d_0 &= b_3c_3 + b_0c_3 + b_3c_0 + b_0c_2 + b_2c_0 + b_0c_1 + b_1c_0 + b_2c_1 + b_1c_2
\end{aligned} \tag{11}$$

Comparing (11) with (8), the function $f$ is given by

$$f(b_0, b_1, b_2, b_3;\ c_0, c_1, c_2, c_3) = b_2c_2 + b_3c_2 + b_2c_3 + b_3c_1 + b_1c_3 + b_3c_0 + b_0c_3 + b_1c_0 + b_0c_1 \tag{12}$$

Since the mod-2 sum in (12) can be implemented by the "exclusive or" operation (XOR), the structure of the product function $f$ can be represented by the logic circuit in Fig. 4. This circuit consists of two portions: the left half is an AND plane which computes each term of (12), while the right half is an XOR plane which computes the mod-2 sum. The inputs to the AND plane are the complements of the components of $\beta$ and $\gamma$. This is due to the fact that the AND operation in the AND plane is obtained by the NOR operation on the complements of the two digits being ANDed, i.e., $xy = \overline{(\bar{x} + \bar{y})}$, where $\bar{x}$ is the complement of $x$ and $+$ here denotes inclusive OR.

A pipeline structure of a Massey-Omura multiplier for $GF(2^4)$ is shown in Fig. 5. This structure has a sequential type of operation. For each of the two inputs to the $f$ function, corresponding to $\beta$ and $\gamma$, an inverter, two sets of shift registers, $B$ and $R$, and 11 gate transistors are utilized. Note that registers $B$ and $R$ have an identical circuit structure.

In Fig. 5, during the first three clock cycles, when signal $Ld = 0$, the complements of $b_3, b_2, b_1$ and $c_3, c_2, c_1$ are fed sequentially into three buffer flip-flops $B_k$ for $(k = 1, 2, 3)$. At the fourth clock cycle, when $Ld = 1$, the values of $\bar{b}_3, \bar{b}_2, \bar{b}_1$ and $\bar{c}_3, \bar{c}_2, \bar{c}_1$, previously stored in buffer registers $B_k$, and $\bar{b}_0$ and $\bar{c}_0$ are shifted into the second set of registers $R_k$ for $(k = 1, 2, 3, 4)$. Then the R-registers are cyclically shifted. Such a cyclic-shift operation is needed to sequentially yield the product components $d_3$, $d_2$, $d_1$ and $d_0$ of $\delta$. While the R-registers are cyclically shifting the components of $\beta$ (or $\gamma$), the components of another element in $GF(2^4)$ following $\beta$ (or $\gamma$) can be fed into the buffer B-registers. Therefore, the structure in Fig. 5 provides a pipeline operation in which no time is lost except for an initial fixed time delay. The VLSI layout of a Massey-Omura multiplier for $GF(2^4)$ is shown in Fig. 6.
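The derivation of the product function for $GF(2^4)$ can be cross-checked in software. The Python sketch below is a hedged check rather than part of the original design: it builds $GF(2^4)$ from $P(x) = x^4 + x^3 + 1$ in an ordinary polynomial basis, forms the normal basis $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$ with $\alpha = x$, and compares the resulting products with the function of Eq. (12) applied to cyclically shifted inputs. All helper names (`gf_mul`, `to_poly`, `f_eq12`, `mo_multiply`) and the bit-mask encoding are assumptions made for this illustration.

```python
from itertools import product

M = 4
MOD = 0b11001  # P(x) = x^4 + x^3 + 1; bit i holds the coefficient of x^i


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) stored as polynomial-basis bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):      # reduce modulo P(x) when degree reaches 4
            a ^= MOD
    return r


# Normal basis alpha, alpha^2, alpha^4, alpha^8 obtained by repeated squaring.
basis = [0b0010]
for _ in range(M - 1):
    basis.append(gf_mul(basis[-1], basis[-1]))


def to_poly(bits):
    """Normal-basis digits (b0, b1, b2, b3) -> polynomial-basis bit mask."""
    v = 0
    for bit, e in zip(bits, basis):
        if bit:
            v ^= e
    return v


vectors = list(product((0, 1), repeat=M))
from_poly = {to_poly(v): v for v in vectors}   # 16 entries iff basis is independent


def f_eq12(b, c):
    """Product function of Eq. (12); returns d_3."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def mo_multiply(b, c):
    """All four product digits from f_eq12 and cyclic shifts, as in Eq. (8)."""
    d = [0] * M
    for j in range(M):
        d[M - 1 - j] = f_eq12(b, c)
        b, c = b[-1:] + b[:-1], c[-1:] + c[:-1]
    return tuple(d)


mismatches = [(b, c) for b in vectors for c in vectors
              if mo_multiply(b, c) != from_poly[gf_mul(to_poly(b), to_poly(c))]]
print("pairs checked:", len(vectors) ** 2, "mismatches:", len(mismatches))
```

An exhaustive comparison is feasible here because $GF(2^4)$ has only 256 operand pairs; for larger $m$ a random sample of pairs would be the more practical check.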
Figure 7 illustrates a system structure of a pipelined Massey-Omura multiplier for $GF(2^m)$. For this general case over $GF(2^m)$, the buffer and the cyclic shift mechanism in Fig. 7 have $m - 1$ and $m$ stages, respectively. Each stage consists of a shift register and a gate transistor. The product function $f$ is a mod-2 sum of AND products of the components of the two inputs being multiplied. Such a circuit for function $f$ consists of an AND programmed logic array (PLA) (Ref. 4) followed by an XOR sequential-PLA. In the XOR sequential-PLA there are several levels of XORs. At each level, the inputs, pair-by-pair, are fed sequentially one-by-one into an XOR as shown in Fig. 4.

Let $n(j)$ be the number of XOR circuits at the $j$-th level of the XOR sequential-PLA. Then $n(j + 1) = \lceil n(j)/2 \rceil$, where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$ and where initially, $n(0)$ = total number of terms to be XORed in product function $f$. At the last level, there is only one XOR circuit and the output is the value of $f$. In general, if $k$ denotes the number of levels required in the XOR sequential-PLA, $k = \lceil \log_2 n(0) \rceil$.
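The level count can be tabulated directly from the recurrence. The short Python sketch below is illustrative only (the function name `xor_levels` is an assumption); it lists $n(0), n(1), \ldots$ down to 1 and uses integer arithmetic for the ceiling.

```python
def xor_levels(n0):
    """Return [n(0), n(1), ..., 1] using n(j+1) = ceil(n(j) / 2)."""
    sizes = [n0]
    while sizes[-1] > 1:
        sizes.append((sizes[-1] + 1) // 2)   # integer form of ceil(n / 2)
    return sizes


# The product function of Eq. (12) for GF(2^4) has nine terms to be XORed.
sizes = xor_levels(9)
print(sizes)            # [9, 5, 3, 2, 1]
print(len(sizes) - 1)   # 4 levels, which equals ceil(log2(9))
```

For the nine-term function of the $GF(2^4)$ example this gives four XOR levels.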
It should be noted that as $m$ gets large, the number of mod-2 sums in the function $f$ becomes large. In this case, more XORs and as a consequence more levels in the XOR sequential-PLA are required. To maximize the pipeline operation speed, shift registers are required between the XOR levels in order to store the XOR outputs of the intermediate levels.

Another approach to the realization of product function $f$ is to use a standard AND-OR PLA (Ref. 4). This is possible since $x \oplus y = \bar{x}y \vee x\bar{y}$, where $\vee$ denotes inclusive OR. In general, although the design of $f$ by the use of such a PLA is tedious, the product function $f$ can be accomplished in less than one clock cycle. One trade-off for such a design is the large chip area required. The required area for such a PLA increases dramatically with $m$. Hence, a design utilizing a standard AND-OR PLA to realize $f$ is practical only for small $m$.
IV. A Pipeline Structure for Computing an Inverse Element in the Finite Field $GF(2^m)$
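For any $\alpha$ in the finite field $GF(2^m)$, $\alpha^{2^m} = \alpha$. Hence the inverse of a nonzero $\alpha$ is $\alpha^{-1} = \alpha^{2^m - 2}$. Let $2^m - 2$ be decomposed as $2^m - 2 = 2 + 2^2 + 2^3 + \cdots + 2^{m-1}$; then $\alpha^{-1}$ can be expressed as

$$\alpha^{-1} = (\alpha^{2}) \cdot (\alpha^{2^2}) \cdot (\alpha^{2^3}) \cdots (\alpha^{2^{m-1}}) \tag{13}$$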
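As a small worked instance of (13), offered only as an illustration, take $m = 4$. The exponent decomposition and the resulting product are:

```latex
\begin{aligned}
2^4 - 2 &= 14 = 2 + 2^2 + 2^3, \\
\alpha^{-1} &= \alpha^{14} = \alpha^{2} \cdot \alpha^{4} \cdot \alpha^{8}, \\
\alpha \cdot \alpha^{-1} &= \alpha^{15} = 1 \quad (\alpha \neq 0,\ \text{since } \alpha^{16} = \alpha).
\end{aligned}
```

In general the decomposition is the geometric sum $\sum_{j=1}^{m-1} 2^j = 2^m - 2$, so $m - 1$ squared powers of $\alpha$ are multiplied together.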
As discussed in Section II, if $\alpha$ is represented in a normal basis, squaring can be realized by a cyclic shift operation; $\alpha^{2^j}$ is the $j$-th cyclic shift (CS) of $\alpha$. Thus, the inverse element $\alpha^{-1}$ can be obtained by using successive cyclic-shift operations and a Massey-Omura multiplier. The algorithm for $\alpha^{-1}$ is the following:

(1) Obtain the cyclic shift of $\alpha$, i.e., $\alpha^2 = CS(\alpha)$, where $CS$ denotes the cyclic shift function. Let $B = CS(\alpha)$ and $C = 1$. Let $k = 0$.

(2) Multiply $B$ and $C$ to obtain the product, $D = B \cdot C$. Set $k = k + 1$.

(3) If $k = m - 1$, $\alpha^{-1} = D$. Stop. If $k < m - 1$, let $B = CS(B)$ and $C = D$.

(4) Go back to (2).

Figure 8 shows a flow chart diagram of this procedure.
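The four steps above can be sketched in Python as follows. This is a minimal software model under stated assumptions, not the circuit itself: it works in $GF(2^4)$ with the product function of Eq. (12), and the names `f_gf16`, `cyclic_shift`, `mo_multiply` and `mo_inverse` are illustrative.

```python
def f_gf16(b, c):
    """Product function of Eq. (12) for GF(2^4); returns d_3."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def cyclic_shift(v):
    """CS: squaring in the normal basis representation."""
    return v[-1:] + v[:-1]


def mo_multiply(b, c, f=f_gf16):
    """Massey-Omura product of two normal-basis vectors, as in Eq. (8)."""
    m = len(b)
    d = [0] * m
    for j in range(m):
        d[m - 1 - j] = f(b, c)
        b, c = cyclic_shift(b), cyclic_shift(c)
    return d


def mo_inverse(a, f=f_gf16):
    """Steps (1)-(4): multiply together the m - 1 successive cyclic shifts of a."""
    m = len(a)
    B = cyclic_shift(a)          # step (1): B = a^2
    C = [1] * m                  #           C = 1 in the normal basis
    k = 0
    while True:
        D = mo_multiply(B, C, f) # step (2)
        k += 1
        if k == m - 1:           # step (3): D = a^(2^m - 2)
            return D
        B = cyclic_shift(B)
        C = D                    # step (4): loop back to step (2)


alpha = [1, 0, 0, 0]
inv = mo_inverse(alpha)
print(inv, mo_multiply(alpha, inv))   # the product should be [1, 1, 1, 1]
```

The first multiplication is by 1, which mirrors the procedure as stated and keeps every pass through step (2) identical. For the zero element the routine simply returns zero, since zero has no inverse.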
This recursive algorithm for computing an inverse element in $GF(2^m)$ can be realized using the circuit shown in Fig. 9. In this circuit the parallel-type Massey-Omura multiplier shown in Fig. 3, with the circuit for the product function $f$ shown in Fig. 4, is utilized.

To illustrate, let $Ld_1$ and $Ld_2$ be two control signals with a period of four clock signals as shown in Fig. 9. Also let the normal basis representation of $\alpha$ be $(a_0, a_1, a_2, a_3)$. At the end of the third clock pulse, the values $\bar{a}_1$, $\bar{a}_2$, $\bar{a}_3$ are stored in the input buffer flip-flops $B_1$, $B_2$, $B_3$, respectively. During the fourth clock cycle, $\bar{a}_3$, $\bar{a}_0$, $\bar{a}_1$ and $\bar{a}_2$ are simultaneously shifted to $R_1$, $R_2$, $R_3$ and $R_4$, respectively. With the appropriate connections among the input buffer flip-flops $B_k$ and flip-flops $R_k$, the cyclic shift of $\alpha = (a_0, a_1, a_2, a_3)$, i.e., $\alpha^2 = (a_3, a_0, a_1, a_2)$, is obtained in $R$. At the fourth clock pulse the four registers that hold the second multiplier input are also fed the value "0". These four complementary values of "1" introduce the element 1 in $GF(2^4)$. As was discussed in Section II, a parallel-type $GF(2^4)$ Massey-Omura multiplier simultaneously yields four product components $d_0$, $d_1$, $d_2$, $d_3$. Therefore, during the next three clocks three successive multiplications, i.e., $D_1 = 1 \cdot \alpha^2$, $D_2 = D_1 \cdot \alpha^4$ and $D_3 = D_2 \cdot \alpha^8$, are performed for the inversion. When the third multiplication is completed, $Ld_2 = 1$. Thus the output product digits, which together represent the inverse element $\alpha^{-1}$, are fed into the output buffer flip-flops. Finally these are sequentially shifted from the inversion circuit.

The above technique for computing the inverse of an element in $GF(2^4)$ takes four clock cycles. During these four clock cycles, the circuit in Fig. 9 allows the bits of the next element (following $\alpha$) to be fed into it and the bits of the previous element to be shifted out of it, simultaneously. This type of circuit provides a full pipeline capability. A VLSI layout of the pipeline inversion circuitry for $GF(2^4)$ is presented in Fig. 10. Figure 11 shows the system structure of an inversion circuit for the general finite field $GF(2^m)$.
References

1. Massey, J. L., and Omura, J. K., Patent Application of Computational Method and Apparatus for Finite Field Arithmetic, submitted in 1981.

2. MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error-Correcting Codes, North-Holland Publishing, New York, 1977.

3. Peterson, W. W., and Weldon, E. J., Jr., Error-Correcting Codes, MIT Press, Cambridge, 1972.

4. Mead, C., and Conway, L., Introduction to VLSI Systems, Addison-Wesley, Reading, 1980.