Journal ArticleDOI

VLSI Architectures for Computing Multiplications and Inverses in GF(2 m )

01 Aug 1985-IEEE Transactions on Computers (IEEE Computer Society)-Vol. 34, Iss: 8, pp 709-717
TL;DR: In this article, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m) with the simple squaring property of the normal basis representation used together with this multiplier.
Abstract: Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura [1] recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m). With the simple squaring property of the normal basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable, and therefore, naturally suitable for VLSI implementation.

Summary (1 min read)

1. introduction

  • The conventional method for finding an inverse element in a finite field uses either table look-up or Euclid's algorithm.
  • These methods are not easily realized in a VLSI circuit.


  • The inversion structure consists of four sets of shift registers, one parallel-type Massey-Omura multiplier and two control signals.
  • Such a design is regular, simple and expandable and, hence, naturally suitable for VLSI implementation.


  • The equations in (8) define the Massey-Omura multiplier.
  • Only one logic function f of the 2m components of β and γ is required to sequentially compute the m components of the product.
  • Alternately, for parallel operation this feature permits the use of m identical logic functions f for calculating simultaneously all components of the product.

Massey-Omura Multiplier

  • A detailed design of a Massey-Omura multiplier is now developed for the finite field GF(2^4).
  • The mod-2 sum in (12) can be implemented by the "exclusive or" (XOR) operation.
  • In the XOR sequential-PLA there are several levels of XORs.
  • A design utilizing a standard AND-OR PLA to realize f is practical only for small m.
  • These four complementary values of "1" introduce the element 1 in GF(2^4).

IV. A Pipeline

  • A parallel-type Massey-Omura multiplier simultaneously yields the four product components d0, d1, d2, d3.
  • During the next three clocks, three successive multiplications are performed for the inversion.
  • The output product digits, which together represent the inverse element α^-1, are fed into the output buffer flip-flops and finally shifted sequentially out of the inversion circuit.
  • During these four clock cycles, the circuit in Fig. 9 allows the bits of the next element (following α) to be fed into it and the bits of the previous element to be shifted out of it.
  • This type of circuit provides a full pipeline capability.


N84 13406
TDA Progress Report 42-75
July-September 1983

VLSI Architectures for Computing Multiplications and Inverses in $GF(2^m)$

C. C. Wang, T. K. Truong, H. M. Shao and L. J. Deutsch
Communications Systems Research Section

J. K. Omura
University of California, Los Angeles

I. S. Reed
University of Southern California

Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field $GF(2^m)$. With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in $GF(2^m)$. The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.

I. Introduction

Recently, Massey and Omura (Ref. 1) invented a multiplier which obtains the product of two elements in the finite field $GF(2^m)$. In their invention, they utilize a normal basis of the form $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ to represent elements of the field, where $\alpha$ is the root of an irreducible polynomial of degree $m$ over $GF(2)$. In this basis each element in the field $GF(2^m)$ can be represented by $m$ binary digits.

In the normal-basis representation the squaring of an element in $GF(2^m)$ is readily shown to be a simple cyclic shift of its binary digits. Multiplication in the normal basis representation requires for any one product digit the same logic circuitry as it does for any other product digit. Adjacent product-digit circuits differ only in their inputs, which are cyclically shifted versions of one another. In this paper, a pipeline architecture suitable for VLSI design is developed for a Massey-Omura multiplier on $GF(2^m)$.

The conventional method for finding an inverse element in a finite field uses either table look-up or Euclid's algorithm. These methods are not easily realized in a VLSI circuit. However, using a Massey-Omura multiplier, a recursive, pipeline inversion circuit is developed. This structure consists of four sets of shift registers, one parallel-type Massey-Omura multiplier and two control signals. Such a design is regular, simple and expandable and, hence, naturally suitable for VLSI implementation.

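II. Squaring and Multiplying in a Normal Basis Representation

In this section, the work originally described by Massey and Omura (Ref. 1) is reviewed. It is well known that there always exists a normal basis in the finite field $GF(2^m)$ (Ref. 2) for all positive integers $m$. That is, one can find a field element $\alpha$ such that $N = \{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a basis set of $GF(2^m)$. Thus every field element $\beta \in GF(2^m)$ can be uniquely expressed as

$$\beta = b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + \cdots + b_{m-1}\alpha^{2^{m-1}} \tag{1}$$

where $b_0, b_1, b_2, \ldots, b_{m-1}$ are binary digits and addition is mod-2 addition.

Three useful properties of a finite field $GF(2^m)$ are stated here without proof (for proofs see, for example, Ref. 2). These properties are:

(1) Squaring in $GF(2^m)$ is a linear operation. That is, given any two elements $\alpha$ and $\beta$ in $GF(2^m)$,

$$(\alpha + \beta)^2 = \alpha^2 + \beta^2 \tag{2}$$

(2) For any element $\alpha$ of $GF(2^m)$,

$$\alpha^{2^m} = \alpha \tag{3}$$

(3) If $\alpha$ is a root of any irreducible polynomial $P(x)$ of degree $m$ over $GF(2)$, the powers $\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}$ are in $GF(2^m)$ and constitute a complete set of roots of $P(x)$.

With regard to property (3), Peterson and Weldon (Ref. 3) list a set of irreducible polynomials of degree $m \le 34$ over $GF(2)$ for which the roots $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ are linearly independent. These linearly independent roots clearly form a normal basis of $GF(2^m)$.

Suppose that $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a normal basis of $GF(2^m)$. By (2) and (3) the square of (1) is

$$\beta^2 = b_0\alpha^2 + b_1\alpha^4 + b_2\alpha^8 + \cdots + b_{m-2}\alpha^{2^{m-1}} + b_{m-1}\alpha^{2^m} = b_{m-1}\alpha + b_0\alpha^2 + b_1\alpha^4 + \cdots + b_{m-2}\alpha^{2^{m-1}}$$

Thus, if $\beta$ is represented as a vector of components of the normal basis elements of $GF(2^m)$ in the form $\beta = [b_0, b_1, b_2, \ldots, b_{m-1}]$, then $\beta^2 = [b_{m-1}, b_0, b_1, \ldots, b_{m-2}]$. In the normal basis representation $\beta^2$ is a cyclic shift of $\beta$. Hence squaring in $GF(2^m)$ can be realized physically by logic circuitry which accomplishes cyclic shifts in a binary register. Such squaring circuitry is illustrated in block form in Fig. 1.

By (2) and (3) it is readily seen that

$$1 = \alpha + \alpha^2 + \alpha^4 + \cdots + \alpha^{2^{m-1}}$$

for any element $\alpha$ that generates a normal basis of $GF(2^m)$: the sum on the right is unchanged by squaring, so it lies in $GF(2)$, and it cannot be 0 because the basis elements are linearly independent. This implies that the normal basis representation of 1 is $(1, 1, 1, \ldots, 1)$.

Let $\beta = [b_0, b_1, \ldots, b_{m-1}]$ and $\gamma = [c_0, c_1, \ldots, c_{m-1}]$ be two elements of $GF(2^m)$ in a normal basis representation. Then the last term $d_{m-1}$ of the product,

$$\delta = \beta \cdot \gamma = [d_0, d_1, \ldots, d_{m-1}]$$

is some binary function of the components of $\beta$ and $\gamma$, i.e.,

$$d_{m-1} = f(b_0, b_1, \ldots, b_{m-1};\ c_0, c_1, \ldots, c_{m-1}) \tag{6}$$

Since squaring means a cyclic shift of an element in a normal basis representation, one has

$$\delta^2 = \beta^2 \cdot \gamma^2$$

Hence the last component $d_{m-2}$ of $\delta^2$ is obtained by the same function $f$ in (6) operating on the components of $\beta^2$ and $\gamma^2$. That is,

$$d_{m-2} = f(b_{m-1}, b_0, b_1, \ldots, b_{m-2};\ c_{m-1}, c_0, c_1, \ldots, c_{m-2})$$

By squaring $\delta$ repeatedly, it is evident that

$$\begin{aligned}
d_{m-1} &= f(b_0, b_1, \ldots, b_{m-1};\ c_0, c_1, \ldots, c_{m-1}) \\
d_{m-2} &= f(b_{m-1}, b_0, \ldots, b_{m-2};\ c_{m-1}, c_0, \ldots, c_{m-2}) \\
&\ \ \vdots \\
d_0 &= f(b_1, b_2, \ldots, b_{m-1}, b_0;\ c_1, c_2, \ldots, c_{m-1}, c_0)
\end{aligned} \tag{8}$$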
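The squaring property of this section can be sketched in a few lines of Python. This is only an illustration: the list layout `[b0, ..., b_{m-1}]` follows the vector form used above, while the function names `square` and `repeated_square` are assumptions introduced here, not names from the paper.

```python
from typing import List


def square(beta: List[int]) -> List[int]:
    """Square a normal-basis element: [b0, ..., b_{m-1}] -> [b_{m-1}, b0, ..., b_{m-2}]."""
    return beta[-1:] + beta[:-1]


def repeated_square(beta: List[int], j: int) -> List[int]:
    """Apply j squarings, i.e. j cyclic shifts, giving beta^(2^j)."""
    for _ in range(j):
        beta = square(beta)
    return beta


beta = [1, 0, 1, 1, 0]          # an arbitrary element with m = 5 digits
print(square(beta))             # one cyclic shift
print(repeated_square(beta, 5)) # m shifts return the original digits, as in (3)
```

The all-ones vector is unchanged by any cyclic shift, which is consistent with it representing the element 1.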

The equations in (8) define the Massey-Omura multiplier. In the normal basis representation this multiplier has the property that the same logic function $f$ which is used to find the last component $d_{m-1}$ of the product $\delta$ can be used to find sequentially the remaining components $d_{m-2}, d_{m-3}, \ldots, d_0$ of the product. This feature of the product operation requires only one logic function $f$ of the $2m$ components of $\beta$ and $\gamma$ to sequentially compute the $m$ components of the product.
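A minimal sketch of this sequential use of a single function $f$ follows. It assumes the $GF(2^4)$ product function that the paper gives as eq. (12) in Section III; the helper names `rotate_right`, `f_gf16` and `massey_omura_multiply` are hypothetical and chosen only for this example.

```python
from itertools import product
from typing import Callable, List, Sequence


def rotate_right(v: Sequence[int], k: int) -> List[int]:
    """Cyclic shift by k places: the normal-basis digits of v^(2^k)."""
    k %= len(v)
    return list(v[-k:]) + list(v[:-k])


def f_gf16(b: Sequence[int], c: Sequence[int]) -> int:
    """Product function of eq. (12): nine AND terms summed mod 2 (XOR)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def massey_omura_multiply(beta: Sequence[int], gamma: Sequence[int],
                          f: Callable[[Sequence[int], Sequence[int]], int]) -> List[int]:
    """Evaluate the one function f on successively shifted inputs, as in (8)."""
    m = len(beta)
    d = [0] * m
    for k in range(m):
        # After k cyclic shifts of both operands, f yields component d_{m-1-k}.
        d[m - 1 - k] = f(rotate_right(beta, k), rotate_right(gamma, k))
    return d


one = [1, 1, 1, 1]  # normal-basis representation of the element 1
for beta in product((0, 1), repeat=4):
    assert massey_omura_multiply(beta, one, f_gf16) == list(beta)
    assert massey_omura_multiply(beta, beta, f_gf16) == rotate_right(beta, 1)
```

The two assertions check only properties stated in Section II: multiplying by the all-ones vector leaves an element unchanged, and multiplying an element by itself gives its cyclic shift.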
Figure 2 illustrates the logic diagram of the above-described sequential-type Massey-Omura multiplier on $GF(2^m)$. Alternately, for parallel operation this feature permits the use of $m$ identical logic functions $f$ for calculating simultaneously all components of the product. In the latter case, the inputs to the $m$ logic functions $f$ are connected directly to the components of $\beta$ and $\gamma$. The only difference in the connections to the components of $\beta$ or $\gamma$ to a function $f$ is that they are cyclically shifted versions of one another. Figure 3 shows the structure of the parallel-type Massey-Omura multiplier for the simple case of $m = 4$. The extension of this type of structure to a general case of $GF(2^m)$ is straightforward.

III. A Pipeline Structure for Implementing Massey-Omura Multiplier

A detailed design of a Massey-Omura multiplier is now developed for the finite field $GF(2^4)$. As illustrated in Figs. 2 and 3, the design of either the sequential-type or parallel-type Massey-Omura multiplier must focus on the product function $f$.

The design of $f$ begins with the selection of an irreducible polynomial $P(x) = x^4 + x^3 + 1$ of degree $m = 4$ over $GF(2)$. This particular polynomial has linearly independent roots, namely, $\alpha$, $\alpha^2$, $\alpha^4$ and $\alpha^8$. Hence, the set of roots $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$ constitutes a normal basis of $GF(2^4)$. Any two elements $\beta$ and $\gamma$ in $GF(2^4)$ can be expressed as

$$\begin{aligned}
\beta &= b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + b_3\alpha^8 \\
\gamma &= c_0\alpha + c_1\alpha^2 + c_2\alpha^4 + c_3\alpha^8
\end{aligned} \tag{9}$$

By (9) the product of $\beta$ and $\gamma$ is

$$\delta = \beta \cdot \gamma = (b_0\alpha + b_1\alpha^2 + b_2\alpha^4 + b_3\alpha^8) \cdot (c_0\alpha + c_1\alpha^2 + c_2\alpha^4 + c_3\alpha^8) = d_0\alpha + d_1\alpha^2 + d_2\alpha^4 + d_3\alpha^8 \tag{10}$$

By (10) and the fact that $\alpha^4 = \alpha^3 + 1$, one obtains

$$\begin{aligned}
d_3 &= b_2c_2 + b_3c_2 + b_2c_3 + b_3c_1 + b_1c_3 + b_3c_0 + b_0c_3 + b_1c_0 + b_0c_1 \\
d_2 &= b_1c_1 + b_2c_1 + b_1c_2 + b_2c_0 + b_0c_2 + b_2c_3 + b_3c_2 + b_0c_3 + b_3c_0 \\
d_1 &= b_0c_0 + b_1c_0 + b_0c_1 + b_1c_3 + b_3c_1 + b_1c_2 + b_2c_1 + b_3c_2 + b_2c_3 \\
d_0 &= b_3c_3 + b_0c_3 + b_3c_0 + b_0c_2 + b_2c_0 + b_0c_1 + b_1c_0 + b_2c_1 + b_1c_2
\end{aligned} \tag{11}$$

Comparing (11) with (8), the function $f$ is given by

$$f(b_0, b_1, b_2, b_3;\ c_0, c_1, c_2, c_3) = b_2c_2 + b_3c_2 + b_2c_3 + b_3c_1 + b_1c_3 + b_3c_0 + b_0c_3 + b_1c_0 + b_0c_1 \tag{12}$$

Since the mod-2 sum in (12) can be implemented by the "exclusive or" operation (XOR), the structure of the product function $f$ can be represented by the logic circuit in Fig. 4. This circuit consists of two portions: the left half is an AND plane which computes each term of (12), while the right half is an XOR plane which computes the mod-2 sum. The inputs to the AND plane are the complements of the components of $\beta$ and $\gamma$. This is due to the fact that the AND operation in the AND plane is obtained by the NOR operation on the complements of the two digits being ANDed, i.e., $xy = \overline{(\bar{x} + \bar{y})}$, where $\bar{x}$ is the complement of $x$.

A pipeline structure of a Massey-Omura multiplier for $GF(2^4)$ is shown in Fig. 5. This structure has a sequential type of operation. For each of the two inputs to the $f$ function, corresponding to $\beta$ and $\gamma$, an inverter, two sets of shift registers, $B$ and $R$, and 11 gate transistors are utilized. Note that registers $B$ and $R$ have an identical circuit structure.

In Fig. 5 during the first three clock cycles, when signal $LD = 0$, the complements of $b_1, b_2, b_3$ and $c_1, c_2, c_3$ are fed sequentially into three buffer flip-flops $B_k$ for ($k = 1, 2, 3$).
At the fourth clock cycle, when $LD = 1$, the values of $\bar{b}_1, \bar{b}_2, \bar{b}_3$ and $\bar{c}_1, \bar{c}_2, \bar{c}_3$ previously stored in buffer registers $B_k$, together with $\bar{b}_0$ and $\bar{c}_0$, are shifted into the second set of registers $R_k$ for ($k = 1, 2, 3, 4$). Then the R-registers are cyclically shifted. Such a cyclic-shift operation is needed to sequentially yield the product components $d_3$, $d_2$, $d_1$ and $d_0$ of $\delta$. While the R-registers are cyclically shifting the components of $\beta$ (or $\gamma$), the components of another element in $GF(2^4)$ following $\beta$ (or $\gamma$) can be fed into the buffer B-registers. Therefore, the structure in Fig. 5 provides a pipeline operation in which no time is lost except for an initial fixed time delay. The VLSI layout of a Massey-Omura multiplier for $GF(2^4)$ is shown in Fig. 6.
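The product function (12) that this multiplier evaluates can be cross-checked against ordinary polynomial arithmetic modulo $P(x) = x^4 + x^3 + 1$. The sketch below is an independent check written for this purpose, not part of the paper's design; the bit-mask encoding (bit $i$ holds the coefficient of $x^i$) and all function names are assumptions.

```python
from itertools import product

M = 4
P = 0b11001  # x^4 + x^3 + 1


def poly_mul(a: int, b: int) -> int:
    """Multiply two field elements held as bit masks, reducing modulo P(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= P  # apply x^4 = x^3 + 1
    return result


# Normal basis alpha, alpha^2, alpha^4, alpha^8 with alpha = x, built by squaring.
BASIS = [0b0010]
for _ in range(M - 1):
    BASIS.append(poly_mul(BASIS[-1], BASIS[-1]))


def to_poly(bits):
    """Convert normal-basis digits (b0, b1, b2, b3) to the bit-mask form."""
    value = 0
    for bit, basis_element in zip(bits, BASIS):
        if bit:
            value ^= basis_element
    return value


# Inverse conversion; 16 distinct keys confirm the four roots are independent.
FROM_POLY = {to_poly(bits): bits for bits in product((0, 1), repeat=M)}


def f(b, c):
    """Product function of eq. (12)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def normal_basis_product(b, c):
    """All four components from f applied to cyclically shifted inputs, as in (8)."""
    d = [0] * M
    for k in range(M):
        d[M - 1 - k] = f(b[-k:] + b[:-k], c[-k:] + c[:-k])
    return tuple(d)


mismatches = [
    (b, c)
    for b in product((0, 1), repeat=M)
    for c in product((0, 1), repeat=M)
    if FROM_POLY[poly_mul(to_poly(b), to_poly(c))] != normal_basis_product(b, c)
]
print(len(FROM_POLY), len(mismatches))
```

An empty mismatch list means that (11) and (12) agree with polynomial multiplication for all 256 operand pairs.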
Figure 7 illustrates a system structure of a pipelined Massey-Omura multiplier for $GF(2^m)$. For this general case over $GF(2^m)$, the buffer and the cyclic shift mechanism in Fig. 7 have $m - 1$ and $m$ stages, respectively. Each stage consists of a shift register and a gate transistor. The product function $f$ is a mod-2 sum of AND products of the components of the two inputs being multiplied. Such a circuit for function $f$ consists of an AND programmed logic array (PLA) (Ref. 4) followed by an XOR sequential-PLA. In the XOR sequential-PLA there are several levels of XORs. At each level, the inputs, pair-by-pair, are fed sequentially one-by-one into an XOR as shown in Fig. 4.

Let $n(j)$ be the number of XOR circuits at the $j$-th level of the XOR sequential-PLA. Then $n(j + 1) = \lceil n(j)/2 \rceil$, where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$ and where initially $n(0)$ = total number of terms to be XORed in product function $f$. At the last level, there is only one XOR circuit and the output is the value of $f$. In general, if $k$ denotes the number of levels required in the XOR sequential-PLA, $k = \lceil \log_2 n(0) \rceil$.
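As a small worked example of this recurrence, the sketch below lists $n(0), n(1), \ldots$ until a single XOR remains. The function name `xor_level_sizes` is an assumption made for illustration; the input 9 is the number of AND terms in (12).

```python
from typing import List


def xor_level_sizes(n_terms: int) -> List[int]:
    """Return [n(0), n(1), ..., n(k)] with n(j+1) = ceil(n(j) / 2), ending at 1."""
    sizes = [n_terms]
    while sizes[-1] > 1:
        sizes.append((sizes[-1] + 1) // 2)  # integer form of ceil(n / 2)
    return sizes


sizes = xor_level_sizes(9)   # nine terms are XORed in eq. (12)
levels = len(sizes) - 1      # number of XOR levels k
print(sizes, levels)
```

For nine terms the sizes run 9, 5, 3, 2, 1, so four levels are needed, in agreement with $k = \lceil \log_2 9 \rceil = 4$.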
It should be noted that as $m$ gets large, the number of mod-2 sums in the function $f$ becomes large. In this case, more XORs and as a consequence more levels in the XOR sequential-PLA are required. To maximize the pipeline operation speed, shift registers are required between the XOR levels in order to store the XOR outputs of the intermediate levels.

Another approach to the realization of product function $f$ is to use a standard AND-OR PLA (Ref. 4). This is possible since $x \oplus y = \bar{x}y \vee x\bar{y}$, where $\vee$ denotes inclusive OR. In general, although the design of $f$ by the use of such a PLA is tedious, the product function $f$ can be accomplished in less than one clock cycle. One trade-off for such a design is the large chip area required. The required area for such a PLA increases dramatically with $m$. Hence, a design utilizing a standard AND-OR PLA to realize $f$ is practical only for small $m$.
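To give a rough feel for the size of a two-level AND-OR form, the sketch below checks the XOR identity by truth table and counts the input combinations for which the $GF(2^4)$ function (12) equals 1. That count is the number of product terms in the canonical (unminimized) sum-of-products form; a PLA produced after logic minimization would normally need fewer terms, so the figure is an illustration under that assumption rather than an area estimate from the paper.

```python
from itertools import product


def f(b, c):
    """Product function of eq. (12) for GF(2^4)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


# x XOR y = (not x AND y) OR (x AND not y), checked over the full truth table.
identity_holds = all(
    (x ^ y) == (((1 - x) & y) | (x & (1 - y)))
    for x in (0, 1) for y in (0, 1)
)

# Canonical AND-OR form: one product term per input combination with f = 1.
canonical_terms = sum(
    f(bits[:4], bits[4:]) for bits in product((0, 1), repeat=8)
)
print(identity_holds, canonical_terms, 2 ** 8)
```

Even for $m = 4$ the canonical form covers a large share of the $2^{2m} = 256$ input combinations, and the input space itself grows as $2^{2m}$.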
IV. A Pipeline Structure for Computing an Inverse Element in the Finite Field $GF(2^m)$

For any $\alpha$ in the finite field $GF(2^m)$, $\alpha^{2^m} = \alpha$. Hence the inverse of $\alpha$ is $\alpha^{-1} = \alpha^{2^m - 2}$. Let $2^m - 2$ be decomposed as $2^m - 2 = 2 + 2^2 + 2^3 + \cdots + 2^{m-1}$; then $\alpha^{-1}$ can be expressed as

$$\alpha^{-1} = (\alpha^{2}) \cdot (\alpha^{2^2}) \cdot (\alpha^{2^3}) \cdots (\alpha^{2^{m-1}}) \tag{13}$$
As discussed in Section II, if $\alpha$ is represented in a normal basis, squaring can be realized by a cyclic-shift operation; $\alpha^{2^j}$ is the $j$-th cyclic shift (CS) of $\alpha$. Thus, the inverse element $\alpha^{-1}$ can be obtained by using successive cyclic-shift operations and a Massey-Omura multiplier. The algorithm for $\alpha^{-1}$ is the following:

(1) Obtain the cyclic shift of $\alpha$, i.e., $\alpha^2 = CS(\alpha)$, where CS denotes the cyclic shift function. Let $B = CS(\alpha)$ and $C = 1$. Let $k = 0$.
(2) Multiply $B$ and $C$ to obtain the product $D = B * C$. Set $k = k + 1$.
(3) If $k = m - 1$, $\alpha^{-1} = D$. Stop. If $k < m - 1$, let $B = CS(B)$ and $C = D$.
(4) Go back to (2).

Figure 8 shows a flow chart diagram of this procedure.
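The four steps above can be sketched in software for $GF(2^4)$ as follows. This is a behavioural illustration only, assuming the product function of eq. (12); the names `cs`, `multiply` and `inverse` are hypothetical, and the loop models the algorithm rather than the register-level timing of the circuit.

```python
from itertools import product
from typing import List, Sequence


def cs(v: Sequence[int], k: int = 1) -> List[int]:
    """Cyclic shift CS applied k times (k squarings in a normal basis)."""
    k %= len(v)
    return list(v[-k:]) + list(v[:-k])


def f(b: Sequence[int], c: Sequence[int]) -> int:
    """Product function of eq. (12) for GF(2^4)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def multiply(beta: Sequence[int], gamma: Sequence[int]) -> List[int]:
    """Parallel-type multiplier: every component from f on shifted inputs."""
    m = len(beta)
    d = [0] * m
    for k in range(m):
        d[m - 1 - k] = f(cs(beta, k), cs(gamma, k))
    return d


def inverse(alpha: Sequence[int]) -> List[int]:
    """Steps (1)-(4): multiply together alpha^2, alpha^4, ..., alpha^(2^(m-1))."""
    m = len(alpha)
    B = cs(alpha)          # step (1): B = CS(alpha) = alpha^2
    C = [1] * m            # C = 1 is the all-ones vector
    k = 0
    while True:
        D = multiply(B, C)  # step (2)
        k += 1
        if k == m - 1:      # step (3): D now holds alpha^(2^m - 2)
            return D
        B, C = cs(B), D     # otherwise shift B, keep the partial product, repeat


one = [1, 1, 1, 1]
for alpha in product((0, 1), repeat=4):
    if any(alpha):  # the zero element has no inverse
        assert multiply(alpha, inverse(alpha)) == one
```

For $m = 4$ the loop performs three multiplications, which matches the three successive products the circuit description refers to.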
This recursive algorithm for computing an inverse element in $GF(2^m)$ can be realized using the circuit shown in Fig. 9. In this circuit the parallel-type Massey-Omura multiplier shown in Fig. 3 with the circuit for the product function $f$ shown in Fig. 4 is utilized.

To illustrate, let $Ld_1$ and $Ld_2$ be two control signals with a period of four clock signals as shown in Fig. 9. Also let the normal basis representation of $\alpha$ be $(a_0, a_1, a_2, a_3)$. At the end of the third clock pulse, three of the complemented digits of $\alpha$ are stored in the input buffer flip-flops $B_1$, $B_2$, $B_3$. During the fourth clock cycle, $\bar{a}_3$, $\bar{a}_0$, $\bar{a}_1$ and $\bar{a}_2$ are simultaneously shifted to the four flip-flops $R_k$. With the appropriate connections among the input buffer flip-flops $B_k$ and flip-flops $R_k$, the cyclic shift of $\alpha = (a_0, a_1, a_2, a_3)$, i.e., $\alpha^2 = (a_3, a_0, a_1, a_2)$, is obtained in $R$. At the fourth clock pulse the four flip-flops feeding the other multiplier input are also fed the value "0". These four complementary values of "1" introduce the element 1 in $GF(2^4)$.

As it was discussed in Section II, a parallel-type $GF(2^4)$ Massey-Omura multiplier simultaneously yields four product components $d_0$, $d_1$, $d_2$, $d_3$. Therefore, during the next three clocks three successive multiplications, i.e., $D_1 = 1 * \alpha^2$, $D_2 = D_1 * \alpha^4$ and $D_3 = D_2 * \alpha^8$, are performed for the inversion. When the third multiplication is completed, $Ld_2 = 1$. Thus the output product digits, which together represent the inverse element $\alpha^{-1}$, are fed into the output buffer flip-flops. Finally these are sequentially shifted from the inversion circuit.

The above technique for computing the inverse of an element in $GF(2^4)$ takes four clock cycles. During these four clock cycles, the circuit in Fig. 9 allows the bits of the next element (following $\alpha$) to be fed into it and the bits of the previous element to be shifted out of it, simultaneously. This type of circuit provides a full pipeline capability. A VLSI circuit layout of the pipeline inversion circuitry for $GF(2^4)$ is presented in Fig. 10. Figure 11 shows the system structure of an inversion circuit for the general finite field $GF(2^m)$.

References

1. Massey, J. L., and Omura, J. K., Patent Application of Computational Method and Apparatus for Finite Field Arithmetic, submitted in 1981.
2. MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error-Correcting Codes, North-Holland Publishing, New York, 1977.
3. Peterson, W. W., and Weldon, E. J., Jr., Error-Correcting Codes, MIT Press, Cambridge, 1972.
4. Mead, C., and Conway, L., Introduction to VLSI Systems, Addison-Wesley, Reading, 1980.

Citations
Journal ArticleDOI
TL;DR: The fast algorithm proposed in this paper also uses normal bases, and computes multiplicative inverses iterating multiplications in GF(2 m ).
Abstract: This paper proposes a fast algorithm for computing multiplicative inverses in GF(2 m ) using normal bases. Normal bases have the following useful property: In the case that an element x in GF(2 m ) is represented by normal bases, 2 k power operation of an element x in GF(2 m ) can be carried out by k times cyclic shift of its vector representation. C. C. Wang et al. proposed an algorithm for computing multiplicative inverses using normal bases, which requires ( m − 2) multiplications in GF(2 m ) and ( m − 1) cyclic shifts. The fast algorithm proposed in this paper also uses normal bases, and computes multiplicative inverses iterating multiplications in GF(2 m ). It requires at most 2[log 2 ( m − 1)] multiplications in GF(2 m ) and ( m − 1) cyclic shifts, which are much less than those required in the Wang's method. The same idea of the proposed fast algorithm is applicable to the general power operation in GF(2 m ) and the computation of multiplicative inverses in GF( q m ) ( q = 2 n ).

663 citations

Journal ArticleDOI
TL;DR: This work has applications in cryptography and coding theory since a reduction in the complexity of multiplying and exponentiating elements of GF(2n) is achieved for many values of n, some prime.

334 citations

Journal ArticleDOI
01 Jul 1998
TL;DR: A new approach for designing digit-serial/parallel finite field multipliers is presented, where the digit-level array-type algorithm minimizes the latency for one multiplication operation and the parallel architecture inside of each digit cell reduces both the cycle-time as well as the switching activities, hence power consumption.
Abstract: Digit-serial architectures are best suited for systems requiring moderate sample rate and where area and power consumption are critical. This paper presents a new approach for designing digit-serial/parallel finite field multipliers. This approach combines both array-type and parallel multiplication algorithms, where the digit-level array-type algorithm minimizes the latency for one multiplication operation and the parallel architecture inside of each digit cell reduces both the cycle-time as well as the switching activities, hence power consumption. By appropriately constraining the feasible primitive polynomials, the mod p(x) operation involved in finite field multiplication can be performed in a more efficient way. As a result, the computation delay and energy consumption of one finite field multiplication using the proposed digit-serial/parallel architectures are significantly less than of those obtained by folding the parallel semi-systolic multipliers. Furthermore, their energy-delay products are reduced by a even larger percentage. Therefore, the proposed digit-serial/parallel architectures are attractive for both low-energy and high-performance applications.

251 citations

Journal ArticleDOI
TL;DR: A configuration of parallel multipliers for GF(2 m ) based on canonical bases and on irreducible AOPs and ESPs is presented, together with a necessary and sufficient condition for ESPs to be irreducible over GF(2) and the uniqueness of the irreducible ESPs over GF(2).
Abstract: This paper presents a configuration of parallel multipliers for GF (2 m ) based on canonical bases. The possible parallel multipliers by the proposed configuration are limited to a class of fields GF (2 m ). However they can be constructed by O(m 2 ) AND-gates and O(m 2 ) EOR-gates with the structural modularity (this is a desirable feature for the hardware implementation), and their operation time is about (log m ) T , where m is the dimension of GF (2 m ) and T is the delay time of an EOR-gate. In order to construct such parallel multipliers, we define two types of polynomials of special form over GF (2), one is called all one polynomial (denoted by AOP) and the other is called equally spaced polynomial (denoted by ESP). Furthermore, we show a necessary and sufficient condition for ESPs to be irreducible over GF (2) and the uniqueness of the irreducible ESPs over GF (2). Finally, we propose the configuration of parallel multipliers for a class of fields GF (2 m ) based on irreducible AOPs and ESPs over GF (2).

215 citations

Journal ArticleDOI
TL;DR: A new low-complexity bit-parallel canonical basis multiplier for the field GF(2m) generated by an all-one-polynomial is presented and extended to obtain a new bit-parallel normal basis multiplier.
Abstract: We present a new low-complexity bit-parallel canonical basis multiplier for the field GF(2m) generated by an all-one-polynomial. The proposed canonical basis multiplier requires m^2 - 1 XOR gates and m^2 AND gates. We also extend this canonical basis multiplier to obtain a new bit-parallel normal basis multiplier.

205 citations

References
Book
01 Jan 1968
TL;DR: This chapter discusses Coding for Discrete Sources, Techniques for Coding and Decoding, and Source Coding with a Fidelity Criterion.
Abstract: Communication Systems and Information Theory. A Measure of Information. Coding for Discrete Sources. Discrete Memoryless Channels and Capacity. The Noisy-Channel Coding Theorem. Techniques for Coding and Decoding. Memoryless Channels with Discrete Time. Waveform Channels. Source Coding with a Fidelity Criterion. Index.

6,684 citations

Book
01 Jan 1978

2,993 citations

Journal ArticleDOI
Kung1
TL;DR: The basic principle of systolic architectures is reviewed and it is explained why they should result in cost-effective, highperformance special-purpose systems for a wide range of problems.
Abstract: High-performance, special-purpose computer systems are typically used to meet specific application requirements or to off-load computations that are especially taxing to general-purpose computers. As hardware cost and size continue to drop and processing requirements become well-understood in areas such as signal and image processing, more special-purpose systems are being constructed. However, since most of these systems are built on an ad hoc basis for specific tasks, methodological work in this area is rare. Because the knowledge gained from individual experiences is neither accumulated nor properly organized, the same errors are repeated. I/O and computation imbalance is a notable example: often, the fact that I/O interfaces cannot keep up with device speed is discovered only after constructing a high-speed, special-purpose device. We intend to help correct this ad hoc approach by providing a general guideline, specifically the concept of systolic architecture, a general methodology for mapping high-level computations into hardware structures. In a systolic system, data flows from the computer memory in a rhythmic fashion, passing through many processing elements before it returns to memory, much as blood circulates to and from the heart. The system works like an automobile assembly line where different people work on the same car at different times and many cars are assembled simultaneously. An assembly line is always linear, however, and systolic systems are sometimes two-dimensional. They can be rectangular, triangular, or hexagonal to make use of higher degrees of parallelism. Moreover, to implement a variety of computations, data flow in a systolic system may be at multiple speeds in multiple directions: both inputs and (partial) results flow, whereas only results flow in classical pipelined systems.
Generally speaking, a systolic system is easy to implement because of its regularity and easy to reconfigure (to meet various outside constraints) because of its modularity. The systolic architectural concept was developed at Carnegie-Mellon University, and versions of systolic processors are being designed and built by several industrial and governmental organizations. This article reviews the basic principle of systolic architectures and explains why they should result in cost-effective, high-performance special-purpose systems for a wide range of problems.

2,319 citations

Patent
14 Sep 1982
TL;DR: In this article, the GF(2 m ) elements are represented by a vector of m binary digits in such a way that multiplication can be performed by using the same logic function to compute each binary component of the product of two elements, and addition can be formed by logic circuitry that forms the modulo-two sum of the corresponding components of the two vectors representing the elements to be summed.
Abstract: Elements of the finite field GF(2 m ) are represented by a vector of m binary digits in such a way that multiplication can be performed by using the same logic function to compute each binary component of the product of two elements, squaring can be performed by logic circuitry that rotates the vector representing the element to be squared, and addition can be performed by logic circuitry that forms the modulo-two sum of the corresponding components of the two vectors representing the elements to be summed.

335 citations

Journal ArticleDOI
TL;DR: Two systolic architectures are developed for performing the product–sum computation AB + C in the finite field GF(2^m) of 2^m elements, where A, B, and C are arbitrary elements of GF(2^m).
Abstract: Two systolic architectures are developed for performing the product–sum computation AB + C in the finite field GF(2^m) of 2^m elements, where A, B, and C are arbitrary elements of GF(2^m). The first multiplier is a serial-in, serial-out one-dimensional systolic array, while the second multiplier is a parallel-in, parallel-out two-dimensional systolic array. The first multiplier requires a smaller number of basic cells than the second multiplier. The second multiplier needs less average time per computation than the first multiplier if a number of computations are performed consecutively. To perform single computations both multipliers require the same computational time. In both cases the architectures are simple and regular and possess the properties of concurrency and modularity. As a consequence they are well suited for use in VLSI systems.

222 citations