Hardware implementation of an elliptic curve processor over GF(p)

PDF hosted at the Radboud Repository of the Radboud University
Nijmegen
The following full text is a preprint version which may differ from the publisher's version.
For additional information about this publication click this link.
http://repository.ubn.ru.nl/handle/2066/127482
Please be advised that this information was generated on 2022-08-10 and may be subject to
change.

Hardware Implementation of an Elliptic Curve Processor over GF(p)
Sıddıka Berna Örs¹, Lejla Batina¹,², Bart Preneel¹, Joos Vandewalle¹
¹ Katholieke Universiteit Leuven, ESAT/SCD-COSIC
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
{sbors, Lejla.Batina, Bart.Preneel, Joos.Vandewalle}@esat.kuleuven.ac.be
² SafeNet BV
Boxtelseweg 26a, 5261 NE Vught, The Netherlands
Abstract
This paper describes a hardware implementation of an arithmetic processor which is efficient for bit-lengths suitable for both commonly used types of Public Key Cryptography (PKC), i.e., Elliptic Curve (EC) and RSA Cryptosystems. Montgomery modular multiplication in a systolic array architecture is used for modular multiplication. The processor consists of special operational blocks for Montgomery Modular Multiplication, modular addition/subtraction, EC point doubling/addition, modular multiplicative inversion, EC point multiplier, projective to affine coordinates conversion and Montgomery to normal representation conversion.
Keywords: Elliptic Curve Cryptosystems, Modular Operations, FPGA
1 Introduction
Elliptic Curve Cryptography (ECC) was proposed independently by Miller [13] and Koblitz [7] in the 1980s. Since then, a considerable amount of research has been devoted to secure and efficient ECC implementations. The benefits of ECC, when compared with classical cryptosystems such as RSA [19], include higher speed, lower power consumption and smaller certificates, which are especially useful for wireless applications.
The performance of an elliptic curve cryptosystem, and of other public key cryptosystems, is mostly determined by the efficiency of the finite field arithmetic implementation. In this work a hardware architecture of a processor for ECC over the finite field GF(p) is presented. The most critical operation for latency is modular multiplication. We use our systolic array multiplier, proposed in [16], which is based on Montgomery's Modular Multiplication (MMM) algorithm [14]; this multiplier has proven to be very efficient for modular exponentiation, the basic operation of RSA cryptosystems [1].
The processor consists of special operational blocks for MMM, modular addition/subtraction (MAS), EC point doubling/addition, modular multiplicative inversion, EC point multiplier, projective to affine coordinates conversion and Montgomery to normal representation conversion. Hence it can be programmed by the host to execute any of these operations in any order. The proposed processor can be used not only for ECC, but also for any system in which modular arithmetic operations are essential, such as the RSA cryptosystem.

The basic operations are MMM and MAS. Each of the other blocks includes a finite state machine (FSM) which controls the execution of these operations in the right order. The critical path of the processor depends only on the critical paths of the circuits for MMM and MAS. The architecture of these blocks is designed to ensure a short critical path, allowing high clock frequencies that are independent of the bit-length of the ECC parameters. For simplicity, all blocks were designed separately with their own FSMs. This allows for independent optimization and testing of the building blocks.
The remainder of this paper is organized as follows. In Section 2 we discuss related work. Section 3 provides the mathematical background for Montgomery modular multiplication (MMM) and ECC over GF(p). Section 4 describes the hardware implementation; some details are omitted due to space limitations. Section 5 concludes the paper.
2 Previous Work
To the best of our knowledge, the first documented ECC processor over fields GF(p) was proposed by Orlando and Paar [15]. Their Elliptic Curve Processor (ECP) is scalable in terms of area and speed and especially suited for FPGAs. The authors estimate that it would take 3 ms to compute one 192-bit point multiplication. However, this timing was estimated by assuming 100% throughput from the multiplier; the expected latency was not considered. Their multiplier is also based on the MMM algorithm, but it is a generalized version with quotient pipelining introduced by Orup in [17]. We use the basic MMM algorithm, from which we only exclude the modular reduction as a result of the bound adjustment. In this way no pre-computation is required, which results in substantial memory savings. Their multiplier has a semi-systolic architecture, while the multiplier presented here is fully systolic. This results in an important flexibility which is unrelated to any specific parameter choice. Orlando and Paar also used an adaptation of a fixed-base exponentiation method introduced by Brickell et al. in [3]. This algorithm is assumed to be 4 times faster than the standard double-and-add algorithm which is used here. However, it involves a calculation with a known point, which is a limiting factor with respect to various applications of ECC.
Wolkerstorfer proposes in [22] a dual-field arithmetic unit that offers all instructions required for both types of finite fields, GF(p) and GF(2^m). He uses a redundant number representation and a special multiplication with interleaved modular reduction. Inversion is performed by the Extended Euclidean Algorithm. This is a low-power architecture that can be realized on moderate silicon area; the author claims that it requires only slightly more hardware resources than a pure GF(p) multiplier.
Goodman and Chandrakasan proposed a domain-specific reconfigurable cryptographic
processor (DSRCP) in [6]. The instruction set definition of the DSRCP was dictated by the
IEEE 1363 Public Key Cryptography Standard document. A list of the arithmetic functions
required to implement the various primitives defined in the standard was tabulated in a
functional matrix, which was then used to define the instruction set architecture (ISA) of
the processor. The ISA contains 24 instructions broken up into six types of operations:
conventional arithmetic, modular integer arithmetic, GF arithmetic, elliptic curve field
arithmetic over GF, register manipulation and processor configuration.

3 Mathematical background
3.1 Elliptic curves over GF (p)
An elliptic curve E is often expressed in terms of the Weierstrass equation $y^2 = x^3 + ax + b$, where $a, b \in GF(p)$ with $4a^3 + 27b^2 \neq 0 \pmod{p}$. The inverse of the point $P = (x_1, y_1)$ is $-P = (x_1, -y_1)$. The sum $P + Q$ of the points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ (assume that $P, Q \neq O$ and $P \neq \pm Q$) is the point $R = (x_3, y_3)$, where:
$$\lambda = \frac{y_2 - y_1}{x_2 - x_1}, \quad x_3 = \lambda^2 - x_1 - x_2, \quad y_3 = (x_1 - x_3)\lambda - y_1.$$
For $P = Q$, the "doubling" formulae are:
$$\lambda = \frac{3x_1^2 + a}{2y_1}, \quad x_3 = \lambda^2 - 2x_1, \quad y_3 = (x_1 - x_3)\lambda - y_1.$$
The point at infinity O plays a role analogous to that of the number 0 in ordinary addition. Thus, $P + O = P$ and $P + (-P) = O$ for all points P. The points on an elliptic curve together with the operation of "addition" form an abelian group. It is then straightforward to introduce point (or scalar) multiplication as the main operation for ECC. This operation can be calculated by using the double-and-add algorithm, as shown in Algorithm 1. For details see [13, 7, 2].
Algorithm 1 Elliptic Curve Point Multiplication
Require: EC point $P = (x, y)$, integer $k$, $0 < k < M$, $k = (k_{l-1}, k_{l-2}, \cdots, k_0)_2$, $k_{l-1} = 1$ and $M$
Ensure: $Q = (x', y')$
1: $Q \leftarrow P$
2: for $i$ from $l - 2$ downto 0 do
3:   $Q \leftarrow 2Q$
4:   if $k_i = 1$ then
5:     $Q \leftarrow Q + P$
6:   end if
7: end for
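As an illustration of Algorithm 1 together with the affine formulas above, the following is a minimal software sketch, not the hardware design described in this paper. The function names `ec_add` and `point_mul`, the use of `None` for the point at infinity, and the toy curve $y^2 = x^3 + 2x + 2$ over GF(17) with $P = (5, 1)$ are assumptions chosen only for the example.

```python
def ec_add(P, Q, a, p):
    """Affine point addition/doubling on y^2 = x^3 + a*x + b over GF(p).

    None stands for the point at infinity O (an assumption of this sketch).
    """
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = O
    if P == Q:
        # doubling: lambda = (3*x1^2 + a) / (2*y1)
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        # addition: lambda = (y2 - y1) / (x2 - x1)
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = ((x1 - x3) * lam - y1) % p
    return (x3, y3)


def point_mul(k, P, a, p):
    """Double-and-add, scanning k from bit l-2 down to bit 0 (Algorithm 1)."""
    Q = P  # the top bit k_{l-1} is 1
    for i in range(k.bit_length() - 2, -1, -1):
        Q = ec_add(Q, Q, a, p)      # Q <- 2Q
        if (k >> i) & 1:
            Q = ec_add(Q, P, a, p)  # Q <- Q + P
    return Q


if __name__ == "__main__":
    a, p, P = 2, 17, (5, 1)  # toy parameters, far too small for real use
    print(point_mul(2, P, a, p), point_mul(3, P, a, p))
```

Each call to `ec_add` costs one modular inversion (here `pow(..., -1, p)`, available from Python 3.8), which is the cost that the projective coordinates discussed next are meant to avoid.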
In the above definition of the EC group, affine coordinates are used, but so-called projective coordinates have some implementation advantages. Point addition can be done in projective coordinates using almost only field multiplications. Only one inversion is needed, at the end of a point multiplication operation. We have used the modified Jacobian ($J^m$) coordinates proposed by Cohen et al. in [5] because EC point doubling is fastest in this representation. They internally represent the Jacobian coordinates as a quadruple $(X, Y, Z, aZ^4)$. This representation is called the modified Jacobian coordinate system and is denoted by the authors as $J^m$. The algorithms for EC point addition and doubling are as follows [5].
Let $P = (X_1, Y_1, Z_1, aZ_1^4)$, $Q = (X_2, Y_2, Z_2, aZ_2^4)$ and $P + Q = R = (X_3, Y_3, Z_3, aZ_3^4)$.
The addition formulas in $J^m$ are the following ($P \neq \pm Q$).
$$U_1 = X_1 Z_2^2, \quad U_2 = X_2 Z_1^2, \quad S_1 = Y_1 Z_2^3, \quad S_2 = Y_2 Z_1^3, \quad H = U_2 - U_1, \quad r = S_2 - S_1$$
$$X_3 = -H^3 - 2U_1 H^2 + r^2, \quad Y_3 = -S_1 H^3 + r\,(U_1 H^2 - X_3), \quad Z_3 = Z_1 Z_2 H, \quad aZ_3^4 = aZ_3^4 \tag{1}$$
The doubling formulas in $J^m$ are the following ($R = 2P$).
$$S = 4X_1 Y_1^2, \quad U = 8Y_1^4, \quad M = 3X_1^2 + (aZ_1^4)$$
$$X_3 = -2S + M^2, \quad Y_3 = M(S - X_3) - U, \quad Z_3 = 2Y_1 Z_1, \quad aZ_3^4 = 2U\,(aZ_1^4) \tag{2}$$
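Formulas (1) and (2) can be checked numerically with a short software sketch. The function names `jm_add`, `jm_double` and `jm_to_affine`, and the toy curve $y^2 = x^3 + 2x + 2$ over GF(17), are assumptions made for this example only. The back-conversion $x = X/Z^2$, $y = Y/Z^3$ is the standard Jacobian convention.

```python
def jm_add(P, Q, a, p):
    """Addition in modified Jacobian coordinates, formulas (1); needs P != +-Q."""
    X1, Y1, Z1, _ = P
    X2, Y2, Z2, _ = Q
    U1 = X1 * Z2 * Z2 % p
    U2 = X2 * Z1 * Z1 % p
    S1 = Y1 * pow(Z2, 3, p) % p
    S2 = Y2 * pow(Z1, 3, p) % p
    H = (U2 - U1) % p
    r = (S2 - S1) % p
    H2 = H * H % p
    H3 = H2 * H % p
    X3 = (-H3 - 2 * U1 * H2 + r * r) % p
    Y3 = (-S1 * H3 + r * (U1 * H2 - X3)) % p
    Z3 = Z1 * Z2 * H % p
    return (X3, Y3, Z3, a * pow(Z3, 4, p) % p)  # fourth entry is a*Z3^4


def jm_double(P, p):
    """Doubling in modified Jacobian coordinates, formulas (2)."""
    X1, Y1, Z1, aZ4 = P
    S = 4 * X1 * Y1 * Y1 % p
    U = 8 * pow(Y1, 4, p) % p
    M = (3 * X1 * X1 + aZ4) % p
    X3 = (-2 * S + M * M) % p
    Y3 = (M * (S - X3) - U) % p
    Z3 = 2 * Y1 * Z1 % p
    return (X3, Y3, Z3, 2 * U * aZ4 % p)  # a*Z3^4 obtained without using a


def jm_to_affine(P, p):
    """x = X / Z^2, y = Y / Z^3: the single inversion of a point multiplication."""
    X, Y, Z, _ = P
    z_inv = pow(Z, -1, p)
    return (X * z_inv * z_inv % p, Y * pow(z_inv, 3, p) % p)


if __name__ == "__main__":
    a, p = 2, 17            # toy parameters, for illustration only
    P = (5, 1, 1, a)        # affine (5, 1) with Z = 1 and aZ^4 = a
    D = jm_double(P, p)
    print(jm_to_affine(D, p), jm_to_affine(jm_add(D, P, a, p), p))
```

Note that `jm_double` never uses the curve parameter `a` directly: it only reads the cached value $aZ_1^4$ and updates it with one multiplication.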

3.2 Montgomery Modular Multiplication
The Montgomery product is defined as $\mathrm{Mont}(x, y) = xyR^{-1} \bmod N$, where $N = (n_{l-1} \cdots n_1 n_0)_b$, $0 \leq x, y < N$, $R = b^l$ and $b = 2^\alpha$, with $\gcd(N, b) = 1$.
Montgomery's method for multiplying two integers x and y (called N-residues) modulo N avoids trial division by N, which is the most expensive operation in hardware. The Montgomery representation of $x \in \mathbb{Z}_N$ is $xR \bmod N$, and it allows very efficient modular arithmetic, especially for multiplication [14].
The original proposal of Montgomery included a conditional subtraction at the end of the algorithm. For efficiency as well as resistance against side-channel attacks [9, 10], Walter [21] gives the bound $4N < R$ on R, which makes this subtraction unnecessary. This bound guarantees that for inputs $X, Y < 2N$ the output T is also bounded by $T < 2N$.
We take $\alpha = 1$ for simplicity and let the iteration starting at Step 2 execute $l + 2$ times instead of $l$ times as in the original proposal. With these changes the desired bound is achieved, as $4N < R = 2^{l+2}$. Algorithm 2 gives Montgomery modular multiplication without final subtraction, with the properties described above.
Algorithm 2 Montgomery modular multiplication without final subtraction
Require: Integers $N = (n_{l-1} \cdots n_1 n_0)_2$, $x = (x_l \cdots x_1 x_0)_2$, $y = (y_l \cdots y_1 y_0)_2$ with $x \in [0, 2N - 1]$, $y \in [0, 2N - 1]$, $R = 2^{l+2}$, $\gcd(N, 2) = 1$ and $N' = -N^{-1} \bmod 2$ (Notation $T = (t_{l+1} t_l \ldots t_0)$)
Ensure: $T = xyR^{-1} \bmod 2N$
1: $T \leftarrow 0$
2: for $i$ from 0 to $l + 1$ do
3:   $m_i \leftarrow t_0 \oplus x_i y_0$
4:   $T \leftarrow (T + x_i y + m_i N)/2$
5: end for
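A bit-serial software sketch of Algorithm 2 is given below to make the bound argument concrete. The function name `mont_no_final_sub` and the test modulus are assumptions of this example; the hardware described in this paper uses a systolic array rather than a sequential loop.

```python
def mont_no_final_sub(x, y, n, l):
    """Montgomery product with R = 2^(l+2) and no final subtraction.

    Assumes n odd, n < 2^l and 0 <= x, y < 2n. The result T satisfies
    T = x*y*R^-1 (mod n) and 0 <= T < 2n, so it can be fed straight
    back in as an operand of the next multiplication.
    """
    t = 0
    y0 = y & 1
    for i in range(l + 2):                 # i = 0 .. l+1
        x_i = (x >> i) & 1
        m_i = (t & 1) ^ (x_i & y0)         # makes t + x_i*y + m_i*n even
        t = (t + x_i * y + m_i * n) >> 1   # exact division by 2
    return t


if __name__ == "__main__":
    n, l = 239, 8                          # toy odd modulus with n < 2^l
    r_inv = pow(2 ** (l + 2), -1, n)
    t = mont_no_final_sub(300, 450, n, l)
    print(t, t < 2 * n, t % n == 300 * 450 * r_inv % n)
```

The output stays below $2N$ because the accumulated value is at most $(xy + (R - 1)N)/R < 4N^2/R + N < 2N$ when $4N < R$.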
All operations are done modulo 2N throughout the EC point multiplication. The last step is to convert the Montgomery representation of the coordinates of the resulting point back to the normal representation. This is done by calculating the Montgomery modular multiplication of each coordinate and 1: $\mathrm{Mont}(xR, 1) = xRR^{-1} = x$. It can easily be proved that $\mathrm{Mont}(T, 1) \leq N$ if $0 \leq T < 2N$.
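The conversion back to the normal representation can be sketched as a round trip in software. The helper names `to_mont` and `from_mont` are hypothetical, and converting into the Montgomery representation by multiplying with a precomputed $R^2 \bmod N$ is a common technique assumed here, not a statement of how the NtoM block of this processor is built.

```python
def mont(x, y, n, l):
    """Montgomery product without final subtraction, R = 2^(l+2) (Algorithm 2)."""
    t = 0
    for i in range(l + 2):
        x_i = (x >> i) & 1
        m_i = (t & 1) ^ (x_i & y & 1)
        t = (t + x_i * y + m_i * n) >> 1
    return t


def to_mont(x, n, l):
    """x -> x*R (mod n), assuming R^2 mod n is available (assumed technique)."""
    r2 = pow(2, 2 * (l + 2), n)
    return mont(x, r2, n, l)


def from_mont(x_r, n, l):
    """x*R -> x, computed as Mont(xR, 1); the result is at most n."""
    return mont(x_r, 1, n, l)


if __name__ == "__main__":
    n, l = 239, 8                    # toy odd modulus with n < 2^l
    a_r, b_r = to_mont(100, n, l), to_mont(200, n, l)
    prod_r = mont(a_r, b_r, n, l)    # stays in Montgomery form, below 2n
    print(from_mont(prod_r, n, l), 100 * 200 % n)
```

Intermediate values may lie anywhere in $[0, 2N)$; only the final multiplication by 1 brings the result back into the range $[0, N]$.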
4 Hardware Implementation
Our Elliptic Curve processor (ECP) can be divided hierarchically into 5 levels, as shown in Fig. 1.
The operation blocks on each level from top to bottom are as follows:
Level 1: Main Controller (MC)
Level 2:
1. Affine to projective coordinates converter (AtoP): $(x, y) \rightarrow (X, Y, Z, aZ^4)$ such that $X = x$, $Y = y$, $Z = 1$ and $aZ^4 = a$
2. Normal to Montgomery representation converter (NtoM)
3. EC point multiplier (EPM)
4. Projective to affine coordinates converter (PtoA)
5. Montgomery to normal representation converter (MtoN)

References
A method for obtaining digital signatures and public-key cryptosystems (journal article). TL;DR: An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key.
Handbook of Applied Cryptography (book). TL;DR: A valuable reference for the novice as well as for the expert who needs a wider scope of coverage within the area of cryptography, this book provides easy and rapid access of information and includes more than 200 algorithms and protocols.
Differential Power Analysis (book chapter). TL;DR: In this paper, the authors examine specific methods for analyzing power consumption measurements to find secret keys from tamper resistant devices. They also discuss approaches for building cryptosystems that can operate securely in existing hardware that leaks information.
Elliptic curve cryptosystems (journal article). TL;DR: The question of primitive points on an elliptic curve modulo p is discussed, and a theorem on nonsmoothness of the order of the cyclic subgroup generated by a global point is given.
Use of Elliptic Curves in Cryptography (book chapter). TL;DR: In this paper, an analogue of the Diffie-Hellman key exchange protocol was proposed, which appears to be immune from attacks of the style of Western, Miller, and Adleman.
Frequently Asked Questions (16)
Q1. What is the architecture of the proposed processor?

The processor consists of special operational blocks for MMM, modular addition/subtraction (MAS), EC point doubling/addition, modular multiplicative inversion, EC point multiplier, projective to affine coordinates conversion and Montgomery to normal representation conversion. 

The MC instructs NtoM to start the conversion from normal to Montgomery representation, EPM to start the point multiplication, PtoA to start the conversion from projective to affine coordinates and MtoN to start the conversion from Montgomery to normal representation, one after another, by setting the START-NtoM, START-PM, START-PtoA and START-MtoN signals, respectively.
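The sequencing performed by the MC can be pictured as a small state machine. The sketch below models only the order in which the four START signals are raised. The state names, the `run_main_controller` function and the absence of any completion handshake are simplifying assumptions, not details taken from the design.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    NTOM = auto()   # normal -> Montgomery representation
    EPM = auto()    # EC point multiplication
    PTOA = auto()   # projective -> affine coordinates
    MTON = auto()   # Montgomery -> normal representation
    DONE = auto()


# START signal raised on entering each state (signal names as in the text).
START_SIGNAL = {
    State.NTOM: "START-NtoM",
    State.EPM: "START-PM",
    State.PTOA: "START-PtoA",
    State.MTON: "START-MtoN",
}

ORDER = [State.IDLE, State.NTOM, State.EPM, State.PTOA, State.MTON, State.DONE]


def run_main_controller():
    """Step through the Level 2 blocks and record the signals raised."""
    trace = []
    state = State.IDLE
    while state is not State.DONE:
        state = ORDER[ORDER.index(state) + 1]
        if state in START_SIGNAL:
            # A real controller would wait here for the block to finish
            # before moving on (hypothetical; not modeled in this sketch).
            trace.append(START_SIGNAL[state])
    return trace


if __name__ == "__main__":
    print(run_main_controller())
```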

Because there are both MMMC and modular addition/subtraction (MAS) circuits available, these operations can be executed in parallel. 

The j-th digit of $T_i$ is obtained using the recurrence relation
$$2^2 \times c1_{i,j} + 2 \times c0_{i,j} + t_{i,j} = t_{i-1,j+1} + x_i \times y_j + m_i \times n_j + 2 \times c1_{i,j-1} + c0_{i,j-1} \tag{4}$$
for $i = 0, \cdots, l + 1$ and $j = 0, \cdots, l + 1$, with $c1_{i,-1} = 0$ and $c0_{i,-1} = 0$. In Eq. (4), $2 \times c1_{i,j} + c0_{i,j}$, $j = -1, \cdots, l$, denotes the carry chain up the adder.
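The recurrence in Eq. (4) can be simulated cell by cell in software. In the sketch below, each loop iteration plays the role of one cell: it adds three one-bit terms and the two-bit incoming carry, and emits one sum bit plus the two carry bits `c1` and `c0`. Two assumptions of this example should be noted: bits are indexed by their natural weight and the division by 2 is done as a final shift, which is one way to read the index offset $t_{i-1,j+1}$, and the names `mmm_row` and `mmm_bitwise` are hypothetical.

```python
def mmm_row(t_prev, x_i, y, n, width):
    """One row i of the array: returns (T_{i-1} + x_i*y + m_i*N) / 2.

    All operands must fit in `width` bits; n must be odd.
    """
    m_i = (t_prev & 1) ^ (x_i & y & 1)
    c1 = c0 = 0                     # no carry into the least significant cell
    s = 0
    for j in range(width):          # cell j
        total = (((t_prev >> j) & 1)
                 + x_i * ((y >> j) & 1)
                 + m_i * ((n >> j) & 1)
                 + 2 * c1 + c0)     # at most 6, so it fits in 3 bits
        s |= (total & 1) << j       # sum bit of this cell
        c0 = (total >> 1) & 1       # carry bits passed up the chain
        c1 = (total >> 2) & 1
    s |= (2 * c1 + c0) << width     # carries leaving the last cell
    return s >> 1                   # bit 0 of s is zero by the choice of m_i


def mmm_bitwise(x, y, n, l):
    """Algorithm 2 built from l + 2 rows of cells."""
    t = 0
    for i in range(l + 2):
        t = mmm_row(t, (x >> i) & 1, y, n, l + 3)
    return t


if __name__ == "__main__":
    n, l = 239, 8                   # toy odd modulus with n < 2^l
    t = mmm_bitwise(300, 450, n, l)
    print(t, t % n == 300 * 450 * pow(2 ** (l + 2), -1, n) % n)
```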

Algorithm 4 Modular addition and subtraction
Require: $M$, $0 \leq A < M$, $0 \leq B < M$
Ensure: $C = A + B \bmod M$
1: $C' = A + B$
2: $C'' = C' - M$
3: if $C'' < 0$ then
4:   $C = C'$
5: else
6:   $C = C''$
7: end if

Require: $M$, $0 \leq A < M$, $0 \leq B < M$
Ensure: $C = A - B \bmod M$
1: $C' = A - B$
2: $C'' = C' + M$
3: if $C' < 0$ then
4:   $C = C''$
5: else
6:   $C = C'$
7: end if
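Algorithm 4 translates directly into software. The sketch below follows it step by step; the function names `mod_add` and `mod_sub` are chosen for this example. Both candidate results are computed unconditionally, as a circuit would, and a sign check selects one of them.

```python
def mod_add(a, b, m):
    """C = (A + B) mod M for 0 <= A, B < M."""
    c1 = a + b           # C'
    c2 = c1 - m          # C''
    return c1 if c2 < 0 else c2


def mod_sub(a, b, m):
    """C = (A - B) mod M for 0 <= A, B < M."""
    c1 = a - b           # C'
    c2 = c1 + m          # C''
    return c2 if c1 < 0 else c1


if __name__ == "__main__":
    print(mod_add(12, 9, 17), mod_sub(3, 9, 17))
```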

Modular addition and subtraction of two numbers that are in Montgomery representation will produce the Montgomery representation of the sum or difference as xR mod M ± yR mod M = (x±y)R mod M . 

This conversion requires two additional executions of the MMM operation, with the inputs $xR$ and 1, then $yR$ and 1, as $x = \mathrm{Mont}(xR, 1) = xRR^{-1}$ and $y = \mathrm{Mont}(yR, 1) = yRR^{-1}$.