What does the Main Controller (MC) do?

MC instructs NtoM to start the conversion from normal to Montgomery representation, EPM to start point multiplication, PtoA to start the conversion from projective to affine coordinates, and MtoN to start the conversion from Montgomery to normal representation, one after another, by setting the START-NtoM, START-PM, START-PtoA and START-MtoN signals, respectively.

Why can modular multiplications and additions/subtractions be executed in parallel?

Because both MMMC and modular addition/subtraction (MAS) circuits are available, these operations can be executed in parallel.

What is the j-th digit of T_i?

The j-th digit $t_{i,j}$ of $T_i$ is obtained using the recurrence relation

$$2^2 \times c1_{i,j} + 2 \times c0_{i,j} + t_{i,j} = t_{i-1,j+1} + x_i \times y_j + m_i \times n_j + 2 \times c1_{i,j-1} + c0_{i,j-1} \tag{4}$$

for $i = 0, \cdots, l+1$ and $j = 0, \cdots, l+1$, with $c1_{i,-1} = 0$ and $c0_{i,-1} = 0$. In Eq. (4), $2 \times c1_{i,j} + c0_{i,j}$, $j = -1, \cdots, l$, denotes the carry chain up the adder.

How are modular addition and subtraction performed?

Algorithm 4 Modular addition and subtraction

Require: M, 0 ≤ A < M, 0 ≤ B < M
Ensure: C = A + B mod M
1: C′ = A + B
2: C′′ = C′ − M
3: if C′′ < 0 then
4: C = C′
5: else
6: C = C′′
7: end if

Require: M, 0 ≤ A < M, 0 ≤ B < M
Ensure: C = A − B mod M
1: C′ = A − B
2: C′′ = C′ + M
3: if C′ < 0 then
4: C = C′′
5: else
6: C = C′
7: end if
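A minimal Python sketch of Algorithm 4 follows. The function names `mod_add` and `mod_sub` are hypothetical, and the code only models the selection logic in software; the sign test on C′ or C′′ stands in for whatever sign/borrow signal the MAS circuit actually uses, which is an assumption rather than a detail from the text.

```python
def mod_add(a: int, b: int, m: int) -> int:
    """Algorithm 4, addition: C = A + B mod M, assuming 0 <= A, B < M."""
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must lie in [0, M)")
    c1 = a + b                      # step 1: C'  = A + B
    c2 = c1 - m                     # step 2: C'' = C' - M
    return c1 if c2 < 0 else c2     # steps 3-7: keep C' when C'' is negative


def mod_sub(a: int, b: int, m: int) -> int:
    """Algorithm 4, subtraction: C = A - B mod M, assuming 0 <= A, B < M."""
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must lie in [0, M)")
    c1 = a - b                      # step 1: C'  = A - B
    c2 = c1 + m                     # step 2: C'' = C' + M
    return c2 if c1 < 0 else c1     # steps 3-7: use C'' only when C' is negative


print(mod_add(5, 9, 11), mod_sub(2, 7, 11))  # 3 6
```

Each operation needs at most one extra addition or subtraction of M, because both operands already lie in [0, M).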

Why can numbers in Montgomery representation be added and subtracted directly?

Modular addition and subtraction of two numbers that are in Montgomery representation produce the Montgomery representation of their sum or difference, since $(xR \bmod M) \pm (yR \bmod M) \equiv (x \pm y)R \pmod M$.
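The identity can be checked numerically with a short Python sketch. The toy values M = 23 and R = 32 are illustrative assumptions, not parameters from the paper, and `to_mont` and `sum_and_difference_stay_in_mont_form` are hypothetical helper names.

```python
def to_mont(x: int, r: int, m: int) -> int:
    """Montgomery representation of x: xR mod M."""
    return x * r % m


def sum_and_difference_stay_in_mont_form(x: int, y: int, r: int, m: int) -> bool:
    """Compare (xR mod M) +/- (yR mod M), reduced mod M, with (x +/- y)R mod M."""
    xr, yr = to_mont(x, r, m), to_mont(y, r, m)
    return ((xr + yr) % m == to_mont(x + y, r, m)
            and (xr - yr) % m == to_mont(x - y, r, m))


M, R = 23, 32   # toy values: odd M, R = 2^5 > M, gcd(R, M) = 1
print(all(sum_and_difference_stay_in_mont_form(x, y, R, M)
          for x in range(M) for y in range(M)))  # True
```

No multiplication by R or R⁻¹ appears on the left-hand side, so the addition/subtraction logic does not need to know whether its operands are in Montgomery form.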

How are the coordinates converted from Montgomery back to normal representation?

This conversion requires two additional executions of the MMM operation, with inputs xR and 1, then yR and 1, since $x = \mathrm{Mont}(xR, 1) = xRR^{-1}$ and $y = \mathrm{Mont}(yR, 1) = yRR^{-1}$.

(Open Access) Hardware implementation of an elliptic curve processor over GF(p) (2003) | S.B. Ors

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

The following full text is a preprint version which may differ from the publisher's version.

For additional information about this publication, see http://repository.ubn.ru.nl/handle/2066/127482

Please be advised that this information was generated on 2022-08-10 and may be subject to change.

Hardware Implementation of an Elliptic Curve Processor over GF(p)

Sıddıka Berna Ors (1), Lejla Batina (1, 2), Bart Preneel (1), Joos Vandewalle (1)

(1) Katholieke Universiteit Leuven, ESAT/SCD-COSIC, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
{sbors, Lejla.Batina, Bart.Preneel, Joos.Vandewalle}@esat.kuleuven.ac.be

(2) SafeNet BV, Boxtelseweg 26a, 5261 NE Vught, The Netherlands

Abstract

This paper describes a hardware implementation of an arithmetic processor which is efficient for bit-lengths suitable for both commonly used types of Public Key Cryptography (PKC), i.e., Elliptic Curve (EC) and RSA Cryptosystems. Montgomery modular multiplication in a systolic array architecture is used for modular multiplication. The processor consists of special operational blocks for Montgomery Modular Multiplication, modular addition/subtraction, EC Point doubling/addition, modular multiplicative inversion, EC point multiplier, projective to affine coordinates conversion and Montgomery to normal representation conversion.

Keywords: Elliptic Curve Cryptosystems, Modular Operations, FPGA

1 Introduction

Elliptic Curve Cryptography (ECC) was proposed independently by Miller [13] and Koblitz [7] in the 1980s. Since then, a considerable amount of research has been performed on secure and efficient ECC implementations. The benefits of ECC, when compared with classical cryptosystems such as RSA [19], include higher speed, lower power consumption and smaller certificates, which are especially useful for wireless applications.

The performance of an elliptic curve cryptosystem, and of other public key cryptosystems, is mostly determined by the efficient implementation of finite field arithmetic. In this work a hardware architecture of a processor for ECC over the finite field GF(p) is presented. The most critical operation for latency is modular multiplication. We use our systolic array multiplier based on Montgomery's Modular Multiplication (MMM) algorithm [14], proposed in [16]; this multiplier has proven to be very efficient for modular exponentiation, the basic operation of RSA cryptosystems [1].

The processor consists of special operational blocks for MMM, modular addition/subtraction (MAS), EC point doubling/addition, modular multiplicative inversion, EC point multiplier, projective to affine coordinates conversion and Montgomery to normal representation conversion. Hence it can be programmed by the host to execute any of these operations in any order. The proposed processor can be used not only for ECC, but also for any system in which modular arithmetic operations are essential, such as the RSA cryptosystem.

The basic operations are MMM and MAS. Each of the other blocks includes a finite state machine (FSM) that controls the execution of these operations in the right order. The critical path depends only on the critical paths of the MMM and MAS circuits. The architecture of these blocks is designed to ensure a short critical path, allowing high clock frequencies that are independent of the bit-length of the ECC parameters. For simplicity, all blocks were designed separately with their own FSMs. This allows for independent optimization and testing of the building blocks.

The remainder of this paper is organized as follows. In Section 2 we discuss related work. Section 3 provides the mathematical background for Montgomery Modular Multiplication (MMM) and ECC over GF(p). Section 4 describes the hardware implementation; some details are omitted due to space limitations. Section 5 concludes the paper.

2 Previous Work

To the best of our knowledge, the first documented ECC processor over fields GF(p) was proposed by Orlando and Paar [15]. Their Elliptic Curve Processor (ECP) is scalable in terms of area and speed and especially suited for FPGAs. The authors estimate that one 192-bit point multiplication would take 3 ms. However, this superb timing was estimated by assuming 100% throughput from the multiplier; the expected latency was not considered. Their multiplier is also based on the MMM algorithm, but it is a generalized version with quotient pipelining introduced by Orup in [17]. We use the basic MMM algorithm, from which we exclude only the final modular reduction as a result of the bound adjustment. In this way no pre-computation is required, which results in substantial memory savings. Their multiplier has a semi-systolic architecture, while the multiplier presented here is fully systolic. This gives important flexibility that is not tied to any specific parameter choice. Orlando and Paar also used an adaptation of a fixed base exponentiation method as introduced by Brickell et al. in [3]. This algorithm is assumed to be 4 times faster than the standard double-and-add algorithm used here. However, it involves calculations with a known point, which is a limiting factor for various applications of ECC.

Wolkerstorfer proposes a dual-field arithmetic unit that offers all instructions required for both types of finite fields, GF(p) and GF(2^m), in [22]. He uses a redundant number representation and a special multiplication with interleaved modular reduction. Inversion is performed by the Extended Euclidean Algorithm. This is a low-power architecture that can be realized on a moderate silicon area; the author claims that it requires only slightly more hardware resources than a pure GF(p) multiplier.

Goodman and Chandrakasan proposed a domain-specific reconfigurable cryptographic processor (DSRCP) in [6]. The instruction set definition of the DSRCP was dictated by the IEEE 1363 Public Key Cryptography Standard document. A list of the arithmetic functions required to implement the various primitives defined in the standard was tabulated in a functional matrix, which was then used to define the instruction set architecture (ISA) of the processor. The ISA contains 24 instructions broken up into six types of operations: conventional arithmetic, modular integer arithmetic, GF arithmetic, elliptic curve field arithmetic over GF, register manipulation and processor configuration.

3 Mathematical background

3.1 Elliptic curves over GF(p)

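An elliptic curve E is often expressed in terms of the Weierstrass equation $y^2 = x^3 + ax + b$, where $a, b \in GF(p)$ with $4a^3 + 27b^2 \neq 0 \pmod p$. The inverse of the point $P = (x_1, y_1)$ is $-P = (x_1, -y_1)$. The sum $P + Q$ of the points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ (assume that $P, Q \neq O$ and $P \neq \pm Q$) is the point $R = (x_3, y_3)$ where:

$$\lambda = \frac{y_2 - y_1}{x_2 - x_1}, \qquad x_3 = \lambda^2 - x_1 - x_2, \qquad y_3 = (x_1 - x_3)\lambda - y_1.$$

For $P = Q$, the "doubling" formulae are:

$$\lambda = \frac{3x_1^2 + a}{2y_1}, \qquad x_3 = \lambda^2 - 2x_1, \qquad y_3 = (x_1 - x_3)\lambda - y_1.$$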
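The affine formulas translate directly into a short Python sketch. The toy curve $y^2 = x^3 + 2x + 3$ over GF(97) and the point (3, 6) are illustrative assumptions, not values from the paper; `ec_add` is a hypothetical name, `None` stands for the point at infinity O, and modular inverses use Python's built-in `pow(v, -1, p)` (Python 3.8 or later). Note that every call performs one field inversion, which is the cost projective coordinates are meant to avoid.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None represents the point at infinity O


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Affine addition/doubling on y^2 = x^3 + ax + b over GF(p)."""
    if P is None:
        return Q                    # O + Q = Q
    if Q is None:
        return P                    # P + O = P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                 # P + (-P) = O (also covers doubling with y1 = 0)
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p    # doubling slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p         # chord slope
    x3 = (lam * lam - x1 - x2) % p  # equals lambda^2 - 2*x1 when doubling
    y3 = ((x1 - x3) * lam - y1) % p
    return (x3, y3)


a, b, p = 2, 3, 97
P = (3, 6)
print(ec_add(P, P, a, p))                    # (80, 10)
print(ec_add(ec_add(P, P, a, p), P, a, p))   # (80, 87)
```

On this toy curve 3P = (80, 87) = −2P, so P happens to have order 5, which makes the group law easy to spot-check by hand.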

The point at infinity O plays a role analogous to that of the number 0 in ordinary addition. Thus, P + O = P and P + (−P) = O for all points P. The points on an elliptic curve together with the operation of "addition" form an abelian group. It is then straightforward to introduce point (or scalar) multiplication as the main operation of ECC. This operation can be calculated with the double-and-add algorithm shown in Algorithm 1. For details see [13, 7, 2].

Algorithm 1 Elliptic Curve Point Multiplication

Require: EC point $P = (x, y)$, integer $k$, $0 < k < M$, $k = (k_{l-1}, k_{l-2}, \cdots, k_0)_2$, $k_{l-1} = 1$ and $M$
Ensure: $Q = kP$
1: $Q \leftarrow P$
2: for $i$ from $l - 2$ down to $0$ do
3: $Q \leftarrow 2Q$
4: if $k_i = 1$ then
5: $Q \leftarrow Q + P$
6: end if
7: end for
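Algorithm 1 can be sketched in Python as a generic left-to-right double-and-add routine. To keep the sketch self-contained, the group operation is passed in as a function (`point_mul` and `add_mod` are hypothetical names); the demo uses integers modulo 1009 under addition as a stand-in group, an illustrative assumption under which kP is simply k·P mod 1009, so the result is easy to check.

```python
from typing import Callable, TypeVar

G = TypeVar("G")


def point_mul(k: int, P: G, add: Callable[[G, G], G]) -> G:
    """Left-to-right double-and-add (Algorithm 1); needs k > 0 so that k_{l-1} = 1."""
    if k <= 0:
        raise ValueError("Algorithm 1 assumes 0 < k")
    bits = bin(k)[2:]           # k_{l-1} k_{l-2} ... k_0, most significant first
    Q = P                       # step 1: Q <- P (consumes k_{l-1} = 1)
    for bit in bits[1:]:        # step 2: i from l-2 down to 0
        Q = add(Q, Q)           # step 3: Q <- 2Q
        if bit == "1":          # step 4: if k_i = 1
            Q = add(Q, P)       # step 5: Q <- Q + P
    return Q


def add_mod(u: int, v: int) -> int:
    """Stand-in group law: addition modulo 1009."""
    return (u + v) % 1009


print(point_mul(13, 5, add_mod))   # 65
```

Each iteration costs one doubling plus an addition for every 1 bit, so for a random l-bit k roughly l doublings and l/2 additions are needed; this is why a fast doubling formula matters.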

In the above definition of the EC group, affine coordinates are used, but so-called projective coordinates have some implementation advantages. Point addition can be done in projective coordinates using almost only field multiplications, so only one inversion is needed, at the end of a point multiplication operation. We have used the modified Jacobian ($J^m$) coordinates proposed by Cohen et al. in [5] because EC point doubling is fastest in this representation. Internally, they represent the Jacobian coordinates as a quadruple $(X, Y, Z, aZ^4)$. This representation is called the modified Jacobian coordinate system and is denoted by the authors as $J^m$. The algorithms for EC point addition and doubling are as follows [5].

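Let $P = (X_1, Y_1, Z_1, aZ_1^4)$, $Q = (X_2, Y_2, Z_2, aZ_2^4)$ and $P + Q = R = (X_3, Y_3, Z_3, aZ_3^4)$.

The addition formulas in $J^m$ are the following ($P \neq \pm Q$).

$$
\begin{gathered}
U_1 = X_1 Z_2^2, \quad U_2 = X_2 Z_1^2, \quad S_1 = Y_1 Z_2^3, \quad S_2 = Y_2 Z_1^3, \quad H = U_2 - U_1, \quad r = S_2 - S_1, \\
X_3 = -H^3 - 2U_1 H^2 + r^2, \quad Y_3 = -S_1 H^3 + r\,(U_1 H^2 - X_3), \quad Z_3 = Z_1 Z_2 H, \quad aZ_3^4 = a \cdot Z_3^4
\end{gathered} \tag{1}
$$

The doubling formulas in $J^m$ are the following ($R = 2P$).

$$
\begin{gathered}
S = 4X_1 Y_1^2, \quad U = 8Y_1^4, \quad M = 3X_1^2 + aZ_1^4, \\
X_3 = -2S + M^2, \quad Y_3 = M(S - X_3) - U, \quad Z_3 = 2Y_1 Z_1, \quad aZ_3^4 = 2U\,(aZ_1^4)
\end{gathered} \tag{2}
$$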
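Eqs. (1) and (2) can be exercised with a Python sketch on the toy curve $y^2 = x^3 + 2x + 3$ over GF(97) with P = (3, 6); these values and the names `jm_add`, `jm_double` and `to_affine` are illustrative assumptions. Points are tuples (X, Y, Z, aZ⁴), and `to_affine` uses the standard Jacobian relation x = X/Z², y = Y/Z³ only to check the results; neither formula itself needs a field inversion.

```python
from typing import Tuple

JPoint = Tuple[int, int, int, int]   # (X, Y, Z, aZ^4) in modified Jacobian coordinates


def jm_double(P: JPoint, p: int) -> JPoint:
    """R = 2P in J^m, Eq. (2)."""
    X1, Y1, Z1, aZ1_4 = P
    S = 4 * X1 * Y1 * Y1 % p
    U = 8 * pow(Y1, 4, p) % p
    M = (3 * X1 * X1 + aZ1_4) % p
    X3 = (M * M - 2 * S) % p
    Y3 = (M * (S - X3) - U) % p
    Z3 = 2 * Y1 * Z1 % p
    return (X3, Y3, Z3, 2 * U * aZ1_4 % p)          # aZ3^4 = 2U(aZ1^4)


def jm_add(P: JPoint, Q: JPoint, a: int, p: int) -> JPoint:
    """R = P + Q in J^m, Eq. (1); assumes P != +-Q."""
    X1, Y1, Z1, _ = P
    X2, Y2, Z2, _ = Q
    U1, U2 = X1 * Z2 * Z2 % p, X2 * Z1 * Z1 % p
    S1, S2 = Y1 * pow(Z2, 3, p) % p, Y2 * pow(Z1, 3, p) % p
    H, r = (U2 - U1) % p, (S2 - S1) % p
    H2 = H * H % p
    H3 = H2 * H % p
    X3 = (-H3 - 2 * U1 * H2 + r * r) % p
    Y3 = (-S1 * H3 + r * (U1 * H2 - X3)) % p
    Z3 = Z1 * Z2 * H % p
    return (X3, Y3, Z3, a * pow(Z3, 4, p) % p)      # aZ3^4 recomputed from Z3


def to_affine(P: JPoint, p: int) -> Tuple[int, int]:
    """Check helper: x = X / Z^2, y = Y / Z^3 (one inversion)."""
    X, Y, Z, _ = P
    zi = pow(Z, -1, p)
    return (X * zi * zi % p, Y * zi * zi * zi % p)


p, a = 97, 2
P = (3, 6, 1, a)                       # affine (3, 6) with Z = 1, aZ^4 = a
D = jm_double(P, p)
print(to_affine(D, p))                 # (80, 10)
print(to_affine(jm_add(D, P, a, p), p))  # (80, 87)
```

The fourth coordinate is what makes doubling cheap: $aZ_1^4$ is reused directly in M instead of being recomputed from $Z_1$.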

3.2 Montgomery Modular Multiplication

The Montgomery product is defined as $\mathrm{Mont}(x, y) = xyR^{-1} \bmod N$, where $N = (n_{l-1} \cdots n_1 n_0)_b$, $0 \le x, y < N$, $R = b^l$, $b = 2^\alpha$ with $\gcd(N, b) = 1$.

Montgomery's method for multiplying two integers x and y (called N-residues) modulo N avoids trial division by N, which is the most expensive operation in hardware. The Montgomery representation of $x \in \mathbb{Z}_N$ is $xR \bmod N$, and it allows very efficient modular arithmetic, especially for multiplication [14].
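The definition of Mont(x, y) and its behaviour on Montgomery representations can be illustrated with a reference-level Python sketch. It computes R⁻¹ with `pow(R, -1, N)` instead of using a division-free algorithm, so it models only the arithmetic, not the hardware; `mont` is a hypothetical name and N = 23, b = 2, l = 5 (R = 32) are illustrative assumptions.

```python
def mont(x: int, y: int, n: int, r: int) -> int:
    """Mont(x, y) = x * y * R^{-1} mod N (reference definition)."""
    return x * y * pow(r, -1, n) % n


N, R = 23, 2 ** 5            # b = 2, l = 5, R = b^l > N, gcd(N, b) = 1
x, y = 7, 15
xR, yR = x * R % N, y * R % N                 # Montgomery representations
prod_R = mont(xR, yR, N, R)                   # xyR mod N: stays in Montgomery form
print(prod_R == x * y * R % N)                # True
print(mont(prod_R, 1, N, R) == x * y % N)     # True: Mont(., 1) returns to normal form
```

Because Mont(xR, yR) = xyR mod N, a whole chain of multiplications can stay in Montgomery form, with conversions needed only at the start and the end.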

Montgomery's original proposal included a conditional subtraction at the end of the algorithm. For efficiency as well as resistance against side-channel attacks [9, 10], Walter gives in [21] the bound $4N < R$, which avoids this subtraction. This bound guarantees that for inputs $X, Y < 2N$ the output is also bounded by $T < 2N$.

We take α = 1 for simplicity and let the iteration starting at Step 2 execute l + 2 times instead of l times as in the original proposal. With these changes the desired bound is achieved as $4N < R = 2^{l+2}$. Algorithm 2 is the resulting Montgomery modular multiplication without final subtraction, which has the properties given above.

Algorithm 2 Montgomery modular multiplication without final subtraction

Require: Integers $N = (n_{l-1} \cdots n_0)_2$, $x = (x_l \cdots x_0)_2$, $y = (y_l \cdots y_0)_2$ with $x \in [0, 2N-1]$, $y \in [0, 2N-1]$, $R = 2^{l+2}$, $\gcd(N, 2) = 1$ and $N' = -N^{-1} \bmod 2$ (Notation $T = (t_{l+1} \ldots t_0)_2$)
Ensure: $T = xyR^{-1} \bmod 2N$
1: $T \leftarrow 0$
2: for $i$ from $0$ to $l + 1$ do
3: $m_i \leftarrow t_0 \oplus x_i y_0$
4: $T \leftarrow (T + x_i y + m_i N)/2$
5: end for
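Algorithm 2 can be modelled bit-serially in Python with α = 1. The sketch takes l as the bit length of N (so N < 2^l and 4N < R = 2^{l+2}, an assumption consistent with the bound above) and uses the fact that N' = −N⁻¹ mod 2 equals 1 for odd N, so m_i reduces to t_0 ⊕ x_i y_0. It models the recurrence only, not the systolic array; `mont_no_final_sub` is a hypothetical name.

```python
def mont_no_final_sub(x: int, y: int, n: int) -> int:
    """Bit-serial Montgomery multiplication without final subtraction (Algorithm 2).

    Assumes N odd and 0 <= x, y < 2N; returns T with T = x*y*R^{-1} (mod N)
    and T < 2N, where R = 2^(l+2) and l is the bit length of N.
    """
    if n % 2 == 0 or not (0 <= x < 2 * n and 0 <= y < 2 * n):
        raise ValueError("need odd N and 0 <= x, y < 2N")
    l = n.bit_length()
    t = 0                                     # step 1: T <- 0
    for i in range(l + 2):                    # step 2: i = 0 .. l+1
        x_i = (x >> i) & 1
        m_i = (t & 1) ^ (x_i & y & 1)         # step 3: m_i = t_0 XOR x_i*y_0
        t = (t + x_i * y + m_i * n) >> 1      # step 4: sum is even, so the shift is exact
    return t


T = mont_no_final_sub(30, 41, 23)             # l = 5, R = 2^7
print(T < 2 * 23, T % 23 == 30 * 41 * pow(2 ** 7, -1, 23) % 23)  # True True
```

Choosing m_i this way makes T + x_i y + m_i N even in every iteration, so the division by 2 is a plain right shift and no trial division by N is ever needed.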

All operations throughout the EC point multiplication are done modulo 2N. The last step is to convert the Montgomery representation of the coordinates of the resulting point back to the normal representation. This is done by computing the Montgomery modular multiplication of each coordinate and 1: $\mathrm{Mont}(xR, 1) = xRR^{-1} = x$. It can easily be proved that $\mathrm{Mont}(T, 1) \le N$ if $0 \le T < 2N$.
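One way to see the bound is to unroll Algorithm 2; the following is a short derivation under that reading, not quoted from the paper. Writing $T^{(i)}$ for the running value after $i$ iterations (with $T^{(0)} = 0$) and $m = \sum_i m_i 2^i$ for the accumulated quotient bits, each iteration gives $2T^{(i+1)} = T^{(i)} + x_i y + m_i N$, which telescopes; applying it with inputs $T$ and $1$ then uses $0 \le T < 2N$ and $4N < R$:

$$
\begin{aligned}
R\,T^{(l+2)} &= \sum_{i=0}^{l+1} 2^i\,(x_i y + m_i N) = xy + mN, \qquad 0 \le m \le R - 1,\\
\mathrm{Mont}(T, 1) &= \frac{T \cdot 1 + mN}{R} < \frac{2N + (R-1)N}{R} = N + \frac{N}{R} < N + \tfrac{1}{4}.
\end{aligned}
$$

Since $\mathrm{Mont}(T, 1)$ is an integer, it follows that $\mathrm{Mont}(T, 1) \le N$.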

4 Hardware Implementation

Our Elliptic Curve Processor (ECP) can be divided hierarchically into 5 levels, as shown in Fig. 1.

The operation blocks on each level from top to bottom are as follows:

• Level 1: Main Controller (MC)

• Level 2:

1. Affine to projective coordinates converter (AtoP): $(x, y) \rightarrow (X, Y, Z, aZ^4)$ such that $X = x$, $Y = y$, $Z = 1$ and $aZ^4 = a$

2. Normal to Montgomery representation converter (NtoM)

3. EC point multiplier (EPM)

4. Projective to aﬃne coordinates converter (PtoA)

5. Montgomery to normal representation converter (MtoN)
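The conversions around the point multiplier can be sketched in Python on a single coordinate quadruple. This is a hypothetical software model: AtoP follows the list above, NtoM is assumed to be realized as Mont(v, R² mod N) = vR mod N (a common technique, not confirmed by this text), and MtoN uses Mont(vR, 1) = v as stated in Section 3.2. In the processor, EPM and PtoA would run between these steps; here the quadruple is simply converted and converted back. The toy values N = 97, a = 2 and the point (3, 6) are illustrative assumptions.

```python
def mont(x: int, y: int, n: int, r: int) -> int:
    """Reference Montgomery product x * y * R^{-1} mod N."""
    return x * y * pow(r, -1, n) % n


def atop(x: int, y: int, a: int) -> tuple:
    """AtoP: (x, y) -> (X, Y, Z, aZ^4) with X = x, Y = y, Z = 1, aZ^4 = a."""
    return (x, y, 1, a)


def ntom(P: tuple, n: int, r: int) -> tuple:
    """Hypothetical NtoM: each coordinate v -> Mont(v, R^2 mod N) = vR mod N."""
    r2 = r * r % n
    return tuple(mont(v, r2, n, r) for v in P)


def mton(P: tuple, n: int, r: int) -> tuple:
    """MtoN: each coordinate vR -> Mont(vR, 1) = v."""
    return tuple(mont(v, 1, n, r) for v in P)


N, R = 97, 2 ** 9            # l = 7 (bit length of 97), R = 2^(l+2)
P = atop(3, 6, 2)
PM = ntom(P, N, R)
print(PM == tuple(v * R % N for v in P), mton(PM, N, R) == P)  # True True
```

With R² mod N precomputed once, the forward conversion costs one Montgomery multiplication per coordinate, the same as the backward conversion.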

References

A method for obtaining digital signatures and public-key cryptosystems

Handbook of Applied Cryptography

Differential Power Analysis

Elliptic curve cryptosystems

Use of Elliptic Curves in Cryptography
