
A scalable and high performance elliptic curve processor with resistance to timing attacks

04 Apr 2005-Vol. 1, pp 538-543
TL;DR: A high performance and scalable elliptic curve processor that is designed to be resistant against timing attacks, with the underlying Galois field operators scheduled so that the point multiplication delay is minimized.
Abstract: This paper presents a high performance and scalable elliptic curve processor which is designed to be resistant against timing attacks. The point multiplication algorithm (double-add-subtract) is modified so that the processor performs the same operations for every 3 bits of the scalar k independent of the bit pattern of the 3 bits. Therefore, it is not possible to extract the key pattern using a timing attack. The data flow graph of the modified algorithm is derived and the underlying Galois field operators are scheduled so that the point multiplication delay is minimized. The architecture of this processor is based on the Galois field GF(2^n) and the bit-serial field multiplier and squarer are designed. The processor is configurable for any value of n and the delay of point multiplication is [18(n+3) + (n+3)/2 + 1]×(n/3) clock cycles. For the case of GF(2^163) the point multiplication delay is 165888 clock cycles.

Summary

1. Introduction

  • Elliptic curve cryptography (ECC) is a promising form of public key cryptography for next-generation embedded applications.
  • Reference [3] provides an algorithm based on Montgomery’s method which requires a point double and point addition to occur at each scalar multiplication step, regardless of the key bit.
  • This paper presents a scalable ECC hardware implementation that provides resistance to certain types of timing-based side-channel attacks.
  • Moreover, through the proposed algorithm schedules, the datapath operators run at every step of the algorithm regardless of the bit pattern of the private key.
  • Section five provides the performance results, and section six concludes the paper.

2. Elliptic Curve Cryptosystem

  • The IEEE public-key standard specification (IEEE P1363) [8] defines the Elliptic Curve Cryptography algorithm.
  • The elements of the Galois Field that satisfy the elliptic curve equation form a group with a specific addition operation.
  • Calculating 2.P is referred to as the double operation, and the inverse of the addition operation is called subtraction.
  • Figure 1 shows the point multiplication algorithm [8] that is based on the signed digit representation of integer k and is considered to be a faster point multiplication algorithm compared to the algorithm based on the regular binary representation [9].
  • The details of these operations are presented in the next section.

3.1. Resistance against timing attack

  • A timing attack is possible because of the data-dependent if-conditions shown in steps 3.2 and 3.3 of Figure 1 [2].
  • This security hole makes it possible to extract the bit pattern of the scalar k (key) using timing attack.
  • Figure 2 shows all the possibilities of double/addition/subtraction for all the different combinations of the three bits of k and h.
  • By assuming that the result of the previous calculation is S, then the new value is calculated using the value of S and the initial point P. Since P is a known point on the elliptic curve before the point multiplication algorithm starts, all the values mP are known values and can be pre-calculated and stored in the memory.
  • This means that, independent of the bit pattern of the scalars k (and h), 8S ± mP is performed for every three bits of the key.

3.2. Performance optimization

  • Considering three bits of the integer k at a time not only helps to hide the key pattern from the attacker, but also will provide delay optimization in the overall point multiplication algorithm.
  • The speed optimization is that there will be only one addition/subtraction for every three bits of integer k, while the original algorithm requires one to three repeated additions/subtractions depending on the bit pattern of k and h.
  • A similar method was also presented in [10] at the algorithm level (not implementation) for the point multiplication algorithm based on the binary representation.
  • In this paper the authors use the algorithm based on the signed digit representation and the hardware architecture of the proposed modified algorithm is presented.

3.3. Point Double/Add/Subtract schedules

  • Figure 4 shows the details of the double and add/subtract operations based on the projective coordinate representation of the points of the elliptic curve [8], [11].
  • In their case the authors have chosen the bit-serial implementation of the GF multiplier and squarer operations.
  • Figure 5 shows the optimized schedule of three double operations (8S) that is derived using two multipliers, one squarer and one adder.
  • Add/subtract is based on the equations of Figure 4b.

3.4. Underlying Galois Field operations

  • This section presents the architecture of underlying Galois Field operators.
  • There are a total of n registers that contain the output R; after one cycle to load the operands, n cycles compute the product and its modular reduction, so the result of multiplication is ready after (n+1) cycles.
  • Squaring is similar to the multiplication with the difference being the two operands are the same.
  • Therefore, half of the bits of the input can be loaded into the output registers and every new coefficient is shifted over two positions [12].
  • The squarer architecture shown is for odd n; the authors have chosen the odd case because, for secure elliptic curve cryptography based on GF(2^n), n must be a prime number.

4. Processor architecture

  • Based on the modified point multiplication algorithm and the optimized schedules of elliptic curve three doubles and add/subtract operations that are presented in section three, a scalable and high speed elliptic curve cryptographic processor is implemented.
  • This module includes a datapath that consists of the GF operators with their interconnections, the storage unit that keeps the intermediate variables (X, Y, Z, T1, T2, T3, T4), and the FSM that creates the control signals to perform the 8S ± mP operation schedule.
  • Figure 10 shows the block diagram of the whole processor.
  • The key scheduling unit calculates the value h = 3k and generates the decision signal for the ECC storage unit to choose the value mP for every three bits of k and h.

5. Performance Results

  • The proposed elliptic curve crypto processor is designed using VHDL and is simulated using Modelsim.
  • This processor is scalable and can be generated for any field GF(2^n), where n is the size of the datapath.
  • Every time frame of the schedules in Figures 5 and 6 takes (n+3) cycles.

6. Conclusion

  • A high performance and scalable elliptic curve processor that provides resistance against timing attacks is presented.
  • The dataflow schedules for the underlying operators of the modified point multiplication algorithm are optimized for maximum speed.


A Scalable and High Performance Elliptic Curve Processor with
Resistance to Timing Attacks
Alireza Hodjat¹, David D. Hwang¹, Ingrid Verbauwhede¹,²
¹University of California, Los Angeles
²Katholieke Universiteit Leuven
{ahodjat, dhwang, ingrid}@ee.ucla.edu
Abstract
This paper presents a high performance and
scalable elliptic curve processor which is designed to
be resistant against timing attacks. The point
multiplication algorithm (double-add-subtract) is
modified so that the processor performs the same
operations for every 3 bits of the scalar k independent
of the bit pattern of the 3 bits. Therefore, it is not
possible to extract the key pattern using a timing
attack. The data flow graph of the modified algorithm
is derived and the underlying Galois Field operators
are scheduled so that the point multiplication delay is
minimized. The architecture of this processor is based
on the Galois Field GF(2^n) and the bit-serial field
multiplier and squarer are designed. The processor is
configurable for any value of n and the delay of point
multiplication is [18(n+3) + (n+3)/2 + 1]×(n/3) clock
cycles. For the case of GF(2^163) the point
multiplication delay is 165888 clock cycles.
Keywords
Elliptic Curve Cryptography, side-channel attacks,
Galois fields, hardware architecture, security.
1. Introduction
Elliptic curve cryptography (ECC) is a promising
form of public key cryptography for next-generation
embedded applications. Because the elliptic curve
discrete logarithm problem has no known solution that
can be computed in sub-exponential time, an ECC
system can provide security equivalent to an RSA
system while using much smaller parameters (i.e. bit
size). Due to these smaller parameters, ECC systems
are particularly attractive for deployment in deeply
embedded systems with limited resources.
However, it is critical to note that security in an
embedded context requires a cryptosystem to be both
efficient in resources as well as keenly aware of side-channel
attacks. A cryptosystem which is not resource-efficient
is impractical for use in a resource-constrained
device. Likewise, a system that is not side-channel
aware may result in an easily compromised device.
In past years, there have been a number of papers on
ECC implementations designed for resistance to side-
channel attacks (SCAs). Side-channel attacks on ECC
systems are often focused on exploiting the difference
in power signatures between the point double and point
addition operations. In using a simple double-add
scalar multiplication algorithm, for instance, the private
key can readily be determined by analyzing in time the
power signatures of additions and doubles. Coron [2]
demonstrates that differential power analysis can be
applied to ECC systems, and shows initial techniques
to thwart DPA. Reference [3] provides an algorithm
based on Montgomery’s method which requires a point
double and point addition to occur at each scalar
multiplication step, regardless of the key bit. In [4], the
authors first analyze the work of [2] and [3] and then
demonstrate that a hybrid technique of both is actually
the most secure. Oswald and Aigner [5] have shown
that adding randomization to the scalar multiplication
algorithm to blind the input parameters of the
multiplication is a means to provide SCA resistance.
The work of [6] provides two techniques to thwart
attacks: the first causes point doublings and additions
to be indistinguishable; the second performs non-
deterministic point exponentiation. An SCA-resistant
method using encoding to ensure point doublings and
additions occur in a uniform pattern is shown in [7].
This paper presents a scalable ECC hardware
implementation that provides resistance to certain types
of timing-based side-channel attacks. A modified point
multiplication algorithm is presented which is used to
conceal the difference between the point
addition/subtraction and point double operations. The
implementation in this paper differs from prior art
because we focus on hiding the information of the
private key at the highest level of the point
Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’05)
0-7695-2315-3/05 $ 20.00 IEEE

multiplication algorithm by following a similar
dataflow graph (same operations in the same number of
steps) for every three bits of the secret-key. Moreover,
through our proposed algorithm schedules, the datapath
operators run at every step of the algorithm
regardless of the bit-pattern of the private key.
The rest of this paper is organized as follows. In
section two the elliptic curve cryptosystem is
introduced. Section three presents the modified point
multiplication algorithm that is used to gain resistance
against timing attacks. Moreover, the performance
optimizations and the underlying operator schedules to
implement the modified point multiplication algorithm
are presented. Section four presents the hardware
architecture of the ECC processor that is implemented
using the result of section three. Section five provides
the performance results, and section six concludes the paper.
2. Elliptic Curve Cryptosystem
IEEE public-key standard specification (IEEE
P1363) [8] defines the Elliptic Curve Cryptography
algorithm. The main operation in a typical elliptic
curve cryptosystem is called point multiplication,
which refers to calculating k.P, where k is an integer
and P is a point on the specific elliptic curve. The
theory of ECC is based on the mathematical mapping
of an elliptic curve on a Galois Field. The elements of
the Galois Field that satisfy the elliptic curve equation
form a group with a specific addition operation. With
this definition, k.P is equivalent to adding P to itself k
times by the group operation. Calculating 2.P is
referred to as the double operation, and the inverse of the
addition operation is called subtraction. All three
operations, doubling, addition, and subtraction, are used
in the point multiplication algorithm.
Figure 1 shows the point multiplication algorithm
[8] that is based on the signed digit representation of
integer k and is considered to be a faster point
multiplication algorithm compared to the algorithm
based on the regular binary representation [9]. This
algorithm uses the elliptic curve group operations
(double, addition, and subtraction) based on the
underlying Galois Field. The details of these operations
are presented in the next section. The projective
coordinates (X, Y, Z) are used for the representation of
the points on the elliptic curve in order to avoid the
inversion operation in the underlying Galois Field [8].
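A minimal Python sketch of the Figure 1 loop follows. It is not the paper's hardware: it runs the same signed-digit recoding (h = 3k) over a generic additive group, and the parameter names `dbl`, `add` and `sub` are hypothetical. Plain integers with P = 1 serve as a stand-in group, so the result must equal k, and the per-iteration trace for k = 13 matches the worked example S = 2{2[2(2P) − P]} + P.

```python
from typing import Callable, List, Tuple, TypeVar

G = TypeVar("G")  # any additive group element (an elliptic curve point in the paper)


def point_multiply(
    k: int,
    P: G,
    dbl: Callable[[G], G],
    add: Callable[[G, G], G],
    sub: Callable[[G, G], G],
) -> Tuple[G, List[str]]:
    """Signed-digit point multiplication following Figure 1 (h = 3k).

    Also returns a per-iteration trace: "D" = doubling only,
    "DA" = doubling then addition, "DS" = doubling then subtraction.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    h = 3 * k
    l = h.bit_length() - 1            # h_l is the top bit of h and is always 1
    S = P                             # step 1: S = P accounts for h_l
    trace: List[str] = []
    for i in range(l - 1, 0, -1):     # step 3: i from l-1 down to 1
        h_i, k_i = (h >> i) & 1, (k >> i) & 1
        S = dbl(S)                    # step 3.1
        if h_i == 1 and k_i == 0:     # step 3.2
            S = add(S, P)
            trace.append("DA")
        elif h_i == 0 and k_i == 1:   # step 3.3
            S = sub(S, P)
            trace.append("DS")
        else:
            trace.append("D")
    return S, trace


# Integers under addition stand in for curve points, so k*P == k when P == 1.
result, trace = point_multiply(
    13, 1, dbl=lambda s: 2 * s, add=lambda s, p: s + p, sub=lambda s, p: s - p
)
print(result, trace)  # 13 ['D', 'DS', 'D', 'DA']
```

Passing the group operations as callables keeps the recoding logic separate from the field arithmetic, which is also how the paper separates the point-level algorithm from the GF(2^n) operators.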
3. Modified point multiplication algorithm
3.1. Resistance against timing attack
As shown in Figure 1, depending on the bit pattern of
integer k, a different combination of group operations is used.
A timing attack is possible because of the data-dependent
if-conditions shown in steps 3.2 and 3.3 [2]. This security
hole makes it possible to extract the bit pattern of the
scalar k (key) using a timing attack.
Clearly the time that it takes to perform a single
doubling or a doubling followed by an
addition/subtraction is different and therefore, at each
step it is easy to see if the current bit of k is 0 or 1.
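The leak can be illustrated with a toy timing model. In the sketch below, the cycle costs `T_DBL` and `T_ADD` are hypothetical numbers, not measurements from the paper; the point is only that an attacker who can time each iteration of Figure 1 learns where an addition or subtraction happened, that is, where the signed digit h_i − k_i is nonzero.

```python
T_DBL = 10  # hypothetical cycles for one point doubling
T_ADD = 12  # hypothetical cycles for one point addition or subtraction


def iteration_times(k: int) -> list[int]:
    """Per-iteration cycle counts of Figure 1 under the toy timing model."""
    h = 3 * k
    times = []
    for i in range(h.bit_length() - 2, 0, -1):   # i from l-1 down to 1
        h_i, k_i = (h >> i) & 1, (k >> i) & 1
        extra = T_ADD if h_i != k_i else 0        # step 3.2 or 3.3 executed
        times.append(T_DBL + extra)
    return times


def leaked_pattern(times: list[int]) -> str:
    """What an attacker reads off: '1' where an add/subtract took place."""
    return "".join("1" if t > T_DBL else "0" for t in times)


print(iteration_times(13))                  # [10, 22, 10, 22]
print(leaked_pattern(iteration_times(13)))  # 0101
```

For k = 15 the same model gives [10, 10, 10, 22], so two keys of the same length are distinguishable from timing alone.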
In order to hide the key pattern information, we
propose to consider more than one bit of k (and h) at a
time. Figure 2 shows all the possibilities of
double/addition/subtraction for all the different
combinations of the three bits of k and h. By assuming
that the result of the previous calculation is S, then the
new value is calculated using the value of S and the
initial point P.
The interesting outcome is that all 27 different
combinations can be calculated by doing three doubles
in a row (8S) and then applying one addition or
subtraction with a value mP (where m = 0, 1, 2, ..., 7).
Since P is a known point on the elliptic curve before
the point multiplication algorithm starts, all the values
mP are known values and can be pre-calculated and
stored in the memory. This means that, independent of
the bit pattern of scalars k (and h), 8S ± mP is
performed for every three bits of the key. Notice that
for the case of 8S, a dummy addition (8S + 0P) can be
performed in order to keep the datapath busy, and in the
end the result is exchanged with the value of 8S, which
is already calculated. The modified algorithm is shown
in Figure 3.
Figure 1: Point multiplication algorithm [8]
Input: An integer k and an elliptic curve point P = (X, Y, Z).
Output: The elliptic curve point S = k.P = (X*, Y*, Z*).
1. Set S = P.
2. Let k_l k_{l−1} ... k_1 k_0 and h_l h_{l−1} ... h_1 h_0 be the binary
   representations of k and h = 3k, respectively.
3. For i from l−1 downto 1 do
   3.1 Set S = 2S:
       (X*, Y*, Z*) = Double[(X*, Y*, Z*)].
   3.2 If h_i = 1 and k_i = 0 then set S = S + P:
       (X*, Y*, Z*) = Add[(X*, Y*, Z*), (X, Y, Z)].
   3.3 If h_i = 0 and k_i = 1 then set S = S − P:
       (X*, Y*, Z*) = Subtract[(X*, Y*, Z*), (X, Y, Z)].
Example: k = 13
k = 001101, h = 100111 -> S = 2{2[2(2P) − P]} + P
Figure 2: Point multiplication per 3 bits of scalar k
1. 2(2(2S)) = 8S + 0P
2. 2(2(2S + P)) = 8S + 4P
3. 2(2(2S − P)) = 8S − 4P
4. 2(2(2S) + P) = 8S + 2P
5. 2(2(2S + P) + P) = 8S + 6P
6. 2(2(2S − P) + P) = 8S − 2P
7. 2(2(2S) − P) = 8S − 2P
8. 2(2(2S + P) − P) = 8S + 2P
9. 2(2(2S − P) − P) = 8S − 6P
10. 2(2(2S)) + P = 8S + P
11. 2(2(2S + P)) + P = 8S + 5P
12. 2(2(2S − P)) + P = 8S − 3P
13. 2(2(2S) + P) + P = 8S + 3P
14. 2(2(2S + P) + P) + P = 8S + 7P
15. 2(2(2S − P) + P) + P = 8S − P
16. 2(2(2S) − P) + P = 8S − P
17. 2(2(2S + P) − P) + P = 8S + 3P
18. 2(2(2S − P) − P) + P = 8S − 5P
19. 2(2(2S)) − P = 8S − P
20. 2(2(2S + P)) − P = 8S + 3P
21. 2(2(2S − P)) − P = 8S − 5P
22. 2(2(2S) + P) − P = 8S + P
23. 2(2(2S + P) + P) − P = 8S + 5P
24. 2(2(2S − P) + P) − P = 8S − 3P
25. 2(2(2S) − P) − P = 8S − 3P
26. 2(2(2S + P) − P) − P = 8S + P
27. 2(2(2S − P) − P) − P = 8S − 7P
Figure 3: Modified point multiplication algorithm
3.2. Performance optimization
Considering three bits of the integer k at a time not
only helps to hide the key pattern from the attacker, but
also provides a delay optimization in the overall point
multiplication algorithm. The speed optimization is that
there will be only one addition/subtraction for every
three bits of integer k while the original algorithm
requires one to three repeated additions/subtractions
depending on the bit pattern of k and h. For example
Figure 2 shows that in line 14, the original algorithm
requires three doublings and three additions while the
modified approach performs three doublings and only
one addition. Therefore, the total point multiplication
delay is minimized with the trade-off of having more
memory to store the values P, 2P, ..., 7P.
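The three-bit recoding behind Figure 3 can be sketched in Python as follows, again using integers as a stand-in group (P = 1) so that the result must equal k. The helper names are hypothetical, and two details are assumptions because the text does not specify them: the signed digits h_i − k_i are left-padded with zeros to a multiple of three, and the loop starts from S = 0 so that the first window simply loads mP (a projective datapath would need to special-case that first step rather than double the point at infinity).

```python
def signed_digits(k: int) -> list[int]:
    """Signed digits of k, most significant first, as used by Figure 1.

    The leading 1 stands for step 1 (S = P); the rest are h_i - k_i
    for i = l-1 down to 1, with h = 3k.
    """
    h = 3 * k
    l = h.bit_length() - 1
    return [1] + [((h >> i) & 1) - ((k >> i) & 1) for i in range(l - 1, 0, -1)]


def windows(k: int) -> list[int]:
    """Split the digits into 3-digit windows; each window yields one m."""
    d = signed_digits(k)
    d = [0] * (-len(d) % 3) + d       # left-pad with zeros to a multiple of 3
    return [4 * d[j] + 2 * d[j + 1] + d[j + 2] for j in range(0, len(d), 3)]


def modified_multiply(k: int, P: int = 1) -> tuple[int, str]:
    """Figure 3 style loop over an integer stand-in group.

    Returns the result and an operation trace: every window costs
    "DDD" (three doublings) plus one "A" (add or subtract m*P).
    """
    table = {m: m * P for m in range(-7, 8)}  # stand-in for stored 0P..7P with sign
    S, trace = 0, []
    for m in windows(k):
        S = 8 * S + table[m]
        trace.append("DDDA")
    return S, "".join(trace)


print(windows(13), modified_multiply(13))  # [2, -3] (13, 'DDDADDDA')
```

For k = 13 the windows give m = 2 and m = −3, so S = 8·2 − 3 = 13, and every window produces the same DDDA pattern, unlike the D/DS/D/DA trace of the original loop.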
A similar method was also presented in [10] at the
algorithm level (not implementation) for the point
multiplication algorithm based on the binary
representation. However, in this paper we use the
algorithm based on the signed digit representation and
the hardware architecture of the proposed modified
algorithm is presented. Moreover, we will explore
further delay optimizations when performing 8S ± mP
as will be presented in the next section.
3.3. Point Double/Add/Subtract schedules
Figure 4 shows the details of the double and
add/subtract operations based on the projective
coordinate representation of the points of the elliptic
curve [8], [11]. These operations are defined for the
curve $y^2 + xy = x^3 + ax^2 + b$ over GF(2^n). The ECC
operations are performed using multiplication,
squaring, and addition in the underlying Galois Field
GF(2^n). To implement the high speed ECC operators
based on the algorithms of Figure 4, the dataflow
graphs of these operations and data dependencies
between different variables and underlying Galois Field
operators must be derived.
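To check the Figure 4 formulas in software, the sketch below transcribes them into Python over a deliberately small field and compares them with textbook affine formulas on brute-forced curve points. Everything concrete here is an assumption for illustration: the field GF(2^7) with modulus x^7 + x + 1, the curve coefficients a = b = 1, and all function names. The formulas imply the mapping (x, y) = (X/Z^2, Y/Z^3), which the sketch uses for conversion. As in IEEE P1363, the doubling constant c satisfies c^4 = b and is obtained by squaring b (n − 2) times; negation uses (X, XZ + Y, Z), which is how the text turns subtraction into addition.

```python
N = 7
MOD = (1 << 7) | (1 << 1) | 1  # x^7 + x + 1, an assumed irreducible modulus
A_COEF, B_COEF = 1, 1          # assumed curve y^2 + xy = x^3 + a*x^2 + b


def gf_mul(x: int, y: int) -> int:
    """Multiply in GF(2^N); bit i of an int is the coefficient of x^i."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x >> N:
            x ^= MOD
    return r


def sq(x: int) -> int:
    return gf_mul(x, x)


def gf_inv(x: int) -> int:
    """x^(2^N - 2) = x^-1 for nonzero x (square-and-multiply)."""
    r, e = 1, (1 << N) - 2
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x, e = sq(x), e >> 1
    return r


C = B_COEF
for _ in range(N - 2):         # c = b^(2^(N-2)), so that c^4 = b
    C = sq(C)


def proj_double(X1, Y1, Z1):
    """Figure 4(a)."""
    Z2 = gf_mul(X1, sq(Z1))
    X2 = sq(sq(X1 ^ gf_mul(C, sq(Z1))))
    U = Z2 ^ sq(X1) ^ gf_mul(Y1, Z1)
    return X2, gf_mul(sq(sq(X1)), Z2) ^ gf_mul(U, X2), Z2


def proj_add(X0, Y0, Z0, X1, Y1, Z1):
    """Figure 4(b); inputs must have different affine x-coordinates."""
    U0, S0 = gf_mul(X0, sq(Z1)), gf_mul(Y0, gf_mul(Z1, sq(Z1)))
    U1, S1 = gf_mul(X1, sq(Z0)), gf_mul(Y1, gf_mul(Z0, sq(Z0)))
    W, R = U0 ^ U1, S0 ^ S1
    L = gf_mul(Z0, W)
    V = gf_mul(R, X1) ^ gf_mul(L, Y1)
    Z2 = gf_mul(L, Z1)
    T = R ^ Z2
    X2 = gf_mul(A_COEF, sq(Z2)) ^ gf_mul(T, R) ^ gf_mul(W, sq(W))
    return X2, gf_mul(T, X2) ^ gf_mul(V, sq(L)), Z2


def proj_neg(X, Y, Z):
    """-(X, Y, Z) = (X, XZ + Y, Z): subtraction becomes an addition."""
    return X, gf_mul(X, Z) ^ Y, Z


def from_affine(x, y, z=1):
    return gf_mul(x, sq(z)), gf_mul(y, gf_mul(z, sq(z))), z


def to_affine(X, Y, Z):
    zi = gf_inv(Z)
    return gf_mul(X, sq(zi)), gf_mul(Y, gf_mul(zi, sq(zi)))


def affine_double(x, y):         # textbook reference, requires x != 0
    lam = x ^ gf_mul(y, gf_inv(x))
    x3 = sq(lam) ^ lam ^ A_COEF
    return x3, sq(x) ^ gf_mul(lam ^ 1, x3)


def affine_add(x0, y0, x1, y1):  # textbook reference, requires x0 != x1
    lam = gf_mul(y0 ^ y1, gf_inv(x0 ^ x1))
    x2 = sq(lam) ^ lam ^ x0 ^ x1 ^ A_COEF
    return x2, gf_mul(lam, x1 ^ x2) ^ x2 ^ y1


# Usage: brute-force some curve points and compare both representations.
POINTS = [(x, y) for x in range(1, 1 << N) for y in range(1 << N)
          if sq(y) ^ gf_mul(x, y) == gf_mul(x, sq(x)) ^ gf_mul(A_COEF, sq(x)) ^ B_COEF]
(x0, y0) = POINTS[0]
(x1, y1) = next(p for p in POINTS if p[0] != x0)
P0, P1 = from_affine(x0, y0, 3), from_affine(x1, y1, 5)   # nontrivial Z values
print(to_affine(*proj_double(*P0)) == affine_double(x0, y0))                         # True
print(to_affine(*proj_add(*P0, *P1)) == affine_add(x0, y0, x1, y1))                  # True
print(to_affine(*proj_add(*P0, *proj_neg(*P1))) == affine_add(x0, y0, x1, x1 ^ y1))  # True
```

Neither projective routine calls `gf_inv`, which is the reason the text gives for using projective coordinates; the inversion only appears in the affine reference and the final conversion.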
In order to minimize the overall delay of the 8S ± mP
operation, the following two characteristics can be
used.
First of all, we can use multiple functional units (GF
multiplier, squarer or adder) in parallel. For this
purpose, we should implement high speed and low
cost (in terms of area) Galois field operators. In our
case, we have chosen the bit-serial implementation of
the GF multiplier and squarer operations, because
the bit-serial multiplier and squarer consume
much less area than a bit-parallel implementation and,
at the same time, can be clocked at much higher clock
frequencies due to their short combinational
critical path.
Secondly, since we are calculating 8S in the first
phase we can combine the operation schedules of
three doubles in a row together and optimize the total
delay. Moreover, due to the fact that we can use
more than one GF multiplier and/or squarer, by
maintaining the data dependencies between different
operators in the schedule, we can move the GF
operator up or down in the schedule and therefore
find the optimal schedule and operator assignments
that can perform the 8S ± mP operation.
Figure 5 shows the optimized schedule of three
double operations (8S) that is derived using two
multipliers, one squarer and one adder. In this dataflow
graph, the box with sign × is the GF multiplier, the box
with sign 2 is the GF squarer, and the box with sign +
is the GF adder.
Figure 4: Double and Add/Subtract algorithms
(a) Double operation: $2(X_1, Y_1, Z_1) = (X_2, Y_2, Z_2)$, where
$Z_2 = X_1 Z_1^2$,
$X_2 = (X_1 + c Z_1^2)^4$,
$U = Z_2 + X_1^2 + Y_1 Z_1$,
$Y_2 = X_1^4 Z_2 + U X_2$.
(b) Add/Subtract operation: $(X_0, Y_0, Z_0) + (X_1, Y_1, Z_1) = (X_2, Y_2, Z_2)$, where
$U_0 = X_0 Z_1^2$, $S_0 = Y_0 Z_1^3$,
$U_1 = X_1 Z_0^2$, $W = U_0 + U_1$,
$S_1 = Y_1 Z_0^3$, $R = S_0 + S_1$,
$L = Z_0 W$, $V = R X_1 + L Y_1$,
$Z_2 = L Z_1$, $T = R + Z_2$,
$X_2 = a Z_2^2 + T R + W^3$,
$Y_2 = T X_2 + V L^2$.
Input: An integer k and an elliptic curve point P = (X, Y, Z).
Output: The elliptic curve point S = k.P = (X*, Y*, Z*).
1. Set S = P.
2. Let k_l k_{l−1} ... k_1 k_0 and h_l h_{l−1} ... h_1 h_0 be the binary
   representations of k and h = 3k, respectively.
3. For i from (l−1)/3 downto 1, do the following for every
   three bits of k and h:
   3.1 S = (8S ± mP),
       where mP is either P, 2P, ..., or 7P depending on the
       bit pattern of k and h.
       (P, 2P, ..., and 7P are pre-calculated and stored.)

Figure 5: Optimized schedule for 3-Double operations
In Figure 5, the input point (S) is (X0, Y0, Z0) and the
output (8S) is (X4, Y4, Z4), and the three double
operations based on the data flow graph of Figure 4 are
scheduled in sequence. As will be shown in the next
section, the GF(2^n) bit-serial multiplier and squarer
require (n+1) and (n+1)/2 clock cycles to generate
their outputs, respectively. This means that we can
perform two GF squarings in the time frame of one GF
multiplication. Therefore, in every numbered time frame
of Figure 5, two GF multiplications and two
GF squarings are performed using the two multipliers
and the single squarer. The GF adder takes only one
cycle and is performed at the end of each time frame.
Counting one more cycle to store the result
in the memory and load the new operands, a
total of (n+3) cycles is required to perform each step.
Figure 6: Optimized schedule for add/subtract operations
Figure 6 shows the optimized schedule of the
add/subtract operation (8S ± mP). The inputs are the output
of Figure 5, 8S = (X4, Y4, Z4), and mP = (X1, Y1, Z1), which is
loaded from memory; the final result is (X5, Y5, Z5).
A control signal differentiates between add and
subtract operations by loading the appropriate
operands. This is because the subtraction (X4, Y4, Z4) −
(X1, Y1, Z1) is calculated as the addition (X4, Y4, Z4) +
(X1, X1Z1 + Y1, Z1) [8].
Using the schedule of Figure 5, the calculation of 8S
takes [9(n+3) + (n+3)/2] cycles, which is 6(n+3)
cycles less than the original algorithm of Figure 4a.
Moreover, the schedule of Figure 6 takes 9(n+3) cycles,
which is [7(n+3) + (n+3)/2] cycles less than the
original implementation of 8S ± mP using the algorithm
of Figure 4b.
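The cycle counts above combine into the delay formula quoted in the abstract. The sketch below simply encodes that arithmetic; the function names are hypothetical, the per-window "+1" is taken from the formula as given, and n/3 is rounded down, which is the reading under which n = 163 reproduces the reported 165888 cycles (3072 cycles per window times 54 windows).

```python
def frame_cycles(n: int) -> int:
    """One numbered time frame: (n+1) multiply + 1 add + 1 store/load cycle."""
    return n + 3


def three_doubles_cycles(n: int) -> int:
    """8S with the Figure 5 schedule: 9(n+3) + (n+3)/2 cycles."""
    return 9 * frame_cycles(n) + frame_cycles(n) // 2


def add_sub_cycles(n: int) -> int:
    """The ± mP step with the Figure 6 schedule: 9(n+3) cycles."""
    return 9 * frame_cycles(n)


def point_mult_cycles(n: int) -> int:
    """[18(n+3) + (n+3)/2 + 1] × (n/3), with n/3 rounded down."""
    if n % 2 == 0:
        raise ValueError("the squarer of Figure 8 assumes odd n")
    per_window = three_doubles_cycles(n) + add_sub_cycles(n) + 1
    return per_window * (n // 3)


print(three_doubles_cycles(163), add_sub_cycles(163))  # 1577 1494
print(point_mult_cycles(163))                          # 165888
```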
[Figure 5 dataflow graph, time frames 1 to 9: S = (X0, Y0, Z0); 2S = (X2, Y2, Z2) = 2(X0, Y0, Z0); 4S = (X3, Y3, Z3) = 2(X2, Y2, Z2); 8S = (X4, Y4, Z4) = 2(X3, Y3, Z3). Each doubling is based on the dataflow graph of Figure 4a.]
[Figure 6 dataflow graph, time frames 10 to 18: inputs 8S = (X4, Y4, Z4) and mP = (X1, Y1, Z1); output 8S ± mP = (X5, Y5, Z5). Add/subtract is based on the equations of Figure 4b.]

3.4. Underlying Galois Field operations
This section presents the architecture of the underlying
Galois Field operators. Configurable and scalable
architectures of general GF(2^n) operators are used. The
multiplier and squarer operate modulo an
irreducible polynomial (P), which is a programmable
parameter, and both use bit-serial implementations.
As an example, the architecture of a
4-bit multiplier is shown in Figure 7. In general, there
are a total of n registers that contain the output R. It takes
one cycle to load the operands A and B, followed by n
cycles of calculating the product and modulo reduction
[12]. Therefore, the result of multiplication is ready
after (n+1) cycles. Figure 8 shows the architecture of
the bit-serial squarer (a 7-bit field is shown as an
example). Squaring is similar to multiplication, with
the difference that the two operands are the same.
Because the cross terms cancel in GF(2^n), A^2 = Σ a_i x^(2i),
so half of the bits of the input can be loaded directly
into the output registers, and every new coefficient is
shifted over two positions [12]. Therefore, an n-bit
squarer requires a total of (n+1)/2 cycles (half of the
multiplication). Notice that the
architecture shown in Figure 8 is for the case when n is
odd. When n is even, there is a slight difference in the
connection of the irreducible polynomial P and the
result registers. We have chosen the odd case because,
for secure elliptic curve cryptography based on GF(2^n),
n must be a prime number.
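The (n+1) and (n+1)/2 cycle counts can be made concrete with a behavioral Python model. This is a sketch under stated assumptions, not the circuits of Figures 7 and 8: the multiplier is modeled as a most-significant-bit-first shift-and-add loop (the text does not say in which order Figure 7 consumes the bits of A), the squarer implements one reading of "half of the bits are loaded into the output registers and every new coefficient is shifted over two positions", and the modulus p is an integer whose bit i is the coefficient of x^i, including the x^n term. GF addition is a plain XOR of the bit-vectors.

```python
def bitserial_mul(a: int, b: int, p: int, n: int) -> tuple[int, int]:
    """MSB-first bit-serial GF(2^n) multiply; returns (a*b mod p, cycles)."""
    r, cycles = 0, 1                    # one cycle to load A and B
    for i in range(n - 1, -1, -1):      # n cycles, one bit of A per cycle
        r <<= 1                         # r = r*x ...
        if r >> n:
            r ^= p                      # ... reduced modulo p
        if (a >> i) & 1:
            r ^= b                      # add a_i * B (GF addition is XOR)
        cycles += 1
    return r, cycles


def bitserial_sqr(a: int, p: int, n: int) -> tuple[int, int]:
    """Bit-serial GF(2^n) squaring for odd n; returns (a*a mod p, cycles)."""
    half = (n + 1) // 2
    r = 0
    for j in range(half):               # load cycle: the top `half` bits of A
        if (a >> (n - 1 - j)) & 1:      # go straight to even positions; the
            r |= 1 << (2 * (half - 1 - j))  # degree stays <= n-1, no reduction
    cycles = 1
    for i in range(n - 1 - half, -1, -1):  # remaining (n-1)/2 bits of A
        for _ in range(2):              # shift over two positions ...
            r <<= 1
            if r >> n:
                r ^= p                  # ... reducing after each shift
        r ^= (a >> i) & 1               # bring in coefficient a_i
        cycles += 1
    return r, cycles


# Usage with GF(2^7) and p = x^7 + x + 1 (an assumed example modulus).
P7 = 0b10000011
a = 0b1010011
_, mul_cycles = bitserial_mul(a, 0b0110101, P7, 7)
sq_a, sqr_cycles = bitserial_sqr(a, P7, 7)
print(mul_cycles, sqr_cycles)                  # 8 4
print(sq_a == bitserial_mul(a, a, P7, 7)[0])   # True
```

For n = 7 the squarer preloads A6..A3 and shifts in A2, A1, A0, which matches the bit labels visible in Figure 8; for n = 163 the same models report 164 and 82 cycles.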
Figure 7: GF(2^n) bit-serial multiplier
The GF adder chosen for our implementation is a
bit-parallel architecture in order to generate the result
in a single clock cycle. Notice that addition in
GF(2^n) is carry-free. Therefore, addition is equivalent
to XORing the two bit-vectors of the input operands.
4. Processor architecture
Based on the modified point multiplication algorithm
and the optimized schedules of elliptic curve three
doubles and add/subtract operations that are presented
in section three, a scalable and high speed elliptic curve
cryptographic processor is implemented. The
underlying Galois Field operations that are presented in
the last section are used in the datapath. Based on the
schedules of Figures 5 and 6, the required
interconnections between storage elements and GF
operators are designed, and the finite state machine that
follows the schedules is implemented.
Figure 9 shows the point multiplication datapath that
is designed to calculate 8S ± mP schedules. This
module includes a datapath that consists of the GF
operators with their interconnections, the storage unit
that keeps the intermediate variables (X, Y, Z, T1, T2,
T3, T4), and the FSM that creates the control signals to
perform the 8S ± mP operation schedule (Figures 5 and
6). Note that point mP with coordinates (X1, Y1, Z1) is
an input to this module. The result of each doubling
and of the final addition/subtraction overwrites the
register variable S. This means that the hardware
registers that store X0, X2, X3, X4, and X5 are the
same. This is also the case for the Y and Z coordinates.
Moreover, four temporary variables, T1, T2, T3, and
T4, are used to store intermediate values throughout the schedule.
The point multiplication module of Figure 9 is
controlled from the upper level module using the start,
done, and plus_minus signals. Basically, the upper
level control provides the correct value of mP through
the (X1, Y1, Z1) connections, sets the plus_minus
signal to the right value, and asserts the start signal of this
unit. Then this unit, which already has the value of S in its
storage (variables X, Y, Z), starts to calculate 8S ± mP,
updates the variables X, Y, Z, and asserts the done
signal, indicating that the result for the three bits is
ready. This process is repeated for every three bits of k.
Figure 10 shows the block diagram of the whole
processor. The ECC storage unit stores the values mP.
The key scheduling unit calculates the value h = 3k and
generates the decision signal for the ECC storage unit
to choose the value mP for every three bits of k and h.
The top level controller issues the required controls for
the point multiplication datapath and the key
scheduling unit to synchronize the operation of all the
units in the processor.
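The control flow between the top level controller, the key scheduling unit, the ECC storage unit and the point multiplication datapath can be modeled behaviorally. In this sketch only the signal names start, done and plus_minus come from the text; the class and function names, the integer stand-in for points (P = 1), the zero-padding of the leading window and the starting value S = 0 are assumptions, and the datapath completes instantly instead of following the cycle-level schedules of Figures 5 and 6.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


def key_schedule(k: int) -> Iterator[Tuple[bool, int]]:
    """Hypothetical key scheduling unit: yields (plus_minus, |m|) per window.

    Signed digits h_i - k_i come from h = 3k (leading 1 for S = P); they are
    left-padded to a multiple of three and each window gives
    m = 4*d2 + 2*d1 + d0.
    """
    h = 3 * k
    l = h.bit_length() - 1
    d = [1] + [((h >> i) & 1) - ((k >> i) & 1) for i in range(l - 1, 0, -1)]
    d = [0] * (-len(d) % 3) + d
    for j in range(0, len(d), 3):
        m = 4 * d[j] + 2 * d[j + 1] + d[j + 2]
        yield m >= 0, abs(m)            # plus_minus = True means "add"


@dataclass
class PointMultDatapath:
    """Stand-in for the Figure 9 module: holds S, computes 8S ± mP on start."""
    S: int = 0
    done: bool = False

    def start(self, mP: int, plus_minus: bool) -> None:
        self.done = False
        eight_s = 8 * self.S                                   # Figure 5
        self.S = eight_s + mP if plus_minus else eight_s - mP  # Figure 6
        self.done = True


def top_level_controller(k: int, P: int = 1) -> int:
    storage = [m * P for m in range(8)]   # ECC storage unit: 0P, P, ..., 7P
    datapath = PointMultDatapath()
    for plus_minus, idx in key_schedule(k):
        datapath.start(storage[idx], plus_minus)  # drive mP, plus_minus, start
        if not datapath.done:                     # wait for done
            raise RuntimeError("datapath did not signal done")
    return datapath.S


print(list(key_schedule(13)), top_level_controller(13))  # [(True, 2), (False, 3)] 13
```

Keeping the sign separate from |m| mirrors the text's split between the plus_minus signal and the storage-unit lookup, so only eight stored values (including the dummy 0P) are needed.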
[Figure 7 circuit, 4-bit example: operand bits A3..A0 and B3..B0, irreducible-polynomial bits P3..P0, result registers R3..R0; R = (A × B) mod P.]
[Figure 8 circuit, 7-bit example: input bits A6..A0, polynomial bits P6..P0, result registers R6..R0; R = (A × A) mod P.]
Figure 8: GF(2^n) bit-serial squarer

Citations
TL;DR: This survey considers all three canonical applications of timing channels, namely, covert communication, timing side channel, and network flow watermarking and surveys the theoretical foundations, the implementation, and the various detection and prevention techniques that have been reported in literature.
Abstract: A timing channel is a communication channel that can transfer information to a receiver/decoder by modulating the timing behavior of an entity. Examples of this entity include the interpacket delays of a packet stream, the reordering packets in a packet stream, or the resource access time of a cryptographic module. Advances in the information and coding theory and the availability of high-performance computing systems interconnected by high-speed networks have spurred interest in and development of various types of timing channels. With the emergence of complex timing channels, novel detection and prevention techniques are also being developed to counter them. In this article, we provide a detailed survey of timing channels broadly categorized into network timing channel, in which communicating entities are connected by a network, and in-system timing channel, in which the communicating entities are within a computing system. This survey builds on the last comprehensive survey by Zander et al. [2007] and considers all three canonical applications of timing channels, namely, covert communication, timing side channel, and network flow watermarking. We survey the theoretical foundations, the implementation, and the various detection and prevention techniques that have been reported in literature. Based on the analysis of the current literature, we discuss potential future research directions both in the design and application of timing channels and their detection and prevention techniques.

70 citations



TL;DR: Dual-spacer dual-rail delay-insensitive Logic (D^3L), presented in this paper, is able to mitigate both power- and timing-based side-channel attacks.

34 citations



TL;DR: The Delay-Insensitive Ternary Logic (DITL) asynchronous design paradigm is developed; it combines design aspects of similar dual-rail asynchronous paradigms and Boolean logic to create a single-wire-per-bit, three-voltage signaling and logic scheme.
Abstract: As digital circuit design continues to evolve with semiconductor processes progressing well into the sub-100 nm range, clocked architectures face limitations in a number of cases where clockless asynchronous architectures generate less noise and produce less electromagnetic interference (EMI). This paper develops the Delay-Insensitive Ternary Logic (DITL) asynchronous design paradigm, which combines design aspects of similar dual-rail asynchronous paradigms and Boolean logic to create a single-wire-per-bit, three-voltage signaling and logic scheme. DITL is compared with other delay-insensitive paradigms, such as Pre-Charge Half-Buffers (PCHB) and NULL Convention Logic (NCL), on which it is based. An application of DITL is discussed in designing secure digital circuits resistant to side-channel attacks based on measurement of timing, power, and EMI signatures. A Secure DITL Adder circuit is designed at the transistor level, and several variance parameters are measured to validate the efficiency of DITL in resisting side-channel attacks. The DITL design methodology is then applied to design a secure 8051 ALU.

11 citations

Journal Article
TL;DR: Results show that the Advanced Encryption Standard (AES) cores designed using MTD3L exhibit similar security to previous secure techniques with substantially less area and energy overhead.
Abstract: As portable devices become more ubiquitous, data security in these devices is becoming increasingly important. Traditional circuit design techniques leave otherwise secure systems vulnerable due to the characteristics of the hardware implementation, rather than weaknesses in the security algorithms. These characteristics, called side-channels, are exploitable because they can be measured and correlated with processed data, potentially giving an attacker insight into the device’s secret data. Alternative design techniques such as dual-rail asynchronous designs are capable of minimizing these potential side-channels by decoupling them from the data being processed. However, these techniques are either expensive to implement compared to standard designs or leave exploitable imbalances in the dual-rail implementation itself. Multi-Threshold Dual-Spacer Dual-Rail Delay-Insensitive Logic (MTD3L) offers security by balancing side-channels both in general and between the dual-rail signals themselves, as well as reduction in circuit overhead compared to previous secure design techniques. Results show that the Advanced Encryption Standard (AES) cores designed using MTD3L exhibit similar security to previous secure techniques with substantially less area and energy overhead.

10 citations
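The dual-spacer idea behind D3L and MTD3L can be illustrated at the protocol level. In dual-rail encoding each bit travels on a true rail and a false rail, and a spacer (no data) separates consecutive data tokens. The minimal Python model below ignores gate-level details, threshold voltages, and wire capacitance entirely. It only counts how often each rail switches per data token. With a single all-zeros spacer, which rail toggles depends on the data value. Alternating all-zeros and all-ones spacers makes each rail toggle exactly once per token regardless of the data. The encoding (1, 0) for a 1 bit and (0, 1) for a 0 bit is an assumed convention.

```python
SPACER_ZERO = (0, 0)
SPACER_ONE = (1, 1)


def encode(bit: int) -> tuple[int, int]:
    """Dual-rail encoding as (true rail, false rail)."""
    return (1, 0) if bit else (0, 1)


def rail_switching(bits: list[int], alternate_spacers: bool) -> list[tuple[int, int]]:
    """Per data token: toggles on each rail going spacer -> data -> spacer."""
    counts = []
    spacer = SPACER_ZERO
    for bit in bits:
        data = encode(bit)
        if alternate_spacers:
            nxt = SPACER_ONE if spacer == SPACER_ZERO else SPACER_ZERO
        else:
            nxt = SPACER_ZERO
        true_sw = (spacer[0] != data[0]) + (data[0] != nxt[0])
        false_sw = (spacer[1] != data[1]) + (data[1] != nxt[1])
        counts.append((true_sw, false_sw))
        spacer = nxt
    return counts


bits = [1, 0, 0, 1]
print(rail_switching(bits, alternate_spacers=False))  # [(2, 0), (0, 2), (0, 2), (2, 0)]
print(rail_switching(bits, alternate_spacers=True))   # [(1, 1), (1, 1), (1, 1), (1, 1)]
```

With a single spacer, the total switching per bit is already constant, but it lands on whichever rail carries the value. If the two rails are not perfectly matched, that imbalance is observable. The alternating-spacer pattern removes it at this level of abstraction.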

Proceedings Article
01 Sep 2006
TL;DR: This paper reports work in progress on the design, implementation, and evaluation of a reconfigurable finite field arithmetic architecture with direct application to elliptic curve cryptography (ECC) on mobile devices, which helps manage the current interoperability problems in ECC.
Abstract: This paper reports work in progress on the design, implementation, and evaluation of a reconfigurable finite field arithmetic architecture with direct application to elliptic curve cryptography (ECC) on mobile devices. This module helps manage the current interoperability problems in ECC, which arise from the many choices available when implementing ECC cryptosystems. We report an evaluation of some finite field arithmetic modules in an architecture for computing scalar multiplication, the most time-consuming operation in ECC cryptographic schemes. The arithmetic modules were evaluated for all the GF(2^m) NIST elliptic curves in a hardware architecture implemented in field-programmable technology.

9 citations


Cites background from "A scalable and high performance ell..."

  • ...810 [8] - New 3 bits at a time Projective Polynomial NA Software...

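Scalar multiplication over the GF(2^m) NIST curves evaluated in this entry ultimately reduces to field multiplication. Below is a minimal Python sketch of MSB-first bit-serial multiplication in polynomial basis. It uses the NIST B-163/K-163 reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1 as the example field. Each loop iteration roughly corresponds to one cycle of a bit-serial multiplier: shift, conditional reduction, and conditional addition. It is an illustration only, not the reconfigurable architecture described in this entry or the multiplier of the cited processor.

```python
M = 163
# Reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1 as a bit mask.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a: int, b: int, m: int = M, f: int = F) -> int:
    """Multiply a*b in GF(2^m); a and b are polynomials of degree < m."""
    r = 0
    for i in reversed(range(m)):   # scan b from its most significant bit
        r <<= 1                    # r = r * x
        if (r >> m) & 1:           # degree reached m: subtract (XOR) f
            r ^= f
        if (b >> i) & 1:           # add a when this bit of b is set
            r ^= a
    return r


# x * x^162 = x^163, which reduces to x^7 + x^6 + x^3 + 1.
print(gf2m_mul(0b10, 1 << 162) == 0b11001001)  # True
```

Addition in GF(2^m) is a plain XOR, so the only nontrivial step is the reduction. The sparse trinomial/pentanomial choice of f keeps that step to a few XOR taps in hardware.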

References
Journal Article
TL;DR: In this paper, an Atmel ATmega128 at 8 MHz was used to implement ECC point multiplication over fields using pseudo-Mersenne primes as standardized by NIST and SECG.
Abstract: Strong public-key cryptography is often considered to be too computationally expensive for small devices if not accelerated by cryptographic hardware. We revisited this statement and implemented elliptic curve point multiplication for 160-bit, 192-bit, and 224-bit NIST/SECG curves over GF(p) and RSA-1024 and RSA-2048 on two 8-bit microcontrollers. To accelerate multiple-precision multiplication, we propose a new algorithm to reduce the number of memory accesses. Implementation and analysis led to three observations: 1. Public-key cryptography is viable on small devices without hardware acceleration. On an Atmel ATmega128 at 8 MHz we measured 0.81 s for 160-bit ECC point multiplication and 0.43 s for an RSA-1024 operation with exponent e = 2^16 + 1. 2. The relative performance advantage of ECC point multiplication over RSA modular exponentiation increases with the decrease in processor word size and the increase in key size. 3. Elliptic curves over fields using pseudo-Mersenne primes as standardized by NIST and SECG allow for high performance implementations and show no performance disadvantage over optimal extension fields or prime fields selected specifically for a particular processor architecture.

1,113 citations
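The third observation in this abstract rests on the fact that a pseudo-Mersenne modulus p = 2^k - c with small c makes modular reduction cheap. Since 2^k ≡ c (mod p), the bits above position k can be folded back in with one multiplication by c instead of a division. A minimal Python sketch, using p = 2^160 - 2^31 - 1 (the modulus form of the SECG secp160r1 curve) as the example. The paper's own implementation details are not reproduced here.

```python
K = 160
C = (1 << 31) + 1          # p = 2^160 - 2^31 - 1 = 2^K - C
P = (1 << K) - C
MASK = (1 << K) - 1


def reduce_pm(x: int) -> int:
    """Reduce a non-negative x modulo P using 2^K ≡ C (mod P)."""
    while x >> K:                        # while x has bits above position K
        x = (x & MASK) + (x >> K) * C    # hi*2^K + lo  ->  hi*C + lo
    return x - P if x >= P else x        # at most one final subtraction


a = P - 5
b = P - 7
print(reduce_pm(a * b) == (a * b) % P)  # True
```

Each fold shrinks a double-length product by roughly k minus the bit length of c, so a 320-bit product needs only a couple of folds and a single conditional subtraction.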

Book Chapter
12 Aug 1999
TL;DR: This paper generalizes the DPA attack to elliptic curve (EC) cryptosystems and describes DPA attacks on EC Diffie-Hellman key exchange and EC ElGamal-type encryption that enable recovery of the private key stored inside the smart card.
Abstract: Differential Power Analysis, first introduced by Kocher et al. in [14], is a powerful technique that allows secret smart card information to be recovered by monitoring power signals. In [14] a specific DPA attack against smart cards running the DES algorithm was described. As few as 1000 encryptions were sufficient to recover the secret key. In this paper we generalize the DPA attack to elliptic curve (EC) cryptosystems and describe a DPA on EC Diffie-Hellman key exchange and EC ElGamal-type encryption. These attacks enable recovery of the private key stored inside the smart card. Moreover, we suggest countermeasures that thwart our attack.

1,089 citations


"A scalable and high performance ell..." refers background in this paper

  • ...The main operation in a typical elliptic curve cryptosystem is called point multiplication, which refers to calculating k·P, where k is an integer and P is a point on the specific elliptic curve....


  • ...Keywords: Elliptic Curve Cryptography, side-channel attacks, Galois fields, hardware architecture, security....

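The DPA abstract mentions countermeasures without detailing them. One countermeasure generally credited to this work is randomizing the scalar. Because n·P is the point at infinity when n is the order of P, computing (k + r·n)·P with a fresh random r gives the same result as k·P while changing the bit pattern the hardware processes on every run. A minimal Python sketch, assuming integers modulo a hypothetical prime group order `N` stand in for the points. `blinded_scalar_mult` is an illustrative name, not an API from the paper.

```python
import secrets

N = 1009  # hypothetical group order; Z_N stands in for the subgroup <P>


def scalar_mult(k: int, p: int) -> int:
    """Plain left-to-right double-and-add in the stand-in group."""
    r = 0
    for bit in bin(k)[2:]:
        r = (2 * r) % N          # "point doubling"
        if bit == "1":
            r = (r + p) % N      # "point addition"
    return r


def blinded_scalar_mult(k: int, p: int, rand_bits: int = 20) -> tuple[int, int]:
    """Randomized scalar k' = k + r*N; returns (k'*p, k')."""
    r = secrets.randbits(rand_bits)
    k_blind = k + r * N
    return scalar_mult(k_blind, p), k_blind


k, p = 777, 5
print(blinded_scalar_mult(k, p)[0] == scalar_mult(k, p))  # True
```

The cost is a scalar that is `rand_bits` bits longer. The protection comes from the fact that power traces from different runs no longer line up with the same key bits, which is what DPA averaging relies on.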

Book Chapter
11 Aug 2004
TL;DR: To accelerate multiple-precision multiplication, a new algorithm that reduces the number of memory accesses is proposed, and elliptic curve point multiplication for 160-bit, 192-bit, and 224-bit NIST/SECG curves over GF(p), as well as RSA-1024 and RSA-2048, is implemented on two 8-bit microcontrollers.
Abstract: Strong public-key cryptography is often considered to be too computationally expensive for small devices if not accelerated by cryptographic hardware. We revisited this statement and implemented elliptic curve point multiplication for 160-bit, 192-bit, and 224-bit NIST/SECG curves over GF(p) and RSA-1024 and RSA-2048 on two 8-bit microcontrollers. To accelerate multiple-precision multiplication, we propose a new algorithm to reduce the number of memory accesses.

1,081 citations
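This abstract concerns multiple-precision multiplication on 8-bit microcontrollers. On such a processor, a 160-bit operand occupies 20 words. The textbook product-scanning (column-wise) method computes the result one output word at a time, accumulating every partial product a[i]·b[j] with i + j equal to the column index. The Python sketch below shows only that baseline, not the paper's new algorithm, whose purpose is to reduce the memory accesses this baseline still performs. The word size `W = 8` is an assumption matching the 8-bit targets.

```python
W = 8                      # assumed word size (8-bit microcontroller)
MASK = (1 << W) - 1


def to_words(x: int, n: int) -> list[int]:
    """Split x into n little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(words: list[int]) -> int:
    return sum(w << (W * i) for i, w in enumerate(words))


def product_scanning(a: list[int], b: list[int]) -> list[int]:
    """Column-wise multiplication of two n-word operands into 2n words."""
    n = len(a)
    out = []
    acc = 0                 # on an 8-bit CPU: a small multi-register accumulator
    for col in range(2 * n - 1):
        for i in range(max(0, col - n + 1), min(col, n - 1) + 1):
            acc += a[i] * b[col - i]   # partial products of this column
        out.append(acc & MASK)          # emit one finished result word
        acc >>= W                       # carry the rest into the next column
    out.append(acc & MASK)
    return out


a, b = 2**159 + 12345, 2**158 + 678
print(from_words(product_scanning(to_words(a, 20), to_words(b, 20))) == a * b)  # True
```

Each result word is written once, but every inner step reloads one word of each operand, giving 2n^2 operand loads for n-word inputs. That load count is the cost a memory-access-aware algorithm targets.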

Journal Article
TL;DR: This paper surveys the known methods for fast exponentiation, examining their relative strengths and weaknesses.

590 citations
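Among the methods such a survey covers, fixed-window (2^w-ary) exponentiation processes the exponent w bits at a time. With w = 3 it matches the "3 bits at a time" idea that appears in this listing. A minimal Python sketch for modular exponentiation: every 3-bit window costs exactly three squarings and one multiplication, with a multiplication by g^0 = 1 acting as a dummy when the window is zero. In the elliptic curve setting, squarings become point doublings and multiplications become point additions. This is a generic textbook method, not the double-add-subtract schedule of the cited processor. In software, the secret-indexed lookup `table[d]` could still leak through caches.

```python
def fixed_window_pow(g: int, e: int, mod: int, w: int = 3) -> tuple[int, list[str]]:
    """Left-to-right fixed-window exponentiation; returns (g^e mod mod, op trace)."""
    table = [1 % mod]
    for _ in range((1 << w) - 1):
        table.append(table[-1] * g % mod)   # table[d] = g^d for 0 <= d < 2^w
    digits = []
    while e:
        digits.append(e & ((1 << w) - 1))   # w-bit digits, least significant first
        e >>= w
    result, ops = 1 % mod, []
    for d in reversed(digits):
        for _ in range(w):
            result = result * result % mod  # w squarings per window
            ops.append("S")
        result = result * table[d] % mod    # one multiply per window, even if d == 0
        ops.append("M")
    return result, ops


r, ops = fixed_window_pow(7, 0b101000111, 1000003)
print(r == pow(7, 0b101000111, 1000003), len(ops))  # True 12
```

The zero middle window (000) still costs one multiplication, so the operation trace is the same for every 9-bit exponent.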

Book Chapter
12 Aug 1999
TL;DR: The improved method possesses many desirable features for implementing elliptic curves in restricted environments and requires less memory than projective schemes and the amount of computation needed for a scalar multiplication is fixed for all multipliers of the same binary length.
Abstract: This paper describes an algorithm for computing elliptic scalar multiplications on non-supersingular elliptic curves defined over GF(2^m). The algorithm is an optimized version of a method described in [1], which is based on Montgomery's method [8]. Our algorithm is easy to implement in both hardware and software, works for any elliptic curve over GF(2^m), requires no precomputed multiples of a point, and is faster on average than the addition-subtraction method described in draft standard IEEE P1363. In addition, the method requires less memory than projective schemes and the amount of computation needed for a scalar multiplication is fixed for all multipliers of the same binary length. Therefore, the improved method possesses many desirable features for implementing elliptic curves in restricted environments.

567 citations


"A scalable and high performance ell..." refers background in this paper

  • ...Keywords: Elliptic Curve Cryptography, side-channel attacks, Galois fields, hardware architecture, security....

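The fixed amount of computation noted in this abstract comes from the Montgomery ladder structure. In the ladder, every scalar bit costs exactly one point addition and one point doubling, and the invariant R1 = R0 + P is maintained throughout. A minimal Python sketch on a toy prime-field curve y^2 = x^3 + 2x + 3 over GF(97) with base point (3, 6), all chosen here for illustration. The paper itself works over GF(2^m) with x-coordinate-only formulas, which this sketch does not reproduce.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(FIELD_P); far too small for real use.
FIELD_P, A, B = 97, 2, 3
INF = None  # the point at infinity (group identity)


def add(p1, p2):
    """Affine point addition/doubling on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return INF  # p2 == -p1, including doubling a point with y == 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)


def ladder(k: int, p):
    """Montgomery ladder: one addition and one doubling per bit of k."""
    r0, r1 = INF, p                      # invariant: r1 == r0 + p
    for i in reversed(range(k.bit_length())):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), add(r1, r1)
        else:
            r0, r1 = add(r0, r0), add(r0, r1)
    return r0


G = (3, 6)  # on the curve: 6^2 = 3^3 + 2*3 + 3 = 36
print(ladder(13, G))
```

Both branches perform the same two group operations, but the Python `if` on the key bit is not itself constant-time. Hardware would use a conditional swap or multiplexer instead, so that only the operands, not the operation sequence, depend on k.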

Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "A scalable and high performance elliptic curve processor with resistance to timing attacks" ?

This paper presents a high performance and scalable elliptic curve processor which is designed to be resistant against timing attacks.