Book Chapter

High-Radix Design of a Scalable Modular Multiplier

14 May 2001, pp. 185-201
TL;DR: This paper describes an algorithm and architecture that extend a scalable radix-2 architecture proposed in a previous work; the algorithm is proven to be correct and the hardware design is discussed in detail.
Abstract: This paper describes an algorithm and architecture based on an extension of a scalable radix-2 architecture proposed in a previous work. The algorithm is proven to be correct and the hardware design is discussed in detail. Experimental results are shown to compare a radix-8 implementation with a radix-2 design. The scalable Montgomery multiplier can be adjusted to constrained areas while still being able to work on any given precision of the operands. Similar to some systolic implementations, this design avoids the high load on signals that are broadcast to several components, making the delay independent of the operands' precision.


Citations
Journal Article
TL;DR: A word-based version of Montgomery multiplication (MM) is presented and used to explain the main concepts in the hardware design; the architecture gives enough freedom to select the word size and the degree of parallelism according to the available area and/or desired performance.
Abstract: This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade-offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.
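
The word-based idea in this abstract can be illustrated with a short software sketch. It is not the paper's algorithm or hardware: the function names (`to_words`, `word_mont_mul`), the parameter names, and the two-pass structure (add, then shift) are assumptions chosen for readability. It scans the multiplier X one bit at a time (radix 2) while handling Y, the modulus M, and the running sum S as lists of w-bit words, so the same loop works for any operand precision n.

```python
def to_words(value, w, count):
    """Split a non-negative integer into `count` little-endian w-bit words."""
    mask = (1 << w) - 1
    return [(value >> (w * j)) & mask for j in range(count)]


def word_mont_mul(X, Y, M, n, w):
    """Return X * Y * 2^(-n) mod M for odd M with X, Y < M < 2^n.

    X is scanned bit by bit; Y, M and the partial sum S are processed
    one w-bit word at a time with an explicit carry.
    """
    e = -(-(n + 1) // w)            # ceil((n + 1) / w) words
    mask = (1 << w) - 1
    Yw = to_words(Y, w, e + 1)      # one extra word holds intermediate growth
    Mw = to_words(M, w, e + 1)
    S = [0] * (e + 1)

    for i in range(n):
        xi = (X >> i) & 1
        # M is added exactly when S + xi*Y is odd, so the sum becomes even.
        q = (S[0] + xi * Yw[0]) & 1

        # Word-serial addition S = S + xi*Y + q*M with carry propagation.
        carry = 0
        for j in range(e + 1):
            t = S[j] + xi * Yw[j] + q * Mw[j] + carry
            S[j] = t & mask
            carry = t >> w

        # Word-serial right shift by one bit (exact division by 2).
        for j in range(e):
            S[j] = (S[j] >> 1) | ((S[j + 1] & 1) << (w - 1))
        S[e] >>= 1

    result = sum(s << (w * j) for j, s in enumerate(S))
    return result - M if result >= M else result
```

Changing `w` changes only how many words each pass touches, not the result, which is the software analogue of choosing a word size to fit an area or performance budget.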

242 citations


Cites background from "High-Radix Design of a Scalable Mod..."

  • ...High-radix designs for our scalable architecture were reported in [21], [22]....

    [...]

Journal Article
TL;DR: This paper presents an overview of hardware implementations for the two commonly used types of public key cryptography, i.e. RSA and elliptic curve cryptography, both based on modular arithmetic.

120 citations


Cites background from "High-Radix Design of a Scalable Mod..."

  • ...They were also first to prove that this type of attack is also practical....

    [...]

Proceedings Article
24 Jun 2003
TL;DR: A hardware implementation of an arithmetic processor is described that is efficient for bit-lengths suitable for both commonly used types of public key cryptography (PKC), i.e., elliptic curve (EC) and RSA cryptosystems.
Abstract: We describe a hardware implementation of an arithmetic processor which is efficient for bit-lengths suitable for both commonly used types of public key cryptography (PKC), i.e., elliptic curve (EC) and RSA cryptosystems. Montgomery modular multiplication in a systolic array architecture is used for modular multiplication. The processor consists of special operational blocks for Montgomery modular multiplication, modular addition/subtraction, EC point doubling/addition, modular multiplicative inversion, EC point multiplier, projective to affine coordinates conversion and Montgomery to normal representation conversion.

120 citations


Additional excerpts

  • ...Efficient implementation of Montgomery modular multiplication (MMM) in hardware was considered by many authors [7, 13, 33, 22, 17, 24, 34, 27, 14, 25, 28]....

    [...]

Journal Article
TL;DR: Two new hardware architectures are proposed that perform a Montgomery multiplication in approximately n clock cycles (n being the operand size in bits) with almost the same clock period as the original Tenca and Koc architecture; they are based on precomputing partial results using two possible assumptions regarding the most significant bit of the previous word.
Abstract: Montgomery modular multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and Elliptic Curve Cryptosystems. At CHES 1999, Tenca and Koc proposed the Multiple-Word Radix-2 Montgomery Multiplication (MWR2MM) algorithm and introduced a now-classic architecture for implementing Montgomery multiplication in hardware. With parameters optimized for minimum latency, this architecture performs a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of operands in bits. In this paper, we propose two new hardware architectures that are able to perform the same operation in approximately n clock cycles with almost the same clock period. These two architectures are based on precomputing partial results using two possible assumptions regarding the most significant bit of the previous word. These two architectures outperform the original architecture of Tenca and Koc in terms of the product latency times area by 23 and 50 percent, respectively, for several of the most common operand sizes used in cryptography. The architecture in radix-2 can be extended to the case of radix-4, while preserving a factor of two speedup over the corresponding radix-4 design by Tenca, Todorov, and Koc from CHES 2001. Our optimization has been verified by modeling it using Verilog-HDL, implementing it on a Xilinx Virtex-II 6000 FPGA, and experimentally testing it using an SRC-6 reconfigurable computer.

100 citations


Cites methods from "High-Radix Design of a Scalable Mod..."

  • ...In [6], a high-radix word-based Montgomery algorithm (MWR2MM) was proposed using Booth encoding technique....

    [...]

  • ...Several follow-up designs based on the MWR2MM algorithm have been proposed in order to reduce the computation time [6], [7], [8], [9], [10], [11]....

    [...]

  • ...using Booth recoding as discussed in [6]....

    [...]

Book Chapter
10 Sep 2007
TL;DR: A circuit architecture is proposed that handles multiple data lengths using the same circuits, together with an improved Montgomery multiplication algorithm that maximizes the performance of the multiplication unit in the FPGA.
Abstract: This paper describes a modular exponentiation processing method and circuit architecture that can exhibit the maximum performance of FPGA resources. The modular exponentiation architecture proposed by us comprises three main techniques. The first technique is to improve the Montgomery multiplication algorithm in order to maximize the performance of the multiplication unit in FPGA. The second technique is to improve and balance the circuit delay. The third technique is to ensure and make fast the scalability of the effective FPGA resource. We propose a circuit architecture that can handle multiple data lengths using the same circuits. In addition, our architecture can perform fast operations using small-scale resources; in particular, it can complete 512-bit modular exponentiation in 0.26 ms by means of XC4VF12-10SF363, which is the minimum logic resources in the Virtex-4 Series FPGAs. Also, the number of SLICEs used is approx. 4000 to make a very compact design. Moreover, 1024-, 1536- and 2048-bit modular exponentiations can be processed in the same circuit with the scalability.

78 citations


Cites methods from "High-Radix Design of a Scalable Mod..."

  • ...The fast hardware implementation of public-key cryptosystems has been extensively researched thus far; in particular, a circuit architecture using Montgomery multiplication [1] has often been proposed [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] [17,18,19]....

    [...]

References
Journal Article
TL;DR: This paper suggests ways to solve currently open problems in cryptography, and discusses how the theories of communication and computation are beginning to provide the tools to solve cryptographic problems of long standing.
Abstract: Two kinds of contemporary developments in cryptography are examined. Widening applications of teleprocessing have given rise to a need for new types of cryptographic systems, which minimize the need for secure key distribution channels and supply the equivalent of a written signature. This paper suggests ways to solve these currently open problems. It also discusses how the theories of communication and computation are beginning to provide the tools to solve cryptographic problems of long standing.

14,980 citations


"High-Radix Design of a Scalable Mod..." refers methods in this paper

  • ...1 Introduction. Several applications, such as the RSA algorithm [14], Diffie-Hellman key exchange algorithm [5], Digital Signature Standard [12], and elliptic curve cryptography [6,9], use modular multiplication and modular exponentiation....

    [...]

Journal Article
TL;DR: An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key.
Abstract: An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key. This has two important consequences: (1) Couriers or other secure means are not needed to transmit keys, since a message can be enciphered using an encryption key publicly revealed by the intended recipient. Only he can decipher the message, since only he knows the corresponding decryption key. (2) A message can be “signed” using a privately held decryption key. Anyone can verify this signature using the corresponding publicly revealed encryption key. Signatures cannot be forged, and a signer cannot later deny the validity of his signature. This has obvious applications in “electronic mail” and “electronic funds transfer” systems. A message is encrypted by representing it as a number M, raising M to a publicly specified power e, and then taking the remainder when the result is divided by the publicly specified product, n, of two large secret prime numbers p and q. Decryption is similar; only a different, secret, power d is used, where e * d ≡ 1 (mod (p - 1) * (q - 1)). The security of the system rests in part on the difficulty of factoring the published divisor, n.
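
The encryption and decryption rule stated in this abstract can be sketched with toy numbers. The function names (`toy_rsa_keys`, `encrypt`, `decrypt`) and the tiny primes are illustrative assumptions only; real keys use primes hundreds of digits long together with padding, neither of which is shown here.

```python
from math import gcd


def toy_rsa_keys(p, q, e):
    """Derive (n, e, d) from two primes and a public exponent.

    d is chosen so that e * d ≡ 1 (mod (p - 1) * (q - 1)).
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)     # modular inverse (Python 3.8+)
    return n, e, d


def encrypt(M, e, n):
    """Raise the message M to the public power e and reduce modulo n."""
    return pow(M, e, n)


def decrypt(C, d, n):
    """Raise the ciphertext C to the secret power d and reduce modulo n."""
    return pow(C, d, n)


# Usage with deliberately small, insecure parameters.
n, e, d = toy_rsa_keys(61, 53, 17)
ciphertext = encrypt(65, e, n)
recovered = decrypt(ciphertext, d, n)   # equals the original message 65
```

Both `encrypt` and `decrypt` are modular exponentiations, which is why fast modular multiplication is the core operation that hardware designs such as the one on this page aim to accelerate.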

14,659 citations

Journal Article
TL;DR: The question of primitive points on an elliptic curve modulo p is discussed, and a theorem on nonsmoothness of the order of the cyclic subgroup generated by a global point is given.
Abstract: We discuss analogs based on elliptic curves over finite fields of public key cryptosystems which use the multiplicative group of a finite field. These elliptic curve cryptosystems may be more secure, because the analog of the discrete logarithm problem on elliptic curves is likely to be harder than the classical discrete logarithm problem, especially over GF(2^n). We discuss the question of primitive points on an elliptic curve modulo p, and give a theorem on nonsmoothness of the order of the cyclic subgroup generated by a global point.

5,378 citations


"High-Radix Design of a Scalable Mod..." refers methods in this paper

  • ...1 Introduction. Several applications, such as the RSA algorithm [14], Diffie-Hellman key exchange algorithm [5], Digital Signature Standard [12], and elliptic curve cryptography [6,9], use modular multiplication and modular exponentiation....

    [...]

Journal Article
TL;DR: A method for multiplying two integers modulo N while avoiding division by N, based on a representation of residue classes that speeds up modular multiplication without affecting the modular addition and subtraction algorithms.
Abstract: Let $N > 1$. We present a method for multiplying two integers (called N-residues) modulo $N$ while avoiding division by $N$. N-residues are represented in a nonstandard way, so this method is useful only if several computations are done modulo one $N$. The addition and subtraction algorithms are unchanged. 1. Description. Some algorithms (1), (2), (4), (5) require extensive modular arithmetic. We propose a representation of residue classes so as to speed modular multiplication without affecting the modular addition and subtraction algorithms. Other recent algorithms for modular arithmetic appear in (3), (6). Fix $N > 1$. Define an N-residue to be a residue class modulo $N$. Select a radix $R$ coprime to $N$ (possibly the machine word size or a power thereof) such that $R > N$ and such that computations modulo $R$ are inexpensive to process. Let $R^{-1}$ and $N'$ be integers satisfying $0$ [...] $N$ then return $t - N$ else return $t$ ■ To validate REDC, observe $mN \equiv TN'N \equiv -T \pmod{R}$, so $t$ is an integer. Also, $tR \equiv T \pmod{N}$, so $t \equiv TR^{-1} \pmod{N}$. Thirdly, $0 \le T + mN < RN + RN$, so $0 \le t < 2N$. If $R$ and $N$ are large, then $T + mN$ may exceed the largest double-precision value. One can circumvent this by adjusting $m$ so $-R < m \le 0$. Given two numbers $x$ and $y$ between $0$ and $N - 1$ inclusive, let $z = \mathrm{REDC}(xy)$. Then $z = (xy)R^{-1} \bmod N$, so $(xR^{-1})(yR^{-1}) \equiv zR^{-1} \pmod{N}$. Also, $0 \le z < N$, so $z$ is the product of $x$ and $y$ in this representation. Other algorithms for operating on N-residues in this representation can be derived from the algorithms normally used. The addition algorithm is unchanged, since $xR^{-1} + yR^{-1} \equiv zR^{-1} \pmod{N}$ if and only if $x + y \equiv z \pmod{N}$. Also unchanged are
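
The REDC step that this abstract validates can be sketched as follows. Part of the abstract's definition of R^{-1}, N' and the REDC body did not survive extraction, so the sketch relies on the commonly stated form of Montgomery reduction, in which R * R^{-1} - N * N' = 1. That relation, and the function names `redc_setup` and `redc`, are assumptions rather than text recovered from this page.

```python
def redc_setup(N, R):
    """Return N' with 0 < N' < R and N * N' ≡ -1 (mod R).

    Requires gcd(R, N) = 1; uses Python 3.8+ modular inverse via pow.
    """
    return (-pow(N, -1, R)) % R


def redc(T, N, R, N_prime):
    """Montgomery reduction: return T * R^(-1) mod N for 0 <= T < R * N."""
    m = ((T % R) * N_prime) % R     # makes T + m*N divisible by R
    t = (T + m * N) // R            # exact division, 0 <= t < 2N
    return t - N if t >= N else t


# Usage: multiply x and y in the representation described above.
N, R = 97, 128                      # R coprime to N, R > N, R a power of two
N_prime = redc_setup(N, R)
x, y = 45, 76
z = redc(x * y, N, R, N_prime)      # z = x * y * R^(-1) mod N
```

With R a power of two, `T % R` and `// R` reduce to a bit mask and a shift, which is what makes the method attractive compared with a division by N.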

2,647 citations


"High-Radix Design of a Scalable Mod..." refers background in this paper

  • ...The Montgomery Multiplication (MM) algorithm [10] provides certain advantages in the implementation of modular multiplication....

    [...]

Journal Article

1,286 citations


"High-Radix Design of a Scalable Mod..." refers methods in this paper

  • ...Each loop iteration (computational loop) scans k bits of X (a radix-r digit X_i) and determines the value qY, according to Booth encoding [3]....

    [...]
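
The digit scanning mentioned in the last excerpt can be illustrated with a generic radix-2^k Booth recoding. This is a sketch of the standard recoding technique, not the paper's exact digit-selection logic; the function name `booth_digits` and its parameters are assumptions. Each step reads k bits of X plus the top bit of the previous group and produces a signed digit in the range [-2^(k-1), 2^(k-1)].

```python
def booth_digits(X, k, n):
    """Recode an n-bit unsigned integer X into signed radix-2^k digits.

    Returns little-endian digits d_i with sum(d_i * 2^(k*i)) == X and
    -2^(k-1) <= d_i <= 2^(k-1).
    """
    count = -(-(n + 1) // k)        # ceil((n + 1) / k): leaves a zero top bit
    group_mask = (1 << k) - 1
    digits = []
    prev_top = 0                    # bit just below the current group
    for i in range(count):
        group = (X >> (k * i)) & group_mask
        top = group >> (k - 1)      # most significant bit of this group
        # The top bit carries weight -2^(k-1) instead of +2^(k-1);
        # the previous group's top bit is added back with weight 1.
        digits.append(group - (top << k) + prev_top)
        prev_top = top
    return digits


# Usage: recode an 8-bit value in radix 8 (k = 3).
digits = booth_digits(0b10110111, 3, 8)
value = sum(d << (3 * i) for i, d in enumerate(digits))   # reconstructs X
```

Signed digits halve the number of distinct multiples of Y that have to be generated, since the multiple for a negative digit is the negation of the multiple for the matching positive one.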