Posted Content

Generalizations of the Karatsuba Algorithm for Efficient Implementations.

TL;DR: In this paper, the authors generalize the classical Karatsuba Algorithm (KA) for polynomial multiplication to polynomials of arbitrary degree and recursive use, and provide detailed information on how to use the KA with least cost.
Abstract: In this work we generalize the classical Karatsuba Algorithm (KA) for polynomial multiplication to (i) polynomials of arbitrary degree and (ii) recursive use. We determine exact complexity expressions for the KA and focus on how to use it with the least number of operations. We develop a rule for the optimum order of steps if the KA is used recursively. We show how the use of dummy coefficients may improve performance. Finally, we provide detailed information on how to use the KA with least cost, and also provide tables that describe the best possible usage of the KA for polynomials up to a degree of 127. Our results are especially useful for efficient implementations of cryptographic and coding schemes over fixed-size fields like GF(p).
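
For orientation, the recursive use of the KA described in the abstract can be sketched as follows. This is a minimal illustration of the classical two-way split only, not the paper's generalized or cost-optimal variant. The function names, the lowest-degree-first coefficient lists, and the optional modulus `p` are assumptions made for this sketch. Odd lengths are handled by padding the upper half with a single zero, which is one simple way of using a dummy coefficient.

```python
def schoolbook(a, b):
    """Reference product of coefficient lists a and b (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def karatsuba(a, b, p=None):
    """Multiply two equal-length coefficient lists with the two-way KA split.

    If p is given, the result is reduced modulo p (coefficients in GF(p)).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(a)
    assert n == len(b) and n >= 1
    if n == 1:
        c = [a[0] * b[0]]
    else:
        h = (n + 1) // 2                      # size of the lower half
        pad = [0] * (2 * h - n)               # dummy coefficient when n is odd
        a0, a1 = a[:h], a[h:] + pad
        b0, b1 = b[:h], b[h:] + pad
        d0 = karatsuba(a0, b0, p)             # low * low
        d1 = karatsuba(a1, b1, p)             # high * high
        s = karatsuba([x + y for x, y in zip(a0, a1)],
                      [x + y for x, y in zip(b0, b1)], p)
        c = [0] * (4 * h - 1)
        for i in range(2 * h - 1):
            c[i] += d0[i]
            c[i + h] += s[i] - d0[i] - d1[i]  # middle term from three products
            c[i + 2 * h] += d1[i]
        c = c[:2 * n - 1]                     # drop positions created by padding
    return [x % p for x in c] if p is not None else c
```

Each level replaces four half-size products with three, at the price of extra additions; choosing the split that minimizes the total operation count is the question the paper addresses.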


Citations
Journal Article
P.L. Montgomery
TL;DR: This work presents division-free formulae which multiply two 5-term polynomials with 13 scalar multiplications, two 6-term polynomials with 17 scalar multiplications, and two 7-term polynomials with 22 scalar multiplications, and describes their application to elliptic curve arithmetic over binary fields.
Abstract: The Karatsuba-Ofman algorithm starts with a way to multiply two 2-term (i.e., linear) polynomials using three scalar multiplications. There is also a way to multiply two 3-term (i.e., quadratic) polynomials using six scalar multiplications. These are used within recursive constructions to multiply two higher-degree polynomials in subquadratic time. We present division-free formulae, which multiply two 5-term polynomials with 13 scalar multiplications, two 6-term polynomials with 17 scalar multiplications, and two 7-term polynomials with 22 scalar multiplications. These formulae may be mixed with the 2-term and 3-term formulae within recursive constructions, leading to improved bounds for many other degrees. Using only the 6-term formula leads to better asymptotic performance than standard Karatsuba. The new formulae work in any characteristic, but simplify in characteristic 2. We describe their application to elliptic curve arithmetic over binary fields. We include some timing data.
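
As a small illustration of the 3-term case mentioned in the abstract, the sketch below multiplies two quadratic polynomials with six scalar multiplications instead of the nine used by the schoolbook method. It shows one well-known arrangement of the six products; the function name and the lowest-degree-first list layout are assumptions for this example, and the 5-, 6-, and 7-term formulae of the paper are not reproduced here.

```python
def mul3_six(a, b):
    """Product of two 3-term polynomials using six scalar multiplications.

    a and b are coefficient lists [c0, c1, c2], lowest degree first.
    Hypothetical helper for illustration only.
    """
    a0, a1, a2 = a
    b0, b1, b2 = b
    # Three "diagonal" products.
    d0 = a0 * b0
    d1 = a1 * b1
    d2 = a2 * b2
    # Three products of pairwise sums.
    d01 = (a0 + a1) * (b0 + b1)
    d02 = (a0 + a2) * (b0 + b2)
    d12 = (a1 + a2) * (b1 + b2)
    # Recover the cross terms using additions and subtractions only.
    return [
        d0,
        d01 - d0 - d1,        # a0*b1 + a1*b0
        d02 - d0 - d2 + d1,   # a0*b2 + a1*b1 + a2*b0
        d12 - d1 - d2,        # a1*b2 + a2*b1
        d2,
    ]
```

No division is needed, so the same arrangement works over any coefficient ring; in characteristic 2 the subtractions become additions.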

206 citations

Book Chapter
10 Aug 2008
TL;DR: A novel architecture and algorithms for performing ECC arithmetic are described and it is shown that ECC on Xilinx's Virtex-4 SX55 FPGA can be performed at a rate of more than 37,000 point multiplications per second.
Abstract: Elliptic Curve Cryptosystems (ECC) have gained increasing acceptance in practice due to their significantly smaller bit size of the operands compared to other public-key cryptosystems. Since their computational complexity is often lower than in the case of RSA or discrete logarithm schemes, ECC are often chosen for high performance public-key applications. However, despite a wealth of research regarding high-speed software and high-speed FPGA implementation of ECC since the mid 1990s, providing truly high-performance ECC on readily available (i.e., non-ASIC) platforms remains an open challenge. This holds especially for ECC over prime fields, which are often preferred over binary fields due to standards in Europe and the US. This work presents a new architecture for an FPGA-based ultra high performance ECC implementation over prime fields. Our architecture makes intensive use of the DSP blocks in modern FPGAs, which are embedded arithmetic units actually intended to accelerate digital signal processing algorithms. We describe a novel architecture and algorithms for performing ECC arithmetic and describe the actual implementation of standard compliant ECC based on the NIST primes P-224 and P-256. We show that ECC on Xilinx's Virtex-4 SX55 FPGA can be performed at a rate of more than 37,000 point multiplications per second. Our architecture outperforms all single-chip hardware implementations over prime fields in the open literature by a wide margin.

179 citations

Journal Article
TL;DR: A high performance architecture of elliptic curve scalar multiplication based on the Montgomery ladder method over finite field GF($2^m$) is proposed, and a pseudo-pipelined word serial finite field multiplier with word size w, suitable for the scalar multiplication, is developed.
Abstract: A high performance architecture of elliptic curve scalar multiplication based on the Montgomery ladder method over finite field GF($2^m$) is proposed. A pseudo-pipelined word serial finite field multiplier with word size $w$, suitable for the scalar multiplication, is also developed. Implemented in hardware, this system performs a scalar multiplication in approximately $6\lceil m/w\rceil(m-1)$ clock cycles, and the gate delay in the critical path is equal to $T_{AND} + \lceil\log_2(w/k)\rceil T_{XOR}$, where $T_{AND}$ and $T_{XOR}$ are the delays of two-input AND and XOR gates respectively, and $1 \le k \ll w$ is used to shorten the critical path.

166 citations

Journal Article
TL;DR: This work presents a new scheme for subquadratic space complexity parallel multiplication in GF($2^n$) using the shifted polynomial basis, based on Toeplitz matrix-vector products and coordinate transformation techniques. To the best of the authors' knowledge, this is the first time that subquadratic space complexity parallel multipliers are proposed for dual, weakly dual, and triangular bases.
Abstract: Based on Toeplitz matrix-vector products and coordinate transformation techniques, we present a new scheme for subquadratic space complexity parallel multiplication in GF($2^n$) using the shifted polynomial basis. Both the space complexity and the asymptotic gate delay of the proposed multiplier are better than those of the best existing subquadratic space complexity parallel multipliers. For example, with n being a power of 2, the space complexity is about 8 percent better, while the asymptotic gate delay is about 33 percent better, respectively. Another advantage of the proposed matrix-vector product approach is that it can also be used to design subquadratic space complexity polynomial, dual, weakly dual, and triangular basis parallel multipliers. To the best of our knowledge, this is the first time that subquadratic space complexity parallel multipliers are proposed for dual, weakly dual, and triangular bases. A recursive design algorithm is also proposed for efficient construction of the proposed subquadratic space complexity multipliers. This design algorithm can be modified for the construction of most of the subquadratic space complexity multipliers previously reported in the literature.

141 citations

Book Chapter
19 Aug 2009
TL;DR: The paper's field-arithmetic techniques can be applied in much more generality but have a particularly efficient interaction with the completeness of addition formulas for binary Edwards curves.
Abstract: This paper sets new software speed records for high-security Diffie-Hellman computations, specifically 251-bit elliptic-curve variable-base-point scalar multiplication. In one second of computation on a $200 Core 2 Quad Q6600 CPU, this paper's software performs 30000 251-bit scalar multiplications on the binary Edwards curve $d(x + x^2 + y + y^2) = (x + x^2)(y + y^2)$ over the field ${\bf F}_2[t]/(t^{251}+t^7+t^4+t^2+1)$, where $d = t^{57} + t^{54} + t^{44} + 1$. The paper's field-arithmetic techniques can be applied in much more generality but have a particularly efficient interaction with the completeness of addition formulas for binary Edwards curves.

137 citations

References
Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations

Book
01 Jan 1981
TL;DR: This book explains the development of the Fast Fourier Transform Algorithm and its applications in Number Theory and Polynomial Algebra, as well as some examples of its application in Quantization Effects.
Abstract: 1 Introduction.- 1.1 Introductory Remarks.- 1.2 Notations.- 1.3 The Structure of the Book.- 2 Elements of Number Theory and Polynomial Algebra.- 2.1 Elementary Number Theory.- 2.1.1 Divisibility of Integers.- 2.1.2 Congruences and Residues.- 2.1.3 Primitive Roots.- 2.1.4 Quadratic Residues.- 2.1.5 Mersenne and Fermat Numbers.- 2.2 Polynomial Algebra.- 2.2.1 Groups.- 2.2.2 Rings and Fields.- 2.2.3 Residue Polynomials.- 2.2.4 Convolution and Polynomial Product Algorithms in Polynomial Algebra.- 3 Fast Convolution Algorithms.- 3.1 Digital Filtering Using Cyclic Convolutions.- 3.1.1 Overlap-Add Algorithm.- 3.1.2 Overlap-Save Algorithm.- 3.2 Computation of Short Convolutions and Polynomial Products.- 3.2.1 Computation of Short Convolutions by the Chinese Remainder Theorem.- 3.2.2 Multiplications Modulo Cyclotomic Polynomials.- 3.2.3 Matrix Exchange Algorithm.- 3.3 Computation of Large Convolutions by Nesting of Small Convolutions.- 3.3.1 The Agarwal-Cooley Algorithm.- 3.3.2 The Split Nesting Algorithm.- 3.3.3 Complex Convolutions.- 3.3.4 Optimum Block Length for Digital Filters.- 3.4 Digital Filtering by Multidimensional Techniques.- 3.5 Computation of Convolutions by Recursive Nesting of Polynomials.- 3.6 Distributed Arithmetic.- 3.7 Short Convolution and Polynomial Product Algorithms.- 3.7.1 Short Circular Convolution Algorithms.- 3.7.2 Short Polynomial Product Algorithms.- 3.7.3 Short Aperiodic Convolution Algorithms.- 4 The Fast Fourier Transform.- 4.1 The Discrete Fourier Transform.- 4.1.1 Properties of the DFT.- 4.1.2 DFTs of Real Sequences.- 4.1.3 DFTs of Odd and Even Sequences.- 4.2 The Fast Fourier Transform Algorithm.- 4.2.1 The Radix-2 FFT Algorithm.- 4.2.2 The Radix-4 FFT Algorithm.- 4.2.3 Implementation of FFT Algorithms.- 4.2.4 Quantization Effects in the FFT.- 4.3 The Rader-Brenner FFT.- 4.4 Multidimensional FFTs.- 4.5 The Bruun Algorithm.- 4.6 FFT Computation of Convolutions.- 5 Linear Filtering Computation of Discrete Fourier Transforms.- 5.1 The Chirp z-Transform Algorithm.- 5.1.1 Real Time Computation of Convolutions and DFTs Using the Chirp z-Transform.- 5.1.2 Recursive Computation of the Chirp z-Transform.- 5.1.3 Factorizations in the Chirp Filter.- 5.2 Rader's Algorithm.- 5.2.1 Composite Algorithms.- 5.2.2 Polynomial Formulation of Rader's Algorithm.- 5.2.3 Short DFT Algorithms.- 5.3 The Prime Factor FFT.- 5.3.1 Multidimensional Mapping of One-Dimensional DFTs.- 5.3.2 The Prime Factor Algorithm.- 5.3.3 The Split Prime Factor Algorithm.- 5.4 The Winograd Fourier Transform Algorithm (WFTA).- 5.4.1 Derivation of the Algorithm.- 5.4.2 Hybrid Algorithms.- 5.4.3 Split Nesting Algorithms.- 5.4.4 Multidimensional DFTs.- 5.4.5 Programming and Quantization Noise Issues.- 5.5 Short DFT Algorithms.- 5.5.1 2-Point DFT.- 5.5.2 3-Point DFT.- 5.5.3 4-Point DFT.- 5.5.4 5-Point DFT.- 5.5.5 7-Point DFT.- 5.5.6 8-Point DFT.- 5.5.7 9-Point DFT.- 5.5.8 16-Point DFT.- 6 Polynomial Transforms.- 6.1 Introduction to Polynomial Transforms.- 6.2 General Definition of Polynomial Transforms.- 6.2.1 Polynomial Transforms with Roots in a Field of Polynomials.- 6.2.2 Polynomial Transforms with Composite Roots.- 6.3 Computation of Polynomial Transforms and Reductions.- 6.4 Two-Dimensional Filtering Using Polynomial Transforms.- 6.4.1 Two-Dimensional Convolutions Evaluated by Polynomial Transforms and Polynomial Product Algorithms.- 6.4.2 Example of a Two-Dimensional Convolution Computed by Polynomial Transforms.- 6.4.3 Nesting Algorithms.- 6.4.4 Comparison with Conventional Convolution Algorithms.- 6.5 Polynomial Transforms Defined in Modified Rings.- 6.6 Complex Convolutions.- 6.7 Multidimensional Polynomial Transforms.- 7 Computation of Discrete Fourier Transforms by Polynomial Transforms.- 7.1 Computation of Multidimensional DFTs by Polynomial Transforms.- 7.1.1 The Reduced DFT Algorithm.- 7.1.2 General Definition of the Algorithm.- 7.1.3 Multidimensional DFTs.- 7.1.4 Nesting and Prime Factor Algorithms.- 7.1.5 DFT Computation Using Polynomial Transforms Defined in Modified Rings of Polynomials.- 7.2 DFTs Evaluated by Multidimensional Correlations and Polynomial Transforms.- 7.2.1 Derivation of the Algorithm.- 7.2.2 Combination of the Two Polynomial Transform Methods.- 7.3 Comparison with the Conventional FFT.- 7.4 Odd DFT Algorithms.- 7.4.1 Reduced DFT Algorithm. N = 4.- 7.4.2 Reduced DFT Algorithm. N = 8.- 7.4.3 Reduced DFT Algorithm. N = 9.- 7.4.4 Reduced DFT Algorithm. N = 16.- 8 Number Theoretic Transforms.- 8.1 Definition of the Number Theoretic Transforms.- 8.1.1 General Properties of NTTs.- 8.2 Mersenne Transforms.- 8.2.1 Definition of Mersenne Transforms.- 8.2.2 Arithmetic Modulo Mersenne Numbers.- 8.2.3 Illustrative Example.- 8.3 Fermat Number Transforms.- 8.3.1 Definition of Fermat Number Transforms.- 8.3.2 Arithmetic Modulo Fermat Numbers.- 8.3.3 Computation of Complex Convolutions by FNTs.- 8.4 Word Length and Transform Length Limitations.- 8.5 Pseudo Transforms.- 8.5.1 Pseudo Mersenne Transforms.- 8.5.2 Pseudo Fermat Number Transforms.- 8.6 Complex NTTs.- 8.7 Comparison with the FFT.- Appendix A Relationship Between DFT and Convolution Polynomial Transform Algorithms.- A.1 Computation of Multidimensional DFT's by the Inverse Polynomial Transform Algorithm.- A.1.1 The Inverse Polynomial Transform Algorithm.- A.1.2 Complex Polynomial Transform Algorithms.- A.1.3 Round-off Error Analysis.- A.2 Computation of Multidimensional Convolutions by a Combination of the Direct and Inverse Polynomial Transform Methods.- A.2.1 Computation of Convolutions by DFT Polynomial Transform Algorithms.- A.2.2 Convolution Algorithms Based on Polynomial Transforms and Permutations.- A.3 Computation of Multidimensional Discrete Cosine Transforms by Polynomial Transforms.- A.3.1 Computation of Direct Multidimensional DCT's.- A.3.2 Computation of Inverse Multidimensional DCT's.- Appendix B Short Polynomial Product Algorithms.- Problems.- References.

867 citations

Journal Article
Shmuel Winograd
TL;DR: The system of bilinear forms defined by a product of two polynomials modulo a third, P, is considered, and it is shown that the number of multiplications depends on how the field of constants used in the algorithm splits P.
Abstract: In this paper we consider the system of bilinear forms which are defined by a product of two polynomials modulo a third, $P$. We show that the number of multiplications depends on how the field of constants used in the algorithm splits $P$. If $P = \prod_{i=1}^{k} P_i^{l_i}$, then $2 \cdot \deg(P) - k$ multiplications are needed (we assume that each $P_i$ is irreducible).
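
A short worked example may help in reading the count $2 \cdot \deg(P) - k$. The two cases below are an illustration under the assumption that the field of constants is the rationals; they are standard textbook instances and are not taken from the paper. Write $A = a_0 + a_1 x$ and $B = b_0 + b_1 x$.

$$P = x^2 + 1 \ \text{(irreducible over } \mathbb{Q}\text{)}: \quad k = 1, \quad 2 \cdot 2 - 1 = 3, \qquad AB \bmod P = (a_0 b_0 - a_1 b_1) + \big((a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1\big)\, x$$

$$P = x^2 - 1 = (x - 1)(x + 1): \quad k = 2, \quad 2 \cdot 2 - 2 = 2, \qquad AB \bmod P = \tfrac{1}{2}(m_1 + m_2) + \tfrac{1}{2}(m_1 - m_2)\, x, \quad m_1 = (a_0 + a_1)(b_0 + b_1), \quad m_2 = (a_0 - a_1)(b_0 - b_1)$$

In the first case the three products are $a_0 b_0$, $a_1 b_1$, and $(a_0 + a_1)(b_0 + b_1)$, which is the familiar three-multiplication complex product. In the second case the factor $\tfrac{1}{2}$ is a field constant and is not counted as a multiplication.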

154 citations