A family of scalable FFT architectures and an implementation of 1024-point radix-2 FFT for real-time communications

doi:10.1109/ICCD.2008.4751880

Proceedings Article

Adnan Suleiman¹, Hani Saleh², Adel Hussein³, David Akopian³

Cirrus Logic¹, Intel², University of Texas at San Antonio³

01 Oct 2008-pp 321-327

TL;DR: A family of architectures for FFT implementation, based on the decomposition of the perfect shuffle permutation, is presented. The architectures can be designed with a variable number of processing elements, giving designers a trade-off between speed and complexity.

Abstract: The paper presents a family of architectures for FFT implementation based on the decomposition of the perfect shuffle permutation, which can be designed with a variable number of processing elements. This gives designers a trade-off between speed and complexity (cost and area). A detailed case study is provided on the implementation of a 1024-point FFT with 2 processing elements in a 45 nm process technology, including area, timing, power, and place-and-route results.
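
The perfect shuffle interleaves the two halves of an N-point sequence, and a radix-2 FFT can be rewritten so that every stage is the same set of N/2 butterflies on pairs (p, p + N/2) followed by one perfect shuffle (the constant-geometry form usually associated with Pease). The Python sketch below illustrates only that general idea; it is not the authors' decomposition or hardware. The names `perfect_shuffle`, `shuffle_fft`, and `ideal_cycles` are hypothetical, and `ideal_cycles` assumes an idealized model in which each processing element (PE) completes one butterfly per clock with no memory conflicts or pipeline latency.

```python
import cmath
import math


def perfect_shuffle(x):
    """Interleave the two halves of x: out[2k] = x[k], out[2k + 1] = x[k + N/2]."""
    half = len(x) // 2
    out = [None] * len(x)
    for k in range(half):
        out[2 * k] = x[k]
        out[2 * k + 1] = x[k + half]
    return out


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def shuffle_fft(x):
    """Constant-geometry radix-2 DIF FFT (forward, W = e^{-2*pi*i/N}).

    Every stage applies N/2 butterflies to the pairs (p, p + N/2) and then the
    same perfect shuffle, so the data routing is identical in every stage.
    """
    n_points = len(x)
    stages = n_points.bit_length() - 1
    assert n_points == 1 << stages, "length must be a power of two"
    half = n_points // 2
    d = [complex(v) for v in x]
    for s in range(stages):
        top, bottom = [], []
        for p in range(half):
            a, b = d[p], d[p + half]
            k = (p >> s) << s  # twiddle exponent: p with its s low bits cleared
            top.append(a + b)
            bottom.append((a - b) * cmath.exp(-2j * math.pi * k / n_points))
        d = perfect_shuffle(top + bottom)
    # After log2(N) stages the spectrum sits in bit-reversed order.
    return [d[bit_reverse(k, stages)] for k in range(n_points)]


def ideal_cycles(n_points, num_pe):
    """Hypothetical cost model: each PE finishes one butterfly per clock, no stalls."""
    butterflies = (n_points // 2) * (n_points.bit_length() - 1)
    return math.ceil(butterflies / num_pe)
```

Under that idealized model a 1024-point transform has 5,120 butterflies, so `ideal_cycles(1024, 2)` gives 2,560 cycles and doubling the number of PEs roughly halves it. This is only a rough picture of the speed-versus-complexity trade-off; a real design also pays for extra PEs in memory ports and routing.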

Citations

Book Chapter

Towards efficient arithmetic for lattice-based cryptography on reconfigurable hardware

Thomas Pöppelmann¹, Tim Güneysu¹

Ruhr University Bochum¹

07 Oct 2012

TL;DR: This work makes a first step towards efficient FFT-based arithmetic for lattice-based cryptography and shows that the FFT can be implemented efficiently on reconfigurable hardware.

Abstract: In recent years, lattice-based cryptography has emerged as a quantum-secure and theoretically elegant alternative to classical cryptographic schemes (like ECC or RSA). In addition, lattices are a versatile tool and play an important role in the development of efficient fully or somewhat homomorphic encryption (SHE/FHE) schemes. In practice, ideal lattices defined in the polynomial ring $\mathbb{Z}_p[x]/\langle x^n+1\rangle$ allow the reduction of the generally very large key sizes of lattice constructions. Another advantage of ideal lattices is that polynomial multiplication is a basic operation that has, in theory, only quasi-linear time complexity of $\mathcal{O}(n \log n)$ in $\mathbb{Z}_p[x]/\langle x^n+1\rangle$. However, little is known about the practical performance of the FFT in this specific application domain and whether it is really an alternative. In this work we make a first step towards efficient FFT-based arithmetic for lattice-based cryptography and show that the FFT can be implemented efficiently on reconfigurable hardware. We give instantiations of recently proposed parameter sets for homomorphic and public-key encryption. In a generic setting, we are able to multiply polynomials with up to 4096 coefficients and a 17-bit prime in less than 0.5 milliseconds. For a parameter set of an SHE scheme (n = 1024, p = 1061093377), our implementation performs 9063 polynomial multiplications per second on a mid-range Spartan-6.
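
Multiplication in $\mathbb{Z}_p[x]/\langle x^n+1\rangle$ is commonly done with the number-theoretic transform (NTT), the FFT carried out over $\mathbb{Z}_p$: weighting coefficient i by $\psi^i$, where $\psi$ is a primitive 2n-th root of unity mod p, turns the negacyclic product into an ordinary cyclic convolution. The Python sketch below is a generic textbook version, not the cited FPGA design. The helper names are hypothetical, it assumes p is prime with 2n dividing p - 1, and it needs Python 3.8+ for modular inverses via `pow(x, -1, p)`.

```python
def find_psi(p, n):
    """Return a primitive 2n-th root of unity mod the prime p (requires 2n | p - 1)."""
    assert (p - 1) % (2 * n) == 0
    for g in range(2, p):
        psi = pow(g, (p - 1) // (2 * n), p)
        if pow(psi, n, p) == p - 1:  # psi^n = -1, so psi has order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, p):
    """Recursive radix-2 DIT number-theoretic transform (cyclic, length n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], omega * omega % p, p)
    odd = ntt(a[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p  # omega^(n/2) = -1
        w = w * omega % p
    return out


def negacyclic_mul(a, b, p):
    """Multiply a and b in Z_p[x]/(x^n + 1) with O(n log n) NTT work."""
    n = len(a)
    psi = find_psi(p, n)
    omega = psi * psi % p
    # Pre-weighting by psi^i turns the negacyclic product into a cyclic one.
    pw = [pow(psi, i, p) for i in range(n)]
    fa = ntt([x * w % p for x, w in zip(a, pw)], omega, p)
    fb = ntt([x * w % p for x, w in zip(b, pw)], omega, p)
    fc = [x * y % p for x, y in zip(fa, fb)]
    # Inverse NTT: transform with omega^-1, scale by n^-1, then undo the weights.
    c = ntt(fc, pow(omega, -1, p), p)
    n_inv = pow(n, -1, p)
    psi_inv = pow(psi, -1, p)
    return [x * n_inv % p * pow(psi_inv, i, p) % p for i, x in enumerate(c)]
```

For example, with the toy prime p = 12289 (12288 = 3 · 2^12), `negacyclic_mul(a, b, 12289)` works for any n up to 2048, including n = 256.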

157 citations

Proceedings Article

Low-power application-specific processor for FFT computations

Teemu Pitkänen¹, Jarmo Takala¹

Tampere University of Technology¹

19 Apr 2009

TL;DR: Analysis of a processor architecture tailored for radix-4 and mixed-radix FFT algorithms shows that a programmable solution can possess energy-efficiency comparable to a fixed-function ASIC.

Abstract: In this paper, we describe a processor architecture tailored for radix-4 and mixed-radix FFT algorithms, which have lower arithmetic complexity than radix-2 algorithms. The processor is based on the transport triggered architecture, and several optimizations have been used to improve its energy efficiency. The processor has been synthesized on a 130 nm standard-cell technology, and analysis shows that a programmable solution can achieve energy efficiency comparable to a fixed-function ASIC.

17 citations

Journal Article

Low-Power Application-Specific Processor for FFT Computations

Teemu Pitkänen¹, Jarmo Takala¹

Tampere University of Technology¹

01 Apr 2011

TL;DR: A processor architecture tailored for radix-4 and mixed-radix FFT computations is described and experiments show that a programmable solution can possess energy-efficiency comparable to fixed-function ASICs.

Abstract: In this paper, a processor architecture tailored for radix-4 and mixed-radix FFT computations is described. The processor has native support for power-of-two transform sizes. Several optimizations have been used to improve the energy-efficiency of the processor and experiments show that a programmable solution can possess energy-efficiency comparable to fixed-function ASICs.

14 citations

Book

Progress in Cryptology – LATINCRYPT 2012

Alejandro Hevia, Gregory Neven

01 Jan 2012

TL;DR: This paper shows that, by specializing the construction of Shallue and van de Woestijne to BN curves, one obtains an encoding function that can be implemented rather efficiently and securely, and that is well-distributed in the sense of Farashahi et al., so that one can easily build from it a hash function that is indifferentiable from a random oracle.

Abstract: A number of recent works have considered the problem of constructing constant-time hash functions to various families of elliptic curves over finite fields. In the relevant literature, it has been occasionally asserted that constant-time hashing to certain special elliptic curves, in particular so-called BN elliptic curves, was an open problem. It turns out, however, that a suitably general encoding function was constructed by Shallue and van de Woestijne back in 2006. In this paper, we show that, by specializing the construction of Shallue and van de Woestijne to BN curves, one obtains an encoding function that can be implemented rather efficiently and securely, that reaches about 9/16ths of all points on the curve, and that is well-distributed in the sense of Farashahi et al., so that one can easily build from it a hash function that is indifferentiable from a random oracle.

14 citations

Proceedings Article

Efficient hardware implementation of scalable FFT using configurable Radix-4/2

Senthilkumar Ranganathan¹, Ravikumar Krishnan², H S Sriharsha

Institution of Engineers¹, KCG College of Technology²

06 Mar 2014

TL;DR: This paper demonstrates an FPGA implementation of the FFT algorithm designed for efficiency in both area and performance, with a configurable number of FFT input points, which makes it well suited for wireless and signal processing applications.

Abstract: This paper demonstrates an FPGA implementation of the FFT algorithm designed for efficiency in both area and performance, with a configurable number of FFT input points, which makes it well suited for wireless and signal processing applications. An optimized architecture is demonstrated for computing FFTs of length 8/16/32/64/128/512 and 1024 using Radix-4/Radix 2∗2 FFT on an FPGA, and it is compared with the Xilinx LogiCore™ FFT IP with configurable point size. The proposed design is found to be more efficient in terms of area and performance while also providing input-size configurability. A novel address generator architecture is proposed to serve the Complex Math Processor (CMP); this single generator carries out the address mapping scheme. Hardware overhead is minimized by multiplexing the complex arithmetic. The entire RTL design is described in Verilog HDL and simulated with Xilinx ISim. The design was tested on the Spartan-6 XC6SLX4, the smallest device in the Spartan-6 family, on which the Xilinx FFT IP core requires more DSP48 slices than are available. The result shows 538 LUTs, 847 flip-flops, 3 DSP slices, and a maximum frequency of 217 MHz. This is about a 60% improvement in resource usage and a 14% improvement in performance, yielding a low-cost configurable FFT processor.
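
Sizes such as 8, 32, 128, and 512 are powers of two but not of four, so a configurable radix-4/2 scheme needs one radix-2 stage whenever log2 N is odd. The Python sketch below shows that split in software as a generic recursive mixed-radix FFT; the names are hypothetical, and it does not model the paper's address generator or FPGA datapath.

```python
import cmath
import math


def stage_plan(n):
    """(radix-4 stages, radix-2 stages) for a power-of-two FFT size n."""
    m = n.bit_length() - 1
    assert n == 1 << m, "size must be a power of two"
    return m // 2, m % 2


def fft_r4_2(x):
    """Recursive mixed radix-4/2 DIT FFT (W = e^{-2*pi*i/N}) for any power-of-two length."""
    n = len(x)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of two"
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        # Odd log2(n): a single radix-2 butterfly finishes the recursion.
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    quarter = n // 4
    f = [fft_r4_2(x[r::4]) for r in range(4)]
    out = [0j] * n
    for q in range(quarter):
        # Twiddle the four sub-results, then a 4-point DFT with only +/-1, +/-j factors.
        y = [f[r][q] * cmath.exp(-2j * math.pi * r * q / n) for r in range(4)]
        out[q] = y[0] + y[1] + y[2] + y[3]
        out[q + quarter] = y[0] - 1j * y[1] - y[2] + 1j * y[3]
        out[q + 2 * quarter] = y[0] - y[1] + y[2] - y[3]
        out[q + 3 * quarter] = y[0] + 1j * y[1] - y[2] - 1j * y[3]
    return out
```

With this plan, `stage_plan(1024)` is (5, 0) while `stage_plan(512)` is (4, 1), so a single datapath only has to switch the final stage between radix-4 and radix-2.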

5 citations

References

Journal Article

An algorithm for the machine calculation of complex Fourier series

J.W. Cooley, John W. Tukey

01 Apr 1965-Mathematics of Computation

TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series; these algorithms apply to certain problems in which one must multiply an N-vector by an N × N matrix that can be factored into m sparse matrices.

Abstract: An efficient method for the calculation of the interactions of a 2^m factorial experiment was introduced by Yates and is widely known by his name. The generalization to 3^m was given by Box et al. (1). Good (2) generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series. In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices, where m is proportional to log N. This results in a procedure requiring a number of operations proportional to N log N rather than N². These methods are applied here to the calculation of complex Fourier series. They are useful in situations where the number of data points is, or can be chosen to be, a highly composite number. The algorithm is here derived and presented in a rather different form. Attention is given to the choice of N. It is also shown how special advantage can be obtained in the use of a binary computer with N = 2^m and how the entire calculation can be performed within the array of N data storage locations used for the given Fourier coefficients. Consider the problem of calculating the complex Fourier series

$$X(j) = \sum_{k=0}^{N-1} A(k)\, W^{jk}, \qquad j = 0, 1, \ldots, N-1. \qquad (1)$$
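
Equation (1) can be evaluated term by term with N² multiplications or, for N = 2^m, by repeatedly splitting it into even- and odd-indexed halves, which gives the N log N behavior described above. The Python sketch below compares the two and counts the radix-2 operations. It assumes W = e^{2πi/N} (flip the sign of the exponent for the e^{-2πi/N} convention common in engineering texts), and the function names are hypothetical.

```python
import cmath
import math


def direct_series(a):
    """Evaluate X(j) = sum_k A(k) W^(jk) term by term: N^2 multiplications."""
    n = len(a)
    # Assumption: W = e^{2*pi*i/N}, the principal Nth root of unity.
    return [sum(a[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fast_series(a):
    """Same sums for N = 2^m by splitting into even- and odd-indexed terms."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    assert n % 2 == 0, "length must be a power of two"
    even, odd = fast_series(a[0::2]), fast_series(a[1::2])
    half = n // 2
    out = [0j] * n
    for j in range(half):
        t = cmath.exp(2j * math.pi * j / n) * odd[j]  # W^j times the odd half
        out[j] = even[j] + t
        out[j + half] = even[j] - t                  # uses W^(N/2) = -1
    return out


def radix2_counts(n):
    """(complex multiplications, complex additions) for an N = 2^m radix-2 FFT:
    N/2 butterflies per stage, one multiply (trivial twiddles included) and two adds each."""
    butterflies = (n // 2) * (n.bit_length() - 1)
    return butterflies, 2 * butterflies
```

For N = 1024, `radix2_counts(1024)` returns (5120, 10240), against about 1024² ≈ 1.05 million multiplications for the direct sum; the "N log2 N complex operations" figure quoted by the citing paper corresponds to the addition count.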

11,795 citations

"A family of scalable FFT architectu..." refers methods in this paper

...The Fast Fourier Transform (FFT) is a conventional method for an accelerated computation of the Discrete Fourier Transform (DFT) [1], which has been used in many applications such as spectrum estimation, fast convolution and correlation, signal modulation, etc....
[...]
...Also, FFT algorithms come in two types: DIT (decimation in time), where the complex multiplication occurs before the two-point DFT, and DIF (decimation in frequency), where the complex multiplication occurs after the two-point DFT....
[...]
...In [1], the radix-2 FFT of N points (where N is an integer power of 2) requires on the order of N log2 N complex operations, compared to N² for direct DFT computation....
[...]
...Many FFT algorithms relate to the "butterfly structure" first presented by Cooley and Tukey [1], where a separate processing element (PE) is assigned to each node of the FFT flow....
[...]
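
The DIT and DIF butterflies differ only in where the twiddle factor w is applied: a decimation-in-time butterfly multiplies the lower input before the 2-point DFT, while a decimation-in-frequency butterfly multiplies the difference after it. The Python sketch below (hypothetical function names, W4 = e^{-2πi/4} = -j) builds a 4-point transform both ways. The DIF version reads natural-order input; the DIT version pairs inputs (0, 2) and (1, 3) in its first stage, which is the bit-reversed reading order for N = 4.

```python
def dit_butterfly(a, b, w):
    """Decimation in time: multiply the lower input by w, then the 2-point DFT."""
    t = w * b
    return a + t, a - t


def dif_butterfly(a, b, w):
    """Decimation in frequency: 2-point DFT first, then multiply the difference by w."""
    return a + b, (a - b) * w


W4 = -1j  # e^{-2*pi*i/4}


def dit4(x):
    """4-point DIT FFT; stage 1 pairs inputs (0, 2) and (1, 3), the bit-reversed order."""
    b0, b1 = dit_butterfly(x[0], x[2], 1)
    b2, b3 = dit_butterfly(x[1], x[3], 1)
    X0, X2 = dit_butterfly(b0, b2, 1)
    X1, X3 = dit_butterfly(b1, b3, W4)     # twiddle applied before the add/subtract
    return [X0, X1, X2, X3]


def dif4(x):
    """4-point DIF FFT on natural-order input."""
    a0, a2 = dif_butterfly(x[0], x[2], 1)
    a1, a3 = dif_butterfly(x[1], x[3], W4)  # twiddle applied after the subtract
    X0, X2 = dif_butterfly(a0, a1, 1)
    X1, X3 = dif_butterfly(a2, a3, 1)
    return [X0, X1, X2, X3]
```

For the input [1, 2, 3, 4], both functions give 10, -2+2j, -2, and -2-2j, the 4-point DFT in natural order.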

Book

Theory and application of digital signal processing

Lawrence R. Rabiner, Ben Gold, C. K. Yuen¹

Australian National University¹

01 Jan 1975

TL;DR: Bellman and Wing introduced the simplicity of the invariant imbedding method for tackling various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.

Abstract: sprightly style and is interesting from cover to cover. The comments, critiques, and summaries that accompany the chapters are very helpful in crystallizing the ideas and answering questions that may arise, particularly for the self-learner. The transparency in the presentation of the material in the book equips the reader to proceed quickly to a wealth of problems included at the end of each chapter. These problems, ranging from elementary to research level, are very valuable in that a solid working knowledge of the invariant imbedding techniques is acquired, as well as good insight into attacking problems in various applied areas. Furthermore, a useful selection of references is given at the end of each chapter. This book may not appeal to those mathematicians who are interested primarily in the sophistication of mathematical theory, because the authors have deliberately avoided all pseudo-sophistication in attaining transparency of exposition. Precisely for the same reason, the majority of the intended readers, who are applications-oriented and eager to use the techniques quickly in their own fields, will welcome and appreciate the efforts put into writing this book. From a purely mathematical point of view, some of the invariant imbedding results may be considered generalizations of the classical theory of first-order partial differential equations, and a part of the analysis of invariant imbedding is still at a somewhat heuristic stage despite successes in many computational applications. However, those who are concerned with mathematical rigor will find opportunities to explore the foundations of the invariant imbedding method. In conclusion, let me quote the following: "What is the best method to obtain the solution to a problem? The answer is, any way that works." (Richard P. Feynman, Engineering and Science, March 1965, Vol. XXVIII, no. 6, p. 9.)
In this well-written book, Bellman and Wing have indeed accomplished the task of introducing the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.

3,249 citations

Journal Article

Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementations

Wold¹, Despain¹

University of California, Berkeley¹

01 May 1984-IEEE Transactions on Computers

TL;DR: VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.

Abstract: In some signal processing applications, it is desirable to build very-high-performance fast Fourier transform (FFT) processors. To meet the performance requirements, these processors are typically highly pipelined. Until the advent of VLSI, it was not possible to build a single chip which could be used to construct pipeline FFT processors of a reasonable size. However, VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.

327 citations

"A family of scalable FFT architectu..." refers background in this paper

...A variety of pipeline FFTs have been implemented [6]-[9]....
[...]

Journal Article

A low-power, high-performance, 1024-point FFT processor

Bevan M. Baas¹

Stanford University¹

01 Mar 1999-IEEE Journal of Solid-state Circuits

TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 μm CMOS process and is fully functional on first-pass silicon.

Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460,000-transistor design has been fabricated in a standard 0.7 μm (L_poly = 0.6 μm) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 μs while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz, which is a clock rate 2.6 times greater than the previously fastest rate.
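
As a back-of-the-envelope check on the 1.1 V figures, energy per transform is power times time. The Python lines below assume the 9.5 mW is the average power over the full 330 μs transform, and they do not attempt to reproduce the paper's adjusted energy-efficiency metric; the variable names are illustrative only.

```python
# Figures taken from the abstract (1.1 V operating point).
power_w = 9.5e-3      # 9.5 mW
latency_s = 330e-6    # 330 us per 1024-point complex FFT
points = 1024

# Assumption: power is averaged over the whole transform.
energy_per_fft_j = power_w * latency_s             # roughly 3.1 uJ
energy_per_point_j = energy_per_fft_j / points     # roughly 3.1 nJ
ffts_per_second = 1.0 / latency_s                  # roughly 3,030
samples_per_second = points * ffts_per_second      # roughly 3.1 Msample/s

print(f"{energy_per_fft_j * 1e6:.3f} uJ per FFT, {ffts_per_second:.0f} FFTs/s")
```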

319 citations

Proceedings Article

Design and implementation of a 1024-point pipeline FFT processor

Shousheng He¹, M. Torkelson¹

Lund University¹

11 May 1998

TL;DR: By exploiting the spatial regularity of the new algorithm, minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor.

Abstract: The design and implementation of a 1024-point pipeline FFT processor is presented. The architecture is based on a new form of FFT, the radix-2² algorithm. By exploiting the spatial regularity of the new algorithm, the minimum requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor. The chip has been implemented in 0.5 μm CMOS technology and takes an area of 40 mm². With a 3.3 V power supply, it can compute 2^n-point (n = 0, 1, ..., 10) complex forward and inverse FFTs in real time with up to 30 MHz sampling frequency. The SQNR is above 50 dB for white-noise input.
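
In radix-2² algorithms, the twiddle factor between the two butterfly stages of each radix-4 step is trivial: a multiplication by -j, which hardware can implement by swapping the real and imaginary parts and negating one of them. This is why general complex multipliers are needed only after every second butterfly stage. The Python sketch below (hypothetical names, values held as (re, im) integer pairs, W = e^{-2πi/4}) computes a 4-point DFT with additions, subtractions, and that swap alone; it is not He and Torkelson's pipeline.

```python
def mul_minus_j(re, im):
    """(re + j*im) * (-j) = im - j*re: swap the parts and negate one; no multiplier."""
    return im, -re


def bf2(a, b):
    """Radix-2 butterfly on (re, im) pairs: additions and subtractions only."""
    return (a[0] + b[0], a[1] + b[1]), (a[0] - b[0], a[1] - b[1])


def dft4_without_multipliers(x):
    """4-point DFT (W = e^{-2*pi*i/4}) from two butterfly stages and one -j rotation."""
    s0, d0 = bf2(x[0], x[2])
    s1, d1 = bf2(x[1], x[3])
    d1 = mul_minus_j(*d1)  # the trivial twiddle between the two stages
    X0, X2 = bf2(s0, s1)
    X1, X3 = bf2(d0, d1)
    return [X0, X1, X2, X3]
```

For the input 1, 2, 3, 4 (as (re, im) pairs with zero imaginary parts), this yields (10, 0), (-2, 2), (-2, 0), (-2, -2), the same spectrum a general-purpose FFT would give.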

243 citations