FFT Radix-2 and Radix-4 FPGA Acceleration Techniques Using HLS and HDL for Digital Communication Systems

doi:10.1109/IMCET.2018.8603064

Proceedings Article

Ghattas Akkad¹, Ali Mansour¹, Bachar El-Hassan², Frederic Le Roy¹, Mohamad Najem³

Centre national de la recherche scientifique¹, Lebanese University², Lebanese International University³

01 Nov 2018

TL;DR: This study focuses on communication systems incorporating filter-based multicarrier (FBMC) modulation, a promising candidate for 5G technology, and implements and tests various combinations using finite precision, HLS tools and HDL while promoting parallelization, pipelining and hardware-reuse architectures.

Abstract: Fast Fourier Transform (FFT) is generally implemented on reconfigurable hardware in several signal processing and digital communication applications. It can be considered one of the most time- and resource-consuming operations because it requires complex operations. The main aim of this manuscript is to investigate the contribution of High Level Synthesis (HLS) techniques to the implementation of real-time FFT algorithms on field programmable gate arrays (FPGAs). In particular, this study focuses on communication systems incorporating filter-based multicarrier (FBMC) modulation, a promising candidate for 5G technology. In order to evaluate the contribution of HLS, we implemented and tested various combinations, such as 8- and 16-point radix-2 and radix-4 FFTs, using finite precision, HLS tools and HDL while promoting parallelization, pipelining and hardware-reuse architectures.
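
As a point of reference for the radix-2 designs mentioned in the abstract, the following Python sketch is a floating-point golden model of an iterative radix-2 decimation-in-frequency (DIF) FFT, checked against a direct DFT. It is an illustrative assumption, not the authors' HLS or HDL code; the names `fft_radix2_dif`, `bit_reverse` and `dft` are hypothetical. Each pass of the outer loop is one butterfly stage of the flow graph, which is the unit an HLS tool could pipeline or an HDL design could instantiate in parallel.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2_dif(x):
    """Iterative radix-2 DIF FFT; the length must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    span = n // 2
    while span >= 1:  # one iteration per butterfly stage (log2(n) stages)
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
                top, bot = a[start + k], a[start + k + span]
                a[start + k] = top + bot               # upper butterfly output
                a[start + k + span] = (top - bot) * w  # lower output, then twiddle
        span //= 2
    # DIF produces its outputs in bit-reversed order; restore natural order.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[bit_reverse(i, bits)] = a[i]
    return out


x8 = [1, 2, 3, 4, 0, 0, 0, 0]
print(all(abs(p - q) < 1e-9 for p, q in zip(fft_radix2_dif(x8), dft(x8))))  # True
```

The final bit-reversal loop is the software counterpart of the output reordering that a hardware DIF pipeline must also perform, either explicitly or by addressing its output memory in bit-reversed order.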

Citations

Proceedings Article

Low Complexity Robust Adaptive Beamformer Based On Parallel RLMS and Kalman RLMS

Ghattas Akkad, Ali Mansour, Bachar El-Hassan¹, Jalal Abdulsayed Srar, Mohamad Najem², Frederic Le Roy

Lebanese University¹, Lebanese International University²

01 Sep 2019

TL;DR: An enhanced, low-complexity parallel version of the cascade RLMS is presented by eliminating the need for computing the array image vector cascading stage, and a new Kalman-based parallel RLMS (RKLMS) method is proposed, where the LMS stage is replaced by a Kalman implementation of the classical LMS; the methods are compared under low Signal to Interference plus Noise Ratios (SINR).

Abstract: To ease spectral congestion and enhance frequency reuse, researchers are targeting smart antenna systems using spatial multiplexing and adaptive signal processing techniques. Moreover, the accuracy and efficiency of such systems are highly dependent on the adaptive algorithms they employ. A popular adaptive beamforming algorithm, widely used in smart antennas, is the Recursive Least Square (RLS) algorithm. While the classical RLS implementation achieves high convergence, it still suffers from its inability to track the target of interest. Recently, a new adaptive algorithm called Recursive Least Square - Least Mean Square (RLMS) has been proposed, which employs an RLS stage followed by a Least Mean Square (LMS) stage, separated by an estimate of the array image vector, i.e. the steering vector. RLMS outperforms previous RLS and LMS variants, with superior convergence and tracking capabilities, at the cost of a moderate increase in computational complexity. In this paper, an enhanced, low-complexity parallel version of the cascade RLMS is presented by eliminating the need for computing the array image vector cascading stage. Hence, for an antenna of N elements, our strategy can reduce the complexity of the system by 20N multiplications, 6N additions and 2N divisions. Moreover, a new Kalman-based parallel RLMS (RKLMS) method is also proposed, where the LMS stage is replaced by a Kalman implementation of the classical LMS, and compared under low Signal to Interference plus Noise Ratios (SINR). Simulation results show identical performance for the parallel RLMS and the cascaded RLMS at 10 dB, and superior performance and robustness for the RKLMS in low-SINR cases down to -10 dB.

10 citations

Cites background from "FFT Radix-2 and Radix-4 FPGA Accele..."

...Therefore it becomes more suitable for a hardware implementation [17], [18]....

Journal Article

A Comparative Study between HLS and HDL on SoC for Image Processing Applications.

Roberto Millon¹, Emmanuel Frati¹, Enzo Rucci²

National University of Chilecito¹, National University of La Plata²

15 Dec 2020, arXiv: Hardware Architecture

TL;DR: A comparative study between HLS and HDL for FPGA, using a Sobel filter as a case study in the image processing field, shows that the HDL implementation is slightly better than the HLS version in terms of resource usage and response time.

Abstract: The increasing complexity of today's systems and limited time to market demand new development tools for FPGAs. Currently, in addition to traditional hardware description languages (HDLs), there are high-level synthesis (HLS) tools that increase the abstraction level in system development. Despite the greater simplicity of design and testing, HLS has some drawbacks in describing hardware. This paper presents a comparative study between HLS and HDL for FPGA, using a Sobel filter as a case study in the image processing field. The results show that the HDL implementation is slightly better than the HLS version considering resource usage and response time. However, the programming effort required in the HDL solution is significantly larger than in the HLS counterpart.

6 citations

Cites result from "FFT Radix-2 and Radix-4 FPGA Accele..."

...Conversely, other studies observed better performances in one of the two implementations, either HLS [23]–[27] or HDL [28], [29]....

Journal Article

Hardware chip performance analysis of different FFT architecture

Amit Kumar¹, Adesh Kumar², Aakanksha Devrari²

Uttarakhand Technical University¹, University of Petroleum and Energy Studies²

03 Jul 2021, International Journal of Electronics

TL;DR: This work focuses on the hardware chip performance analysis of variable-length FFT processor architectures on a Field Programmable Gate Array (FPGA) platform using VHDL programming, with FFT lengths varying from 8 points to 65,536 points.

Abstract: The Fast Fourier Transform (FFT) is one of the most important algorithms used in digital signal processing (DSP) and digital communication applications to compute fast operations. FFT and IFFT are wi...

4 citations

Cites methods from "FFT Radix-2 and Radix-4 FPGA Accele..."

...Akkad et al. (2018) presented the performance of real time FFT algorithms [in a] High Level Synthesis (HLS) environment....

Journal Article

Design optimisation of multiplier-free parallel pipelined FFT on field programmable gate array

Prasanna Kumar Godi, Battula Tirumala Krishna, Pushpa Kotipalli

01 Oct 2020, IET Circuits, Devices & Systems

TL;DR: A radix-2 decimation in frequency (R2DIF) method is designed to implement an efficient FFT architecture, and it outperforms conventional methods in terms of lower power usage and higher speed.

Abstract: Fast Fourier transform (FFT) is utilised to minimise the complexity of computing the discrete Fourier transform, which converts signals from the time domain to the frequency domain and vice versa. Digital signal processing systems such as image processing, general filtering, sonar, spread-spectrum communications and convolution use FFT operations. In this study, a radix-2 decimation in frequency (R2DIF) method is designed to implement an efficient FFT architecture. Each stage of the FFT stores its input and output data using the R2DIF method. In addition, the complex twiddle factors in the FFT are replaced by the proposed uniform Montgomery algorithm. This technique performs shift-and-add operations instead of multiplications, which also enhances the convergence of the calculation. The FFT implemented with the proposed method therefore reduces chip resource usage. Based on this approach, FFTs from 16 points to 1024 points are computed, and the performance of the proposed method is compared with existing approaches. Moreover, it does not require expensive dedicated functional blocks and uses only distributed logic resources. The simulation is carried out on the Xilinx platform using Verilog. The proposed design outperforms conventional methods in terms of lower power usage and higher speed.
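
The abstract replaces twiddle-factor multiplications with shift-and-add operations through a "uniform Montgomery algorithm" whose details are not given here. As a generic illustration of the shift-and-add idea only (not that algorithm), the Python sketch below multiplies Q15 integer samples by the constant 1/sqrt(2), which appears in the 8-point twiddle factor W_8^1 = exp(-j*pi/4) = (1 - j)/sqrt(2). The power-of-two decomposition 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125 and the names `times_inv_sqrt2` and `mul_w8_1` are hypothetical choices for this example.

```python
# Hypothetical decomposition: 1/sqrt(2) ~= 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125
INV_SQRT2_SHIFTS = (1, 3, 4, 6, 8)


def times_inv_sqrt2(x):
    """Approximate x / sqrt(2) for an integer sample using only shifts and adds.

    Python's >> on negative ints is an arithmetic (flooring) shift, matching
    a sign-extending shifter in hardware.
    """
    return sum(x >> s for s in INV_SQRT2_SHIFTS)


def mul_w8_1(re, im):
    """Multiply (re + j*im) by W_8^1 = (1 - j)/sqrt(2) without a multiplier.

    (re + j*im)(1 - j) = (re + im) + j*(im - re); in hardware the two sums
    need one extra guard bit before the shift-add scaling.
    """
    return times_inv_sqrt2(re + im), times_inv_sqrt2(im - re)


# Example: 0.5 + 0.25j in Q15 is (16384, 8192)
print(mul_w8_1(16384, 8192))  # (17376, -5792)
```

The exact product is about 0.5303 - 0.1768j, i.e. roughly (17377.9, -5792.6) in Q15 units, so the five-term decomposition stays within a few LSBs while using only adders, which is the trade-off a multiplier-free design exploits.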

3 citations

Proceedings Article

Hardware Architecture For A Shift-Based Parallel Odd-Even Transposition Sorting Network

Rafic Ayoubi¹, Samer Istambouli¹, Abdel-Wahed Abbas¹, Ghattas Akkad¹

University of Balamand¹

03 Jul 2019

TL;DR: An optimized hardware architecture for a parallel Odd-Even transposition sorting network, on field programmable gate array (FPGA) based embedded systems is proposed, which results in increasing overall performance by minimizing hardware resource utilization, increasing the operating frequency and reducing complexity.

Abstract: Sorting is one of the most frequently executed routines on modern computers. Such algorithms are classically implemented as software programs and can contribute significantly to the overall execution time of a process. In this respect, implementing sorting algorithms in hardware can dramatically increase the overall performance of the applications embodying them. This paper proposes an optimized hardware architecture for a parallel Odd-Even transposition sorting network on field programmable gate array (FPGA) based embedded systems. This implementation introduces a modification of the classical Odd-Even Transposition sorting algorithm. This modification is a shift-based approach offering high flexibility for general purpose applications. The proposed architecture increases overall performance by minimizing hardware resource utilization, increasing the operating frequency and reducing complexity. Simulation and synthesis results demonstrate that the proposed architecture is minimal in size, can operate on odd- and even-length arrays, is capable of sorting arrays of length larger than two times the number of available processors, and can begin the sorting process at data input.

2 citations

Cites background or methods from "FFT Radix-2 and Radix-4 FPGA Accele..."

...p is expanded with additional input elements to form the vector m=[8,12,4,15,2,11,6,3,5,14,16,10,1,9,13,7,12,8,10,9,13,11,15,14] of 24 elements....
...The classical Odd-Even sorting simulation is conducted on an array p of N = 16 elements where p = [8, 12, 4, 15, 2, 11, 6, 3, 5, 14, 16, 10, 1, 9, 13, 7]....
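
The classical odd-even transposition sort simulated on p alternates between comparing the pairs (0,1), (2,3), ... and the pairs (1,2), (3,4), ...; N such phases are enough to sort N elements. The Python sketch below is a sequential software model of that classical algorithm, not the shift-based hardware variant proposed in the paper; in hardware, all compare-exchange operations within one phase would run in parallel. The function name is hypothetical.

```python
def odd_even_transposition_sort(values):
    """Classical odd-even transposition sort (sequential model).

    Phase t compares pairs starting at index t % 2; within a phase the
    compare-exchanges are independent, which is what makes the network
    parallel in hardware. N phases suffice for N elements.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # compare-exchange
    return a


p = [8, 12, 4, 15, 2, 11, 6, 3, 5, 14, 16, 10, 1, 9, 13, 7]
print(odd_even_transposition_sort(p))
```

For this p, which is a permutation of 1 to 16, the call returns the integers 1 through 16 in ascending order; the 24-element vector m, which contains repeated values, is handled the same way.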

References

Book

Multi-Carrier Digital Communications: Theory and Applications of OFDM

Ahmad Bahai, Burton R. Saltzberg

30 Apr 2013

TL;DR: This book offers a unified presentation of OFDM theory and high speed and wireless applications, in particular, ADSL, wireless LAN, and digital broadcasting technologies are explained.

Abstract: From the Publisher: Multi-carrier modulation, in particular orthogonal frequency division multiplexing (OFDM), has been successfully applied to a wide variety of digital communications applications for several years. Although OFDM has been chosen as the physical layer standard for a diversity of important systems, the theory, algorithms, and implementation techniques remain subjects of current interest. This book is intended to be a concise summary of the present state of the art of the theory and practice of OFDM technology. This book offers a unified presentation of OFDM theory and high speed and wireless applications. In particular, ADSL, wireless LAN, and digital broadcasting technologies are explained. It is hoped that this book will prove valuable both to developers of such systems, and to researchers and graduate students involved in analysis of digital communications, and will remain a valuable summary of the technology, providing an understanding of new advances as well as the present core technology.

755 citations

"FFT Radix-2 and Radix-4 FPGA Accele..." refers background in this paper

...OFDM is a multi-carrier modulation that increases spectral efficiency with orthogonal carriers [1] [2]....

Journal Article

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Razvan Nane¹, Vlad-Mihai Sima¹, Christian Pilato², Jongsok Choi³, Blair Fort³, Andrew Canis³, Yu Ting Chen³, Hsuan Hsiao³, Stephen J. Brown³, Fabrizio Ferrandi², Jason H. Anderson³, Koen Bertels¹

Delft University of Technology¹, Polytechnic University of Milan², University of Toronto³

01 Oct 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This work uses a first-published methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today’s system complexity. HLS allows designers to work at a higher-level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

433 citations

Proceedings Article

Design of an efficient variable-length FFT processor

Chung-Ping Hung¹, Sau-Gee Chen¹, Kun-Lung Chen¹

National Chiao Tung University¹

23 May 2004

TL;DR: In this paper, an efficient variable-length FFT processor architecture suitable for multi-mode and multi-standard OFDM communication systems is proposed; it is based on the radix-2² DIF FFT algorithm and also supports non-power-of-4 FFT computation.

Abstract: In this paper, we propose an efficient variable-length FFT processor architecture suitable for multi-mode and multi-standard OFDM communication systems. The FFT processor is based on the radix-2² DIF FFT algorithm and also supports non-power-of-4 FFT computation. The design contains an efficient processing element (PE), which can execute radix-2² butterfly (BF) operations as well as radix-2 BF operations. Moreover, in order to achieve high-performance variable-length FFT operations and data accesses, an efficient variable-length address generator and twiddle factor generator are designed. The design has the merits of low complexity and high-speed performance. The design considers seven different FFT lengths, namely 64, 256, 512, 1024, 2048, 4096 and 8192 points, which cover all the FFT lengths required by 802.11a, 802.16a, DAB, DVB-T, VDSL and ADSL.

60 citations

"FFT Radix-2 and Radix-4 FPGA Accele..." refers background or methods in this paper

...In [8], the authors propose the implementation of the FFT radix-4 [8]....
...Many hardware optimization techniques for an efficient FFT computation on FPGA have been proposed in [7-12]....
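
The radix-4 decomposition mentioned in these snippets splits an N-point DFT (N a power of 4) into four N/4-point DFTs. Writing n = m + pM and k = 4r + q with M = N/4, the DIF form is X[4r+q] = Σ_m [Σ_p x[m+pM] (-j)^(pq)] W_N^(mq) W_M^(mr). The recursive Python sketch below follows that identity as an illustrative golden model; it is neither the radix-2² processor of [8] nor the authors' 16-point design, and the function name is hypothetical. The inner 4-point butterfly uses only the trivial factors 1, -j, -1 and j, which is the usual reason radix-4 needs fewer nontrivial multiplications than radix-2.

```python
import cmath

# Powers of W_4 = -j: the only "multiplications" inside a radix-4 butterfly.
W4 = (1, -1j, -1, 1j)


def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT; the length must be a power of 4."""
    n = len(x)
    if n < 1 or n & (n - 1) or (n.bit_length() - 1) % 2:
        raise ValueError("length must be a power of 4")
    if n == 1:
        return [complex(x[0])]
    m = n // 4
    out = [0j] * n
    for q in range(4):
        # q-th output of each 4-point butterfly, followed by twiddle W_N^(i*q)
        y = [sum(x[i + p * m] * W4[(p * q) % 4] for p in range(4))
             * cmath.exp(-2j * cmath.pi * i * q / n)
             for i in range(m)]
        # y feeds an (N/4)-point DFT whose outputs land at indices 4r + q
        for r, v in enumerate(fft_radix4_dif(y)):
            out[4 * r + q] = v
    return out


print(len(fft_radix4_dif(list(range(16)))))  # 16
```

A 16-point transform therefore needs two radix-4 stages instead of four radix-2 stages, with nontrivial twiddles only between the two stages.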

Journal Article

A New Representation of FFT Algorithms Using Triangular Matrices

Mario Garrido¹

Linköping University¹

26 Aug 2016, IEEE Transactions on Circuits and Systems

TL;DR: The triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.

Abstract: In this paper we propose a new representation for FFT algorithms called the triangular matrix representation. This representation is more general than the binary tree representation and, therefore, it introduces new FFT algorithms that were not discovered before. Furthermore, the new representation has the advantage that it is simple and easy to understand, as each FFT algorithm only consists of a triangular matrix. Besides, the new representation allows for obtaining the exact twiddle factor values in the FFT flow graph easily. This facilitates the design of FFT hardware architectures. As a result, the triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.

32 citations

"FFT Radix-2 and Radix-4 FPGA Accele..." refers background or methods in this paper

...After this brief introduction of the well-known FFT algorithms with its two varieties (radix 2 or 4), hereinafter, implementation results for the radix-2 and radix-4 FFT using DIF for 8 and 16 point input sequences are discussed....
...A. 8-Point Radix-2 FFT DIF The 8 point radix-2 FFT architecture is formed of S...
...Figure 3 presents a flow-graph of an 8 point radix-2 FFT with DIF....
...The operation diagram describing the butterfly stages and data flow is referred to as the flow-graph [7]....
...The HDL implementation is conducted on an 8-point radix-2 DIF FFT and a 16-point radix-4 DIF FFT under finite precision arithmetic and 16-bit signed data types, targeting a Xilinx Zynq '7z020clg484-1' [1] and an Intel/Altera Cyclone IV EP4CE115F29C7 with a total of 532 embedded 9×9 multipliers (DSP units) and 114,480 logic elements (LEs)....
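
The snippet above reports finite-precision arithmetic with 16-bit signed data. One common way to model such a datapath before writing HDL is a bit-accurate fixed-point butterfly. The Python sketch below is such a model under assumptions that do not come from the paper: Q15 format (1 sign bit, 15 fractional bits), halving both butterfly outputs at every stage to prevent overflow, round-half-up when dropping the 15 extra product bits, and saturation to the 16-bit range. All function names are hypothetical.

```python
Q15_MIN, Q15_MAX = -32768, 32767


def sat16(v):
    """Saturate an integer to the signed 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(f):
    """Quantize a real value in [-1, 1) to Q15 (1.0 saturates to 32767)."""
    return sat16(round(f * 32768))


def dif_butterfly_q15(top, bot, w):
    """Radix-2 DIF butterfly on Q15 complex pairs (re, im).

    Assumed scaling: both outputs are halved (arithmetic shift by 1) so that
    they stay within 16 bits; the twiddle product is computed at full
    precision and rounded back to Q15.
    """
    (tr, ti), (br, bi), (wr, wi) = top, bot, w
    sr, si = (tr + br) >> 1, (ti + bi) >> 1  # upper output: (top + bot) / 2
    dr, di = (tr - br) >> 1, (ti - bi) >> 1  # difference:   (top - bot) / 2
    half = 1 << 14                           # rounding constant for >> 15
    p_re = (dr * wr - di * wi + half) >> 15  # Re{d * w}
    p_im = (dr * wi + di * wr + half) >> 15  # Im{d * w}
    return (sr, si), (sat16(p_re), sat16(p_im))


# Example: top = 0.5 - 0.25j, bot = 0.25 + 0.125j, w = W_8^1 = exp(-j*pi/4)
w81 = (to_q15(0.7071067811865476), to_q15(-0.7071067811865476))
print(dif_butterfly_q15((16384, -8192), (8192, 4096), w81))
# ((12288, -2048), (-1448, -7241))
```

In floating point the scaled lower output is about -0.0442 - 0.2210j, i.e. roughly (-1448.2, -7240.8) in Q15 units, so this model stays within one LSB; comparing such a model with HLS or HDL simulation output is one way to check a 16-bit design.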

Proceedings Article

Hardware prototyping of FBMC/OQAM baseband for 5G mobile communication systems

Jeremy Nadal, Charbel Abdel Nour, Amer Baghdadi, Hao Lin

16 Oct 2014

TL;DR: This paper presents a new design and prototyping experience of an advanced communication system based on filter-bank multi-carrier (FBMC) modulation, which recent research projects are studying as a key enabler for the future flexible 5G air interface.

Abstract: Embedded systems in the field of digital communications are becoming increasingly diversified and complex. This trend is being confirmed by the emergence of many new application scenarios for mobile communication systems beyond 2020. In this context, rapid prototyping experiences are of high interest for performance validation and proof-of-concept of the diverse proposed communication techniques. In this paper, we present a new design and prototyping experience of an advanced communication system based on filter-bank multi-carrier (FBMC) modulation. This modulation is being studied by recent research projects as a key enabler for the future flexible 5G air interface. The paper illustrates the complete design and prototyping flow from algorithm specification to on-board validation and demonstration. The proposed prototype makes it possible to illustrate and evaluate the performance of this new waveform compared to state-of-the-art OFDM-based systems.

30 citations

"FFT Radix-2 and Radix-4 FPGA Accele..." refers background in this paper

...Furthermore, filter-based multicarrier (FBMC), a promising candidate for the 5G technology, is a subset of multicarrier modulation systems which provides better resistance to multipath by dividing the bandwidth into multiple sub-bands corresponding to the available subcarriers [3] [4]....