Proceedings ArticleDOI

FFT Radix-2 and Radix-4 FPGA Acceleration Techniques Using HLS and HDL for Digital Communication Systems

TL;DR: This study focuses on communication systems incorporating filter-based multicarrier (FBMC) modulations, a promising candidate for 5G technology. The authors implemented and tested various FFT combinations using finite precision, HLS tools and HDL while promoting parallelization, pipelining and hardware-reuse architectures.
Abstract: The Fast Fourier Transform (FFT) is generally implemented on reconfigurable hardware in several signal processing and digital communication applications. It can be considered one of the most time- and resource-consuming operations due to the need for complex operations. The main aim of this manuscript is to investigate the contribution of High Level Synthesis (HLS) techniques to the implementation of real-time FFT algorithms on field programmable gate arrays (FPGAs). In particular, this study focuses on communication systems incorporating filter-based multicarrier (FBMC) modulations, a promising candidate for 5G technology. To evaluate the contribution of HLS, we implemented and tested various combinations, such as 8- and 16-point radix-2 and radix-4 FFTs, using finite precision, HLS tools and HDL while promoting parallelization, pipelining and hardware-reuse architectures.
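To make the radix-2 case in the abstract concrete, the following is a minimal software sketch of a radix-2 decimation-in-frequency (DIF) FFT in Python, checked against a direct DFT on an 8-point input. It is an illustration only: it uses floating-point arithmetic rather than the finite-precision data types the paper targets, and the function names fft_radix2_dif and dft are hypothetical, not taken from the paper's HLS or HDL code.

```python
import cmath


def fft_radix2_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    # DIF butterfly: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [
        (x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
        for k in range(half)
    ]
    out = [0j] * n
    out[0::2] = fft_radix2_dif(top)
    out[1::2] = fft_radix2_dif(bottom)
    return out


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [complex(v) for v in range(1, 9)]  # arbitrary 8-point input
    fast = fft_radix2_dif(samples)
    ref = dft(samples)
    print(max(abs(a - b) for a, b in zip(fast, ref)))  # difference vs. direct DFT
```

For an 8-point input the recursion unfolds into three butterfly stages of four butterflies each, which is the structure a hardware design would pipeline or parallelize.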
Citations
Proceedings ArticleDOI
01 Sep 2019
TL;DR: An enhanced, low complexity parallel version of the cascade RLMS is presented by eliminating the need for computing the array image vector cascading stage, and a new Kalman based parallel RLMS (RKLMS) method is proposed, where the LMS stage is replaced by a Kalman implementation of the classical LMS, and compared under low Signal to Interference plus Noise ratios (SINR).
Abstract: To ease spectral congestion and enhance frequency reuse, researchers are targeting smart antenna systems using spatial multiplexing and adaptive signal processing techniques. Moreover, the accuracy and efficiency of such systems are highly dependent on the adaptive algorithms they employ. A popular adaptive beamforming algorithm, widely used in smart antennas, is the Recursive Least Square (RLS) algorithm. While the classical RLS implementation achieves high convergence, it still suffers from its inability to track the target of interest. Recently, a new adaptive algorithm called Recursive Least Square - Least Mean Square (RLMS), which employs an RLS stage followed by a Least Mean Square (LMS) algorithm stage, separated by an estimate of the array image vector, i.e. steering vector, has been proposed. RLMS outperforms previous RLS and LMS variants, with superior convergence and tracking capabilities, at the cost of a moderate increase in computational complexity. In this paper, an enhanced, low complexity parallel version of the cascade RLMS is presented by eliminating the need for computing the array image vector cascading stage. Hence, for an antenna of N elements, our strategy can reduce the complexity of the system by 20N multiplications, 6N additions and 2N divisions. Moreover, a new Kalman based parallel RLMS (RKLMS) method is also proposed, where the LMS stage is replaced by a Kalman implementation of the classical LMS, and compared under low Signal to Interference plus Noise ratios (SINR). Simulation results show identical performance for the parallel and cascaded RLMS at 10 dB, and superior performance and robustness for the RKLMS in low-SINR cases down to -10 dB.

10 citations


Cites background from "FFT Radix-2 and Radix-4 FPGA Accele..."

  • ...Therefore it becomes more suitable for a hardware implementation [17], [18]....

    [...]

Journal ArticleDOI
TL;DR: A comparative study between HLS and HDL for FPGA, using a Sobel filter as a case study in the image processing field shows that the HDL implementation is slightly better than the HLS version considering resource usage and response time.
Abstract: The increasing complexity in today's systems and the limited market times demand new development tools for FPGA. Currently, in addition to traditional hardware description languages (HDLs), there are high-level synthesis (HLS) tools that increase the abstraction level in system development. Despite the greater simplicity of design and testing, HLS has some drawbacks in describing hardware. This paper presents a comparative study between HLS and HDL for FPGA, using a Sobel filter as a case study in the image processing field. The results show that the HDL implementation is slightly better than the HLS version considering resource usage and response time. However, the programming effort required in the HDL solution is significantly larger than in the HLS counterpart.

6 citations


Cites result from "FFT Radix-2 and Radix-4 FPGA Accele..."

  • ...Conversely, other studies observed better performances in one of the two implementations, either HLS [23]–[27] or HDL [28], [29]....

    [...]

Journal ArticleDOI
TL;DR: This work focuses on the hardware chip performance analysis of variable-length FFT processor architectures on a Field Programmable Gate Array (FPGA) platform using VHDL programming, with FFT lengths varying from 8 points to 65,536 points.
Abstract: The Fast Fourier Transform (FFT) is one of the most important algorithms used in digital signal processing (DSP) and digital communication applications to compute fast operations. FFT and IFFT is wi...

4 citations


Cites methods from "FFT Radix-2 and Radix-4 FPGA Accele..."

  • ...Akkad et al. (2018) presented the performance of real time FFT algorithms High Level Synthesis (HLS) environment....

    [...]

Journal ArticleDOI
TL;DR: A radix-2 decimation in frequency (R2DIF) method is designed to execute an efficient FFT architecture, and it outperforms conventional methods in terms of lower power usage and higher speed.
Abstract: Fast Fourier transform (FFT) is utilised to minimise the complexity of discrete Fourier transform by converting signals from frequency domain to time domain and conversely. Digital signal processing systems like image processing, general filtering, sonar, spread-spectrum communications and convolutions use these FFT operations. A radix-2 decimation in frequency (R2DIF) method is designed to execute an efficient FFT architecture in this study. Each state of the FFT stores the input and outputs the data using the R2DIF method. Also, the complex twiddle factors in FFT are replaced by the proposed uniform Montgomery algorithm. This technique simply performs the shift-add method instead of the multiplication process, which also enhances the convergence of the calculation. The FFT implementation is therefore done with the help of the proposed method, which reduces the usage of chips in the process. Based on this approach, it performs the operation of FFT from 16 points to 1024 points, and the performance of the proposed method is compared with existing approaches. Moreover, it does not require expensive dedicated functional blocks and uses only distributed logic resources. The simulation is carried out on the Xilinx platform using Verilog coding. The proposed design outperforms conventional methods in terms of lower power usage and higher speed.

3 citations

Proceedings ArticleDOI
03 Jul 2019
TL;DR: An optimized hardware architecture for a parallel Odd-Even transposition sorting network, on field programmable gate array (FPGA) based embedded systems is proposed, which results in increasing overall performance by minimizing hardware resource utilization, increasing the operating frequency and reducing complexity.
Abstract: Sorting is one of the most frequently executed routines on modern computers. Such algorithms are classically implemented as software programs and can contribute significantly to the overall execution time of a process. In this respect, implementing sorting algorithms in hardware can dramatically increase the overall performance of the applications embodying them. This paper proposes an optimized hardware architecture for a parallel Odd-Even transposition sorting network, on field programmable gate array (FPGA) based embedded systems. This implementation introduces a modification of the classical Odd-Even Transposition sorting algorithm. This modification is a shift-based approach offering high flexibility for general purpose applications. The proposed architecture results in increasing overall performance by minimizing hardware resource utilization, increasing the operating frequency and reducing complexity. Simulation and synthesis results demonstrate that the proposed architecture is minimal in size, can operate on odd and even length arrays, is capable of sorting arrays of length larger than two times the number of available processors, and can begin the sorting process at data input.

2 citations


Cites background or methods from "FFT Radix-2 and Radix-4 FPGA Accele..."

  • ...p is expanded with additional input elements to form the vector m=[8,12,4,15,2,11,6,3,5,14,16,10,1,9,13,7,12,8,10,9,13,11,15,14] of 24 elements....

    [...]

  • ...The classical Odd-Even sorting simulation is conducted on an array p of N = 16 elements where p = [8, 12, 4, 15, 2, 11, 6, 3, 5, 14, 16, 10, 1, 9, 13, 7]....

    [...]

References
Book
30 Apr 2013
TL;DR: This book offers a unified presentation of OFDM theory and high speed and wireless applications; in particular, ADSL, wireless LAN, and digital broadcasting technologies are explained.
Abstract: From the Publisher: Multi-carrier modulation, in particular orthogonal frequency division multiplexing (OFDM), has been successfully applied to a wide variety of digital communications applications for several years. Although OFDM has been chosen as the physical layer standard for a diversity of important systems, the theory, algorithms, and implementation techniques remain subjects of current interest. This book is intended to be a concise summary of the present state of the art of the theory and practice of OFDM technology. This book offers a unified presentation of OFDM theory and high speed and wireless applications. In particular, ADSL, wireless LAN, and digital broadcasting technologies are explained. It is hoped that this book will prove valuable both to developers of such systems, and to researchers and graduate students involved in analysis of digital communications, and will remain a valuable summary of the technology, providing an understanding of new advances as well as the present core technology.

755 citations


"FFT Radix-2 and Radix-4 FPGA Accele..." refers background in this paper

  • ...OFDM is a multi-carrier modulation that increases spectral efficiency with orthogonal carriers [1] [2]....

    [...]

Journal ArticleDOI
TL;DR: This work uses a first-published methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.
Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today’s system complexity. HLS allows designers to work at a higher-level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

433 citations

Proceedings ArticleDOI
23 May 2004
TL;DR: In this paper, an efficient variable-length FFT processor architecture suitable for multi-mode and multi-standard OFDM communication systems is proposed, based on the radix-2² DIF FFT algorithm, which also supports non-power-of-4 FFT computation.
Abstract: In this paper, we propose an efficient variable-length FFT processor architecture suitable for multi-mode and multi-standard OFDM communication systems. The FFT processor is based on the radix-2² DIF FFT algorithm and also supports non-power-of-4 FFT computation. The design contains an efficient processing element (PE), which can execute radix-2² butterfly (BF) operations, as well as radix-2 BF operations. Moreover, in order to achieve high-performance variable-length FFT operations and data accesses, an efficient variable-length address generator and twiddle factor generator are designed. The design has the merits of low complexity and high speed performance. The designs consider seven different FFT lengths including 64, 256, 512, 1024, 2048, 4096, and 8192 points, which cover all the required FFT lengths by 802.11a, 802.16a, DAB, DVB-T, VDSL and ADSL.

60 citations


"FFT Radix-2 and Radix-4 FPGA Accele..." refers background or methods in this paper

  • ...In [8], the authors propose the implementation of the FFT radix-4 [8]....

    [...]

  • ...Many hardware optimization techniques for an efficient FFT computation on FPGA have been proposed in [7-12]....

    [...]

Journal ArticleDOI
TL;DR: The triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.
Abstract: In this paper we propose a new representation for FFT algorithms called the triangular matrix representation. This representation is more general than the binary tree representation and, therefore, it introduces new FFT algorithms that were not discovered before. Furthermore, the new representation has the advantage that it is simple and easy to understand, as each FFT algorithm only consists of a triangular matrix. Besides, the new representation allows for obtaining the exact twiddle factor values in the FFT flow graph easily. This facilitates the design of FFT hardware architectures. As a result, the triangular matrix representation is an excellent alternative to represent FFT algorithms and it opens new possibilities in the exploration and understanding of the FFT.

32 citations


"FFT Radix-2 and Radix-4 FPGA Accele..." refers background or methods in this paper

  • ...After this brief introduction of the well-known FFT algorithms with its two varieties (radix 2 or 4), hereinafter, implementation results for the radix-2 and radix-4 FFT using DIF for 8 and 16 point input sequences are discussed....

    [...]

  • ...A. 8-Point Radix-2 FFT DIF The 8 point radix-2 FFT architecture is formed of S...

    [...]

  • ...Figure 3 presents a flow-graph of an 8 point radix-2 FFT with DIF....

    [...]

  • ...The operation diagram describing the butterfly stages and data flow is referred to as the flow-graph [7]....

    [...]

  • ...The HDL implementation is conducted on an 8 point FFT radix-2 DIF and 16 point FFT radix-4 DIF under finite precision arithmetic and 16-bit signed data types, targeting a Xilinx ZynQ ’7z020clg484 -1’ [1] and an Intel/Altera Cyclone IV EP4CE115F29C7 with a total of 532 DSP units of 9*9 embedded multiplier each and 114480 logic elements (LEs)....

    [...]

Proceedings ArticleDOI
16 Oct 2014
TL;DR: A new design and prototyping experience of an advanced communication system based on filter-bank multi-carrier (FBMC) modulation, being studied and considered nowadays by recent research projects as a key enabler for the future flexible 5G air interface.
Abstract: Embedded systems in the field of digital communications are becoming increasingly diversified and complex. This trend is being confirmed with the emergence of many new application scenarios for mobile communication systems beyond 2020. In this context, rapid prototyping experiences are of high interest for performance validation and proof-of-concept of the diverse proposed communication techniques. In this paper, we present a new design and prototyping experience of an advanced communication system based on filter-bank multi-carrier (FBMC) modulation. This modulation is being studied and considered nowadays by recent research projects as a key enabler for the future flexible 5G air interface. The paper illustrates the complete design and prototyping flow from algorithm specification to on-board validation and demonstration. The proposed prototype makes it possible to illustrate and evaluate the performance of this new waveform compared to state-of-the-art OFDM-based systems.

30 citations


"FFT Radix-2 and Radix-4 FPGA Accele..." refers background in this paper

  • ...Furthermore, filter-based multicarrier (FBMC), a promising candidate for the 5G technology, is a subset of multicarrier modulation systems which provides better resistance to multipath by dividing the bandwidth into multiple sub-bands corresponding to the available subcarriers [3] [4]....

    [...]