scispace - formally typeset

Showing papers on "Split-radix FFT algorithm published in 2023"


Proceedings Article
16 Jan 2023
TL;DR: In this paper, a top-down approximate floating-point FFT design methodology is proposed to exploit the error tolerance of the FFT algorithm; the resulting approximate FFT achieves up to 52% area-delay-product improvement and 23% energy saving compared to the exact FFT.
Abstract: The Fast Fourier Transform (FFT) is a key digital signal processing algorithm that is widely deployed in mobile and portable devices. With the recent popularity of human-perception-related tasks, it has been noted that full precision and exactness are not always necessary for FFT computation. We propose a top-down approximate floating-point FFT design methodology to fully exploit the error-tolerant nature of the FFT algorithm. An efficient error model of the configurable approximate multiplier is proposed to link the multiplier approximation to the FFT algorithm's precision. An approximation optimization flow is then formulated to maximize energy efficiency. Experimental results show that the proposed approximate FFT can achieve up to 52% Area-Delay-Product improvement and 23% energy saving compared to the exact FFT. The proposed approximate FFT is also found to cover an almost 2× wider precision range with higher energy efficiency than the prior state-of-the-art approximate FFT.
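The link between multiplier approximation and FFT precision can be explored with a small experiment. The sketch below is not the paper's configurable multiplier or error model: it assumes a simple stand-in approximate multiplier (the hypothetical `approx_cmul`) that truncates each real operand to a chosen number of significant mantissa bits, plugs it into the twiddle multiplications of a textbook radix-2 FFT, and reports the relative error against the exact FFT.

```python
import cmath
import math

def truncate_mantissa(x: float, bits: int) -> float:
    """Keep only the `bits` most significant mantissa bits of x (round toward zero)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** bits
    return math.ldexp(math.trunc(m * scale) / scale, e)

def approx_cmul(x: complex, w: complex, bits: int) -> complex:
    """Hypothetical approximate complex multiply: operands are truncated before each real product."""
    def mul(a: float, b: float) -> float:
        return truncate_mantissa(a, bits) * truncate_mantissa(b, bits)
    re = mul(x.real, w.real) - mul(x.imag, w.imag)
    im = mul(x.real, w.imag) + mul(x.imag, w.real)
    return complex(re, im)

def fft(x, bits=None):
    """Recursive radix-2 DIT FFT; bits=None gives the exact (double-precision) FFT."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], bits)
    odd = fft(x[1::2], bits)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        t = odd[k] * w if bits is None else approx_cmul(odd[k], w, bits)
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def rel_error(approx, exact):
    """Relative L2 error of an approximate spectrum against the exact one."""
    num = math.sqrt(sum(abs(a - e) ** 2 for a, e in zip(approx, exact)))
    den = math.sqrt(sum(abs(e) ** 2 for e in exact))
    return num / den

if __name__ == "__main__":
    x = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(64)]
    exact = fft(x)
    for bits in (4, 8, 12, 16):
        print(f"{bits:2d} mantissa bits: relative error {rel_error(fft(x, bits), exact):.2e}")
```

Fewer retained bits stand in for a cheaper multiplier at the cost of larger output error; an optimization flow of the kind described above would pick the cheapest setting that still meets an application's precision target.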

Proceedings Article
25 Feb 2023
TL;DR: In this paper, an algorithm-specific instruction (ASI)-based fast Fourier transform (FFT) code generation framework is proposed to generate unified, architecture-independent butterfly kernels that can be transformed into architecture-dependent kernels by establishing a mapping between ASIs and architecture-specific instructions for various hardware platforms.
Abstract: This paper proposes an algorithm-specific instruction (ASI)-based fast Fourier transform (FFT) code generation framework, named FFTASI, to generate unified, architecture-independent butterfly kernels that can be transformed into architecture-dependent kernels by establishing a mapping between ASIs and architecture-specific instructions for various hardware platforms. FFTASI strikes a good balance between performance and productivity on CPUs.
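The abstract does not spell out FFTASI's instruction set, so the following is only a toy illustration of the general idea: a butterfly written once in terms of abstract operations and lowered to different targets through a per-target instruction mapping. The operation names (`cmul`, `cadd`, `csub`), the two targets, and `generate_kernel` are all hypothetical; a real framework would emit SIMD intrinsics or assembly rather than Python source.

```python
# Architecture-independent radix-2 butterfly as (dest, op, src1, src2) tuples.
BUTTERFLY = [
    ("t", "cmul", "b", "w"),
    ("y0", "cadd", "a", "t"),
    ("y1", "csub", "a", "t"),
]

# Hypothetical per-target "instruction mappings": each abstract op lowers to
# source lines. On the "real" target, values live in separate _re/_im registers.
TARGETS = {
    "complex": {
        "cmul": ["{d} = {s1} * {s2}"],
        "cadd": ["{d} = {s1} + {s2}"],
        "csub": ["{d} = {s1} - {s2}"],
    },
    "real": {
        "cmul": [
            "{d}_re = {s1}_re * {s2}_re - {s1}_im * {s2}_im",
            "{d}_im = {s1}_re * {s2}_im + {s1}_im * {s2}_re",
        ],
        "cadd": ["{d}_re = {s1}_re + {s2}_re", "{d}_im = {s1}_im + {s2}_im"],
        "csub": ["{d}_re = {s1}_re - {s2}_re", "{d}_im = {s1}_im - {s2}_im"],
    },
}

def generate_kernel(target: str):
    """Lower BUTTERFLY to Python source for `target`, compile it, return (function, source)."""
    mapping = TARGETS[target]
    body = []
    if target == "real":
        # Unpack complex inputs into separate real/imaginary registers.
        for name in ("a", "b", "w"):
            body.append(f"{name}_re, {name}_im = {name}.real, {name}.imag")
    for dest, op, s1, s2 in BUTTERFLY:
        body += [template.format(d=dest, s1=s1, s2=s2) for template in mapping[op]]
    if target == "real":
        body.append("return complex(y0_re, y0_im), complex(y1_re, y1_im)")
    else:
        body.append("return y0, y1")
    src = "def butterfly(a, b, w):\n" + "\n".join("    " + stmt for stmt in body)
    namespace = {}
    exec(src, namespace)
    return namespace["butterfly"], src

if __name__ == "__main__":
    for target in TARGETS:
        kernel, src = generate_kernel(target)
        print(src, "\n->", kernel(1 + 2j, 3 - 1j, complex(0.6, -0.8)), "\n")
```

Both generated kernels compute the same butterfly (a + b·w, a − b·w); only the lowering table differs, which is the property that lets one kernel description serve several platforms.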

Book Chapter
01 Jan 2023

Proceedings Article
08 Mar 2023
TL;DR: In this article, the authors propose FFTpc, a parallel Fast Fourier Transform (FFT) algorithm that, unlike the Cooley-Tukey FFT, uses only real-valued operations until the very last step.
Abstract: This paper describes FFTpc, a novel algorithm that uses parallel computing to compute the Fast Fourier Transform (FFT) more accurately and faster. Unlike the Cooley-Tukey FFT, FFTpc uses only real-valued operations until the very last step. Frequency-domain filtering is performed in parallel on data subsets that are processed simultaneously, with no data interchange between processors during the main parts of the filtering process. In addition, if the user requires only the magnitude of the transform, the algorithm involves no complex-valued operations at all. Many other novel aspects of FFTpc are described, and both estimated and actual speedups are reported.
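Because the abstract gives no equations for FFTpc, this sketch does not reproduce it. It uses the plain DFT definition, assuming real-valued input, to illustrate the three properties described: the real and imaginary parts of X[k] are accumulated as separate real-valued sums and combined only at the end, the magnitude needs no complex arithmetic, and each data subset is processed independently with no exchange until the final reduction. It is an O(N²) direct transform, not a fast algorithm, and `num_workers` is a hypothetical stand-in for the number of processors.

```python
import math

def partial_real_dft(chunk, start, n):
    """Real-valued partial sums over one data subset x[start:start+len(chunk)].

    Returns (C, S) with C[k] = sum x[m] cos(2*pi*k*m/n) and S[k] = sum x[m] sin(2*pi*k*m/n).
    """
    cos_sums = [0.0] * n
    sin_sums = [0.0] * n
    for offset, value in enumerate(chunk):
        m = start + offset
        for k in range(n):
            angle = 2.0 * math.pi * k * m / n
            cos_sums[k] += value * math.cos(angle)
            sin_sums[k] += value * math.sin(angle)
    return cos_sums, sin_sums

def real_valued_dft(x, num_workers=4):
    """DFT of real input as separate real and imaginary parts, using only real arithmetic."""
    n = len(x)
    size = -(-n // num_workers)  # ceiling division: subset length per worker
    # Each subset could run on its own processor; nothing is exchanged here.
    partials = [partial_real_dft(x[s:s + size], s, n) for s in range(0, n, size)]
    # Final reduction: X[k] = C[k] - j*S[k] for real x.
    re = [sum(p[0][k] for p in partials) for k in range(n)]
    im = [-sum(p[1][k] for p in partials) for k in range(n)]
    return re, im

def magnitude(x, num_workers=4):
    """|X[k]| computed without ever forming a complex number."""
    re, im = real_valued_dft(x, num_workers)
    return [math.hypot(r, i) for r, i in zip(re, im)]

def to_complex(x, num_workers=4):
    """Complex spectrum, formed only in this very last step."""
    re, im = real_valued_dft(x, num_workers)
    return [complex(r, i) for r, i in zip(re, im)]

if __name__ == "__main__":
    print(magnitude([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5], num_workers=2))
```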

Book Chapter
16 Feb 2023

Proceedings Article
12 Feb 2023
TL;DR: The BxBFFT is a parallel-pipelined Fast Fourier Transform (FFT) implementation that achieves higher clock speeds than competing FFT implementations, including the Xilinx SSR FFT.
Abstract: This poster introduces the "BxBFFT" parallel-pipelined Fast Fourier Transform (FFT), which achieves higher clock speeds (Fmax) than competing FFTs with substantial savings in power and logic resources. In comparisons with the Xilinx SSR FFT, Spiral FFT, Astron FFT, and ZipCPU FFT, the BxBFFT reached clock speeds above 650 MHz in cases where all others were below 300 MHz. The BxBFFT's LUT usage and power were lower by a factor of ~1.5. The BxBFFT also had faster Vivado implementation and faster RTL simulation, improving productivity in design, testing, and verification; BxBFFT simulations were over 10 times faster than those of the Xilinx SSR FFT. The BxBFFT supports more features than other FFTs, including real-to-complex FFTs, non-power-of-2 FFTs, and features for high reliability in adverse environments. Its improved performance has been verified in real applications. One customer design had to operate at a reduced workload because of the excessive current draw of the Xilinx SSR FFT; quickly replacing the Xilinx SSR FFT with the BxBFFT lowered the die temperature by 34.8 °C and allowed the design to operate under full load. The BxBFFT's performance comes from intensive optimization of well-known FFT algorithms, not from new algorithms. Its coding style gives better control over synthesis, to avoid and resolve performance bottlenecks. Automatically generated top-level code supports 13 radix choices and 2 data-flow choices at each stage, so that the optimal combination can be chosen for each BxBFFT size. The result is a highly efficient FFT.
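The abstract mentions 13 radix choices and 2 data-flow choices per stage, with the best combination selected for each FFT size, but does not say which radices or what cost model are used. As a hedged sketch of that kind of search, the code enumerates every ordered factorization of N into a hypothetical radix set and picks the plan with the lowest placeholder cost; both `RADICES` and `toy_cost` are assumptions, not the BxBFFT generator.

```python
from functools import lru_cache

# Hypothetical subset of supported radices (the abstract says there are 13, not which).
RADICES = (2, 3, 4, 5, 8)

@lru_cache(maxsize=None)
def radix_plans(n: int):
    """All ordered sequences of radices from RADICES whose product is n."""
    if n == 1:
        return ((),)
    plans = []
    for r in RADICES:
        if n % r == 0:
            for rest in radix_plans(n // r):
                plans.append((r,) + rest)
    return tuple(plans)

def toy_cost(plan) -> float:
    """Placeholder cost: one unit per pipeline stage plus a penalty growing with radix."""
    return sum(1 + 0.25 * r for r in plan)

def best_plan(n: int):
    """Cheapest radix plan for size n under toy_cost."""
    plans = radix_plans(n)
    if not plans:
        raise ValueError(f"{n} cannot be factored into radices {RADICES}")
    return min(plans, key=toy_cost)

def count_configurations(n: int, dataflows: int = 2) -> int:
    """Radix plans times an independent data-flow choice at every stage."""
    return sum(dataflows ** len(plan) for plan in radix_plans(n))

if __name__ == "__main__":
    for n in (8, 12, 60):
        print(n, best_plan(n), count_configurations(n))
```

Even with five radices the configuration space grows quickly with N, which is one reason to generate top-level code automatically rather than by hand.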

Proceedings Article
04 Jun 2023
TL;DR: In this article, the authors apply a natively real-valued (RV) DFT, in which every calculation is a real-valued dot product, to linear convolution, yielding what they call RV-based convolution.
Abstract: Fast Fourier Transform (FFT)-based convolution is the most popular fast convolution algorithm. In past work, we developed Discrete Hirschman Transform (DHT)-based convolution, which can reduce computational complexity by a third compared to FFT-based convolution. Recently, we developed a comprehensive DFT algorithm in which every calculation is a natively real-valued (RV) dot product. In this paper, we first apply the natively real-valued DFT to linear convolution. We call this method RV-based convolution. Arithmetic analysis shows that it efficiently reduces operation counts. The algorithm is fast regardless of length.
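For comparison, the FFT-based convolution baseline can be sketched as follows: zero-pad both sequences to a length of at least len(x) + len(h) - 1 (rounded up to a power of two here so that a radix-2 FFT applies), multiply the spectra, and inverse-transform. This is the standard textbook method, not the DHT-based or RV-based convolution from the paper; the zero padding is what turns the FFT's circular convolution into a linear one.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolve(x, h):
    """Linear convolution of real sequences x and h via zero-padded FFTs."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:  # smallest power of two that avoids circular wrap-around
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / n for v in y[:out_len]]  # 1/n normalizes the inverse FFT

def direct_convolve(x, h):
    """Reference O(len(x) * len(h)) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

if __name__ == "__main__":
    print(fft_convolve([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, -1.0, 0.5]))
```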

Proceedings Article
01 Jan 2023
TL;DR: In this paper, an architecture for implementing the FFT is proposed that is derived from the Dynamically Reconfigurable Resource Array, has multiple parallel processing cells, and provides the flexibility to select the radix for each stage of the FFT.
Abstract: The Fast Fourier Transform (FFT) is an important algorithm used in digital signal processing and communication applications. Mixed-radix FFT additionally provides flexibility and increases the speed of FFT computation. For real-time processing, efficient hardware implementation on reconfigurable architectures is preferred, as such architectures can offer higher performance and flexibility. In this paper, we propose an architecture for implementing the FFT that is derived from the Dynamically Reconfigurable Resource Array, has multiple parallel processing cells, and provides the flexibility to select the radix for each stage of the FFT. The twiddle factor generator proposed in this architecture minimizes memory requirements and simplifies the hardware. Using the proposed architecture, FFTs of various lengths were mapped onto either a single cell or multiple cells in parallel. The proposed architecture is observed to improve performance by 2x compared to existing FFT architectures.
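Per-stage radix selection can be illustrated with a recursive mixed-radix decimation-in-time FFT driven by a list of radices, for example (3, 4) or (2, 2, 3) for N = 12. This is a software sketch under stated assumptions, not the proposed hardware: twiddle factors are computed on demand by a hypothetical `twiddle` helper instead of being read from a stored table, which only loosely mirrors the memory-saving goal of the paper's twiddle factor generator, and each radix-r butterfly is a direct r-point DFT.

```python
import cmath
import math

def twiddle(n: int, k: int) -> complex:
    """W_n^k = exp(-2*pi*j*k/n), generated on demand rather than stored in a table."""
    return cmath.exp(-2j * cmath.pi * k / n)

def small_dft(v):
    """Direct r-point DFT, used here as the radix-r butterfly."""
    r = len(v)
    return [sum(v[q] * twiddle(r, q * s) for q in range(r)) for s in range(r)]

def mixed_radix_fft(x, plan):
    """Decimation-in-time FFT; plan lists the radix of each stage.

    The product of the radices must equal len(x), e.g. plan=(3, 4) for N=12.
    """
    n = len(x)
    if math.prod(plan) != n:
        raise ValueError("product of radices must equal len(x)")
    if n == 1:
        return [complex(x[0])]
    r, rest = plan[0], plan[1:]
    m = n // r
    # Split into r interleaved subsequences x[q::r] and transform each recursively.
    subs = [mixed_radix_fft(x[q::r], rest) for q in range(r)]
    out = [0j] * n
    for k in range(m):
        # Apply twiddles W_n^(q*k), then an r-point butterfly across subsequences:
        # X[k + m*s] = sum_q W_r^(q*s) * W_n^(q*k) * Y_q[k].
        butterfly_in = [twiddle(n, q * k) * subs[q][k] for q in range(r)]
        for s, value in enumerate(small_dft(butterfly_in)):
            out[k + m * s] = value
    return out

if __name__ == "__main__":
    data = [float(i % 5) for i in range(12)]
    for plan in [(3, 4), (4, 3), (2, 2, 3)]:
        print(plan, [round(abs(v), 6) for v in mixed_radix_fft(data, plan)])
```

Every valid plan yields the same spectrum; only the stage structure changes, which is what lets hardware trade radix choices against cell count and latency.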

Journal Article
TL;DR: In this paper, trainable Fast Fourier Transform (FFT) structures are proposed to increase the accuracy of beamspace transformation in the multi-user (MU) mode of a massive MIMO receiver.
Abstract: In this letter, we propose new trainable Fast Fourier Transform (FFT) structures to increase the accuracy of beamspace transformation in the multi-user (MU) mode of massive Multiple-Input Multiple-Output (MIMO) receivers. The FFT is a widely employed beamspace transformation technique thanks to its low computational complexity. Unfortunately, MU signal detection suffers a significant performance loss in the FFT beamspace, because the default FFT beams are not co-directed with the angles of arrival. We address this issue with a trainable FFT that outperforms not only fixed FFT approaches but also other trainable FFT techniques.
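The beam-misalignment problem can be reproduced with a uniform linear array, assuming half-wavelength element spacing so that a source at angle θ has steering vector a[n] = exp(jπ·n·sin θ). The FFT beams sit at spatial frequencies k/N, so a source halfway between two beams spreads its energy over many beams. Multiplying by a per-element phase ramp before the FFT shifts the beam grid; in this sketch the shift `delta` is a single hand-set value standing in for what a trainable structure would learn, and it is not the structure proposed in the letter.

```python
import cmath
import math

def steering_vector(n_elements: int, theta_rad: float):
    """ULA response, half-wavelength spacing assumed: a[n] = exp(j*pi*n*sin(theta))."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta_rad)) for n in range(n_elements)]

def beamspace(signal, delta=0.0):
    """DFT beamspace after an optional per-element phase ramp.

    delta shifts every beam by delta/N in spatial frequency; it is a single
    hand-set parameter here, standing in for trained weights.
    """
    n = len(signal)
    shifted = [s * cmath.exp(-2j * math.pi * delta * i / n) for i, s in enumerate(signal)]
    return [sum(shifted[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def peak_energy_fraction(beams):
    """Share of total beamspace energy captured by the strongest beam."""
    energies = [abs(b) ** 2 for b in beams]
    return max(energies) / sum(energies)

if __name__ == "__main__":
    n = 16
    theta = math.asin(7 / 16)  # spatial frequency 3.5/N: halfway between beams 3 and 4
    a = steering_vector(n, theta)
    print("fixed FFT beams:  ", round(peak_energy_fraction(beamspace(a)), 3))
    print("shifted by 0.5:   ", round(peak_energy_fraction(beamspace(a, delta=0.5)), 3))
```

With the half-bin offset, the strongest fixed beam captures only about 40% of the energy, while the shifted grid puts essentially all of it in one beam, which is the kind of gain that re-aligning beams to the angles of arrival is meant to recover.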