
Showing papers on "Twiddle factor" published in 2019


Book
23 Sep 2019
TL;DR: PRELIMINARIES: An Elementary Introduction to the Discrete Fourier Transform; Some Mathematical and Computational Preliminaries. SEQUENTIAL FFT ALGORITHMS: The Divide-and-Conquer Paradigm and Two Basic FFT Algorithms; Deciphering the Scrambled Output from In-Place FFT Computation; Bit-Reversed Input to the Radix-2 DIF FFT.
Abstract: PRELIMINARIES: An Elementary Introduction to the Discrete Fourier Transform; Some Mathematical and Computational Preliminaries. SEQUENTIAL FFT ALGORITHMS: The Divide-and-Conquer Paradigm and Two Basic FFT Algorithms; Deciphering the Scrambled Output from In-Place FFT Computation; Bit-Reversed Input to the Radix-2 DIF FFT; Performing Bit-Reversal by Repeated Permutation of Intermediate Results; An In-Place Radix-2 DIT FFT for Input in Natural Order; An In-Place Radix-2 DIT FFT for Input in Bit-Reversed Order; An Ordered Radix-2 DIT FFT; Ordering Algorithms and Computer Implementation of Radix-2 FFTs; The Radix-4 and the Class of Radix-2^s FFTs; The Mixed-Radix and Split-Radix FFTs; FFTs for Arbitrary N; FFTs for Real Input; FFTs for Composite N; Selected FFT Applications. PARALLEL FFT ALGORITHMS: Parallelizing the FFTs: Preliminaries on Data Mapping; Computing and Communications on Distributed-Memory Multiprocessors; Parallel FFTs without Inter-Processor Permutations; Parallel FFTs with Inter-Processor Permutations; A Potpourri of Variations on Parallel FFTs; Further Improvement and a Generalization of Parallel FFTs; Parallelizing Two-Dimensional FFTs; Computing and Distributing Twiddle Factors in the Parallel FFTs. APPENDICES: Fundamental Concepts of Efficient Scientific Computation; Solving Recurrence Equations by Substitution; Bibliography.

148 citations


Journal ArticleDOI
TL;DR: The proposed processor has better normalized throughput per unit area than the state-of-the-art designs available; it is designed as a general IP and can be implemented using a processor synthesizer (application-specific instruction-set processor designer).
Abstract: A high-throughput programmable fast Fourier transform (FFT) processor is designed supporting 16- to 4096-point FFTs and 12- to 2400-point discrete Fourier transforms (DFTs) for 4G, wireless local area network, and future 5G. A 16-path data-parallel memory-based architecture is selected as a tradeoff between throughput and cost. To implement a hardware-efficient high-speed processor, several improvements are provided. To maximally reuse the hardware resources, a reconfigurable butterfly unit is proposed that computes eight radix-2 butterflies, four radix-3/4 butterflies, two radix-5/8 butterflies, or one radix-16 butterfly in parallel in one clock cycle. Twiddle factor multipliers using different schemes are optimized and compared, and a modified coordinate rotation digital computer (CORDIC) scheme is finally implemented to minimize the hardware cost while supporting both FFTs and DFTs. An optimized conflict-free data access scheme is also proposed to support multiple butterflies at any radix. The processor is designed as a general IP and can be implemented using a processor synthesizer (application-specific instruction-set processor designer). The electronic design automation synthesis result based on a 65-nm technology shows that the processor area is 1.46 mm². The processor supports a 972 MS/s 4096-point FFT at 250 MHz with a power consumption of 68.64 mW and a signal-to-quantization-noise ratio of 66.1 dB. The proposed processor has better normalized throughput per unit area than the state-of-the-art designs available.

30 citations


Book ChapterDOI
01 Jan 2019
TL;DR: This chapter summarizes the research on FFT hardware architectures by presenting the FFT algorithms, the building blocks in FFT hardware architectures, the architectures themselves, and the bit reversal algorithm.
Abstract: The fast Fourier transform (FFT) is a widely used algorithm in signal processing applications. FFT hardware architectures are designed to meet the requirements of the most demanding applications in terms of performance, circuit area, and/or power consumption. This chapter summarizes the research on FFT hardware architectures by presenting the FFT algorithms, the building blocks in FFT hardware architectures, the architectures themselves, and the bit reversal algorithm.

16 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: A comparison with commercially available FFTs that were specifically tailored for particular FPGA platforms shows that FFT generator instances can be both performance- and resource-competitive with state-of-the-art designs.
Abstract: Configurable fast Fourier transform (FFT) engines and their inverse counterparts are indispensable in modern wireless communication and radar systems. FFT processors are usually customized per use case. Therefore, a design generator for a single-path delay feedback FFT processor that permits continuous input and output data streaming has been written in the Chisel hardware construction language. It supports a wide range of parameter settings, such as input data and twiddle factor widths, FFT sizes and number of stages, three radices, and different scaling and rounding methods after each butterfly or dragonfly stage, among others, thus enabling agile design space exploration. A comparison with commercially available FFTs that were specifically tailored for particular FPGA platforms shows that FFT generator instances can be both performance- and resource-competitive with state-of-the-art designs.

8 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: This paper implements a 4-parallel 64K-point FFT hardware architecture based on the 2-epoch FFT algorithm and greatly reduces the storage needed for twiddle factor coefficients, so that the ROM area can be reduced.
Abstract: Radar technology and its development have been important issues for decades. With the growth of semiconductor processing technology, circuit design related to THz technology, such as imaging radar systems, has gradually gained attention. However, wideband radar systems with high sampling rates produce ultra-long data series, and realizing the corresponding ultra-long FFT introduces design challenges. In addition, a high throughput rate is needed to meet the requirement of real-time processing. In this paper, we implement a 4-parallel 64K-point FFT hardware architecture based on the 2-epoch FFT algorithm. With the proposed middle twiddle factor generator, we can greatly reduce the storage needed for twiddle factor coefficients, so that the ROM area can be reduced. We implement this work in TSMC 90 nm CMOS technology with a high-Vt standard cell library, and the total gate count is around 3974.1k. The maximum operating frequency of the system is 390 MHz. At the maximum operating frequency, the throughput reaches 1.57 GS/s and the design consumes 0.2811 W (at 0.9 V).

7 citations


Journal ArticleDOI
TL;DR: This paper proposes an area-efficient fast Fourier transform (FFT) processor for zero-padded signals based on the radix-2^2 and the radix-2^3 single-path delay feedback pipeline architectures, resulting in a logic gate count of 40,396, which is efficient and suitable for zero-padded FFT processors.
Abstract: This paper proposes an area-efficient fast Fourier transform (FFT) processor for zero-padded signals based on the radix-2^2 and the radix-2^3 single-path delay feedback pipeline architectures. The delay elements that align the data in each pipeline stage are among the most complex units, and those of stage 1 are the largest. By exploiting the fact that the input data sequence is zero-padded and that the twiddle factor multiplication in stage 1 is trivial, the proposed FFT processor can dramatically reduce the required number of delay elements. Moreover, 256-point FFT processors were designed using a hardware description language (HDL) and were synthesized to gate-level circuits using a standard cell library for a 65 nm CMOS process. The proposed architecture results in a logic gate count of 40,396, which is efficient and suitable for zero-padded FFT processors.
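The zero-padding observation in the abstract above can be illustrated with a minimal sketch. It assumes that the second half of the input block is all zeros and uses a plain first-stage butterfly; the function names and sample values are hypothetical, and this is not the paper's pipeline architecture. Under that assumption, the sum and difference outputs of every first-stage butterfly both equal the nonzero input sample, and a trivial twiddle factor such as -j needs only a swap and a negation.

```python
def first_stage_butterflies(x):
    """First-stage butterflies: pair x[i] with x[i + N/2] (hypothetical helper)."""
    half = len(x) // 2
    sums = [x[i] + x[i + half] for i in range(half)]
    diffs = [x[i] - x[i + half] for i in range(half)]
    return sums, diffs


def rotate_minus_j(z):
    """Multiply by the trivial twiddle factor -j without a multiplier."""
    # (a + jb) * (-j) = b - ja, so swap the parts and negate the new imaginary part.
    return complex(z.imag, -z.real)


# Hypothetical samples, zero-padded from 4 to N = 8 points (assumption).
data = [1 + 2j, 3 - 1j, -2 + 0.5j, 4 + 4j]
padded = data + [0j] * len(data)

sums, diffs = first_stage_butterflies(padded)
# Both outputs equal the nonzero half, so nothing has to wait for the
# second half of the block to arrive.
print(sums == data, diffs == data)
```

Because both butterfly outputs are available as soon as each nonzero sample arrives, a streaming design under this assumption has less data to hold back in the first stage.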

6 citations


Proceedings ArticleDOI
01 Mar 2019
TL;DR: In this paper, a harmonic extraction method based on the modulated hopping DFT (mHDFT) in a phase-locked loop (PLL) is proposed; the mHDFT is introduced into the PLL to reduce the computational burden by applying a time-hopping technique to the time series.
Abstract: This paper proposes a harmonic extraction method based on the modulated hopping DFT (mHDFT) in a phase-locked loop (PLL). Harmonics can degrade a voltage or current signal, so extracting them from the signal is essential for good power quality. The proposed PLL comprises an mHDFT filter, a Phase Detector (PD), a Moving Averager (MA), a PI Controller, and a Numerically Controlled Oscillator (NCO). The mHDFT algorithm is introduced into the PLL to reduce the computational burden by applying a time-hopping technique to the time series. It removes the instability caused by the twiddle factor term in the recursive path and is also more accurate than other DFT algorithms. The proposed technique can extract the harmonic components of a signal in the presence of large noise. The Mean Square Error (MSE) of the harmonic measurement approaches the Cramer-Rao lower bound (CRLB).

4 citations


Journal ArticleDOI
TL;DR: A novel mixed-radix FFT algorithm featuring a single-sided binary-tree decomposition strategy is proposed, aiming to effectively contain the complexity of multiplications for any 2^k-point FFT.

3 citations


Journal ArticleDOI
TL;DR: The architecture of the proposed TF is developed based on the adaptive method of the coordinate rotation digital computer (CORDIC) algorithm, and the core reached its highest operating frequency of 55 MHz at the 1.2-V supply voltage (VDD) with the forward back-gate bias (FBB) ≥ 1.5 V.
Abstract: In this brief, a silicon-on-thin-BOX (SOTB) implementation of a single-precision floating-point fast-Fourier-transform (FFT) twiddle factor (TF) is presented. The architecture of the proposed TF is developed based on the adaptive method of the coordinate rotation digital computer (CORDIC) algorithm. The 65-nm SOTB technology was chosen because of its ultra-low-power advantage. Furthermore, the back-gate bias technique can be applied on an SOTB chip to adjust the operation for high-performance or low-power requirements. The layout of the SOTB 65-nm TF core has a gate count of about 22869 on a die area of 86721 µm². The measurement results show that the core reached its highest operating frequency of 55 MHz at the 1.2-V supply voltage (VDD) with the forward back-gate bias (FBB) ≥ 1.5 V. The power and energy consumptions at this point were 1.54 mW and 27.91 pJ/cycle, respectively. The lowest operating VDD was 0.5 V with the FBB ≥ 0.5 V. In the standby mode, when the clock-gating technique was deployed, the leakage current could be reduced to 0.4 nA at the 0.4-V VDD and −2.5-V reverse back-gate bias (RBB).

2 citations


Proceedings ArticleDOI
01 Feb 2019
TL;DR: A Silicon On Thin Buried-oxide (SOTB) implementation of the 32-bit floating-point Twiddle Factor (TF) is presented, and the measurement results showed that at the best crossing point of the 0.75-V power supply (VDD), the chip could run at the maximum operating frequency (FMax) of 32 MHz and consumed 181 µW.
Abstract: In this paper, a Silicon On Thin Buried-oxide (SOTB) implementation of the 32-bit floating-point Twiddle Factor (TF) is presented. The architecture was developed based on the adaptive COordinate Rotation DIgital Computer (CORDIC). The CORDIC method is a well-known approach for approximating the complex-number multiplication, also known as the TF in Fast Fourier Transform (FFT) designs. The SOTB 65-nm TF core layout has an area of 86.7K µm². The measurement results showed that at the best crossing point of the 0.75-V power supply (VDD), the chip could run at the maximum operating frequency (FMax) of 32 MHz and consumed 181 µW. In sleep mode, the leakage power dropped by about 258.6× to 0.7 µW at the 0.75-V VDD.
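As a rough illustration of how CORDIC approximates a twiddle factor multiplication, the following sketch rotates a complex sample with a fixed sequence of micro-rotations by arctan(2^-i). It is a textbook rotation-mode CORDIC emulated in floating point, not the adaptive CORDIC of the paper; the function names, the iteration count, and the restriction to angles within about ±π/2 are assumptions made for the example.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians, assuming |angle| <= pi/2."""
    # The micro-rotations stretch the vector by a known constant gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** (-i)  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)  # table lookup in hardware
    return x / gain, y / gain


def twiddle_multiply(re, im, k, n):
    """Approximate (re + j*im) * exp(-2j*pi*k/n) for 0 <= k <= n/4."""
    return cordic_rotate(re, im, -2.0 * math.pi * k / n)


print(twiddle_multiply(1.0, 0.5, 3, 16))
```

In hardware, the multiplications by powers of two become shifts and the gain correction is usually folded into a constant scale factor, which is why no general-purpose complex multiplier or twiddle factor ROM is required.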

1 citation


Patent
11 Oct 2019
TL;DR: A design method for a two-dimensional Fourier transform IP core based on HLS is disclosed. The method comprises the following steps: splitting the two-dimensional Fourier transform used in image processing into two one-dimensional Fourier transforms; calculating the one-dimensional Fourier transform with a DIT radix-2 fast Fourier transform; and using an HLS tool to design a one-dimensional Fourier transform IP core with processing lengths of 256 and 128.
Abstract: The invention discloses a design method for a two-dimensional Fourier transform IP core based on HLS. The design method comprises the following steps: splitting the two-dimensional Fourier transform used in image processing into two one-dimensional Fourier transforms; calculating the one-dimensional Fourier transform with a DIT radix-2 fast Fourier transform; and using an HLS tool to design a one-dimensional Fourier transform IP core with processing lengths of 256 and 128, including acceleration design of the twiddle factor, design of the reverse order, and design of the inverse Fourier transform. According to the invention, the two-dimensional Fourier transform is split and each one-dimensional Fourier transform is implemented as a fast Fourier transform. Acceleration is achieved in software by using a simplified algorithm and in hardware by designing the IP core with HLS and using parallel computing. Accelerating the two-dimensional Fourier transform in this way facilitates real-time algorithm processing and improves the industrial practicability of image processing algorithms.
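The split of a two-dimensional transform into one-dimensional transforms can be sketched as a row-column decomposition. The example below is an illustration only: it uses a direct one-dimensional DFT in place of the DIT radix-2 FFT named in the abstract, a tiny hypothetical 2 × 4 image, and hypothetical function names.

```python
import cmath


def dft_1d(seq):
    """Direct 1D DFT (stand-in for a radix-2 FFT to keep the sketch short)."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_2d_row_column(image):
    """2D DFT computed as 1D transforms over rows, then over columns."""
    rows = [dft_1d(row) for row in image]               # transform each row
    cols = [dft_1d(list(col)) for col in zip(*rows)]    # transform each column
    return [list(r) for r in zip(*cols)]                # transpose back


def dft_2d_direct(image):
    """Reference 2D DFT evaluated straight from the definition."""
    h, w = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]


image = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical 2 x 4 image
fast = dft_2d_row_column(image)
ref = dft_2d_direct(image)
print(max(abs(fast[u][v] - ref[u][v]) for u in range(2) for v in range(4)))
```

The two passes are independent along the other dimension, so all row transforms (and then all column transforms) can run in parallel on copies of the same one-dimensional core.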

Book ChapterDOI
01 Jan 2019
TL;DR: An effective fast Fourier transform (FFT) processor for 1024-point computation is proposed; it is based on radix-2 decimation-in-frequency (R2DIF) and uses the pipelined feedback (PF) technique via shift registers to efficiently share the same storage between the inputs and outputs during computation.
Abstract: This paper proposes an effective fast Fourier transform (FFT) processor for 1024-point computation that is based on radix-2 decimation-in-frequency (R2DIF) and uses the pipelined feedback (PF) technique via shift registers to efficiently share the same storage between the inputs and outputs during computation. The large memory footprint of the complex twiddle factor multipliers, and hence their chip area, is reduced in the proposed design by employing the coordinate rotation digital computer (CORDIC), which replaces the complex multipliers and does not require memory blocks to store the twiddle factors. To use the hardware resources efficiently, the proposed design uses only distributed logic. This eliminates the use of dedicated functional blocks, which are usually limited on the target chip. The entire proposed system is mapped onto a Virtex-7 field-programmable gate array (FPGA) for functional verification and synthesis. The experimental results show that the proposed FFT processor is more effective in terms of speed, precision, and resource usage.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: A new 16-point FFT architecture is designed and a twiddle factor merging method is proposed to reduce the number of multiplications, additions and subtractions used in the design.
Abstract: The FFT is one of the most widely used algorithms in signal processing and communications applications. Although hardware-efficient FFT designs have been studied, there is still room to further reduce the complexity of FFT architectures by exploring more efficient expressions of the twiddle factors in the FFT. In this paper, a new 16-point FFT architecture is designed. A twiddle factor merging method is proposed to reduce the number of multiplications, additions, and subtractions used in the design. To further improve the design, we apply a common subexpression sharing scheme to optimize the hardware resource sharing among the twiddle factors. Compared with a previously published method, the proposed 16-point FFT architecture achieves improvements of 25.4% in hardware cost and 14% in delay.

Journal ArticleDOI
TL;DR: Experimental results show that, compared to the benchmark fixed-point architectures, the proposed AI-based 16-point radix-2^2 FFT architecture not only greatly improves the SQNR but also provides higher throughput.
Abstract: This paper studies the challenge of accurate FFT computation. A generic and error-free encoding is proposed based on algebraic integers (AIs). A well-chosen AI-based encoding may greatly decrease the error due to the non-trivial twiddle factors in the FFT computation. Further, a new method for predicting the well-pruned architecture is presented, which helps in designing an optimized and low-cost architecture when using the AI-based encoding. To examine the proposed AI-based FFT computation and the procedure for designing an optimized architecture, a custom AI-based 16-point radix-2^2 FFT architecture has been designed and implemented using 180-nm CMOS technology. Experimental results show that, compared to the benchmark fixed-point architectures, the proposed architecture not only greatly improves the SQNR but also provides higher throughput. Further, the power consumption and area of the ASIC implementation both show an overhead of less than 45% compared to the reference architecture.

Proceedings ArticleDOI
10 Jun 2019
TL;DR: An efficient way to obtain the Fast Fourier Transform (FFT) algorithm, which achieves a better execution time due to a significant reduction of $N/8$ of the needed twiddle factors and to additional factorizations.
Abstract: The naive implementation of the N-point discrete Fourier transform involves calculating the scalar product of the sample buffer (treated as an N-dimensional vector) with N separate basis vectors. Since each scalar product involves N multiplications and N additions, the total time is proportional to $N^{2}$; in other words, it is an $O(N^{2})$ algorithm. However, it turns out that by cleverly rearranging these operations, one can optimize the algorithm down to $O(N \log_{2} N)$, which for large N makes a huge difference. The optimized version of the algorithm is called the Fast Fourier Transform, or FFT. In this paper, we discuss an efficient way to obtain the FFT algorithm. According to our study, we can eliminate some operations in the FFT computation thanks to properties of complex numbers, and we can achieve a better execution time due to a significant reduction of $N/8$ of the needed twiddle factors and to additional factorizations.
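The contrast between the $O(N^{2})$ scalar-product form and the $O(N \log_{2} N)$ rearrangement can be sketched as follows. This is a generic recursive radix-2 decimation-in-time FFT for power-of-two lengths, shown only to illustrate where the twiddle factors enter; it does not implement the twiddle factor reduction proposed in the paper, and the function names are hypothetical.

```python
import cmath


def dft_naive(x):
    """O(N^2): one scalar product with a basis vector per output bin."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_radix2(x):
    """O(N log N): recursive radix-2 DIT FFT, len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]  # uses W_N^(k+N/2) = -W_N^k
    return out


signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]  # hypothetical samples
slow = dft_naive(signal)
fast = fft_radix2(signal)
print(max(abs(a - b) for a, b in zip(slow, fast)))
```

Each recursion level needs only N/2 distinct twiddle factors because of the symmetry noted in the comment, which is the kind of property of complex numbers that twiddle factor reduction schemes build on.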

Proceedings ArticleDOI
01 Mar 2019
TL;DR: In this paper, a vibration measurement system is proposed that applies an FM demodulation technique in a phase-locked loop (PLL); the PLL block combines four units: a comb-resonator unit, a low-pass filter unit, a controller unit, and a sampling pulse generator unit.
Abstract: In this paper, a vibration measurement system is proposed that applies an FM demodulation technique in a phase-locked loop (PLL). The PLL block is designed by combining four units: a comb-resonator unit, a low-pass filter unit, a controller unit, and a sampling pulse generator unit. The cascaded comb-resonator structure is derived from the standard sliding discrete Fourier transform (SDFT) by introducing the Goertzel algorithm. A high-frequency carrier signal of 40 kHz can be used to sense the Doppler signal, and the vibration parameters, amplitude and frequency, are then estimated from the extracted Doppler signal. The proposed system can estimate the vibration parameters with reduced computational complexity by removing the twiddle factor term in the recurring part of the system function. A stable system is assured by introducing a damping factor in the system function that allows the pole to shift within or onto the unit circle of the z-plane. A wide range of vibration amplitudes and frequencies can be estimated with less error. The effect of input noise can be suppressed down to small values of SNR.
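For context, the standard sliding DFT that the abstract starts from can be sketched as a comb (new sample minus the sample leaving the window) followed by a resonator that multiplies the running value by a twiddle factor on every step. The sketch below shows that baseline recurrence only, with the twiddle factor in the recursive path; it is not the Goertzel-based structure or the damped version proposed in the paper, and the function name and test signal are hypothetical.

```python
import cmath
from collections import deque


def sliding_dft_bin(samples, n, k):
    """Track DFT bin k over the most recent n samples, one update per sample."""
    twiddle = cmath.exp(2j * cmath.pi * k / n)  # resonator coefficient
    window = deque([0.0] * n, maxlen=n)         # delay line for the comb
    acc = 0j
    history = []
    for s in samples:
        oldest = window[0]       # sample that leaves the window
        window.append(s)         # maxlen drops `oldest` automatically
        acc = (acc - oldest + s) * twiddle
        history.append(acc)
    return history


samples = [0.5, -1.0, 2.0, 3.5, -0.25, 1.0, 4.0, -2.0, 0.0, 1.5]  # hypothetical
track = sliding_dft_bin(samples, n=4, k=1)

# Cross-check the last value against a direct DFT of the last four samples.
last = samples[-4:]
direct = sum(last[m] * cmath.exp(-2j * cmath.pi * 1 * m / 4) for m in range(4))
print(abs(track[-1] - direct))
```

Each update costs one complex multiplication regardless of the window length, but rounding of the coefficient in the feedback path is what motivates the stability measures mentioned in the abstract.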

Proceedings ArticleDOI
01 Nov 2019
TL;DR: An improved dynamic kernel function Fast Fourier Transform (FFT) with a variable truncation scheme (VTS) is proposed, wherein the dynamic kernel function uses a fixed-point numerical representation of the twiddle factors and substitutes simple shift-and-add operations for the FFT multipliers, saving hardware resources.
Abstract: An improved dynamic kernel function Fast Fourier Transform (FFT) with a variable truncation scheme (VTS) is proposed, wherein the dynamic kernel function uses a fixed-point numerical representation of the twiddle factors and substitutes simple shift-and-add operations for the FFT multipliers, saving hardware resources. Using Xilinx System Generator, the improved dynamic truncation function yields an average reduction of 20% in hardware resources. The proposed VTS dynamically scales the result of every stage in the FFT and makes maximal use of the word length without complicated design additions. The proposed VTS can also detect weak signals, unlike the method that only truncates the least significant bit (LSB).
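The idea of replacing a multiplier by shift-and-add operations on a fixed-point twiddle factor can be sketched as below. The 8 fractional bits, the function names, and the sample value are assumptions chosen for the example; the paper's dynamic kernel function and variable truncation scheme are not reproduced.

```python
import math


def quantize_twiddle(k, n, frac_bits=8):
    """Fixed-point (cos, sin) of the twiddle angle, scaled by 2**frac_bits."""
    angle = -2.0 * math.pi * k / n
    scale = 1 << frac_bits
    return round(math.cos(angle) * scale), round(math.sin(angle) * scale)


def shift_add_multiply(value, coeff, frac_bits=8):
    """Compute value * coeff / 2**frac_bits with shifts and adds only."""
    negative = coeff < 0
    magnitude = -coeff if negative else coeff
    acc = 0
    bit = 0
    while magnitude:
        if magnitude & 1:          # one adder per set bit of the coefficient
            acc += value << bit
        magnitude >>= 1
        bit += 1
    if negative:
        acc = -acc
    return acc >> frac_bits        # drop the fractional bits (floor)


cos_q, sin_q = quantize_twiddle(1, 8)       # angle -pi/4
print(cos_q, sin_q)                         # 181 and -181 at 8 fractional bits
print(shift_add_multiply(1024, cos_q))      # approximately 1024 * 0.7071
```

With 8 fractional bits, cos(π/4) quantizes to 181 = 0b10110101, so the product needs five shifted additions; fewer set bits in the coefficient mean fewer adders, which is where the hardware saving comes from.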

Journal ArticleDOI
TL;DR: This study mainly deals with the design and implementation of a DFT processor with non-power-of-two (prime) problem sizes using the Rader algorithm and proves the efficiency of the algorithm and shows the trade-off to be established in terms of occupied area, throughput, latency, and power consumption.
Abstract: The implementation of a discrete Fourier transform (DFT) algorithm plays a key role in many real-time applications. This study mainly deals with the design and implementation of a DFT processor with non-power-of-two (prime) problem sizes using the Rader algorithm. The proposed design focuses on increasing the speed to fulfil the requirements of the real-time data transmission by enabling data rates up to 10 Gbps. Despite its limitation to the prime size, it remains a promising tool in the signal processing aspect and takes its place among other techniques to achieve high-speed wireless communication. By avoiding the cumbersome process during twiddle factors computation as well as the butterfly structure, the outcome preludes to an ambitious architecture dedicated to high-speed design reaching over 233 and 92 MHz for DFT lengths of 7 and 67, respectively, on Virtex 6. Thereby, the obtained results prove the efficiency of the algorithm and show the trade-off to be established in terms of occupied area, throughput, latency, and power consumption.

Patent
05 Dec 2019
TL;DR: In this paper, a circuit may include an input configured to receive a signal and a radix-2^3 fast Fourier transform (FFT) processing element coupled to the input.
Abstract: In some embodiments, a circuit may include an input configured to receive a signal and a radix-2^3 fast Fourier transform (FFT) processing element coupled to the input. The radix-2^3 FFT processing element may be configured to control variation of twiddle factors during calculation of a complete FFT through a plurality of processing stages. The radix-2^3 FFT processing element may be configured to incorporate the twiddle factors and adder tree matrices of the calculation into a single stage.

Journal ArticleDOI
TL;DR: The input-decimation approach is presented to decrease the number of input sequences for the recursive filter so that the computation cycle of RDFT/RIDFT can be shortened to meet the computing time requirement.
Abstract: In this paper, an input-decimation technique for the recursive discrete Fourier transform (RDFT)/inverse DFT (RIDFT) algorithm is proposed for high-speed broadband communication systems. The input-decimation approach decreases the number of input sequences for the recursive filter so that the computation cycle of the RDFT/RIDFT can be shortened to meet the computing time requirement (3.6 µs) of high-speed broadband communication systems. As a result, the input-decimation RDFT/RIDFT algorithm achieves at least a 55.5% reduction in total computation cycles compared with the considered algorithms. Furthermore, with the input-decimation technique, the computational complexities of the real multiplications and additions are reduced to 41.3% and 22.2%, respectively. The area and the power consumption can be minimized by employing a cost-efficient constant multiplier with a refined signed-digit expression of the twiddle factors. Finally, the physical implementation results show that the core area is 0.37 × 0.37 mm² in a 0.18 µm CMOS process. The power consumption is 5.16 mW with a supply voltage of 1.8 V and an operating clock of 40 MHz. The proposed design achieves a computational efficiency per unit area (CEUA) of 258 million and outperforms previous works.

Proceedings ArticleDOI
15 Apr 2019
TL;DR: This paper introduces the first quantization error analysis of a generalized mixed-radix FFT, and shows that any type of CORDIC considered in the proposed framework outperforms the conventional FFT in terms of Signal-to-Quantization-Noise Ratio (SQNR), for a given silicon area budget.
Abstract: In order to achieve higher transmission rates and system capacities, fifth generation (5G) systems as well as 802.11ad/ay systems consider higher frequency bands (24 GHz to 70 GHz), the so-called millimeter wave frequencies. These systems rely heavily on Orthogonal Frequency Division Multiplexing (OFDM) to realize the transmitted signal. The Fast Fourier Transform (FFT) and inverse FFT (IFFT) blocks are vital signal processing components in synthesizing the OFDM signal. With the objective of designing low-complexity, small-area, and low-power FFT blocks, this paper proposes an extension of the algorithmic-level approach known as the COordinate Rotation DIgital Computer (CORDIC)-Friendly FFT in [1] to more practical higher-order FFTs and mixed-radix FFTs. Moreover, it introduces the first quantization error analysis of a generalized mixed-radix FFT. The error analysis is used to find the optimal modified twiddle factors that reduce the FFT hardware complexity. Additionally, this paper shows that any type of CORDIC considered in the proposed framework outperforms the conventional FFT in terms of Signal-to-Quantization-Noise Ratio (SQNR) for a given silicon area budget. The proposed framework offers up to 50 dB SQNR gain compared to the conventional FFT for different FFT sizes when CORDIC rotators are employed. If a Single-path Delay Feedback (SDF) pipeline architecture is used, the SQNR gain is obtained at no additional hardware cost.

Patent
15 Mar 2019
TL;DR: In this paper, a selective mapping-based universal filtering multi-carrier method is proposed, which comprises the following steps: dividing a sub carrier into B sub bands, extending each sub band into different candidate sub bands and multiplying each of the candidate sub-bands with a group of randomly generated twiddle factors, filtering the sub carrier of each sub carrier with an FIR filter, selecting one sub band with the minimum peak-to-average power ratio (PAPR) from the sub bands of a sending end for transmission, superposing transmission signals, in a time domain,
Abstract: The invention provides a selective mapping-based universal filtering multi-carrier method, which comprises the following steps: dividing a sub carrier into B sub bands, extending each sub band into U different candidate sub bands, multiplying each of the candidate sub bands with a group of randomly generated twiddle factors, filtering the sub carrier of each sub band with an FIR filter, selecting one sub band with the minimum peak-to-average power ratio (PAPR) from the sub bands of a sending end for transmission, superposing transmission signals, in a time domain, in the B sub bands selected from the sub bands to form a new sending signal, performing fast Fourier transform on received signals, removing an interference signal, directly recovering the sending signal from the received signals through a linear equalization method, and performing twiddle factor reversing operation on an equalized signal to reconstruct an original sending signal. The PAPR performance of the selective mapping-based universal filtering multi-carrier method is effectively improved, and the selective mapping-based universal filtering multi-carrier method has a certain engineering application value.