
Showing papers on "Twiddle factor" published in 2005


Journal ArticleDOI
TL;DR: This paper proposes a new vector rotational scheme called mixed-scaling-rotation coordinate rotational digital computer (MSR-CORDIC) algorithm, which can eliminate the overhead of the scaling operations that are inevitable in existing CORDIC algorithms; hence, it can significantly reduce the total iteration number so as to improve the speed performance.
Abstract: The coordinate rotational digital computer (CORDIC) algorithm is a well-known iterative arithmetic for performing vector rotations in many digital signal processing (DSP) applications. However, the large number of iterations is a major disadvantage for its speed performance. Many researchers have proposed schemes to reduce the number of iterations. Nevertheless, in performing the existing CORDIC algorithms, the norm of the vector is usually enlarged so that extra scaling operations are required to deliver the normalized output. In this paper, we merge the two operation phases (microrotation and scaling phases) and propose a new vector rotational scheme called mixed-scaling-rotation coordinate rotational digital computer (MSR-CORDIC) algorithm. It can eliminate the overhead of the scaling operations that are inevitable in existing CORDIC algorithms; hence, it can significantly reduce the total iteration number so as to improve the speed performance. The proposed MSR-CORDIC can be applied to DSP applications in which the rotational angles are known in advance [e.g., twiddle factors in fast Fourier transform (FFT) processor designs]. Moreover, most CORDIC algorithms suffer from roundoff noise in fixed-wordlength implementations. We also propose two schemes to control and reduce the impairment. Our simulation results show that the MSR-CORDIC algorithm can enhance the signal-to-quantization-noise ratio (SQNR) performance by controlling the internal dynamic range. We also investigate the first- and second-order statistical properties, including the mean and variance of the SQNR. Simulation results show that the MSR-CORDIC can enhance SQNR performance in both first- and second-order statistical properties. At the VLSI architecture level, we propose a generalized MSR-CORDIC engine for the tradeoff between hardware complexity and quantization error performance. It can further reduce the hardware complexity when compared with the newly proposed extended elementary angle set CORDIC algorithm. The MSR-CORDIC scheme has been applied to a variable-length FFT processor design, and results in significant hardware reduction in implementing the twiddle factor operations.

81 citations
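
The scaling overhead mentioned in the MSR-CORDIC abstract above can be seen in a conventional CORDIC rotation. The following is a minimal floating-point sketch of the textbook algorithm, not of MSR-CORDIC itself; the function name `cordic_rotate` and the default of 16 iterations are assumptions made for illustration.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians with conventional CORDIC.

    Illustrative sketch only: floats stand in for fixed-point hardware,
    and the angle must lie within the convergence range (about +/-1.74 rad).
    Returns the rotated vector and the accumulated gain.
    """
    gain = 1.0
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Microrotation by d * atan(2^-i): only shifts and adds in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        # Each microrotation enlarges the vector norm by sqrt(1 + 2^-2i).
        gain *= math.sqrt(1.0 + 4.0 ** -i)
    # Separate scaling phase: this is the overhead the abstract refers to.
    return x / gain, y / gain, gain

x, y, gain = cordic_rotate(1.0, 0.0, math.pi / 6)
print(x, y, gain)
```

The final division by `gain` is the extra scaling phase; the abstract describes merging it with the microrotations when the angles (such as FFT twiddle factors) are known in advance.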


Patent
Joel Brenner
25 Feb 2005
TL;DR: In this paper, a frequency discriminator based on a variant of the DFT is proposed, in which the usual twiddle factors are replaced with the twiddle factors of a DFT on twice the actual number of sample points.
Abstract: Frequency discriminator based on a variant of the DFT in which the usual twiddle factors are replaced with the twiddle factors of a DFT on twice the actual number of sample points. The DFT so modified allows half-bin frequency discrimination with little added computational burden. Two DFTs shifted by half a bin with respect to the zero frequency provide a linear discriminator response and good immunity to noise. The discriminator is particularly useful in FLLs for tracking signals in a GPS receiver.

29 citations
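
The half-bin idea in the patent abstract above can be sketched as follows. Using the twiddle factors of a 2N-point DFT on N samples places the analysis frequencies on a half-bin grid. The way the two outputs are combined here (a normalized magnitude difference) is an assumption for illustration and is not taken from the patent; the names `half_bin_dft` and `discriminator` are hypothetical.

```python
import cmath
import math

def half_bin_dft(x, k):
    """DFT term of N samples using the twiddle factors of a 2N-point DFT.

    Index k therefore selects the frequency k/2 bins of the N-point grid.
    """
    n_pts = len(x)
    return sum(s * cmath.exp(-2j * math.pi * k * n / (2 * n_pts))
               for n, s in enumerate(x))

def discriminator(x):
    """Compare the outputs at +1/2 bin and -1/2 bin around zero frequency.

    Assumed combination rule: normalized difference of magnitudes.
    """
    up = abs(half_bin_dft(x, 1))     # centered at +0.5 bin
    down = abs(half_bin_dft(x, -1))  # centered at -0.5 bin
    return (up - down) / (up + down)

n_pts = 64
tone = [cmath.exp(2j * math.pi * 0.1 * n / n_pts) for n in range(n_pts)]
print(discriminator(tone))  # positive for a small positive frequency offset
```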


Proceedings ArticleDOI
01 Jan 2005
TL;DR: From the synthesis simulations of a standard 0.35 μm CMOS SAMSUNG process, a proposed CSD constant complex multiplier achieved more than 60% area and power efficiency when compared to the conventional programmable complex multiplier.
Abstract: This paper proposes the modified radix-2^4 and radix-4^2 FFT algorithms and efficient pipeline FFT architectures based on those algorithms for OFDM systems. The proposed pipeline FFT architectures have the same number of multipliers as the conventional R2^2SDF and R4SDC. However, the multiplication complexity and the ROMs for storing twiddle factors could be reduced by more than 30% and 50% respectively by replacing one half of the programmable multipliers with the newly proposed CSD constant multipliers. From the synthesis simulations of a standard 0.35 μm CMOS SAMSUNG process, a proposed CSD constant complex multiplier achieved more than 60% area and power efficiency when compared to the conventional programmable complex multiplier. This improved efficiency could be used in the design of a long-length FFT processor for wireless OFDM applications, which need more power and area efficiency.

25 citations
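
A canonical signed digit (CSD) constant multiplier, as named in the abstract above, replaces a general multiplier with a few shifts and adds or subtracts. The sketch below shows the standard CSD recoding of an integer constant and a shift-add multiplication built from it; the function names and the 15-bit quantization of the twiddle constant are assumptions chosen for illustration.

```python
import math

def to_csd(value):
    """Return CSD digits (-1, 0, +1) of a non-negative integer, LSB first.

    In CSD form no two adjacent digits are nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            d = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits

def csd_multiply(x, digits):
    """Multiply integer x by the constant encoded in `digits` via shifts and adds."""
    acc = 0
    for shift, d in enumerate(digits):
        if d:
            acc += d * (x << shift)  # one adder or subtractor per nonzero digit
    return acc

# Hypothetical twiddle constant: cos(pi/4) quantized to 15 fractional bits.
constant = round(math.cos(math.pi / 4) * 2 ** 15)
digits = to_csd(constant)
print(digits, csd_multiply(1234, digits) == constant * 1234)
```

Fewer nonzero digits mean fewer adders, which is why a fixed twiddle constant can be cheaper in CSD form than a programmable multiplier.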


Proceedings ArticleDOI
18 Jan 2005
TL;DR: The design method of a real-time FFT processor is presented, in which a radix-4 butterfly can be calculated in one clock cycle by optimizing the memory-mapping algorithm and the generation of twiddle factors.
Abstract: The design method of a real-time FFT processor is presented. By optimizing the memory-mapping algorithm and the generation of twiddle factors, a radix-4 butterfly can be calculated in one clock cycle. An approach to adaptive overflow control is also introduced to avoid overflow without interrupting the computing pipeline. The design is implemented on an FPGA chip and achieves an operating frequency of 127 MHz. It can complete a complex 1024-point FFT within 10.1 μs.

21 citations
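
For reference, the radix-4 butterfly that the processor above evaluates once per clock cycle can be written out as below. This is a generic software sketch of a decimation-in-time radix-4 butterfly, not the paper's hardware design; the function names are hypothetical.

```python
import cmath
import math

def mul_neg_j(v):
    """Multiply by -j without a multiplier: swap parts and negate one."""
    return complex(v.imag, -v.real)

def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only additions and -j swaps."""
    t0, t1 = a + c, a - c
    t2, t3 = b + d, mul_neg_j(b - d)
    return t0 + t2, t1 + t3, t0 - t2, t1 - t3

def radix4_dit_butterfly(a, b, c, d, w):
    """Apply twiddle factors w, w^2, w^3 to three inputs, then the butterfly."""
    return radix4_butterfly(a, b * w, c * w * w, d * w * w * w)

w = cmath.exp(-2j * math.pi * 1 / 16)  # example twiddle factor W_16^1
print(radix4_dit_butterfly(1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j, w))
```

The three complex multiplications by twiddle factors dominate the cost, so twiddle generation and memory access are the natural places to optimize.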


Proceedings ArticleDOI
05 Dec 2005
TL;DR: A new general method to deduce FFT algorithms is introduced, and the deduced second radix-2 decimation-in-time FFT algorithm is transformed into another parallelizable sequential form, reducing the time complexity of the DFT to O(n log n / p) (where p is the number of processors).
Abstract: The discrete Fourier transform (DFT) has many applications in digital signal and image processing and other scientific and technological domains, but the time complexity of its direct computation is O(n^2), which greatly limits its range of application. Many fast Fourier transform (FFT) algorithms have therefore been developed, reducing the complexity from O(n^2) to O(n log n) (in this paper, log n denotes log2 n). But for large n, O(n log n) is still very high, so multiprocessor systems have been used to speed up the computation of the DFT. This paper first introduces a new general method to deduce FFT algorithms, then transforms the deduced second radix-2 decimation-in-time FFT algorithm into another parallelizable sequential form, and finally transforms the latter algorithm into a new parallel FFT algorithm, reducing the time complexity of the DFT to O(n log n / p) (where p is the number of processors). Using similar methods, the authors can also design other new parallel 1-D and 2-D FFT algorithms.

21 citations
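
As background for the entry above, a textbook recursive radix-2 decimation-in-time FFT is sketched below. It is the standard sequential algorithm, not the paper's parallel variant; it assumes the input length is a power of two, and the name `fft_dit` is hypothetical.

```python
import cmath
import math

def fft_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    # The two half-size transforms are independent of each other, which is
    # what makes this structure a candidate for parallel execution.
    even = fft_dit(x[0::2])
    odd = fft_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor W_n^k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

print(fft_dit([1, 0, 0, 0, 0, 0, 0, 0]))  # impulse: all outputs equal 1
```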


Patent
04 Nov 2005
TL;DR: In this article, a method of performing a fast Fourier transform (FFT) in a single-instruction-stream, multiple-data-stream (SIMD) processor includes providing n-bits of input data, and implementing j number of stages of operations.
Abstract: A method of performing a fast Fourier transform (FFT) in a single-instruction-stream, multiple-data-stream (SIMD) processor includes providing n-bits of input data, and implementing j number of stages of operations. The n-bits of input data are grouped into groups of x-bits to form i number of vectors so that i=n/x. The method includes parallel butterflies operations on vector [i] with vector [i+(n/2)] using a twiddle factor vector W_t. Data sorting is performed within a processing array if a present stage j is less than y, where y is an integer less than a maximum value of j. The parallel butterflies operations and data sorting are repeated i times, then the process increments to the next stage j. The parallel butterflies operations, data sorting and incrementing are repeated (j−1) times to generate a transformed result and then the transformed result is output.

20 citations


Patent
08 Aug 2005
TL;DR: In this article, a system and method for performing a Fast Fourier Transform (FFT) in a multi-mode wireless processing system are presented. The method can include loading an input vector into an input buffer, initializing a second counter and a variable N, where N = log2(input vector size) and s is the value of the second counter, performing an FFT stage, and comparing s to N and performing additional FFT stages until s = N.
Abstract: A system and method for performing a Fast Fourier Transform (FFT) in a multi-mode wireless processing system. The method can include loading an input vector into an input buffer, initializing a second counter and a variable N, where N = log2(input vector size) and s is the value of the second counter, performing an FFT stage, and comparing s to N and performing additional FFT stages until s = N. Performing the FFT stage can include performing vector operations on data in the input buffer and sending results to an output buffer, the data in the input buffer comprising a plurality of segments; advancing the value of the second counter; and switching roles of the input and output buffers. The vector operations can include performing Radix-4 FFT vector operations on four input data at a time and multiplying the resulting output vectors with a twiddle factor.

17 citations
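
The stage loop with swapped input and output buffers described in the patent abstract above resembles a Stockham-style autosort FFT. The sketch below is a radix-2 simplification (the patent uses radix-4 operations), with a stage counter running up to log2 of the input size; all names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath
import math

def fft_pingpong(data):
    """Radix-2 Stockham-style FFT using two buffers that swap roles each stage."""
    size = len(data)
    stages = size.bit_length() - 1  # log2(size) for a power-of-two size
    src = [complex(v) for v in data]  # input buffer
    dst = [0j] * size                 # output buffer
    n, stride = size, 1
    for _stage in range(stages):
        half = n // 2
        for p in range(half):
            w = cmath.exp(-2j * math.pi * p / n)  # twiddle factor for this group
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src  # switch roles of input and output buffers
        n //= 2
        stride *= 2
    return src  # results are in natural order, no bit reversal needed

print(fft_pingpong([1, 1, 1, 1, 0, 0, 0, 0]))
```

Because each stage writes its results in sorted positions, no separate reordering pass is required, at the cost of a second buffer.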


Proceedings ArticleDOI
18 Mar 2005
TL;DR: It can be shown that all the possible split-radix FFT algorithms of the type radix-2^r/2^(rs) for computing a 2^m-point DFT require exactly the same number of arithmetic operations.
Abstract: A radix-2/16 decimation-in-frequency (DIF) fast Fourier transform (FFT) algorithm and its higher radix version, namely the radix-4/16 DIF FFT algorithm, are proposed by suitably mixing the radix-2, radix-4 and radix-16 index maps, and combining some of the twiddle factors. It is shown that the proposed algorithms and the existing radix-2/4 and radix-2/8 FFT algorithms require exactly the same number of arithmetic operations (multiplications + additions). Moreover, by using similar techniques, it can be shown that all the possible split-radix FFT algorithms of the type radix-2^r/2^(rs) for computing a 2^m-point DFT require exactly the same number of arithmetic operations.

16 citations


Proceedings ArticleDOI
01 May 2005
TL;DR: This paper proposes a novel finite field multiplier based on the fast Fourier transform, which performs polynomial multiplication in O(n log(n)) time compared to the classical method's O(n^2).
Abstract: Finite field multiplication is one of the most useful arithmetic operations and has applications in many areas such as signal processing, coding theory and cryptography. However, it is also one of the most time consuming operations in both software and hardware, which makes it pertinent to develop a fast and efficient implementation. In this paper, we propose a novel FFT based finite field multiplier to address this problem. The fast Fourier transform (FFT) is the collection of computationally efficient algorithms that perform the discrete Fourier transform (DFT). For our purposes, we will use its efficient computation for polynomial multiplication. The FFT performs polynomial multiplication in O(n log(n)) time compared to the classical method's O(n^2). The idea of using the FFT for finite field multiplication has been researched extensively, but to our knowledge, this is the first implementation in hardware.

14 citations
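
The polynomial multiplication step that the abstract above relies on can be sketched in software as follows: transform both coefficient vectors, multiply pointwise, and transform back. This is a generic floating-point illustration with integer coefficients, not the paper's hardware multiplier, and it omits the reduction step that a finite field multiplier would apply afterwards. The function names are hypothetical.

```python
import cmath
import math

def _fft(a, invert):
    """Recursive radix-2 FFT (or unnormalized inverse); len(a) is a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], invert)
    odd = _fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def poly_multiply(p, q):
    """Multiply integer-coefficient polynomials (lowest degree first) via FFT."""
    result_len = len(p) + len(q) - 1
    size = 1
    while size < result_len:
        size *= 2  # pad so the circular convolution equals the linear one
    fp = _fft([complex(c) for c in p] + [0j] * (size - len(p)), False)
    fq = _fft([complex(c) for c in q] + [0j] * (size - len(q)), False)
    prod = _fft([a * b for a, b in zip(fp, fq)], True)
    return [round(v.real / size) for v in prod[:result_len]]

# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))
```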


Proceedings ArticleDOI
01 Jan 2005
TL;DR: Performance analyses reveal that the proposed design can outperform other MDCT/IMDCT designs in terms of memory storage size, computing latency and fixed point implementation error.
Abstract: This paper presents a novel MDCT/IMDCT algorithm and its hardware design. In algorithm derivation, the MDCT/IMDCT computation is first converted into a form of matrix multiplication consisting of a half size DCT-IV kernel and a projection matrix. The DCT-IV kernel is then realized by a fast DCT-II computing scheme. Since MDCT and IMDCT algorithms use the same DCT kernel, a unified architecture using the same set of twiddle factors can be employed for both computations. Based on the proposed algorithm, a novel design mapping is developed with emphasis on the reduction of hardware and memory access complexities. By careful scheduling in computation and memory access schemes, only single port memory modules are needed in lieu of expensive dual port memories. Performance analyses reveal that, given the comparable hardware resource allocation, the proposed design can outperform other MDCT/IMDCT designs in terms of memory storage size, computing latency and fixed point implementation error.

13 citations


Patent
04 Aug 2005
TL;DR: In this article, a 3780-point DFT processor is proposed, which is decomposed into 140-point and 27-point DFT modules, so that no twiddle factor multiplication is required.
Abstract: A 3780-point DFT processor is proposed, which is decomposed into 140-point and 27-point DFT modules, and no twiddle factor multiplication is required. The 140-point DFT module is computed using nested WFTA, which includes a pre-add/sub module, a real multiplication module and a post-add/sub module. The 27-point DFT module is constructed from 9-point and 3-point WFTA using the Cooley-Tukey algorithm.
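
Because 140 and 27 are coprime, a decomposition of 3780 points into these two lengths can use prime-factor (Good-Thomas) index mapping, which is the standard way to avoid twiddle factor multiplications between the two stages. The sketch below illustrates that mapping on a small 3 x 5 example with naive sub-DFTs; it is an assumption that the patent uses this mapping, and all names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import cmath
import math

def dft(x):
    """Naive DFT used for the short sub-transforms."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]

def pfa_dft(x, n1, n2):
    """Prime-factor DFT of length n1*n2 with gcd(n1, n2) == 1, no twiddle factors."""
    n = n1 * n2
    assert len(x) == n and math.gcd(n1, n2) == 1
    # Input index map: n = (n2*i + n1*j) mod N.
    grid = [[x[(n2 * i + n1 * j) % n] for j in range(n2)] for i in range(n1)]
    grid = [dft(row) for row in grid]                              # length-n2 DFTs
    cols = [dft([grid[i][j] for i in range(n1)]) for j in range(n2)]  # length-n1 DFTs
    # Output index map from the Chinese remainder theorem.
    inv2 = pow(n2, -1, n1)  # n2^-1 mod n1
    inv1 = pow(n1, -1, n2)  # n1^-1 mod n2
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(k1 * n2 * inv2 + k2 * n1 * inv1) % n] = cols[k2][k1]
    return out

assert math.gcd(140, 27) == 1 and 140 * 27 == 3780
print(pfa_dft([complex(i) for i in range(15)], 3, 5)[0])  # sum of 0..14
```

Note that no multiplication occurs between the row and column transforms; the cost moves entirely into the index permutations.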

Proceedings ArticleDOI
19 Dec 2005
TL;DR: This paper proposes an alternative way of padding zeros to the data sequence that reduces the computational cost to O(pN log2 N) and can be used to achieve non-uniform upsampling that zooms in or out on a particular frequency band.
Abstract: The classical Cooley-Tukey fast Fourier transform (FFT) algorithm has a computational cost of O(N log2 N), where N is the length of the discrete signal. Spectrum resolution is improved by padding zeros at the tail of the discrete signal. If (p-1)N zeros are padded (where p is an integer) at the tail of the data sequence, the computational cost of the FFT becomes O(pN log2 pN). This paper proposes an alternative way of padding zeros to the data sequence that reduces the computational cost to O(pN log2 N). This modification can be used to achieve non-uniform upsampling that zooms in or out on a particular frequency band. In addition, it may be used for pruning the spectrum, which reduces the resolution of an unimportant frequency band.
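
One well-known way to obtain the zero-padded spectrum at a cost of p transforms of length N is to split the output index as m = pk + r and run one length-N FFT per residue r on a phase-shifted copy of the input. Whether this matches the paper's scheme is an assumption; the sketch only shows that the stated cost is reachable. Function names are hypothetical, and N is assumed to be a power of two.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def padded_spectrum(x, p):
    """Spectrum of x followed by (p-1)*len(x) zeros, using p length-N FFTs."""
    n = len(x)
    out = [0j] * (p * n)
    for r in range(p):
        # Shift the spectrum by r/p of a bin, then sample it on the N-point grid.
        shifted = [v * cmath.exp(-2j * math.pi * i * r / (p * n))
                   for i, v in enumerate(x)]
        spec = fft(shifted)
        for k in range(n):
            out[p * k + r] = spec[k]
    return out

print(len(padded_spectrum([1, 2, 3, 4], 3)))  # 12 spectrum samples from 4 inputs
```

Running the loop only for selected values of r evaluates a subset of the fine grid.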

Book ChapterDOI
11 Sep 2005
TL;DR: In this paper, a hybrid MPI/OpenMP implementation of a parallel three-dimensional fast Fourier transform (FFT) algorithm on SMP clusters is presented, which can be altered to create a block 3D FFT algorithm in order to reduce the number of cache misses.
Abstract: In the present paper, we propose a hybrid MPI/OpenMP implementation of a parallel three-dimensional fast Fourier transform (FFT) algorithm on SMP clusters. The three-dimensional FFT algorithm can be altered to create a block three-dimensional FFT algorithm in order to reduce the number of cache misses. We then use the obtained block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT. We succeeded in obtaining a performance of over 14 GFLOPS on the AIST Super Cluster M-64 (using 32 nodes out of 132 available, Itanium2 1.3 GHz, 4-way SMP).

Proceedings ArticleDOI
20 Mar 2005
TL;DR: The no-communication algorithm, a parallel algorithm for 1-D FFT without inter-processor communication, is presented and shown to perform better than the 4-step FFT for relatively small data sizes.
Abstract: Computing a 1-D fast Fourier transform (FFT) using the classical 4-step FFT on parallel computers requires intensive all-to-all communication. This all-to-all communication significantly reduces the performance of the FFT. In this paper, we present the no-communication algorithm, a parallel algorithm for 1-D FFT without inter-processor communication. The advantage of this algorithm is the absence of all-to-all communication between processors. The disadvantage is the extra computation compared to the classical 4-step FFT. The no-communication algorithm has been implemented and tested on 8-node symmetric multiprocessors (SMP). The results show that the no-communication algorithm performs better than the 4-step FFT for relatively small data sizes. However, the 4-step FFT algorithm performs better than the no-communication algorithm for relatively large data sizes.

Patent
12 Jan 2005
TL;DR: In this paper, the authors propose a method to construct a pilot frequency matrix of any size, which can be applied to channel estimation of MIMO systems in the time domain and frequency domain.
Abstract: The method includes the following steps: determining the size of the pilot frequency matrix based on a realistic channel model and the estimated channel length; next, determining the twiddle factor and generating exponential vectors based on the size of the pilot frequency matrix; then, obtaining the first row of the pilot frequency matrix from the said twiddle factor and exponential vectors; finally, obtaining the whole matrix by cyclically shifting and replicating the first row of the pilot frequency matrix. Thus, the whole template of the pilot frequency signal is ensured, and construction of the pilot frequency signal is completed. The invention can construct a pilot frequency matrix of any size and can be applied to channel estimation of MIMO systems in the time domain and frequency domain. Advantages are: good autocorrelation between rows and columns, weak cross correlation, even matrix energy distribution, and suppression of channel noise.

Journal Article
TL;DR: A pipeline processor that can compute FFTs of various 2^n-point sizes is proposed for continuously performing complex fast Fourier transforms (FFTs); it is shown that any FFT whose size is a power of the pipeline's radix can be performed.
Abstract: A pipeline processor that can compute FFTs of various 2^n-point sizes is proposed for continuously performing complex fast Fourier transforms (FFTs). The processor consists of several stages of butterfly computational elements connected with ping-pong RAMs that reorder the data between the butterfly stages. By properly ordering the input data to the pipeline and addressing the twiddle factor ROM, and by controlling each stage's operating status, it is shown that any FFT whose size is a power of the pipeline's radix can be performed. Using block floating-point arithmetic, the processor can provide high-quality output. The design is written in VHDL at the RTL level and implemented on a single FPGA chip. The processor can operate at 80 MHz and compute a 1024-point complex FFT in 12.8 μs when operating continuously.

Patent
28 Jul 2005
TL;DR: In this paper, a linear Fourier transform program is provided with the steps of decomposing a data length N of linear data into a product N1×N2×...×Nm of factors.
Abstract: PROBLEM TO BE SOLVED: To increase the speed of a linear Fourier transform algorithm for scalar computers. SOLUTION: A linear Fourier transform program is provided with the steps of: decomposing a data length N of linear data into a product N1×N2×...×Nm of factors; defining the data length N as P×Q and calculating the first one of Q-1 twiddle factors for multiplication to Fourier transform of each of P linear data out of twiddle factors required for multiplication to Fourier transform results of P linear data having a length Q while varying P and Q and storing calculated twiddle factors in a table; and using the table where the calculated twiddle factors are stored to perform Fourier transform of data in m phases. COPYRIGHT: (C)2005,JPO&NCIPI

Journal Article
TL;DR: A discrete Hartley transform-based FFT/IFFT method is discussed, where the arithmetic operations are all real-valued, and compared with the complex-valued FFT, the requirements for RAM, multipliers and adders are greatly reduced.
Abstract: In multi-carrier modulation of OFDM/DMT, the Fourier transform is a real-valued FFT, and the IFFT is the inverse transform of the real-valued FFT. The realization of the real-valued FFT is much different from that of the complex-valued FFT. In this paper, a discrete Hartley transform-based FFT/IFFT method is discussed. The arithmetic operations are all real-valued, and compared with the complex-valued FFT, the requirements for RAM, multipliers and adders are greatly reduced.
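
The relation between the discrete Hartley transform (DHT) and the DFT of a real sequence, which underlies the method above, can be sketched as follows. The DHT here is computed naively in O(N^2) for clarity, whereas a practical design would use a fast Hartley algorithm; the function names are hypothetical.

```python
import cmath
import math

def dht(x):
    """Naive discrete Hartley transform of a real sequence (real arithmetic only)."""
    n = len(x)
    return [sum(v * (math.cos(2 * math.pi * i * k / n) +
                     math.sin(2 * math.pi * i * k / n))
                for i, v in enumerate(x))
            for k in range(n)]

def dft_from_dht(h):
    """Recover the DFT from DHT values using the even and odd parts of H."""
    n = len(h)
    out = []
    for k in range(n):
        hk, hm = h[k], h[(n - k) % n]
        # Real part: (H[k] + H[N-k]) / 2, imaginary part: -(H[k] - H[N-k]) / 2.
        out.append(complex((hk + hm) / 2, -(hk - hm) / 2))
    return out

signal = [0.0, 1.0, 0.5, -0.25, 2.0, -1.0, 0.75, 0.1]
print(dft_from_dht(dht(signal))[1])
```

All multiplications inside `dht` are real-valued, which is the source of the reduced multiplier and memory requirements cited in the abstract.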

17 Nov 2005
TL;DR: The algorithm in this paper is derived from a Cooley decimation-in-time algorithm by using an appropriate indexing process, and it is proved that the number of multiplications necessary to compute the proposed algorithm is significantly reduced while the number of additions remains almost identical to that of conventional 2D FFTs.
Abstract: In this paper, we propose a new approach for computing 2D FFTs that is suitable for implementation on systolic array architectures. Our algorithm is derived from a Cooley decimation-in-time algorithm by using an appropriate indexing process. It is proved that the number of multiplications necessary to compute our proposed algorithm is significantly reduced while the number of additions remains almost identical to that of conventional 2D FFTs. Comparison results show the strong performance of the new 2D FFT algorithm against the row-column FFT.

Journal Article
Liu Dichen
TL;DR: A fast Fourier transform (FFT) algorithm based on complex sequences is put forward in the paper; it can meet the demands of on-line power quality measurement and reduce the amount of calculation.
Abstract: A fast Fourier transform (FFT) algorithm based on complex sequences is put forward in the paper. The algorithm can meet the demands of on-line power quality measurement and reduce the amount of calculation. Compared with the traditional fast Fourier transform, the algorithm halves the number of transforms, which lowers the demand on CPU calculation speed when it is used to analyze voltage and current data of the same length. The algorithm is therefore applied effectively in on-line measurement apparatus.
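
The abstract above does not spell out the method, but halving the number of transforms for two equal-length real sequences is commonly done by packing them into one complex sequence. The sketch below shows that classic technique under the assumption that it is what is meant; the names `two_real_ffts`, `voltage` and `current` are hypothetical, and the length is assumed to be a power of two.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def two_real_ffts(voltage, current):
    """Spectra of two real sequences of equal length from a single complex FFT."""
    n = len(voltage)
    z = fft([complex(v, c) for v, c in zip(voltage, current)])
    spec_v, spec_c = [], []
    for k in range(n):
        zk = z[k]
        zm = z[(n - k) % n].conjugate()
        spec_v.append((zk + zm) / 2)   # conjugate-symmetric part
        spec_c.append((zk - zm) / 2j)  # conjugate-antisymmetric part
    return spec_v, spec_c

v = [1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0, 0.0]
c = [0.2, 0.4, 0.1, -0.3, -0.2, 0.0, 0.3, 0.1]
spec_v, spec_c = two_real_ffts(v, c)
print(spec_v[1], spec_c[1])
```

The separation works because the spectrum of a real sequence is conjugate-symmetric, so the two spectra occupy the symmetric and antisymmetric parts of the combined result.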

Journal Article
TL;DR: Three algorithms and processor structures for the 3780-point FFT are proposed: the 4096-point FFT algorithm with interpolation, the blend FFT algorithm and the compositive decomposition algorithm.
Abstract: The 3780-point FFT introduced in this paper is a key module in the Terrestrial Digital Multimedia/Television transmission system. The 3780-point FFT cannot use the well-established radix-2 and radix-4 algorithms directly. In this paper, we propose three algorithms and processor structures for the 3780-point FFT: the 4096-point FFT algorithm with interpolation, the blend FFT algorithm and the compositive decomposition algorithm. We discuss the strengths and drawbacks of each method.

Proceedings ArticleDOI
15 Aug 2005
TL;DR: Different free space propagation algorithms based on the fast Fourier transform, used to calculate 1D and 2D diffraction patterns in the near field and far field, are compared along with their advantages and drawbacks.
Abstract: In this paper we compare different free space propagation algorithms based on the fast Fourier transform (FFT). They are used to calculate 1D and 2D diffraction patterns in the near field and far field. Four algorithms are considered: angular spectrum propagation, direct integral formulation, fractional Fourier transform using a single FFT (S_FFT) and fractional Fourier transform using two FFTs (D_FFT). We compare these algorithms and discuss their advantages and drawbacks for one- and two-dimensional objects.

Patent
10 Feb 2005
TL;DR: In this paper, a method for frequency domain transformation of signals based on a discrete Fourier transform is proposed, in which the signal to be transformed is multiplied with a reduced number of twiddle factors.
Abstract: Method for frequency domain transformation of signals based on a discrete Fourier transform, whereby the signal to be transformed is formed as a sum of complex periodic exponential functions and the complex periodic exponential functions are formed as the product of the signal and twiddle factors. The signal to be transformed is multiplied with a reduced number of twiddle factors. An independent claim is made for an arrangement for frequency domain transformation of signals.

DOI
01 Jan 2005
TL;DR: The main objective of this work is to design, simulate and implement an architecture based on the Twiddle-Factor-Based decomposition FFT algorithm, which is said to be more power efficient and to compute in far fewer clock cycles than other algorithms developed.
Abstract: Design and Implementation of a Fast Fourier Transform Architecture using Twiddle Factor Based Decomposition Algorithm, by Bhaarath Kumar (Examination Committee Chair: Dr. Yingtao Jiang, Assistant Professor, Department of Electrical & Computer Engineering, University of Nevada, Las Vegas). With the advent of signal processing and wireless communication mobile platform devices, the necessity for data transformation from one form to another becomes an unavoidable aspect. One such mathematical tool that is widely used for transforming time and frequency domain signals is the Fourier transform. The Fast Fourier Transform (FFT) is perhaps the fastest way to achieve this transformation. Many algorithms and architectures have been designed over the years in an attempt to make FFT algorithms more efficient and to target many applications. The main objective of our work is to design, simulate and implement an architecture based on the Twiddle-Factor-Based decomposition FFT algorithm. The significant feature of the algorithm is its effective memory access reduction, which amounts to as much as 30% less than in other conventional FFT algorithms. As a result of this memory reduction, this algorithm is said to be more power efficient and to compute in far fewer clock cycles than other algorithms developed.