scispace - formally typeset
Search or ask a question
Journal ArticleDOI

A low-power, high-performance, 1024-point FFT processor

01 Mar 1999-IEEE Journal of Solid-state Circuits (IEEE)-Vol. 34, Iss: 3, pp 380-387
TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 /spl mu/m CMOS process and is fully functional on first-pass silicon.
Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460000-transistor design has been fabricated in a standard 0.7 /spl mu/m (L/sub poly/=0.6 /spl mu/m) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 /spl mu/s while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz-which is a clock rate 2.6 times greater than the previously fastest rate.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
03 Jan 2005
TL;DR: New subthreshold logic and memory design methodologies are developed and demonstrated on a fast Fourier transform (FFT) processor that is designed to investigate the estimated minimum energy point.
Abstract: In emerging embedded applications such as wireless sensor networks, the key metric is minimizing energy dissipation rather than processor speed. Minimum energy analysis of CMOS circuits estimates the optimal operating point of clock frequencies, supply voltage, and threshold voltage according to A. Chandrakasan et al. (see ibid., vol.27, no.4, p.473-84, Apr. 1992). The minimum energy analysis shows that the optimal power supply typically occurs in subthreshold (e.g., supply voltages that are below device thresholds). New subthreshold logic and memory design methodologies are developed and demonstrated on a fast Fourier transform (FFT) processor. The FFT processor uses an energy-aware architecture that allows for variable FFT length (128-1024 point), variable bit-precision (8 b and 16 b) and is designed to investigate the estimated minimum energy point. The FFT processor is fabricated using a standard 0.18-/spl mu/m CMOS logic process and operates down to 180 mV. The minimum energy point for the 16-b 1024-point FFT processor occurs at 350-mV supply voltage where it dissipates 155 nJ/FFT at a clock frequency of 10 kHz.

619 citations


Cites background from "A low-power, high-performance, 1024..."

  • ...Dedicated low-power FFT processors are desired to sustain low-power requirements of various embedded applications [11]....

    [...]

Proceedings ArticleDOI
13 Sep 2004
TL;DR: Logic and memory design techniques allowing subthreshold operation are developed and demonstrated and the fabricated 1024-point FFT processor operates down to 180mV using a standard 0.18/spl mu/m CMOS logic process while using 155nJ/FFT at the optimal operating point.
Abstract: Minimizing energy requires scaling supply voltages below device thresholds. Logic and memory design techniques allowing subthreshold operation are developed and demonstrated. The fabricated 1024-point FFT processor operates down to 180mV using a standard 0.18/spl mu/m CMOS logic process while using 155nJ/FFT at the optimal operating point.

270 citations

Journal ArticleDOI
TL;DR: A novel 128-point FFT/IFFT processor for ultrawideband (UWB) systems and the proposed pipelined FFT architecture, called mixed-radix multipath delay feedback (MRMDF), can provide a higher throughput rate by using the multidata-path scheme.
Abstract: In this paper, we present a novel 128-point FFT/IFFT processor for ultrawideband (UWB) systems. The proposed pipelined FFT architecture, called mixed-radix multipath delay feedback (MRMDF), can provide a higher throughput rate by using the multidata-path scheme. Furthermore, the hardware costs of memory and complex multipliers in MRMDF are only 38.9% and 44.8% of those in the known FFT processor by means of the delay feedback and the data scheduling approaches. The high-radix FFT algorithm is also realized in our processor to reduce the number of complex multiplications. A test chip for the UWB system has been designed and fabricated using 0.18-/spl mu/m single-poly and six-metal CMOS process with a core area of 1.76/spl times/1.76 mm/sup 2/, including an FFT/IFFT processor and a test module. The throughput rate of this fabricated FFT processor is up to 1 Gsample/s while it consumes 175 mW. Power dissipation is 77.6 mW when its throughput rate meets UWB standard in which the FFT throughput rate is 409.6 Msample/s.

220 citations


Cites background from "A low-power, high-performance, 1024..."

  • ...Various FFT architectures, such as single-memory architecture, dual-memory architecture [2], pipelined architecture [3], array architecture [4], and cached-memory architecture [5], have been proposed in the last three decades....

    [...]

Journal ArticleDOI
TL;DR: The resulting soft digital signal processing system achieves up to 60% and 44% energy savings with no loss in the signal-to-noise ratio (SNR) for receive filtering in a QPSK system and the butterfly of fast Fourier transform in a WLAN OFDM system.
Abstract: In this paper, we present a novel algorithmic noise-tolerance (ANT) technique referred to as reduced precision redundancy (RPR). RPR requires a reduced precision replica whose output can be employed as the corrected output in case the original system computes erroneously. When combined with voltage overscaling (VOS), the resulting soft digital signal processing system achieves up to 60% and 44% energy savings with no loss in the signal-to-noise ratio (SNR) for receive filtering in a QPSK system and the butterfly of fast Fourier transform (FFT) in a WLAN OFDM system, respectively. These energy savings are with respect to optimally scaled (i.e., the supply voltage equals the critical voltage V/sub dd-crit/) present day systems. Further, we show that the RPR technique is able to maintain the output SNR for error rates of up to 0.09/sample and 0.06/sample in an finite impulse response filter and a FFT block, respectively.

196 citations


Cites background or methods from "A low-power, high-performance, 1024..."

  • ...format requires the eventual truncation of the butterfly outputs when writing the outputs back to memory, computation of all the multiplier output bits is unnecessary [23]....

    [...]

  • ...The processor’s datapath computes one complex radix-2 DIT butterfly per cycle [23]....

    [...]

Journal ArticleDOI
TL;DR: A novel 128/64 point fast Fourier transform (FFT)/ inverse FFT (IFFT) processor for the applications in a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor.
Abstract: In this paper, we present a novel 128/64 point fast Fourier transform (FFT)/ inverse FFT (IFFT) processor for the applications in a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor. The unfolding mixed-radix multipath delay feedback FFT architecture is proposed to efficiently deal with multiple data sequences. The proposed processor not only supports the operation of FFT/IFFT in 128 points and 64 points but can also provide different throughput rates for 1-4 simultaneous data sequences to meet IEEE 802.11n requirements. Furthermore, less hardware complexity is needed in our design compared with traditional four-parallel approach. The proposed FFT/IFFT processor is designed in a 0.13-mum single-poly and eight-metal CMOS process. The core area is 660times2142 mum2 , including an FFT/IFFT processor and a test module. At the operation clock rate of 40 MHz, our proposed processor can calculate 128-point FFT with four independent data sequences within 3.2 mus meeting IEEE 802.11n standard requirements

143 citations


Cites background from "A low-power, high-performance, 1024..."

  • ...[2], pipelined architecture [3], array architecture [4], and cached-memory architecture [5], have been proposed....

    [...]

References
More filters
Book
01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

11,671 citations


"A low-power, high-performance, 1024..." refers background in this paper

  • ...It is well known that data caches increase the effective bandwidth to a memory—but only if the memory access pattern exhibits sufficient locality [ 10 ]....

    [...]

Book
01 Jan 1989
TL;DR: In this paper, the authors provide a thorough treatment of the fundamental theorems and properties of discrete-time linear systems, filtering, sampling, and discrete time Fourier analysis.
Abstract: For senior/graduate-level courses in Discrete-Time Signal Processing. THE definitive, authoritative text on DSP -- ideal for those with an introductory-level knowledge of signals and systems. Written by prominent, DSP pioneers, it provides thorough treatment of the fundamental theorems and properties of discrete-time linear systems, filtering, sampling, and discrete-time Fourier Analysis. By focusing on the general and universal concepts in discrete-time signal processing, it remains vital and relevant to the new challenges arising in the field --without limiting itself to specific technologies with relatively short life spans.

10,388 citations

Book
01 Jan 1975
TL;DR: Feyman and Wing as discussed by the authors introduced the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.
Abstract: sprightly style and is interesting from cover to cover. The comments, critiques, and summaries that accompany the chapters are very helpful in crystalizing the ideas and answering questions that may arise, particularly to the self-learner. The transparency in the presentation of the material in the book equips the reader to proceed quickly to a wealth of problems included at the end of each chapter. These problems ranging from elementary to research-level are very valuable in that a solid working knowledge of the invariant imbedding techniques is acquired as well as good insight in attacking problems in various applied areas. Furthermore, a useful selection of references is given at the end of each chapter. This book may not appeal to those mathematicians who are interested primarily in the sophistication of mathematical theory, because the authors have deliberately avoided all pseudo-sophistication in attaining transparency of exposition. Precisely for the same reason the majority of the intended readers who are applications-oriented and are eager to use the techniques quickly in their own fields will welcome and appreciate the efforts put into writing this book. From a purely mathematical point of view, some of the invariant imbedding results may be considered to be generalizations of the classical theory of first-order partial differential equations, and a part of the analysis of invariant imbedding is still at a somewhat heuristic stage despite successes in many computational applications. However, those who are concerned with mathematical rigor will find opportunities to explore the foundations of the invariant imbedding method. In conclusion, let me quote the following: "What is the best method to obtain the solution to a problem'? The answer is, any way that works." (Richard P. Feyman, Engineering and Science, March 1965, Vol. XXVIII, no. 6, p. 9.) In this well-written book, Bellman and Wing have indeed accomplished the task of introducing the simplicity of the invariant imbedding method to tackle various problems of interest to engineers, physicists, applied mathematicians, and numerical analysts.

3,249 citations

Journal ArticleDOI
TL;DR: In this paper, techniques for low power operation are presented which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations to reduce power consumption in CMOS digital circuits while maintaining computational throughput.
Abstract: Motivated by emerging battery-operated applications that demand intensive computation in portable environments, techniques are investigated which reduce power consumption in CMOS digital circuits while maintaining computational throughput. Techniques for low-power operation are shown which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations. An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations. This optimum is achieved by trading increased silicon area for reduced power consumption. >

2,690 citations

Journal Article
TL;DR: An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations, and is achieved by trading increased silicon area for reduced power consumption.
Abstract: Motivated by emerging battery-operated applications that demand intensive computation in portable environments, techniques are investigated which reduce power consumption in CMOS digital circuits while maintaining computational throughput Techniques for low-power operation are shown which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations This optimum is achieved by trading increased silicon area for reduced power consumption >

2,337 citations


"A low-power, high-performance, 1024..." refers background in this paper

  • ...Unfortunately, a lower supply voltage reduces circuit performance....

    [...]