Proceedings ArticleDOI

Analysis and design of low power radix-4 FFT processor using pipelined architecture

TL;DR: A pipelined architecture is presented that combines low-power techniques such as sign swap and sub-expression elimination with area-reduction techniques such as “In Place” addressing and a single butterfly element per stage.
Abstract: The Fast Fourier Transform (FFT) is an efficient form of the Discrete Fourier Transform; because it is simpler and faster and needs fewer computations, it has come to dominate various fields. As CMOS gate lengths shrink further into the Ultra Deep Sub-Micron (UDSM) regime, leakage power, which was previously negligible, is approaching the range of dynamic power, increasing the need for low-power devices. This paper presents several low-power techniques, such as sign swap and sub-expression elimination, along with several area-reduction techniques, such as “In Place” addressing and a single butterfly element per stage, using a pipelined architecture. The pipelined architecture with low-power techniques is implemented on both radix-2 and radix-4 FFT processors, and the two are compared. Results show that the pipelined radix-4 FFT consumes 11% less power than the radix-2 FFT for a 16-point implementation.
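To make the radix-2 versus radix-4 comparison concrete, the following is a minimal software sketch of a radix-4 decimation-in-time FFT. It is not the paper's hardware design: the function names (`dft`, `radix4_butterfly`, `fft_radix4`) are hypothetical, and the recursion only models the arithmetic, not the pipeline, sign swap, or in-place addressing. For 16 points a radix-4 decomposition needs log4(16) = 2 butterfly stages, where radix-2 needs log2(16) = 4.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def radix4_butterfly(a, b, c, d):
    """4-point DFT: only additions plus a multiplication by -j."""
    t0, t1 = a + c, a - c
    t2, t3 = b + d, -1j * (b - d)
    return t0 + t2, t1 + t3, t0 - t2, t1 - t3


def fft_radix4(x):
    """Recursive radix-4 DIT FFT; len(x) is assumed to be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    # Four interleaved sub-sequences, each transformed independently.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        # Twiddle factors W_N^(r*k) for r = 0..3.
        w = [cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4)]
        y = radix4_butterfly(*(w[r] * subs[r][k] for r in range(4)))
        for m in range(4):
            out[k + m * q] = y[m]
    return out
```

Calling `fft_radix4` on a 16-sample list should agree with `dft` on the same list up to floating-point rounding.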
Citations
Proceedings ArticleDOI
01 Feb 2016
TL;DR: The main results show that the optimization guarantees reduced power consumption for radix-2 butterfly, when compared with previous works from the literature.
Abstract: In FFT computation, the butterflies play a central role, since they allow the calculation of complex terms. Because this calculation involves multiplying input data by appropriate coefficients, optimizing the butterfly can contribute to reducing the power consumption of FFT architectures. In this paper, different dedicated structures for 16 bit-width radix-2 and radix-4 DIT butterflies are implemented, where the main goal is to minimize the number of arithmetic operators in order to produce power-efficient structures. First, we improve a radix-2 butterfly previously presented in the literature, removing one adder and one subtractor from the structure. After that, part of this optimized radix-2 butterfly is used to reduce the number of real multipliers in the radix-4 butterfly. The main results show that the optimization guarantees reduced power consumption for the radix-2 butterfly when compared with previous works from the literature. Moreover, the use of part of the optimized radix-2 butterfly in the radix-4 structure leads to a reduction in power consumption for this structure.
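The abstract does not spell out how the multiplier count is reduced. As an illustration only, the sketch below shows one well-known generic way of trading real multipliers for adders in a twiddle-factor multiplication: computing a complex product with three real multiplications instead of four. The names `cmul4` and `cmul3` are hypothetical, and this should not be read as the structure used in the cited paper.

```python
def cmul4(ar, ai, wr, wi):
    """Textbook complex product (ar + j*ai) * (wr + j*wi): 4 multiplies, 2 adds."""
    return ar * wr - ai * wi, ar * wi + ai * wr


def cmul3(ar, ai, wr, wi):
    """Same product with 3 multiplies.

    For a fixed twiddle factor, (wr + wi) and (wi - wr) can be precomputed,
    so the extra additions on the coefficient side cost nothing at run time.
    """
    k = wr * (ar + ai)          # shared product
    re = k - ai * (wr + wi)     # = ar*wr - ai*wi
    im = k + ar * (wi - wr)     # = ar*wi + ai*wr
    return re, im
```

With integer operands the two functions return identical results, which makes the equivalence easy to check exhaustively over a small range.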

20 citations

Journal ArticleDOI
TL;DR: A comparative study of efficient algorithms and architectures for FFT chip design is presented, and it is recommended that a mixed-radix/higher-radix algorithm combined with a Single-path Delay Commutator (SDC) architecture is appropriate for massive MIMO in 5G, optical OFDM, cooperative MIMO and multi-user MIMO-based applications.
Abstract: The Fast Fourier Transform and Inverse Fast Fourier Transform (FFT/IFFT) are the most significant digital signal processing (DSP) techniques used in Orthogonal Frequency Division Multiplexing (OFDM)-based applications, which include day-to-day wired/wireless communications, broadband access, and information sharing. The advancements in telecommunication technologies require an efficient FFT/IFFT processing device to meet the necessary specifications, which depend on the particular application. A real-time implementation of a high-speed FFT/IFFT processor with less area that operates with minimal power consumption is essential in designing an OFDM integrated chip. A comparative study of efficient algorithms and architectures for FFT chip design is presented in this paper. It is also recommended that a mixed-radix/higher-radix algorithm combined with a Single-path Delay Commutator (SDC) architecture is appropriate for massive MIMO in 5G, optical OFDM, cooperative MIMO and multi-user MIMO-based applications.

18 citations

Proceedings ArticleDOI
01 Feb 2017
TL;DR: Different addition schemes are exploited in order to improve the efficiency of 16 bit-width radix-2 and radix-4 FFT butterflies by using combinations of simultaneous addition of three and seven operands.
Abstract: In FFT computation, the butterflies play a central role, since they allow the calculation of complex terms. Therefore, the optimization of the butterfly can contribute to power reduction in FFT architectures. In this paper we exploit different addition schemes in order to improve the efficiency of 16 bit-width radix-2 and radix-4 FFT butterflies. Combinations of simultaneous addition of three and seven operands are inserted in the structures of the butterflies in order to produce power-efficient structures. The addition schemes used include the Carry Save Adder (CSA) and adder compressors. The radix-2 and radix-4 butterflies were implemented in a hardware description language and synthesized to the 45nm Nangate Open Cell Library using Cadence RTL Compiler. The main results show that both radix-2 and radix-4 butterflies with CSA are more efficient when compared with the same structures with other adder circuits.
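The idea behind adding three operands at once with a Carry Save Adder can be sketched in software as follows. This is a bit-level behavioural model under stated assumptions (unsigned operands, a hypothetical `width` parameter defaulting to 16 bits, and hypothetical function names), not the circuits synthesized in the cited paper.

```python
def carry_save_add(a, b, c, width=16):
    """Reduce three operands to a (sum, carry) pair with no carry propagation.

    Each bit position acts as an independent full adder: the sum bit is the
    XOR of the three inputs, the carry bit is their majority, shifted left.
    """
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry


def add3(a, b, c, width=16):
    """Three-operand addition: one CSA level, then a single ordinary adder."""
    mask = (1 << width) - 1
    s, carry = carry_save_add(a, b, c, width)
    return (s + carry) & mask
```

Only the final `s + carry` step propagates carries, which is why a CSA level is attractive when several operands must be summed; the result matches `(a + b + c)` modulo `2**width`.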

8 citations

Proceedings ArticleDOI
01 Dec 2017
TL;DR: The results show that the best proposed split-radix saves up to 47.28% of power dissipation by using 5-2 adder compressors, when compared with radix-4 butterfly using the synthesis tool adder.
Abstract: Fast Fourier Transform (FFT) is one of the most common implementations of the Discrete Fourier Transform (DFT), and it is a commonly used algorithm to process and classify data in IoT (Internet of Things) smart sensors. Butterflies play a central role in FFT computation since they allow calculation of complex terms. In this work, we propose a power-efficient hardware architecture for 16-bit split-radix DIT (Decimation in Time) butterflies. Based on the results, we show that the 16-bit split-radix butterfly hardware architecture is more power-efficient than the 16-bit radix-4 one. Moreover, we are able to improve the power efficiency of the split-radix butterfly by using efficient 5-2 adder compressors. The results show that our best proposed split-radix saves up to 47.28% of power dissipation by using 5-2 adder compressors, when compared with the radix-4 butterfly using the synthesis tool adder.

8 citations


Additional excerpts

  • ...Most low power implementations of FFT algorithm from the literature aims at optimizing the entire FFT architecture by using techniques such as pipelining, the reuse of butterflies in sequential and semi-parallel structures, or even reordering the twiddle factors, such as in [11], and [12]....

    [...]

References
Book
01 Feb 1996
TL;DR: An overview of digital design with Verilog HDL, covering basic topics such as hierarchical modeling concepts through to advanced topics such as logic synthesis.
Abstract: PART I. BASIC VERILOG TOPICS. 1. Overview of Digital Design with Verilog HDL. Evolution of Computer Aided Digital Design. Emergence of HDLs. Typical Design Flow. Importance of HDLs. Popularity of Verilog HDL. Trends in HDLs. 2. Hierarchical Modeling Concepts. Design Methodologies. 4-bit Ripple Carry Counter. Modules. Instances. Components of a Simulation. Example. Design Block. Stimulus Block. Summary. Exercises. 3. Basic Concepts. Lexical Conventions. Whitespace. Comments. Operators. Number Specification. Sized numbers. Unsized numbers. X or Z values. Negative numbers. Underscore characters and question marks. Strings. Identifiers and Keywords. Escaped Identifiers. Data Types. Value Set. Nets. Registers. Vectors. Integer , Real, and Time Register Data Types. Integer. Real Time. Arrays. Memories. Parameters. Strings. System Tasks and Compiler Directives. System Tasks. Displaying information. Monitoring information. Stopping and finishing in a simulation. Compiler Directives. 'define. 'include. Summary. Exercises. 4. Modules and Ports. Modules. Ports. List of Ports. Port Declaration. Port Connection Rules. Inputs. Outputs. Inouts. Width matching. Unconnected ports. Example of illegal port connection. Connecting Ports to External Signals. Connecting by ordered list. Connecting ports by name. Hierarchical Names. Summary. Exercises. 5. Gate-Level Modeling. Gate Types. And/Or Gates. Buf/Not Gates. Bufif/notif. Examples. Gate-level multiplexer. 4-bit full adder. Gate Delays. Rise, Fall, and Turn-off Delays. Rise delay. Fall delay. Turn-off delay. Min/Typ/Max Values. Min value. Typ val. Max value. Delay Example. Summary. Exercises. 6. Dataflow Modeling. Continuous Assignments. Implicit Continuous Assignment. Delays. Regular Assignment Delay. Implicit Continuous Assignment Delay. Net Declaration Delay. Expressions, Operators, and Operands. Expressions. Operands. Operators. Operator Types. Arithmetic Operators. Binary operators. Unary operators. Logical Operators. 
Relational Operators. Equality Operators. Bitwise Operators. Reduction Operators. Shift Operators. Concatenation Operator. Replication Operator. Conditional Operator. Operator Precedence. Examples. 4-to-1 Multiplexer. Method 1: logic equation. Method 2: conditional operator. 4-bit Full Adder. Method 1: dataflow operators. Method 2: full adder with carry lookahead. Ripple Counter. Summary. Exercises. 7. Behavioral Modeling. Structured Procedures. Initial Statement. Always Statement. Procedural Assignments. Blocking assignments. Nonblocking Assignments. Application of nonblocking assignments. Timing Controls. Delay-Based Timing Control. Regular delay control. Intra-assignment delay control. Zero delay control. Event-Based Timing Control. Regular event control. Named event control. Event OR control. Level-Sensitive Timing Control. Conditional Statements. Multiway Branching. Case Statement. Casex, casez Keywords. Loops. While Loop. For Loop. Repeat Loop. Forever loop. Sequential and Parallel Blocks. Block Types. Sequential blocks. Parallel blocks. Special Features of Blocks. Nested blocks. Named blocks. Disabling named blocks. Examples. 4-to-1 Multiplexer. 4-bit Counter. Traffic Signal Controller. Specification. Stimulus. Summary. Exercises. 8. Tasks and Functions. Differences Between Tasks and Functions. Tasks. Task Declaration and Invocation. Task Examples. Use of Input and Output Arguments. Asymmetric Sequence Generator. Functions. Function Declaration and Invocation. Function Examples. Parity calculation. Left/right shifter. Summary. Exercises. 9. Useful Modeling Techniques. Procedural Continuous Assignments. Assign and deassign. Force and release. Force and release on registers. Force and release on nets. Overriding Parameters. Defparam Statement. Module_Instance Parameter Values. Conditional Compilation and Execution. Conditional Compilation. Conditional Execution. Time Scales. Useful System Tasks. File Output. Opening a file. Writing to files. Closing files. 
Displaying Hierarchy. Strobing. Random Number Generation. Initializing Memory from File. Value Change Dump File. Summary. Exercises. PART II. ADVANCED VERILOG TOPICS. 10. Timing and Delays. Types of Delay Models. Distributed Delay. Lumped Delay. Pin-to-Pin Delays. Path Delay Modeling. Specify Blocks. Inside Specify Blocks. Parallel Connection. Full Connection. Specparam Statements. Conditional Path Delays. Rise, fall, and turn-off delays. Min, max, and typical delays. Handling x transitions. Timing Checks. $setup and $hold checks. $setup task. $hold task. $width Check. Delay Back-Annotation. Summary. Exercises. 11. Switch-Level Modeling. Switch-Modeling Elements. MOS Switches. CMOS Switches. Directional Switches. Power and Ground. Resistive Switches. Delay Specification on Switches. MOS and CMOS switches. Bidirectional pass switches. Specify blocks. Examples. CMOS Nor Gate. 2-to-1 Multiplexer. Simple CMOS Flip-Flop. Summary. Exercises. 12. User-Defined Primitives. UDP Basics. Parts of UDP Definition. UDP Rules. Combinational UDPs. Combinational UDP Definition. State Table Entries. Shorthand Notation for Don't Cares. Instantiating UDP Primitives. Example of a Combinational UDP. Sequential UDPs. Level-Sensitive Sequential UDPs. Edge-Sensitive Sequential UDPs. Example of a Sequential UDP. UDP Table Shorthand Symbols. Guidelines for UDP Design. Summary. Exercises. 13. Programming Language Interface. Uses of PLI. Linking and Invocation of PLI Tasks. Linking PLI Tasks. Linking PLI in Verilog-XL. Linking in VCS. Invoking PLI Tasks. General Flow of PLI Task Addition and Invocation. Internal Data Representation. PLI Library Routines. Access Routines. Mechanics of Access Routines. Types of Access Routines. Examples of Access Routines. Utility Routines. Mechanics of Utility Routines. Types of Utility Routines. Example of Utility Routines. Summary. Exercises. 14. Logic Synthesis with Verilog HDL. What Is Logic Synthesis? Impact of Logic Synthesis. Verilog HDL Synthesis. 
Verilog Constructs. Verilog Operators. Interpretation of a Few Verilog Constructs. The Assign statement. The if-else statement. The case statement for loops. The Function Statement. Synthesis Design Flow. RTL to Gates. RTL Description. Translation. Unoptimized Intermediate Representation. Logic Optimization. Technology Mapping and Optimization. Technology library. Design constraints. Optimized gate-level description. An Example of RTL-to-Gates. Design Specification. RTL description. Technology library. Design constraints. Logic synthesis. Final, Optimized, Gate-Level Description. IC Fabrication. Verification of Gate-Level Netlist. Functional Verification. Timing Verification. Modeling Tips for Logic Synthesis. Verilog Coding Style. Use meaningful names for signals and variables. Avoid mixing positive and negative edge-triggered flip-flops. Use basic building blocks vs. Use continuous assign statements. Instantiate multiplexers vs. Use if-else or case statements. Use parentheses to optimize logic structure. Use arithmetic operators *, /, and % vs. Design building blocks. Be careful with multiple assignments to the same variable. Define if-else or case statements explicitly. Design Partitioning. Horizontal partitioning. Vertical Partitioning. Parallelizing design structure. Design Constraint Specification. Example of Sequential Circuit Synthesis. Design Specification. Circuit Requirements. Finite State Machine (FSM). Verilog Description. Technology Library. Design Constraints. Logic Synthesis. Optimized Gate-Level Netlist. Verification. Summary. Exercises. PART III: APPENDICES. A. Strength Modeling and Advanced Net Definitions. B. List of PLI Routines. C. List of Keywords, System Tasks, and Compiler Directives. D. Formal Syntax Definition. E. Verilog Tidbits. F. Verilog Examples. Index.

432 citations

Book
Gary Yeap
31 Aug 1997
TL;DR: The book grew out of a company-wide training class, "Tutorial on Low Power Digital VLSI Design", that the author developed for designers at Motorola; feedback from the tutorial attendees helped to improve the quality of the training.
Abstract: Gary K. Yeap, Practical Low Power Digital VLSI Design (Kluwer Academic Publishers / Springer). ...conceived when I developed a company-wide training class, "Tutorial on Low Power Digital VLSI Design", for designers in Motorola. The feedback from the tutorial attendees helps to...

323 citations


"Analysis and design of low power ra..." refers background in this paper

  • ...…in the more dynamic power dissipation which requires battery size to be larger which is generally not preferred for the handheld applications since the battery was not developed in the same scale as the semiconductor technology developed resulting in the requirement of the lower power devices [3]....

    [...]

Journal ArticleDOI
TL;DR: The proposed architecture takes advantage of the reduced number of operations of the RFFT with respect to the complex fast Fourier transform (CFFT), and requires less area while achieving higher throughput and lower latency.
Abstract: This paper presents a new pipelined hardware architecture for the computation of the real-valued fast Fourier transform (RFFT). The proposed architecture takes advantage of the reduced number of operations of the RFFT with respect to the complex fast Fourier transform (CFFT), and requires less area while achieving higher throughput and lower latency. The architecture is based on a novel algorithm for the computation of the RFFT, which, contrary to previous approaches, presents a regular geometry suitable for the implementation of hardware structures. Moreover, the algorithm can be used for both the decimation in time (DIT) and decimation in frequency (DIF) decompositions of the RFFT and requires the lowest number of operations reported for radix 2. Finally, as in previous works, when calculating the RFFT the output samples are obtained in a scrambled order. The problem of reordering these samples is solved in this paper and a pipelined circuit that performs this reordering is proposed.
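The reduced operation count of the RFFT comes from a basic property of the DFT of real-valued input: the output is conjugate-symmetric, X[N-k] = conj(X[k]), so roughly half of the bins are redundant. The sketch below demonstrates only that property using a direct DFT. It is not the cited paper's algorithm or architecture, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rfft_half(x):
    """Keep only bins 0..N/2 of the DFT of a real-valued sequence."""
    return dft(x)[: len(x) // 2 + 1]


def rebuild_full(half, n):
    """Recover bins N/2+1..N-1 from the kept bins via X[N-k] = conj(X[k])."""
    return half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
```

For an even-length real input, `rebuild_full(rfft_half(x), len(x))` should match `dft(x)` up to floating-point rounding, which is the redundancy a dedicated RFFT design can avoid computing.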

130 citations


"Analysis and design of low power ra..." refers methods in this paper

  • ...…signal processing based systems like Orthogonal Frequency Division Multiplexing (OFDM) and Software Defined Radio where pipelined hardware architectures are extensively used in the FFT computation, which provides the advantages of high throughput, reasonable reduction in power, and reduced latency [6]....

    [...]

Proceedings ArticleDOI
01 Dec 2006
TL;DR: A tool aimed at generating fast Fourier transform cores targeting FPGA platforms was presented and a set of accurate estimators has been implemented to allow the designer an early and quick design space exploration before synthesizing the core.
Abstract: In this paper a tool aimed at generating fast Fourier transform (FFT) cores targeting FPGA platforms was presented. The tool is able to generate different pipelined architectures of the FFT that provide different points of the design space, from high-performance to low-area implementations. The user can select the most suitable architecture based on a broad set of configuration parameters, such as the number of points, sample size, truncation, etc. Moreover, a set of accurate estimators has been implemented to allow the designer an early and quick design space exploration before synthesizing the core. Experimental results validate our approach and provide significant measurements about the accuracy of the estimation and the tool execution time.

7 citations


"Analysis and design of low power ra..." refers background in this paper

  • ...He was the one who envisioned that count of transistors doubles for every 18 months as a result of which chip density increases resulting in the high speed applications [1]....

    [...]