
Showing papers on "Adder published in 2002"


Journal ArticleDOI
TL;DR: A performance analysis of 1-bit full-adder cell is presented, after the adder cell is anatomized into smaller modules, and several designs of each of them are developed, prototyped, simulated and analyzed.
Abstract: A performance analysis of 1-bit full-adder cell is presented. The adder cell is anatomized into smaller modules. The modules are studied and evaluated extensively. Several designs of each of them are developed, prototyped, simulated and analyzed. Twenty different 1-bit full-adder cells are constructed (most of them are novel circuits) by connecting combinations of different designs of these modules. Each of these cells exhibits different power consumption, speed, area, and driving capability figures. Two realistic circuit structures that include adder cells are used for simulation. A library of full-adder cells is developed and presented to the circuit designers to pick the full-adder cell that satisfies their specific applications.

454 citations
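
The modular view of the adder cell in the abstract above can be illustrated with a small behavioural model. This is only a sketch: the split into an XOR module and a multiplexer-style carry module is an assumption made here for illustration, not necessarily the decomposition used in the paper, and the function names are hypothetical.

```python
from itertools import product

def xor_module(a: int, b: int) -> int:
    """XOR stage (assumed module boundary)."""
    return a ^ b

def carry_module(a: int, h: int, cin: int) -> int:
    """Carry stage as a 2:1 multiplexer: pass cin when a != b, else pass a."""
    return cin if h else a

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one bit position."""
    h = xor_module(a, b)        # intermediate signal a XOR b
    s = xor_module(h, cin)      # sum = a XOR b XOR cin
    cout = carry_module(a, h, cin)
    return s, cout

def is_correct() -> bool:
    """Check all eight input combinations against integer addition."""
    return all(a + b + cin == s + 2 * cout
               for a, b, cin in product((0, 1), repeat=3)
               for s, cout in [full_adder(a, b, cin)])
```

Swapping in a different implementation of one module while keeping the others fixed mirrors, at the logic level only, the way the cells in such a library are combined.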


Journal ArticleDOI
TL;DR: An architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000 using an architecture consisting of two row processors, two column processors, and two memory modules.
Abstract: We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μm technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.

350 citations
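
As an illustration of what a lifting-based scheme computes, the sketch below applies the reversible 5/3 lifting steps (one of the JPEG2000 default filters) to a one-dimensional signal using only additions and shifts. It is a software model under simplifying assumptions (even-length input, symmetric extension at the borders), not the row/column processor architecture described above; the function names are hypothetical.

```python
def lift_53_forward(x):
    """One level of the reversible 5/3 lifting transform (even-length input)."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: detail = odd sample minus the floor-average of its neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: smooth = even sample plus a rounded quarter of adjacent details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d

def lift_53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Because each lifting step is undone by subtracting exactly what was added, the inverse transform reuses the same adder and shifter operations with the signs flipped.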


Journal ArticleDOI
TL;DR: This paper proposes a technique to build a total of 41 new 10-transistor full adders using novel XOR and XNOR gates in combination with existing ones; one drawback of the new adders is the threshold-voltage loss of the pass transistors.
Abstract: Full adders are important components in applications such as digital signal processor (DSP) architectures and microprocessors. In this paper, we propose a technique to build a total of 41 new 10-transistor full adders using novel XOR and XNOR gates in combination with existing ones. We have done over 10,000 HSPICE simulation runs of all the different adders with different input patterns, frequencies, and load capacitances. Almost all those new adders consume less power at high frequencies, while three new adders consistently consume on average 10% less power and have higher speed compared with the previous 10-transistor full adder and the conventional 28-transistor CMOS adder. One drawback of the new adders is the threshold-voltage loss of the pass transistors.

306 citations


Journal ArticleDOI
TL;DR: In this paper, the authors presented three new residue-to-binary converters for the residue number system (2ⁿ-1, 2ⁿ, 2ⁿ+1) using 2n-bit or n-bit adders with improvements on speed, area, or dynamic range compared with various previous converters.
Abstract: Based on an algorithm derived from the new Chinese remainder theorem I, we present three new residue-to-binary converters for the residue number system (2ⁿ-1, 2ⁿ, 2ⁿ+1) designed using 2n-bit or n-bit adders with improvements on speed, area, or dynamic range compared with various previous converters. The 2n-bit adder based converter is faster and requires about half the hardware required by previous methods. For n-bit adder-based implementations, one new converter is twice as fast as the previous method using a similar amount of hardware, whereas another new converter achieves improvement in either speed, area, or dynamic range compared with previous converters.

195 citations
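
To make the conversion problem concrete, the sketch below maps an integer to its residues for the moduli set (2ⁿ-1, 2ⁿ, 2ⁿ+1) and back. Note the assumption: the reverse step uses the textbook Chinese remainder theorem with modular inverses, not the adder-based converters derived from the new Chinese remainder theorem I in the paper. It needs Python 3.8 or later for `pow(x, -1, m)`.

```python
def moduli(n):
    """The three pairwise-coprime moduli for a given n (n >= 2)."""
    return (1 << n) - 1, 1 << n, (1 << n) + 1

def to_residues(x, n):
    """Forward conversion: binary integer to residue triple."""
    return tuple(x % m for m in moduli(n))

def from_residues(residues, n):
    """Reverse conversion via the textbook CRT (illustrative only)."""
    ms = moduli(n)
    big_m = ms[0] * ms[1] * ms[2]      # dynamic range of the number system
    x = 0
    for r, m in zip(residues, ms):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        x += r * partial * pow(partial, -1, m)
    return x % big_m
```

For n = 3 the moduli are 7, 8 and 9, so every integer from 0 to 503 has a unique residue triple.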


Journal ArticleDOI
TL;DR: The results show that, except for short chains of blocks or for cases where minimum power consumption is desired, topologies with only pass transistors or transmission gates are not attractive, and the most interesting implementations in terms of tradeoff between power and delay are the traditional CMOS and mirror topologies.
Abstract: In this paper the main topologies of one-bit full adders, including the most interesting of those recently proposed, are analyzed and compared for speed, power consumption, and power-delay product. The comparison has been performed on two classes of circuits, the former with minimum transistor size to minimize power consumption, the latter with optimized transistor dimension to minimize power-delay product. The investigation has been carried out with properly defined simulation runs in a Cadence environment using a 0.35-μm process, also including the parasitics derived from layout. Performance has also been compared for different supply voltage values. Thus design guidelines have been derived to select the most suitable topology for the design features required. This paper also proposes a novel figure of merit to realistically compare n-bit adders implemented as a chain of one-bit full adders. The results differ from those previously published both for the more realistic simulations carried out and the more appropriate figure of merit used. They show that, except for short chains of blocks or for cases where minimum power consumption is desired, topologies with only pass transistors or transmission gates are not attractive. In contrast, the most interesting implementations in terms of tradeoff between power and delay are the traditional CMOS and mirror topologies. Moreover, the dual-rail domino and the CPL allow the best speed performance.

193 citations


Patent
29 Oct 2002
TL;DR: An image processing apparatus and method that reduce the size of the circuits for alpha-blending and dithering and achieve high-speed processing by performing two operations in parallel: computing, with a subtractor and a multiplier, the amount by which the image data to be drawn updates the image data already stored in a display buffer according to a blending coefficient, and adding noise data to the stored image data in a first adder. A second adder then sums the two results, yielding noise data added to data obtained by linear interpolation of two colors.
Abstract: An image processing apparatus and method that reduce the size of the circuits for alpha-blending and dithering and achieve high-speed processing. Two operations are performed in parallel: computing, with a subtractor and a multiplier, the amount by which the image data to be drawn updates the image data already stored in a display buffer according to a blending coefficient, and adding noise data to the image data already stored in the display buffer in a first adder. A second adder sums the data obtained by the two operations, yielding noise data added to data obtained by linear interpolation of two colors. A clamp circuit then extracts the valid color values, a rounding-off circuit thins out the extracted data, and the result is written back to the display buffer.

182 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: Novel full adder circuits using Fredkin gates are proposed which have lower hardware complexity than the current state-of-the-art, while generating the additional signals required for carry skip adder architectures.
Abstract: Conservative and reversible logic gates are widely known to be compatible with revolutionary computing paradigms such as optical and quantum computing. A fundamental conservative reversible logic gate is the Fredkin gate. This paper presents efficient adder circuits based on the Fredkin gate. Novel full adder circuits using Fredkin gates are proposed which have lower hardware complexity than the current state-of-the-art, while generating the additional signals required for carry skip adder architectures. The traditional ripple carry adder and several carry skip adder topologies are compared. Theoretical performance of each adder is determined and compared. Although the variable sized block carry skip adder is determined to have shorter delay than the fixed block size carry skip adder, the performance gains are not sufficient to warrant the required additional hardware complexity.

180 citations
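
The Fredkin gate mentioned above is a controlled swap, and its two defining properties can be checked in a few lines. The sketch below models only the gate and two elementary functions derived from it with constant inputs; it does not reproduce the full adder circuits proposed in the paper, and the helper names are hypothetical.

```python
def fredkin(c, a, b):
    """Controlled swap: when c is 1 the two data lines are exchanged."""
    return (c, b, a) if c else (c, a, b)

def and_gate(a, b):
    """AND from one Fredkin gate with a constant 0 on the second data line."""
    return fredkin(a, b, 0)[2]

def not_gate(a):
    """NOT from one Fredkin gate with constants 0 and 1 on the data lines."""
    return fredkin(a, 0, 1)[2]

def is_reversible_and_conservative():
    """Applying the gate twice restores the inputs; the count of 1s never changes."""
    for bits in range(8):
        c, a, b = (bits >> 2) & 1, (bits >> 1) & 1, bits & 1
        out = fredkin(c, a, b)
        if fredkin(*out) != (c, a, b) or sum(out) != c + a + b:
            return False
    return True
```

The unused outputs of `and_gate` and `not_gate` are the garbage lines that a reversible design has to carry along, which is one reason gate count matters in such adders.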


Journal ArticleDOI
TL;DR: Genetic algorithm-based simulations of molecular device structures in a nanocell where placement and connectivity of the internal molecular switches are not specifically directed and the internal topology is generally disordered show that it is possible to use easily fabricated nanocells as logic devices by setting the internal molecular switch states after the topological molecular assembly is complete.
Abstract: Molecular electronics seeks to build electrical devices to implement computation - logic and memory - using individual or small collections of molecules. These devices have the potential to reduce device size and fabrication costs, by several orders of magnitude, relative to conventional CMOS. However, the construction of a practical molecular computer will require the molecular switches and their related interconnect technologies to behave as large-scale diverse logic, with input/output wires scaled to molecular dimensions. It is unclear whether it is necessary or even possible to control the precise regular placement and interconnection of these diminutive molecular systems. This paper describes genetic algorithm-based simulations of molecular device structures in a nanocell where placement and connectivity of the internal molecular switches are not specifically directed and the internal topology is generally disordered. With some simplifying assumptions, these results show that it is possible to use easily fabricated nanocells as logic devices by setting the internal molecular switch states after the topological molecular assembly is complete. Simulated logic devices include an inverter, a NAND gate, an XOR gate and a 1-bit adder. Issues of defect and fault tolerance are addressed.

158 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present two new design methodologies for modulo 2ⁿ+1 addition in the diminished-one number system; the first leads to carry look-ahead, whereas the second leads to parallel-prefix adder implementations.
Abstract: This paper presents two new design methodologies for modulo 2ⁿ+1 addition in the diminished-one number system. The first design methodology leads to carry look-ahead, whereas the second to parallel-prefix adder implementations. VLSI realizations of the proposed circuits in a standard-cell technology are utilized for quantitative comparisons against the existing solutions. Our results indicate that the proposed carry look-ahead adders are area and time efficient for small values of n, while for the remaining values of n the proposed parallel-prefix adders are considerably faster than any other already known in the open literature.

139 citations
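
For readers unfamiliar with the diminished-one representation, the sketch below shows the arithmetic that such adders implement: a nonzero value A is stored as A-1 in n bits, and the sum is an n-bit addition with the inverted carry-out fed back in. This is the commonly used behavioural formulation, stated here as an assumption; it says nothing about the carry look-ahead or parallel-prefix structures of the paper, and zero operands are assumed to be handled separately.

```python
def dim1_add(a_star, b_star, n):
    """Add two nonzero operands modulo 2**n + 1 in diminished-one form.

    a_star and b_star encode A = a_star + 1 and B = b_star + 1.
    Returns (s_star, is_zero); s_star is meaningful only when is_zero is False.
    """
    mask = (1 << n) - 1
    t = a_star + b_star               # plain (n+1)-bit sum
    cout = t >> n                     # carry out of the n-bit addition
    s_star = (t + (1 - cout)) & mask  # end-around inverted carry
    # A + B equals 2**n + 1 exactly when the n-bit sum is all ones with no carry.
    is_zero = cout == 0 and (t & mask) == mask
    return s_star, is_zero
```

For example, with n = 3 the operands 5 and 6 are stored as 4 and 5; the function returns 1, which decodes to 2 = (5 + 6) mod 9.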


Patent
10 Jun 2002
TL;DR: In this paper, a 64-bit adder implemented in partially depleted silicon on insulator technology and having two levels of lookahead uses a dynamic eight-bit carry module containing a cascode evaluation tree employing a chain of source followers that feeds a sense amplifier.
Abstract: A 64-bit adder implemented in partially depleted silicon on insulator technology and having two levels of lookahead uses a dynamic eight-bit carry module containing a cascode evaluation tree employing a chain of source followers that feeds a sense amplifier, thereby obtaining benefits from high initial drive, low variation in body voltage, resulting in low variation in history-dependent delay, reduced noise sensitivity and noise-based delay.

125 citations


Journal ArticleDOI
TL;DR: It is demonstrated that, for a flexible design, it is more advantageous to use a broad class of reversible gates, called control gates, which form a generalization of Feynman's three gates.

Journal ArticleDOI
TL;DR: A new modular adder design is introduced, based on utilizing concepts developed to realize binary-based adders, that requires less area and time delay than other similar ones.
Abstract: A modular adder is a very instrumental arithmetic component in implementing online residue-based computations for many digital signal processing applications. It is also a basic component in realizing modular multipliers and residue to binary converters. Thus, the design of a high-speed and reduced-area modular adder is an important issue. In this paper, we introduce a new modular adder design. It is based on utilizing concepts developed to realize binary-based adders. VLSI layout implementations and comparative analysis showed that the hardware requirements and the time delay of the new proposed structure are significantly less than other reported ones. A new modulo (2ⁿ+1) adder is also presented. Compared with other similar ones, this specific modular adder requires less area and time delay.

Proceedings ArticleDOI
26 May 2002
TL;DR: By introducing simplifications to multiplier graphs, the previous work on minimum adder multipliers to five adders is extended and it is shown that this is enough to express all coefficients up to 19 bits.
Abstract: By introducing simplifications to multiplier graphs we extend the previous work on minimum adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25% for 19 bits compared with CSD multipliers. The simplifications include addition reordering and vertex reduction to see that different graphs can generate the same coefficient sets. Thus, fewer graphs need to be evaluated. A classification of the graphs reduces the effort to search the coefficient space further.
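
The CSD baseline that the abstract compares against can be sketched as follows. The code converts a constant to canonic signed digit form and counts the adders a straightforward CSD multiplier would need; the coefficient 45 is an example chosen here, not one taken from the paper, and it shows how a graph that reuses an intermediate product can need fewer adders.

```python
def csd_digits(k):
    """Canonic signed digit representation of k >= 0, least significant first."""
    digits = []
    while k:
        if k & 1:
            d = 2 - (k & 3)   # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

def csd_adder_count(k):
    """Adders/subtractors in a plain CSD multiplier: nonzero digits minus one."""
    nonzero = sum(1 for d in csd_digits(k) if d)
    return max(nonzero - 1, 0)

def times_45_two_adders(x):
    """45 = 5 * 9, so a two-adder graph suffices where CSD needs three."""
    y = (x << 2) + x          # y = 5x  (first adder)
    return (y << 3) + y       # 9y = 45x (second adder)
```

Here 45 has the CSD form 64 - 16 - 4 + 1, which costs three adders, while the cascaded form costs two.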

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A new family of dynamic logic gates called Dual-rail Data-Driven Dynamic Logic (D⁴L) is introduced. In this logic family, the synchronization clock signal has been eliminated and correct precharge and evaluation sequencing is maintained by appropriate use of data instances.
Abstract: In this paper, a new family of dynamic logic gates called Dual-rail Data-Driven Dynamic Logic (D⁴L) is introduced. In this logic family, the synchronization clock signal has been eliminated and correct precharge and evaluation sequencing is maintained by appropriate use of data instances. The methodology and characteristics of D⁴L are demonstrated in the design of a CLA 32-b adder and a 17-b high-speed multiplier. Based on VHDL simulations, the D⁴L-implemented 32-b adder has 23% less switching activity than a comparable domino adder, and for the D⁴L multiplier the switching activity is 14.5% less than that of its domino rival. HSPICE simulation in a 0.6-μm CMOS process shows that D⁴L has a 17% power saving over domino in a 32-b CLA adder design and a 10% saving in a 17-b multiplier design, while a D⁴L adder has 8% less delay than a domino one.

Patent
25 Dec 2002
TL;DR: In this paper, an error correction code circuit with reduced hardware complexity is positioned inside a microprocessor, and the microprocessor has a Galois field multiplier for performing Galois Field multiplication on data processed by the error-correcting code circuit.
Abstract: An error correction code circuit with reduced hardware complexity is positioned inside a microprocessor. The microprocessor has a Galois field multiplier for performing a Galois field multiplication on data processed by the error correction code circuit. The error correction code circuit has a first register for storing an input data, a plurality of calculation units, a third register for storing an output data corresponding to the input data, and a controller for controlling operation of the error correction code circuit. Each calculation unit has a Galois field adder, and a second register electrically connected to the Galois field adder. The controller transmits data of each calculation unit to the same Galois field multiplier for a corresponding Galois field multiplication, and the result outputted by the Galois field multiplier is transmitted back to the error correction code circuit.
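
The division of labour in this patent (cheap Galois field adders in every calculation unit, one shared Galois field multiplier) reflects how different the two operations are in cost. The sketch below shows both for GF(2^8); the field size and the reduction polynomial 0x11D are assumptions chosen for illustration, since the abstract specifies neither.

```python
def gf_add(a, b):
    """Addition in GF(2^m) is a bitwise XOR; no carries are involved."""
    return a ^ b

def gf_mul(a, b, poly=0x11D):
    """Shift-and-add multiplication in GF(2^8) with an assumed polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a       # accumulate with the field adder
        a <<= 1
        if a & 0x100:         # reduce when the degree reaches 8
            a ^= poly
        b >>= 1
    return result
```

The adder is a single layer of XOR gates, whereas the multiplier needs repeated shifts, conditional additions and reductions.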

Book ChapterDOI
02 Sep 2002
TL;DR: This paper describes a general methodology to rapidly prototype asynchronous circuits on LUT based FPGAs, based on the use and the design of a Muller gate library, and an asynchronous dual-rail adder is implemented automatically to demonstrate the potential of the methodology.
Abstract: This paper describes a general methodology to rapidly prototype asynchronous circuits on LUT based FPGAs. The main objective is to offer designers the power of standard synchronous FPGAs to prototype their asynchronous circuits or mixed synchronous/asynchronous circuits. To avoid hazards in FPGAs, the appearance of hazards in configurable logic cells is analyzed. The developed technique is based on the use and the design of a Muller gate library. It is shown how the place and route tools automatically exploit this library. Finally, an asynchronous dual-rail adder is implemented automatically to demonstrate the potential of the methodology. Several FPGA families, like Xilinx X4000, Altera Flex, Xilinx Virtex and up-to-date Altera Apex are targeted.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6-μm CMOS process and simulated measurements show that the worst-case delay is 1.56 ns, demonstrating 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.
Abstract: In this paper, a new logic-design style called Pseudo Dynamic Logic (SDL) is introduced. In this logic-design style, the internal nodes of the logic circuits are not precharged to high or low values, rather the initial charges on nodes are shared to yield an intermediate precharge value for faster evaluation. A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6-μm CMOS process. Simulated measurements on this adder show that the worst-case delay is 1.56 ns. This demonstrates 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.

Journal ArticleDOI
01 Jun 2002
TL;DR: In this paper, a fast 5:3 compressor is derived for high-speed multiplier implementations by applying two rows of fast 2-bit adder cells to five rows in a partial product matrix.
Abstract: 3:2 counters and 4:2 compressors have been widely used for multiplier implementations. In this paper, a fast 5:3 compressor is derived for high-speed multiplier implementations. The fast 5:3 compression is obtained by applying two rows of fast 2-bit adder cells to five rows in a partial product matrix. As a design example, a 16-bit by 16-bit MAC (Multiply and Accumulate) design is investigated both in a purely logical gate implementation and in a highly customized design. For the partial product reduction, the use of the new 5:3 compression leads to 14.3% speed improvement in terms of XOR gate delay. In a dynamic CMOS circuit implementation using 0.225 μm bulk CMOS technology, 11.7% speed improvement is observed with 8.1% less power consumption for the reduction tree.

Journal ArticleDOI
TL;DR: A new implementation of high-speed 56-bit hybrid adder is proposed that directly implements group carry propagates and group carry generators without individual carry generator/propagate signals.
Abstract: In this paper, we present a general architecture for designing hybrid carry-lookahead/carry-select adders. Several previous adders in the literature are all special cases of this general architecture. They differ in the way Boolean functions for the carries are implemented. Based on the general architecture, we propose a new implementation of high-speed 56-bit hybrid adder. The new adder directly implements group carry propagates and group carry generators without individual carry generator/propagate signals. Moreover, the group carry generator/propagate signals are complemented to gain speed. The new implementation can be in static CMOS or dynamic logic style. The critical path length of our new design is about 2/3 of the critical path lengths of previous adders; therefore, higher speed can be gained.
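
A behavioural sketch of a hybrid carry-lookahead/carry-select adder may help place the terms used above. Each group computes a group generate and group propagate signal, both candidate sums are formed, and the incoming carry selects one. The group size of 8 is an assumption, and the sketch derives the group signals from per-bit generate/propagate terms, which is exactly the intermediate step the paper says its implementation avoids.

```python
def group_gp(a, b, width):
    """Group generate G and propagate P for one width-bit block."""
    G, P = 0, 1
    for i in range(width):
        g = (a >> i) & (b >> i) & 1       # bit generate
        p = ((a >> i) | (b >> i)) & 1     # bit propagate (inclusive OR form)
        G = g | (p & G)
        P = p & P
    return G, P

def hybrid_add(a, b, n_bits=56, group=8):
    """Add two n_bits-wide integers; returns (sum, carry_out)."""
    mask = (1 << group) - 1
    carry, result = 0, 0
    for lo in range(0, n_bits, group):
        ga, gb = (a >> lo) & mask, (b >> lo) & mask
        G, P = group_gp(ga, gb, group)
        sum0 = (ga + gb) & mask           # candidate sum for carry-in 0
        sum1 = (ga + gb + 1) & mask       # candidate sum for carry-in 1
        result |= (sum1 if carry else sum0) << lo   # carry-select multiplexer
        carry = G | (P & carry)           # lookahead carry into the next group
    return result, carry
```

In hardware the group carries are computed in parallel rather than in a loop; the loop here only fixes the Boolean relationships.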

Journal ArticleDOI
TL;DR: The improved architecture has better performance, is simpler to implement, and is easier to understand.
Abstract: Most of today's digital designs, from small-scale digital block designs to system-on-chip (SoC) designs, are based on the "synchronous" design principle. Clock is the most important issue in these designs. Frequency and phase synthesis is closely related to the clock generation. A frequency and phase synthesis technique based on a phase-locked loop has been proposed that delivers high performance, easy integration, and high stability. However, there are problems associated with this architecture, such as: 1) its highest deliverable frequency is limited by the speed of the accumulator and 2) the phase synthesis circuitry will not work well in certain ranges (dead zone) and in certain conditions (dual stability). This paper presents an improved architecture that addresses these problems. The new frequency synthesis circuitry has scalability for higher output frequency. It also has an internal node whose frequency is twice that of the output signal. When duty cycle is not a concern, this signal can be used directly as a clock source. The new phase synthesis circuitry is free of "dead zone" and "dual stability." The improved architecture has better performance, is simpler to implement, and is easier to understand.

Journal ArticleDOI
TL;DR: Results indicate that the proposed 256-bit priority encoder and the proposed 64-bit incrementer/decrementer can operate up to 116 and 139 MHz when they are designed based on a 0.6-μm CMOS technology.
Abstract: Lookahead signals were used to form the multilevel folding architecture for priority-encoding-based designs, improving the performance to the order of O(log N). Analysis showed that both the multilevel lookahead and the multilevel folding techniques could be easily merged and implemented in the dynamic CMOS circuits. For the 256-bit priority encoder, the new design adopting all the proposed techniques can achieve nearly ten times the performance while consuming nearly half the power as compared to the conventional design, utilizing only a simple lookahead structure. For the 64-bit incrementer/decrementer, the new design adopting all the proposed techniques requires less than one-third delay time as compared to a high-speed carry-select adder (CSA)-based incrementer/decrementer. The power consumption evaluated at the maximum operating frequency and the transistor count of the new incrementer/decrementer are also reduced to 67% and 35%, respectively, as compared to the CSA-based design. The measurement results indicate that the proposed 256-bit priority encoder and the proposed 64-bit incrementer/decrementer can operate up to 116 and 139 MHz, respectively, when they are designed based on a 0.6-μm CMOS technology.

Journal ArticleDOI
TL;DR: This letter describes an algorithm for systematically finding a multiplierless approximation of transforms by replacing floating-point multipliers with VLSI-friendly binary coefficients of the form k/2ⁿ.
Abstract: This letter describes an algorithm for systematically finding a multiplierless approximation of transforms by replacing floating-point multipliers with VLSI-friendly binary coefficients of the form k/2ⁿ. Assuming the cost of hardware binary shifters is negligible, the total number of binary adders employed to approximate the transform can be regarded as an index of complexity. Because the new algorithm is more systematic and faster than trial-and-error binary approximations with adder constraint, it is a much more efficient design tool. Furthermore, the algorithm is not limited to a specific transform; various approximations of the discrete cosine transform are presented as examples of its versatility.
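
What a coefficient of the form k/2ⁿ buys can be seen in a small sketch: the floating-point multiplier becomes an integer k, and the product becomes shifts and additions followed by one right shift. The rounding used here is the naive choice, not the systematic search described in the letter, and the function names are hypothetical.

```python
def dyadic_approx(c, n):
    """Nearest coefficient of the form k / 2**n to c; returns (k, k / 2**n)."""
    k = round(c * (1 << n))
    return k, k / (1 << n)

def scale_by_shift_add(x, k, n):
    """Compute (x * k) >> n for k >= 0 using only shifts and additions."""
    acc, shift = 0, 0
    while k:
        if k & 1:
            acc += x << shift   # one adder per set bit after the first
        k >>= 1
        shift += 1
    return acc >> n
```

For instance, cos(π/4) ≈ 0.7071 with n = 8 gives k = 181, and the approximation error is at most 2⁻⁹ by construction.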

Proceedings ArticleDOI
04 Aug 2002
TL;DR: In the modified CSA, one of the n-bit adder blocks is replaced by an add-one circuit consisting of fewer transistors, which considerably reduces the power and area, with negligible speed penalty.
Abstract: A carry select adder (CSA) can be implemented by using a single adder block and an add-one circuit instead of using dual adder blocks. The add-one circuit is based on "first" zero detection logic and a few multiplexers. In the modified CSA, one of the n-bit adder blocks is replaced by an add-one circuit consisting of fewer transistors. This scheme considerably reduces the power and area, with negligible speed penalty. For 8-bit length, n=8, this modified CSA requires 38% fewer transistors and consumes only 73% of the power, compared to the conventional design, using a 0.5-μm CMOS technology.
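
The add-one idea can be sketched behaviourally: incrementing a binary word means inverting every bit up to and including the first zero, counted from the least significant end. The block model below is an assumption about how such a circuit slots into a carry-select stage, written for illustration only; it is not a transistor-level description of the design above.

```python
def add_one(x, width):
    """Increment x by flipping bits up to and including the first zero."""
    out, flip = 0, 1
    for i in range(width):
        bit = (x >> i) & 1
        out |= (bit ^ flip) << i
        flip &= bit               # stop flipping once the first zero is passed
    return out, flip              # flip is 1 only if x was all ones (carry out)

def csa_block(a, b, cin, width):
    """One carry-select block built from a single adder plus add_one."""
    mask = (1 << width) - 1
    total = a + b
    sum0, c0 = total & mask, total >> width   # result for carry-in 0
    sum1, inc_carry = add_one(sum0, width)    # result for carry-in 1
    c1 = c0 | inc_carry                       # the two cannot both be 1
    return (sum1, c1) if cin else (sum0, c0)
```

The second adder of a conventional carry-select block is replaced by `add_one`, which needs no carry propagation logic of its own beyond the first-zero detection.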

Journal ArticleDOI
10 Dec 2002
TL;DR: A digital circuit technique to process directly bit-stream signals from sigma-delta modulation based analogue-to-digital converters and the application of the technique to communication systems is described and a QPSK demodulator is presented.
Abstract: The paper describes a digital circuit technique to process directly bit-stream signals from sigma-delta modulation based analogue-to-digital converters and the application of the technique to communication systems. The newly developed adder and multiplier are fundamental processing circuit modules. Using the fundamental modules and up/down counters, other circuit modules, such as oscillators, dividers and square root circuits, can also be realised. Signal processors built from the modules have three advantages over multi-bit Nyquist rate processors. First, single-bit/multibit converters are not needed at the inputs of the processors because the arithmetic modules directly process the bit-stream signals. Secondly, the physical areas for routing the signals among the circuit modules are small since they are in the form of a bit-stream. Thirdly, the processors are built from a smaller number of logic gates than conventional Nyquist rate processors because of the simple structure of the circuit modules. As an application of the technique to digital signal processing for communications, a QPSK demodulator is presented. In addition to circuit simulations of the demodulator, a useful linear analysis to estimate the influence of the noise components contained in the outputs from the circuit modules on the steady-state demodulation performance is explained.

Patent
Goran Bilski
01 Feb 2002
TL;DR: In this patent, a carry chain is provided for combining the one-bit ALU circuits to generate multi-bit ALUs, where the ALU circuit has two data input signals and two operator input signals that select between the adder, subtractor and other logical functions.
Abstract: Structures and methods that implement an ALU (Arithmetic Logic Unit) circuit in a PLD (Programmable Logic Device) while using only one PLD logic cell to implement a one-bit ALU circuit. The ALU circuit has two data input signals and two operator input signals that select between the adder, subtractor, and other logical functions. A result bit provides the result of the addition, subtraction, or other logical function as selected by the values of the two operator input signals. A carry chain is provided for combining the one-bit ALU circuits to generate multi-bit ALUs. All of this functionality is implemented in a single PLD logic cell per ALU bit.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods.
Abstract: GDI (Gate Diffusion Input) - a new technique of low power digital circuit design is described. This technique allows reducing power consumption, delay and area of digital circuits, while maintaining low complexity of logic design. Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods. A variety of logic gates have been implemented in 0.35-μm technology to compare the GDI technique with CMOS and PTL. A prototype test chip of 8-bit CLA adder has been fabricated, based on GDI and CMOS cell libraries, showing up to 45% reduction in power-delay product in GDI. Properties of implemented circuits are discussed, simulation results are reported and measurements of a test chip are presented.

Patent
28 Mar 2002
TL;DR: In this article, a dual-cycle address generation unit is described to generate linear addresses, which includes a first adder to add a product of an index and a scaling factor to an offset and a segment base during a first clock cycle.
Abstract: A dual-cycle address generation unit is described to generate linear addresses. The dual-cycle address generation unit includes a first adder to add a product of an index and a scaling factor to an offset and a segment base during a first clock cycle, and a second adder to add output of the first adder with a base during a second clock cycle.
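
The two-step computation described in the abstract can be written out as a short model. The 32-bit address width and the parameter names are assumptions made for this sketch; the abstract fixes only which operands each adder handles in which cycle.

```python
def agu_dual_cycle(segment_base, base, index, scale, offset, bits=32):
    """Linear address computed in two steps, one per clock cycle."""
    mask = (1 << bits) - 1
    # Cycle 1: first adder sums the scaled index, the offset and the segment base.
    partial = (index * scale + offset + segment_base) & mask
    # Cycle 2: second adder adds the base to the first adder's output.
    return (partial + base) & mask
```

For example, `agu_dual_cycle(segment_base=0x1000, base=0x200, index=3, scale=4, offset=8)` yields `0x1214`.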

Proceedings ArticleDOI
13 Jun 2002
TL;DR: The leakage-biased domino circuit (LB-domino) as discussed by the authors maintains high speed in active mode but can be rapidly placed into a low-leakage inactive state by using leakage currents themselves to bias internal nodes.
Abstract: A leakage-biased domino circuit family is proposed that maintains high speed in active mode but which can be rapidly placed into a low-leakage inactive state by using leakage currents themselves to bias internal nodes. A 32-bit Han-Carlson domino adder circuit is used to compare LB-domino with conventional single and dual Vt domino circuits. For equal delay and noise margin, the LB-domino technique gives two decades reduction in steady-state leakage energy compared to a dual-Vt technique.

Journal ArticleDOI
TL;DR: Computation reduction techniques which can either be used to obtain multiplierless implementation of finite impulse response (FIR) digital filters or to further improve multiplierless implementation obtained by currently used techniques are presented.
Abstract: We present computation reduction techniques which can either be used to obtain multiplierless implementation of finite impulse response (FIR) digital filters or to further improve multiplierless implementation obtained by currently used techniques. Although presented in the FIR filtering framework, these ideas are also directly applicable to any task/application which can be expressed as multiplication of vectors by scalars. The presented approach is to remove computational redundancy by reordering computation. The reordering problem is formulated using a graph in which vertices represent coefficients and edges represent resources required in a computation using the differential coefficient defined by the difference of the vertices joined by the edge. This interpretation leads to various methods for computation reduction for which simple polynomial run time algorithms are presented. It is shown that about 20% reduction in the number of add operations per coefficient can be obtained over the conventional multiplierless implementations. It is also shown that implementations requiring less than one adder per coefficient can be obtained using the presented approaches when using nonuniformly scaled coefficients quantized from infinite precision representation by simple rounding.
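
The differential-coefficient idea can be illustrated with a toy example: once c₁·x is known, c₂·x can be obtained as c₁·x + (c₂ - c₁)·x, and the difference is often cheaper to multiply by than c₂ itself. The sketch below chains the coefficients in sorted order, which is only one simple ordering assumed here for illustration; the paper formulates the choice of ordering as a graph problem.

```python
def differential_products(coeffs, x):
    """Compute c * x for every coefficient by chaining through differences."""
    order = sorted(range(len(coeffs)), key=lambda i: coeffs[i])
    products = [0] * len(coeffs)
    prev_c, prev_p = 0, 0
    for i in order:
        diff = coeffs[i] - prev_c       # differential coefficient
        prev_p = prev_p + diff * x      # in hardware, diff * x is shift-and-add
        prev_c = coeffs[i]
        products[i] = prev_p
    return products

def nonzero_bits(k):
    """Rough multiplierless cost proxy: number of set bits in |k|."""
    return bin(abs(k)).count("1")
```

With coefficients 119 and 127, the direct products need constants with six and seven set bits, while the chained version multiplies by 119 and then by the difference 8, a single shift.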

Proceedings Article
01 Jan 2002
TL;DR: This paper relates the potential energy savings to the energy profile of a circuit by using gate sizing and supply voltage optimization to minimize energy consumption subject to a delay constraint.
Abstract: This paper relates the potential energy savings to the energy profile of a circuit. These savings are obtained by using gate sizing and supply voltage optimization to minimize energy consumption subject to a delay constraint. The sensitivity of energy to delay is derived from a linear delay model extended to multiple supplies. The optimizations are applied to a range of examples that span typical circuit topologies including inverter chains, SRAM decoders and adders. At a delay of 20% larger than the minimum, energy savings of 40% to 70% are possible, indicating that achieving peak performance is expensive in terms of energy.