Showing papers on "Adder published in 2002"


Open Access

Journal Article•DOI•

Performance analysis of low-power 1-bit CMOS full adder cells


A.M. Shams¹, T.K. Darwish², Magdy Bayoumi²•Institutions (2)

Intel¹, University of Louisiana at Lafayette²

01 Feb 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A performance analysis of 1-bit full-adder cell is presented, after the adder cell is anatomized into smaller modules, and several designs of each of them are developed, prototyped, simulated and analyzed.


Abstract: A performance analysis of 1-bit full-adder cell is presented. The adder cell is anatomized into smaller modules. The modules are studied and evaluated extensively. Several designs of each of them are developed, prototyped, simulated and analyzed. Twenty different 1-bit full-adder cells are constructed (most of them are novel circuits) by connecting combinations of different designs of these modules. Each of these cells exhibits different power consumption, speed, area, and driving capability figures. Two realistic circuit structures that include adder cells are used for simulation. A library of full-adder cells is developed and presented to the circuit designers to pick the full-adder cell that satisfies their specific applications.
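The modular view of a full-adder cell described above can be illustrated with a small behavioural sketch. The split into an XOR/XNOR module, a sum module, and a carry module, and all function names, are assumptions made for illustration; the abstract does not name the modules, and this models logic only, not power, speed, or area.

```python
def xor_xnor_module(a, b):
    """Return (a XOR b, a XNOR b) for single-bit inputs."""
    h = a ^ b
    return h, h ^ 1


def sum_module(h, h_bar, cin):
    """Multiplexer-style sum: pick XNOR when cin is 1, XOR when cin is 0."""
    return h_bar if cin else h


def carry_module(h, a, cin):
    """Multiplexer-style carry: when a != b the carry is cin, otherwise it is a."""
    return cin if h else a


def full_adder(a, b, cin):
    """Compose the three hypothetical modules into a 1-bit full adder."""
    h, h_bar = xor_xnor_module(a, b)
    return sum_module(h, h_bar, cin), carry_module(h, a, cin)
```

Swapping one module for another implementation with the same truth table leaves `full_adder` unchanged, which is the kind of mix-and-match construction the abstract describes.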


454 citations

Journal Article•DOI•

A VLSI architecture for lifting-based forward and inverse wavelet transform


K. Andra¹, Chaitali Chakrabarti¹, T. Acharya²•Institutions (2)

Arizona State University¹, Intel²

01 Apr 2002-IEEE Transactions on Signal Processing

TL;DR: An architecture is proposed that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000; it consists of two row processors, two column processors, and two memory modules.


Abstract: We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μ technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.
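As a software illustration of what a lifting-based DWT step computes, here is a minimal sketch of the reversible 5/3 integer filter used in JPEG2000, for one level on a 1-D signal of even length. It says nothing about the row/column processor architecture of the paper; the function names and the mirrored boundary handling are assumptions.

```python
def forward_53(x):
    """One level of the integer 5/3 lifting transform (even-length input)."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    # Predict step: detail = odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update step: smooth = even sample plus a rounded quarter of neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by running the lifting steps in reverse order."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x
```

Each lifting step needs only additions and shifts, and the inverse reuses the same expressions with the signs flipped, so reconstruction is exact in integer arithmetic.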


350 citations

Journal Article•DOI•

Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates


Hung Tien Bui, Yuke Wang¹, Yingtao Jiang¹•Institutions (1)

University of Texas at Dallas¹

07 Aug 2002-IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing

TL;DR: This paper proposes a technique to build a total of 41 new 10-transistor full adders using novel XOR and XNOR gates in combination with existing ones; one drawback of the new adders is the threshold-voltage loss of the pass transistors.


Abstract: Full adders are important components in applications such as digital signal processor (DSP) architectures and microprocessors. In this paper, we propose a technique to build a total of 41 new 10-transistor full adders using novel XOR and XNOR gates in combination with existing ones. We have done over 10,000 HSPICE simulation runs of all the different adders in different input patterns, frequencies, and load capacitances. Almost all those new adders consume less power in high frequencies, while three new adders consistently consume on average 10% less power and have higher speed compared with the previous 10-transistor full adder and the conventional 28-transistor CMOS adder. One drawback of the new adders is the threshold-voltage loss of the pass transistors.


306 citations

Journal Article•DOI•

Adder based residue to binary number converters for (2^n-1, 2^n, 2^n+1)


Yuke Wang, Xiaoyu Song¹, M. Aboulhamid², H. Shen³•Institutions (3)

Portland State University¹, Université de Montréal², Japan Advanced Institute of Science and Technology³

01 Jul 2002-IEEE Transactions on Signal Processing

TL;DR: In this paper, the authors presented three new residue-to-binary converters for the residue number system (2^n-1, 2^n, 2^n+1) using 2n-bit or n-bit adders with improvements on speed, area, or dynamic range compared with various previous converters.


Abstract: Based on an algorithm derived from the new Chinese remainder theorem I, we present three new residue-to-binary converters for the residue number system (2^n-1, 2^n, 2^n+1) designed using 2n-bit or n-bit adders with improvements on speed, area, or dynamic range compared with various previous converters. The 2n-bit adder based converter is faster and requires about half the hardware required by previous methods. For n-bit adder-based implementations, one new converter is twice as fast as the previous method using a similar amount of hardware, whereas another new converter achieves improvement in either speed, area, or dynamic range compared with previous converters.
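To make the conversion problem concrete, the sketch below maps an integer to its residues for the moduli set (2^n-1, 2^n, 2^n+1) and back. It uses the textbook Chinese remainder theorem, not the New CRT-I algorithm or the adder-based converters of the paper; the function names are hypothetical, and the modular inverse via `pow(x, -1, m)` assumes Python 3.8 or later and n >= 2.

```python
def moduli(n):
    """The three pairwise-coprime moduli for word length n."""
    return (2**n - 1, 2**n, 2**n + 1)


def to_residues(x, n):
    """Binary-to-residue: reduce x modulo each of the three moduli."""
    return tuple(x % m for m in moduli(n))


def from_residues(residues, n):
    """Residue-to-binary via the textbook CRT (reference model only)."""
    ms = moduli(n)
    big_m = ms[0] * ms[1] * ms[2]          # dynamic range 2^n * (2^(2n) - 1)
    x = 0
    for r, m in zip(residues, ms):
        m_hat = big_m // m                  # product of the other two moduli
        x += r * m_hat * pow(m_hat, -1, m)  # inverse of m_hat modulo m
    return x % big_m
```

A hardware converter replaces the general multiplications and the final modulo with shifts and a few additions, which is possible because all three moduli are of the form 2^n or 2^n ± 1.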


195 citations

Journal Article•DOI•

Analysis and comparison on full adder block in submicron technology


Massimo Alioto¹, Gaetano Palumbo¹•Institutions (1)

University of Catania¹

01 Dec 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The results show that, except for short chains of blocks or for cases where minimum power consumption is desired, topologies with only pass transistors or transmission gates are not attractive, and the most interesting implementations in terms of trade-off between power and delay are the traditional CMOS and mirror topologies.


Abstract: In this paper the main topologies of one-bit full adders, including the most interesting of those recently proposed, are analyzed and compared for speed, power consumption, and power-delay product. The comparison has been performed on two classes of circuits, the former with minimum transistor size to minimize power consumption, the latter with optimized transistor dimension to minimize power-delay product. The investigation has been carried out with properly defined simulation runs on a Cadence environment using a 0.35-μm process, also including the parasitics derived from layout. Performance has also been compared for different supply voltage values. Thus design guidelines have been derived to select the most suitable topology for the design features required. This paper also proposes a novel figure of merit to realistically compare n-bit adders implemented as a chain of one-bit full adders. The results differ from those previously published both for the more realistic simulations carried out and the more appropriate figure of merit used. They show that, except for short chains of blocks or for cases where minimum power consumption is desired, topologies with only pass transistors or transmission gates are not attractive. In contrast, the most interesting implementations in terms of trade-off between power and delay are the traditional CMOS and mirror topologies. Moreover, the dual-rail domino and the CPL allow the best speed performance.


193 citations

Patent•

Image processing apparatus and method


Iida Ryohei¹, Takemoto Takashi²•Institutions (2)

Sony Broadcast & Professional Research Laboratories¹, Toshiba²

29 Oct 2002

TL;DR: An image processing apparatus and method which can reduce the size of circuits for alpha-blending and dithering and realize high speed processing which perform in parallel processing for finding an amount of update of present image data to be drawn with respect to image data already stored in a display buffer by using a blending coefficient in a subtractor and a multiplier and processing for adding noise data to the image data stored in the display buffer in a first adder and adding the data obtained by the two processing at a second adder so as to find data comprised of noise data added by linear interpolation


Abstract: An image processing apparatus and method which can reduce the size of circuits for alpha-blending and dithering and realize high speed processing which perform in parallel processing for finding an amount of update of present image data to be drawn with respect to image data already stored in a display buffer by using a blending coefficient in a subtractor and a multiplier and processing for adding noise data to the image data already stored in the display buffer in a first adder and adding the data obtained by the two processing at a second adder so as to find data comprised of noise data added to data obtained by linear interpolation of two colors, then extracting color valid values at a clamp circuit, thinning out the extracted data in a rounding-off circuit, and writing it back to the display buffer.


182 citations

Proceedings Article•DOI•

Efficient adder circuits based on a conservative reversible logic gate


J.W. Bruce¹, Mitchell A. Thornton¹, L. Shivakumaraiah¹, P.S. Kokate¹, X. Li¹•Institutions (1)

Mississippi State University¹

07 Aug 2002

TL;DR: Novel full adder circuits using Fredkin gates are proposed which have lower hardware complexity than the current state-of-the-art, while generating the additional signals required for carry skip adder architectures.


Abstract: Conservative and reversible logic gates are widely known to be compatible with revolutionary computing paradigms such as optical and quantum computing. A fundamental conservative reversible logic gate is the Fredkin gate. This paper presents efficient adder circuits based on the Fredkin gate. Novel full adder circuits using Fredkin gates are proposed which have lower hardware complexity than the current state-of-the-art, while generating the additional signals required for carry skip adder architectures. The traditional ripple carry adder and several carry skip adder topologies are compared. Theoretical performance of each adder is determined and compared. Although the variable sized block carry skip adder is determined to have shorter delay than the fixed block size carry skip adder, the performance gains are not sufficient to warrant the required additional hardware complexity.
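For readers unfamiliar with the Fredkin gate, the sketch below models it as a controlled swap and composes a full adder from four gate evaluations. This is an illustrative composition, not the circuit proposed in the paper: it reuses signals freely (fan-out) and discards unused outputs, which a strictly reversible design would have to account for. All names are hypothetical.

```python
def fredkin(c, a, b):
    """Controlled swap: pass (a, b) through when c == 0, swap them when c == 1."""
    return (c, b, a) if c else (c, a, b)


def fredkin_full_adder(a, b, cin):
    """Full adder built from Fredkin gates used as multiplexers."""
    # Constants 0 and 1 on the data inputs give a copy of b and its complement.
    _, _b_copy, not_b = fredkin(b, 0, 1)
    # Control a selects b or NOT b: outputs are a XOR b and a XNOR b.
    _, p, not_p = fredkin(a, b, not_b)
    # Control cin selects p or NOT p: the first data output is the sum bit.
    _, s, _ = fredkin(cin, p, not_p)
    # Control p selects a (when a == b) or cin (when a != b): the carry out.
    _, cout, _ = fredkin(p, a, cin)
    return s, cout
```

The gate is conservative because it only permutes its inputs, so the number of ones at the outputs always equals the number at the inputs.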


180 citations

Journal Article•DOI•

Nanocell logic gates for molecular computing


James M. Tour¹, W.L. Van Zandt¹, C.P. Husband¹, S.M. Husband¹, L.S. Wilson¹, Paul D. Franzon², David P. Nackashi²•Institutions (2)

Rice University¹, North Carolina State University²

01 Jun 2002-IEEE Transactions on Nanotechnology

TL;DR: Genetic algorithm-based simulations of molecular device structures in a nanocell where placement and connectivity of the internal molecular switches are not specifically directed and the internal topology is generally disordered show that it is possible to use easily fabricated nanocells as logic devices by setting the internal molecular switch states after the topological molecular assembly is complete.


Abstract: Molecular electronics seeks to build electrical devices to implement computation - logic and memory - using individual or small collections of molecules. These devices have the potential to reduce device size and fabrication costs, by several orders of magnitude, relative to conventional CMOS. However, the construction of a practical molecular computer will require the molecular switches and their related interconnect technologies to behave as large-scale diverse logic, with input/output wires scaled to molecular dimensions. It is unclear whether it is necessary or even possible to control the precise regular placement and interconnection of these diminutive molecular systems. This paper describes genetic algorithm-based simulations of molecular device structures in a nanocell where placement and connectivity of the internal molecular switches are not specifically directed and the internal topology is generally disordered. With some simplifying assumptions, these results show that it is possible to use easily fabricated nanocells as logic devices by setting the internal molecular switch states after the topological molecular assembly is complete. Simulated logic devices include an inverter, a NAND gate, an XOR gate and a 1-bit adder. Issues of defect and fault tolerance are addressed.


158 citations

Journal Article•DOI•

Diminished-one modulo 2^n+1 adder design


Haridimos T. Vergos, Constantinos Efstathiou, Dimitris Nikolos

01 Dec 2002-IEEE Transactions on Computers

TL;DR: In this paper, the authors present two new design methodologies for modulo 2^n+1 addition in the diminished-one number system; the first leads to carry look-ahead, the second to parallel-prefix adder implementations.


Abstract: This paper presents two new design methodologies for modulo 2^n+1 addition in the diminished-one number system. The first design methodology leads to carry look-ahead, whereas the second to parallel-prefix adder implementations. VLSI realizations of the proposed circuits in a standard-cell technology are utilized for quantitative comparisons against the existing solutions. Our results indicate that the proposed carry look-ahead adders are area and time efficient for small values of n, while for the remaining values of n the proposed parallel-prefix adders are considerably faster than any other already known in the open literature.
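The arithmetic being implemented can be stated in a few lines. In the diminished-one system a nonzero value A is stored as A* = A - 1 in n bits, and the sum of two nonzero operands is obtained by an n-bit addition whose inverted carry-out is fed back as carry-in. The sketch below is a behavioural reference model under that standard formulation, not either of the paper's architectures; the function name and the returned zero flag are assumptions.

```python
def diminished_one_add(a_star, b_star, n):
    """Add two nonzero operands given in diminished-one form (A* = A - 1).

    Returns (s_star, is_zero): s_star is the diminished-one sum modulo 2^n + 1,
    and is_zero flags a true result of 0, which has no n-bit diminished-one code.
    """
    mask = (1 << n) - 1
    t = a_star + b_star
    carry_out = t >> n                       # carry of the plain n-bit addition
    s = (t & mask) + (1 - carry_out)         # inverted end-around carry
    is_zero = bool(s >> n)                   # overflow here means A + B = 2^n + 1
    return s & mask, is_zero
```

For example, with n = 4 the operands 5 and 7 are stored as 4 and 6; the function returns 11, which decodes to 12 = (5 + 7) mod 17.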


139 citations

Patent•

Sense-amp based adder with source follower evaluation tree


Jae-Joon Kim¹, Ching-Te K. Chuang¹, Rajiv V. Joshi¹, Kaushik Roy¹•Institutions (1)

IBM¹

10 Jun 2002

TL;DR: In this paper, a 64-bit adder implemented in partially depleted silicon on insulator technology and having two levels of lookahead uses a dynamic eight-bit carry module containing a cascode evaluation tree employing a chain of source followers that feeds a sense amplifier.


Abstract: A 64-bit adder implemented in partially depleted silicon on insulator technology and having two levels of lookahead uses a dynamic eight-bit carry module containing a cascode evaluation tree employing a chain of source followers that feeds a sense amplifier, thereby obtaining benefits from high initial drive, low variation in body voltage, resulting in low variation in history-dependent delay, reduced noise sensitivity and noise-based delay.


125 citations

Journal Article•DOI•

A reversible carry-look-ahead adder using control gates


Bart Desoete¹, Alexis De Vos¹•Institutions (1)

Ghent University¹

01 Dec 2002-Integration

TL;DR: It is demonstrated that, for a flexible design, it is more advantageous to use a broad class of reversible gates, called control gates, which form a generalization of Feynman's three gates.


Journal Article•DOI•

High-speed and reduced-area modular adder structures for RNS


A.A. Hiasat¹•Institutions (1)

Princess Sumaya University for Technology¹

01 Jan 2002-IEEE Transactions on Computers

TL;DR: A new modular adder design is introduced, based on utilizing concepts developed to realize binary-based adders, that requires less area and time delay than other similar ones.


Abstract: A modular adder is a very instrumental arithmetic component in implementing online residue-based computations for many digital signal processing applications. It is also a basic component in realizing modular multipliers and residue to binary converters. Thus, the design of a high-speed and reduced-area modular adder is an important issue. In this paper, we introduce a new modular adder design. It is based on utilizing concepts developed to realize binary-based adders. VLSI layout implementations and comparative analysis showed that the hardware requirements and the time delay of the new proposed structure are significantly less than other reported ones. A new modulo (2^n+1) adder is also presented. Compared with other similar ones, this specific modular adder requires less area and time delay.


Proceedings Article•DOI•

Extended results for minimum-adder constant integer multipliers


Oscar Gustafsson¹, Andrew G. Dempster², Lars Wanhammar¹•Institutions (2)

Linköping University¹, University of Westminster²

26 May 2002

TL;DR: By introducing simplifications to multiplier graphs, the previous work on minimum adder multipliers to five adders is extended and it is shown that this is enough to express all coefficients up to 19 bits.


Abstract: By introducing simplifications to multiplier graphs we extend the previous work on minimum adder multipliers to five adders and show that this is enough to express all coefficients up to 19 bits. The average savings are more than 25% for 19 bits compared with CSD multipliers. The simplifications include addition reordering and vertex reduction to see that different graphs can generate the same coefficient sets. Thus, fewer graphs need to be evaluated. A classification of the graphs reduces the effort to search the coefficient space further.


Proceedings Article•DOI•

Comparison of a 17 b multiplier in Dual-rail domino and in Dual-rail D^3L (D^4L) logic styles


R. Rafati¹, A.Z. Charaki¹, G.R. Chaji¹, Sied Mehdi Fakhraie¹, Kenneth C. Smith²•Institutions (2)

University of Tehran¹, University of Toronto²

07 Aug 2002

TL;DR: A new family of dynamic logic gates called Dual-rail Data-Driven Dynamic Logic (D^4L) is introduced. In this logic family, the synchronization clock signal has been eliminated and correct precharge and evaluation sequencing is maintained by appropriate use of data instances.


Abstract: In this paper, a new family of dynamic logic gates called Dual-rail Data-Driven Dynamic Logic (D^4L) is introduced. In this logic family, the synchronization clock signal has been eliminated and correct precharge and evaluation sequencing is maintained by appropriate use of data instances. The methodology and characteristics of D^4L are demonstrated in the design of a CLA 32-b adder and a 17-b high-speed multiplier. Based on VHDL simulations, the D^4L implemented 32-b adder has 23% less switching-activity than a comparable domino adder and for D^4L multiplier switching-activity is 14.5% less than its domino rival. HSPICE simulation in a 0.6 μm CMOS process shows that D^4L has a 17% power saving over domino in a 32-b CLA adder design and a 10% saving in a 17-b multiplier design while a D^4L adder has 8% less delay than a domino one.


Patent•

Error correction code circuit with reduced hardware complexity


Heng-Kuan Lee

25 Dec 2002

TL;DR: In this paper, an error correction code circuit with reduced hardware complexity is positioned inside a microprocessor, and the microprocessor has a Galois field multiplier for performing Galois Field multiplication on data processed by the error-correcting code circuit.


Abstract: An error correction code circuit with reduced hardware complexity is positioned inside a microprocessor. The microprocessor has a Galois field multiplier for performing a Galois field multiplication on data processed by the error correction code circuit. The error correction code circuit has a first register for storing an input data, a plurality of calculation units, a third register for storing an output data corresponding to the input data, and a controller for controlling operation of the error correction code circuit. Each calculation unit has a Galois field adder, and a second register electrically connected to the Galois field adder. The controller transmits data of each calculation unit to the same Galois field multiplier for a corresponding Galois field multiplication, and the result outputted by the Galois field multiplier is transmitted back to the error correction code circuit.


Book Chapter•DOI•

Implementing Asynchronous Circuits on LUT Based FPGAs


Quoc-Thai Ho, Jean-Baptiste Rigaud, Laurent Fesquet, Marc Renaudin, Robin Rolland

02 Sep 2002

TL;DR: This paper describes a general methodology to rapidly prototype asynchronous circuits on LUT based FPGAs, based on the use and the design of a Muller gate library, and an asynchronous dual-rail adder is implemented automatically to demonstrate the potential of the methodology.


Abstract: This paper describes a general methodology to rapidly prototype asynchronous circuits on LUT based FPGAs. The main objective is to offer designers the power of standard synchronous FPGAs to prototype their asynchronous circuits or mixed synchronous/asynchronous circuits. To avoid hazards in FPGAs, the appearance of hazards in configurable logic cells is analyzed. The developed technique is based on the use and the design of a Muller gate library. It is shown how the place and route tools automatically exploit this library. Finally, an asynchronous dual-rail adder is implemented automatically to demonstrate the potential of the methodology. Several FPGA families, like Xilinx X4000, Altera Flex, Xilinx Virtex and up-to-date Altera Apex are targeted.


Proceedings Article•DOI•

Pseudo dynamic logic (SDL): a high-speed and low-power dynamic logic family


G.R. Chaji¹, Sied Mehdi Fakhraie¹, Kenneth C. Smith•Institutions (1)

University of Tehran¹

07 Aug 2002

TL;DR: A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6 μm CMOS process and simulated measurements show that the worst-case delay is 1.56 ns, demonstrating 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.


Abstract: In this paper, a new logic-design style called Pseudo Dynamic Logic (SDL) is introduced. In this logic-design style, the internal nodes of the logic circuits are not precharged to high or low values, rather the initial charges on nodes are shared to yield an intermediate precharge value for faster evaluation. A 32-bit adder has been designed and simulated using HSPICE Level-49 parameters of a 0.6 μm CMOS process. Simulated measurements on this adder show that the worst-case delay is 1.56 ns. This demonstrates 2.1 times speed improvement in comparison to a domino dynamic logic design implemented with the same technology.


Journal Article•DOI•

A 16-Bit by 16-Bit MAC Design Using Fast 5:3 Compressor Cells


Ohsang Kwon¹, Kevin J. Nowka², Earl E. Swartzlander³•Institutions (3)

Sun Microsystems¹, IBM², University of Texas at Austin³

01 Jun 2002

TL;DR: In this paper, a fast 5:3 compressor is derived for high-speed multiplier implementations by applying two rows of fast 2-bit adder cells to five rows in a partial product matrix.


Abstract: 3:2 counters and 4:2 compressors have been widely used for multiplier implementations. In this paper, a fast 5:3 compressor is derived for high-speed multiplier implementations. The fast 5:3 compression is obtained by applying two rows of fast 2-bit adder cells to five rows in a partial product matrix. As a design example, a 16-bit by 16-bit MAC (Multiply and Accumulate) design is investigated both in a purely logical gate implementation and in a highly customized design. For the partial product reduction, the use of the new 5:3 compression leads to 14.3% speed improvement in terms of XOR gate delay. In a dynamic CMOS circuit implementation using 0.225 μm bulk CMOS technology, 11.7% speed improvement is observed with 8.1% less power consumption for the reduction tree.


Journal Article•DOI•

The design of hybrid carry-lookahead/carry-select adders


Yuke Wang, C. Pai¹, Xiaoyu Song²•Institutions (2)

Concordia University¹, Portland State University²

07 Aug 2002-IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing

TL;DR: A new implementation of high-speed 56-bit hybrid adder is proposed that directly implements group carry propagates and group carry generators without individual carry generator/propagate signals.


Abstract: In this paper, we present a general architecture for designing hybrid carry-lookahead/carry-select adders. Several previous adders in the literature are all special cases of this general architecture. They differ in the way Boolean functions for the carries are implemented. Based on the general architecture, we propose a new implementation of high-speed 56-bit hybrid adder. The new adder directly implements group carry propagates and group carry generators without individual carry generator/propagate signals. Moreover, the group carry generator/propagate signals are complemented to gain speed. The new implementation can be in static CMOS or dynamic logic style. The critical path length of our new design is about 2/3 of the critical path lengths of previous adders; therefore, higher speed can be gained.


Journal Article•DOI•

A "flying-adder" architecture of frequency and phase synthesis with scalability


Liming Xiu¹, Zhihong You¹•Institutions (1)

Texas Instruments¹

01 Oct 2002-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The improved architecture has better performance, is simpler to implement, and is easier to understand.


Abstract: Most of today's digital designs, from small-scale digital block designs to system-on-chip (SoC) designs, are based on "synchronous" design principle. Clock is the most important issue in these designs. Frequency and phase synthesis is closely related to the clock generation. A frequency and phase synthesis technique based on phase-locked loop is proposed in that delivers high performance, easy integration, and high stability. However, there are problems associated with this architecture, such as: 1) its highest deliverable frequency is limited by the speed of the accumulator and 2) the phase synthesis circuitry will not work well in certain ranges (dead zone) and in certain conditions (dual stability). This paper presents an improved architecture that addresses these problems. The new frequency synthesis circuitry has scalability for higher output frequency. It also has an internal node whose frequency is twice that of output signal. When duty cycle is not a concern, this signal can be used directly as clock source. The new phase synthesis circuitry is free of "dead zone" and "dual stability." The improved architecture has better performance, is simpler to implement, and is easier to understand.


Journal Article•DOI•

Design of high-performance CMOS priority encoders and incrementer/decrementers using multilevel lookahead and multilevel folding techniques


Chung-Hsun Huang¹, Jinn-Shyan Wang¹, Yan-Chao Huang¹•Institutions (1)

National Chung Cheng University¹

07 Aug 2002-IEEE Journal of Solid-state Circuits

TL;DR: Results indicate that the proposed 256-bit priority encoder and the proposed 64-bit incrementer/decrementer can operate up to 116 and 139 MHz when they are designed based on a 0.6-μm CMOS technology.


Abstract: Lookahead signals to form the multilevel folding architecture for priority-encoding-based designs was used to improve the performance to the order of O(log N). Analysis showed that both the multilevel lookahead and the multilevel folding techniques could be easily merged and implemented in the dynamic CMOS circuits. For the 256-bit priority encoder, the new design adopting all the proposed techniques can achieve nearly ten times performance while spending nearly half the power consumption as compared to the conventional design, utilizing only a simple lookahead structure. For the 64-bit incrementer/decrementer, the new design adopting all the proposed techniques requires less than one-third delay time as compared to a high-speed carry-select adder (CSA)-based incrementer/decrementer. The power consumption evaluated at the maximum operating frequency and the transistor count of the new incrementer/decrementer are also reduced to 67% and 35%, respectively, as compared to the CSA-based design. The measurement results indicate that the proposed 256-bit priority encoder and the proposed 64-bit incrementer/decrementer can operate up to 116 and 139 MHz, respectively, when they are designed based on a 0.6-μm CMOS technology.


Journal Article•DOI•

Multiplierless approximation of transforms with adder constraint


Ying-Jui Chen¹, Soontorn Oraintara², Trac D. Tran³, Kevin Amaratunga¹, T.Q. Nguyen⁴•Institutions (4)

Massachusetts Institute of Technology¹, Luleå University of Technology², Johns Hopkins University³, University of California, Berkeley⁴

10 Dec 2002-IEEE Signal Processing Letters

TL;DR: This letter describes an algorithm for systematically finding a multiplierless approximation of transforms by replacing floating-point multipliers with VLSI-friendly binary coefficients of the form k/2^n.


Abstract: This letter describes an algorithm for systematically finding a multiplierless approximation of transforms by replacing floating-point multipliers with VLSI-friendly binary coefficients of the form k/2^n. Assuming the cost of hardware binary shifters is negligible, the total number of binary adders employed to approximate the transform can be regarded as an index of complexity. Because the new algorithm is more systematic and faster than trial-and-error binary approximations with adder constraint, it is a much more efficient design tool. Furthermore, the algorithm is not limited to a specific transform; various approximations of the discrete cosine transform are presented as examples of its versatility.
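The idea of a coefficient of the form k/2^n, with adder count as the complexity measure, can be sketched as follows. The code simply rounds a coefficient to k/2^n and recodes k in canonical signed-digit (non-adjacent) form so that multiplication becomes shifts plus additions or subtractions. It is not the letter's search algorithm, which allocates adders under a budget; all names and the example coefficient are assumptions.

```python
def quantize(coefficient, n):
    """Round a positive real coefficient to the nearest integer k so that it is about k / 2^n."""
    return round(coefficient * (1 << n))


def signed_digits(k):
    """Non-adjacent signed-digit recoding of a positive integer, LSB first, digits in {-1, 0, 1}."""
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k % 4)   # +1 when k % 4 == 1, -1 when k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def shift_add_multiply(x, digits):
    """Compute x * k using only shifts and additions/subtractions."""
    return sum(d * (x << i) for i, d in enumerate(digits) if d)


def adder_count(digits):
    """Adders/subtractors needed: one fewer than the number of nonzero digits."""
    return max(sum(1 for d in digits if d) - 1, 0)


K = quantize(0.7071067811865476, 8)   # about cos(pi/4), approximated as K / 256
```

Multiplying by the approximated coefficient is then `shift_add_multiply(x, signed_digits(K)) >> 8`, with the final shift costing no adders under the abstract's assumption that shifters are free.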


Proceedings Article•DOI•

A low power and reduced area carry select adder


K. Rawat¹, T. Darwish¹, Magdy Bayoumi¹•Institutions (1)

University of Louisiana at Lafayette¹

04 Aug 2002

TL;DR: In the modified CSA, one of the n-bit adder blocks is replaced by an add-one circuit consisting of fewer transistors, which considerably reduces the power and area, with negligible speed penalty.


Abstract: A carry select adder (CSA) can be implemented by using a single adder block and an add-one circuit instead of using dual adder blocks. The add-one circuit is based on "first" zero detection logic and a few multiplexers. In the modified CSA, one of the n-bit adder blocks is replaced by an add-one circuit consisting of fewer transistors. This scheme considerably reduces the power and area, with negligible speed penalty. For 8-bit length, n=8, this modified CSA requires 38% fewer transistors and consumes only 73% of the power, compared to the conventional design, using a 0.5 μm CMOS technology.
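A behavioural sketch of the add-one idea: the block computes the sum for carry-in 0 once, then derives the carry-in 1 result by inverting every bit up to and including the first zero from the LSB. This models the logic only; the function names are hypothetical, and nothing here reflects the transistor-level design or the reported power and area figures.

```python
def add_one(s, n):
    """Increment an n-bit value by flipping bits up to and including the first 0."""
    out = 0
    flipping = True
    for i in range(n):
        bit = (s >> i) & 1
        if flipping:
            out |= (bit ^ 1) << i
            if bit == 0:          # first zero found: higher bits pass through
                flipping = False
        else:
            out |= bit << i
    carry = 1 if flipping else 0  # no zero found: s was all ones
    return out, carry


def csa_block(a, b, cin, n):
    """One carry-select block using a single adder plus the add-one circuit."""
    mask = (1 << n) - 1
    t = a + b                     # the single n-bit adder, carry-in 0
    s0, c0 = t & mask, t >> n
    s1, c_inc = add_one(s0, n)    # result for carry-in 1
    c1 = c0 | c_inc
    return (s1, c1) if cin else (s0, c0)   # final multiplexer on cin
```

The two carries can be ORed because they are never both set: a carry out of `a + b` leaves a sum of at most 2^n - 2, which always contains a zero bit.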


Journal Article•DOI•

Bit-stream signal processing and its application to communication systems


H. Fujisaka¹, R. Kurata¹, M. Sakamoto¹, M. Morisue¹•Institutions (1)

Hiroshima City University¹

10 Dec 2002

TL;DR: A digital circuit technique to directly process bit-stream signals from sigma-delta modulation based analogue-to-digital converters is described, along with its application to communication systems; a QPSK demodulator is presented as an application.

Abstract: The paper describes a digital circuit technique to process directly bit-stream signals from sigma-delta modulation based analogue-to-digital converters and the application of the technique to communication systems. The newly developed adder and multiplier are fundamental processing circuit modules. Using the fundamental modules and up/down counters, other circuit modules, such as oscillators, dividers and square root circuits, can also be realised. Signal processors built from the modules have three advantages over multi-bit Nyquist rate processors. First, single-bit/multibit converters are not needed at the inputs of the processors because the arithmetic modules directly process the bit-stream signals. Secondly, the physical areas for routing the signals among the circuit modules are small since they are in the form of a bit-stream. Thirdly, the processors are built from a smaller number of logic gates than conventional Nyquist rate processors because of the simple structure of the circuit modules. As an application of the technique to digital signal processing for communications, a QPSK demodulator is presented. In addition to circuit simulations of the demodulator, a useful linear analysis to estimate the influence of the noise components contained in the outputs from the circuit modules on the steady-state demodulation performance is explained.

Patent•

ALU implementation in single PLD logic cell

Goran Bilski¹•Institutions (1)

Xilinx¹

01 Feb 2002

TL;DR: In this article, a carry chain is provided for combining the one-bit ALU circuits to generate multi-bit ALUs, where the ALU circuit has two data input signals and two operator input signals that select between the adder, subtractor and other logical functions.

Abstract: Structures and methods that implement an ALU (Arithmetic Logic Unit) circuit in a PLD (Programmable Logic Device) while using only one PLD logic cell to implement a one-bit ALU circuit. The ALU circuit has two data input signals and two operator input signals that select between the adder, subtractor, and other logical functions. A result bit provides the result of the addition, subtraction, or other logical function as selected by the values of the two operator input signals. A carry chain is provided for combining the one-bit ALU circuits to generate multi-bit ALUs. All of this functionality is implemented in a single PLD logic cell per ALU bit.
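A behavioural sketch of a one-bit ALU slice chained through a carry signal, to make the structure above concrete. The operator encoding (00 add, 01 subtract, 10 AND, 11 XOR), the choice of AND and XOR as the "other logical functions", and all names are assumptions; the abstract does not specify them, and nothing here models the PLD logic cell itself.

```python
# Hypothetical encoding of the two operator input signals.
ADD, SUB, AND, XOR = (0, 0), (0, 1), (1, 0), (1, 1)


def alu_bit(a, b, cin, op):
    """One-bit ALU slice: returns (result_bit, carry_out)."""
    if op in (ADD, SUB):
        b_eff = b ^ 1 if op == SUB else b   # subtract adds the inverted operand
        p = a ^ b_eff
        return p ^ cin, (a & b_eff) | (p & cin)
    if op == AND:
        return a & b, 0
    return a ^ b, 0                          # XOR


def alu(a, b, op, width=8):
    """Chain `width` one-bit slices through the carry signal."""
    carry = 1 if op == SUB else 0            # two's complement: a + ~b + 1
    result = 0
    for i in range(width):
        r, carry = alu_bit((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= r << i
    return result
```

With this encoding, the same slice serves every bit position, and only the initial carry differs between addition and subtraction.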

Proceedings Article•DOI•

Gate-diffusion input (GDI) - a technique for low power design of digital circuits: analysis and characterization

Arkadiy Morgenshtein¹, Alexander Fish², Israel A. Wagner³•Institutions (3)

Technion – Israel Institute of Technology¹, Ben-Gurion University of the Negev², IBM³

07 Aug 2002

TL;DR: Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods.

Abstract: GDI (Gate Diffusion Input) - a new technique of low power digital circuit design is described. This technique allows reducing power consumption, delay and area of digital circuits, while maintaining low complexity of logic design. Performance comparison with traditional CMOS and various PTL design techniques is presented, with respect to the layout area, number of devices, delay and power dissipation, showing advantages and drawbacks of GDI as compared to other methods. A variety of logic gates have been implemented in 0.35 µm technology to compare the GDI technique with CMOS and PTL. A prototype test chip of 8-bit CLA adder has been fabricated, based on GDI and CMOS cell libraries, showing up to 45% reduction in power-delay product in GDI. Properties of implemented circuits are discussed, simulation results are reported and measurements of a test chip are presented.

Patent•

Apparatus and method for address calculation

Ross A. Segelken¹, Feng Chen, David J. Sager•Institutions (1)

Intel¹

28 Mar 2002

TL;DR: In this article, a dual-cycle address generation unit is described to generate linear addresses, which includes a first adder to add a product of an index and a scaling factor to an offset and a segment base during a first clock cycle.

Abstract: A dual-cycle address generation unit is described to generate linear addresses. The dual-cycle address generation unit includes a first adder to add a product of an index and a scaling factor to an offset and a segment base during a first clock cycle, and a second adder to add the output of the first adder to a base during a second clock cycle.
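The two-step computation in the abstract can be written out as arithmetic. This is a sketch of the data flow only; the 32-bit address width, the wrap-around behaviour, and the function names are assumptions not stated in the abstract.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit linear address width


def agu_cycle1(index, scale, offset, segment_base):
    """First adder: index * scale + offset + segment base."""
    return (index * scale + offset + segment_base) & MASK32


def agu_cycle2(partial, base):
    """Second adder: add the base to the first adder's output."""
    return (partial + base) & MASK32


def linear_address(base, index, scale, offset, segment_base):
    """Full two-cycle address calculation."""
    partial = agu_cycle1(index, scale, offset, segment_base)
    return agu_cycle2(partial, base)
```

Splitting the sum this way means the base is only needed in the second cycle, so it can arrive one cycle later than the other operands.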

Proceedings Article•DOI•

Leakage-biased domino circuits for dynamic fine-grain leakage reduction

Seongmoo Heo¹, Krste Asanovic¹•Institutions (1)

Massachusetts Institute of Technology¹

13 Jun 2002

TL;DR: The leakage-biased domino circuit (LB-domino) as discussed by the authors maintains high speed in active mode but can be rapidly placed into a low-leakage inactive state by using leakage currents themselves to bias internal nodes.

Abstract: A leakage-biased domino circuit family is proposed that maintains high speed in active mode but which can be rapidly placed into a low-leakage inactive state by using leakage currents themselves to bias internal nodes. A 32-bit Han-Carlson domino adder circuit is used to compare LB-domino with conventional single and dual Vt domino circuits. For equal delay and noise margin, the LB-domino technique gives two decades reduction in steady-state leakage energy compared to a dual-Vt technique.

Journal Article•DOI•

A graph theoretic approach for synthesizing very low-complexity high-speed digital filters

K. Muhammad¹, Kaushik Roy²•Institutions (2)

Texas Instruments¹, Purdue University²

07 Aug 2002-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: Computation reduction techniques are presented which can either be used to obtain multiplierless implementation of finite impulse response (FIR) digital filters or to further improve multiplierless implementation obtained by currently used techniques.

Abstract: We present computation reduction techniques which can either be used to obtain multiplierless implementation of finite impulse response (FIR) digital filters or to further improve multiplierless implementation obtained by currently used techniques. Although presented in the FIR filtering framework, these ideas are also directly applicable to any task/application which can be expressed as multiplication of vectors by scalars. The presented approach is to remove computational redundancy by reordering computation. The reordering problem is formulated using a graph in which vertices represent coefficients and edges represent resources required in a computation using the differential coefficient defined by the difference of the vertices joined by the edge. This interpretation leads to various methods for computation reduction for which simple polynomial run time algorithms are presented. It is shown that about 20% reduction in the number of add operations per coefficient can be obtained over the conventional multiplierless implementations. It is also shown that implementations requiring less than one adder per coefficient can be obtained using the presented approaches when using nonuniformly scaled coefficients quantized from infinite precision representation by simple rounding.
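One way to read the differential-coefficient idea is that a product c_j·x can be formed as c_i·x + (c_j − c_i)·x, which is cheaper whenever the difference has few nonzero bits. The sketch below only compares add counts for a fixed processing order; it is not the paper's graph algorithm, and the cost model (plain binary, free shifts) and function names are assumptions.

```python
def popcount(k):
    """Number of nonzero bits in the binary representation of |k|."""
    return bin(abs(k)).count("1")


def direct_adds(c):
    """Adds to build c*x from shifted copies of x."""
    return max(popcount(c) - 1, 0)


def differential_adds(prev, c):
    """Adds to build c*x from an already available prev*x.

    Forming (c - prev)*x takes popcount - 1 adds, plus one add to
    combine it with prev*x, giving popcount(c - prev) in total.
    """
    return popcount(c - prev) if c != prev else 0


def chain_cost(coeffs):
    """Total adds when each coefficient reuses the one before it."""
    total = direct_adds(coeffs[0])
    for prev, c in zip(coeffs, coeffs[1:]):
        total += min(direct_adds(c), differential_adds(prev, c))
    return total


coeffs = [119, 127]
direct_total = sum(direct_adds(c) for c in coeffs)
reordered_total = chain_cost(coeffs)
```

In the graph formulation from the abstract, `differential_adds` would correspond to an edge weight between two coefficient vertices, and choosing a good order amounts to choosing which edges to use.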

Proceedings Article•

Energy–delay tradeoffs in combinational logic using gate sizing and supply voltage optimization

Vladimir Stojanovic¹, Dejan Markovic, Borivoje Nikolic, Mark Horowitz, Robert W. Brodersen•Institutions (1)

Stanford University¹

01 Jan 2002

TL;DR: This paper relates the potential energy savings to the energy profile of a circuit by using gate sizing and supply voltage optimization to minimize energy consumption subject to a delay constraint.

Abstract: This paper relates the potential energy savings to the energy profile of a circuit. These savings are obtained by using gate sizing and supply voltage optimization to minimize energy consumption subject to a delay constraint. The sensitivity of energy to delay is derived from a linear delay model extended to multiple supplies. The optimizations are applied to a range of examples that span typical circuit topologies including inverter chains, SRAM decoders and adders. At a delay of 20% larger than the minimum, energy savings of 40% to 70% are possible, indicating that achieving peak performance is expensive in terms of energy.
