
Showing papers on "Carry-lookahead adder" published in 1996


Journal Article
TL;DR: A uniform static CMOS layout methodology is adopted in which short-circuit power minimization is the optimization criterion, and a large adder design space is formulated from which an architect can choose an adder with the desired characteristics.
Abstract: In this paper, several classes of parallel, synchronous adders are surveyed based on their power, delay and area characteristics. The adders studied include the linear time ripple carry and Manchester carry chain adders, the square-root time carry skip and carry select adders, the logarithmic time carry lookahead adder and its variations, and the constant time signed-digit and carry-save adders. Most of the research in the last few decades has concentrated on reducing the delay of addition. With the rising popularity of portable computers, however, the emphasis is on both high speed and low power operation. In this paper we adopt a uniform static CMOS layout methodology whereby short-circuit power minimization is used as the optimization criterion. The relative merits of the different adders are evaluated by performing a detailed transistor-level simulation of the adders using HSPICE. Among the two's complement adders, a variation of the carry lookahead adder, called ELM, was found to have the best power-delay product. Based on the results of our experiments, a large adder design space is formulated from which an architect can choose an adder with the desired characteristics.
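
To make the contrast between the linear-time and lookahead families concrete, the following Python sketch models a ripple carry adder and a flat carry lookahead adder at the bit level. It is an illustration only and does not reproduce the paper's transistor-level designs; the function names `ripple_carry_add` and `lookahead_add` are hypothetical.

```python
def ripple_carry_add(a, b, n):
    """Add two n-bit integers one full adder at a time (linear carry chain)."""
    carry, total = 0, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def lookahead_add(a, b, n, c0=0):
    """Add two n-bit integers using generate/propagate signals.

    Each carry is expanded into a flat sum of products of g, p and c0,
    so no carry depends on another carry (the lookahead idea).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate
    carries = [c0]
    for i in range(n):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]c0
        term, c = 1, 0
        for j in range(i, -1, -1):
            c |= term & g[j]
            term &= p[j]
        carries.append(c | (term & c0))
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]
```

Both functions return the n-bit sum and the carry out, so they can be compared directly for any operand pair.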

221 citations


Proceedings Article
07 Oct 1996
TL;DR: An innovative dynamic logic family, clock-delayed (CD) domino, was developed to provide gates with either inverting or non-inverting outputs, and the high speed and layout compactness of dynamic logic.
Abstract: An innovative dynamic logic family, clock-delayed (CD) domino, was developed to provide gates with either inverting or non-inverting outputs, and the high speed and layout compactness of dynamic logic. The characteristics of CD domino are demonstrated in two carry lookahead adder designs and three MCNC combinational logic benchmark circuits. The CD domino designs are compared to designs using static CMOS and standard domino logic. A circuit design tool was developed to automate the design of CD domino circuits. Simulations show a 32-bit CD domino adder comprised of four 8-bit full adders to be 30% faster than a 32-bit standard domino adder, and a 32-bit CD domino adder comprised of a single 32-bit full adder to be 45% faster. In the combinational logic benchmark circuits, complex inverting and non-inverting gates were used to implement C1355, C3540, and b9. The CD domino circuits were 22%, 43% and 34% faster than their static CMOS counterparts of C1355, C3540 and b9, respectively.

62 citations


Proceedings Article
12 May 1996
TL;DR: The adder has a novel array structure which represents a variant of the architecture suggested by Brent and Kung; however, it does not require the back propagation of the signals which is necessary for the intermediate carry bits, hence only log₂ n logic levels are employed for the generation of all the carry signals.
Abstract: A 3.5 ns, 64 bit, carry-lookahead adder has been designed in full-custom domino logic and manufactured in a standard 1 μm CMOS technology featuring two metal levels. The adder has a novel array structure which represents a variant of the architecture suggested by Brent and Kung. As opposed to the latter, however, it does not require the back propagation of the signals which is necessary for the intermediate carry bits; hence only log₂ n logic levels are employed for the generation of all the carry signals. Furthermore, the structure is highly regular and modular and can be assembled with n log₂ n identical cells with a fan-out of 2. Therefore, a compact circuit is achieved with excellent performance. The occupied area is 3370 × 482 μm² with a worst-case 650 mW power dissipation at 100 MHz.
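
The abstract does not give the cell equations, so the sketch below is only an assumption about the general shape of such an array: a Kogge-Stone-style parallel prefix network, which also produces every carry in log₂ n levels without a backward pass. The function name `prefix_add` is hypothetical, and the paper's actual array may differ in its wiring.

```python
def prefix_add(a, b, n):
    """Add two n-bit integers with a log2(n)-level prefix network.

    Assumed structure (Kogge-Stone style), not the paper's exact array.
    Returns (sum, carry_out, number_of_prefix_levels).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate
    half_sum = p[:]                                    # kept for the final XOR
    levels, d = 0, 1
    while d < n:
        # Position i merges its (g, p) pair with the pair d places below it.
        g, p = (
            [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(n)],
            [p[i] & p[i - d] if i >= d else p[i] for i in range(n)],
        )
        d *= 2
        levels += 1
    carries = [0] + g  # carries[i] is the carry into bit i (carry-in is 0)
    total = sum((half_sum[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n], levels
```

For n = 64 the loop runs six times, matching the log₂ n level count quoted in the abstract.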

15 citations


Patent
Roney S. Wong
22 Apr 1996
TL;DR: In this patent, the n-bit average of four signed or unsigned n-bit integer operands (A, B, C and D), rounded away from zero as prescribed in the MPEG standard, is calculated in one instruction cycle by appending two bits to the left side of each of the operands, summing the extended operands to provide an n+2 bit sum, removing the two least significant bits of the n+2 bit sum, and incrementing the resulting n-bit sum as appropriate.
Abstract: The n-bit average of four signed or unsigned n-bit integer operands (A, B, C and D) rounded away from zero as prescribed in the MPEG standard is calculated in one instruction cycle by appending two bits to a left side of each of the operands to provide four n+2 bit extended operands, summing the extended operands to provide an n+2 bit sum, removing the two least significant bits of the n+2 bit sum to provide an n-bit sum, and incrementing the n-bit sum as appropriate. An append circuit (302) appends two bits to the left sides of the operands, and the extended operands are coupled to an adder circuit (306) that includes adder logic (308) and an n-bit carry lookahead adder (310). The adder logic (308) provides the two least significant bits of the sum of the extended operands, along with n partial sum bits and n partial carry bits to the adder (310). The adder (310) provides a sum output, representing the n most significant bits of the sum of the extended operands, and a sum-plus-one output representing the sum output incremented by one. A multiplexer (314) under control of a control circuit (312) selects one of the sum and sum-plus-one outputs as the n-bit average based on inspection of the two least significant bits and the most significant bit of the sum of the extended operands, and a mode signal indicative of whether the operands are signed or unsigned values.
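
The data flow described above can be modelled in a few lines of Python. This is a behavioural sketch, not the patented circuit: the function name `average4` is hypothetical, and the increment rule is an assumption, namely that "rounded away from zero" means round to nearest with ties going away from zero, since the abstract does not spell out the selection logic.

```python
def average4(a, b, c, d, n, signed):
    """Average four n-bit operands, rounding halves away from zero.

    Assumed rounding rule: nearest integer, ties away from zero.
    """
    def extend(x):
        x &= (1 << n) - 1                    # raw n-bit pattern
        if signed and (x >> (n - 1)) & 1:
            x |= 0b11 << n                   # sign-extend by two bits
        return x                             # unsigned: two zero bits

    s = sum(extend(x) for x in (a, b, c, d)) & ((1 << (n + 2)) - 1)
    low2 = s & 0b11                          # the two discarded bits
    msb = (s >> (n + 1)) & 1                 # sign of the n+2 bit sum
    truncated = s >> 2                       # n-bit "sum" output
    if signed and msb:
        increment = int(low2 == 3)           # negative: only .75 rounds up
    else:
        increment = int(low2 >= 2)           # non-negative: .5 and .75 round up
    result = (truncated + increment) & ((1 << n) - 1)
    if signed and (result >> (n - 1)) & 1:
        result -= 1 << n                     # reinterpret as two's complement
    return result
```

Choosing between `truncated` and `truncated + 1` plays the role of the multiplexer that selects the sum or sum-plus-one output.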

15 citations


Patent
08 Oct 1996
TL;DR: In this patent, the shift amount generator includes a multiple input adder utilizing carry save adder and carry lookahead adder techniques to minimize delay, and separate decoders for each multiplexer or group of multiplexers.
Abstract: A floating point arithmetic unit performs a multiply-add function B+(A*C) in which an alignment shifter is responsive to an input signal representative of the B mantissa. The shifter includes a sequential stack of multiplexers, typically three (3), for shifting the B mantissa to align it with the A*C product, and a complementer contained between two of the multiplexers to invert the signals when B is a negative number. A shift amount generator responsive to the A, B and C exponents produces control signals for the multiplexers. The shift amount generator includes a multiple input adder utilizing carry save adder and carry lookahead adder techniques to minimize delay, and separate decoders for each multiplexer or group of multiplexers. The generator also includes a Leading Zeros Anticipator (LZA) circuit for the most significant bits to limit shift amount signals that are within the shifting range of the shifter, which reduces the delay attributed to the carry lookahead adder. The multiplexers are arranged in a sequence such that the control signals for the first multiplexers are dependent only on the least significant bits and thus can be generated earliest, and therefore the delay of these multiplexers and the delay of the complementer is in parallel with the delay for producing the control signals to the last multiplexers.

14 citations


Journal Article
TL;DR: The work presented here examines the implementation of the most basic element in any datapath, an adder: a carry elimination adder (CEA), which uses self-timing at both the algorithmic and implementation levels and presents a minimal hardware high speed addition mechanism.
Abstract: Recent advances in VLSI technology have facilitated high levels of integration and the implementation of faster circuits on a chip. Most of the improvements in the performance of digital systems have been brought about by such faster technologies. However, these improvements in technology have brought along with them a host of other constraints. In the faster deep submicron technologies, the wire delays constitute a significant portion of the overall delay of the system and hence some of the advantages of faster technologies are lost. The high level of integration necessitates clock distribution schemes which minimize skew across the die. These result in area penalties and adversely affect the level of integration possible at the chip level. Hence, changes in the basic architecture of the computing elements of a system which, when implemented in silicon, introduce reduced interconnect delays and simpler clock distribution networks will result in more effective performance improvements. The work presented here examines the implementation of the most basic element in any datapath: an adder. The adder, a carry elimination adder (CEA), uses self-timing at both the algorithmic and implementation levels and presents a minimal hardware high speed addition mechanism. The adder exploits the nature of the input operands dynamically, which results in an average-case convergence time approaching that of the ubiquitous carry lookahead adder (CLA) with the hardware complexity of a carry ripple adder (CRA). Use of self-timing results in the elimination of a global clock and hence clock-skew.
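
The abstract does not describe the CEA mechanism in detail. As a rough intuition for an adder whose completion time depends on the operands, the sketch below uses a well-known iterative scheme that repeatedly folds the pending carries into the sum until none remain. This is an assumed stand-in, not the paper's circuit, and `carry_eliminating_add` is a hypothetical name.

```python
def carry_eliminating_add(a, b, n):
    """Add two n-bit integers modulo 2**n by eliminating carries step by step.

    Assumed illustrative scheme, not the paper's CEA circuit.
    Returns (sum, steps), where steps is the number of iterations needed.
    """
    mask = (1 << n) - 1
    partial_sum, carry = a & mask, b & mask
    steps = 0
    while carry:                      # "done" once no carry is pending
        partial_sum, carry = (
            partial_sum ^ carry,                  # add without carries
            ((partial_sum & carry) << 1) & mask,  # carries move one bit left
        )
        steps += 1
    return partial_sum, steps
```

The step count tracks the carry chains actually present in the operands, which is why typical inputs finish well before the worst case of roughly n steps.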

8 citations


Journal Article
TL;DR: This paper presents a new high radix square rooting algorithm where a number of square root bits (one digit) are generated in one step, which offers a higher speed than that of the conventional bit parallel binary one.
Abstract: This paper presents a new high radix square rooting algorithm where a number of square root bits (one digit) are generated in one step. Therefore, the proposed algorithm offers a higher speed than that of the conventional bit parallel binary one. This algorithm can be considered as a generalisation of the conventional bit parallel binary algorithm, and therefore it can be implemented using the existing simple binary elements. The proposed algorithm makes use only of the odd values of the square root to generate the possible values of the radicand and therefore, it requires less area than the conventional restoring high radix algorithm which uses all the values of the square root. This algorithm is general for any radix. Any adder can be used in the basic cell, whether a carry ripple adder or a carry lookahead adder. As an example of a radix-2^k square root architecture, a 9-bit radix-2^3 architecture is presented in this paper.
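
For reference, the conventional binary (radix-2) restoring square root that the abstract names as the baseline can be sketched as follows. This is the textbook one-bit-per-step recurrence, not the proposed high radix algorithm; the name `restoring_sqrt` is hypothetical and the radicand width is assumed to be even.

```python
def restoring_sqrt(x, n_bits):
    """Integer square root of an n_bits-wide radicand, one root bit per step.

    Textbook radix-2 restoring recurrence; n_bits is assumed to be even.
    Returns (root, remainder) with root*root + remainder == x.
    """
    root, rem = 0, 0
    for i in range(n_bits // 2 - 1, -1, -1):
        rem = (rem << 2) | ((x >> (2 * i)) & 0b11)  # bring down two radicand bits
        trial = (root << 2) | 1                     # candidate: next root bit is 1
        root <<= 1
        if rem >= trial:                            # subtraction succeeds
            rem -= trial
            root |= 1
        # otherwise the remainder is left unchanged ("restored")
    return root, rem
```

A radix-2^k variant would retire k root bits per iteration instead of one, at the cost of choosing among more candidate values in each step.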

3 citations


Proceedings Article
03 Nov 1996
TL;DR: This paper examines the optimization of the 64-bit spanning tree carry lookahead adder by sizing the transistors in the different Manchester carry chain blocks and by adjusting the block widths within the carry tree to reduce the critical delay paths of the carry signals.
Abstract: This paper examines the optimization of the 64-bit spanning tree carry lookahead adder by sizing the transistors in the different Manchester carry chain blocks and by adjusting the block widths within the carry tree to reduce the critical delay paths of the carry signals. Previous spanning tree designs are re-simulated using HSPICE, with parameters for a 0.35 μm CMOS process, to compare against the circuits designed for this paper. After analyzing many different configurations using the 16-bit carry select boundary, two circuits employing an 8-bit carry select boundary are designed and simulated.
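
The role of the carry select boundary can be illustrated with a plain carry select adder: each block computes its result for both possible carry-in values and the real carry only picks one. The sketch below is a behavioural model under that assumption, not the spanning tree circuit from the paper; `carry_select_add` is a hypothetical name and n is assumed to be a multiple of the block width.

```python
def carry_select_add(a, b, n, block=8):
    """Add two n-bit integers block by block with carry select.

    Behavioural model only; n is assumed to be a multiple of `block`.
    Returns (sum, carry_out).
    """
    mask = (1 << block) - 1
    carry, total = 0, 0
    for lo in range(0, n, block):
        x, y = (a >> lo) & mask, (b >> lo) & mask
        sum_if_0 = x + y          # candidate result for carry-in 0
        sum_if_1 = x + y + 1      # candidate result for carry-in 1
        chosen = sum_if_1 if carry else sum_if_0   # the select multiplexer
        total |= (chosen & mask) << lo
        carry = chosen >> block   # carry out selects the next block
    return total, carry
```

Calling it with `block=16` and `block=8` mirrors the two boundary widths mentioned in the abstract; in hardware, a smaller block shortens each block's internal carry path at the cost of more select stages.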

2 citations


Proceedings Article
23 Sep 1996
TL;DR: Pseudo Dynamic Latched Logic (PDLL) is introduced; this class of logic combines the benefits of static and dynamic structures by using permanently refreshing circuitry, which allows functionality even at low frequencies and high temperatures.
Abstract: In this paper Pseudo Dynamic Latched Logic (PDLL) is introduced. This class of logic combines the benefits of static and dynamic structures by using permanently refreshing circuitry, which allows functionality even at low frequencies and high temperatures. Moreover, because of its dynamic structure, complex gates are possible with a subsequent delay-area-power reduction. PDLL performance is demonstrated by implementing a 4-bit carry lookahead adder fully operative in a range of 6 to 100 °C. The adder operates at 0.8 GHz with an associated power dissipation of only 5.2 mW.

1 citation