Showing papers on "Adder published in 1999"


Proceedings ArticleDOI
R. Zimmermann
14 Apr 1999
TL;DR: It is shown that the parallel-prefix adder architecture is well suited to realize fast end-around-carry adders used for modulo addition, and a high-performance modulo multiplier-adder for the IDEA block cipher is presented.
Abstract: New VLSI circuit architectures for addition and multiplication modulo (2^n - 1) and (2^n + 1) are proposed that allow the implementation of highly efficient combinational and pipelined circuits for modular arithmetic. It is shown that the parallel-prefix adder architecture is well suited to realize fast end-around-carry adders used for modulo addition. Existing modulo multiplier architectures are improved for higher speed and regularity. These allow the use of common multiplier speed-up techniques like Wallace-tree addition and Booth recoding, resulting in the fastest known modulo multipliers. Finally, a high-performance modulo multiplier-adder for the IDEA block cipher is presented. The resulting circuits are compared qualitatively and quantitatively, i.e., in a standard-cell technology, with existing solutions and ordinary integer adders and multipliers.

304 citations
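
The end-around-carry idea in the entry above can be illustrated in software. The following is a minimal behavioral sketch, not the paper's parallel-prefix circuit; the function name `add_mod_2n_minus_1` is hypothetical. It relies only on the fact that a carry out of bit n has weight 2^n, which is congruent to 1 modulo 2^n - 1, so the carry can be added back into the least significant bit.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Add two n-bit values modulo 2**n - 1 using an end-around carry."""
    mask = (1 << n) - 1                  # 2**n - 1, also the all-ones pattern
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in n bits")
    total = a + b                        # at most n + 1 bits wide
    # End-around carry: the carry-out (weight 2**n) is congruent to 1,
    # so it is fed back into the least significant position.
    total = (total & mask) + (total >> n)
    # The all-ones pattern is a second representation of zero.
    return 0 if total == mask else total
```

A single fold is enough here because the sum of two n-bit operands is at most 2^(n+1) - 2, so the folded value never exceeds the n-bit range.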


Proceedings ArticleDOI
04 Mar 1999
TL;DR: The proposed SERF adder design was proven to be superior to the other three designs in power dissipation and area, and second in propagation delay only to the DVL adder.
Abstract: A novel low power and low transistor count static energy recovery full adder (SERF) is presented in this paper. The power consumption and general characteristics of the SERF adder are then compared against three low-power adders: the transmission function adder (TFA), the dual value logic (DVL) adder, and the fourteen transistor (14 T) full adder. The proposed SERF adder design was proven to be superior to the other three designs in power dissipation and area, and second in propagation delay only to the DVL adder. The combination of low power and low transistor count makes the new SERF cell a viable option for low power design.

197 citations


Patent
26 Feb 1999
TL;DR: An encryption chip is programmable to process a variety of secret key and public key encryption algorithms; it includes a pipeline of processing elements, each of which can process a round within a secret key algorithm.
Abstract: An encryption chip is programmable to process a variety of secret key and public key encryption algorithms. The chip includes a pipeline of processing elements, each of which can process a round within a secret key algorithm. Data is transferred between the processing elements through dual port memories. A central processing unit allows for processing of very wide data words from global memory in single cycle operations. An adder circuit is simplified by using plural relatively small adder circuits with sums and carries looped back in plural cycles. Multiplier circuitry can be shared between the processing elements and the central processor by adapting the smaller processing element multipliers for concatenation as a very wide central processor multiplier.

155 citations


Journal ArticleDOI
01 Jun 1999
TL;DR: This method has a closed-form solution for the table entries and can be applied to any differentiable function and requires two to three orders of magnitude less memory than conventional table lookups.
Abstract: This paper presents a high-speed method for computing elementary functions using parallel table lookups and multi-operand addition. Increasing the number of tables and inputs to the multi-operand adder significantly reduces the amount of memory required. Symmetry and leading zeros in the table coefficients are used to reduce the amount of memory even further. This method has a closed-form solution for the table entries and can be applied to any differentiable function. For 24-bit operands, it requires two to three orders of magnitude less memory than conventional table lookups.

132 citations
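
To make the trade between table count and memory concrete, here is a small sketch of a generic two-table (bipartite-style) approximation followed by one addition. It is an assumption-laden illustration, not the paper's symmetric table construction or its closed-form entries: the 4/4/4 bit split, the midpoint offsets, and the names `build_tables` and `lookup` are all hypothetical choices.

```python
import math

N0 = N1 = N2 = 4  # hypothetical bit widths of the three input fields


def build_tables(f, df):
    """Build two small tables for f on [0, 1) from f and its derivative df."""
    step1 = 2.0 ** -(N0 + N1)            # weight of one unit of the middle field
    step2 = 2.0 ** -(N0 + N1 + N2)       # weight of one unit of the low field
    d1 = (2 ** N1 - 1) / 2 * step1       # midpoint of the middle field's range
    d2 = (2 ** N2 - 1) / 2 * step2       # midpoint of the low field's range
    # Table A: function value, indexed by the high and middle fields.
    table_a = [[f(i0 / 2 ** N0 + i1 * step1 + d2) for i1 in range(2 ** N1)]
               for i0 in range(2 ** N0)]
    # Table B: first-order correction, indexed by the high and low fields.
    table_b = [[df(i0 / 2 ** N0 + d1 + d2) * (i2 * step2 - d2)
                for i2 in range(2 ** N2)]
               for i0 in range(2 ** N0)]
    return table_a, table_b


def lookup(tables, x_bits):
    """Approximate f(x_bits / 2**12) with two table reads and one addition."""
    table_a, table_b = tables
    i2 = x_bits & (2 ** N2 - 1)
    i1 = (x_bits >> N2) & (2 ** N1 - 1)
    i0 = x_bits >> (N1 + N2)
    return table_a[i0][i1] + table_b[i0][i2]
```

With this split the two tables hold 512 entries in total, compared with 4096 for a direct 12-bit lookup. For sine on [0, 1), a first-order Taylor bound keeps the approximation error below 1e-4.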


Proceedings ArticleDOI
20 Oct 1999
TL;DR: It is explained how exclusive OR and NOR circuits (XOR/XNOR) are used to realize a general full adder circuit based on pass transistors, which is realized using only 14 MOSFETs, while having full voltage-swing in all circuit nodes.
Abstract: We explain how exclusive OR and NOR circuits (XOR/XNOR) are used to realize a general full adder circuit based on pass transistors. A six-transistor CMOS XOR circuit that also produces a complementary XNOR output is introduced in the general full adder. The resulting full adder circuit is realized using only 14 MOSFETs, while having full voltage-swing in all circuit nodes. Layouts have been made in a 0.35 µm process for both the proposed full adder circuit and another 16-transistor full adder circuit based on pass transistors. The performance of the proposed full adder is evaluated by comparison of the simulation results obtained from HSPICE for both layouts. The two adders yield similar performance in terms of power consumption, power delay product, and propagation delay. The area is somewhat lower for the proposed adder due to the reduced device count. However, due to two feedback MOSFETs in the proposed adder that need to be ratioed, there is a higher cost in terms of design effort for the proposed adder.

123 citations
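
The XOR/XNOR formulation of a full adder can be modelled at the logic level as follows. This is a behavioral sketch only, assuming the common multiplexer-style decomposition in which the XOR/XNOR pair selects between signals; it says nothing about the transistor-level circuit in the paper, and the function name `full_adder` is hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs using XOR/XNOR selection."""
    x = a ^ b            # XOR of the operand bits
    xn = x ^ 1           # complementary XNOR output
    # Sum is x XOR cin: pass cin when XNOR is 1, otherwise pass its complement.
    s = cin if xn else cin ^ 1
    # Carry: when the operands differ, the carry-in propagates;
    # when they are equal, the carry-out equals either operand.
    cout = cin if x else a
    return s, cout
```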


Journal ArticleDOI
TL;DR: The design presented here incorporates a concurrent position correction logic, operating in parallel with the LOP, to detect the presence of that error and produce the correct shift amount.
Abstract: This paper describes the design of a leading-one prediction (LOP) logic for floating-point addition with an exact determination of the shift amount for normalization of the adder result. Leading-one prediction is a technique to calculate the number of leading zeros of the result in parallel with the addition. However, the prediction might be in error by one bit and previous schemes to correct this error result in a delay increase. The design presented here incorporates a concurrent position correction logic, operating in parallel with the LOP, to detect the presence of that error and produce the correct shift amount. We describe the error detection as part of the overall LOP, perform estimates of its delay and complexity, and compare with previous schemes.

98 citations


Journal ArticleDOI
TL;DR: In this article, the authors demonstrate an all-optical digital processing circuit that can perform the full addition of binary optical words, using two coupled all-optical regenerative memories.

83 citations


Patent
Andrew J. Miller
21 Jul 1999
TL;DR: A bit-serial multiplier and an infinite impulse response filter implemented on an FPGA are described in various embodiments; function generators are arranged to perform bit-serial multiplication of values in the multiplier and multiplicand memories.
Abstract: A bit-serial multiplier and an infinite impulse response filter implemented therewith, both implemented on an FPGA, are described in various embodiments. The bit-serial multiplier includes function generators configured as a multiplicand memory, a multiplier memory, a product memory, a bit-serial multiplier, and a bit-serial adder. The function generators are arranged to perform bit-serial multiplication of values in the multiplier and multiplicand memories.

80 citations
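
Bit-serial arithmetic processes one bit per step, least significant bit first, with a single stored carry. The sketch below models a bit-serial adder and uses it for shift-and-add multiplication. It is a generic software illustration, not the FPGA function-generator arrangement of the patent; all names (`to_bits`, `from_bits`, `bit_serial_add`, `bit_serial_multiply`) are hypothetical.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Split a non-negative integer into `width` bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Reassemble an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Add two equal-length LSB-first bit streams, one bit per step."""
    carry = 0                                     # models the carry flip-flop
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                 # sum bit for this step
        carry = (a & b) | (carry & (a ^ b))       # carry for the next step
    return out, carry


def bit_serial_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Multiply two `width`-bit unsigned values by shift-and-add."""
    product = [0] * (2 * width)                   # double-width accumulator
    for i, m_bit in enumerate(to_bits(multiplier, width)):
        if m_bit:
            shifted = to_bits(multiplicand << i, 2 * width)
            product, _ = bit_serial_add(product, shifted)
    return from_bits(product)
```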


Proceedings ArticleDOI
14 Apr 1999
TL;DR: The design and implementation of a double precision floating-point IEEE-754 standard adder which uses "flagged prefix addition" to merge rounding with the significand addition and incorporates a fast prediction scheme for the true subtraction of significands with an exponent difference of 1.
Abstract: The design and implementation of a double precision floating-point IEEE-754 standard adder is described which uses "flagged prefix addition" to merge rounding with the significand addition. The floating-point adder is implemented in 0.5 µm CMOS, measures 1.8 mm^2, has a 3-cycle latency and implements all rounding modes. A modified version of this floating-point adder can perform accumulation in 2 cycles with a small amount of extra hardware for use in a parallel processor node. This is achieved by feeding back the previous un-normalised but correctly rounded result together with the normalisation distance. A 2-cycle latency floating-point adder architecture with potentially the same cycle time that also employs flagged prefix addition is described. It also incorporates a fast prediction scheme for the true subtraction of significands with an exponent difference of 1, with one less adder.

73 citations


Journal ArticleDOI
TL;DR: An energy-efficient carry-lookahead adder using reversible energy recovery logic (RERL), which is a new dual-rail reversible adiabatic logic, and an eight-phase, clocked power generator that requires an off-chip inductor is described.
Abstract: In this paper, we describe an energy-efficient carry-lookahead adder using reversible energy recovery logic (RERL), which is a new dual-rail reversible adiabatic logic. We also describe an eight-phase, clocked power generator that requires an off-chip inductor. For the energy-efficient design of reversible logic, we explain how to control the overhead of reversibility with a self-energy-recovery circuit. A test chip was implemented with a 0.8 µm CMOS technology, which included two 16-bit carry-lookahead adders to allow fair comparison: an RERL one and a static CMOS one. Experimental results showed that the RERL adder had substantial advantages in energy consumption over the static CMOS one at low operating frequencies. We also confirmed that we could minimize the energy consumption in the RERL circuit by reducing the operating frequency until adiabatic and leakage losses were equal.

71 citations


Journal ArticleDOI
Hiroomi Hikawa
TL;DR: The simple and modular structure of the proposed MNN leads to a massively parallel and flexible network architecture, which is well suited for very large scale integration (VLSI) implementation.
Abstract: A new digital architecture of the frequency-based multilayer neural network (MNN) with on-chip learning is proposed. As the signal level is expressed by the frequency, the multiplier is replaced by a simple frequency converter, and the neuron unit uses the voting circuit as the nonlinear adder to improve the nonlinear characteristic. In addition, the pulse multiplier is employed to enhance the neuron characteristics. The backpropagation algorithm is modified for the on-chip learning. The proposed MNN architecture is implemented on field programmable gate arrays (FPGA) and various experiments are conducted to test the performance of the system. The experimental results show that the proposed neuron has a very good nonlinear function owing to the voting circuit. The learning behavior of the MNN with on-chip learning is also tested by experiments, which show that the proposed MNN has good learning and generalization capabilities. The simple and modular structure of the proposed MNN leads to a massively parallel and flexible network architecture, which is well suited for VLSI implementation.

Journal ArticleDOI
TL;DR: Based on simulation studies, a temporally tiled array multiplier achieves 50% and 35% improvements in delay and power dissipation compared to a conventional array multiplier.
Abstract: Digital multipliers are a major source of power dissipation in digital signal processors. Array architecture is a popular technique to implement these multipliers due to its regular compact structure. High power dissipation in these structures is mainly due to the switching of a large number of gates during multiplication. In addition, much power is also dissipated due to a large number of spurious transitions on internal nodes. Timing analysis of a full adder, which is a basic building block in array multipliers, has resulted in a different array connection pattern that reduces power dissipation due to the spurious transition activity. Furthermore, this connection pattern also improves the multiplier throughput. This array pattern is based on creating a compact tiled structure, wherein the shape of a tile represents the delay through that tile. That is, a compact structure created using these tiles is nothing but a structure with high throughput. Such a temporal tiling technique can also be applied to other digital circuits. Based on our simulation studies, a temporally tiled array multiplier achieves 50% and 35% improvements in delay and power dissipation compared to a conventional array multiplier. Improvement in delay can be traded for power using voltage reduction techniques.

Patent
02 Mar 1999
TL;DR: In this paper, an infinite impulse response (IIR) digital filter and method of performing the same is disclosed, which can be realized by programmable logic devices, such as a digital signal processor (75), or alternatively by way of dedicated logic including adders (44, 48, 50, 54, 58, 62, 66, 70, 72) and shifters (46, 52, 56, 60, 64).
Abstract: An infinite impulse response (IIR) digital filter and method of performing the same is disclosed. The digital filter may be realized by way of a programmable logic device, such as a digital signal processor (75), or alternatively by way of dedicated logic including adders (44, 48, 50, 54, 58, 62, 66, 70, 72) and shifters (46, 52, 56, 60, 64). In either case, addition operations (34) are interleaved among first and second output sample values (y_{n-1}, y_{n-2}), so that the resulting addition (30; 72; 215; 320) may be carried out with adder circuitry of the same precision as the signal input (x_n) and signal output (y_n). Carry control circuitry (76, 78, 80, 82, 84, 88; 217; 317) is provided to efficiently incorporate magnitude truncation quantization.

Patent
13 Dec 1999
TL;DR: In this article, a Rake receiver with fingers and a combiner is described, where the combiner combines outputs from the frequency discriminators to output an average error signal, which is fed back to the discriminators for removing frequency offsets from all the fingers.
Abstract: A Rake receiver having fingers is disclosed. The Rake receiver includes frequency discriminators (110-1, 110-2, 110-3) for automatic frequency control and a combiner (205). Each finger includes a frequency discriminator (110-1, 110-2, 110-3). The combiner (205) combines outputs from the frequency discriminators (110-1, 110-2, 110-3) to output an average error signal which is fed back to the frequency discriminators (110-1, 110-2, 110-3) for removing frequency offsets from all the fingers. The combiner (205) includes an adder (207) which adds the outputs of the frequency discriminators (110-1, 110-2, 110-3), and a divide circuit which divides the combined adder output by the number of the outputs which were added together to form the average error signal. The combiner further includes a filter connected between the adder and the divide circuit. In addition, further adders are provided where each of these adders receives the average error signal and a respective one of the outputs from the frequency discriminators in order to provide an estimate of one of the frequency offsets.

Patent
18 Feb 1999
TL;DR: In this paper, a programmable multi-mode accelerator is proposed for low-precision operations at an extremely high rate, such as finite impulse response (FIR), correlation and Viterbi.
Abstract: A programmable multi-mode accelerator is disclosed for use with a programmable processor or microprocessor. The programmable multi-mode accelerator allows a programmable processor to execute specific algorithms, such as certain types of finite impulse response (FIR), correlation and Viterbi computations, that require low-precision operations at an extremely high rate. The accelerator extends the digital signal processor's performance into the required range for low-precision computations. The accelerator can be coupled with the main data path of a programmable processor or microprocessor and can directly read and write to the main register files of the programmable processor. In an illustrative implementation, the accelerator data path accesses its input values (source operands) directly from a main register file of the programmable processor and writes results back into a second main register file. The accelerator allows a plurality of low-precision algorithms requiring primarily addition or multiply-add computations, such as finite impulse response, correlation and Viterbi computations, to utilize the same adder cells. The accelerator includes a multi-mode adder that can be programmatically reconfigured to perform various addition computations. In a first mode, referred to as the “single-add mode,” the adder operates as a 17-input 16-bit adder. The single-add mode can be utilized to perform finite impulse response and correlation computations. The second mode, referred to as the “ACS mode,” can be utilized to perform Viterbi computations. The accelerator has a small instruction set and instruction memory and, once started by the main data path, the accelerator executes its own instruction stream. In addition, the accelerator includes a delay line having delays of z^-1 or z^-2.

Book ChapterDOI
30 Aug 1999
TL;DR: This paper describes the study of a new field programmable gate array architecture based on on-line arithmetic, dedicated to single chip implementation of numerical algorithms in low-power signal processing and digital control applications.
Abstract: This paper describes the study of a new field programmable gate array architecture based on on-line arithmetic. This architecture, called Field Programmable On-line oPerators (FPOP), is dedicated to single chip implementation of numerical algorithms in low-power signal processing and digital control applications. FPOP is based on a reprogrammable array of on-line arithmetic operators. On-line arithmetic is a digit-serial arithmetic with most significant digits first using a redundant number system. The digit-level pipeline, the small number of communication wires between the operators and the small size of the arithmetic operators lead to high-performance parallel computations. In FPOP, the basic elements are arithmetic operators such as adders, subtracters, multipliers, dividers, square-rooters, sine or cosine operators.... An equation model is then sufficient to describe the mapping of the algorithm on the circuit. The digit-serial communication mode also significantly reduces the necessary programmable routing resources compared to standard FPGAs.

Proceedings ArticleDOI
30 May 1999
TL;DR: A high-speed low-power 10-transistor 1-bit full adder cell that saves power, area, and time in an n-bit adder circuit and is used to build a prototype for a 32-bit ripple carry adder.
Abstract: In this paper, we introduce a high-speed low-power 10-transistor 1-bit full adder cell. The critical path consists of an XOR gate, an inverter, and one pass transistor. A prototype of the proposed adder cell in 0.6 µm CMOS technology has an average delay time of 0.084 ns. It also exhibits low average power dissipation of 0.891 × 10^-4 W at a frequency of 1 GHz. In an n-bit adder circuit, the new adder cell will give alternate polarity for the carryout in the odd and even positions. The inverters in the structure of the proposed FA cell act as drivers. Therefore, each stage will not suffer a degradation in its driving capabilities. This saves power, area, and time. The new cell is used to build a prototype for a 32-bit ripple carry adder. This prototype has 384 transistors and it operates at 2.8 V with an average delay of 4.1 ns, and a low power dissipation of 2.6 mW at a frequency of 250 MHz.

Patent
23 Dec 1999
TL;DR: A multiply-accumulate unit (MAC) can perform Wallace tree and carry look-ahead adder functions simultaneously for different operations, without redundant hardware such as multiple Wallace trees or pipelining logic.
Abstract: A multiply-accumulate unit, or MAC, may achieve high throughput. The MAC need not use redundant hardware, such as multiple Wallace trees, or pipelining logic, yet may perform Wallace tree and carry look-ahead adder functions simultaneously for different operations.

Patent
26 Mar 1999
TL;DR: The device compensates, with high accuracy, the DC offset present in the baseband of modulation and demodulation systems while adding a minimum of circuitry.
Abstract: PROBLEM TO BE SOLVED: To highly accurately compensate DC offset existent in the base band of modulation and demodulation systems by adding circuits at a minimum. SOLUTION: This device is provided with a switching part 102 for switching a reference signal and a base band signal, an adder 103 for compensating the DC offset of the modulation system by inputting the output of the switching part 102, a quadrature modulation part 106 for modulating a signal, for which the output of the adder 103 is converted to analog, to a carrier frequency, a distributor 108 for distributing a modulation signal, a switch 109 for turning on/off passage with one distributed signal as a feedback modulation signal, a quadrature demodulation part 110 for performing the quadrature demodulation of the feedback modulation signal, an adder 113 for compensating the DC offset of the demodulation system by inputting a signal, for which the quadrature demodulation signal is converted to digital, and a DC offset estimating part 114 for estimating the DC offset of the modulation and demodulation systems while using the reference signal and the output of the adder 113 so that high-accuracy DC offset compensation is enabled.

Patent
13 Apr 1999
TL;DR: An IC monitoring chip is described that remotely monitors the output of a thermal diode formed in the substrate of a CPU in order to monitor the temperature of the thermal plate of the CPU.
Abstract: An IC monitoring chip (10) for remotely monitoring the output of a thermal diode (5) formed in the substrate of a CPU (2) for monitoring the temperature of a thermal plate (3) of the CPU (2) comprises a signal conditioning circuit (12) which relays the output from the diode (5) to an analog-to-digital converter (14), which in turn outputs a two's complement signal to an adder (22). The adder (22) adds the two's complement signal to a temperature offset value stored in a temperature offset register (17), which compensates for the temperature difference between the diode (5) and the thermal plate (3). Comparators (24) and (25) compare the output from the adder (22) with upper and lower predetermined temperature limits in upper and lower limit registers (19) and (20) for determining the temperature of the thermal plate (3). The temperature offset value is stored in ROM (35) of the computer (1) and is written to the register (17) each time the computer (1) is powered up. The IC chip (10) operates independently of the CPU (2).

Patent
12 Oct 1999
TL;DR: In this article, the authors describe a multiplier that can be configured to perform multiplication of both scalar floating point values (X×Y) and packed floating-point values (i.e., X_1×Y_1 and X_2×Y_2).
Abstract: A multiplier configured to perform multiplication of both scalar floating point values (X×Y) and packed floating point values (i.e., X_1×Y_1 and X_2×Y_2). In addition, the multiplier may be configured to calculate X×Y−Z. The multiplier comprises selection logic for selecting source operands, a partial product generator, an adder tree, and two or more adders configured to sum the results from the adder tree to achieve a final result. The multiplier may also be configured to perform iterative multiplication operations to implement arithmetical operations such as division and square root. The multiplier may be configured to generate two versions of the final result, one assuming there is an overflow, and another assuming there is not an overflow. A computer system and method for performing multiplication are also disclosed.

Patent
04 Feb 1999
TL;DR: A reconfigurable co-processor adapted for multiple multiply-accumulate operations includes plural pairs of multipliers, plural first adders receiving respective product outputs from a pair of multipliers, and at least one second adder receiving sum outputs from a corresponding pair of first adders.
Abstract: A reconfigurable co-processor adapted for multiple multiply-accumulate operations includes plural pairs of multipliers, plural first adders receiving respective product outputs from a pair of multipliers, and at least one second adder receiving sum outputs from a corresponding pair of first adders. The co-processor includes sign extend circuits at the output of each multiplier. One multiplier of each pair has a fixed left shift circuit that left shifts the product output a predetermined number of bits. The other multiplier in each pair includes a right shift circuit that right shifts the product output the number of bits. Multiplexers at the output of the first multiplier in each pair select the sign extended or the left shifted products. Multiplexers at the output of the second multiplier in each pair select the product, the right shifted product or pass through the inputs. The sign extend circuit for the second multiplier follows the multiplexer. Third adders receive the sum outputs of the second adders and produce a third sum output. These third adders include plural selectable output accumulators and variable right shifter at their outputs. The third adders may separately sum the product sums from four multipliers each. Alternatively, the third adders may accumulate the products of eight multipliers.

Patent
08 Oct 1999
TL;DR: A partitioned multiplier circuit designed for high-speed operation can perform one 32×32 bit multiplication, two 16×16 bit multiplications (simultaneously), or four 8×8 bit multiplications (simultaneously) depending on input partitioning signals.
Abstract: A partitioned multiplier circuit which is designed for high speed operations. The multiplier of the present invention can perform one 32×32 bit multiplication, two 16×16 bit multiplications (simultaneously) or four 8×8 bit multiplications (simultaneously) depending on input partitioning signals. The time required to perform either the 32×32 bit or the 16×16 bit or the 8×8 bit multiplications is constant. Therefore, multiplication results are available with a constant latency regardless of operand bit-size. In one embodiment, the latency is two clock cycles but the multiplier circuit has a throughput of one clock cycle due to pipelining. The input operands can be signed or unsigned. The hardware is partitioned without any significant increase in the delay or area and the multiplier can provide six different modes of operation. In one embodiment, Booth encoding is used for the generation of 17 partial products which are compressed using a compression tree into two 64-bit values. This is performed in the first pipeline stage to generate a sum and a carry vector. These values are then added, in the second pipestage, using a carry propagate adder circuit to provide a single 64-bit result. In the case of 16×16 bit multiplication, the 64-bit result contains two 32-bit results. In the case of 8×8 bit multiplication, the 64-bit result contains four 16-bit results. Due to its high operating speed, the multiplier circuit is advantageous for use in multi-media applications, such as audio/visual rendering and playback.
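
The packing of partitioned results into one 64-bit word, as described in the abstract above, can be sketched behaviorally. This models only the input/output relationship for unsigned operands; it does not model Booth encoding, the compression tree, or the pipeline. The function name `partitioned_multiply` and the `lane_bits` parameter are hypothetical.

```python
def partitioned_multiply(a: int, b: int, lane_bits: int) -> int:
    """Multiply 32-bit unsigned words lane by lane and pack a 64-bit result.

    lane_bits = 32 gives one 64-bit product, 16 gives two 32-bit products,
    and 8 gives four 16-bit products, each in its own field of the result.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for i in range(32 // lane_bits):
        x = (a >> (i * lane_bits)) & lane_mask    # i-th lane of operand a
        y = (b >> (i * lane_bits)) & lane_mask    # i-th lane of operand b
        # Each product is twice as wide as a lane, so fields never overlap.
        result |= (x * y) << (2 * i * lane_bits)
    return result
```

For example, with `lane_bits=16` the operands `0x00030002` and `0x00050004` produce the two 32-bit products 15 and 8 in the upper and lower halves of the result.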

Journal ArticleDOI
TL;DR: A new property of the Chinese remainder theorem (CRT) is presented and this property is used to develop a fast converter for this 3 moduli set, based on carry-save adders and one carry-propagate adder stage, without the need for any look-up tables.
Abstract: This paper presents a new fast RNS converter for the 3 moduli set of the form {2^n - 1, 2^n, 2^n + 1}. A new property of the Chinese remainder theorem (CRT) is also presented and this property is used to develop a fast converter for this 3 moduli set. The resulting implementation is based on carry-save adders and one carry-propagate adder stage, without the need for any look-up tables. The new design is faster and smaller than existing designs.
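
For orientation, the conversion problem can be stated in a few lines using the textbook Chinese remainder theorem. This sketch is a reference model of what a residue-to-binary converter must compute for this moduli set; it is not the paper's new CRT property or its carry-save implementation. The helper names are hypothetical, and `pow(x, -1, m)` and `math.prod` assume Python 3.8 or later.

```python
from math import prod


def moduli_set(n: int) -> tuple[int, int, int]:
    """Return the pairwise coprime moduli (2**n - 1, 2**n, 2**n + 1)."""
    return (2 ** n - 1, 2 ** n, 2 ** n + 1)


def to_rns(x: int, moduli: tuple[int, ...]) -> tuple[int, ...]:
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)


def from_rns(residues: tuple[int, ...], moduli: tuple[int, ...]) -> int:
    """Reverse conversion with the textbook CRT formula."""
    big_m = prod(moduli)                      # dynamic range of the system
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m                      # product of the other moduli
        total += r * m_i * pow(m_i, -1, m)    # modular inverse of m_i mod m
    return total % big_m
```

With n = 4 the moduli are 15, 16, and 17, giving a dynamic range of 4080 distinct values.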

Patent
Keith Bryan Hardin
27 Aug 1999
TL;DR: In this article, a digital spread spectrum clock circuit is made variable by employing RAM memory (29), and a multiplexer (39) to receive initiation data before the circuit is ready to run normally.
Abstract: A digital spread spectrum clock circuit is made variable by employing RAM memory (29), and a multiplexer (39) to receive initiation data before the circuit is ready to run normally. A second counter (17) and an adder (25) each receiving the contents of programmable registers (17 and 25) permit wide variation in operation although RAM memory is of relatively small size.

Proceedings ArticleDOI
30 May 1999
TL;DR: The proposed input test pattern proves the correct functionality, and produces correct time delay and power dissipation, and guarantees correct and fair comparison among different full adder cells.
Abstract: Evaluating the performance measures of a full adder cell, like other circuits, is input pattern dependent. The issue gets more complicated when evaluating several parameters such as time delay, area, power dissipation, and correct functionality at the same time. The proposed input test pattern is based on full coverage of all possible transitions from one input pattern to another. It is composed of two parts: the first is a 56 transitions input pattern for speed measurement, followed by 9 different input patterns concatenated together for power consumption measurement. The proposed input test pattern proves the correct functionality, and produces correct time delay and power dissipation. Using this input test pattern guarantees correct and fair comparison among different full adder cells.
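
The figure of 56 transitions is consistent with counting ordered pairs of distinct input patterns for a three-input cell (8 patterns, each followed by one of the 7 others). The sketch below enumerates them under that reading, which is an assumption; the paper's actual ordering of the test sequence is not reproduced. The names `PATTERNS` and `TRANSITIONS` are hypothetical.

```python
from itertools import permutations, product

# All 8 input patterns (a, b, cin) of a one-bit full adder.
PATTERNS = list(product((0, 1), repeat=3))

# Every ordered pair of distinct patterns, i.e. every possible transition
# from one input pattern to a different one.
TRANSITIONS = list(permutations(PATTERNS, 2))
```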

Journal ArticleDOI
TL;DR: A construction of uniquely decodable codes for the two-user binary adder channel whose rates are greater than those guaranteed by the Coebergh van den Braak and van Tilborg construction and which can be used with simple encoding and decoding procedures.
Abstract: A construction of uniquely decodable codes for the two-user binary adder channel is presented. The rates of the codes obtained by this construction are greater than the rates guaranteed by the Coebergh van den Braak and van Tilborg construction and these codes can be used with simple encoding and decoding procedures.
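
On the two-user binary adder channel the receiver sees the componentwise integer sum of the two transmitted binary words, and a code pair is uniquely decodable when every pair of codewords yields a distinct sum. The brute-force check below illustrates that definition on a small classic example; it is not the construction presented in the paper, and the function names are hypothetical.

```python
from itertools import product


def channel_output(u: tuple[int, ...], v: tuple[int, ...]) -> tuple[int, ...]:
    """Componentwise integer sum seen by the receiver (symbols 0, 1, 2)."""
    return tuple(x + y for x, y in zip(u, v))


def is_uniquely_decodable(code1, code2) -> bool:
    """True if every (u, v) pair produces a distinct channel output."""
    outputs = [channel_output(u, v) for u, v in product(code1, code2)]
    return len(set(outputs)) == len(outputs)
```

For instance, the length-2 pair `[(0, 0), (1, 1)]` and `[(0, 0), (0, 1), (1, 0)]` passes the check, while adding `(1, 1)` to the second code breaks it because `(0, 0) + (1, 1)` and `(1, 1) + (0, 0)` collide.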

Proceedings ArticleDOI
01 Jan 1999
TL;DR: A new noise-tolerant dynamic circuit technique is presented, and the average noise threshold energy (ANTE) and the energy normalized ANTE metrics are proposed for quantifying the noise immunity and energy efficiency of circuit techniques.
Abstract: Noise in deep submicron technology combined with the move towards dynamic circuit techniques for higher performance have raised concerns about reliability and energy efficiency of VLSI systems in the deep submicron era. To address this problem, a new noise-tolerant dynamic circuit technique is presented. In addition, the average noise threshold energy (ANTE) and the energy normalized ANTE metrics are proposed for quantifying the noise immunity and energy efficiency, respectively, of circuit techniques. Simulation results in 0.35 micron CMOS for NAND gate designs indicate that the proposed technique improves the ANTE and energy normalized ANTE by 2.54X and 2.25X over the conventional domino circuit. The improvement in energy normalized ANTE is 1.22X higher than the existing noise-tolerance techniques. A full adder design based on the proposed technique improves the ANTE and energy normalized ANTE by 3.7X and 1.95X over the conventional dynamic circuit. In comparison, the static circuit improves ANTE by 2.2X but degrades the energy normalized ANTE by 11%. In addition, the proposed technique has a smaller area overhead (69%) as compared to the static circuit whose area overhead is 98%.


Patent
22 Mar 1999
TL;DR: A feedforward linearizer uses signal cancellation and error cancellation vector modulators, adjusted by a digital signal processor, such that the output of the signal cancellation adder substantially represents the error components provided by the first amplifier and the linearizer output is an amplified version of the input signal with substantially no intermodulation components.
Abstract: A feedforward linearizer for amplifying an input signal comprises a signal cancellation circuit which has a first branch and a second branch. A first amplifier provided in the first branch receives the input signal intended to be amplified and generates an output signal received by a signal cancellation vector modulator. A signal cancellation adder receives the signal generated by the signal cancellation vector modulator and the input signal via the second branch and provides an error signal. The feedforward linearizer also comprises an error cancellation circuit that has a first branch and a second branch. An error cancellation adder in the first branch receives the output signal provided by the first amplifier and generates the output signal of the linearizer. An error cancellation vector modulator in the second branch receives an error signal provided by the signal cancellation adder and provides an error adjusted signal to a second auxiliary amplifier. The second auxiliary amplifier provides an input signal to the error cancellation adder. A digital signal processor provides a signal cancellation adjustment signal, α, to the signal cancellation vector modulator and an error cancellation adjustment signal, β, to the error cancellation vector modulator respectively, such that the output signal of the signal cancellation adder is a signal that substantially represents the error components provided by the first amplifier, and the output signal of the error cancellation adder is an amplified version of the input signal, with substantially no intermodulation components.