
Showing papers on "Adder published in 1994"


Journal ArticleDOI
TL;DR: This work examines the possible implementation of logic devices using coupled quantum dot cells and uses these cells to design inverters, programmable logic gates, dedicated AND and OR gates, and non‐interfering wire crossings.
Abstract: We examine the possible implementation of logic devices using coupled quantum dot cells. Each quantum cell contains two electrons which interact Coulombically with neighboring cells. The charge distribution in each cell tends to align along one of two perpendicular axes, which allows the encoding of binary information using the state of the cell. The state of each cell is affected in a very nonlinear way by the states of its neighbors. A line of these cells can be used to transmit binary information. We use these cells to design inverters, programmable logic gates, dedicated AND and OR gates, and non‐interfering wire crossings. Complex arrays are simulated which implement the exclusive‐OR function and a single‐bit full adder.

1,149 citations
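
The gate set named in the abstract can be exercised with a purely logical model (this is not a simulation of the cells). The sketch below assumes the programmable gate is a three-input majority voter, which the abstract does not state. It fixes one input to obtain AND and OR, then builds a single-bit full adder from three majority gates and two inverters. All function names are illustrative, and the arrangement is a common textbook one, not necessarily the array simulated in the paper.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote on single bits (assumed programmable gate)."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a: int, b: int) -> int:
    # Pinning the third input to 0 programs the majority gate as AND.
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    # Pinning the third input to 1 programs the majority gate as OR.
    return majority(a, b, 1)


def inverter(a: int) -> int:
    return a ^ 1


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Single-bit full adder built only from majority gates and inverters."""
    cout = majority(a, b, cin)
    s = majority(inverter(cout), cin, majority(a, b, inverter(cin)))
    return s, cout


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, full_adder(a, b, cin))
```

Expressing the carry as a single majority vote is what makes this gate a natural primitive for adders: the carry-out of a full adder is exactly the majority of its three inputs.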


Patent
03 Nov 1994
TL;DR: In this paper, a monitoring circuit for an electrosurgical generator has active and return output conductors, and feedback modifies the output when the adders determine the power applied to the load (12) in real time.
Abstract: A monitoring circuit (10) for an electrosurgical generator (11) has active and return output conductors. Voltage, current (24) and the inverse of current (24) picked up inductively are provided to adder circuits for summing the picked up voltage (20) and current (24) and computing the difference of the picked up voltage (20) and the current (24). Root mean square to direct current converters (26 and 28) signal RMS average values of the sum and difference. A microprocessor squares the values and applies them to a formula wherein the sum signals (22) have subtracted therefrom the difference signals (25); the results are divided by four to provide the root mean square of the power applied to the load (12). During desiccation the output is regulated in response to impedance to shut off output. A diagnostic circuit relates impedance load and output response during operation to a look up table or a microprocessor algorithm to calibrate. Feedback modifies the output when the adders determine the power applied to the load (12) in real time. A method has generator output to active and return conductors (14 and 15) and to inductive pick ups (16 and 17) for voltage and current (24), computes sum and differential values (25), changes root mean square to direct currents (24), squares the values and subtracts the differential from the summation, then divides the result finding the root mean square value of the power.

876 citations
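
The sum-and-difference formula in the abstract rests on the identity (v + i)^2 - (v - i)^2 = 4·v·i, so the squared RMS of the sum minus the squared RMS of the difference, divided by four, equals the mean of the product v·i. The sketch below checks this numerically. It assumes ideal, unit-gain pickups and sampled waveforms; the function names and the test signal are illustrative, not taken from the patent.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def power_from_sum_and_difference(v, i):
    """Mean of v*i computed only from RMS(v + i) and RMS(v - i)."""
    s = rms([a + b for a, b in zip(v, i)])
    d = rms([a - b for a, b in zip(v, i)])
    return (s * s - d * d) / 4.0


def direct_average_power(v, i):
    """Reference value: mean of the sample-by-sample product."""
    return sum(a * b for a, b in zip(v, i)) / len(v)


if __name__ == "__main__":
    n = 1000
    # Hypothetical load: current lags voltage by 0.5 rad.
    v = [10.0 * math.sin(2 * math.pi * k / n) for k in range(n)]
    i = [2.0 * math.sin(2 * math.pi * k / n - 0.5) for k in range(n)]
    print(power_from_sum_and_difference(v, i), direct_average_power(v, i))
```

The appeal of this route is that no analog multiplier is needed: only adders, RMS converters, and a squaring step in the microprocessor.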


Journal ArticleDOI
TL;DR: A comprehensive study of new residue generators and MOMA's is presented and four design schemes of the n-input residue generators mod A, which are best suited for various pairs of n and A, are proposed.
Abstract: A residue generator is an essential building block of encoding/decoding circuitry for arithmetic error detecting codes and of binary-to-residue number system (RNS) converters. In either case, a residue generator is an overhead for a system, and as such it should be built with a minimum amount of hardware and should not compromise the speed of the system. A multioperand modular adder (MOMA) is a computational element used to implement various operations in digital signal processing systems using RNS. A comprehensive study of new residue generators and MOMA's is presented. The design methods given here take advantage of the periodicity of the series of powers of 2 taken modulo A (A is a modulus). Four design schemes of the n-input residue generators mod A, which are best suited for various pairs of n and A, are proposed. Their pipelined versions can be clocked with the cycle determined by the delay of a full-adder and a latch. A family of design methods for parallel and word-serial MOMA's, using similar concepts, is also given. Both classes of circuits employ new highly-parallel schemes using carry-save adders with end-around carry and a minimal amount of ROM and are well-suited for VLSI implementation. They are faster and use less hardware than similar circuits known to date. One of the MOMA's can be used to build a high-speed residue-to-binary converter based on the Chinese remainder theorem.

224 citations
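
The periodicity the abstract relies on can be illustrated in software. For an odd modulus A there is a smallest P with 2^P ≡ 1 (mod A), so an input can be cut into P-bit chunks and the chunks simply added, which is an addition modulo 2^P - 1 with end-around carry. The sketch below is a behavioral model under that assumption only; it is not one of the paper's four schemes, and the final `% A` on a P-bit value stands in for the small ROM. Function names are illustrative.

```python
def period_of_two(A: int) -> int:
    """Smallest P > 0 with 2**P % A == 1. Requires an odd modulus A > 1."""
    if A <= 1 or A % 2 == 0:
        raise ValueError("A must be odd and greater than 1")
    p, r = 1, 2 % A
    while r != 1:
        r = (r * 2) % A
        p += 1
    return p


def residue(x: int, A: int) -> int:
    """x mod A computed by folding P-bit chunks, then one small lookup."""
    P = period_of_two(A)
    mask = (1 << P) - 1
    # Because 2**P ≡ 1 (mod A), every chunk has weight 1 modulo A.
    while x > mask:
        total = 0
        while x:
            total += x & mask
            x >>= P
        x = total
    # x now fits in P bits; in hardware this step would be a small ROM.
    return x % A


if __name__ == "__main__":
    print(period_of_two(7), residue(1000, 7))
```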


Patent
29 Jun 1994
TL;DR: In this paper, a method for adding information to image data of a multicolor image and a monochromatic image includes the performing of a dispersion conversion on the spectrum of additional information to be added to the image data by multiplying a code sequence of a PN sequence generator by the additional data which is inputted from an input terminal and converted into serial data in a P/S converter.
Abstract: An apparatus and method for adding information to image data of a multicolor image and a monochromatic image includes the performing of a dispersion conversion on the spectrum of additional information to be added to the image data by multiplying a code sequence of a PN sequence generator by the additional data which is inputted from an input terminal and converted into serial data in a P/S converter. At this time, the data sequence transmitted from an image signal processor to a printer engine 107 is converted by a scan converter to correspond to the spatial axis for dispersion. The converted data sequence is added to the output from an adder by another adder 110, and reconverted to the original scan by a scan inverter.

200 citations


01 Jan 1994
TL;DR: Methods of implementing binary multiplication with the smallest possible latency are investigated, and traditional Booth encoded multipliers are found to be superior in layout area, power, and delay to non-Booth encoded multipliers.
Abstract: This thesis investigates methods of implementing binary multiplication with the smallest possible latency. The principal area of concentration is on multipliers with lengths of 53 bits, which makes the results suitable for IEEE-754 double precision multiplication. Low latency demands high performance circuitry, and small physical size to limit propagation delays. VLSI implementations are the only available means for meeting these two requirements, but efficient algorithms are also crucial. An extension to Booth's algorithm for multiplication (redundant Booth) has been developed, which represents partial products in a partially redundant form. This redundant representation can reduce or eliminate the time required to produce "hard" multiples (multiples that require a carry propagate addition) required by the traditional higher order Booth algorithms. This extension reduces the area and power requirements of fully parallel implementations, but is also as fast as any multiplication method yet reported. In order to evaluate various multiplication algorithms, a software tool has been developed which automates the layout and optimization of parallel multiplier trees. The tool takes into consideration wire and asymmetric input delays, as well as gate delays, as the tree is built. The tool is used to design multipliers based upon various algorithms, using both Booth encoded, non-Booth encoded and the new extended Booth algorithms. The designs are then compared on the basis of delay, power, and area. For maximum speed, the designs are based upon a 0.6 µm BiCMOS process using emitter coupled logic (ECL). The algorithms developed in this thesis make possible 53 x 53 multipliers with a latency of less than 2.6 nanoseconds @ 10.5 Watts and a layout area of 13 mm². Smaller and lower power designs are also possible, as illustrated by an example with a latency of 3.6 nanoseconds @ 5.8 W, and an area of 8.9 mm².
The conclusions based upon ECL designs are extended where possible to other technologies (CMOS). Crucial to the performance of multipliers are high speed carry propagate adders. A number of high speed adder designs have been developed, and the algorithms and design of these adders are discussed. The implementations developed for this study indicate that traditional Booth encoded multipliers are superior in layout area, power, and delay to non-Booth encoded multipliers. Redundant Booth encoding further reduces the area and power requirements. Finally, only half of the total multiplier delay was found to be due to the summation of the partial products. The remaining delay was due to wires and carry propagate adder delays.

158 citations
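
For readers unfamiliar with Booth encoding, the sketch below shows conventional radix-4 (modified) Booth recoding, not the redundant Booth extension developed in the thesis. Each pair of multiplier bits becomes a digit in {-2, -1, 0, 1, 2}, so only shifts and negations of the multiplicand are needed. Higher-order recodings such as radix-8 also require the multiple 3, which needs a carry propagate addition and is an example of the "hard" multiples the abstract mentions. The function names and the signed, even-width input convention are assumptions made for illustration.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits.

    Digits are returned least significant first; `bits` must be even.
    """
    if bits % 2 or not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("need even width and a multiplier that fits in it")

    def bit(k: int) -> int:
        # Python's >> on negative ints is arithmetic, so this sign-extends.
        return (multiplier >> k) & 1 if k >= 0 else 0

    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, bits, 2)]


def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Sum the partial products selected by the Booth digits."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum((d * multiplicand) << (2 * j) for j, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(127, 8))   # least significant digit first
    print(booth_multiply(25, -77, 8))
```

The recoding halves the number of partial products that the adder tree must sum, which is where the area and delay savings discussed in the thesis come from.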


Journal ArticleDOI
01 Dec 1994
TL;DR: The implementation of a 200 MHz 13.3 mm² 8×8 2-D DCT macrocell capable of HDTV rates, based on a direct realization of the DCT, and using distributed arithmetic is presented.
Abstract: The two-dimensional discrete cosine transform (2D DCT) has been widely recognized as a key processing unit for image data compression/decompression. In this paper, the implementation of a 200 MHz 13.3 mm² 8×8 2-D DCT macrocell capable of HDTV rates, based on a direct realization of the DCT, and using distributed arithmetic is presented. The macrocell, fabricated using 0.8 µm base-rule CMOS technology and 0.5 µm MOSFET's, performs the DCT processing with 1 sample-(pixel)-per-clock throughput. The fast speed and small area are achieved by a novel sense-amplifying pipeline flip-flop (SA-F/F) circuit technique in combination with nMOS differential logic. The SA-F/F, a class of delay flip-flops, can be used as a differential synchronous sense-amplifier, and can amplify dual-rail inputs with swings lower than 100 mV. A 1.6 ns 20 bit carry skip adder used in the DCT macrocell, which was designed by the same scheme, is also described. The adder is 50% faster and 30% smaller than a conventional CMOS carry look ahead adder, which reduces the macrocell size by 15% compared to a conventional CMOS implementation.

156 citations


Journal ArticleDOI
TL;DR: Hardware designs that produce exactly rounded results for the functions of reciprocal, square-root, 2^x, and log_2(x) are presented, and delay and area comparisons are made based on the degree of the approximating polynomial and the accuracy of the final result.
Abstract: This paper presents hardware designs that produce exactly rounded results for the functions of reciprocal, square-root, 2^x, and log_2(x). These designs use polynomial approximation in which the terms in the approximation are generated in parallel, and then summed by using a multi-operand adder. To reduce the number of terms in the approximation, the input interval is partitioned into subintervals of equal size, and different coefficients are used for each subinterval. The coefficients used in the approximation are initially determined based on the Chebyshev series approximation. They are then adjusted to obtain exactly rounded results for all inputs. Hardware designs are presented, and delay and area comparisons are made based on the degree of the approximating polynomial and the accuracy of the final result. For single-precision floating point numbers, a design that produces exactly rounded results for all four functions has an estimated delay of 80 ns and a total chip area of 98 mm² in a 1.0-micron CMOS technology. Allowing the results to have a maximum error of one unit in the last place reduces the computational delay by 5% to 30% and the area requirements by 33% to 77%.

121 citations


Patent
02 Aug 1994
TL;DR: In this paper, an apparatus for combining the contents of an X register, shifted by m places, with the corresponding contents of a Y register to generate a result Z is presented.
Abstract: An apparatus for combining the contents of an X register, shifted by m places, with the contents of a Y register to generate a result Z. The functional unit can also be configured to perform parallel operations on sub-operands in the X and Y registers. The division of the apparatus into sub-operands is controlled by a mask which specifies the boundary of the sub-operands. The shifting operation is accomplished by multiplexers that connect the pth bit of the X register to the adder stage that operates on bit Yp-m of the Y register. Circuitry is provided at the boundary of the sub-operands to prevent the bit signals corresponding to the X register from being routed across a sub-operand boundary. Similarly, circuitry is provided for preventing the carry output of an adder stage that operates on one sub-operand from being propagated to an adder stage that operates on another sub-operand.

98 citations
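
The carry-blocking idea in the abstract can be modeled bit by bit. The sketch below covers only the partitioned addition, not the shift-by-m multiplexers. It assumes a particular mask encoding that the abstract does not specify: bit k of `boundary_mask` is set when bit k is the most significant bit of a sub-operand, so the carry out of that stage is suppressed. The function name and the 32-bit default width are likewise illustrative.

```python
def partitioned_add(x: int, y: int, boundary_mask: int, width: int = 32) -> int:
    """Ripple-carry add with carries blocked at sub-operand boundaries."""
    result = 0
    carry = 0
    for k in range(width):
        a = (x >> k) & 1
        b = (y >> k) & 1
        s = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))
        result |= s << k
        if (boundary_mask >> k) & 1:
            # Top bit of a sub-operand: do not let the carry cross over.
            carry = 0
    return result


if __name__ == "__main__":
    # Four independent 8-bit lanes versus one 32-bit addition.
    print(hex(partitioned_add(0x01FF80FF, 0x01010101, 0x80808080)))
    print(hex(partitioned_add(0x01FF80FF, 0x01010101, 0x00000000)))
```

With the mask cleared the same datapath behaves as a full-width adder, which is the point of the design: one adder serves both a single wide operation and several narrow ones issued by a single instruction.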


Journal ArticleDOI
TL;DR: An assessment of the strengths and weaknesses of using FPGA's for floating-point arithmetic.
Abstract: We present empirical results describing the implementation of an IEEE Standard 754 compliant floating-point adder/multiplier using field programmable gate arrays. The use of FPGA's permits fast and accurate quantitative evaluation of a variety of circuit design tradeoffs for addition and multiplication. FPGA's also permit accurate assessments of the area and time costs associated with various features of the IEEE floating-point standard, including rounding and gradual underflow. These costs are analyzed, along with the effects of architectural correlation, a phenomenon that occurs when the cost of combining architectural features exceeds the sum of separate implementation. We conclude with an assessment of the strengths and weaknesses of using FPGA's for floating-point arithmetic.

93 citations


Patent
31 Jan 1994
TL;DR: In this paper, a synchronous memory device is provided in which a timing and control circuit (28) receives timing inputs and control inputs, and a state machine (130) operates to prevent indeterminate operation if invalid mode data is input to the operation mode register (29).
Abstract: A synchronous memory device is provided in which a timing and control circuit (28) receives timing and control inputs. A row address buffer (38) and row decoders (40 and 42) operate to enable rows in plural memory sections (30, 32, 34, and 36). Column decoders (58, 60, 62, and 64) operate to enable columns in each of the memory sections (respectively, 32, 36, 30 and 34). The column decoders (58, 60, 62, and 64) decode addresses received from counters (respectively 52, 54, 48, and 50), an adder (46), and a latch (56). Counters (48, 50, 52, and 54) and adder (46) generate column addresses for each memory section based on a starting address, thereby allowing for internal operation at less than the external system frequency. An operation mode register (29) stores mode data for controlling certain operations, and a state machine (130) operates to prevent indeterminate operation if invalid mode data is input to the operation mode register (29).

86 citations


Journal ArticleDOI
TL;DR: In this article, a new subfamily of rapid single-flux-quantum (RSFQ) digital cells, based on a single B flip-flop template, is introduced.
Abstract: We are introducing a new subfamily of rapid single-flux-quantum (RSFQ) digital cells, based on a single B flip-flop template. The template can be considered as an SFQ flip-flop with up to 4 inputs and 6 outputs. Each input SFQ pulse can change the flip-flop's internal state. Each output presents (in the form of SFQ pulses) a specific logic function of the initial state of the cell and input signals. Simple connection of various inputs and/or outputs, combined with shortening or opening of certain branches of the template allows one to implement a variety of RSFQ cells with wide margins. Of these various RSFQ cells, we have designed and successfully tested the T1 cell (asynchronous toggle flip-flop with synchronous destructive readout), single-bit full adder, and single-bit stage of an up-down counter. Experimentally measured margins for dc power supply voltage for these circuits were ±24%, ±24%, and ±17%, respectively.

Patent
Tetsuro Kawata
16 Mar 1994
TL;DR: A field-programmable gate array (FPGA) is presented that comprises regularly arrayed logic elements, a first group of signal lines interconnecting adjacent logic elements, and a second group of signal lines interconnecting non-adjacent logic elements, providing a high utilization of logic elements.
Abstract: A field-programmable gate array comprises regularly arrayed logic elements, a first group of signal lines interconnecting the logic elements adjacent to each other, and a second group of signal lines interconnecting the logic elements not adjacent to each other to provide a field-programmable gate array capable of forming an adder, logic operation unit, or the like having a high utilization of logic elements.

Patent
09 Sep 1994
TL;DR: In this paper, a sigma-delta signal converter is implemented using switched capacitor switching elements in which a first switch (31) serves as a mixer, and the output of the mixer is directed to the second input of an adder (16), and its second input is the feedback signal (f1) of the SDS signal converter, which is also directed into a base-frequency output signal through a decimator and low-pass filtering.
Abstract: A sigma-delta signal converter is implemented using switched capacitor switching elements in which a first switch (31) serves as a mixer (11). The output of the mixer is directed to the second input of an adder (16), and its second input is the feedback signal (f1) of the sigma-delta signal converter, which is also directed into a base-frequency output signal through a decimator (14) and low-pass filtering (15).

Journal ArticleDOI
TL;DR: A novel hybrid number representation is proposed that includes the two's complement representation and the signed-digit representation as special cases and is capable of bounding the maximum length of carry propagation chains during addition to any desired value between 1 and the entire word length.
Abstract: A novel hybrid number representation is proposed. It includes the two's complement representation and the signed-digit representation as special cases. The hybrid number representations proposed are capable of bounding the maximum length of carry propagation chains during addition to any desired value between 1 and the entire word length. The framework reveals a continuum of number representations between the two extremes of two's complement and signed-digit number systems and allows a unified performance analysis of the entire spectrum of implementations of adders, multipliers and the like. We present several static CMOS implementations of a two-operand adder which employ the proposed representations. We then derive quantitative estimates of area (in terms of the required number of transistors) and the maximum carry propagation delay for such an adder. The analysis clearly illustrates the trade-offs between area and execution time associated with each of the possible representations. We also discuss adder trees for parallel multipliers and show that the proposed representations lead to compact adder trees with fast execution times. In practice, the area available to a designer is often limited. In such cases, the designer can select the particular hybrid representation that yields the most suitable implementation (fastest, lowest power consumption, etc.) while satisfying the area constraint. Similarly, if the worst case delay is predetermined, the designer can select a hybrid representation that minimizes area or power under the delay constraint.

Patent
Gensuke Goto
05 Dec 1994
TL;DR: In this paper, a basic cell is formed by a predetermined number of partial product generators and one first multi-input adder, and the basic cells are repetitively arranged to obtain a rectangular configuration.
Abstract: A plurality of multiplicand bit transmission lines and a plurality of multiplier bit transmission lines or their decoding signal transmission lines are arranged in a two-dimensional plane, and partial product generators are arranged at their intersections. A plurality of rows of first multi-input adders are arranged at predetermined numbers of rows, and at least one row of second multi-input adders are arranged at predetermined numbers of the first multi-input adders. A basic cell is formed by a predetermined number of partial product generators and one first multi-input adder, and the basic cells are repetitively arranged to obtain a rectangular configuration.

Journal ArticleDOI
TL;DR: This work designs and performs initial experiments for handling 8-bit MSD number addition and subtraction and presents the results, to confirm the underlying operational principles of the proposed optoelectronic shared content-addressable-memory MSD adder.
Abstract: Addition is the most primitive arithmetic operation in digital computation. Other arithmetic operations such as subtraction, multiplication, and division can all be performed by addition together with some logic operations. With the binary number system, addition speed is inevitably limited by the carry-propagation schemes. On the other hand, carry-free addition is possible when the modified signed-digit (MSD) number representation is used. We propose a novel optoelectronic scheme to handle the parallel MSD addition and subtraction operations. An optoelectronic shared content-addressable memory is introduced. The shared content-addressable memory uses free-space optical processing to handle the large amount of parallel memory access operations and uses electronics to postprocess and derive logic decisions. We analyze the accuracy that the required optical hardware can deliver by using a statistical cross-talk-rate model that we propose. We also evaluate other important device and system performance parameters, such as the memory capacity or the maximum number of parallel bits the adder can handle in terms of a given cross-talk rate at a certain repetition rate, the corresponding diffraction-limited memory density, and the system’s power efficiency. To confirm the underlying operational principles of the proposed optoelectronic shared content-addressable-memory MSD adder, we design and perform initial experiments for handling 8-bit MSD number addition and subtraction and present the results.

Journal ArticleDOI
TL;DR: Two new design procedures are given, based on the one-level and the two-level carry look-ahead addition algorithms, which are significantly more efficient, with respect to speed and the cost function area-time product than the corresponding adders already known from open literature.
Abstract: In this paper the design of modulo 2^n - 1 adders is discussed. Two new design procedures are given, based on the one-level and the two-level carry look-ahead addition algorithms. The adders designed according to the procedures proposed in this paper are significantly more efficient, with respect to speed and the cost function area-time product, than the corresponding adders already known from open literature.
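
As background, the arithmetic such an adder must perform can be written as a behavioral model. The sketch below adds two n-bit operands, feeds the carry-out back into the least significant position (end-around carry), and maps the all-ones pattern to zero. It says nothing about the carry look-ahead structures the paper proposes; the function name and the choice to return a single representation of zero are assumptions.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """(a + b) mod (2**n - 1) for operands in the range 0 .. 2**n - 1."""
    mask = (1 << n) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in n bits")
    s = a + b
    # End-around carry: 2**n ≡ 1 (mod 2**n - 1), so the carry-out re-enters
    # at bit 0. One fold is enough because a + b <= 2 * mask.
    s = (s & mask) + (s >> n)
    # The all-ones word is a second encoding of zero; normalize it.
    return 0 if s == mask else s


if __name__ == "__main__":
    print(add_mod_2n_minus_1(9, 11, 4))  # (9 + 11) mod 15
```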

Journal ArticleDOI
TL;DR: It is shown that by sizing transistors judiciously it is possible to gain significant speed improvements at the cost of only a slight increase in power and hence a better power-delay product.
Abstract: An approach to designing CMOS adders for both high speed and low power is presented by analyzing the performance of three types of adders - linear time adders, log N time adders and constant time adders. The representative adders used are a ripple carry adder, a blocked carry lookahead adder and several signed-digit adders, respectively. Some of the tradeoffs that are possible during the logic design of an adder to improve its power-delay product are identified. An effective way of improving the speed of a circuit is by transistor sizing which unfortunately increases power dissipation to a large extent. It is shown that by sizing transistors judiciously it is possible to gain significant speed improvements at the cost of only a slight increase in power and hence a better power-delay product. Perflex, an in-house performance driven layout generator, is used to systematically generate sized layouts.

Patent
30 Jun 1994
TL;DR: In this paper, a modular arithmetic unit consisting of an input register, a multiple computing section, an adder, and a correcting section is presented, where the adder adds the multiple of the modulus N retrieved from the multiple table and the contents of the input register.
Abstract: A modular arithmetic unit comprises an input register, a multiple computing section, an adder, and a correcting section. There is provided a multiple table in which multiples of a modulus N are stored to correspond with some low-order bits of an input number T in the input register. These low-order bits of the input number T are used to look up the corresponding multiple of the modulus N in the multiple table. The adder adds the multiple of the modulus N retrieved from the multiple table and the contents of the input register. This addition is performed n times. The contents of the input register are updated with predetermined high-order bits of the sum in the adder each time addition is performed in the adder. The correcting section makes a correction on the result t of addition by the adder after n additions have been performed.
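
The table-lookup, add, and shift loop described above resembles Montgomery-style reduction, although the abstract does not use that name. The sketch below is written under that reading and is an assumption, not the patented circuit. It takes k low-order bits per step, stores for each bit pattern the multiple of N that clears those bits, keeps the high-order part of the sum, and applies one conditional subtraction at the end. All names are illustrative; N must be odd, and `pow(N, -1, R)` needs Python 3.8 or later.

```python
def build_multiple_table(N: int, k: int) -> list[int]:
    """table[low] is the multiple q*N (0 <= q < 2**k) that makes the low
    k bits of (low + q*N) zero. Requires odd N."""
    R = 1 << k
    neg_n_inv = (-pow(N, -1, R)) % R
    return [((low * neg_n_inv) % R) * N for low in range(R)]


def table_reduce(T: int, N: int, k: int, n: int) -> int:
    """Return T * 2**(-k*n) mod N for 0 <= T < N * 2**(k*n)."""
    if not 0 <= T < N << (k * n):
        raise ValueError("T out of range for this N, k, n")
    table = build_multiple_table(N, k)
    low_mask = (1 << k) - 1
    for _ in range(n):
        # Adding the looked-up multiple zeroes the low k bits, so the
        # right shift is exact and keeps only the high-order bits.
        T = (T + table[T & low_mask]) >> k
    # Correction step: the loop leaves T below 2*N.
    return T - N if T >= N else T


if __name__ == "__main__":
    print(table_reduce(12345, 97, 4, 2))
```

The result carries an extra factor of 2^(-k·n) modulo N, which a surrounding algorithm would have to account for; whether the patent's correcting section does more than the final subtraction shown here is not stated in the abstract.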

Proceedings ArticleDOI
01 May 1994
TL;DR: Swing Restored Pass-transistor Logic (SRPL), a high speed, low power logic circuit technique for VLSI applications, is described, using a pass-transistor network to perform logic evaluation and a latch type swing restoring circuit to drive gate outputs.
Abstract: Swing Restored Pass-transistor Logic (SRPL), a high speed, low power logic circuit technique for VLSI applications, is described. By the use of a pass-transistor network to perform logic evaluation, and a latch type swing restoring circuit to drive gate outputs, this technique renders highly competitive circuit performance. An SRPL based multiply and accumulate circuit for multimedia applications is implemented in double metal 0.4 µm CMOS technology.

Journal ArticleDOI
01 Mar 1994
TL;DR: Simulation studies show that application of pipelining techniques can provide an effective throughput of one 32-bit addition every 1.6 ns using minimal hardware.
Abstract: Negative differential resistance characteristics of several new quantum electronic devices have been used to design high-speed logic gates with the latching property. These latching gates form the basis of the ultrafast pipelined adder circuit described in this paper. The latching or memory feature of these circuits, which was previously considered to be a nuisance in the design of combinational circuits, is exploited to overcome the pipeline overheads of area and time. Simulation studies show that application of pipelining techniques can provide an effective throughput of one 32-bit addition every 1.6 ns using minimal hardware.

Journal ArticleDOI
TL;DR: A symbolic substitution algorithm with the modified signed-digit number representation is used to perform fixed-point additions with limited carries and a new set of substitution rules and encodings is developed to combine the recognition and substitution steps into one correlation operation.
Abstract: A high-accuracy fixed-point optical adder that operates in parallel on many long words and that uses a pipelined correlator architecture is described. A symbolic substitution algorithm with the modified signed-digit number representation is used to perform fixed-point additions with limited carries. A new set of substitution rules and encodings is developed to combine the recognition and substitution steps into one correlation operation. This reduces hardware requirements, improves throughput by reducing the space–bandwidth product needed, and reduces latency (the delay between when data enter the processor and when the final output is available) by a factor of 2. This algorithm and our new modified signed-digit encodings and substitution rules improve the performance of other correlator and noncorrelator optical numeric computing architectures.

Journal ArticleDOI
TL;DR: A 16-b parallel adder utilizing wave pipelining is implemented with MOSIS 2-µm technology, and test results of fabricated devices show more than nine times speedup over nonpipelined operation.
Abstract: Wave pipelining (also known as maximal rate pipelining) is a timing methodology used in digital systems to increase the number of effective pipelined stages without increasing the number of physical registers in the system. Using this technique, new data are applied to the inputs of a combinational block before the previous outputs are available, thus effectively pipelining the combinational logic. Achieving a high degree of wave pipelining in CMOS technology requires careful study of delay balancing technique involving circuit design, layout method, and testing structure. A 16-b parallel adder utilizing wave pipelining is implemented with MOSIS 2-µm technology, and test results of fabricated devices show more than nine times speedup over nonpipelined operation.

Patent
12 Aug 1994
TL;DR: A blocking effect attenuation apparatus for a high definition television receiver, including a variable length decoder for receiving encoded video data and extracting therefrom a quantization level, a quantized DCT coefficient and a motion vector, was proposed in this article.
Abstract: A blocking effect attenuation apparatus for a high definition television receiver includes a variable length decoder for receiving encoded video data and extracting therefrom a quantization level, a quantized DCT coefficient and a motion vector, an inverse quantizer for generating a DCT coefficient by converting the quantized DCT coefficient from the variable length decoder into frequency domain data according to the quantization level from the variable length decoder, an inverse DCT unit for restoring the DCT coefficient from the inverse quantizer into spatial domain data, a frame memory for storing video data of a previous frame, an adder for adding data output from the frame memory to data output from the inverse DCT unit to output video data of a present frame, a motion estimator for transferring the motion vector from the variable length decoder to the frame memory, a block analysis circuit for generating a filtering flag in response to the quantization level and the quantized DCT coefficient from the variable length decoder, and a block filtering circuit for selectively filtering blocks of the video data of the present frame from the adder in response to the filtering flag from the block analysis circuit.

Patent
02 Aug 1994
TL;DR: In this article, an adder can be used for adding or subtracting one set of two integers wherein each integer is of some predetermined length, or a plurality of sets of two integers, provided the sum of the lengths of the integers is less than or equal to this predetermined length.
Abstract: An apparatus [10, 30, 100] that can also be used for generating the average of two integers. The apparatus [10, 30, 100] can be divided into a plurality of sub-adders [102] that operate on sub-words of the input integers in parallel. Hence, the adder can be used for adding or subtracting one set of two integers wherein each integer is of some predetermined length, or a plurality of sets of two integers, provided the sum of the lengths of the integers is less than or equal to this predetermined length. The apparatus [10, 30, 100] can also generate the sum, or difference, of each of the sub-words divided by two. The parallel operations can be carried out in response to a single instruction. The results of the division by two are rounded in a manner that eliminates biasing of the results.

Journal ArticleDOI
TL;DR: The design of a pipelined CMOS 16×16 redundant binary multiplication-and-accumulation (MAC) unit uses a novel coding scheme for representing binary signed digits that produces a factor of four reduction in the number of summands feeding the adder tree without preprocessing.
Abstract: This paper describes the design of a pipelined CMOS 16×16 redundant binary multiplication-and-accumulation (MAC) unit. The MAC unit uses a novel coding scheme for representing binary signed digits. The coding, integrated with the modified Booth algorithm, produces a factor of four reduction in the number of summands feeding the adder tree without preprocessing. The consequent chip layout is compact and small. Furthermore, the MAC's pipeline stages are balanced, resulting in a clock rate exceeding 200 MHz with 0.8-µm two-level metal CMOS technology.

Book ChapterDOI
26 Sep 1994
TL;DR: This paper shows how a non-restoring integer square root algorithm can be transformed to a very efficient hardware implementation, and proves that the algorithm correctly implements the square root function.
Abstract: Theorem proving techniques are particularly well suited for reasoning about arithmetic above the bit level and for relating different levels of abstraction. In this paper we show how a non-restoring integer square root algorithm can be transformed to a very efficient hardware implementation. The top level is a Standard ML function that operates on unbounded integers. The bottom level is a structural description of the hardware consisting of an adder/subtracter, simple combinational logic and some registers. Looking at the hardware, it is not at all obvious what function the circuit implements. At the top level, we prove that the algorithm correctly implements the square root function. We then show a series of optimizing transformations that refine the top level algorithm into the hardware implementation. Each transformation can be verified, and in places the transformations are motivated by knowledge about the operands that we can guarantee through verification. By decomposing the verification effort into these transformations, we can show that the hardware design implements a square root. We have implemented the algorithm in hardware both as an Altera programmable device and in full-custom CMOS.
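
For readers unfamiliar with the technique, a generic non-restoring integer square root can be sketched as below. This is a textbook-style formulation, not the Standard ML function or the hardware from the paper; the name `isqrt_nonrestoring` and the `bits` parameter are assumptions. The only arithmetic in the loop is one add or one subtract per iteration, which is why an adder/subtracter plus a few registers can implement it.

```python
def isqrt_nonrestoring(d, bits=32):
    """Floor square root of an unsigned `bits`-wide integer (bits must be even)."""
    q = 0  # root built so far, one bit per iteration
    r = 0  # partial remainder; it may go negative and is never restored
    for i in range(bits // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 3          # next two radicand bits
        if r >= 0:
            r = ((r << 2) | pair) - ((q << 2) | 1)
        else:
            # A negative remainder is corrected by adding on the next step
            # instead of being restored on this one.
            r = ((r << 2) | pair) + ((q << 2) | 3)
        q = (q << 1) | (1 if r >= 0 else 0)
    return q
```

For example, `isqrt_nonrestoring(10, 4)` walks through remainders 1 and 1 and returns 3, while `isqrt_nonrestoring(8, 4)` ends with a negative remainder and returns 2.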

PatentDOI
Ryoji Suzuki, Masayuki Misaki
TL;DR: An apparatus for transforming an input signal having a time length L into an output signal with time length αL in accordance with a given time-scale modification ratio α, including a correlator (17) for calculating a value of a correlation function between a first signal and a second signal having time length T and determining a time delay Tc at which the value of the correlation function becomes the greatest.
Abstract: An apparatus for transforming an input signal having a time length L into an output signal having a time length αL in accordance with a given time-scale modification ratio α, including a correlator (17) for calculating a value of a correlation function between a first signal and a second signal having a time length T and for determining a time delay Tc at which the value of the correlation function becomes the greatest; an adder (21) for adding the first signal multiplied by a first window function to the second signal multiplied by a second window function with a displacement of the time delay Tc; and an outputting circuit (22) for selectively outputting the output of the adder and a third signal succeeding the output of the adder so that the sum of a time length of the output of the adder and a time length of the third signal is substantially equal to a time length defined by the time-scale modification ratio α, the time delay Tc and the time length T.
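
The correlate-then-add step described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the patented apparatus: both signals are plain lists of equal length T, the correlation is normalised by the overlap length, the two window functions are taken to be complementary linear ramps, and the names `best_delay` and `overlap_add` are hypothetical.

```python
def best_delay(first, second, max_delay):
    """Delay Tc in [0, max_delay] with the largest mean lagged product."""
    def score(tc):
        overlap = len(first) - tc
        return sum(first[tc + i] * second[i] for i in range(overlap)) / overlap
    return max(range(max_delay + 1), key=score)


def overlap_add(first, second, tc):
    """Cross-fade `second` into `first`, displaced by tc samples."""
    overlap = len(first) - tc
    out = list(first[:tc])                  # untouched head of the first signal
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)         # rising ramp for the second signal
        out.append((1 - w) * first[tc + i] + w * second[i])
    out.extend(second[overlap:])            # untouched tail of the second signal
    return out
```

With two length-T inputs, the joined output has length Tc + T, which shows why the output stage in the abstract has to account for both Tc and T when meeting the target ratio α.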

Proceedings ArticleDOI
01 Jan 1994
TL;DR: The reduced area multiplier, the Wallace multiplier, and the Dadda multiplier each offer fast multiplication of signed binary numbers with the use of a large adder tree and a carry lookahead adder.
Abstract: The reduced area multiplier, the Wallace multiplier, and the Dadda (1965) multiplier each offer fast multiplication of signed binary numbers with the use of a large adder tree and a carry lookahead adder. However, their complexity makes them undesirable for some applications. A Booth (1951) multiplier, on the other hand, offers simplicity and flexibility, by both breaking a multiplication up into pieces, and by allowing the size of the pieces to be chosen. Unfortunately, Booth multipliers become difficult to design for higher radices. The use of a fast adder tree, such as that found in a reduced area multiplier, permits straightforward design of very high radix Booth multipliers. Increasing the radix of a Booth multiplier in this manner results in large increases in speed with reasonable hardware cost.
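
How the radix sets the "size of the pieces" can be shown with a small software model of Booth recoding. This is a generic sketch, not the design from the paper; the names `booth_digits` and `booth_multiply` and the default 16-bit width are assumptions. The multiplier `y` is expected to fit in `bits` bits of two's complement.

```python
def booth_digits(y, bits, k):
    """Recode signed `y` (two's complement, `bits` wide) into radix-2**k digits."""
    digits = []
    carry = 0
    mask = (1 << k) - 1
    for i in range(0, bits, k):
        group = (y >> i) & mask        # Python's >> sign-extends negative y
        top = group >> (k - 1)         # top bit of this k-bit group
        # A set top bit means "borrow 2**k here, carry 1 into the next group".
        digits.append(group - (top << k) + carry)
        carry = top
    return digits


def booth_multiply(x, y, bits=16, k=4):
    """Sum one shifted multiple of x per Booth digit of y."""
    return sum((d * x) << (k * i) for i, d in enumerate(booth_digits(y, bits, k)))
```

A 16-bit multiplier yields eight digits at radix 4 (`k=2`) but only four at radix 16 (`k=4`), so fewer partial products reach the adder tree. The cost is that each digit now ranges from -8 to 8, so odd multiples such as 3x, 5x and 7x must be generated, which is one reason higher radices are harder to design.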

Patent
07 Dec 1994
TL;DR: In this paper, a digital temperature compensated crystal oscillator (DTCXO) system is arranged to offer superior oscillating performance with reduced size, cost, and memory capacity.
Abstract: A digital control system such as a digital temperature compensated crystal oscillator (DTCXO) system is arranged to offer superior oscillating performance with reduced size and cost. For example, to reduce the memory capacity, a memory 31 receives upper 6 bits of temperature data, and a decoder 32 calculates temperature compensation data from lower 4 bits and output data from the memory (FIGS. 1-11). For a one-chip configuration and low power consumption, a MOS type Colpitts oscillator (FIG. 16) is provided with a circuit for adjusting the source resistance of the MOS. For size reduction and fine frequency adjustment, a DTCXO is provided with sections such as an adder 341, an up-down counter 342 and an auxiliary frequency control section (AFC) 332 (FIGS. 20, 21, 24 and 25). An adding section 415 is provided between a D/A converting section 414 and a capacitance varying section 416 to obtain superior linearity with respect to a control voltage and quality of offset.
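
The memory-saving split of the temperature data into upper and lower bits can be sketched as follows. The abstract does not state what the decoder computes, so linear interpolation is assumed here; the 10-bit temperature code follows from the 6 + 4 bit split, while the `(base, step)` table layout and the name `compensation` are hypothetical.

```python
def compensation(temp_code, table):
    """temp_code: 10-bit temperature reading; table: 64 (base, step) pairs."""
    upper = (temp_code >> 4) & 0x3F     # upper 6 bits address the memory
    lower = temp_code & 0x0F            # lower 4 bits go to the decoder
    base, step = table[upper]           # hypothetical memory contents
    return base + (step * lower) // 16  # assumed: linear interpolation
```

Under these assumptions the memory holds 64 entries instead of one entry for each of the 1024 possible temperature codes, at the price of a small amount of arithmetic in the decoder.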