
Showing papers on "Adder published in 1990"


Journal ArticleDOI
TL;DR: A report is presented on a multiplication scheme (left-to-right, carry-free, LRCF) that performs the multiplication most-significant bit first and produces a conventional sign-and-magnitude product by means of an on-the-fly conversion.
Abstract: Conventional schemes for fast multiplication accumulate the partial products in redundant form (carry-save or signed-digit) and convert the result to conventional representation in the last step. This step requires a carry-propagate adder which is comparatively slow and occupies a significant area of the chip in a VLSI implementation. A report is presented on a multiplication scheme (left-to-right, carry-free, LRCF) that does not require this carry-propagate step. The LRCF scheme performs the multiplication most-significant bit first and produces a conventional sign-and-magnitude product (most significant n bits) by means of an on-the-fly conversion. The resulting implementation is fast and regular and is very well suited for VLSI. The LRCF scheme for general radix r and a radix-4 signed-digit implementation are presented.

98 citations


Journal ArticleDOI
M. Nagamatsu1, S. Tanaka1, J. Mori1, K. Hirano1, T. Noguchi1, K. Hatanaka1 
TL;DR: In this paper, a high-speed 32 × 32-b parallel multiplier with an improved parallel structure using 0.8-μm CMOS triple-level-metal technology is discussed.
Abstract: A high-speed 32 × 32-b parallel multiplier with an improved parallel structure using 0.8-μm CMOS triple-level-metal technology is discussed. A unit adder, a 4-2 compressor, enhances the parallelism of the multiplier array. A 25% reduction in the propagation delay time is achieved by using the compressor. The multiplier contains 27704 transistors with a 2.68 × 2.71 mm² die area. The multiplication time is 15 ns at 5 V with a power dissipation of 277 mW at 10-MHz operation. The triple-level-metal interconnection technology reduces the multiplier layout area. Compared with double-level-metal technology, a 27% chip size reduction is achieved.

98 citations
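
The abstract above names a 4-2 compressor as the unit adder but does not give its circuit. As a rough illustration only, the sketch below uses the common textbook construction from two chained full adders; this construction and the function names are assumptions, not the authors' design. It checks the defining property that the five input bits sum to `sum + 2 * (carry + cout)`.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4-2 compressor built from two full adders (assumed form).

    Returns (sum, carry, cout), where carry and cout both have weight 2.
    cout depends only on x1..x3, so it does not ripple through a row
    of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_all_inputs():
    """Exhaustively verify x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Because `cout` is independent of `cin`, a row of such cells reduces four partial-product rows to two without a horizontal carry chain.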


Journal ArticleDOI
TL;DR: It is concluded that rational approximations can successfully compete with previously used methods when execution time and silicon area are considered.
Abstract: A different approach to hardware evaluation of elementary functions for high-precision floating-point numbers (in particular, the extended double precision format of the IEEE standard P754) is examined. The evaluation is based on rational approximations of the elementary functions, a method which is commonly used in scientific software packages. A hardware model is presented of a floating-point numeric coprocessor consisting of a fast adder and a fast multiplier, and the minimum hardware required for evaluation of the elementary functions is added to it. Next, rational approximations for evaluating the elementary functions and testing the accuracy of the results are derived. The calculation time of these approximations in the proposed numeric processor is then estimated. It is concluded that rational approximations can successfully compete with previously used methods when execution time and silicon area are considered.

84 citations


Journal ArticleDOI
TL;DR: A systematic method of implementing a VLSI parallel adder based on a modular design, formulated as a dynamic programming problem, optimizing with respect to area and time, results in an area-time optimal adder in the design family.
Abstract: A systematic method of implementing a VLSI parallel adder is presented. A family of adders based on a modular design is defined. The design uses three types of component cells, which are implemented in static CMOS. The adder design is formulated as a dynamic programming problem, optimizing with respect to area and time. The result is an area-time optimal adder in the design family. The approach is illustrated by implementing a 66-bit adder for use in a floating-point processor. It is shown how to use the method for implementations in technologies and design styles other than static CMOS.

83 citations


Proceedings ArticleDOI
John P. Fishburn1
24 Jun 1990
TL;DR: A heuristic for speeding up combinational logic by decreasing the logic depth, at the expense of a minimal increase in circuit size is described, capable of reproducing or even beating several classic global optimizations.
Abstract: This paper describes a heuristic for speeding up combinational logic by decreasing the logic depth, at the expense of a minimal increase in circuit size. The heuristic iteratively speeds up sections of the critical path by the use of Shannon factorization on the late input. This procedure is empirically found to be capable of reproducing or even beating several classic global optimizations: a chain of an associative operator is transformed into a tree, a ripple prefix circuit into a parallel prefix circuit, and a ripple-carry adder into a slightly smaller and faster circuit than the carry-lookahead adder.

70 citations


Journal ArticleDOI
TL;DR: Two different CMOS implementations of the Manchester carry-skip adder are analyzed using the RC timing model, which provides a unified way of analyzing both CMOS circuits and interconnect, and efficient polynomial algorithms are developed to determine near-optimal as well as optimal block sizes.
Abstract: Two different CMOS implementations of the Manchester carry-skip adder are analyzed using the RC timing model, which provides a unified way of analyzing both CMOS circuits and interconnect. Based on the RC timing model, the authors develop efficient polynomial algorithms to determine near-optimal (in latency) as well as optimal block sizes for the one-level Manchester adder with variable carry-skip. An analysis shows that the carry-skip delay in a Manchester adder block is linearly proportional to the block size. The approach provides a general paradigm for analysis and design, applicable to different models of ripple-propagation and carry skip.

64 citations


Patent
Donald Lee Freerksen1
13 Mar 1990
TL;DR: Combinatorial bias-adjust logic removes the bias from one operand exponent before the two operand exponents are added together in an adder for a multiply operation, and inserts a bias into one exponent before the exponents are subtracted by the adder for a divide operation.
Abstract: A floating-point arithmetic unit includes an exponent unit for biased exponents. Combinatorial bias-adjust logic (324) removes the bias from one operand exponent before the two operand exponents are added together in adder (322) for a multiply operation, and inserts a bias into one exponent before the exponents are subtracted by the adder for a divide operation.

61 citations
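
The bias handling described in the exponent-unit abstract above can be illustrated with plain integer arithmetic. This is a minimal sketch, assuming an IEEE 754 single-precision bias of 127 (the patent abstract does not state the format) and ignoring mantissa normalization, overflow and underflow. The function names are hypothetical.

```python
BIAS = 127  # assumption: IEEE 754 single-precision exponent bias


def multiply_exponent(ea_biased, eb_biased, bias=BIAS):
    """Exponent of a product: remove the bias from one operand, then add.

    Adding two biased exponents directly would count the bias twice, so
    one operand is unbiased first and the result carries exactly one bias.
    """
    eb_unbiased = eb_biased - bias
    return ea_biased + eb_unbiased


def divide_exponent(ea_biased, eb_biased, bias=BIAS):
    """Exponent of a quotient: insert a bias into one operand, then subtract.

    Subtracting two biased exponents cancels the bias entirely, so one
    extra bias is inserted to keep the result biased.
    """
    ea_adjusted = ea_biased + bias
    return ea_adjusted - eb_biased
```

For example, with 2^3 (biased 130) and 2^4 (biased 131), the product exponent is 134 (that is, 2^7) and the quotient exponent is 126 (that is, 2^-1).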


Patent
28 Dec 1990
TL;DR: In this paper, a data processing system having a memory packaged therein for realizing large-scale and high-speed parallel distributed processing, and especially a data processing system for neural network processing, is disclosed.
Abstract: Herein disclosed is a data processing system having a memory packaged therein for realizing a large-scale and high-speed parallel distributed processing and, especially, a data processing system for the neural network processing. The neural network processing system according to the present invention comprises: a memory circuit for storing neuron output values, connection weights, the desired values of outputs, and data necessary for learning; an input/output circuit for writing or reading data in or out of said memory circuit; a processing circuit for performing a processing for determining the neuron outputs such as the product, sum and nonlinear conversion of the data stored in said memory circuit, a comparison of the output value and its desired value, and a processing necessary for learning; and a control circuit for controlling the operations of said memory circuit, said input/output circuit and said processing circuit. The processing circuit is constructed to include at least one of an adder, a multiplier, a nonlinear transfer function circuit and a comparator so that at least a portion of the processing necessary for determining the neuron output values such as the product or sum may be accomplished in parallel. Moreover, these circuits are shared among a plurality of neurons and are operated in a time sharing manner to determine the plural neuron output values. Still moreover, the aforementioned comparator compares the neuron output value determined and the desired value of the output in parallel.

59 citations


Patent
Yoshinori Tanaka1, Tomohiko Taniguchi1, Fumio Amano1, Yasuji Ohta1, Shigeyuki Unagami1 
29 Jun 1990
TL;DR: In this paper, a gain-shape vector quantization apparatus for compressing the data of a voice signal is presented, where a code book portion is constituted by a plurality of shape vectors which produce a plurality of selected shape vectors.
Abstract: A gain-shape vector quantization apparatus for compressing the data of a voice signal. A code book portion is constituted by a plurality of shape vectors which produce a plurality of selected shape vectors. A plurality of variable gain circuits impart gains to each shape vector produced from the code book portion. A plurality of synthesis filters regenerate signals from the outputs of the variable gain circuits. An adder adds the signals regenerated by the synthesis filters. An evaluation unit produces an index to select a plurality of shape vectors in the code book portion in order to minimize an error between the output of the adder and an input speech signal and further produces a gain adjusting signal for the variable gain circuits.

57 citations


PatentDOI
Ira A. Gerson1, Mark A. Jasiuk1
TL;DR: In this paper, a digital speech coder includes a long-term filter (124) having an improved sub-sample resolution long-term predictor (FIG. 5) which allows for subsample resolution for the lag parameter L. The output vector b(n) is fed back to a delayed vector generator block (530) of the long-term predictor.
Abstract: A digital speech coder includes a long-term filter (124) having an improved sub-sample resolution long-term predictor (FIG. 5) which allows for subsample resolution for the lag parameter L. A frame of N samples of input speech vector s(n) is applied to an adder (510). The output of the adder (510) produces the output vector b(n) for the long term filter (124). The output vector b(n) is fed back to a delayed vector generator block (530) of the long-term predictor. The nominal long-term predictor lag parameter L is also input to the delayed vector generator block (530). The long-term predictor lag parameter L can take on non-integer values, which may be multiples of one half, one third, one fourth or any other rational fraction. The delayed vector generator (530) includes a memory which holds past samples of b(n). In addition, interpolated samples of b(n) are also calculated by the delayed vector generator (530) and stored in its memory, at least one interpolated sample being calculated and stored between each past sample of b(n). The delayed vector generator (530) provides output vector q(n) to the long-term multiplier block (520), which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to the adder (510) to complete the feedback loop of the recursive filter (124).

50 citations
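
The feedback loop in the speech-coder abstract above amounts to b(n) = s(n) + β·q(n), where q(n) is b delayed by a possibly fractional lag L. The sketch below is a simplified illustration under stated assumptions: it uses linear interpolation between neighbouring past samples and computes interpolated values on demand, whereas the abstract describes storing interpolated samples in memory and does not name the interpolation method. It also assumes a zero initial filter state and L of at least one sample. The function name is hypothetical.

```python
import math


def long_term_filter(s, lag, beta):
    """Recursive long-term filter b(n) = s(n) + beta * b(n - lag).

    `lag` may be non-integer; b(n - lag) is then linearly interpolated
    (an assumption made for this sketch).
    """
    if lag < 1:
        raise ValueError("lag must be at least one sample")
    b = []

    def past(i):
        # Samples before the start of the frame are taken as zero.
        return b[i] if i >= 0 else 0.0

    for n, sample in enumerate(s):
        pos = n - lag               # fractional position of the delayed sample
        lo = math.floor(pos)
        frac = pos - lo
        if frac == 0:
            q = past(lo)            # integer lag: no interpolation needed
        else:
            q = (1.0 - frac) * past(lo) + frac * past(lo + 1)
        b.append(sample + beta * q)  # adder closes the feedback loop
    return b
```

With an impulse input, `lag=2.5` and `beta=0.5`, the echo of the first sample is split evenly between outputs 2 and 3.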


Patent
19 Jan 1990
TL;DR: In this article, a modular memory address block determination circuit is presented, in which the starting address of the first block and the enable signal of the first block are added to produce the starting address of the second block, and this procedure is repeated for each block.
Abstract: An adder (204,206,208) and a comparator (242) form portions of a modular memory address block determination circuit. The starting address of the first block and the enable signal of the first block are added to produce the starting address of the second block. This procedure is repeated for each block. The determined starting address for each block is compared with the requested memory address and, unless the block is inhibited or disabled, if equal a signal indicates a match. The circuit is used on a circuit board which emulates three conventionally separate memory circuit boards. The registers for each emulated circuit board are provided and appropriate bus signals are developed.

Journal ArticleDOI
TL;DR: A method of adding oversampling coded data streams is presented to add the spectrums with the minimum of degradation in the band of interest.
Abstract: A method of adding oversampling coded data streams is presented. The available oversampling is used to add the spectrums with the minimum of degradation in the band of interest.

Patent
10 Aug 1990
TL;DR: In this article, a method for approximating mathematical functions using polynomial expansions is implemented in a numeric processing system (10) which comprises a control and timing circuit (18), a microprogram store (20) and a multiplier circuit (34).
Abstract: A method for approximating mathematical functions using polynomial expansions is implemented in a numeric processing system (10) which comprises a control and timing circuit (18), a microprogram store (20) and a multiplier circuit (34). The multiplier circuit (34) may comprise a rectangular aspect ratio multiplier circuit (40) having an additional ADDER INPUT to enable the repeated evaluation of first order polynomials to evaluate polynomial expansions associated with each mathematical function. A constant store (28) is used to store predetermined coefficients for the polynomial expansion associated with each mathematical functions. The microprogram store (20) is used to store argument transformation routines, polynomial expansions and result transformation routines associated with each mathematical function.
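
Repeated evaluation of first-order polynomials to evaluate a polynomial expansion, as in the abstract above, corresponds to Horner's rule: each step is one multiply followed by one add. A minimal sketch follows; the function names are illustrative, and the abstract's argument and result transformation routines are not modelled.

```python
def multiply_add(a, b, c):
    """One first-order polynomial step: a * b + c.

    Stands in for a multiplier with an additional adder input.
    """
    return a * b + c


def evaluate_polynomial(coefficients, x):
    """Evaluate a polynomial by Horner's rule.

    `coefficients` are ordered from the highest-degree term down to the
    constant term, so [2, 3, 4] means 2*x**2 + 3*x + 4.
    """
    result = 0
    for c in coefficients:
        result = multiply_add(result, x, c)
    return result
```

A degree-n polynomial takes n + 1 multiply-add steps in this form (the first step only loads the leading coefficient), which is why a single multiplier with an adder input suffices.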

Journal ArticleDOI
TL;DR: VLSI-oriented multiple-valued current-mode MOS arithmetic circuits using radix-2 signed-digit number representations are proposed and it is shown that the technology is also potentially effective for the reduction of the data-bus area in VLSI.
Abstract: VLSI-oriented multiple-valued current-mode MOS arithmetic circuits using radix-2 signed-digit number representations are proposed. A prototype adder chip is implemented with 10-μm CMOS technology to confirm the principle of operation. A multiplication scheme using four-input current-mode wired summations for realizing a high-speed small-size multiplier is presented. The 32 × 32-b multiplier is composed of 18800 transistors and requires fewer interconnections. The multiply time is estimated to be 45 ns by SPICE simulation in 2-μm CMOS technology. It is shown that the technology is also potentially effective for the reduction of the data-bus area in VLSI.

Patent
10 Dec 1990
TL;DR: In this paper, a video signal processor includes two eight-bit adders, each of which has a carry-in input terminal and a carryout output terminal, selectively coupled via an AND gate.
Abstract: A video signal processor includes circuitry which may be conditioned by a mode control signal to operate as a single 16-bit adder or as two eight-bit adders. The circuitry includes two eight-bit adders, each of which has a carry-in input terminal and a carry-out output terminal. The carry-out output terminal of one of the adders is selectively coupled, via an AND gate, to the carry-in input terminal of the other adder. The AND gate is controlled by the mode control signal. In the mode where the circuitry operates as two eight-bit adders, additional circuitry is included to detect output values which may exceed the zero to 255 range of valid values and to saturate these invalid values either at zero or 255.
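
The mode-controlled carry coupling described above can be modelled bit-accurately in a few lines. This is a sketch under assumptions: the names are hypothetical, the operands are packed as two bytes in a 16-bit word, and the saturation helper simply clamps an integer to 0..255 because the abstract does not say how out-of-range values are detected.

```python
def add8(a, b, carry_in):
    """Eight-bit adder: returns (8-bit sum, carry-out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def split_adder(a, b, mode16):
    """Add two 16-bit words as one 16-bit value or as two 8-bit lanes.

    mode16 is 1 for a single 16-bit adder, 0 for two independent
    eight-bit adders. The carry between the halves passes through an
    AND gate controlled by the mode signal.
    """
    low, carry_low = add8(a & 0xFF, b & 0xFF, 0)
    carry_into_high = carry_low & mode16  # the AND gate
    high, _ = add8((a >> 8) & 0xFF, (b >> 8) & 0xFF, carry_into_high)
    return (high << 8) | low


def saturate_u8(value):
    """Clamp a lane result to the valid 0..255 range (assumed behaviour)."""
    return max(0, min(255, value))
```

In 16-bit mode `split_adder(0x00FF, 0x0001, 1)` carries into the upper byte and gives `0x0100`; in dual mode the same inputs give `0x0000` because the carry is blocked.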

Patent
15 Oct 1990
TL;DR: A synchronous vector processor (SVP) device has a plurality of one-bit processor elements organized in a linear array, which are all controlled in common by a sequencer, a state machine or a control circuit (controller) to enable operation as a parallel processing device.
Abstract: A synchronous vector processor SVP device (102) having a plurality of one-bit processor elements (150) organized in a linear array. The processor elements are all controlled in common by a sequencer, a state machine or a control circuit (controller) (128) to enable operation as a parallel processing device. Each processor element (150) includes a set of input registers (154), two sets of register files (158,166), a set of working registers (162), an arithmetic logic unit (164) including a one-bit full adder/subtractor, and a set of output registers (168). In video applications each processor element (150) operates on one pixel of a horizontal scan line and is capable of real-time digital processing of video signals. The SVP (102) includes interconnecting circuitry (160,308,310,312,322,324) enabling the individual processor elements to retrieve data from and transmit data to their first and second nearest neighbors on either side. At the chip level external connections are provided to enable cascading of several SVP devices.

Journal ArticleDOI
TL;DR: A completely pipelined array for modular multiplication designed by cascading n carry-save adders performs modular multiplication at the clock rate.
Abstract: The letter describes a new algorithm for modular multiplication using carry-save adders. The proposed algorithm is based on the sign-estimation technique. A carry-save adder structure consisting of three rows of n + 3 simple 1-bit adder cells, and two copies of 3-bit carry look-ahead logic can be used to implement a single step of the algorithm. A completely pipelined array for modular multiplication designed by cascading n carry-save adders performs modular multiplication at the clock rate.

Proceedings ArticleDOI
24 Jun 1990
TL;DR: This paper gives a constructive resolution of the question as to whether performance optimizations require redundancies, in the form of an algorithm that returns an irredundant circuit that is as fast; it demonstrates the utility of this algorithm on a well-known circuit, the carry-skip adder, and presents a novel irredundant design of that adder.
Abstract: Logic optimization procedures principally attempt to optimize three criteria: performance, area, and testability. The relationship between area optimization and testability has recently been explored. As to the relationship between performance and testability, experience has shown that performance optimizations can, and do in practice, introduce single stuck-at-fault redundancies into designs. Are these redundancies necessary to increase performance or are they only an unnecessary by-product of performance optimization? The authors give a constructive resolution of this question in the form of an algorithm that takes as input a combinational circuit and returns an irredundant circuit that is as fast. They demonstrate the utility of this algorithm on a well-known circuit, the carry-skip adder, and present a novel irredundant design of that adder. As this algorithm may either increase or decrease circuit area, the authors leave unresolved the question as to whether every circuit has an irredundant circuit that is at least as fast and is of equal or lesser area.

Proceedings ArticleDOI
17 Sep 1990
TL;DR: A reduced-area carry-select adder scheme in which the second copy of the carry-chain is substituted by an OR gate per bit position is proposed, which reduces the relative area advantage of a carry-skip adder to roughly 35%.
Abstract: For a medium-speed addition application, a carry-skip adder is usually preferred over a carry-select adder, due to its smaller area. A reduced-area carry-select adder scheme in which the second copy of the carry-chain is substituted by an OR gate per bit position is proposed. This alternative implementation for a carry-select adder reduces the relative area advantage of a carry-skip adder to roughly 35%. Replacing the ripple-carry blocks with parallel-prefix blocks results in a select-prefix adder with a slightly better area and time than a parallel-prefix adder.

Proceedings ArticleDOI
13 May 1990
TL;DR: The Labyrinth architecture, a RAM-based reconfigurable logic array, provides the flexibility and malleability of software with the performance of a dedicated circuit.
Abstract: As a RAM-based reconfigurable logic array, Labyrinth provides the flexibility and malleability of software with the performance of a dedicated circuit. With a single bit register and a half adder per cell, the architecture is optimized for register intensive, massively parallel algorithms. The fine-grained, highly-symmetric architecture scales very naturally and facilitates compact circuit layouts. A 64-cell test chip has been successfully built and tested, and a 4096-cell chip is in the final stages of preparation for fabrication.

Patent
Haruyuki Suzuki1
22 May 1990
TL;DR: In this paper, a servo control system for an optical disk device is presented, which comprises a unit for scanning a disk by an optical spot converged and irradiated on the disk, a unit for feedback control of the position of the optical spot in a focusing direction, and a unit for feedback control of its position in a tracking direction.
Abstract: A servo control system for an optical disk device which comprises a unit for scanning an optical disk by an optical spot converged and irradiated on the disk; a unit for feed back controlling position of the optical spot in a focussing direction; and a unit for feed back controlling position of the optical spot in a tracking direction. The system further comprises: a first and second optical detectors, each being divided to at least two detection areas; a multiplexer to which the detection areas are connected for time sharing the detection signals; an adder for adding up the detection signals output from one of the detectors; an A/D converter which converts the time shared signal output from the multiplexer to a digital signal using the added up signal as a reference thereof; and a logic circuit which is connected to the A/D converter and calculates from the digital signal output from the A/D converter to obtain digital error signals of focus and track.

Journal ArticleDOI
TL;DR: In this article, an analysis of the dynamic behavior of a second-order digital filter operating outside the region of absolute stability in the case of saturation nonlinearity of the accumulator is presented.
Abstract: An analysis of the dynamic behavior of a second-order digital filter operating outside the region of absolute stability in the case of saturation nonlinearity of the accumulator is presented. Bifurcation phenomena associated with changes of one of the parameters of the filter's linear part are investigated. On the basis of a numerical study, it is conjectured that system trajectories become chaotic for some parameter ranges. Several interesting bifurcation patterns have been revealed, including devil's staircase and self-similar, fan-type structures.

Patent
11 May 1990
TL;DR: In this paper, a correction circuit processes digitized signals from an image sensor and generates gain correction values to compensate for variations in the output of the sensor, and the difference signals are serially accumulated by means of a pair of registers and an adder.
Abstract: A correction circuit processes digitized signals from an image sensor and generates gain correction values to compensate for variations in the output of the sensor. While imaging a gain calibration object, the sensor is operated in a calibration mode in which a plurality of calibration values are generated that pertain to each photosite. The digitized calibration values are transformed into log space for processing by a gain level averaging circuit. The log calibration signals are first subtracted from a reference corresponding to a maximum expected signal value. The difference signals are serially accumulated by means of a pair of registers and an adder, and the sum is stored in a gain memory. In a subsequent normal operating mode, the summed signals for each photosite are retrieved from the gain memory and bit-shifted to form an average correction value for each photosite. The correction values are applied to an adder in synchronism with sensor signals from like photosites and added therewith in log space to provide gain compensation.
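
The accumulate, shift and add steps of this gain correction can be sketched with integers standing in for digitized log-space values. The sketch assumes the number of calibration samples per photosite is a power of two, so that a right shift yields the average; the abstract mentions bit-shifting but not the sample count, and the function names are hypothetical.

```python
def accumulate_differences(log_calibration_samples, log_reference):
    """Serially accumulate (reference - sample) for one photosite.

    Mirrors the register-and-adder accumulation; the returned sum is
    what would be stored in the gain memory.
    """
    total = 0
    for sample in log_calibration_samples:
        total += log_reference - sample
    return total


def average_by_shift(total, shift):
    """Average 2**shift accumulated differences with a right shift."""
    return total >> shift


def apply_correction(log_signal, correction):
    """Gain compensation: in log space, a gain multiply becomes an add."""
    return log_signal + correction
```

With four calibration samples `[90, 94, 92, 96]` and a reference of `100`, the accumulated sum is `28`, the shifted average is `7`, and a log-space signal of `50` is corrected to `57`.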

Patent
10 Dec 1990
TL;DR: In this article, a convolver is presented that includes a matrix of multipliers for providing a plurality of products PiCj of input data Pi and a dedicated coefficient Cj, each connected to a data input by a buffer which stores and delays input data Pi as a function of its position in the matrix and the length of the row M of the input data array.
Abstract: A convolver including a matrix of multipliers for providing a plurality of products PiCj of input data Pi and a dedicated coefficient Cj and each connected to a data input by a buffer which stores and delays input data Pi as a function of its position in the matrix and the length of the row M of the input data array. The buffers include row buffers which are programmable for various input data array row lengths M as well as programmable input buffer stages. A second data input is selectively connected to the multipliers or to an output adder for connection to external delay units or cascade connection to other convolvers. Unique control logic is provided to change the structure of the convolver as well as reset and program various elements using common inputs.

Patent
30 May 1990
TL;DR: In this paper, the authors propose a virtual zero architecture for a single instruction stream, multiple data stream (SIMD) processor which includes an input bus, an input unit, manipulation units, an output unit and an output bus.
Abstract: A virtual-zero architecture is intended for use in a single instruction stream, multiple data stream (SIMD) processor which includes an input bus, an input unit, manipulation units, an output unit and an output bus. The virtual-zero architecture includes a memory unit (40) for storing data, an arithmetic unit (42) for mathematically operating on the data, a memory address generation unit (32) and an adder for computing a next memory address. The memory address generation unit (32) includes an address register (34) in the memory unit for identifying the address of a particular data block, a counter (38) for counting the number of memory addresses in a particular data block, and a rotation register (36) for providing a data-void address in the memory unit if and only if all of the entries in the data block are zero. The memory (40) and the address (32) units provide zero-value data blocks to the arithmetic unit (44) to simulate the data block having the data-void address during processing. The architecture may also be used to selectively handle input to a system.

Patent
23 Jul 1990
TL;DR: In this paper, the authors simulate the delay synthesis effect by musical instruments and performance sound fields by providing acoustic body synthesizing sections constituted of digital filters including registers to delay voice data and multipliers, etc.
Abstract: PURPOSE: To simulate the delay synthesis effect by musical instruments and performance sound fields by providing acoustic body synthesizing sections constituted of digital filters including registers to delay voice data and multipliers, etc. CONSTITUTION: The output of a digital adder 4 is put through the musical instrument acoustic body synthesizing section 7 and the sound field acoustic body synthesizing section 8 into a D/A converter 5. The inputs to the synthesizing sections 7, 8 are delayed in one clock unit by the delay registers 21 and are multiplied by the coefficients of coefficient memories 23, the contents of which are set in a control section 1, by means of the multipliers 22. After these inputs are added by the digital adder 4, the inputs are added to musical tone data in an adder 25 and are outputted to a converter 5. The simulation of the effect of both the acoustic bodies is possible in this way and the electronic musical instrument which can produce the outputs with the acoustic effect approximate to the acoustic effect of the environment of the performance sound field is obtained.

Proceedings Article
01 Sep 1990
TL;DR: A CORDIC processor for vector rotations using a carry-save architecture has been developed and realized and it is found that this architecture is well suited for real-time applications.
Abstract: A CORDIC processor for vector rotations using a carry-save architecture has been developed and realized. The CORDIC algorithm is based on an iteration, directed by the sign of intermediate results. To achieve a high clock frequency of 60 MHz the CORDIC iteration was built up with pipelined carry-save adder stages. Due to the redundant number representation of the carry-save architecture an exact sign detection is not possible, so that the algorithm had been modified. Due to the high throughput rate and its regularity this architecture is well suited for real-time applications.
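
The sign-directed iteration mentioned in the abstract above is the core of the conventional CORDIC rotation, sketched below in floating point for clarity. It does not model the carry-save adder stages or the modified sign detection the authors describe; the iteration count, the function name and the explicit gain compensation are assumptions of this sketch. The input angle is assumed to lie within the basic convergence range of roughly ±1.74 rad.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate the vector (x, y) by `angle` radians using CORDIC steps.

    Each step rotates by +/- atan(2**-i); the direction is chosen from
    the sign of the remaining angle z. In hardware the multiplications
    by 2**-i are shifts, so each step needs only adders.
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # direction from the sign of z
        factor = 2.0 ** -i
        x, y = x - d * y * factor, y + d * x * factor
        z -= d * math.atan(factor)
        gain *= math.sqrt(1.0 + factor * factor)
    # Each micro-rotation stretches the vector; divide the gain back out.
    return x / gain, y / gain
```

Rotating `(1.0, 0.0)` by `math.pi / 4` yields a point close to `(cos(π/4), sin(π/4))`.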

Patent
Ho-sun Chung1, Seung-yeob Paek1
10 Jul 1990
TL;DR: In this article, a floating point adder circuit using neural network concepts and having high speed operation is obtained by a controlling circuit using a comparator and an operating circuit using an adder and a subtractor.
Abstract: A floating point adder circuit using neural network concepts and having high speed operation is obtained by a controlling circuit using a comparator and an operating circuit using an adder and a subtractor.

Patent
Feng-Hsien W. Shih1
02 Jan 1990
TL;DR: In this paper, a carry select adder is used in which carry inputs to ripple adder stages are not fixed, and the adder stage determines which of the two ripple adders of that stage has output the correct sum while the variable carry input is equal to a given value.
Abstract: A carry select adder may be used in which carry inputs to ripple adder stages are not fixed. The adder stage determines which of the two ripple adders of that stage has output the correct sum while the variable carry input is equal to a given value. Then the variable carry input value is switched to a different value and the adder stage determines the correct output sum from the other ripple adder. The adder performs self-checking by comparing these two sums to ensure that the output sum is accurate.

Patent
29 Jan 1990
TL;DR: In this article, a pipelined floating-point adder/subtractor system was proposed to produce a sticky bit signal when the number of consecutive zeros is less than the number of positions the one fraction is shifted in the aligning step, indicating the truncation of at least one set bit.
Abstract: A system for subtracting two floating-point binary numbers in a pipelined floating-point adder/subtractor by aligning the two fractions for subtraction; arbitrarily designating the fraction of one of the two floating-point numbers as the subtrahend, and producing the complement of that designated fraction; adding that complement to the other fraction, normalizing the result; determining whether the result is negative and, if it is, producing the complement of the normalized result; and selecting the larger of the exponents of the two floating-point numbers, and adjusting the value of the selected exponent in accordance with the normalization of the result. The preferred system produces a sticky bit signal by aligning the two fractions for subtraction by shifting one of the two fractions to the right; determining the number of consecutive zeros in the one fraction, prior to the shifting thereof, beginning at the least significant bit position; comparing the number of positions the one fraction is shifted in the aligning step, with the number of consecutive zeros in the one fraction; and producing a sticky bit signal when the number of consecutive zeros is less than the number of positions the one fraction is shifted in the aligning step, the sticky bit signal indicating the truncation of at least one set bit during the aligning step.
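
The sticky-bit rule in the abstract above compares the count of trailing zeros in the fraction with the alignment shift distance. The sketch below, with hypothetical function names, implements that comparison and cross-checks it against the direct definition (OR of the bits shifted out). Treating an all-zero fraction as producing no sticky bit is an assumption, since the abstract does not cover that case.

```python
def trailing_zeros(fraction):
    """Count consecutive zeros from the least significant bit (fraction > 0)."""
    return (fraction & -fraction).bit_length() - 1


def sticky_bit(fraction, shift):
    """Sticky bit from the trailing-zero count, before any shifting.

    Set when fewer trailing zeros exist than positions shifted, meaning
    at least one set bit is truncated by the right shift.
    """
    if fraction == 0:
        return 0  # assumption: nothing set, so nothing can be lost
    return 1 if trailing_zeros(fraction) < shift else 0


def sticky_bit_reference(fraction, shift):
    """Direct definition: OR of all bits shifted out during alignment."""
    lost = fraction & ((1 << shift) - 1)
    return 1 if lost != 0 else 0
```

The point of the comparison form is that the zero count can be determined before and in parallel with the alignment shift, rather than by inspecting the shifted-out bits afterwards.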