A New Family of High-Performance Parallel Decimal Multipliers
Citations
Improved Design of High-Performance Parallel Decimal Multipliers
Improving the Speed of Parallel Decimal Multiplication
A parallel IEEE P754 decimal floating-point multiplier
High-Speed Parallel Decimal Multiplication with Redundant Internal Encodings
Decimal Floating-Point Multiplication
References
IEEE Standard for Floating-Point Arithmetic
Decimal floating-point: algorism for computers
A 4.4 ns CMOS 54×54-b multiplier using pass-transistor multiplexer
Decimal multiplication via carry-save addition
Decimal multiplication with efficient partial product generation
Frequently Asked Questions (13)
Q2. What does extending the multiplier to decimal floating–point involve?
Extension to decimal floating–point multiplication involves exponent addition, rounding of X · Y to fit the required precision, and sign calculation.
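The three steps named above can be sketched in software as follows. This is only an illustration, not the authors' hardware: the `(sign, coefficient, exponent)` triple, the function name `dfp_multiply`, the default precision of 16 digits, and the round-half-even mode are all assumptions made for the example.

```python
def dfp_multiply(x, y, precision=16):
    """Multiply two decimal floating-point values (hypothetical helper).

    Each operand is a (sign, coefficient, exponent) triple whose value is
    (-1)**sign * coefficient * 10**exponent. The representation and the
    round-half-even rounding mode are assumptions of this sketch.
    """
    sx, cx, ex = x
    sy, cy, ey = y

    sign = sx ^ sy          # sign calculation
    coeff = cx * cy         # integer product X * Y of the coefficients
    exp = ex + ey           # exponent addition

    # Round the product so that it fits in `precision` decimal digits.
    excess = len(str(coeff)) - precision
    if excess > 0:
        q, r = divmod(coeff, 10 ** excess)
        half = 5 * 10 ** (excess - 1)
        if r > half or (r == half and q % 2 == 1):
            q += 1          # round half to even
        coeff, exp = q, exp + excess
        if len(str(coeff)) > precision:
            # Rounding carried into a new digit (e.g. 9999.9 -> 10000).
            coeff //= 10
            exp += 1
    return sign, coeff, exp
```

For instance, `dfp_multiply((0, 123, -2), (1, 45, 1), precision=4)` multiplies 1.23 by -450 and needs no rounding, since the product 5535 already fits in four digits.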
Q3. How is the complement +1 performed in the partial product reduction tree?
For BCD–4221, the digit–wise 9's complement of a positive multiple is obtained simply by bit–complementing it, since $9 - X_i = \overline{X_i}$. The +1 that completes the 10's complement is added in the partial product reduction tree through a tail encoding bit, which is possible because each partial product is left shifted by 4 bits (or at least 1 bit) with respect to the previous one.
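The complement property follows from the bit weights 4, 2, 2, 1 summing to 9. The short check below is a sketch with hypothetical helper names (`value_4221`, `invert`, `encode_digit`, `tens_complement`). Because BCD–4221 is a redundant code, the choice of codeword in `encode_digit` is an arbitrary assumption of the example.

```python
from itertools import product

WEIGHTS = (4, 2, 2, 1)  # bit weights of the BCD-4221 code


def value_4221(bits):
    """Decimal value of a 4-bit BCD-4221 codeword."""
    return sum(w * b for w, b in zip(WEIGHTS, bits))


def invert(bits):
    """Bitwise complement of a codeword."""
    return tuple(1 - b for b in bits)


def inversion_is_nines_complement():
    """Check 9 - X_i == value(inverted X_i) for all 16 bit patterns."""
    return all(value_4221(invert(b)) == 9 - value_4221(b)
               for b in product((0, 1), repeat=4))


def encode_digit(d):
    """Return one 4221 codeword for digit d (first match; an assumption)."""
    for bits in product((0, 1), repeat=4):
        if value_4221(bits) == d:
            return bits
    raise ValueError("digit out of range")


def tens_complement(digits):
    """10's complement of a number given as digits, most significant first.

    Every digit is bit-inverted (9's complement); the final +1 stands in
    for the tail encoding bit added in the reduction tree.
    """
    inverted = [invert(encode_digit(d)) for d in digits]
    nines = int("".join(str(value_4221(b)) for b in inverted))
    return nines + 1
```

With three digits, `tens_complement([0, 3, 7])` inverts 037 into 962 and adds 1, giving 963 = 1000 - 37.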
Q4. How long does multiple 8X take to generate?
Multiple 8X is obtained as 2 × 2 × 2X , so the latency of multiplicand multiples generation is about three times the latency of 2X operation.
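A minimal software sketch of this chain of doublings is shown below. It assumes a plain digit list (least significant digit first) and a digit–wise doubling rule in which a carry never travels further than the neighbouring digit. The function names are hypothetical, and the code does not model the actual gate–level encoding used for the 2X operation.

```python
def to_digits(n):
    """Decimal digits of a non-negative integer, least significant first."""
    return [int(c) for c in reversed(str(n))]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def double_digits(digits):
    """Return the digits of 2X, one digit longer than the input.

    (2 * d) % 10 is even and at most 8, so adding the incoming carry
    (0 or 1) never overflows the digit: carries do not ripple.
    """
    out = []
    carry = 0
    for d in digits:
        out.append((2 * d) % 10 + carry)
        carry = 1 if d >= 5 else 0   # carry into the next digit
    out.append(carry)
    return out


def times_eight(digits):
    """8X computed as 2 * (2 * (2X)): three doubling steps in series."""
    return double_digits(double_digits(double_digits(digits)))
```

The three nested calls mirror the statement that the latency of 8X is roughly three times that of a single 2X step.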
Q5. How fast is the proposed SD radix–5 scheme?
The proposed SD radix–5 scheme is 1.7 times faster than [5] but generates 32 partial products, while the proposed SD radix–10 scheme is 1.3 times slower than [5].
Q6. How does the design in [8] compare with a radix–4 binary multiplier?
Synthesis results given in [8] show a critical path delay of 2.65 ns and an equivalent area of 68,000 NAND2 gates; with respect to a radix–4 binary multiplier, the ratios are 1.90 for delay and 1.50 for area.
Q7. What are the three techniques proposed in the binary CSA?
In order to eliminate decimal corrections from the critical path of the binary CSA, three different techniques were proposed in [6].
Q8. How are the proposed architectures compared with other multipliers?
The authors have used an area–delay model for static CMOS gates based on logical effort to evaluate the area–delay figures of the proposed architectures and two representative binary parallel multipliers [10, 13].
Q9. How is the ×2 multiplication for the final decimal carry operand performed?
The ×2 multiplication for the final decimal carry operand is performed in parallel with the first stage of the decimal carry–propagate adder (+6 digit addition).
Q10. What do the area–delay figures show for the decimal SD radix–10 multiplier?
The area–delay figures from a comparative study including conventional binary parallel multipliers and other representative decimal proposals show that the authors' decimal SD radix–10 multiplier is an interesting option for high performance with moderate area.
Q11. How is the recoded SD radix–10 multiplier expressed?
The recoded SD radix–10 multiplier can be expressed in terms of $Y^*_i$ as

$$Y = \sum_{i=0}^{d-1}\left(y^*_{i,3}\,10 - y^*_{i,3}\,5 + \sum_{j=0}^{2} y^*_{i,j}\,2^j\right)10^i = y^*_{d-1,3}\,10^d + \sum_{i=0}^{d-1} Yb_i\,10^i$$

where each SD radix–10 digit $Yb_i$ takes a value in $[-5, 5]$.

Source: 18th IEEE Symposium on Computer Arithmetic (ARITH'07), 2007.
Q12. How does SD radix–5 compare with SD radix–4?
Although BCD to SD radix–4 encoding is slightly simpler than BCD to SD radix–5 encoding, partial product generation for decimal SD radix–5 is faster and comparable in latency with binary SD radix–4, because the multiplicand multiples are generated faster, as the authors show in the following subsection of the paper.
Q13. What does the recoding of the digits do?
This recoding transforms the digit set {0, ..., 9} into the signed–digit (SD) set {−5, ..., 5} to perform the selection of multiples in a similar way as modified Booth recoding.
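One common way to realise such a recoding is sketched below. It is an assumption of this example, not necessarily the authors' circuit: a digit of 5 or more is replaced by that digit minus 10 and sends a transfer of 1 to the next digit. The function names are hypothetical.

```python
def sd_radix10_recode(digits):
    """Recode decimal digits (least significant first, each 0..9).

    Returns (sd_digits, transfer_out) with every sd digit in -5..5.
    The rule 'digit >= 5 sends a transfer' is an assumed textbook
    scheme used only for illustration.
    """
    sd = []
    transfer_in = 0
    for d in digits:
        transfer_out = 1 if d >= 5 else 0
        # d - 10 lies in -5..-1 when d >= 5; the incoming transfer adds 0 or 1.
        sd.append(d - 10 * transfer_out + transfer_in)
        transfer_in = transfer_out
    return sd, transfer_in


def sd_value(sd_digits, transfer_out):
    """Value represented by the recoded digits plus the final transfer."""
    total = transfer_out * 10 ** len(sd_digits)
    for i, d in enumerate(sd_digits):
        total += d * 10 ** i
    return total
```

For 5678, given as `[8, 7, 6, 5]`, the recoded digits are `[-2, -2, -3, -4]` with a final transfer of 1, and 10000 - 4000 - 300 - 20 - 2 = 5678. Each signed digit then selects a multiple of the multiplicand between −5X and 5X, in the spirit of modified Booth recoding.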