A New High Radix-2^r (r ≥ 8) Multibit Recoding Algorithm for Large Operand Size (N ≥ 32) Multipliers
Frequently Asked Questions (20)
Q2. What are the three major requirements for today’s multiplication-intensive applications?
In large-operand-size applications (N ≥ 32), a scalable architecture is essential to ensure a linear increase O(N) of multiply-time while multiplier size grows quadratically, O(N^2), with operand bit-length N. Consequently, high speed, low power, and a highly scalable architecture are the three major requirements for today’s general-purpose multipliers [1].
Q3. How many clock cycles of latency does a 64-bit two’s complement finely pipelined multiplier require?
For instance, a 64-bit two’s complement finely pipelined multiplier requires a latency of only seven clock cycles (critical path composed of a series of 7 adders).
Q4. What is the critical path of the multiplier in terms of logic levels?
Based on the total number of adders (AddT), the critical path of the multiplier in terms of logic levels is: DelT = N/r − 1 + Del + ds, where Del is the delay due to adder stages inside PPGj and ds is the delay due to multiplexer logic inside PPGji.
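As a minimal sketch of the arithmetic in this formula, the function below evaluates DelT = N/r − 1 + Del + ds. The function name, parameter names, and the requirement that r divide N exactly are assumptions made for illustration; the values passed for Del and ds are placeholders, not figures from the paper.

```python
def critical_path_levels(n_bits: int, r: int, del_ppg: int, ds_mux: int) -> int:
    """Evaluate DelT = N/r - 1 + Del + ds (in logic levels).

    n_bits  : operand size N
    r       : radix exponent (radix-2^r)
    del_ppg : Del, delay of the adder stages inside PPGj
    ds_mux  : ds, delay of the multiplexer logic inside PPGji
    """
    # Assumption of this sketch: r divides N exactly, so N/r is an integer.
    if n_bits <= 0 or r <= 0 or n_bits % r != 0:
        raise ValueError("this sketch assumes r divides N exactly")
    return n_bits // r - 1 + del_ppg + ds_mux


# Placeholder values for Del and ds, chosen only to exercise the formula.
print(critical_path_levels(64, 32, del_ppg=2, ds_mux=1))  # 64/32 - 1 + 2 + 1 = 4
```

Raising r shrinks the N/r − 1 term, while Del and ds depend on how each PPGj is built, which is why the two have to be traded off together.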
Q5. What are the basic components needed to implement RTL multipliers?
1) Area. Three basic components are necessary for the implementation of RTL multipliers:
• multiplexers (Mux1) to recode the digit terms (Qj, Pj, …) included in the recoding expression;
• shifters (Mux2) for partial product generation;
• adders for partial product summation.
Q6. What does recoding large slices in a mono-bloc PPG require?
Recoding large slices (r ≥ 8) in a mono-bloc PPG, such as in [11][12], requires the use of an RTL “case statement” with r+1 entries.
Q7. What kind of multiplication algorithm is needed to meet an application’s time constraint?
To comply with the time constraint of a given application, the authors need a multiplication algorithm that allows, to some extent, a parameterized reduction (N/r) of the multiply-time without sacrificing area.
Q8. What does the best tradeoff for the recoding schemes depend on?
Based on theory and implementation results, the authors conclude that the best tradeoff related to their recoding schemes depends on the values of N and r.
Q9. Why is the solution space explored with a deterministic C program?
Because of an explosive number of possible combinations (N>>), the solution space is exhaustively explored using a deterministic C-program for r varying from 8 to 1024.
Q10. Why does the radix-2^8 PPGji consume the least area?
The radix-2^8 PPGji of equation (15) consumes the least area because it does not employ odd-multiples and requires a small number of multiplexers, as the total number of input combinations in each radix-2^8 PPGji is equal to 8+8+8+8=32.
Q11. Why is the look-up-table based multiplier so fast?
Because it exploits the maximum parallelism inherent in the multiply operation, their look-up-table based multiplier (eq. 15) is even speed-competitive with Xilinx’s hardwired multiplier employing DSP-Slices (18×18 bit full-custom multipliers).
Q12. What is the topology of the proposed recoding schemes?
The topology of their proposed recoding schemes shows high capabilities for pipelining, which can be finely or coarsely grained to satisfy both high-throughput and low-latency applications.
Q13. How are the radix-2^r and sub-radix-2^s chosen for a given operand size N?
Guided by accurate area heuristics, the final result of an optimization process, gradually undertaken in this paper, delivers for each value of N (N = 8..8192) the appropriate radix-2^r (r = 8..512) and sub-radix-2^s (s = 4..32) that lead to the architecture with the shortest critical path (3·∛(N/2) − 3) in adder stages.
Q14. How is the high radix-2^r mono-bloc PPGj partitioned?
The solution consists essentially in dividing the high radix-2^r mono-bloc PPGj (Fig. 1.a) into a number of lower sub-radix-2^s odd-multiple-free PPGji (Fig. 1.b), such that s is a divisor of r.
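The partitioning can be illustrated with a small sketch. It assumes the usual Booth-style window convention, in which an r-bit slice is read together with the single bit just below it (r+1 inputs), and it checks that the signed digit of the whole window equals the weighted sum of the digits of its r/s overlapping sub-windows. The function names and the window encoding are assumptions made for illustration, not the paper's RTL.

```python
from typing import List


def window_digit(window: int, width: int) -> int:
    """Signed digit of a (width+1)-bit window.

    Bit 0 of `window` is the bit just below the slice; bits 1..width are
    the slice itself, whose top bit carries negative weight.
    """
    below = window & 1
    slice_bits = (window >> 1) & ((1 << width) - 1)
    top = (slice_bits >> (width - 1)) & 1
    return slice_bits - (top << width) + below


def split_window(window: int, r: int, s: int) -> List[int]:
    """Split an (r+1)-bit window into r/s overlapping (s+1)-bit sub-windows."""
    if r % s != 0:
        raise ValueError("s must be a divisor of r")
    mask = (1 << (s + 1)) - 1
    # Sub-window i reuses the top bit of sub-window i-1 as its "below" bit.
    return [(window >> (s * i)) & mask for i in range(r // s)]


def recombine(window: int, r: int, s: int) -> int:
    """Weighted sum of the sub-radix-2^s digits of one radix-2^r window."""
    subs = split_window(window, r, s)
    return sum(window_digit(w, s) << (s * i) for i, w in enumerate(subs))


w = 0b101101101                              # a 9-bit window for r = 8
print(window_digit(w, 8))                    # -73
print([window_digit(x, 4) for x in split_window(w, 8, 4)])  # [7, -5]
print(recombine(w, 8, 4))                    # 7 + 16 * (-5) = -73
```

Because the shared bits cancel between neighbouring sub-windows, each sub-window only has to handle s+1 inputs instead of r+1.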
Q15. What is the digit set corresponding to the radix-2^r equation?
In the literature, equation (1) is referred to as the radix-2^r equation, to which corresponds a digit set D(2^r) such that Qj ∈ D(2^r) = {−2^(r−1), ..., 0, ..., 2^(r−1)}.
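Equation (1) itself is not reproduced in this excerpt, so the sketch below assumes the standard generalized-Booth form of radix-2^r recoding for an N-bit two's complement multiplier, with r dividing N. It shows that every digit falls inside the digit set above and that the digits, weighted by powers of 2^r, rebuild the original value. The function name and signature are hypothetical.

```python
from typing import List


def radix_2r_digits(y: int, n_bits: int, r: int) -> List[int]:
    """Recode an n_bits-wide two's complement integer into radix-2^r digits.

    Assumed (generalized Booth) form: each digit is the r-bit slice, with
    its top bit weighted negatively, plus the top bit of the slice below.
    """
    if n_bits % r != 0:
        raise ValueError("this sketch assumes r divides N exactly")
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits two's complement bits")
    pattern = y & ((1 << n_bits) - 1)   # two's complement bit pattern
    digits = []
    below = 0                            # the bit below position 0 is 0
    for j in range(n_bits // r):
        slice_bits = (pattern >> (r * j)) & ((1 << r) - 1)
        top = slice_bits >> (r - 1)
        digits.append(slice_bits - (top << r) + below)
        below = top
    return digits


digits = radix_2r_digits(1000, n_bits=16, r=8)
print(digits)                                              # [-24, 4]
print(sum(q << (8 * j) for j, q in enumerate(digits)))     # 1000
print(all(-(1 << 7) <= q <= (1 << 7) for q in digits))     # True
```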
Q16. What are the advantages of partitioning PPGj?
As direct benefits of the partitioning of Fig. 1.b:
• there is no need to pre-compute odd-multiples of the multiplicand, which drastically reduces the required amount of hardware resources and routing;
• since the size of a PPGji entry is much smaller than that of a PPGj entry (s ≤ r/2), the total multiplexing logic required by RTL “case statements” to recode the entries is greatly reduced;
Q17. Which recoding is the most space consuming, and why?
Based on theory (Table II) and implementation results (Table III), Dimitrov recoding is the most space consuming due to the use of odd-multiples of the multiplicand.
Q18. Why is mono-bloc PPG recoding unsuitable for the high-radix approach?
Mono-bloc PPG recoding is incompatible with the high-radix (r ≥ 8) approach, whose purpose is to reduce the multiply-time (N/r) of large operand size (N ≥ 32) multipliers.
Q19. Why is the “balanced” solution not really balanced?
Even the “balanced” solution is not really balanced enough, since the mean values of Del and Mux are 1.4×Delmin and 5.2×Muxmin, respectively.
Q20. How is the radix-2^32 algorithm of equation (15) organized for N = 64?
A. Area occupation. For operand size N = 64, equation (15) is a composite radix-2^32 algorithm (Table X), where each PPGj simultaneously processes 32+1 inputs that are split across four sub-radix-2^8 PPGji made of four instances (C_jik) of the McSorley algorithm (Fig. 4).