Pipelined FPGA Adders
Frequently Asked Questions (12)
Q2. What are the future works mentioned in the paper "Pipelined FPGA Adders"?
Future work also includes extending the optimization options to include operator latency, and possibly combinations such as “LUTs and latency”.
Q3. What is the effect of this optimization on the number of registers?
In addition to latency reduction, this optimization brings the following gains: the number of registers is reduced by the carry propagation size (which now needs no registering), the LUT count is reduced by approximately w, and the number of slices by approximately w/2.
Q4. What is the common solution for a tight frequency-driven pipelining?
A tight frequency-driven pipelining is obtained by first determining the maximal addition size α in equation 1 for which the critical path delay is less than the target period T: $\alpha = 1 + \left\lfloor \frac{T - \delta_{LUT} - \delta_{xor}}{\delta_{carry}} \right\rfloor$.
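The chunk-size formula above can be tried out with a short Python sketch. The function name `max_chunk_size` and the delay values in the usage note are hypothetical and are not taken from the paper or from any device datasheet. Delays are passed as integers (for example in picoseconds) so that the floor is exact.

```python
def max_chunk_size(period, delta_lut, delta_xor, delta_carry):
    """Largest addition size alpha whose critical path fits in `period`.

    Implements alpha = 1 + floor((T - d_LUT - d_xor) / d_carry).
    All arguments are integer delays in the same unit (e.g. picoseconds).
    """
    if delta_carry <= 0:
        raise ValueError("delta_carry must be positive")
    slack = period - delta_lut - delta_xor
    if slack < 0:
        # Even a 1-bit addition does not fit in the target period.
        raise ValueError("target period is shorter than the LUT + XOR delay")
    # Integer floor division avoids floating-point rounding at the boundary.
    return 1 + slack // delta_carry
```

With made-up delays of 500 ps for the LUT, 400 ps for the XOR and 50 ps per carry bit, a 2500 ps target period gives `max_chunk_size(2500, 500, 400, 50) == 33`.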
Q5. What is the worst case relative error of the estimation formulae?
The worst-case relative error is of the order of $10^{-2}$ (one percent), which makes them sufficiently accurate for estimation formulae.
Q6. How many slices are in a VirtexII-Pro device?
All the slices in a VirtexII-Pro device were similar to sliceM, but they were reduced to half the total number of slices for Virtex4 and Spartan3, and about a quarter in Virtex5 and Virtex6 devices (with higher density at the input of the DSP48E blocks).
Q7. What is the novel feature of the classical addition architecture?
In this section the authors propose a scalable low-latency addition architecture based on the textbook carry-select architecture, whose novel feature is to make efficient use of the fast-carry chains for the carry-bit computations.
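For readers unfamiliar with the textbook carry-select scheme that this architecture builds on, here is a minimal behavioural sketch in Python. It only illustrates the generic carry-select idea (compute each chunk for both possible carry-in values, then select). It does not model the paper's use of fast-carry chains, and the name `carry_select_add` and its parameters are hypothetical.

```python
def carry_select_add(x, y, width, chunk):
    """Add two unsigned `width`-bit integers chunk by chunk, carry-select style."""
    mask = (1 << chunk) - 1
    n_chunks = -(-width // chunk)  # ceiling division
    result = 0
    carry = 0
    for i in range(n_chunks):
        a = (x >> (i * chunk)) & mask
        b = (y >> (i * chunk)) & mask
        # In hardware both candidate sums are computed in parallel,
        # before the incoming carry is known.
        sum_if_carry0 = a + b
        sum_if_carry1 = a + b + 1
        # The actual carry-in only drives a selection, not a new addition.
        chosen = sum_if_carry1 if carry else sum_if_carry0
        result |= (chosen & mask) << (i * chunk)
        carry = chosen >> chunk
    # Append the final carry-out above the last chunk.
    return result | (carry << (n_chunks * chunk))
```

For any unsigned inputs below `2**width`, the returned value equals `x + y`; the point of the sketch is that each chunk's two candidate sums do not depend on the carry from the chunk below.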
Q8. What is the way to estimate the number of registers in the pipeline?
When no SRLs are allowed, the number of registers propagated above the diagonal will be approximately halved, and may still be packed in shift registers.
Q9. What is the function of the Xilinx FPGAs?
The LUTs of the Xilinx FPGAs can be used either as a function generator or as a variable-length shift register, as previously presented in Section I-D.
Q10. What is the option for a low-latency operator?
For both alternative and low-latency architectures, there are two options: either perform all additions using chunk size γ, or buffer the inputs and perform computations using chunk size α.
Q11. What is the smallest number of bits in the addition diagonal?
Each adder on the addition diagonal takes as input an operand of α+1 bits and a 1-bit carry-in, and returns an (α+1)-bit-wide result.
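One possible reading of this statement can be checked with a small behavioural model in Python. The sketch assumes, as an interpretation not confirmed by the text above, that the (α+1)-bit operand of each diagonal adder is a chunk sum computed beforehand (α sum bits plus one carry-out bit). The function name `add_with_diagonal` is hypothetical.

```python
def add_with_diagonal(x, y, width, alpha):
    """Add two unsigned `width`-bit integers using alpha-bit chunks."""
    mask = (1 << alpha) - 1
    n_chunks = -(-width // alpha)  # ceiling division
    # Assumed first step: independent chunk sums, each fitting in alpha+1 bits.
    partial = [((x >> (i * alpha)) & mask) + ((y >> (i * alpha)) & mask)
               for i in range(n_chunks)]
    result = 0
    carry = 0
    for i, s in enumerate(partial):
        # Diagonal adder: (alpha+1)-bit operand plus a 1-bit carry-in.
        t = s + carry
        # s is at most 2**(alpha+1) - 2, so the result still fits in alpha+1 bits.
        assert t < (1 << (alpha + 1))
        result |= (t & mask) << (i * alpha)
        carry = t >> alpha  # top bit becomes the carry-in of the next chunk
    return result | (carry << (n_chunks * alpha))
```

Under this assumption the internal assertion never fires, which matches the statement that the result is no wider than the α+1-bit operand.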
Q12. How is the proposed adder generation implemented?
Work is under way to integrate the proposed adders in all the coarser cores of the FloPoCo project, and to support more FPGA targets.