An efficient floating point multiplier design for high speed applications using Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm

doi:10.1109/ICSPCOM.2015.7150666

Proceedings Article•DOI•

S Arish¹, Rajender Kumar Sharma¹•Institutions (1)

National Institute of Technology, Kurukshetra¹

09 Jul 2015-pp 303-308

TL;DR: A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm is used to implement an unsigned binary multiplier for mantissa multiplication, giving a better implementation in terms of delay and power.

Abstract: Floating point multiplication is a crucial operation in high power computing applications such as image processing and signal processing. Multiplication is also the most time- and power-consuming operation. This paper proposes an efficient method for IEEE 754 floating point multiplication that gives a better implementation in terms of delay and power. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic Mathematics) is used to implement an unsigned binary multiplier for mantissa multiplication. The multiplier is implemented in Verilog HDL and targeted at Spartan-3E and Virtex-4 FPGAs.
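
The combination described in the abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' Verilog design: it assumes a Karatsuba split at the top level with an Urdhva-Tiryagbhyam (vertical and crosswise) multiplier as the base case. The function names, the 8-bit base width, and the 24-bit operand width (a single-precision mantissa with its hidden bit) are illustrative assumptions.

```python
def urdhva_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers column by column
    (vertical and crosswise): each column sums the bit products
    whose positions add up to the column index."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result = 0
    carry = 0
    for col in range(2 * n - 1):
        total = carry
        for i in range(max(0, col - n + 1), min(col, n - 1) + 1):
            total += a_bits[i] & b_bits[col - i]
        result |= (total & 1) << col   # low bit stays in this column
        carry = total >> 1             # the rest moves to the next column
    return result | (carry << (2 * n - 1))


def karatsuba(a: int, b: int, n: int, base_bits: int = 8) -> int:
    """Karatsuba product of two n-bit unsigned integers, falling back to
    urdhva_multiply once operands are at most base_bits wide.
    base_bits is an assumed tuning parameter (must be >= 3)."""
    if base_bits < 3:
        raise ValueError("base_bits must be at least 3")
    if n <= base_bits:
        return urdhva_multiply(a, b, n)
    half = n // 2
    hi_bits = n - half
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    z0 = karatsuba(a_lo, b_lo, half, base_bits)
    z2 = karatsuba(a_hi, b_hi, hi_bits, base_bits)
    # The sums can be one bit wider than the halves.
    z1 = karatsuba(a_lo + a_hi, b_lo + b_hi, hi_bits + 1, base_bits) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0


# Example: two 24-bit mantissas (1.5 and 1.25 with the hidden bit set).
product = karatsuba(0xC00000, 0xA00000, 24)
```

The three recursive calls replace the four sub-products of a schoolbook split, which is where Karatsuba saves multipliers at the cost of extra additions.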

Citations

Journal Article•DOI•

Weighted Partitioning for Fast Multiplierless Multiple-Constant Convolution Circuit

Gian Domenico Licciardo¹, Carmine Cappetta¹, Luigi Di Benedetto¹, Mario Vigliar•Institutions (1)

University of Salerno¹

01 Jan 2017-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: The proposed solution achieves state-of-the-art performances in terms of elaboration velocity, achieving a critical path delay of about 2 ns both on a Xilinx Virtex 7 and with CMOS 90-nm std_cells.

Abstract: A new radix-3 partitioning method of natural numbers, derived by the weight partition theory, is employed to build a multiplierless circuit that is well suited for multimedia filtering applications. The partitioning method allows conveniently premultiplying 32-b floating-point filter coefficients with the smallest set of parts composing an unsigned integer input. In this way, similar to the distributed arithmetic, shifters and recoding circuitry, typical of other well-known multiplier circuits, are completely substituted with simplified floating-point adders. Compared to the existent literature, targeted to both field-programmable gate array and std_cell technology, the proposed solution achieves state-of-the-art performances in terms of elaboration velocity, achieving a critical path delay of about 2 ns both on a Xilinx Virtex 7 and with CMOS 90-nm std_cells.

32 citations

Cites background or methods from "An efficient floating point multipl..."

...The mapped physical resources are approximatively lower by 30% than those in [23], while the delay is about one-half, although this value is not representative because of the technology differences between the two target platforms....
...The work in [23] for FPGA and that in [16] and [17] for std_cells have been used as comparative terms....

Journal Article•DOI•

An Efficient and High-Speed Overlap-Free Karatsuba-Based Finite-Field Multiplier for FGPA Implementation

Moslem Heidarpur¹, Mitra Mirhassani¹•Institutions (1)

University of Windsor¹

24 Feb 2021-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, a novel hardware architecture for efficient field-programmable gate array (FPGA) implementation of Finite-field multipliers for ECC was proposed, which resulted in a lower combinational delay and area-delay product indicating the efficiency of design.

Abstract: Cryptography systems have become inseparable parts of almost every communication device. Among cryptography algorithms, public-key cryptography, and in particular elliptic curve cryptography (ECC), has become the most dominant protocol at this time. In ECC systems, polynomial multiplication is considered to be the slowest and most area-consuming operation. This article proposes a novel hardware architecture for efficient field-programmable gate array (FPGA) implementation of finite-field multipliers for ECC. The proposed hardware was implemented on different FPGA devices for various operand sizes, and performance parameters were determined. Compared with state-of-the-art works, the proposed method resulted in a lower combinational delay and area–delay product, indicating the efficiency of the design.
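
For orientation, polynomial multiplication over GF(2) can be sketched with the plain Karatsuba recursion, where addition and subtraction are both XOR. This is a hedged illustration only: it is the textbook form, not the overlap-free architecture the article proposes, it omits reduction modulo the field polynomial, and the function names and `base_bits` parameter are assumptions.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials stored as bit masks
    (bit i is the coefficient of x**i), by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, n: int, base_bits: int = 8) -> int:
    """Karatsuba product of two polynomials with at most n coefficients.
    In GF(2), the sum of the halves never grows wider than a half."""
    if n <= base_bits or n < 2:
        return clmul_schoolbook(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    low = clmul_karatsuba(a_lo, b_lo, half, base_bits)
    high = clmul_karatsuba(a_hi, b_hi, half, base_bits)
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half, base_bits)
    mid ^= low ^ high  # subtraction is XOR in characteristic 2
    return (high << (2 * half)) ^ (mid << half) ^ low


# (x^3 + x + 1) * (x^2 + x) = x^5 + x^4 + x^3 + x
example = clmul_karatsuba(0b1011, 0b0110, 4, base_bits=1)
```

In hardware, the three partial results are shifted into positions that overlap and must be XORed together in the final line.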

12 citations

Proceedings Article•DOI•

Parallel and Pipelined VLSI Implementation of the New Radix-2 DIT FFT Algorithm

Harsha Keerthan Samudrala, Shaik Qadeer, Syed Azeemuddin, Zafar Ali Khan

01 Dec 2018

TL;DR: This paper discusses the VLSI implementation of the new radix-2 Decimation In Time (DIT) Fast Fourier Transform (FFT) algorithm with reduced arithmetic complexity, which is based on scaling the twiddle factor; the results show that the proposed architecture significantly reduces hardware area and power consumption.

Abstract: In this paper we discuss the VLSI implementation of the new radix-2 Decimation In Time (DIT) Fast Fourier Transform (FFT) algorithm with reduced arithmetic complexity, which is based on scaling the twiddle factor. Some signal processing applications require high-performance FFT processors, and to meet these performance requirements the processor needs to be pipelined and parallelized. An optimized ASIC design with fewer multipliers is derived from this new radix-2 algorithm, and a completely parallel and pipelined architecture is adopted for the hardware implementation of a 64-point FFT. The implementation results show that the proposed architecture reduces the hardware area by 13.74 percent and power consumption by 16 percent compared to the standard FFT architecture. The design units are simulated in Xilinx ISE WebPack 13.1 and synthesized using Cadence Encounter RTL Compiler.
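
As a point of reference, the standard radix-2 DIT FFT that such designs start from can be sketched in a few lines of Python. This is the textbook recursion only; the twiddle-factor scaling that gives the new algorithm its reduced multiplier count is not reproduced here, and the function name is an assumption.

```python
import cmath


def fft_radix2_dit(x):
    """Textbook recursive radix-2 decimation-in-time FFT.
    The input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])   # even-indexed samples
    odd = fft_radix2_dit(x[1::2])    # odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: one complex multiply by the twiddle factor,
        # then one add and one subtract.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# A 64-point transform of a unit impulse has a flat spectrum.
spectrum = fft_radix2_dit([1] + [0] * 63)
```

Each butterfly costs one complex multiplication, which is why reducing multiplier count is the main lever for area and power in hardware FFTs.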

9 citations

Cites background from "An efficient floating point multipl..."

...[16] for power and area efficient computation of...

Proceedings Article•DOI•

Comparative Performance Analysis of Karatsuba Vedic Multiplier with Butterfly Unit

V. Harish¹, Kamatchi S¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

12 Jun 2019

TL;DR: This paper proposes an efficient method for signed binary multiplication using the Urdhva-Tiryagbhyam technique, the Karatsuba algorithm and an efficient carry select adder, aiming at a binary multiplier with lower area, power and delay.

Abstract: In DSP processors and other applications that use multiply-accumulate (MAC) units, multiplication of large numbers is the main bottleneck. Multiplying two n-bit binary numbers requires n(n − 1) adders and n² AND gates, which consumes more time, power and area for large n, since the hardware scales as the square of n. There is therefore a need to design a binary multiplier that consumes less area, power and delay, although in general there is a tradeoff between area, power and delay. With the shrinking of technology we can slightly compromise on area. This paper proposes an efficient method for signed binary multiplication using the Urdhva-Tiryagbhyam technique, the Karatsuba algorithm and an efficient carry select adder. The Urdhva-Tiryagbhyam technique is known for its low delay [8], as it produces partial products at the same instant and sums them up. It is best suited when the number of bits in the multiplier and multiplicand is less than 16 [8], [14], whereas the Karatsuba algorithm is applicable to multiplication of larger numbers of bits [5]. The proposed multiplier is implemented for a first-stage butterfly unit [12] of a radix-2 FFT algorithm. The proposed design is implemented in Verilog HDL and synthesized in both 90 nm and 45 nm technology at the multiplier level, and in 45 nm technology for the butterfly unit, using Cadence RTL Compiler, and the results are compared.

4 citations

Proceedings Article•DOI•

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

01 Jun 2022

TL;DR: Nonuniform-to-Uniform Quantization (N2UQ) learns flexible inequidistant input thresholds to better fit the underlying distribution while quantizing the real-valued inputs into equidistant output levels.

Abstract: The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. We achieve this through learning the flexible inequidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for intractable backward derivative calculation w.r.t. threshold parameters. Additionally, we consider entropy preserving regularization to further reduce information loss in weight quantization. Even under this adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.5 ~ 1.7% on ImageNet, demonstrating the contribution of N2UQ design. Code and models are available at: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization.
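
The forward mapping described above, inequidistant input thresholds feeding equidistant output levels, can be sketched as follows. This is a rough illustration of the idea only, not the released N2UQ code: the thresholds are fixed here rather than learned, the G-STE backward pass is omitted, and the function name, threshold values and `step` parameter are assumptions.

```python
from bisect import bisect_right


def nonuniform_to_uniform(x: float, thresholds: list[float], step: float = 1.0) -> float:
    """Quantize x with unevenly spaced input thresholds (sorted ascending)
    onto evenly spaced output levels 0, step, 2*step, ...
    The output level index is the number of thresholds x has reached."""
    level = bisect_right(thresholds, x)
    return level * step


# Three hypothetical thresholds give four output levels (a 2-bit quantizer).
thresholds = [0.2, 0.5, 1.5]
outputs = [nonuniform_to_uniform(v, thresholds) for v in (0.1, 0.3, 0.6, 2.0)]
```

Because the output levels are evenly spaced, downstream arithmetic can treat them as ordinary uniform integer codes.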

3 citations

References

Standard•DOI•

IEEE Standard for Floating-Point Arithmetic

Dan Zuras, M. F. Cowlishaw, Alex Aiken, Matthew Applegate, David H. Bailey, Steve Bass, Dileep Bhandarkar, Mahesh Bhat, David Bindel, Sylvie Boldo, Stephen Canon, Steven R. Carlough, Marius Cornea, John H. Crawford, Joseph D. Darcy, Debjit Das Sarma, Marc Daumas, Bob Davis, Mark Davis, Dick Delp, James Demmel, Mark A. Erle, Hossam A. H. Fahmy, J. P. Fasano, Richard J. Fateman, Eric Feng, Warren E. Ferguson, Alex Fit-Florea, Laurent Fournier, Chip Freitag, Ivan Godard, Roger A. Golliver, David Gustafson, Michel Hack, John R. Harrison, John Hauser, Yozo Hida, Chris N. Hinds, Graydon Hoare, David G. Hough, Jerry Huck, Jim Hull, Michael Ingrassia, David V. James, Rick James, William Kahan, John Kapernick, Richard Karpinski, Jeff Kidder, Plamen Koev, Ren-Cang Li, Zhishun A. Liu, Raymond Mak, Peter Markstein, David W. Matula, Guillaume Melquiond, Nobuyoshi Mori, Ricardo Morin, Ned Nedialkov, Craig Nelson, Stuart Oberman, Jon Okada, Ian Ollmann, Michael Parks, Tom Pittman, Eric Postpischil, Jason Riedy, Eric M. Schwarz, David Scott, Don Senzig, Ilya Sharapov, Jim Shearer, Michael Siu, Ron Smith, Chuck Stevens, Peter Tang, Pamela J. Taylor, James W. Thomas, Brandon Thompson, Wendy Thrash, Neil Toda, Son Dao Trong, Leonard Tsai, Charles Tsen, Fred Tydeman, Liang Wang, Scott Westbrook, Steve Winkler, Anthony Wood, Umit Yalcinalp, Fred Zemke, Paul Zimmermann

01 Jan 2008

1,354 citations

Journal Article•

A Reduced-Bit Multiplication Algorithm for Digital Arithmetic

Harpreet S. Dhillon, Abhijit Mitra

25 Jul 2008-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering

TL;DR: A reduced-bit multiplication algorithm based on the ancient Vedic multiplication formulae, Urdhva tiryakbhyam and Nikhilam, is proposed and is further optimized by use of some general arithmetic operations such as expansion and bit-shifting to take advantage of bit-reduction in multiplication.

Abstract: A reduced-bit multiplication algorithm based on the ancient Vedic multiplication formulae is proposed in this paper. Both the Vedic multiplication formulae, Urdhva tiryakbhyam and Nikhilam, are first discussed in detail. Urdhva tiryakbhyam, being a general multiplication formula, is equally applicable to all cases of multiplication. It is applied to the digital arithmetic and is shown to yield a multiplier architecture which is very similar to the popular array multiplier. Due to its structure, it leads to a high carry propagation delay in case of multiplication of large numbers. Nikhilam Sutra, on the other hand, is more efficient in the multiplication of large numbers as it reduces the multiplication of two large numbers to that of two smaller numbers. The framework of the proposed algorithm is taken from this Sutra and is further optimized by use of some general arithmetic operations such as expansion and bit-shifting to take advantage of bit-reduction in multiplication. We illustrate the proposed algorithm by reducing a general 4×4-bit multiplication to a single 2×2-bit multiplication operation.
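
The way Nikhilam turns one large multiplication into a smaller one can be shown with the textbook identity a·b = (a − (B − b))·B + (B − a)(B − b) for a chosen base B. The sketch below illustrates only that identity, not the optimized reduced-bit algorithm of the paper; the function name is an assumption.

```python
def nikhilam_multiply(a: int, b: int, base: int) -> int:
    """Multiply a and b through their deficits from a common base.
    When both operands are close to the base, the only true
    multiplication left is the small product of the two deficits."""
    deficit_a = base - a
    deficit_b = base - b
    # With a power-of-two base, the multiplication by `base` is a shift.
    return (a - deficit_b) * base + deficit_a * deficit_b


# Decimal example: 97 * 96 with base 100 needs only 3 * 4.
decimal_product = nikhilam_multiply(97, 96, 100)
# Binary example: 14 * 13 with base 16 needs only 2 * 3 (2-bit operands).
binary_product = nikhilam_multiply(14, 13, 16)
```

The identity holds for any operands, but the saving only appears when the deficits are much narrower than the operands themselves.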

105 citations

"An efficient floating point multipl..." refers methods in this paper

...A more optimized hard [9, 10] is shown in Fig....

Proceedings Article•DOI•

A high speed and area efficient Booth recoded Wallace tree multiplier for fast arithmetic circuits

M Jagadeshwar Rao¹, Sanjay Dubey¹•Institutions (1)

Padmasri Dr. B. V. Raju Institute of Technology¹

01 Dec 2012

TL;DR: An improved version of the tree-based Wallace tree multiplier architecture, using a Booth recoder and compressor adders, is proposed; results show that the proposed architecture is around 67 percent faster than the existing Wallace-tree multiplier.

Abstract: A Wallace tree multiplier using a Booth recoder is proposed in this paper. It is an improved version of the tree-based Wallace tree multiplier architecture. This paper aims at additional reduction of the latency and area of the Wallace tree multiplier. This is accomplished by the use of the Booth algorithm and compressor adders. The coding is done in Verilog HDL and synthesized for a Xilinx Virtex 6 FPGA device. The results show that the proposed architecture is around 67 percent faster than the existing Wallace-tree multiplier, 53 percent faster than the Vedic multiplier, 22 percent faster than the radix-8 Booth multiplier, and 18 percent faster than the radix-16 Booth multiplier. In terms of area, too, the proposed multiplier is much more efficient.
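
To make the role of the Booth recoder concrete, here is a minimal Python sketch of standard radix-4 Booth recoding, which halves the number of partial products that a Wallace tree then has to sum. The radix used in the paper is not stated in the abstract, so radix 4 is an assumption, as are the function names; the compressor-based tree itself is not modelled.

```python
def booth_radix4_digits(multiplier: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into n/2
    signed digits in {-2, -1, 0, 1, 2}, least significant first."""
    if n % 2:
        raise ValueError("n must be even for radix-4 recoding")
    bits = multiplier & ((1 << n) - 1)
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum one partial product per Booth digit of b (weight 4**i)."""
    digits = booth_radix4_digits(b, n)
    return sum((d * a) * (4 ** i) for i, d in enumerate(digits))


# An 8-bit multiplier yields four partial products instead of eight.
digits = booth_radix4_digits(-3, 8)
product = booth_multiply(7, -3, 8)
```

Each digit selects 0, ±a or ±2a, all of which are cheap to form in hardware with a shift and an optional negation.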

50 citations

Journal Article•DOI•

Low Power High Speed 16x16 bit Multiplier using Vedic Mathematics

R. K. Bathija, R. S. Meena, Sutapa Sarkar, Rajesh Sahu

18 Dec 2012-International Journal of Computer Applications

TL;DR: A simple digital multiplier architecture based on the Urdhva Tiryakbhyam (Vertically and Cross wise) Sutra of Vedic Mathematics is presented and an improved technique for low power and high speed multiplier of two binary numbers (16 bit each) is developed.

Abstract: High-speed parallel multipliers are one of the key components in RISCs (Reduced Instruction Set Computers), DSPs (Digital Signal Processors), graphics accelerators and so on. Array, Booth and Wallace tree multipliers are some of the standard approaches to implementing binary multipliers that are suitable for VLSI implementation. A simple digital multiplier (henceforth referred to as the Vedic Multiplier, or VM for short) architecture based on the Urdhva Tiryakbhyam (Vertically and Crosswise) Sutra of Vedic Mathematics is presented. An improved technique for a low-power, high-speed multiplier of two binary numbers (16 bits each) is developed. An algorithm is proposed and implemented in 16nm CMOS technology. The designed 16x16 bit multiplier dissipates a power of 0.17 mW. The propagation delay of the proposed architecture is 27.15ns. These results improve on the power dissipation and delay figures reported in the literature for Vedic and Booth multipliers.

32 citations

Design And Implementation Of 8-bit Vedic Multiplier

Miss. Rutuja Abhangrao, Miss. Shilpa Jadhav, Miss Priyanka Ghodke, Altaaf Mulani

26 Mar 2017

TL;DR: In this article, a Vedic multiplier using the Urdhva Tiryagbhyam sutra in Xilinx ISE is proposed; the design takes less time per operation than currently available multipliers.

Abstract: Today's technology has raised demand for fast, real-time signal processing operations. Multiplication is one of the most important arithmetic operations. In this paper, we propose the design of a Vedic multiplier using the Urdhva Tiryagbhyam sutra in Xilinx ISE. This design takes less time per operation than currently available multipliers. It covers a wide area of image processing and digital signal processing efficiently, with increased speed and thus higher performance.

24 citations