Proceedings ArticleDOI

Implementation of single precision floating point multiplier using Karatsuba algorithm

01 Nov 2013-pp 254-256
TL;DR: An efficient floating point multiplier using the Karatsuba algorithm that implements the significand multiplication along with the sign bit and exponent computations is presented.
Abstract: This paper presents an efficient floating point multiplier using the Karatsuba algorithm. Digital signal processing algorithms and media applications use a large number of multiplications, which are both time and power consuming. We have used the IEEE 754 format for the binary representation of the floating point numbers. Verilog HDL is used to implement the Karatsuba multiplication algorithm as a technology-independent pipelined design. The multiplier implements the significand multiplication along with the sign bit and exponent computations. Three-stage pipelining is used in the design, with a latency of 8 clock cycles. The mantissa bits are divided into three parts of particular bit widths so that the multiplication can be done using the standard multipliers available in the FPGA. The design is targeted at the Cyclone II device family and synthesized using Altera Quartus II.
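The divide-and-conquer step the abstract relies on can be sketched in software. The following is an illustrative Python model of the Karatsuba recursion, not the paper's Verilog design; the base-case threshold and the half-width split are arbitrary choices made for the sketch:

```python
def karatsuba(x, y):
    """Karatsuba multiplication of non-negative integers."""
    # Base case: small operands fall back to the native multiplier
    # (in hardware, this would be a standard FPGA multiplier block).
    if x < 16 or y < 16:
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    hi_x, lo_x = x >> half, x & ((1 << half) - 1)
    hi_y, lo_y = y >> half, y & ((1 << half) - 1)
    # Three recursive multiplications instead of the naive four.
    z2 = karatsuba(hi_x, hi_y)
    z0 = karatsuba(lo_x, lo_y)
    z1 = karatsuba(hi_x + lo_x, hi_y + lo_y) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The saving comes from replacing four sub-multiplications with three at each level, at the cost of a few extra additions, which is why the technique pays off for wide operands such as a 24-bit significand.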
Citations
Proceedings ArticleDOI
23 Apr 2015
TL;DR: An IEEE-754 based Vedic multiplier has been developed to carry out both single precision and double precision format floating point operations and its performance has been compared with Booth and Karatsuba based floating point multipliers.
Abstract: Most scientific operations involve floating point computations. It is necessary to implement faster multipliers that occupy less area and consume less power. Multipliers play a critical role in any digital design. Even though various multiplication algorithms have been in use, the performance of Vedic multipliers has not drawn wide attention. Vedic mathematics involves the application of 16 sutras, or algorithms. One among these, the Urdhva Tiryakbhyam sutra for multiplication, has been considered in this work. An IEEE-754 based Vedic multiplier has been developed to carry out both single precision and double precision floating point operations, and its performance has been compared with Booth and Karatsuba based floating point multipliers. A Xilinx FPGA has been used to implement these algorithms, and a comparison based on resource utilization and timing performance has also been made.
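The Urdhva Tiryakbhyam ("vertically and crosswise") sutra mentioned above computes each output column as a sum of crosswise digit products, with carries resolved afterwards. A minimal software model (illustrative Python using least-significant-digit-first lists, not the cited hardware design):

```python
def urdhva_multiply(a_digits, b_digits, base=10):
    """Urdhva Tiryakbhyam digit multiplication.

    Digits are least-significant first; each output column is the sum
    of crosswise digit products (a convolution of the digit vectors).
    """
    n = len(a_digits) + len(b_digits)
    cols = [0] * n
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            cols[i + j] += a * b        # crosswise partial products
    out, carry = [], 0
    for c in cols:                      # resolve carries column by column
        c += carry
        out.append(c % base)
        carry = c // base
    return out

# 12 * 34 = 408, least-significant digit first:
# urdhva_multiply([2, 1], [4, 3]) -> [8, 0, 4, 0]
```

In hardware all column products can be generated in parallel, which is the source of the delay advantage the Vedic-multiplier papers report for small operand widths.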

19 citations


Cites methods from "Implementation of single precision ..."

  • ...A floating point multiplier using the Karatsuba algorithm, incorporating pipelining techniques with a latency of 8 cycles, is implemented in [7]....


  • ...The mantissa is multiplied using any of the multiplication algorithms [7]....


Proceedings ArticleDOI
09 Jul 2015
TL;DR: A combination of Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm is used to implement unsigned binary multiplier for mantissa multiplication which gives a better implementation in terms of delay and power.
Abstract: Floating point multiplication is a crucial operation in high performance computing applications such as image processing and signal processing. Multiplication is also the most time and power consuming operation. This paper proposes an efficient method for IEEE 754 floating point multiplication which gives a better implementation in terms of delay and power. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic mathematics) is used to implement an unsigned binary multiplier for mantissa multiplication. The multiplier is implemented in Verilog HDL, targeted at Spartan-3E and Virtex-4 FPGAs.

17 citations


Cites methods from "Implementation of single precision ..."

  • ...The Karatsuba multiplication algorithm [11, 12] is used for multiplying very large numbers....


Proceedings ArticleDOI
23 Apr 2015
TL;DR: A combination of Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm is used to implement the proposed unsigned binary multiplier, which gives a better implementation in terms of delay and area.
Abstract: Binary multiplication is an important operation in many high performance computing applications and floating point multiplier designs. Multiplication is also the most time, area and power consuming operation. This paper proposes an efficient method for unsigned binary multiplication which gives a better implementation in terms of delay and area. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic mathematics) is used to implement the proposed unsigned binary multiplier. The Karatsuba algorithm is best suited for higher bit widths, while the Urdhva-Tiryagbhyam algorithm is best for lower bit multiplication; a new algorithm combining both helps to reduce the drawbacks of each. The multiplier is implemented in Verilog HDL, targeted at Spartan-3E and Virtex-4 FPGAs.

11 citations


Cites methods from "Implementation of single precision ..."

  • ...The Karatsuba algorithm [10, 11] is the most widely used algorithm for higher bit length multipliers....


Journal ArticleDOI
TL;DR: An algorithm for approximate multiplication based on the Karatsuba method is proposed and compared with an existing approximate hybrid Wallace tree multiplier; the proposed approximate Karatsuba multiplier is found to be better in terms of hardware, latency and accuracy.
Abstract: Approximate computing has been one of the most active research topics since the introduction of error-resilient applications. Approximate arithmetic helps reduce power consumption, hardware utilization and delay at the expense of accuracy. Of all arithmetic operations, multiplication is the most widely used and forms a crucial section of many applications. Therefore, it is necessary to optimize it as per the requirements of a system. This paper proposes an algorithm for approximate multiplication based on the Karatsuba method, which is compared with an existing approximate hybrid Wallace tree multiplier; the proposed approximate Karatsuba multiplier is found to be better in terms of hardware, latency and accuracy. The performance of the proposed multiplier is also evaluated with an image processing application, where it gives results similar to an exact multiplier unit. Both multipliers are implemented in Verilog HDL using Vivado 2018.3.

8 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: A novel approach for a single-precision floating-point multiplier is developed using the Urdhva Tiryagbhyam technique and different adders to decrease the complexity of the mantissa multiplication.
Abstract: Floating-point arithmetic plays a major role in computer systems. Many digital signal processing applications use floating-point algorithms, and practically every operating system must handle floating-point special cases such as underflow and overflow. The single precision floating point arithmetic operations are multiplication, division, addition and subtraction; among these, multiplication is the most extensively used and involves composite arithmetic functions. A single precision (32-bit) floating point number is split into three parts: a sign part, an exponent part and a mantissa part. The most significant bit of the number is the sign bit; the next 8 bits represent the exponent and the remaining 23 bits the mantissa. The mantissa part requires a large 24-bit multiplication. The performance of a single-precision floating point multiplier depends mostly on its occupied area and delay. In this paper, a novel approach for a single-precision floating-point multiplier is developed using the Urdhva Tiryagbhyam technique and different adders to decrease the complexity of the mantissa multiplication. This requires less hardware than conventional multipliers, and different regular adders such as carry select, carry skip and parallel prefix adders are used for the exponent addition. The performance parameters are compared in terms of area and delay. All modules are coded in Verilog HDL and simulated with the Xilinx ISE tool.
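The field decomposition described in the abstract (1 sign bit, 8 exponent bits, 23 mantissa bits, plus the implicit leading 1 that yields the 24-bit significand) can be checked with a few lines of Python, given here only as a software reference model:

```python
import struct

def decode_float32(value):
    """Split an IEEE 754 single-precision number into its three fields."""
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF         # 23 stored fraction bits
    # Normalized numbers carry an implicit leading 1, giving the
    # 24-bit significand that the multiplier must handle.
    significand = (1 << 23) | mantissa if exponent != 0 else mantissa
    return sign, exponent, significand

# e.g. decode_float32(1.0) gives sign 0, exponent 127, significand 0x800000
```

A hardware multiplier then XORs the sign bits, adds the exponents (subtracting the 127 bias once), and multiplies the two 24-bit significands, which is the expensive step the cited papers optimize.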

7 citations


Cites background or methods from "Implementation of single precision ..."

  • ...The enhancement of the worst case delay is attained by incorporating a greater number of carry skip logic blocks to form a block carry skip adder [8]....


  • ...The carry bit from the last stage, that is, the previous least significant stage, is used to select the computed values of the output carry and sum [8]....

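The carry-select scheme quoted above (both candidate sums computed in parallel, then chosen by the incoming carry) can be modeled in software. A simplified Python sketch with arbitrary width and block-size parameters, not the cited hardware design:

```python
def carry_select_add(a, b, width=8, block=4):
    """Bit-level model of a carry-select adder."""
    def ripple(x, y, cin, n):
        # n-bit ripple-carry block: returns (sum, carry-out).
        total = x + y + cin
        return total & ((1 << n) - 1), total >> n

    result, carry = 0, 0
    for i in range(0, width, block):
        xa = (a >> i) & ((1 << block) - 1)
        xb = (b >> i) & ((1 << block) - 1)
        # Compute both candidate sums in parallel (carry-in 0 and 1)...
        s0, c0 = ripple(xa, xb, 0, block)
        s1, c1 = ripple(xa, xb, 1, block)
        # ...then select with the actual carry from the previous block.
        s, c = (s1, c1) if carry else (s0, c0)
        result |= s << i
        carry = c
    return result, carry
```

In hardware the two ripple blocks run concurrently, so each stage only waits for a multiplexer select rather than a full ripple through the block, which is where the delay improvement comes from.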

References
Journal ArticleDOI
David E. Goldberg
TL;DR: This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems, and concludes with examples of how computer system builders can better support floating point.
Abstract: Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with examples of how computer system builders can better support floating point.

1,372 citations

Journal ArticleDOI
TL;DR: An assessment of the strengths and weaknesses of using FPGA's for floating-point arithmetic.
Abstract: We present empirical results describing the implementation of an IEEE Standard 754 compliant floating-point adder/multiplier using field programmable gate arrays. The use of FPGAs permits fast and accurate quantitative evaluation of a variety of circuit design tradeoffs for addition and multiplication. FPGAs also permit accurate assessments of the area and time costs associated with various features of the IEEE floating-point standard, including rounding and gradual underflow. These costs are analyzed, along with the effects of architectural correlation, a phenomenon that occurs when the cost of combining architectural features exceeds the sum of their separate implementations. We conclude with an assessment of the strengths and weaknesses of using FPGAs for floating-point arithmetic.

93 citations


"Implementation of single precision ..." refers background in this paper

  • ...That is why, to form a complete significand [2], we need to add one extra bit to the fractional part....


Posted Content
Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi
TL;DR: In this article, the authors proposed a reconfigurable array architecture template and design space exploration flow for domain-specific optimization, which can reduce the hardware cost and the delay without any performance degradation for some application domains.
Abstract: Coarse-grained reconfigurable architectures aim to achieve both goals of high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that take long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such reconfigurable array architecture template and design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient both in performance and area compared to existing reconfigurable architectures.

91 citations

Proceedings ArticleDOI
Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi
07 Mar 2005
TL;DR: A reconfigurable array architecture template and a design space exploration flow for domain-specific optimization are suggested and Experimental results show that this approach is much more efficient, in both performance and area, compared to existing reconfigured array architectures.
Abstract: Coarse-grained reconfigurable architectures aim to achieve goals of both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that take long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore, the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient, in both performance and area, compared to existing reconfigurable architectures.

86 citations


"Implementation of single precision ..." refers background in this paper

  • ...Latency is 8 cycles, and after 8 cycles an output is obtained every cycle because of pipelining [5]....


Proceedings ArticleDOI
24 Apr 2011
TL;DR: An efficient implementation of an IEEE 754 single precision floating point multiplier targeted for Xilinx Virtex-5 FPGA using VHDL to implement a technology-independent pipelined design.
Abstract: In this paper we describe an efficient implementation of an IEEE 754 single precision floating point multiplier targeted at a Xilinx Virtex-5 FPGA. VHDL is used to implement a technology-independent pipelined design. The multiplier implementation handles the overflow and underflow cases. Rounding is not implemented, to give more precision when using the multiplier in a Multiply and Accumulate (MAC) unit. With a latency of three clock cycles, the design achieves 301 MFLOPs. The multiplier was verified against the Xilinx floating point multiplier core.

83 citations


"Implementation of single precision ..." refers methods in this paper

  • ...Logic is reduced by simply implementing the first seven one-subtractors (OS) and the two MSB zero-subtractors (ZS) [6]....
