Proceedings ArticleDOI

Implementation of single precision floating point multiplier using Karatsuba algorithm

01 Nov 2013-pp 254-256
TL;DR: An efficient floating point multiplier using the Karatsuba algorithm that implements the significand multiplication along with the sign bit and exponent computations is presented.
Abstract: This paper presents an efficient floating point multiplier using the Karatsuba algorithm. Digital signal processing algorithms and media applications use a large number of multiplications, which are both time and power consuming. We have used the IEEE 754 format for the binary representation of the floating point numbers. Verilog HDL is used to implement the Karatsuba multiplication algorithm as a technology-independent pipelined design. The multiplier implements the significand multiplication along with the sign bit and exponent computations. Three-stage pipelining is used in the design, with a latency of 8 clock cycles. The mantissa bits are divided into three parts of particular bit widths so that the multiplication can be done using the standard multipliers available in the FPGA. The design is targeted at the Cyclone II device family and synthesized using Altera Quartus II.
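The divide-and-conquer step the abstract relies on can be sketched in software. The following is an illustrative Python model of the Karatsuba recursion, not the paper's Verilog design; the base-case threshold and the half-width split are arbitrary choices made for the sketch:

```python
def karatsuba(x, y):
    """Karatsuba multiplication of non-negative integers."""
    # Base case: small operands fall back to the native multiplier
    # (in hardware, this would be a standard FPGA multiplier block).
    if x < 16 or y < 16:
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    hi_x, lo_x = x >> half, x & ((1 << half) - 1)
    hi_y, lo_y = y >> half, y & ((1 << half) - 1)
    # Three recursive multiplications instead of the naive four.
    z2 = karatsuba(hi_x, hi_y)
    z0 = karatsuba(lo_x, lo_y)
    z1 = karatsuba(hi_x + lo_x, hi_y + lo_y) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The saving comes from replacing four sub-multiplications with three at each level, at the cost of a few extra additions, which is why the technique pays off for wide operands such as a 24-bit significand.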
Citations
Proceedings ArticleDOI
23 Apr 2015
TL;DR: An IEEE-754 based Vedic multiplier has been developed to carry out both single precision and double precision format floating point operations and its performance has been compared with Booth and Karatsuba based floating point multipliers.
Abstract: Most scientific operations involve floating point computations. It is necessary to implement faster multipliers that occupy less area and consume less power. Multipliers play a critical role in any digital design. Even though various multiplication algorithms have been in use, the performance of Vedic multipliers has not drawn wide attention. Vedic mathematics involves the application of 16 sutras, or algorithms. One among these, the Urdhva Tiryakbhyam sutra for multiplication, has been considered in this work. An IEEE-754 based Vedic multiplier has been developed to carry out both single precision and double precision floating point operations, and its performance has been compared with Booth and Karatsuba based floating point multipliers. A Xilinx FPGA has been used to implement these algorithms, and a comparison based on resource utilization and timing performance has also been made.
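The Urdhva Tiryakbhyam ("vertically and crosswise") sutra mentioned above computes each output column as a sum of crosswise digit products, with carries resolved afterwards. A minimal software model (illustrative Python using least-significant-digit-first lists, not the cited hardware design):

```python
def urdhva_multiply(a_digits, b_digits, base=10):
    """Urdhva Tiryakbhyam digit multiplication.

    Digits are least-significant first; each output column is the sum
    of crosswise digit products (a convolution of the digit vectors).
    """
    n = len(a_digits) + len(b_digits)
    cols = [0] * n
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            cols[i + j] += a * b        # crosswise partial products
    out, carry = [], 0
    for c in cols:                      # resolve carries column by column
        c += carry
        out.append(c % base)
        carry = c // base
    return out

# 12 * 34 = 408, least-significant digit first:
# urdhva_multiply([2, 1], [4, 3]) -> [8, 0, 4, 0]
```

In hardware all column products can be generated in parallel, which is the source of the delay advantage the Vedic-multiplier papers report for small operand widths.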

19 citations


Cites methods from "Implementation of single precision ..."

  • ...A floating point multiplier using the Karatsuba algorithm, incorporating pipelining techniques with a latency of 8 cycles, is implemented in [7]....


  • ...The mantissa is multiplied using any of the multiplication algorithms [7]....


Proceedings ArticleDOI
09 Jul 2015
TL;DR: A combination of Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm is used to implement unsigned binary multiplier for mantissa multiplication which gives a better implementation in terms of delay and power.
Abstract: Floating point multiplication is a crucial operation in high performance computing applications such as image processing and signal processing. Multiplication is also the most time and power consuming operation. This paper proposes an efficient method for IEEE 754 floating point multiplication which gives a better implementation in terms of delay and power. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic mathematics) is used to implement an unsigned binary multiplier for mantissa multiplication. The multiplier is implemented in Verilog HDL, targeted at Spartan-3E and Virtex-4 FPGAs.

17 citations


Cites methods from "Implementation of single precision ..."

  • ...The Karatsuba multiplication algorithm [11, 12] is used for multiplying very large numbers....


Proceedings ArticleDOI
23 Apr 2015
TL;DR: A combination of Karatsuba algorithm and Urdhva-Tiryagbhyam algorithm is used to implement the proposed unsigned binary multiplier, which gives a better implementation in terms of delay and area.
Abstract: Binary multiplication is an important operation in many high performance computing applications and floating point multiplier designs. Multiplication is also the most time, area and power consuming operation. This paper proposes an efficient method for unsigned binary multiplication which gives a better implementation in terms of delay and area. A combination of the Karatsuba algorithm and the Urdhva-Tiryagbhyam algorithm (Vedic mathematics) is used to implement the proposed unsigned binary multiplier. The Karatsuba algorithm is best suited for higher bit widths, while the Urdhva-Tiryagbhyam algorithm is best for lower bit multiplication; a new algorithm combining both helps to reduce the drawbacks of each. The multiplier is implemented in Verilog HDL, targeted at Spartan-3E and Virtex-4 FPGAs.

11 citations


Cites methods from "Implementation of single precision ..."

  • ...The Karatsuba algorithm [10, 11] is the most widely used algorithm for higher bit length multipliers....


Journal ArticleDOI
TL;DR: An algorithm for approximate multiplication based on the Karatsuba method is proposed and compared with an existing approximate hybrid Wallace tree multiplier; the proposed approximate Karatsuba multiplier is found to be better in terms of hardware, latency and accuracy.
Abstract: Approximate computing has been one of the most active research topics since the introduction of error-resilient applications. Approximate arithmetic helps reduce power consumption, hardware utilization and delay at the expense of accuracy. Of all arithmetic operations, multiplication is the most widely used and forms a crucial section of many applications. Therefore, it is necessary to optimize it as per the requirements of a system. This paper proposes an algorithm for approximate multiplication based on the Karatsuba method, which is compared with an existing approximate hybrid Wallace tree multiplier; the proposed approximate Karatsuba multiplier is found to be better in terms of hardware, latency and accuracy. The performance of the proposed multiplier is also evaluated with an image processing application, where it gives results similar to an exact multiplier unit. Both multipliers are implemented in Verilog HDL using Vivado 2018.3.

8 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: A novel approach for a single-precision floating-point multiplier is developed using the Urdhva Tiryagbhyam technique and different adders to decrease the complexity of the mantissa multiplication.
Abstract: Floating-point arithmetic plays a major role in computer systems. Many digital signal processing applications use floating-point algorithms, and practically every operating system must handle floating-point special cases such as underflow and overflow. The single precision floating point arithmetic operations are multiplication, division, addition and subtraction; among these, multiplication is the most extensively used and involves composite arithmetic functions. A single precision (32-bit) floating point number is split into three parts: a sign part, an exponent part and a mantissa part. The most significant bit of the number is the sign bit; the next 8 bits represent the exponent and the remaining 23 bits the mantissa. The mantissa part requires a large 24-bit multiplication. The performance of a single-precision floating point multiplier depends mostly on its occupied area and delay. In this paper, a novel approach for a single-precision floating-point multiplier is developed using the Urdhva Tiryagbhyam technique and different adders to decrease the complexity of the mantissa multiplication. This requires less hardware than conventional multipliers, and different regular adders such as carry select, carry skip and parallel prefix adders are used for the exponent addition. The performance parameters are compared in terms of area and delay. All modules are coded in Verilog HDL and simulated with the Xilinx ISE tool.
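The field decomposition described in the abstract (1 sign bit, 8 exponent bits, 23 mantissa bits, plus the implicit leading 1 that yields the 24-bit significand) can be checked with a few lines of Python, given here only as a software reference model:

```python
import struct

def decode_float32(value):
    """Split an IEEE 754 single-precision number into its three fields."""
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF         # 23 stored fraction bits
    # Normalized numbers carry an implicit leading 1, giving the
    # 24-bit significand that the multiplier must handle.
    significand = (1 << 23) | mantissa if exponent != 0 else mantissa
    return sign, exponent, significand

# e.g. decode_float32(1.0) gives sign 0, exponent 127, significand 0x800000
```

A hardware multiplier then XORs the sign bits, adds the exponents (subtracting the 127 bias once), and multiplies the two 24-bit significands, which is the expensive step the cited papers optimize.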

7 citations


Cites background or methods from "Implementation of single precision ..."

  • ...The enhancement of the worst case delay is attained by incorporating a greater number of carry skip logic blocks to form a block carry skip adder [8]....


  • ...The carry bit from the last stage, that is, the previous least significant stage, is used to select the computed values of the output carry and sum [8]....

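The carry-select scheme quoted above (both candidate sums computed in parallel, then chosen by the incoming carry) can be modeled in software. A simplified Python sketch with arbitrary width and block-size parameters, not the cited hardware design:

```python
def carry_select_add(a, b, width=8, block=4):
    """Bit-level model of a carry-select adder."""
    def ripple(x, y, cin, n):
        # n-bit ripple-carry block: returns (sum, carry-out).
        total = x + y + cin
        return total & ((1 << n) - 1), total >> n

    result, carry = 0, 0
    for i in range(0, width, block):
        xa = (a >> i) & ((1 << block) - 1)
        xb = (b >> i) & ((1 << block) - 1)
        # Compute both candidate sums in parallel (carry-in 0 and 1)...
        s0, c0 = ripple(xa, xb, 0, block)
        s1, c1 = ripple(xa, xb, 1, block)
        # ...then select with the actual carry from the previous block.
        s, c = (s1, c1) if carry else (s0, c0)
        result |= s << i
        carry = c
    return result, carry
```

In hardware the two ripple blocks run concurrently, so each stage only waits for a multiplexer select rather than a full ripple through the block, which is where the delay improvement comes from.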

References
Journal ArticleDOI
David E. Goldberg
TL;DR: This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems, and concludes with examples of how computer system builders can better support floating point.
Abstract: Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with examples of how computer system builders can better support floating point.

1,372 citations

Journal ArticleDOI
TL;DR: An assessment of the strengths and weaknesses of using FPGA's for floating-point arithmetic.
Abstract: We present empirical results describing the implementation of an IEEE Standard 754 compliant floating-point adder/multiplier using field programmable gate arrays. The use of FPGAs permits fast and accurate quantitative evaluation of a variety of circuit design tradeoffs for addition and multiplication. FPGAs also permit accurate assessments of the area and time costs associated with various features of the IEEE floating-point standard, including rounding and gradual underflow. These costs are analyzed, along with the effects of architectural correlation, a phenomenon that occurs when the cost of combining architectural features exceeds the sum of their separate implementations. We conclude with an assessment of the strengths and weaknesses of using FPGAs for floating-point arithmetic.

93 citations


"Implementation of single precision ..." refers background in this paper

  • ...That is why, to form a complete significand [2], we need to add one extra bit to the fractional part....


Posted Content
Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi
TL;DR: In this article, the authors proposed a reconfigurable array architecture template and design space exploration flow for domain-specific optimization, which can reduce the hardware cost and the delay without any performance degradation for some application domains.
Abstract: Coarse-grained reconfigurable architectures aim to achieve both goals of high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that take long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such reconfigurable array architecture template and design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient both in performance and area compared to existing reconfigurable architectures.

91 citations

Proceedings ArticleDOI
Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi
07 Mar 2005
TL;DR: A reconfigurable array architecture template and a design space exploration flow for domain-specific optimization are suggested and Experimental results show that this approach is much more efficient, in both performance and area, compared to existing reconfigured array architectures.
Abstract: Coarse-grained reconfigurable architectures aim to achieve goals of both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that take long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore, the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient, in both performance and area, compared to existing reconfigurable architectures.

86 citations


"Implementation of single precision ..." refers background in this paper

  • ...Latency is 8 cycles, and after 8 cycles an output is obtained every cycle because of pipelining [5]....


Proceedings ArticleDOI
24 Apr 2011
TL;DR: An efficient implementation of an IEEE 754 single precision floating point multiplier targeted for Xilinx Virtex-5 FPGA using VHDL to implement a technology-independent pipelined design.
Abstract: In this paper we describe an efficient implementation of an IEEE 754 single precision floating point multiplier targeted at a Xilinx Virtex-5 FPGA. VHDL is used to implement a technology-independent pipelined design. The multiplier implementation handles the overflow and underflow cases. Rounding is not implemented, to give more precision when using the multiplier in a Multiply and Accumulate (MAC) unit. With a latency of three clock cycles, the design achieves 301 MFLOPs. The multiplier was verified against the Xilinx floating point multiplier core.

83 citations


"Implementation of single precision ..." refers methods in this paper

  • ...Logic is reduced by simply implementing the first seven one-subtractors (OS) and the two MSB zero-subtractors (ZS) [6]....
