scispace - formally typeset
Author

V. Bharathi

Bio: V. Bharathi is an academic researcher from Anna University. The author has contributed to research on the topics Logic gate and Carry (arithmetic), has an h-index of 1, and has co-authored 3 publications receiving 3 citations.

Papers
Book Chapter (DOI)
01 Jan 2016
TL;DR: A modified architecture for a floating-point fused multiply-add (FMA) unit for low-power, reduced-area applications is presented; the proposed FMA architecture achieves a 17% improvement in power and a 6% improvement in area compared with the existing Bridge FMA unit.
Abstract: In this paper, a modified architecture for a floating-point fused multiply-add (FMA) unit for low-power and reduced-area applications is presented. An FMA unit computes the floating-point operation (A × B) + C as a single instruction. A bridge unit is used, which connects the existing floating-point multiplier (FMUL) and the FMUL’s add-round unit in the co-processor to perform the FMA operation. The main objective of the modified FMA unit is to reuse as many components as possible, allowing either parallel floating-point addition and floating-point multiplication or fused multiply-add functionality by adding a small amount of hardware to the FMUL’s add-round unit. Each unit is designed in Verilog HDL. The design is simulated using Altera ModelSim and synthesized using the Cadence RTL Compiler in a 45 nm technology. All floating-point arithmetic is implemented in the IEEE 754 double-precision format. The proposed FMA architecture achieves a 17% improvement in power and a 6% improvement in area compared with the existing Bridge FMA unit.

5 citations
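
The defining property of the fused multiply-add operation in the abstract above is that (A × B) + C is rounded only once. The following is a minimal software sketch of that behaviour, not the hardware architecture from the paper: the function names are hypothetical, exact rational arithmetic stands in for the wide internal datapath, and only finite inputs with in-range results are assumed.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Return a*b + c with a single rounding (finite inputs assumed)."""
    # Fraction(float) is exact, so the product and sum carry no rounding error.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Converting back to float rounds once, to the nearest double.
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Return a*b + c with two roundings: one after '*', one after '+'."""
    return a * b + c


# Operands chosen so that the exact product 1 - 2**-60 does not fit in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the -2**-60 term
separate = separate_multiply_add(a, b, c)  # product rounds to 1.0 first

print(fused, separate)
```

With these operands the separately rounded version loses the small term entirely, while the single-rounding version retains it.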

Book Chapter (DOI)
01 Jan 2016
TL;DR: A 4 × 4 irreversible MAC is compared with a reversible MAC, and a 25.6% reduction in power consumption is found.
Abstract: Reversible quantum logic plays an important role in quantum computing. This paper proposes an implementation of a MAC unit using reversible logic. All the elementary reversible logic gates used in the design are discussed. A 4 × 4 irreversible MAC is compared with a reversible MAC, and a 25.6% reduction in power consumption is found. The design has been simulated using ModelSim and synthesized using the Cadence RTL Compiler.
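
As a small illustration of what "reversible" means for a logic gate, the sketch below checks that a gate's truth table is a one-to-one mapping. The abstract does not name the gates used in the paper; the Feynman (CNOT) and Toffoli gates here are standard textbook examples chosen as an assumption, and all function names are hypothetical.

```python
from itertools import product


def feynman(a, b):
    """Feynman (CNOT) gate: (a, b) -> (a, a XOR b)."""
    return a, a ^ b


def toffoli(a, b, c):
    """Toffoli (CCNOT) gate: (a, b, c) -> (a, b, c XOR (a AND b))."""
    return a, b, c ^ (a & b)


def and_padded(a, b):
    """Ordinary AND padded to two outputs; distinct inputs collide."""
    return a & b, 0


def is_reversible(gate, n_bits):
    """True if the gate maps the 2**n_bits inputs to 2**n_bits distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_bits)}
    return len(outputs) == 2 ** n_bits


print(is_reversible(feynman, 2), is_reversible(toffoli, 3), is_reversible(and_padded, 2))
# With the third input tied to 0, the Toffoli gate computes AND on its last output.
print([toffoli(a, b, 0)[2] for a, b in product((0, 1), repeat=2)])
```

Both example gates are their own inverse, so applying either one twice restores the original inputs.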
Book Chapter (DOI)
01 Jan 2016
TL;DR: The carry-select adder with the K–S algorithm is found to be one of the fastest algorithms for addition, and the Urdhva-Tiryagbhyam Karatsuba algorithm one of the fastest for multiplication; these are the most important operations in any central processing unit.
Abstract: There is no limit to the speed of operation of arithmetic and logic circuits; one can always try to increase it. Many algorithms have been proposed that perform specific arithmetic operations quickly, so there is a need for a faster design that combines the fastest of these algorithms in a single ALU. The carry-select adder with the K–S algorithm is found to be one of the fastest algorithms for addition, and the Urdhva-Tiryagbhyam Karatsuba algorithm one of the fastest for multiplication; these are the most important operations in any central processing unit. The design was built using Quartus II software and can be used where high-speed computation is needed. It works for unsigned, fixed-point, 8-bit operations. Different adder circuits, which are the basic building blocks of an ALU, are taken and their performance compared. The circuits have been simulated using Cadence 90 nm technology and Quartus II targeting the EP2C20F484C7 device. Adders can be implemented using EX-OR/EX-NOR gates, transmission gates, the HSD (High Speed Domino) technique, or domino logic. The parallel feedback carry adder (PFCA), ripple carry adder, carry look-ahead adder, and carry-select adder are among the adders that have been implemented using Cadence and Quartus II. The 10T PFCA is found to be somewhat more efficient than the 11T PFCA. Adders based on XOR and XNOR gates have the least delay of the adders considered.
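
The two techniques named in the abstract above can be sketched in software for the unsigned 8-bit case. This is an illustrative model only, under stated assumptions: plain ripple-carry blocks stand in for the K–S blocks of the paper, the Karatsuba split is shown without the Urdhva-Tiryagbhyam step, and every function name is hypothetical.

```python
def ripple_add(a, b, carry_in, width):
    """Add two width-bit unsigned values bit by bit; return (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add8(a, b):
    """8-bit carry-select adder from 4-bit blocks; return (sum, carry_out)."""
    lo_sum, lo_carry = ripple_add(a & 0xF, b & 0xF, 0, 4)
    # The upper block is evaluated for both possible carry-ins; in hardware
    # these two additions run in parallel with the lower block.
    hi_if_0 = ripple_add(a >> 4, b >> 4, 0, 4)
    hi_if_1 = ripple_add(a >> 4, b >> 4, 1, 4)
    # The lower block's carry-out acts as the multiplexer select.
    hi_sum, carry_out = hi_if_1 if lo_carry else hi_if_0
    return (hi_sum << 4) | lo_sum, carry_out


def karatsuba_mul8(a, b):
    """8-bit unsigned multiply using three smaller products instead of four."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    high = a_hi * b_hi
    low = a_lo * b_lo
    # (a_hi + a_lo)(b_hi + b_lo) - high - low equals a_hi*b_lo + a_lo*b_hi.
    middle = (a_hi + a_lo) * (b_hi + b_lo) - high - low
    return (high << 8) + (middle << 4) + low


print(carry_select_add8(200, 100), karatsuba_mul8(200, 100))
```

The carry-select structure trades duplicated upper-block hardware for a shorter carry path, while the Karatsuba split trades one multiplication for a few extra additions and subtractions.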

Cited by
Proceedings Article (DOI)
21 Apr 2022
TL;DR: In this article, a comparative study of hardware implementations of multiply and add (MAC), divider and square root units based on IEEE754, bfloat16 and posit number representations is presented.
Abstract: Smart systems are enabled by artificial intelligence (AI), which is realized using machine learning (ML) techniques. ML algorithms are implemented in the hardware using fixed-point, integer, and floating-point representations. The performance of hardware implementation gets impacted due to very small or large values because of their limited word size. To overcome this limitation, various floating-point representations are employed, such as IEEE754, posit, bfloat16 etc. Moreover, for the efficient implementation of ML algorithms, one of the most intuitive solutions is to use a suitable number system. As we know, multiply and add (MAC), divider and square root units are the most common building blocks of various ML algorithms. Therefore, in this paper, we present a comparative study of hardware implementations of these units based on bfloat16 and posit number representations. It is observed that posit based implementations perform $1.50\times$ better in terms of accuracy, but consume $1.51\times$ more hardware resources as compared to bfloat16 based realizations. Thus, as per the trade-off between accuracy and resource utilization, it can be stated that the bfloat16 number representation may be preferred over other existing number representations in the hardware implementations of ML algorithms.

2 citations
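
For readers unfamiliar with the bfloat16 format compared in the abstract above, the sketch below converts a value to bfloat16 by keeping the upper 16 bits of its IEEE 754 single-precision encoding (1 sign bit, 8 exponent bits, 7 fraction bits). It is a software illustration with hypothetical function names, not the hardware units studied in the paper; it assumes round-to-nearest-even and inputs that fit in single precision (`struct.pack` raises `OverflowError` otherwise).

```python
import math
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x, rounding to nearest even."""
    if math.isnan(x):
        return 0x7FC0  # a quiet NaN; avoids rounding a NaN payload into infinity
    # Reinterpret x (first rounded to float32) as a 32-bit unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Add just under half an ULP, plus one more if the kept LSB is odd,
    # so that exact ties round to the even 16-bit value.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 pattern back to a Python float (exact)."""
    return struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))[0]


pattern = float_to_bfloat16_bits(math.pi)
print(hex(pattern), bfloat16_bits_to_float(pattern))
```

Because only 7 fraction bits survive, pi is stored as 3.140625, which illustrates the accuracy side of the trade-off discussed in the abstract.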

Journal Article (DOI)
TL;DR: A fast and area-efficient Carry Select Adder is implemented; the results show a 27% decrease in the combinational path delay with an increase of around 8% in the number of LUTs used.
Abstract: Floating-point arithmetic operations on digital systems have become an important aspect of research in recent times. Many architectures have been proposed and implemented by various researchers, and their merits and demerits compared. Floating-point numbers are first converted into the IEEE 754 single- or double-precision format in order to be used in digital systems. The arithmetic operations require various steps to be followed to obtain correct and accurate results. In the proposed approach, a fast and area-efficient Carry Select Adder is implemented along with parallel processing of the various units used in the architecture. The results verify the proposed approach, showing a decrease of 27% in the combinational path delay with an increase of around 8% in the number of LUTs used.

2 citations

Proceedings Article (DOI)
21 Apr 2022
TL;DR: It can be stated that the bfloat16 number representation may be preferred over other existing number representations in the hardware implementations of ML algorithms.
Abstract: Smart systems are enabled by artificial intelligence (AI), which is realized using machine learning (ML) techniques. ML algorithms are implemented in the hardware using fixed-point, integer, and floating-point representations. The performance of hardware implementation gets impacted due to very small or large values because of their limited word size. To overcome this limitation, various floating-point representations are employed, such as IEEE754, posit, bfloat16 etc. Moreover, for the efficient implementation of ML algorithms, one of the most intuitive solutions is to use a suitable number system. As we know, multiply and add (MAC), divider and square root units are the most common building blocks of various ML algorithms. Therefore, in this paper, we present a comparative study of hardware implementations of these units based on bfloat16 and posit number representations. It is observed that posit based implementations perform $1.50\times$ better in terms of accuracy, but consume $1.51\times$ more hardware resources as compared to bfloat16 based realizations. Thus, as per the trade-off between accuracy and resource utilization, it can be stated that the bfloat16 number representation may be preferred over other existing number representations in the hardware implementations of ML algorithms.

1 citation

Journal Article
TL;DR: A modified square-root carry select adder is implemented with optimized parameters, and a parallel architecture is designed that extracts the significand and exponent values in the first unit and performs the multiplication-addition operations in the second block.
Abstract: Clock gating is a well-known method for reducing power in synchronous designs. Various arithmetic operations use floating-point calculation and can be used to implement computational and logical unit operations. In this work a fused multiply-addition unit is used, which shares a common addition block between the addition and multiplication operations. The floating-point number is first converted into the IEEE 754 format, and then the calculation for both addition and multiplication is performed. The significand is extracted from the number, and the calculations are performed on the basis of the exponent difference between the numbers. In the proposed approach a parallel architecture is designed that extracts the significand and exponent values in the first unit and performs the multiplication-addition operations in the second block. The final output is produced in the third block, where the normalization and zero-detection operations are performed. A modified square-root carry select adder is implemented with optimized parameters.
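
The first stage described in the abstract above, extracting the significand and exponent and aligning operands by their exponent difference, can be sketched in software as follows. This is an assumption-laden illustration for normal IEEE 754 double-precision numbers only; the function names are hypothetical, and a real datapath would keep guard and sticky bits rather than simply discarding the bits shifted out.

```python
import struct


def decode_double(x: float):
    """Split a double into (sign, biased_exponent, fraction) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


def align_significands(x: float, y: float):
    """Return (common_exponent, sig_x, sig_y) for two normal doubles.

    The operand with the smaller exponent is shifted right by the exponent
    difference so both significands refer to the larger exponent.
    """
    _, exp_x, frac_x = decode_double(x)
    _, exp_y, frac_y = decode_double(y)
    # Restore the implicit leading 1 of a normal number (53-bit significand).
    sig_x = (1 << 52) | frac_x
    sig_y = (1 << 52) | frac_y
    diff = exp_x - exp_y
    if diff >= 0:
        sig_y >>= diff
    else:
        sig_x >>= -diff
    return max(exp_x, exp_y), sig_x, sig_y


print(decode_double(-2.5))
print(align_significands(4.0, 1.0))
```

After alignment the two integer significands can be added or subtracted directly, which is the step that precedes normalization in the third block.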