Proceedings Article

Comparison of architectures of a coarse-grain reconfigurable multiply-accumulate unit

01 Dec 2013, pp. 225-230
TL;DR: This paper presents the implementation of a multiply-accumulate (MAC) unit that is reconfigurable with respect to the word lengths of its operands and can process signed or unsigned numbers as required.
Abstract: The concept of reconfigurable computing combines the flexibility of software with the high performance of hardware. FPGA-based hardware provides bit-level, or fine-grain, reconfigurability, whereas ASIC-based hardware supports coarse-grain reconfiguration, which leads to accelerated (hardware) performance with shorter reprogramming time. This paper presents the implementation of a multiply-accumulate (MAC) unit that is reconfigurable with respect to the word lengths of its operands. The unit can process signed and unsigned numbers as required. The subunits (multiplier, adder and sign-selection unit) are reconfigurable and can function both as independent units and together as an accumulation unit. Reconfiguration of word length, throughput or data type is implemented with a set of multiplexers, de-multiplexers and pipeline registers. Two implementations using different adders were compared: one uses carry-save addition in the adder module and the other uses ripple-carry addition. The ripple-carry implementation shows a significant improvement in area and power consumption over the other, whereas the carry-save implementation is about 2% faster than its counterpart.
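As a rough illustration of what "reconfigurable with respect to word length and data type" means at the behavioral level, the sketch below models a MAC whose operand width and signedness are run-time settings, standing in for the multiplexer/de-multiplexer configuration described in the abstract. It is not the paper's RTL: the class name, the supported widths (8/16/32) and the guard-bit accumulator width are hypothetical choices made only for this example.

```python
def to_int(bits: int, width: int, signed: bool) -> int:
    """Interpret the low `width` bits of `bits` as unsigned or two's complement."""
    bits &= (1 << width) - 1
    if signed and bits >> (width - 1):
        return bits - (1 << width)
    return bits


class ReconfigurableMAC:
    """Hypothetical behavioral model of a word-length/sign-reconfigurable MAC."""

    SUPPORTED_WIDTHS = (8, 16, 32)  # assumption: not taken from the paper

    def __init__(self, width: int = 8, signed: bool = True, guard_bits: int = 8):
        self.configure(width, signed, guard_bits)

    def configure(self, width: int, signed: bool, guard_bits: int = 8) -> None:
        """Select word length and data type; clears the accumulator."""
        if width not in self.SUPPORTED_WIDTHS:
            raise ValueError(f"unsupported word length {width}")
        self.width = width
        self.signed = signed
        # Accumulator holds one full 2*width product plus guard bits (assumed sizing).
        self.acc_width = 2 * width + guard_bits
        self.acc = 0

    def step(self, a_bits: int, b_bits: int) -> int:
        """One MAC cycle: acc <- acc + a * b, wrapped to the accumulator width."""
        a = to_int(a_bits, self.width, self.signed)
        b = to_int(b_bits, self.width, self.signed)
        raw = (self.acc + a * b) & ((1 << self.acc_width) - 1)
        self.acc = to_int(raw, self.acc_width, self.signed)
        return self.acc
```

For example, the same bit pattern 0xFF is read as -1 in signed 8-bit mode and as 255 in unsigned mode, so two MAC cycles over (0xFF, 0x02) and (0x03, 0x04) accumulate 10 in the first case and 522 in the second.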
Citations
Journal Article
25 Jan 2021
TL;DR: In this article, the authors propose a resource-efficient Co-ordinate Rotation Digital Computer (CORDIC)-based neuron architecture (RECON) which can be configured to compute both multiply-accumulate (MAC) and non-linear activation function (AF) operations.
Abstract: Contemporary hardware implementations of artificial neural networks suffer from excess area requirements due to resource-intensive elements such as multipliers and non-linear activation functions. The present work addresses this challenge by proposing a resource-efficient Co-ordinate Rotation Digital Computer (CORDIC)-based neuron architecture (RECON) which can be configured to compute both multiply-accumulate (MAC) and non-linear activation function (AF) operations. The CORDIC-based architecture uses linear and trigonometric relationships to realize the MAC and AF operations, respectively. The proposed design is synthesized and verified in 45 nm technology using Cadence Virtuoso for all physical parameters. A signed fixed-point 8-bit MAC implemented with our design shows a 60% lower area-latency-power product (ALP) and improvements of 38% in area, 27% in power dissipation and 15% in latency with respect to the state-of-the-art MAC design. Further, Monte-Carlo simulations of process variations and device mismatch are performed for both the proposed model and the state-of-the-art design to evaluate the expected variation in dynamic power under these random effects. The dynamic power variation for our design shows a worst-case mean of 189.73 μW, which is 63% of that of the state-of-the-art design.
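The "linear relationship" that lets CORDIC perform a MAC is the standard linear rotation mode: with x held constant, the iterations y ← y + d·x·2^(-i) and z ← z - d·2^(-i), where d = sign(z), drive z toward 0 and leave y ≈ y0 + x·z0, using only shifts and adds. The sketch below shows that textbook recurrence in Q16 fixed point; it is not RECON's datapath, and the fractional width, iteration count and helper names are assumptions. Convergence requires |z0| < 2 when the iteration index starts at 0.

```python
FRAC = 16  # assumed number of fractional bits (Q16 fixed point)


def to_fixed(v: float) -> int:
    return round(v * (1 << FRAC))


def to_float(q: int) -> float:
    return q / (1 << FRAC)


def cordic_linear_mac(acc: int, x: int, z: int, iterations: int = FRAC + 1) -> int:
    """Return approximately acc + x*z (all Q16) with linear-mode CORDIC.

    Only shifts and additions are used, no multiplier. Requires |z| < 2.
    """
    y = acc
    for i in range(iterations):
        d = 1 if z >= 0 else -1
        y += d * (x >> i)             # y <- y + d * x * 2^-i (arithmetic shift)
        z -= d * ((1 << FRAC) >> i)   # z <- z - d * 2^-i, exact in Q16 for i <= FRAC
    return y
```

For example, to_float(cordic_linear_mac(to_fixed(0.5), to_fixed(1.25), to_fixed(-0.75))) lands within a few LSBs of 0.5 + 1.25 × (-0.75) = -0.4375; the residual error comes from the truncating shifts and the final leftover in z.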

20 citations

References
Posted Content
Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi
TL;DR: In this article, the authors proposed a reconfigurable array architecture template and design space exploration flow for domain-specific optimization, which can reduce the hardware cost and the delay without any performance degradation for some application domains.
Abstract: Coarse-grained reconfigurable architectures aim to achieve both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources with long latency and/or large area can be pipelined and/or shared among the processing elements. The hardware cost and the delay can therefore be reduced effectively without any performance degradation for some application domains. We propose such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient in both performance and area than existing reconfigurable architectures.

91 citations

Journal Article
TL;DR: This paper shows that other schemes can be designed, based on the idea of pipelining a serial-input adder or a ripple-carry adder, to obtain pipelined adders for more than two numbers.
Abstract: A well-known scheme for obtaining high-throughput adders is a pipeline in which each stage contains an array of half-adders performing a carry-save addition. This paper shows that other schemes can be designed, based on the idea of pipelining a serial-input adder or a ripple-carry adder. Such schemes offer considerable savings in components while preserving high throughput. These schemes can be generalized by using (p,q) parallel counters to obtain pipelined adders for more than two numbers.
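The two adder styles compared in this reference (and in the MAC paper above) differ in where carries travel. The bit-level sketch below contrasts them: a ripple-carry adder passes each carry serially to the next full adder, while one carry-save (3:2) stage reduces three operands to a sum vector and a carry vector with no propagation at all, leaving a single carry-propagate addition at the end. It is a functional model for illustration only; it does not reproduce the pipelined schemes of the paper, and the function names are hypothetical.

```python
def ripple_carry_add(a: int, b: int, width: int) -> tuple:
    """Ripple-carry adder: each full adder waits for the previous carry."""
    result, carry = 0, 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result, carry  # (sum bits, carry-out)


def carry_save_add(a: int, b: int, c: int) -> tuple:
    """One 3:2 carry-save stage: every bit position is independent.

    Invariant: partial_sum + carries == a + b + c.
    """
    partial_sum = a ^ b ^ c
    carries = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, weighted by 2
    return partial_sum, carries


def add_three(a: int, b: int, c: int, width: int) -> int:
    """Sum three width-bit unsigned operands: carry-save, then one ripple add."""
    s, cy = carry_save_add(a, b, c)
    total, _ = ripple_carry_add(s, cy, width + 2)  # width + 2 bits cannot overflow
    return total
```

For three 8-bit operands 200, 100 and 250, add_three returns 550; only the final ripple_carry_add call involves carry propagation, which is why carry-save stages are attractive inside multi-operand and MAC datapaths.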

48 citations

Journal Article
TL;DR: A virtual hardware mechanism, comprising logic virtualization and hardware device virtualization, is proposed for dynamically partially reconfigurable systems; it can reduce the time required by conventional hardware reuse by up to 26%.
Abstract: Dynamic partial reconfiguration technology enables an embedded system to adapt its hardware functionality at run-time to changing environmental conditions. However, reconfigurable hardware functions are still managed as conventional hardware devices, so the performance gain from partial reconfiguration remains limited. To further raise the utilization of reconfigurable hardware designs, we propose a virtual hardware mechanism, including logic virtualization and hardware device virtualization, for dynamically partially reconfigurable systems. Using logic virtualization, a hardware function that has been configured in the field-programmable gate array (FPGA) can be virtualized to support more than one software application at run-time. Using hardware device virtualization, a software application can access two or more different hardware functions through the same device node. In a reconfigurable network security system for multimedia applications, our experimental results demonstrate that the utilization of reconfigurable hardware functions can be further raised using the virtual hardware mechanism. Furthermore, the virtual hardware mechanism can reduce the time required by conventional hardware reuse by up to 26%.

39 citations

Proceedings Article
A.F. Tenca
08 Jun 2009
TL;DR: The design of a component to perform parallel addition of multiple floating-point (FP) operands is explored and the proposed design is more accurate than conventional FP addition using a network of 2-operand FP adders and it may have competitive area and delay depending on the number of input operands.
Abstract: The design of a component to perform parallel addition of multiple floating-point (FP) operands is explored in this work. In particular, a 3-input FP adder is discussed in more detail, but the main concepts and ideas presented in this work are valid for FP adders with more inputs. The proposed design is more accurate than conventional FP addition using a network of 2-operand FP adders and it may have competitive area and delay depending on the number of input operands. Implementation results of a 3-operand FP adder are presented to compare its performance to a network of 2-input FP adders.
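The accuracy claim rests on rounding: a network of 2-operand FP adders rounds after every addition, whereas a multi-operand adder can round the combined result once. The sketch below shows that effect in IEEE 754 double precision, emulating single rounding by summing exactly with Python's fractions.Fraction and converting back to float. It models only the numerical behavior, not Tenca's 3-input hardware design, and the function names are hypothetical.

```python
from fractions import Fraction


def cascaded_sum(a: float, b: float, c: float) -> float:
    """Network of 2-operand adders: the intermediate result is rounded."""
    return (a + b) + c  # two roundings


def fused_sum(a: float, b: float, c: float) -> float:
    """Idealized 3-operand adder: exact sum, rounded once to double."""
    return float(Fraction(a) + Fraction(b) + Fraction(c))
```

With a = 1e16 and b = c = 1.0, the spacing between adjacent doubles near 1e16 is 2, so each intermediate 1e16 + 1 is a tie that rounds back to 1e16 under round-to-nearest-even: cascaded_sum returns 1e16, while fused_sum returns the exactly representable 1e16 + 2.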

35 citations

Proceedings Article
27 Oct 2009
TL;DR: The modified carry skip adders presented in this paper provide better speed and power consumption compared to the conventional carry skip adder and other adders such as the ripple carry adder, carry lookahead adder, Ling adder and carry select adder.
Abstract: This paper presents a performance analysis of different fast adders. The comparison is based on three performance parameters: area, speed and power consumption. Further, we present a design methodology for hybrid carry lookahead/carry skip adders (CLSKAs). This modified carry skip adder is modeled using both fixed and variable block sizes. In a conventional carry skip adder, each block consists of a ripple carry adder, and skip logic after each block generates the carry for the next block. The speed of operation depends on carry propagation from one block to the next. In CLSKAs, we use carry lookahead logic in each block to generate the carry for the next block. The modified carry skip adders presented in this paper provide better speed and power consumption compared to the conventional carry skip adder and other adders such as the ripple carry adder, carry lookahead adder, Ling adder and carry select adder. The modified carry skip adders with fixed block sizes require a few more CLBs because of the carry lookahead logic, whereas the variable block scheme achieves area optimization.
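The block structure described above can be sketched functionally: each block computes a group generate G and group propagate P with lookahead logic, and the block carry-out is G OR (P AND carry-in), so an incoming carry "skips" any block whose bits all propagate. The Python model below illustrates that idea under stated assumptions. It captures logic only, not timing or CLB usage; sums inside a block are formed with a simple ripple for brevity, and the block sizes used in the example are hypothetical, not the paper's fixed or variable configurations.

```python
from typing import Sequence, Tuple


def clska_add(a: int, b: int, block_sizes: Sequence[int]) -> Tuple[int, int]:
    """Add two unsigned integers split into blocks; return (sum bits, carry-out)."""
    result, carry, offset = 0, 0, 0
    for size in block_sizes:
        mask = (1 << size) - 1
        ab = (a >> offset) & mask
        bb = (b >> offset) & mask
        # Lookahead: group generate G and group propagate P for the whole block.
        G, P = 0, 1
        for i in range(size):
            gi = (ab >> i) & (bb >> i) & 1
            pi = ((ab >> i) ^ (bb >> i)) & 1
            G = gi | (pi & G)
            P = P & pi
        # Sum bits inside the block (modeled as a ripple for brevity).
        c = carry
        for i in range(size):
            ai = (ab >> i) & 1
            bi = (bb >> i) & 1
            result |= (ai ^ bi ^ c) << (offset + i)
            c = (ai & bi) | (c & (ai ^ bi))
        # Block carry-out from the lookahead signals: if P == 1, carry-in skips through.
        next_carry = G | (P & carry)
        assert next_carry == c  # lookahead carry agrees with the rippled carry
        carry = next_carry
        offset += size
    return result, carry
```

For instance, clska_add(0xFFFF, 1, [4, 4, 4, 4]) models a fixed-block 16-bit adder and returns (0, 1), with the carry skipping through every all-propagate block, while clska_add(1234, 5678, [2, 3, 5, 6]) uses a variable block layout and returns (6912, 0).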

34 citations