Proceedings ArticleDOI

Design and Performance Evaluation of Approximate Floating-Point Multipliers

TL;DR: It is shown that the proposed approximate single precision FP multiplier design reduces power consumption, area and delay by up to 61%, 55%, and 49% respectively compared with its exact counterpart while incurring a moderate error.
Abstract: Approximate/inexact computing has become an attractive approach for designing high performance and low power arithmetic circuits. Floating-point (FP) arithmetic is required in many applications, such as digital signal processing and machine learning. Different approximate FP multipliers are proposed in this paper; the accuracy and the circuit requirements of these designs are assessed to select the best approximate scheme according to different metrics. It is shown that the proposed approximate single precision FP multiplier design reduces power consumption, area and delay by up to 61%, 55%, and 49% respectively compared with its exact counterpart while incurring a moderate error. Moreover, this paper shows that the so-called IFPM24-15 multiplier is the most efficient design in terms of PDP and NMED compared with previous inexact FP multipliers. High dynamic range (HDR) images are processed using the proposed approximate FP multipliers to show the validity of the approximate design.
Citations
Proceedings ArticleDOI
18 Jun 2017
TL;DR: This paper proposes a novel approximate floating point multiplier, called CFPU, which significantly reduces energy and improves performance of multiplication at the expense of accuracy, and shows that it can outperform a standard FPU when at least 4% of multiplications are performed in approximate mode.
Abstract: Many applications, such as machine learning and data sensing, are statistical in nature and can tolerate some level of inaccuracy in their computation. Approximate computation is a viable method to save energy and increase performance by trading accuracy for energy. A number of approximate solutions have been proposed; however, they are limited to a small range of applications because they cannot control the error rate of their output. In this paper, we propose a novel approximate floating point multiplier, called CFPU, which significantly reduces energy and improves performance of multiplication at the expense of accuracy. Our design approximately models multiplication by replacing the most costly step of the operation with a lower energy alternative. In order to tune the level of approximation, CFPU dynamically identifies the inputs which will produce the largest approximation error and processes them in precise CFPU mode. We show that our CFPU can outperform a standard FPU when at least 4% of multiplications are performed in approximate mode. In our tested applications this percentage of multiplications is substantially higher, leading to significant energy savings. Our experimental evaluation on an AMD Southern Island GPU shows that replacing traditional FPUs with the proposed CFPU results in 77% energy savings and 3.5× energy-delay product improvement over eight general OpenCL applications while providing acceptable quality of service. In addition, for the same level of accuracy, the CFPU provides 2.4× energy-delay product improvement compared to state-of-the-art approximate multipliers.

70 citations


Cites background from "Design and Performance Evaluation o..."

  • ...The multiplication of the mantissas is the most costly operation, taking 80% of the total energy of the multiply operation [23], so our approach removes it entirely....

    [...]
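
The snippet above describes removing the mantissa multiplication entirely. A minimal Python sketch of that idea follows; it is not the CFPU authors' implementation. It assumes normal float32 inputs, reuses the first operand's mantissa unchanged, and falls back to exact multiplication unless the top `tune_bits` bits of the discarded fraction are all zeros or all ones. The names `approx_multiply` and `tune_bits` are hypothetical, and zeros, subnormals, infinities, NaNs, and exponent overflow are not handled.

```python
import struct
from typing import Tuple


def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def approx_multiply(a: float, b: float, tune_bits: int = 2) -> Tuple[float, bool]:
    """Return (result, used_approximation) for normal float32 inputs.

    Assumption for illustration: the approximate path adds the exponents,
    keeps a's fraction, and drops b's fraction without any multiplication.
    """
    abits, bbits = f32_bits(a), f32_bits(b)
    sign = (abits ^ bbits) >> 31
    exp_a, exp_b = (abits >> 23) & 0xFF, (bbits >> 23) & 0xFF
    frac_a, frac_b = abits & 0x7FFFFF, bbits & 0x7FFFFF

    # Inspect the leading bits of the fraction that would be discarded.
    top = frac_b >> (23 - tune_bits)
    if top == 0:
        # b is close to 2^exp_b, so treat its significand as 1.0.
        exponent = exp_a + exp_b - 127
    elif top == (1 << tune_bits) - 1:
        # b is close to 2^(exp_b + 1), so round its significand up to 2.0.
        exponent = exp_a + exp_b - 127 + 1
    else:
        # Large expected error: use the exact multiplier instead.
        return bits_f32(f32_bits(a * b)), False

    return bits_f32((sign << 31) | (exponent << 23) | frac_a), True


print(approx_multiply(3.0, 1.125))  # approximate path
print(approx_multiply(3.0, 1.5))    # exact path
```

Raising `tune_bits` sends more input pairs to the exact path, which lowers the worst-case error at the cost of fewer approximate multiplications.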

Journal ArticleDOI
18 Aug 2020
TL;DR: This work reviews methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold, and summarizes strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that lead to a given amount of induced error.
Abstract: Approximate computing is an emerging paradigm that, by relaxing the requirement for full accuracy, offers benefits in terms of design area and power consumption. This paradigm is particularly attractive in applications where the underlying computation has inherent resilience to small errors. Such applications are abundant in many domains, including machine learning, computer vision, and signal processing. In circuit design, a major challenge is the capability to synthesize the approximate circuits automatically without manually relying on the expertise of designers. In this work, we review methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold. We summarize strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that lead to the largest benefit for a given amount of induced error. We then review circuit simplification methods that operate at the gate or Boolean level, including those that leverage classical Boolean synthesis techniques to realize the approximations. We also summarize strategies that take high-level descriptions, such as C or behavioral Verilog, and synthesize approximate circuits from these descriptions.

43 citations


Cites background from "Design and Performance Evaluation o..."

  • ...Except for very few works on adders [28] and multipliers [66], most of existing research on approximate arithmetic have focused on integer arithmetic disregarding floating point operators....

    [...]

Journal ArticleDOI
TL;DR: A tiered approximate floating point multiplier, called CFPU, is proposed, which significantly reduces energy consumption and improves the performance of multiplication at a slight cost in accuracy.
Abstract: Many applications, such as machine learning and sensor data analysis, are statistical in nature and can tolerate some level of inaccuracy in their computation. Approximate computing is a viable method to save energy and increase performance by controllably trading off energy for accuracy. In this paper, we propose a tiered approximate floating point multiplier, called CFPU, which significantly reduces energy consumption and improves the performance of multiplication at a slight cost in accuracy. The floating point multiplication is approximated by replacing the costly mantissa multiplication step of the operation with lower energy alternatives. We process the data by using one of the three modes: a basic approximate mode, an intermediate approximate mode, or on the exact hardware, depending on the accuracy requirements. We evaluate the efficiency of the proposed CFPU on a wide range of applications including twelve general OpenCL ones and three machine learning applications. Our results show that using the first CFPU approximation mode results in 3.5× energy-delay product (EDP) improvement, compared to a GPU using traditional floating point units (FPUs), while ensuring less than 10% average relative error. Adding the second mode further increases the EDP improvement to 4.1×, compared to an unmodified FPU, for less than 10% error. In addition, our results show that the proposed CFPU can achieve 2.8× EDP improvement for multiply operations as compared to state-of-the-art approximate multipliers.

12 citations


Cites background from "Design and Performance Evaluation o..."

  • ...total energy of the multiply operation [48], so the first stage approximation removes it entirely....

    [...]

Proceedings ArticleDOI
22 May 2021
TL;DR: This paper is the first work that proposes a hardware architecture of an approximate and iterative posit multiplier, and the appropriate balance between latency and accuracy can be finely determined at runtime.
Abstract: Recently, many applications have demanded cheaper and faster arithmetic while providing a wider dynamic range than the popular IEEE 754 floating-point (FP) arithmetic. As a result, a new number format called posit was proposed. As in a variety of number systems, multiplication in posit arithmetic is one of the most frequently used but expensive operations. To reduce the number of hardware resources in a posit multiplier, this paper applies an iterative approach to posit multiplication. To exploit the features of the posit format, the number of truncated bits in the fraction component is dynamically changed depending on the number of regime bits. In addition, architectures for fast parser and packer are proposed. Thanks to the posit format, the proposed design supports about 60 decades wider dynamic range than a single-precision FP multiplier design. Using the iterative approach, the proposed design also reduces the number of lookup tables used in a previous exact posit multiplier design by 44% while achieving a 51% higher maximum frequency, and the appropriate balance between latency and accuracy can be finely determined at runtime. To the best of the authors' knowledge, this paper is the first work that proposes a hardware architecture of an approximate and iterative posit multiplier.

11 citations

Journal ArticleDOI
TL;DR: It is shown that the proposed approximate FLP multiplier designs further reduce delay, area, power consumption and power-delay product (PDP) while incurring about half of the normalized mean error distance (NMED) compared with the previous designs.
Abstract: Approximate/inexact computing has become an attractive approach for designing high performance and low power arithmetic circuits. Floating-point (FLP) arithmetic is required in many applications, such as digital signal processing, image processing and machine learning. Approximate FLP multipliers with variable accuracy are proposed in this paper; the accuracy and the circuit requirements of these designs are analyzed and assessed according to different metrics. It is shown that the proposed approximate FLP multiplier designs further reduce delay, area, power consumption and power-delay product (PDP) while incurring about half of the normalized mean error distance (NMED) compared with the previous designs. The proposed IFLPM24–15 is the most efficient design when considering both PDP and NMED. Case studies with three error-tolerant applications show the validity of the proposed approximate designs.

9 citations

References
StandardDOI
01 Jan 2008

1,354 citations


"Design and Performance Evaluation o..." refers background in this paper

  • ...In a FP multiplier, the mantissa multiplier computes the product of two unsigned fixed-point numbers....

    [...]
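
To make the role of the mantissa multiplier concrete, the following Python sketch splits a single-precision multiplication into its sign, exponent, and significand steps. It is an illustration only, not the paper's design: it assumes normal float32 inputs, truncates instead of rounding, and ignores zeros, subnormals, infinities, NaNs, and exponent overflow or underflow. The function names are hypothetical.

```python
import struct


def decompose(x: float):
    """Split a float32 into sign, biased exponent, and 24-bit significand."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    # Normal numbers carry an implicit leading 1 above the 23 stored bits.
    significand = (bits & 0x7FFFFF) | (1 << 23)
    return sign, exponent, significand


def fp32_multiply(a: float, b: float) -> float:
    """Multiply two normal float32 values, truncating the product."""
    sign_a, exp_a, sig_a = decompose(a)
    sign_b, exp_b, sig_b = decompose(b)

    sign = sign_a ^ sign_b            # sign of the product
    exponent = exp_a + exp_b - 127    # add exponents, remove one bias

    # The mantissa multiplier: a 24 x 24 bit unsigned product (48 bits).
    product = sig_a * sig_b

    # The product of two values in [1, 2) lies in [1, 4); renormalize.
    if product >> 47:
        significand = product >> 24
        exponent += 1
    else:
        significand = product >> 23

    bits = (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(fp32_multiply(1.5, 2.0))
print(fp32_multiply(-1.5, 1.5))
```

The 24 x 24 bit product is the only step whose cost grows quadratically with operand width, which is why approximate designs concentrate on it.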

Proceedings ArticleDOI
25 Jul 2011
TL;DR: The visibility metric is shown to provide much improved predictions as compared to the original HDR-VDP and VDP metrics, especially for low luminance conditions, and is comparable to or better than for the MS-SSIM, which is considered one of the most successful quality metrics.
Abstract: Visual metrics can play an important role in the evaluation of novel lighting, rendering, and imaging algorithms. Unfortunately, current metrics only work well for narrow intensity ranges, and do not correlate well with experimental data outside these ranges. To address these issues, we propose a visual metric for predicting visibility (discrimination) and quality (mean-opinion-score). The metric is based on a new visual model for all luminance conditions, which has been derived from new contrast sensitivity measurements. The model is calibrated and validated against several contrast discrimination data sets, and image quality databases (LIVE and TID2008). The visibility metric is shown to provide much improved predictions as compared to the original HDR-VDP and VDP metrics, especially for low luminance conditions. The image quality predictions are comparable to or better than for the MS-SSIM, which is considered one of the most successful quality metrics. The code of the proposed metric is available on-line.

691 citations


"Design and Performance Evaluation o..." refers background in this paper

  • ...The High Dynamic Range Visible Difference Predictor (HDRVDP) [19] is a visual metric to evaluate the inexact FP multiplier targeting high dynamic range image processing applications....

    [...]

Journal ArticleDOI
TL;DR: New metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders and it is shown that the MED is an effective metric for measuring the implementation accuracy of a multiple-bit adder and that the NED is a nearly invariant metric independent of the size of an adder.
Abstract: Addition is a fundamental function in arithmetic operation; several adder designs have been proposed for implementations in inexact computing. These adders show different operational profiles; some of them are approximate in nature while others rely on probabilistic features of nanoscale circuits. However, there has been a lack of appropriate metrics to evaluate the efficacy of various inexact designs. In this paper, new metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders. Reliability is analyzed using the so-called sequential probability transition matrices (SPTMs). Error distance (ED) is initially defined as the arithmetic distance between an erroneous output and the correct output for a given input. The mean error distance (MED) and normalized error distance (NED) are then proposed as unified figures that consider the averaging effect of multiple inputs and the normalization of multiple-bit adders. It is shown that the MED is an effective metric for measuring the implementation accuracy of a multiple-bit adder and that the NED is a nearly invariant metric independent of the size of an adder. The MED is, therefore, useful in assessing the effectiveness of an approximate or probabilistic adder implementation, while the NED is useful in characterizing the reliability of a specific design. Since inexact adders are often used for saving power, the product of power and NED is further utilized for evaluating the tradeoffs between power consumption and precision. Although illustrated using adders, the proposed metrics are potentially useful in assessing other arithmetic circuit designs for applications of inexact computing.

453 citations


"Design and Performance Evaluation o..." refers background or methods in this paper

  • ...New metrics for evaluating the reliability and power efficiency of inexact designs have been proposed in [3]....

    [...]

  • ...The NMED is defined as the normalization of the mean error distance (MED) by the maximum output of the accurate design [3]....

    [...]
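
The NMED definition quoted above can be sketched in Python as follows. The helper `nmed` and the toy approximate multiplier (a 4-bit product with its two least significant bits cleared) are hypothetical and chosen only for illustration; they do not reproduce any design from the paper. When no maximum is supplied, the sketch assumes the largest exact output in the sample stands in for the maximum output of the accurate design.

```python
from itertools import product


def nmed(exact, approx, max_exact_output=None):
    """Normalized mean error distance between exact and approximate outputs.

    ED   = |approx - exact| for one input
    MED  = mean of ED over all inputs
    NMED = MED / maximum output of the accurate design
    """
    exact, approx = list(exact), list(approx)
    if not exact or len(exact) != len(approx):
        raise ValueError("need two non-empty sequences of equal length")

    med = sum(abs(a - e) for e, a in zip(exact, approx)) / len(exact)

    if max_exact_output is None:
        # Assumption: the sample covers the largest exact output.
        max_exact_output = max(abs(e) for e in exact)
    if max_exact_output == 0:
        raise ValueError("maximum exact output must be non-zero")
    return med / max_exact_output


# Toy example: exhaustive 4-bit x 4-bit unsigned multiplication.
pairs = list(product(range(16), repeat=2))
exact_out = [a * b for a, b in pairs]
approx_out = [(a * b) & ~0b11 for a, b in pairs]  # drop the two lowest bits

print(nmed(exact_out, approx_out))
```

Because the result is normalized by the maximum exact output, values from multipliers of different widths can be compared on the same scale.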

Journal ArticleDOI
TL;DR: The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio.
Abstract: Inexact (or approximate) computing is an attractive paradigm for digital processing at nanometric scales. Inexact computing is particularly interesting for computer arithmetic designs. This paper deals with the analysis and design of two new approximate 4-2 compressors for utilization in a multiplier. These designs rely on different features of compression, such that imprecision in computation (as measured by the error rate and the so-called normalized error distance) can meet with respect to circuit-based figures of merit of a design (number of transistors, delay and power consumption). Four different schemes for utilizing the proposed approximate compressors are proposed and analyzed for a Dadda multiplier. Extensive simulation results are provided and an application of the approximate multipliers to image processing is presented. The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).

447 citations


"Design and Performance Evaluation o..." refers background or methods in this paper

  • ...For the approximate FP multipliers, an approximate modified Booth encoding (AMBE) algorithm [4] is used to generate the inexact partial product (PP); an inexact 4-2 compressor [6] is utilized to design the approximate array multiplier; a Wallace tree is used as tree structure....

    [...]

  • ...The inexact 4-2 compressors are the one designated as Design 2 in [6]....

    [...]

  • ...Although inexact fixed-point arithmetic circuits have been extensively studied [5-8] FP arithmetic circuits have not been fully considered for inexact computing....

    [...]

Proceedings ArticleDOI
02 Jan 2011
TL;DR: A novel multiplier architecture with tunable error characteristics is proposed; it leverages a modified inaccurate 2x2 building block and can achieve 2X-8X better Signal-Noise-Ratio (SNR) for the same power savings when compared to recent voltage over-scaling based power-error tradeoff methods.
Abstract: We propose a novel multiplier architecture with tunable error characteristics that leverages a modified inaccurate 2x2 building block. Our inaccurate multipliers achieve an average power saving of 31.78%–45.4% over corresponding accurate multiplier designs, for an average error of 1.39%–3.32%. Using image filtering and JPEG compression as sample applications, we show that our architecture can achieve 2X-8X better Signal-Noise-Ratio (SNR) for the same power savings when compared to recent voltage over-scaling based power-error tradeoff methods. We project the multiplier power savings to bigger designs, highlighting the fact that the benefits are strongly design dependent. We compare this circuit-centric approach to power quality tradeoffs with a pure software adaptation approach for a JPEG example. We also enhance the design to allow for correct operation of the multiplier using a residual adder, for non error resilient applications.

411 citations


"Design and Performance Evaluation o..." refers background in this paper

  • ...Although inexact fixed-point arithmetic circuits have been extensively studied [5-8] FP arithmetic circuits have not been fully considered for inexact computing....

    [...]
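
The inaccurate 2x2 building block mentioned in the last abstract above can be illustrated with a short Python sketch. The abstract does not give the block's truth table; the sketch assumes the commonly described variant in which only 3 x 3 is wrong (it yields 7 instead of 9, so the output fits in three bits). The composition of four blocks into a 4x4 multiplier and the function names are likewise illustrative assumptions, not the authors' architecture.

```python
def mul2x2_approx(a: int, b: int) -> int:
    """Inaccurate 2-bit multiplier (assumed variant): 3 * 3 returns 7."""
    if a == 3 and b == 3:
        return 7
    return a * b


def mul4x4(a: int, b: int, mul2x2=mul2x2_approx) -> int:
    """Build a 4-bit multiplier from four 2x2 blocks and shifted adds."""
    a_hi, a_lo = a >> 2, a & 0b11
    b_hi, b_lo = b >> 2, b & 0b11
    return (
        (mul2x2(a_hi, b_hi) << 4)
        + ((mul2x2(a_hi, b_lo) + mul2x2(a_lo, b_hi)) << 2)
        + mul2x2(a_lo, b_lo)
    )


# Count how many of the 256 input pairs produce an inexact result.
wrong = sum(mul4x4(a, b) != a * b for a in range(16) for b in range(16))
print(mul4x4(5, 6), mul4x4(15, 15), wrong)
```

Passing an exact 2x2 function as `mul2x2` (for example `lambda x, y: x * y`) turns the same structure back into an accurate multiplier, which makes it easy to compare the two.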