Proceedings ArticleDOI

Trading Accuracy for Power with an Underdesigned Multiplier Architecture

TL;DR: A novel multiplier architecture with tunable error characteristics, built from a modified inaccurate 2x2 building block, is proposed; for the same power savings it achieves 2X - 8X better Signal-to-Noise Ratio (SNR) than recent voltage over-scaling based power-error tradeoff methods.
Abstract: We propose a novel multiplier architecture with tunable error characteristics that leverages a modified inaccurate 2x2 building block. Our inaccurate multipliers achieve an average power saving of 31.78% to 45.4% over corresponding accurate multiplier designs, for an average error of 1.39% to 3.32%. Using image filtering and JPEG compression as sample applications, we show that our architecture can achieve 2X - 8X better Signal-to-Noise Ratio (SNR) for the same power savings when compared to recent voltage over-scaling based power-error tradeoff methods. We project the multiplier power savings to bigger designs, highlighting the fact that the benefits are strongly design dependent. We compare this circuit-centric approach to power-quality tradeoffs with a pure software adaptation approach for a JPEG example. We also enhance the design with a residual adder that allows correct operation of the multiplier for non-error-resilient applications.
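For illustration, here is a minimal behavioral sketch of the underdesigned-multiplier idea: a 2x2 building block that returns an inaccurate product for a single input combination, composed into a 4x4 multiplier by adding shifted 2x2 partial products. The specific approximation below (3x3 returning 7) is the one commonly described for this architecture and should be treated as an assumption here, as should the error figure the script prints.

```python
# Behavioral model of an underdesigned 2x2 multiplier block and a 4x4
# multiplier built from shifted 2x2 partial products. The 3 * 3 -> 7
# approximation is an assumption for illustration.

def inaccurate_2x2(a, b):
    """2-bit x 2-bit multiply; returns 7 instead of 9 for 3 * 3."""
    assert 0 <= a <= 3 and 0 <= b <= 3
    return 7 if (a == 3 and b == 3) else a * b

def inaccurate_4x4(a, b):
    """4-bit x 4-bit multiply assembled from four shifted 2x2 partial products."""
    a_hi, a_lo = a >> 2, a & 0b11
    b_hi, b_lo = b >> 2, b & 0b11
    return (inaccurate_2x2(a_lo, b_lo)
            + (inaccurate_2x2(a_hi, b_lo) << 2)
            + (inaccurate_2x2(a_lo, b_hi) << 2)
            + (inaccurate_2x2(a_hi, b_hi) << 4))

# Mean relative error over all 4-bit operand pairs (excluding zero products).
errors = [abs(inaccurate_4x4(a, b) - a * b) / (a * b)
          for a in range(1, 16) for b in range(1, 16)]
print(f"mean relative error: {100 * sum(errors) / len(errors):.2f}%")
```

The exhaustive sweep makes it easy to see how a single inexact entry in the 2x2 truth table propagates into a bounded, relatively small relative error at larger operand widths.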
Citations
Proceedings ArticleDOI
27 May 2013
TL;DR: This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.
Abstract: Approximate computing has recently emerged as a promising approach to energy-efficient design of digital systems. Approximate computing relies on the ability of many systems and applications to tolerate some loss of quality or optimality in the computed result. By relaxing the need for fully precise or completely deterministic operations, approximate computing techniques allow substantially improved energy efficiency. This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.

921 citations

Journal ArticleDOI
TL;DR: A survey of techniques for approximate computing (AC), which discusses strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units, processor components, memory technologies, and so forth, as well as programming frameworks for AC.
Abstract: Approximate computing trades off computation quality with effort expended, and as rising performance demands confront plateauing resource budgets, approximate computing has become not merely attractive, but even imperative. In this article, we present a survey of techniques for approximate computing (AC). We discuss strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units (e.g., CPU, GPU, and FPGA), processor components, memory technologies, and so forth, as well as programming frameworks for AC. We classify these techniques based on several key characteristics to emphasize their similarities and differences. The aim of this article is to provide insights to researchers into working of AC techniques and inspire more efforts in this area to make AC the mainstream computing approach in future systems.

890 citations

Journal ArticleDOI
TL;DR: This paper proposes logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy, and demonstrates the utility of these approximate adders in two digital signal processing architectures with specific quality constraints.
Abstract: Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders.
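As a behavioral illustration only (the paper works at the transistor level; the simplified cell below is hypothetical, not one of its proposed adders), the sketch shows the general recipe: use cheap approximate full-adder cells in the low-order bit positions of a multi-bit adder, keep exact cells elsewhere, and measure the resulting error.

```python
def exact_fa(a, b, cin):
    """Exact full adder on single bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def approx_fa(a, b, cin):
    """Hypothetical simplified cell: ignores carry-in in the sum and
    propagates `a` as the carry; much cheaper, occasionally wrong."""
    return a ^ b, a

def ripple_add(x, y, width, approx_low_bits):
    """Ripple-carry adder with approximate cells in the low-order positions."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        cell = approx_fa if i < approx_low_bits else exact_fa
        s, carry = cell(a, b, carry)
        result |= s << i
    return result

# Exhaustive error check: 8-bit operands, 3 approximate low-order cells.
errs = [abs(ripple_add(x, y, 9, 3) - (x + y)) for x in range(256) for y in range(256)]
print(f"max error: {max(errs)}, mean error: {sum(errs) / len(errs):.2f}")
```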

637 citations

Proceedings ArticleDOI
29 May 2013
TL;DR: This work analyzes and characterizes the inherent application resilience present in a suite of 12 widely used applications from the domains of recognition, data mining, and search, and proposes a systematic framework for Application Resilience Characterization (ARC) that characterizes the resilient parts using approximation models abstracting a wide range of approximate computing techniques.
Abstract: Approximate computing is an emerging design paradigm that enables highly efficient hardware and software implementations by exploiting the inherent resilience of applications to inexactness in their computations. Previous work in this area has demonstrated the potential for significant energy and performance improvements, but largely consists of ad hoc techniques that have been applied to a small number of applications. Taking approximate computing closer to mainstream adoption requires (i) a deeper understanding of inherent application resilience across a broader range of applications, (ii) tools that can quantitatively establish the inherent resilience of an application, and (iii) methods to quickly assess the potential of various approximate computing techniques for a given application. We make two key contributions in this direction. Our primary contribution is the analysis and characterization of inherent application resilience present in a suite of 12 widely used applications from the domains of recognition, data mining, and search. Based on this analysis, we present several new insights into the nature of resilience and its relationship to various key application characteristics. To facilitate our analysis, we propose a systematic framework for Application Resilience Characterization (ARC) that (a) partitions an application into resilient and sensitive parts and (b) characterizes the resilient parts using approximation models that abstract a wide range of approximate computing techniques. We believe that the key insights that we present can help shape further research in the area of approximate computing, while automatic resilience characterization frameworks such as ARC can greatly aid designers in the adoption of approximate computing.
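A toy sketch of the characterization flow described above, with all function names and the error-injection model being hypothetical placeholders: the application is split into a sensitive part and a candidate resilient part, an abstract approximation model perturbs the resilient part's output, and the largest perturbation that still meets an output-quality target is recorded.

```python
import random

def sensitive_part(data):
    """Control/bookkeeping code that must stay exact (hypothetical example)."""
    return [x for x in data if x is not None]

def resilient_part(values):
    """Numeric kernel that may tolerate approximation (mean, as a stand-in)."""
    return sum(values) / len(values)

def approximation_model(result, magnitude):
    """Abstract error-injection model standing in for any concrete technique."""
    return result * (1.0 + random.uniform(-magnitude, magnitude))

def characterize(data, quality_threshold=0.02, trials=1000):
    """Largest injected error magnitude whose mean output error stays in budget."""
    exact = resilient_part(sensitive_part(data))
    tolerated = 0.0
    for magnitude in (0.005 * i for i in range(1, 101)):
        errs = [abs(approximation_model(exact, magnitude) - exact) / abs(exact)
                for _ in range(trials)]
        if sum(errs) / trials > quality_threshold:
            break
        tolerated = magnitude
    return tolerated

print(characterize([1.0, 2.5, None, 4.0, 8.0]))
```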

464 citations

Journal ArticleDOI
TL;DR: New metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders and it is shown that the MED is an effective metric for measuring the implementation accuracy of a multiple-bit adder and that the NED is a nearly invariant metric independent of the size of an adder.
Abstract: Addition is a fundamental function in arithmetic operation; several adder designs have been proposed for implementations in inexact computing. These adders show different operational profiles; some of them are approximate in nature while others rely on probabilistic features of nanoscale circuits. However, there has been a lack of appropriate metrics to evaluate the efficacy of various inexact designs. In this paper, new metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders. Reliability is analyzed using the so-called sequential probability transition matrices (SPTMs). Error distance (ED) is initially defined as the arithmetic distance between an erroneous output and the correct output for a given input. The mean error distance (MED) and normalized error distance (NED) are then proposed as unified figures that consider the averaging effect of multiple inputs and the normalization of multiple-bit adders. It is shown that the MED is an effective metric for measuring the implementation accuracy of a multiple-bit adder and that the NED is a nearly invariant metric independent of the size of an adder. The MED is, therefore, useful in assessing the effectiveness of an approximate or probabilistic adder implementation, while the NED is useful in characterizing the reliability of a specific design. Since inexact adders are often used for saving power, the product of power and NED is further utilized for evaluating the tradeoffs between power consumption and precision. Although illustrated using adders, the proposed metrics are potentially useful in assessing other arithmetic circuit designs for applications of inexact computing.
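For concreteness, here is a small sketch of how the error-distance metrics can be computed for an inexact adder by exhaustive simulation. The normalization used for NED below (the maximum observed error distance) is an assumption made for illustration; the paper itself also derives reliability figures from sequential probability transition matrices rather than brute-force enumeration.

```python
def error_metrics(approx_add, width):
    """ED-based metrics over all operand pairs of a given bit width."""
    eds = [abs(approx_add(x, y) - (x + y))          # error distance per input
           for x in range(1 << width) for y in range(1 << width)]
    med = sum(eds) / len(eds)                       # mean error distance
    d_max = max(eds) or 1                           # assumed normalization constant
    ned = med / d_max                               # normalized error distance
    return med, ned

# Example: an adder that drops the two least-significant bits of each operand.
truncating_add = lambda x, y: ((x >> 2) + (y >> 2)) << 2
med, ned = error_metrics(truncating_add, width=8)
print(f"MED = {med:.2f}, NED = {ned:.3f}")
```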

453 citations


Cites background from "Trading Accuracy for Power with an ..."

  • ...Although not discussed and beyond the scope of this paper, the proposed metrics may also be useful in assessing other arithmetic circuits [20] for inexact computing and/or fault-tolerant designs [21] in nanocomputing applications....


References
Book
01 Jan 1993
TL;DR: This book explains the principles of the algorithms available for performing arithmetic operations in digital computers, described independently of any specific implementation technology and within a common framework.
Abstract: This text explains the fundamental principles of algorithms available for performing arithmetic operations on digital computers. These include basic arithmetic operations like addition, subtraction, multiplication, and division in fixed-point and floating-point number systems as well as more complex operations such as square root extraction and evaluation of exponential, logarithmic, and trigonometric functions. The algorithms described are independent of the particular technology employed for their implementation.

1,174 citations


"Trading Accuracy for Power with an ..." refers background or methods in this paper

  • ...Larger multipliers are built by using the inaccurate 2x2 block to produce partial products and then adding the shifted partial products [16]....


  • ...This can sometimes be restrictive for optimization of the adder tree [16]....


Proceedings ArticleDOI
17 Aug 1999
TL;DR: A framework for low-energy digital signal processing (DSP) is proposed in which the supply voltage is scaled beyond the critical voltage required to match the critical path delay to the throughput, together with a prediction-based error-control scheme that enhances the performance of the filtering algorithm in the presence of errors due to soft computations.
Abstract: In this paper, we propose a framework for low-energy digital signal processing (DSP) where the supply voltage is scaled beyond the critical voltage required to match the critical path delay to the throughput. This deliberate introduction of input-dependent errors leads to degradation in the algorithmic performance, which is compensated for via algorithmic noise-tolerance (ANT) schemes. The resulting setup, comprised of the DSP architecture operating at sub-critical voltage and the error control scheme, is referred to as soft DSP. It is shown that technology scaling renders the proposed scheme more effective as the delay penalty suffered due to voltage scaling reduces due to short channel effects. The effectiveness of the proposed scheme is also enhanced when arithmetic units with a higher "delay-imbalance" are employed. A prediction-based error-control scheme is proposed to enhance the performance of the filtering algorithm in the presence of errors due to soft computations. For a frequency selective filter, it is shown that the proposed scheme provides 60%-81% reduction in energy dissipation for filter bandwidths up to 0.5π (where 2π corresponds to the sampling frequency f_s) over that achieved via conventional voltage scaling, with a maximum of 0.5 dB degradation in the output signal-to-noise ratio (SNR_o). It is also shown that the proposed algorithmic noise-tolerance schemes can be used to improve the performance of DSP algorithms in the presence of bit-error rates of up to 10^-3 due to deep submicron (DSM) noise.
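A rough behavioral sketch of the algorithmic noise-tolerance idea: the main filter occasionally produces a large input-dependent error (standing in for sub-critical voltage operation), and a prediction-based controller replaces an output sample by a cheap prediction when it deviates too far from it. The error model, threshold, and previous-sample predictor below are assumptions for illustration, not the paper's circuit-level setup.

```python
import random

def soft_fir(samples, taps, error_rate=0.01, error_mag=1 << 12):
    """FIR filter whose accumulator occasionally takes a large soft error,
    mimicking input-dependent timing failures under voltage over-scaling."""
    out = []
    for n in range(len(samples)):
        acc = sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
        if random.random() < error_rate:
            acc += random.choice((-1, 1)) * error_mag   # assumed error model
        out.append(acc)
    return out

def ant_correct(outputs, threshold=1 << 10):
    """Prediction-based error control: use the previous corrected sample as a
    crude predictor and substitute it when the filter output deviates too much."""
    corrected, prediction = [], 0
    for y in outputs:
        if abs(y - prediction) > threshold:
            y = prediction                 # discard the likely erroneous sample
        corrected.append(y)
        prediction = y
    return corrected
```

In the actual soft-DSP setup the predictor exploits the narrowband nature of the filter output; the previous-sample predictor here is only the simplest possible stand-in.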

211 citations


"Trading Accuracy for Power with an ..." refers background in this paper

  • ...8 (e) and 8 (d) show that our approach results in 2X - 8X better SNR when compared to simple voltage over-scaling [6]....


  • ...In [6], the authors scale the voltage below the minimum voltage supply value needed, so as to trade accuracy for power....


Journal ArticleDOI
Shih-Lien Lu1
TL;DR: Approximation circuits can increase clock frequency by reducing the number of cycles a function requires: instead of implementing the complete logic function, a simplified circuit mimics it, using rough calculations to predict results.
Abstract: Current microprocessors employ a global timing reference to synchronize data transfer. A synchronous system must know the maximum time needed to compute a function, but a circuit usually finishes computation earlier than the worst-case delay. The system nevertheless waits for the maximum time bound to guarantee a correct result. As a first step in achieving variable pipeline delays based on data values, approximation circuits can increase clock frequency by reducing the number of cycles a function requires. Instead of implementing the complete logic function, a simplified circuit mimics it using rough calculations to predict results. The results are correct most of the time, and simulations show improvements in overall performance in spite of the overhead needed to recover from mistakes.
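A behavioral sketch of the prediction-plus-recovery idea: a cheap circuit speculates the result early, and an extra cycle is charged only when the speculation turns out to be wrong. The carry-speculation adder below is a hypothetical example used to make the timing argument concrete; it is not the circuit studied in the paper.

```python
import random

def speculative_add(x, y, width=16, lookahead=4):
    """Speculate the carry into the upper half from only a few low-order bits;
    fall back to an extra recovery cycle when the guess is wrong."""
    half = width // 2
    mask = (1 << half) - 1
    low_x, low_y = x & mask, y & mask
    window = (1 << lookahead) - 1
    # Guess the low-half carry-out by adding just its top `lookahead` bits.
    guess = (((low_x >> (half - lookahead)) & window)
             + ((low_y >> (half - lookahead)) & window)) >> lookahead
    speculated = (((x >> half) + (y >> half) + guess) << half) | ((low_x + low_y) & mask)
    exact = x + y
    cycles = 1 if speculated == exact else 2   # recovery cycle on a mispredict
    return exact, cycles

# Average latency over random operands shows how rarely recovery is needed.
samples = [speculative_add(random.getrandbits(16), random.getrandbits(16))[1]
           for _ in range(10_000)]
print(f"average cycles per add: {sum(samples) / len(samples):.3f}")
```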

210 citations


"Trading Accuracy for Power with an ..." refers background in this paper

  • ...Though, most existing work [5], [7], [8] introduces errors by over-scaling voltage supplies, there has been some work in the direction of introducing error into a system via manipulation of its logic-function, for adders [9]–[11] as well as combinational logic [12]....


Journal ArticleDOI
TL;DR: The main result in this paper establishes the energy savings derived by using probabilistic AND as well as NOT gates constructed from an idealized switch that produces a Probabilistic bit (PBIT).
Abstract: The main result in this paper establishes the energy savings derived by using probabilistic AND as well as NOT gates constructed from an idealized switch that produces a probabilistic bit (PBIT). A probabilistic switch produces the desired value as an output that is 0 or 1 with probability p, represented as a PBIT, and, hence, can produce the wrong output value with a probability of (1-p). In contrast with a probabilistic switch, a conventional deterministic switch produces a BIT whose value is always correct. Our switch-based gate constructions are a particular case of a systematic methodology developed for building energy-aware networks for computing, using PBITS. Interesting examples of such networks include AND, OR, and NOT gates (or, as functions, Boolean conjunction, disjunction, and negation, respectively). To quantify the energy savings, novel measures of "technology independent" energy complexity are also introduced - these measures parallel conventional machine-independent notions of computational complexity such as the algorithm's running time and space. Networks of switches can be related to Turing machines and to Boolean circuits, both of which are widely known and well-understood models of computation. Our gate and network constructions lend substance to the following thesis (established for the first time by K.V. Palem): the mathematical technique referred to as randomization yielding probabilistic algorithms results in energy savings through a physical interpretation based on statistical thermodynamics and, hence, can serve as a basis for energy-aware computing. While the estimates of the energy saved through PBIT-based probabilistic computing switches and networks developed rely on the constructs and thermodynamic models due to Boltzmann, Gibbs, and Planck, this work has also led to the innovation of probabilistic CMOS-based devices and computing frameworks. Thus, for completeness, the relationship between the physical models on which this work is based and the electrical domain of CMOS-based switching is discussed.
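As a purely behavioral illustration of what a probabilistic switch does (the paper's contribution is a thermodynamic energy analysis, which a simulation cannot reproduce), the sketch below models a switch that yields the intended bit with probability p and composes probabilistic AND and NOT gates from it.

```python
import random

def pbit(value, p):
    """Probabilistic switch: produce the intended bit with probability p."""
    return value if random.random() < p else value ^ 1

def p_and(a, b, p):
    return pbit(a & b, p)

def p_not(a, p):
    return pbit(1 - a, p)

# Empirical correctness of a NAND composed from the two probabilistic gates.
p, trials, correct = 0.95, 100_000, 0
for _ in range(trials):
    a, b = random.getrandbits(1), random.getrandbits(1)
    correct += p_not(p_and(a, b, p), p) == 1 - (a & b)
print(f"composed NAND correct with probability ~ {correct / trials:.3f}")
```

With p = 0.95 the composition is right with probability p^2 + (1 - p)^2, about 0.905, which is the kind of error accumulation that has to be traded against the per-switch energy savings the paper quantifies.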

194 citations


"Trading Accuracy for Power with an ..." refers background in this paper

  • ...The authors in [7] improved on this by using the observation that errors in bits of higher value affect the quality of the solution more as compared to lower bits, hence they operate adders at more significant bits with a higher voltage and over-scale the voltages for lower bits....


  • ...Though, most existing work [5], [7], [8] introduces errors by over-scaling voltage supplies, there has been some work in the direction of introducing error into a system via manipulation of its logic-function, for adders [9]–[11] as well as combinational logic [12]....


Journal ArticleDOI
TL;DR: This paper explores ways of reducing FP power consumption by minimizing the bitwidth representation of FP data by showing that up to 66% reduction in multiplier energy/operation can be achieved in the FP unit by this bitwidth reduction technique without sacrificing any program accuracy.
Abstract: Low-power systems often find the power cost of floating-point (FP) hardware prohibitively expensive. This paper explores ways of reducing FP power consumption by minimizing the bitwidth representation of FP data. Analysis of several FP programs that manipulate low-resolution human sensory data shows that these programs suffer no loss of accuracy even with a significant reduction in bitwidth. Most FP programs in our benchmark suite maintain the same output even when the mantissa bitwidth is reduced by half. This FP bitwidth reduction can deliver a significant power saving through the use of a variable bitwidth FP unit. Our results show that up to 66% reduction in multiplier energy/operation can be achieved in the FP unit by this bitwidth reduction technique without sacrificing any program accuracy.
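A minimal software emulation of the mantissa bitwidth-reduction idea: the paper evaluates a variable-bitwidth FP unit in hardware, so this sketch only reproduces the accuracy side of the experiment and assumes the standard IEEE-754 binary64 layout with a 52-bit mantissa.

```python
import struct

def truncate_mantissa(x, keep_bits):
    """Emulate a reduced-precision operand by zeroing the low mantissa bits
    of an IEEE-754 binary64 value (52-bit mantissa assumed)."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    drop = 52 - keep_bits
    bits &= ~((1 << drop) - 1)            # clear the least-significant mantissa bits
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

# Example: dot product with operands truncated to a 12-bit mantissa.
a = [0.1 * i for i in range(100)]
b = [0.05 * i + 1.0 for i in range(100)]
exact = sum(x * y for x, y in zip(a, b))
approx = sum(truncate_mantissa(x, 12) * truncate_mantissa(y, 12) for x, y in zip(a, b))
print(f"relative error: {abs(approx - exact) / abs(exact):.2e}")
```

Sweeping keep_bits down from 52 reproduces, at a toy scale, the kind of study the paper performs: how far the mantissa can be narrowed before a given program's output degrades.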

168 citations


"Trading Accuracy for Power with an ..." refers background in this paper

  • ...[14] reports power savings of up to 66% without affecting accuracy of programs that manipulate low resolution data, by reducing the bitwidth of floating point multipliers....
