
Approximate Computing: An Emerging Paradigm For
Energy-Efficient Design
Jie Han
Department of Electrical and Computer Engineering
University of Alberta
Edmonton, AB, Canada
Email: jhan8@ualberta.ca
Michael Orshansky
Department of Electrical and Computer Engineering
University of Texas at Austin
Austin, TX, USA
Email: orshansky@utexas.edu
Abstract— Approximate computing has recently emerged as a
promising approach to energy-efficient design of digital systems.
Approximate computing relies on the ability of many systems and
applications to tolerate some loss of quality or optimality in the
computed result. By relaxing the need for fully precise or
completely deterministic operations, approximate computing
techniques allow substantially improved energy efficiency. This
paper reviews recent progress in the area, including design of
approximate arithmetic blocks, pertinent error and quality
measures, and algorithm-level techniques for approximate
computing.
Keywords—approximate computing, probabilistic computing,
stochastic computation, adder, multiplier, low-energy design
I. IMPRECISION TOLERANCE AND ENERGY REDUCTION
Energy efficiency has become the paramount concern in the design of computing systems. At the same time, as computing systems become increasingly embedded and mobile, computational tasks include a growing set of
applications that involve media processing (audio, video,
graphics, and image), recognition, and data mining. A
common characteristic of the above class of applications is
that often a perfect result is not necessary and an approximate
or less-than-optimal result is sufficient. It is a familiar feature
of image processing, for example, that a range of image
sharpness/resolution is acceptable. In data mining, a merely good output of, say, a search query is often hard to distinguish from the best one. Such applications are imprecision-tolerant.
There may be multiple sources of imprecision-tolerance [1]:
(1) perceptual limitations: these are determined by the ability
of the human brain to ‘fill in’ missing information and filter
out high-frequency patterns; (2) redundant input data: this
redundancy means that an algorithm can be lossy and still be
adequate; and (3) noisy inputs.
The primary purpose of this paper is to review the recent
developments in the area of approximate computing (AC). The
term spans a wide set of research activities ranging from
programming languages [2] to transistor level [3]. The
common underlying thread in these disparate efforts is the
search for solutions that allow computing systems to trade
energy for quality of the computed result. In this paper we
focus on the solutions that involve rethinking of how
hardware needs to be designed. To this end, we start with an
overview of several related computing paradigms and review
some recently proposed approximate arithmetic circuits. Then,
error metrics are introduced and algorithm-level designs are
discussed. Finally, a brief summary is given.
II. OVERVIEW OF ERROR-RESILIENT PARADIGMS
A. Approximate Computing
Here we distinguish the work on approximate computing
from related but conceptually distinct efforts in
probabilistic/stochastic computing. The distinctive feature of
AC is that it does not involve assumptions on the stochastic
nature of any underlying processes implementing the system.
It does, however, often utilize statistical properties of data and
algorithms to trade quality for energy reduction. Approximate
computing, hence, employs deterministic designs that produce
imprecise results.
B. Stochastic/Probabilistic Computing
Stochastic computing (SC) is a different paradigm that
uses random binary bit streams for computation. SC was first
introduced in the 1960s for logic circuit design [4, 5], but its
origin can be traced back to von Neumann’s seminal work on
probabilistic logic [6]. In SC, real numbers are represented by
random binary bit streams that are usually implemented in
series and in time. Information is carried in the statistics of the binary streams. Von Neumann’s gate multiplexing
technique is a special type of SC, in which redundant binary
signals are implemented in parallel and in space. Both forms
of SC have been the focus of recent studies [7 - 15]. SC offers
advantages such as hardware simplicity and fault tolerance [8].
Its promise in data processing has been shown in several
applications including neural computation [8], stochastic
decoding [9, 10], fault-tolerance and image processing [11],
spectral transforms [12], linear finite state machines [13] and
reliability analysis [14, 15]. The notions of stochastic
computation have been extended to the regime of error-
resilient designs at the system, architecture and application
levels [16, 17, 18]. A recent review on SC is given in [19].
A related body of work has been called probabilistic
computing. This approach proposes exploiting intrinsic
probabilistic behavior of the underlying circuit fabric, most
explicitly, the stochastic behavior of a binary switch under the
influence of thermal noise. Based on this principle, a probabilistic CMOS (PCMOS) family of circuits is proposed in [20, 21].
An introduction to the philosophy of probabilistic computing
is given in [22].

III. APPROXIMATE ARITHMETIC CIRCUITS
A. Approximate Full Adders
In several approximate implementations, multiple-bit
adders are divided into two modules: the (accurate) upper part
of more significant bits and the (approximate) lower part of
less significant bits. For each lower bit, a single-bit
approximate adder implements a modified, and thus inexact, addition function. This is often accomplished by
simplifying a full adder design at the circuit level, equivalent
to a process that alters some entries in the truth table of a full
adder at the functional level.
1) Approximate mirror adders (AMAs): A mirror adder
(MA) is a common yet efficient adder design. Five
approximate MAs (AMAs) have been obtained from a logic
reduction at the transistor level, i.e., by removing some
transistors to attain a lower power dissipation and circuit
complexity [3]. The faster charging/discharging of the node capacitance in an AMA also results in a shorter delay. Hence, the AMAs trade off accuracy for energy, area and performance.
2) Approximate XOR/XNOR-based adders (AXAs): The
AXAs are based on a 10-transistor adder using XOR/XNOR
gates with multiplexers implemented by pass transistors. The
three AXAs in [23] show attractive operational profiles in
performance, hardware efficiency and power-delay product
(PDP), while maintaining a relatively high accuracy (AXA3 in Fig. 1). Although
the use of pass transistors causes a reduced noise margin, the
AXAs are useful when a lower accuracy can be tolerated, with
significant improvements in other design metrics.
Fig. 1. Approximate XNOR-based Adder 3 (AXA3) with 8 transistors [23].
3) Lower-part-OR adder (LOA): In the LOA [24], an OR
gate is used to estimate the sum of each bit at the approximate
lower part, while an AND gate is used to generate the carry-in
for the accurate upper part when both inputs to the most
significant bit adder in the lower part are ‘1.’ The LOA
achieves an approximate but efficient operation by ignoring
most carries in the less-significant lower part of an adder.
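The behavior of the LOA is simple enough to capture in a short software model. The following Python sketch (a behavioral illustration with assumed parameters, not a gate-level model of [24]) approximates the k least significant bits with a bitwise OR and feeds an AND of the operands' most significant lower-part bits into an exact upper-part addition, mirroring the description above.

def loa_add(a: int, b: int, k: int = 8) -> int:
    # Behavioral sketch of the lower-part-OR adder (LOA); k low-order bits are approximate.
    mask_lo = (1 << k) - 1
    a_lo, b_lo = a & mask_lo, b & mask_lo
    # Approximate lower part: a bitwise OR replaces per-bit addition (carries ignored).
    sum_lo = a_lo | b_lo
    # Carry into the accurate upper part: AND of the most significant lower-part bits.
    carry = (a_lo >> (k - 1)) & (b_lo >> (k - 1)) & 1
    # Accurate upper part: exact addition of the remaining bits plus the estimated carry.
    sum_hi = (a >> k) + (b >> k) + carry
    return (sum_hi << k) | sum_lo

For instance, loa_add(0x00FF, 0x0001, k=8) returns 0x00FF rather than the exact 0x0100, since the long carry chain in the approximate lower byte is ignored.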
B. Multiple-Bit Approximate Adders
Current microprocessors use fast parallel adders such as the carry look-ahead adder (CLA). The performance of
parallel adders, however, is bounded by a logarithmic delay,
that is, the critical path delay is asymptotically proportional to
log(n) in an n-bit adder [25, 26]. Sub-logarithmic delays can, however, be achieved by so-called speculative adders.
1) Speculative and variable latency adders: A speculative
adder exploits the fact that the typical carry propagation chain
is significantly shorter than the worst-case carry chain by
using a limited number of previous input bits to calculate the
sum (e.g., looking ahead k bits) [25]. If k is on the order of the square root of n (or is independent of n), the delay of this adder is reduced to roughly half of the logarithmic delay (or to an asymptotic constant).
In [26], this design is treated in more detail as an almost
correct adder (ACA) and developed into a variable latency
speculative adder (VLSA) with error detection and recovery.
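To make the speculation concrete, the following Python sketch (a behavioral model with illustrative parameters, not the exact designs of [25, 26]) computes each carry from only the k operand bits immediately below it, so the longest carry chain seen by any sum bit is k rather than n.

def speculative_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    # Behavioral sketch of a speculative (almost correct) adder: the carry into
    # bit i is derived from operand bits [i-k, i) only, assuming a zero carry
    # into that window; the result is exact whenever no true carry chain
    # exceeds k bits. The final carry-out is ignored in this sketch.
    result = 0
    for i in range(n):
        lo = max(0, i - k)
        window = (1 << i) - (1 << lo)                  # mask for bits [lo, i)
        carry = (((a & window) + (b & window)) >> i) & 1
        result |= (((a >> i) ^ (b >> i) ^ carry) & 1) << i
    return result

Because each bit observes at most a k-bit carry chain, the critical path is that of a k-bit addition; mispredictions occur only on the rare inputs whose true carry chains exceed k bits, which is what the variable latency scheme of [26] detects and corrects.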
2) Error tolerant adders: A series of so-called error tolerant adders (ETAs) is proposed in [27-30]. ETAII
truncates the carry propagation chain by dividing the adder
into several sub-adders; its accuracy is improved in ETAIIM
by connecting carry chains in a few most significant sub-
adders [27]. ETAIV further enhances the design by using an
alternating carry select process in the sub-adder chain [29].
3) Speculative carry select and accuracy-configurable
adders: The speculative adder in [31] employs carry chain
truncation and carry select addition as a basis in a reliable
variable latency carry select adder (VLCSA). The accuracy-
configurable adder (ACA) enables an adaptive operation,
either approximate or accurate, configurable at runtime [32].
4) Dithering adder: The result produced by the ETA
adder is a bound on the accurate result. Depending on the
fixed carry-in value, an upper or lower bound can be
produced. That led to the idea of a dithering adder (Fig. 2),
useful in accumulation, in which subsequent additions produce
opposite-direction bounds such that the final result has a
smaller overall error variance (Fig. 3) [33].
Fig. 2. Dithering adder produces alternating upper or lower bounds on the
accurate sum, resulting in reduced error variance in accumulation [33].
Fig. 3. At the same energy: bounding (left) and dithering (right) adders.
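A behavioral sketch of this bounding and dithering behavior is given below (an illustration under assumed parameters, not the synthesized adders of [33]): forcing the carry into the accurate upper part to 0 and the approximate low bits to zeros yields a lower bound on the exact sum, while forcing the carry to 1 and the low bits to ones yields an upper bound; alternating the two during accumulation lets the errors partially cancel.

def bounding_add(a: int, b: int, k: int = 8, upper: bool = False) -> int:
    # Accurate upper part with a forced carry-in; fixed approximate lower part.
    hi = (a >> k) + (b >> k) + (1 if upper else 0)
    lo = (1 << k) - 1 if upper else 0
    return (hi << k) | lo

def dithered_accumulate(values, k: int = 8) -> int:
    # Alternate upper and lower bounds so that accumulation errors dither
    # around the exact result instead of growing in one direction.
    acc = 0
    for i, v in enumerate(values):
        acc = bounding_add(acc, v, k=k, upper=(i % 2 == 0))
    return acc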

C. Approximate Multipliers
In contrast to the study of adders, the design of
approximate multipliers has received relatively little attention.
In [25, 34] approximate multipliers are considered by using
the speculative adders to compute the sum of partial products;
however, the straightforward application of approximate
adders in a multiplier may not be efficient in terms of trading
off accuracy for savings in energy and area. For an
approximate multiplier, a key design aspect is to reduce the
critical path of adding the partial products. Since
multiplication is usually implemented by a cascaded array of
adders, some less significant bits in the partial products are
simply omitted in [24] and [35] (with some error compensation mechanisms), and thus some adders can be removed from the array for a faster operation. In [36], a simplified 2×2 multiplier is used as the building block in a
larger multiplier for an efficient computation. An efficient
design using input pre-processing and additional error
compensation is proposed for reducing the critical path delay
in a multiplier [37].
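As a simple illustration of partial-product truncation (in the spirit of [24, 35], but without the error compensation used there; the cut-off column is an assumed parameter), the sketch below drops all partial-product bits below column t, which is what allows the corresponding adders to be removed from the array.

def truncated_multiply(a: int, b: int, n: int = 8, t: int = 6) -> int:
    # Shift-and-add multiplier in which partial-product bits below column t
    # are simply omitted; no compensation term is added in this sketch.
    result = 0
    for i in range(n):                           # one partial product per bit of b
        if (b >> i) & 1:
            result += (a << i) & ~((1 << t) - 1)  # keep only columns t and above
    return result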
D. Approximate Logic Synthesis
Approximate logic synthesis has been considered for the
design of low-overhead error-detection circuits [38]. In [33],
approximate adders are synthesized for optimizing the quality-
energy tradeoff. For a given function, a two-level synthesis
approach is used in [39] to reduce circuit area for an error rate
threshold. In [40], a multi-level logic minimization algorithm
is developed to simplify the design and minimize the area of
approximate circuits. Automated synthesis of approximate
circuits is recently discussed in [41] for large and complex
circuits under error constraints.
IV. METRICS FOR APPROXIMATE COMPUTING
A. Error Rate/Frequency and Error Significance/Magnitude
In light of the advances in approximate computing,
performance metrics are needed to evaluate the efficacy of
approximate designs. Due to the deterministic nature of
approximate circuits, the traditional metric of reliability,
defined as the probability of system survival, is not appropriate
for use in evaluating the quality of a design. To address this,
several metrics have been used for quantifying errors in
approximate designs. Error rate (ER) is the fraction of
incorrect outputs out of a total number of inputs in an
approximate circuit [42]; it is sometimes referred to as error
frequency [33]. Error significance (ES) refers to the degree of
error severity due to the approximate operation of a circuit
[42]. ES has been considered as the numerical deviation of an
incorrect output from a correct one [39], the Hamming distance
of the two vectors [32], and the maximum error magnitude of
circuit outputs [33]. The product of ER and ES is used in [40]
and [43] as a composite quality metric for approximate designs.
Other common metrics include the relative error, average error
and error distribution.
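For small bit widths, these metrics can be evaluated exhaustively. The sketch below (an illustration; taking the maximum error magnitude as the ES measure follows one of the choices mentioned above [33]) takes an approximate adder as a callable and reports its error rate and worst-case error significance over all input pairs.

from typing import Callable

def characterize(approx_add: Callable[[int, int], int], n: int = 8):
    # Exhaustive comparison of an approximate n-bit adder against exact addition.
    errors, max_magnitude = 0, 0
    for a in range(1 << n):
        for b in range(1 << n):
            diff = abs(approx_add(a, b) - (a + b))
            if diff:
                errors += 1
                max_magnitude = max(max_magnitude, diff)
    error_rate = errors / (1 << (2 * n))   # fraction of inputs producing a wrong output
    return error_rate, max_magnitude       # ER and ES (maximum error magnitude)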
B. Error Distance for Approximate Adders
Recently, the above metrics have been generalized to a new
figure of merit, error distance (ED), for assessing the quality of
approximate adders [44]. For an approximate design, ED is
defined as the arithmetic distance between an inexact output
and the correct output for a given input. For example, the two
erroneous values ‘01’ and ‘00’ have an ED of 1 and 2 with
respect to the correct number ‘10’. The mean error distance
(MED) (or mean absolute error in [45]) considers the averaging
effect of multiple inputs, while the normalized error distance
(NED) is the normalization of MED for multiple-bit adders.
The MED is useful in measuring the implementation accuracy
of a multiple-bit adder, while the NED is a nearly invariant
metric, that is, independent of the size of an adder, so it is
useful when characterizing the reliability of a specific design.
Moreover, the product of power and NED can be utilized for
evaluating the tradeoff between power consumption and
precision in an approximate design (Fig. 4). To emphasize the
significance of a particular metric (such as the power or
precision), a different measure with more weight on this metric
can be used for a better assessment of a design according to the
specific requirement of an application. These metrics are also
applicable to probabilistic adders such as those in [46, 47, 48],
and provide effective alternatives to an application-specific
metric such as the peak signal-to-noise ratio (PSNR).
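The ED-based metrics can be computed in the same exhaustive style. In the sketch below (an illustration of the definitions in [44]; normalizing the MED by the maximum possible error distance, taken here as the maximum exact sum, is an assumption of this sketch), the ED of each output is its arithmetic distance from the exact sum, the MED is its average over all inputs, and the NED is the MED divided by that maximum.

from typing import Callable

def ed_metrics(approx_add: Callable[[int, int], int], n: int = 8):
    total_ed = 0
    for a in range(1 << n):
        for b in range(1 << n):
            total_ed += abs(approx_add(a, b) - (a + b))   # error distance (ED)
    med = total_ed / (1 << (2 * n))     # mean error distance over all inputs
    ned = med / (2 * ((1 << n) - 1))    # normalized by the maximum exact sum
    return med, ned

For instance, relative to the correct output ‘10’, the erroneous outputs ‘01’ and ‘00’ contribute EDs of 1 and 2, matching the example above.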
Fig. 4. Power and precision tradeoffs as given by the power consumption per
bit and the NED of a full adder design [44]. The product of power per bit and
NED is shown by a dashed curve. A better design with a more efficient power
and precision tradeoff is along the direction pointed by the arrow.
V. ALGORITHM-LEVEL APPROXIMATE COMPUTING TECHNIQUES
Significant potential exists in using the techniques of
approximate computing at the algorithm level.
A. Approximate Computing and Incremental Refinement
The notion of approximate signal processing was
developed in [49, 50]. The authors introduce the central concept of incremental refinement, the property of certain algorithms that their iterations can be terminated early to save energy in exchange for
incrementally lower quality. The principle is demonstrated on
the FFT-based maximum-likelihood detection algorithm [49,
50]. It was further shown in [51-54] that several signal
processing algorithms – that include filtering, frequency
domain transforms and classification – can be modified to
exhibit the incremental refinement property and allow
favorable energy-quality tradeoffs, i.e., those that permit
energy savings in exchange for small quality degradation.
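As a toy illustration of the incremental refinement property (not taken from [49, 50]), the Newton iteration below for 1/sqrt(x) can be stopped after any number of iterations; each additional iteration spends more energy and returns an incrementally better result.

def inv_sqrt(x: float, iterations: int = 4) -> float:
    # Newton iteration for 1/sqrt(x); truncating the loop trades accuracy for
    # the energy of the skipped iterations. The initial guess is an assumption.
    y = 1.0 / (1.0 + x)
    for _ in range(iterations):   # fewer iterations -> less energy, lower quality
        y = y * (1.5 - 0.5 * x * y * y)
    return y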

Similar principles are applied in [55] to trade energy against result optimality in an implementation of a widely used machine learning algorithm, the support vector machine (SVM). It is found that the number of support vectors
correlates well with the quality of the algorithm while also
impacting the algorithm’s energy consumption. Reducing the
number of support vectors reduces the number of dot product
computations per classification while the dimensionality of the
support vectors determines the number of multiply-accumulate
operations per dot product. An approximate output can be
computed by ordering the dimensions (features) in terms of
their importance, computing the dot product in that order and
stopping the computation at the proper point.
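A simplified software view of this idea is sketched below (the linear kernel, the feature ordering, and the function names are assumptions for illustration, not the implementation of [55]): the dot product against each support vector is computed over the most important features first and truncated after a chosen number of dimensions.

def approx_svm_score(x, support_vectors, alphas, bias, feature_order, num_features):
    # Truncated decision function: only the first num_features dimensions,
    # taken in order of decreasing importance, enter each dot product.
    keep = feature_order[:num_features]
    score = bias
    for alpha, sv in zip(alphas, support_vectors):
        score += alpha * sum(x[j] * sv[j] for j in keep)
    return score

Reducing num_features, or retaining fewer support vectors, directly reduces the number of multiply-accumulate operations per classification at some cost in accuracy.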
B. Dynamic Bit-Width Adaptation
For many computing and signal processing applications,
one of the most powerful and easily available knobs for
controlling the energy-quality trade-off is changing the
operand bit-width. Dynamic, run-time adaptation of effective
bit-width is thus an effective tool of approximate computing.
For example, in [56] it is used for dynamic adaptation of
energy costs in the discrete-cosine transform (DCT) algorithm.
By exploiting the properties of the algorithm, namely, the fact
that high-frequency DCT coefficients are typically small after
quantization and do not impact the image quality as much as
the low-frequency coefficients, lower bit-width can be used
for operations on high frequency coefficients. That allows significant power savings (e.g., 60%) at the cost of only a slight degradation in image quality (about 3 dB of PSNR).
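A minimal sketch of this kind of bit-width adaptation is shown below (the zig-zag coefficient threshold and the number of dropped bits are illustrative assumptions, not the exact configuration of [56]): high-frequency coefficients lose their least significant bits, so the arithmetic that consumes them can operate at a reduced effective bit-width.

def reduce_bitwidth(value: int, drop_bits: int) -> int:
    # Truncate the low-order bits so downstream arithmetic can use narrower operands.
    return (value >> drop_bits) << drop_bits

def adapt_dct_coefficients(zigzag_coeffs, low_freq_count: int = 16, drop_bits: int = 4):
    # Keep full precision for the low-frequency coefficients; reduce the rest.
    return [c if i < low_freq_count else reduce_bitwidth(c, drop_bits)
            for i, c in enumerate(zigzag_coeffs)]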
C. Energy Reduction via Voltage Overscaling
In a conventional design methodology, driven by static
timing analysis, timing correctness of all operations is
guaranteed by construction. The design methodology
guarantees that every circuit path regardless of its likelihood
of excitation must meet timing. When VDD is scaled even slightly, large timing errors occur and rapidly degrade the
output signal quality. This rapid quality loss under voltage
scaling significantly reduces the potential for energy
reduction. However, because voltage scaling is the most
effective way to reduce digital circuit energy consumption,
many techniques of approximate computing seek ways to
over-scale the voltage below a circuit’s minimum safe voltage. They
differ in how they deal with the fact that the voltage is not
sufficient to guarantee timing correctness on all paths.
One possible strategy is to introduce correction
mechanisms such that the system is able to tolerate timing
errors induced by voltage-overscaling (VOS). In [57-60], this
approach is developed under the name algorithmic noise
tolerance, specifically targeting DSP-type circuits, such as
filters. The energy reduction is enabled by using lower voltage
on a main computing block and employing a simpler error
correcting block that runs at a higher voltage and is thus error-free, to improve the results impacted by timing errors of
the main block. For instance, in [57] the simpler block is a
linear forward predictor that estimates the current sample of
the filter output based on its past samples.
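The following Python sketch illustrates the flavor of this scheme (a behavioral model; the predictor form and threshold are assumptions rather than the design of [57]): a simple forward predictor estimates each output of the voltage-overscaled main block, and outputs that deviate from the estimate by more than a threshold are replaced by the prediction.

def ant_correct(main_outputs, predictor_coeffs, threshold):
    # Algorithmic-noise-tolerance style correction: the main block's outputs may
    # contain occasional large timing errors; a linear forward predictor built
    # from previously accepted samples supplies the fallback value.
    corrected = []
    for i, y in enumerate(main_outputs):
        if i == 0:
            corrected.append(y)          # no history yet; accept the first sample
            continue
        y_hat = sum(c * corrected[i - 1 - j]
                    for j, c in enumerate(predictor_coeffs) if i - 1 - j >= 0)
        corrected.append(y_hat if abs(y - y_hat) > threshold else y)
    return corrected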
Another class of approaches focuses on modifying the base
implementation of a DSP algorithm to be more VOS-friendly.
This can be done at several levels of design hierarchy. The
principle behind most efforts is to identify computations that
need to be protected and those that can tolerate some errors.
In [61], the idea of identifying hardware building blocks
that demonstrate more graceful degradation under voltage
over-scaling is pursued. The work studies several commonly-
encountered algorithms used in multimedia, recognition, and
mining to identify their underlying computational kernels as
meta-functions. For each such application, it is found that
there exists a computational kernel where the algorithm
spends up to 95% of its computation time, and therefore
consumes the corresponding amount of energy. The following
meta-functions (computational kernels) were identified: (1) for
the motion estimation, it is the L1-norm, or the sum of
absolute differences computation, (2) for support vector
machine classification algorithm it is the dot product, and (3)
for the mining algorithm of K-means clustering, it is an L1-norm or L2-norm computation. Importantly, all identified
meta-functions use the accumulator which becomes the first
block to experience time-starvation under voltage over-
scaling. By making the accumulator more VOS-friendly, using
dynamic segmentation and delay budgeting of chained units,
the quality-energy trade-offs are improved for each of the
above meta-functions.
Even in the same block not all computations may be
equally important for the final output. In [62], the
significant/insignificant computations of the sum of absolute
difference algorithm, which is a part of a video motion
estimation block, are identified directly based on their PSNR
impact. The significant computations are then protected under
VOS, by allowing them two clock cycles for completion,
while the insignificant computations are allowed to produce an
occasional error. A delay predictor block is used to predict the
input patterns with a higher probability of launching critical
paths.
It is also crucial to control sources of errors that have the
potential to be spread and amplified within the flow of the
algorithm [63]. For example, the 2-D inverse discrete cosine
transform (IDCT) algorithm has two nearly identical
sequentially executed matrix-multiplication steps. A timing
error in step 1 will generate multiple output errors in the
second step because each element is used in multiple
computations of step 2. Therefore, it is important to prevent
errors in the early steps under scaled VDD. This can be
achieved by allocating extra timing margins to critical steps. If
the overall latency for the design needs to remain constant, an
important element of protecting the early algorithm steps is a
re-allocation strategy that shifts timing budgets between steps.
Different strategies are possible for dealing with errors that
result from overscaling. In some designs, the results produced
by blocks subject to timing errors are not directly accepted.
Rather, computation is terminated early and intermediate
results impacted by timing errors are ignored entirely [64, 65].
From the point of view of gate-level design, such techniques
still guarantee timing correctness of all digital operations.
Alternatively, a design may directly accept the results of
erroneous computation, providing, of course, that the
magnitude of error is carefully controlled [63]. This timing
error acceptance strategy gives up on guaranteeing the worst-
case timing correctness but aims to keep global signal quality
from severe degradation. A significant reduction of quality loss under VDD scaling is enabled by reducing the occurrence of early timing errors with large impact on quality by using operand statistics, and by reducing error through dynamic reordering of accumulations. The first innovation of this effort is enabling
error control through knowledge of operand statistics. When
VDD is scaled down, large magnitude timing errors are very
likely to happen in the addition of small numbers with
opposing signs. Such additions lead to long carry chains and
are the timing-critical paths in the adder. The worst case for
carry propagation occurs in the addition of −1 and 1. In 2’s
complement representation, this operation triggers the longest
possible carry chain and, thus, experiences timing errors first.
In the 2D-IDCT algorithm, the additions that involve small
valued, opposite-sign operands occur in the processing of
high-frequency components. This is because the first 20 low
frequency components contain about 85% or more of the
image energy. The introduced technique uses an adder with a
bit-width smaller than required by other considerations to
process high-frequency small-magnitude operands. Two
objectives are achieved by using such adders: the magnitude
of quality loss is reduced and its onset is delayed. Large-
valued operands, of course, require a regular-width adder. The
second technique is based on a reduction of the cumulative
quality loss resulting from multiple additions, such as
accumulations, which are a key component and optimization
target of many DSP algorithms, and, specifically, of the IDCT.
The key observation is that if positive and negative operands
are accumulated separately and added only in the last step, the
number of error-producing operations is reduced to one last
addition that involves operands with opposite sign. At the
same time, the operands involved in this last addition are
guaranteed to be larger in absolute value than any individual
opposite-sign operands involved in the original sequence of
additions. This guarantees that the reordered accumulation
will result in a smaller quality loss under scaled timing. The
results of using the introduced techniques on two test images
are shown in Fig. 5.
Fig. 5. Upper images are produced by a conventional IDCT with scaled VDD. Techniques of [63] improve image quality for the same scaled VDD in the lower images.
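The reordering itself is straightforward to express in software (a sketch of the idea described above, not the hardware implementation of [63]):

def reordered_accumulate(values):
    # Accumulate positive and negative operands separately so that only the
    # final addition involves opposite-sign operands; that single addition is
    # guaranteed to see larger-magnitude operands than any individual
    # opposite-sign pair in the original sequence of additions.
    positives = sum(v for v in values if v >= 0)
    negatives = sum(v for v in values if v < 0)
    return positives + negatives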
VI. SUMMARY
In this paper, recent progress on approximate computing is
reviewed, with a focus on approximate circuit design, pertinent
error metrics, and algorithm-level techniques. As an emerging
paradigm, approximate computing shows great promise for
implementing energy-efficient and error-tolerant systems.
REFERENCES
[1] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan, “MACACO:
Modeling and analysis of circuits for approximate computing,” in Proc.
ICCAD, pp. 667–673, November 2011.
[2] H. Esmaeilzadeh, A. Sampson, L. Ceze and D. Burger, “Architecture
support for disciplined approximate programming,” in Proc. Intl. Conf.
Architectural Support for Programming Languages and Operating
Systems, pp. 301-312, 2012.
[3] V. Gupta, D. Mohapatra, A. Raghunathan and K. Roy, “Low-Power
Digital Signal Processing Using Approximate Adders,” IEEE Trans.
CAD of Integrated Circuits and Systems, 32(1), pp. 124-137, 2013.
[4] W.J. Poppelbaum, C. Afuso and J.W. Esch, “Stochastic computing
elements and systems,” Proc. Fall Joint Comp. Conf., pp. 631-644, 1967.
[5] B.R. Gaines, “Stochastic computing systems,” Advances in Information
Systems Science, vol. 2, pp. 37-172, 1969.
[6] J. von Neumann, “Probabilistic logics and the synthesis of reliable
organisms from unreliable components,” Automata Studies, Shannon
C.E. & McCarthy J., eds., Princeton University Press, pp. 43-98, 1956.
[7] J. Han, J. Gao, Y. Qi, P. Jonker, J.A.B. Fortes. “Toward Hardware-
Redundant, Fault-Tolerant Logic for Nanoelectronics," IEEE Design and
Test of Computers, vol. 22, no. 4, pp. 328-339, July/August 2005.
[8] B. Brown and H. Card, “Stochastic neural computation I: Computational
elements,” IEEE Trans. Computers, vol. 50, pp. 891–905, Sept. 2001.
[9] C. Winstead, V.C. Gaudet, A. Rapley and C.B. Schlegel, “Stochastic
iterative decoders,” Proc. Intl. Symp. Info. Theory, pp. 1116-1120, 2005.
[10] S.S. Tehrani, S. Mannor and W.J. Gross, “Fully parallel stochastic
LDPC decoders,” IEEE Trans. Signal Processing, vol. 56, no. 11, pp.
5692-5703, 2008.
[11] W. Qian, X. Li, M.D. Riedel, K. Bazargan and D.J. Lilja, “An
architecture for fault-tolerant computation with stochastic logic,” IEEE
Trans. Computers, vol. 60, pp. 93–105, Jan. 2011.
[12] A. Alaghi and J.P. Hayes. “A spectral transform approach to stochastic
circuits,” in Proc. ICCD, pp. 315-321, 2012.
[13] P. Li, D. Lilja, W. Qian, M. Riedel and K. Bazargan, “Logical
computation on stochastic bit streams with linear finite state machines."
IEEE Trans. Computers, in press.
[14] J. Han, H. Chen, J. Liang, P. Zhu, Z. Yang and F. Lombardi, “A stochastic computational approach for accurate and efficient reliability evaluation,” IEEE Trans. Computers, in press.
[15] H. Aliee and H.R. Zarandi, “A fast and accurate fault tree analysis based
on stochastic logic implemented on field-programmable gate arrays,"
IEEE Trans. Reliability, vol. 62, pp. 13–22, Mar. 2013.
[16] N. Shanbhag, R. Abdallah, R. Kumar and D. Jones, “Stochastic
computation,” in Proc. DAC, pp. 859-864, 2010.
[17] J. Sartori, J. Sloan and R. Kumar, “Stochastic computing: embracing
errors in architecture and design of processors and applications,” in
Proc. 14th IEEE Intl. Conf. on Compilers, Architectures and Synthesis
for Embedded Systems (CASES), pp. 135-144, 2011.
[18] H. Cho, L. Leem, and S. Mitra, “ERSA: Error resilient system
architecture for probabilistic applications,” IEEE Trans. CAD of
Integrated Circuits and Systems, vol. 31, no. 4, pp. 546-558, 2012.
[19] A. Alaghi and J.P. Hayes, “Survey of stochastic computing,” ACM
Trans. Embedded Computing Systems, 2012.
[20] S. Cheemalavagu, P. Korkmaz, K.V. Palem, B.E.S. Akgul and L.N.
Chakrapani, “A probabilistic CMOS switch and its realization by
exploiting noise,” in Proc. IFIP-VLSI SoC, pp. 452-457, Oct. 2005.
