Accuracy-configurable adder for approximate arithmetic designs

Andrew B. Kahng and Seokhyeong Kang
ECE and CSE Departments, University of California at San Diego
abk@cs.ucsd.edu, shkang@vlsicad.ucsd.edu
ABSTRACT
Approximation can increase performance or reduce power consumption with a simplified or inaccurate circuit in application contexts where strict requirements are relaxed. For applications related to human senses, approximate arithmetic can be used to generate sufficient results rather than absolutely accurate results. Approximate design exploits a tradeoff of accuracy in computation versus performance and power. However, required accuracy varies according to applications, and 100% accurate results are still required in some situations. In this paper, we propose an accuracy-configurable approximate (ACA) adder for which the accuracy of results is configurable during runtime. Because of its configurability, the ACA adder can adaptively operate in both approximate (inaccurate) mode and accurate mode. The proposed adder can achieve significant throughput improvement and total power reduction over conventional adder designs. It can be used in accuracy-configurable applications, and improves the achievable tradeoff between performance/power and quality. The ACA adder achieves approximately 30% power reduction versus the conventional pipelined adder at the relaxed accuracy requirement.
Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS - Design Aids; J.6
[Computer Applications]: COMPUTER-AIDED ENGINEERING
General Terms
Algorithms, Design, Performance
Keywords
Approximate Arithmetic, Error-Tolerance, Power Minimization,
Accuracy-Configurable Adder
1. INTRODUCTION
Guardbands for dynamic variations severely limit performance and energy efficiency of conventional IC designs. To overcome consequences of overdesign, several recent mechanisms for variation-resilient design [4] allow timing errors and manage design reliability dynamically. Relaxing the requirement of correctness for designs may dramatically reduce costs of manufacturing, verification and test [16]. In resilient designs, errors can be corrected with redundancy techniques (error-tolerance), or accepted in some applications relating to human senses such as hearing and sight (error-acceptance). In the error-acceptance regime, approximation via a simplified or inaccurate circuit can increase performance and/or reduce power consumption.
DAC 2012, June 3-7, 2012, San Francisco, California, USA.
Copyright 2012 ACM ACM 978-1-4503-1199-1/12/06 ...$10.00.
Various approximate arithmetic designs have been previously proposed. Lu [7] introduces a faster adder which has shorter carry chains and considers only the previous k bits of input in computing a carry bit. Verma et al. [12] provide a variable latency speculative adder (VLSA), which is a reliable version of the Lu adder [7] with error detection and correction. Shin et al. [10] also propose a data path redesign technique for various adders which cuts the critical path in the carry chain. Zhu et al. [14] [13] propose three approximate adders: ETAI, ETAII and ETAIIM. ETAI is divided into an accurate part and an inaccurate part to achieve approximate results. ETAII cuts carry propagation to speed up the adder, and ETAIIM modifies ETAII by connecting carry chains in accurate MSB parts. Kulkarni et al. [5] present a 2x2 underdesigned multiplier, and use it to build large power-efficient approximate multipliers. George et al. [3] define the concept of probabilistic CMOS (PCMOS), and implement efficient arithmetic using PCMOS. Shin et al. [11] propose a logic synthesis approach to design an approximate circuit.
The approximate designs produce almost-correct results at the given required accuracy, and obtain power reductions or performance improvements in return. In some applications, however, more accurate or totally accurate results are required under certain conditions; e.g., image processing in security cameras would require cleaner images after detecting a motion. In contexts where the required accuracy changes during runtime, the accuracy of results should be configurable to maximize the benefit of approximate operations. Figure 1 illustrates how power benefits can be achieved with an accuracy-configurable design. The accuracy-configurable design can adapt to changing accuracy constraints by using different modes in each situation. To our knowledge, no previous work can configure the output accuracy during runtime, and each is thus restricted (or, best-suited) to particular application contexts. In contexts where the accuracy requirement can change dynamically, the previous methods' benefits from the accuracy tradeoff are reduced since the implementation must be targeted to the maximum accuracy requirement.
[Figure 1 plot: normalized power (up to 1.0) and required accuracy (80%, 100%, 90%, 80%) versus time, for the accurate design and the accuracy-configurable design; annotations mark "event occurred", "accurate mode" and "approximate mode".]
Figure 1: Power benefits from accuracy-configurable design.
In this paper, we propose an accuracy-configurable approximate (ACA) adder, which can configure the accuracy of results during runtime. The main contributions of our work are the following.
• The proposed ACA adder has runtime-configurable accuracy to better enable tradeoff of accuracy in computation versus performance and power.
• We provide quantitative metrics for an approximate arithmetic design. We compare the ACA adder to previous approximate adders based on these metrics.
• We demonstrate the power benefits of the ACA adder over previous approximate and conventional adder designs for accuracy-configurable applications.
The rest of the paper is organized as follows. Section 2 presents
the proposed ACA adder design. Section 3 provides experimen-
tal results and analysis. Section 4 summarizes and concludes the
paper.
2. ACCURACY-CONFIGURABLE ADDER
2.1 Approximate Adder Implementation
[Figure 2 diagram: inputs A[15:0] and B[15:0] feed three 8-bit sub-adders, $A_H + B_H$, $A_M + B_M$ and $A_L + B_L$, where $A_H$ = A[15:8], $A_M$ = A[11:4] and $A_L$ = A[7:0]. The outputs are the carry SUM[16] and SUM[15:12] ($SUM_H$), SUM[11:8] ($SUM_M$), and SUM[7:4] and SUM[3:0] ($SUM_L$).]
Figure 2: Proposed approximate adder: 16-bit adder case.
Previous approximate adders [7] [10] [14] have difficulty detecting and correcting errors since they are designed for error-acceptable applications with a target accuracy. However, accurate computations are still required at certain times, according to the application. VLSA [12] can provide accurate results, but has large delay and area overhead for the error detection and correction. The central contribution of our present work is to propose an approximate adder which supports both accurate and inaccurate computation with error-correction and accuracy-configuration capability. Figure 2 shows our proposed approximate circuit for the case of a 16-bit adder. In the adder, the carry chain is cut to reduce critical-path delay, and three sub-adders generate results of partial summations. With the reduced critical-path delay, high performance (by increasing the clock frequency) or low power consumption (by decreasing the operating voltage) is obtained. A middle sub-adder ($A_M + B_M$) is introduced to increase accuracy. Without the middle sub-adder (as in ETAII [13]), error occurs when the eighth carry bit is high, and for random input patterns the error rate is 50.1%. On the other hand, with the introduction of the middle sub-adder, error rate for random input patterns is reduced to 5.5%. (In the real implementation, all redundant parts (four-LSB output of $A_H + B_H$ and $A_M + B_M$ sub-adders) are optimized only for carry-generation.)
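The effect of the middle sub-adder can be illustrated with a small behavioral model. The sketch below is not the authors' Verilog; the function names (`aca16`, `split16_no_middle`, `error_rate`) and the random-sampling setup are assumptions made for illustration. It models the three overlapping 8-bit sub-adders of Figure 2 and a variant without the middle sub-adder, then estimates the error rate on uniformly random 16-bit operands.

```python
import random


def aca16(a: int, b: int) -> int:
    """Behavioral model of Figure 2: three overlapping 8-bit sub-adders."""
    s_l = (a & 0xFF) + (b & 0xFF)                 # A_L + B_L on bits [7:0]
    s_m = ((a >> 4) & 0xFF) + ((b >> 4) & 0xFF)   # A_M + B_M on bits [11:4]
    s_h = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)   # A_H + B_H on bits [15:8]
    low = s_l & 0xFF            # SUM[7:0]
    mid = (s_m >> 4) & 0xF      # SUM[11:8], upper half of the middle window
    high = s_h >> 4             # SUM[16:12], carry-out plus SUM[15:12]
    return (high << 12) | (mid << 8) | low


def split16_no_middle(a: int, b: int) -> int:
    """Variant without the middle sub-adder: two independent 8-bit halves."""
    s_l = (a & 0xFF) + (b & 0xFF)
    s_h = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)
    return (s_h << 8) | (s_l & 0xFF)


def error_rate(adder, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of random operand pairs whose result differs from a + b."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        a, b = rng.getrandbits(16), rng.getrandbits(16)
        wrong += adder(a, b) != a + b
    return wrong / trials


if __name__ == "__main__":
    print("without middle sub-adder:", error_rate(split16_no_middle))
    print("with middle sub-adder:   ", error_rate(aca16))
```

Under this model the variant without the middle sub-adder fails on roughly half of the random inputs and the three-sub-adder version on a few percent, which is in line with the figures quoted above; the exact values depend on the sampled inputs.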
[Figure 3 diagram: overlapping sub-adders take A[N-1:N-k], B[N-1:N-k], A[N-k-1:N-2k], B[N-k-1:N-2k], A[N-2k-1:N-3k], B[N-2k-1:N-3k] and so on as inputs, and produce the carry, SUM[N-1:N-k], SUM[N-k-1:N-2k], SUM[N-2k-1:N-3k] and so on. N: bit width, k: half of the carry-chain depth.]
Figure 3: General implementation for the proposed adder.
We can generalize the implementation of the proposed approximate adder. Figure 3 shows the general implementation of an N-bit adder with a parameter k, which is the bit-width of the sub-adder result. In the adder, each divided sub-module produces a k-bit result except for the last sub-module, which produces a 2k-bit result. The approximate adder thus consists of the (N/k - 1) sub-modules as described in Equation (1).

$$SUM[N-ik-1 : N-(i+1)k] = A[N-ik-1 : N-(i+2)k] + B[N-ik-1 : N-(i+2)k], \quad \text{where } i = 0, \ldots, N/k-2 \qquad (1)$$
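Equation (1) can be read as a loop over overlapping 2k-bit windows. The following is a minimal sketch of that reading, assuming k divides N and N is at least 2k; the function name `aca_add` and its argument layout are hypothetical, not taken from the paper.

```python
def aca_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Approximate (n+1)-bit sum built from (n/k - 1) overlapping 2k-bit sub-adders."""
    if n % k or n < 2 * k:
        raise ValueError("k must divide n and n must be at least 2k")
    window = (1 << (2 * k)) - 1
    block = (1 << k) - 1
    last_i = n // k - 2
    result = 0
    for i in range(last_i + 1):          # i = 0 is the most significant sub-adder
        lo = n - (i + 2) * k             # lowest bit index of this 2k-bit window
        s = ((a >> lo) & window) + ((b >> lo) & window)
        if i == last_i:
            kept, shift = s, lo          # last sub-module keeps its full 2k-bit result
        else:
            kept, shift = s >> k, lo + k # other sub-modules keep only the upper k bits
        if i != 0:
            # Only the most significant sub-adder contributes a carry-out bit.
            kept &= window if i == last_i else block
        result |= kept << shift
    return result


if __name__ == "__main__":
    print(hex(aca_add(0x0080, 0x0080)))  # carry handled by the middle window
    print(hex(aca_add(0x00FF, 0x0001)))  # carry lost: low half of the window is all ones
```

With k = N/2 there is a single 2k-bit sub-adder, so the model reduces to an exact adder; smaller k shortens each carry chain at the cost of occasional lost carries.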
In modern adder designs, such as carry-lookahead (CLA), carry-select and Kogge-Stone adders, the path depth and area are asymptotically proportional to $\log_2 N$ and $N \log_2 N$ respectively, where N is the bit-width of the adder [15]. Based on this, we can express delay, area and power consumption of the proposed adder in terms of the parameters N and k. The proposed ACA adder has (N/k - 1) sub-adders, each of which is a 2k-bit adder. Therefore, delay of the critical path can be expressed with Equation (2) and area can be estimated with Equation (3), where $C_{delay}$ and $C_{area}$ are constants for delay and area, respectively.

$$delay = C_{delay} (\log_2 k + 1) \qquad (2)$$

$$area = C_{area} (N - 2k)(\log_2 k + 1) \qquad (3)$$

$$Power_{dyn} = C_{power} (N - 2k)(\log_2 k + 1)^2 \qquad (4)$$
Power consumption of the ACA adder can be roughly estimated as follows. Dynamic power consumption with voltage scaling at a fixed frequency is proportional to capacitance · $V_{dd}^2$, where the capacitance is proportional to the area. Cell delay is proportional to $1/(V_{dd} - V_t)^{\beta}$, and $V_{dd}^2$ is roughly proportional to 1/(cell delay) if we assume that β is 2. Since (cell delay) × (path depth) is constant at a fixed frequency, $V_{dd}^2$ is proportional to the path depth, which is $\log_2 k + 1$. Consequently, dynamic power with voltage scaling can be expressed using Equation (4), where $C_{power}$ is a constant fixed for given $V_{dd}$ for dynamic power consumption. Static power consumption of the adder can be roughly estimated as proportional to the area in Equation (3).
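The scaling argument above can be turned into a quick numerical estimate. The sketch below does not evaluate the closed forms directly; it rebuilds the ratios from the ingredients stated in the text: (N/k - 1) sub-adders of 2k bits each, path depth proportional to log2 of the adder width, area proportional to width times log2 of the width, and $V_{dd}^2$ proportional to path depth. It assumes the same proportionality constants for the ACA adder and the reference N-bit CLA, and it treats N/k as a real number when k does not divide N; both are assumptions of this sketch, as is the function name.

```python
from math import log2


def normalized_estimates(n: int, k: float) -> dict:
    """Delay, area and dynamic power of an ACA adder relative to an n-bit CLA."""
    cla_depth = log2(n)                    # path depth of the reference adder
    cla_area = n * log2(n)                 # area of the reference adder
    sub_width = 2 * k                      # each sub-adder is a 2k-bit adder
    depth = log2(sub_width)                # equals log2(k) + 1
    area = (n / k - 1) * sub_width * log2(sub_width)
    delay_ratio = depth / cla_depth
    area_ratio = area / cla_area
    # Dynamic power ~ capacitance * Vdd^2 ~ area * path depth at fixed frequency.
    power_ratio = area_ratio * delay_ratio
    return {"delay": delay_ratio, "area": area_ratio, "power": power_ratio}


if __name__ == "__main__":
    for k in (2, 3, 4, 5, 6):
        est = normalized_estimates(16, k)
        print(k, {name: round(value, 3) for name, value in est.items()})
```

Under these assumptions the ratios for N = 16 come out within about 0.01 of the corresponding Table 1 entries; this comparison is a check made for this sketch, not a statement about how the table was produced.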
In our proposed adder design, the output of each sub-adder (except the last sub-adder) is incorrect when a carry input should be propagated to the results. In Figure 2, when carry[4] (the carry bit from $A_L + B_L$) is '1' and $SUM_M[3:0]$ is $1111_{(2)}$, the output result has an error in SUM[11:8]. In the general implementation, the output result will be correct when there are no errors in all (N/k - 1) sub-adders. In the $i$th sub-adder, errors occur when (1) the LSB part of the result ($SUM_i[k-1:0]$) has all '1' values (probability $P = \frac{1}{2^k}$) and (2) the LSB part ([k-1:0]) of the $(i+1)$th sub-adder produces a carry bit (probability $P = \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{4} + \ldots$). Therefore, with a random input vector, the probability of having a correct result in the proposed adder is

$$P(N, k) = \left(1 - \frac{1}{2^k} \cdot \frac{2^k - 1}{2^{k+1}}\right)^{\frac{N}{k} - 2} \qquad (5)$$
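Equation (5) is easy to evaluate numerically. The sketch below, with a hypothetical function name, sums the carry-probability series over k terms, which gives the closed form $(2^k - 1)/2^{k+1}$ used in the equation, and then raises the per-sub-adder success probability to the power N/k - 2.

```python
def pass_probability(n: int, k: int) -> float:
    """Probability of a fully correct sum for random inputs, per Equation (5)."""
    p_all_ones = 1 / 2**k
    # 1/4 + (1/2)(1/4) + (1/2)^2 (1/4) + ... over k bit positions.
    p_carry = sum(0.25 * 0.5**j for j in range(k))
    return (1 - p_all_ones * p_carry) ** (n / k - 2)


if __name__ == "__main__":
    for k in (2, 3, 4, 5, 6):
        print(k, round(pass_probability(16, k), 3))
```

For N = 16 these values can be compared against the pass-rate row of Table 1.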
Table 1 shows the estimated results of 16-bit ACA adders with
different parameter values k. With smaller k value, the minimum
clock period and dynamic power can be reduced, but the pass rate
(probability of having a correct result) will be decreased. The esti-
mations come from Equations (2), (3), (4) and (5). In Section 3.3
below, we validate the above estimation with real implementations.
Table 1: Estimated minimum clock cycle, area, dynamic power and pass rate for each k value when N = 16 (normalized to the conventional CLA 16-bit adder).
                    k=2     k=3     k=4     k=5     k=6
min. clock period   0.5     0.65    0.75    0.83    0.89
area                0.87    1.05    1.12    1.15    1.12
dynamic power       0.44    0.68    0.84    0.95    1.00
pass rate           0.554   0.829   0.942   0.982   0.995
2.2 Error Detection and Correction for Accurate Computation
As described in Section 2.1, our proposed adder is incorrect when a carry bit is propagated between sub-adders. However, the error can be detected and corrected with a small overhead. We detect an error for each sub-adder by checking the output of the sub-adder and the carry-in signal that comes from the previous sub-adder. Error detection can be implemented with several AND gates. To correct the error, '1' should be added to the approximate (inaccurate) output, and the error correction can be implemented with an incrementor circuit.
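The detect-then-increment idea can be sketched behaviorally. The code below is not the gate-level EDC circuit of the paper; it is a software model that walks the sub-adders from the LSB side, flags a sub-adder when the lower half of its window sums to all ones while a carry arrives from below, and adds '1' to the affected k-bit result. The function name, the LSB-first ordering and the way the incoming carry is tracked are assumptions of this sketch.

```python
def aca_add_edc(a: int, b: int, n: int = 16, k: int = 4):
    """Return (approximate_sum, corrected_sum, error_detected) for n-bit operands."""
    if n % k or n < 2 * k:
        raise ValueError("k must divide n and n must be at least 2k")
    block = (1 << k) - 1
    window = (1 << (2 * k)) - 1
    m = n // k
    approx = corrected = 0
    carry_in = 0                  # carry entering the low end of the current window
    error_detected = False
    for j in range(1, m):         # window j covers k-bit blocks j-1 and j
        lo = (j - 1) * k
        wa, wb = (a >> lo) & window, (b >> lo) & window
        s = wa + wb               # sub-adder output, carry-in forced to 0
        low_sum = s & block
        mid_carry = ((wa & block) + (wb & block)) >> k
        # Detection: all-ones lower half AND an incoming carry (AND-gate condition).
        err = int(low_sum == block and carry_in == 1)
        upper = s >> k            # block j plus the window's carry-out
        fixed = upper + err       # correction: incrementor adds '1'
        if j < m - 1:             # only the top window keeps its carry-out
            upper &= block
            fixed &= block
        if j == 1:                # block 0 comes from the lowest window and is exact
            approx |= low_sum
            corrected |= low_sum
        approx |= upper << (j * k)
        corrected |= fixed << (j * k)
        error_detected |= bool(err)
        carry_in = mid_carry | err    # carry actually entering block j
    return approx, corrected, error_detected


if __name__ == "__main__":
    print(aca_add_edc(0x00FF, 0x0001))
    print(aca_add_edc(0x1234, 0x0101))
```

In this model the corrected output always equals the exact sum, and the error flag is raised only when the approximate output differs from it.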
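[Figure 4 diagram: the approximate adder (sub-adder i, sub-adder i+1) takes IN and produces $SUM_{approx}$; the EDC circuit uses $sum_i$ and $carry_{i+1}$ to generate $error_i$, an incrementor produces $SUM_{correct}$, and the error signal drives a data stall and selects the value sent to OUT.]
Figure 4: Error detection and correction with the approximate adder.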
With these simple error detection and correction circuits, our proposed adder can be implemented to have variable latency like the previous VLSA adder [12], with a small overhead for an error detection and correction (EDC) system. Figure 4 shows an EDC system with our proposed adder. The error detection circuit (AND gates) checks the carry propagation and generates an error signal. The error correction (incrementor) circuit produces an error-free output by adding compensation data, and requires an additional clock cycle. When errors are detected from input patterns, the error signal is activated. The error signal holds the input pattern during the error correction and chooses the error-corrected value ($SUM_{correct}$) as an output. With this approach, our approximate adder can provide accurate results at a higher clock frequency than that of conventional adders (e.g., CLA). According to the estimated results in Table 1, clock period can be reduced by 25% with 6% (= error rate) recovery-cycle overhead (16-bit ACA, k = 4).
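The tradeoff between a shorter clock period and recovery cycles can be written as a one-line estimate. The sketch below assumes that every erroneous addition costs exactly one extra clock cycle and that additions are otherwise issued back to back; the function name and the `recovery_cycles` parameter are hypothetical.

```python
def effective_speedup(period_ratio: float, error_rate: float,
                      recovery_cycles: int = 1) -> float:
    """Throughput of a variable-latency adder relative to the baseline adder.

    period_ratio: clock period normalized to the baseline (e.g. 0.75).
    error_rate:   fraction of additions that need correction.
    """
    average_cycles = 1 + error_rate * recovery_cycles
    return 1 / (period_ratio * average_cycles)


if __name__ == "__main__":
    # Table 1 estimates for the 16-bit ACA adder with k = 4.
    print(round(effective_speedup(0.75, 1 - 0.942), 3))
```

With the Table 1 estimates for k = 4 (normalized clock period 0.75, pass rate 0.942) this simple model gives a speedup of roughly 1.26 over the conventional adder.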
2.3 Accuracy Configuration with Pipelined Architecture
When our proposed adder is combined with a pipelined architecture, we can obtain accurate results with the same throughput as a conventional adder. In the pipelined architecture, approximate additions are computed at the first pipeline stage, and error correction can be completed at the second stage. Figure 5 shows the conventional pipelined adder (above) and the approximate adder (below). The pipelined implementation of approximate adder has a structural analogy with the pipelined adder of the 2006 U.S. patent [8] in which partial summations are performed at the first stage and carry bits are added at the later stages. However, the patent is clearly directed to accurate operations, not approximate computations. In addition, we use our approximate adder (Figure 3) in the first stage. In the pipelined approach, there is no improvement of the clock frequency since the achievable clock period is the same as that of the conventional adder. However, power benefits are obtained through configuration of accuracy: in the approximate mode, the error correction stage is power-gated with foot (or, head) switches in Figure 5, and power reduction over the conventional adder design can be achieved. We compare the conventional and approximate pipelined adders in Section 3.
In the proposed adder implementation, to achieve higher performance or lower power consumption, we can reduce the carry chain depth (k) of sub-adders (see Table 1). However, when k is less than N/4, it is impossible to correct all errors and achieve 100% correct results within one clock cycle since the error-correction paths become critical. To achieve correct results in the pipelined implementation, the error-correction stage should be extended to multiple stages. Figure 6 shows the pipelined adder implementation (k = N/8 case), in which four pipeline stages are required to achieve a 100% accurate result. In the pipelined adder, each stage generates a result with different accuracy; the output accuracy increases as the number of pipeline stages increases. According to the accuracy requirement, we can turn off the later stages with a power gating technique, and we can reduce the power consumption further with the accuracy tradeoff.
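The mode selection described above can be mimicked in software. The sketch below is a behavioral model only, under several assumptions that are not spelled out in the text: the output is split into equal segments S0, S1, ... (one per pipeline stage), stage 1 is the approximate adder, stage s corrects segment S(s-1), and power gating a stage is modeled as simply skipping its correction. The function name and the `active_stages` parameter are hypothetical.

```python
import random


def aca_pipeline(a: int, b: int, n: int = 32, k: int = 4,
                 stages: int = 4, active_stages: int = 4) -> int:
    """Sum produced when only the first `active_stages` pipeline stages are powered."""
    m = n // k                                # number of k-bit blocks
    if n % k or m % stages or not 1 <= active_stages <= stages:
        raise ValueError("unsupported configuration")
    per_segment = m // stages                 # k-bit blocks per output segment
    block = (1 << k) - 1
    window = (1 << (2 * k)) - 1
    out = ((a & block) + (b & block)) & block # block 0 is exact in every mode
    carry_in = 0
    for j in range(1, m):                     # window j covers blocks j-1 and j
        lo = (j - 1) * k
        wa, wb = (a >> lo) & window, (b >> lo) & window
        s = wa + wb
        err = int((s & block) == block and carry_in == 1)
        upper = s >> k
        if j // per_segment < active_stages:  # this block's correction stage is on
            upper += err
        if j < m - 1:
            upper &= block
        out |= upper << (j * k)
        carry_in = (((wa & block) + (wb & block)) >> k) | err
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(20_000)]
    for mode in range(1, 5):
        wrong = sum(aca_pipeline(x, y, active_stages=mode) != x + y for x, y in pairs)
        print(f"{mode} active stage(s): error rate {wrong / len(pairs):.4f}")
```

In this model the error rate can only fall as more stages are enabled, and with all four stages active the output is exact, which mirrors the accuracy-versus-power ordering of the configurable modes.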
Since the proposed adder supports both approximate and accurate results, it can be used in applications that require accurate results only under certain conditions. Conventional accurate designs are energy-inefficient in the error-acceptable application context, because they always compute the exact function. Previous approximate designs cannot handle a varying accuracy requirement, and this limits the benefit of the accuracy tradeoff: as noted above, the approximate function must meet the maximum accuracy threshold across all applications. Moreover, if the application requests an exact computation, additional accurate circuits must be added to the previous approximate designs. By contrast, the ACA design efficiently exploits a tradeoff between accuracy and power/performance with its runtime accuracy configurability.
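[Figure 5 diagram: (above) the conventional pipelined adder, where an N/2-bit adder computes $SUM_L$ from $A_L$ and $B_L$ in Stage 1 and an N/2-bit adder computes $SUM_H$ from $A_H$, $B_H$ and the carry in Stage 2; (below) the approximate pipelined adder, where the approximate adder computes $SUM_{approx}$ and the error signal from A and B in Stage 1, and the error correction block, enabled in accurate mode through power gating switches, produces $SUM_{correct}$ in Stage 2.]
Figure 5: Pipelined adder implementation: conventional adder (above) and approximate adder (below). In approximate operation, the error correction stage is power-gated.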
3. EXPERIMENTAL SETUP AND RESULTS
3.1 Experimental Setup
To test approximate designs, we have written each design in Verilog and synthesized it to a TSMC 65GP cell library with Synopsys DesignCompiler [17]. We then perform gate-level simulations using Cadence NC-Sim [18]. In the simulation, gate delay is taken from an SDF (standard delay format) file. For voltage scaling experiments, we prepare Synopsys Liberty (.lib) files for each voltage from 1.00V to 0.60V in 0.01V increments, using Cadence Library Characterizer v9.1 [19]. The prepared libraries are used for SDF file generation and power estimation at each voltage. Each simulation is performed with input patterns for one million cycles. During the simulation, each output value is compared with a reference (correct) value to produce the accuracy metrics. For the input patterns, we use random data, as well as actual data from SPEC 2006 [20] benchmarks. We extract operand data from ADD instructions in the SPEC benchmarks.
3.2 Metric for Approximate Design
To quantify errors in approximate designs, two metrics have been previously proposed [1]. Error rate (ER) is the percentage of cycles in which the output value is different from the correct value. Error significance (ES) is the numerical difference between correct and output results; this quantifies the amount of error. In image/video applications, [2] uses the product of ES and ER as a metric of error tolerance. [10] introduces a criterion for acceptability: ES × ER ≤ acceptance threshold, where the acceptance threshold is specified according to the application. For the error significance (ES) metric, [14] considers only amplitude of error. This is useful for many digital signal processing (DSP) systems that process, e.g., sound and image data. However, in communication systems that mainly handle information data, the number of incorrect bits (Hamming distance) is a more meaningful metric for accuracy; e.g., a (32,28) Reed-Solomon code can correct up to 2-byte errors. This consideration for the ES metric is required when approximate arithmetic is applied to error-tolerant systems with a redundancy technique.

[Figure 6 diagram: Stage 1 is the approximate adder, which produces the approximate SUM in segments S3 S2 S1 S0 from A and B; Stage 2 performs correction on S1, Stage 3 correction on S2 and Stage 4 correction on S3, each driven by the errors detected on that segment, so that the correct portion of the sum grows stage by stage until $SUM_{correct}$ is obtained.]
Figure 6: Accuracy-configurable implementation for pipelined adder.
Table 2 shows two accuracy metrics for amplitude data and information data. $ACC_{amp}$ used in [14] quantifies the amplitude of errors, where $R_c$ and $R_e$ are the correct and obtained results, respectively. We propose another accuracy metric, $ACC_{inf}$, which measures error significance as Hamming distance, where $B_e$ is the number of error bits and $B_w$ is the bit-width of the data. For example, when the correct (reference) data is $1000\_0000_{(2)}$ and the result data is $1100\_0000_{(2)}$, accuracy with $ACC_{amp}$ and $ACC_{inf}$ will be $\frac{1}{2}$ and $\frac{7}{8}$, respectively. To evaluate the approximate circuits, we obtain average values of accuracy metrics $ACC_{amp}$ and $ACC_{inf}$ over the entire simulation to consider both ER and ES.
Table 2: Accuracy metrics for error significance (ES).
metric        definition               data type
$ACC_{amp}$   $1 - |R_c - R_e| / R_c$  amplitude data
$ACC_{inf}$   $1 - B_e / B_w$          information data
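The two definitions in Table 2 translate directly into code. The sketch below uses hypothetical function names and plain Python integers; note that $ACC_{amp}$ is undefined when the correct result is zero, a case this sketch does not handle.

```python
def acc_amp(correct: int, obtained: int) -> float:
    """Amplitude accuracy: 1 - |R_c - R_e| / R_c (requires correct != 0)."""
    return 1 - abs(correct - obtained) / correct


def acc_inf(correct: int, obtained: int, width: int) -> float:
    """Information accuracy: 1 - B_e / B_w, with B_e the Hamming distance."""
    error_bits = bin(correct ^ obtained).count("1")
    return 1 - error_bits / width


if __name__ == "__main__":
    reference, result = 0b1000_0000, 0b1100_0000
    print(acc_amp(reference, result))      # 0.5
    print(acc_inf(reference, result, 8))   # 0.875
```

The printed values reproduce the worked example in the text: one wrong bit out of eight costs little under $ACC_{inf}$, while its position in the word makes it costly under $ACC_{amp}$.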
Table 3: ACA adder results with different k values.
k                            2      3      4      5
min. clock period (ps)       180    190    220    230
area (um^2)                  550    990    920    840
pass rate (%)                55.3   82.8   94.0   98.1
throughput improvement (%)   11.3   24.6   22.3   21.4
Table 4: Design comparison for each adder design.
                         CLA     LU      ACA     ETAI    ETAIIM
area (um^2)              910     1356    923     576     678
min. clock period (ps)   280     210     200     200     260
pass rate (%)            100     99.2    94.1    10.0    97.0
$ACC_{amp}$ (maximum)    1.000   0.998   0.997   0.999   0.999
$ACC_{inf}$ (maximum)    1.000   0.999   0.993   0.694   0.996
area overhead for EDC    N/A     75%     28%     N/A     15%
3.3 Approximate Adder with Different Parameters
We explore the proposed adder with different parameters (k: half of carry-chain depth). Table 3 summarizes results (minimum clock period, area, error rate and throughput improvements) for each implementation of the 16-bit adder with different k values. According to the results, with smaller k, the maximum operating frequency increases, but the error rate increases as well. With higher k, the error rate is reduced significantly, but the benefit of the approximate circuit, i.e., clock period reduction, is small. In the table, throughput improvement over conventional design is calculated including error recovery overhead. From the implementations, a maximum throughput improvement is achieved when k = 3. If we correct erroneous results with EDC as in Figure 4, then 17.2% additional clock cycles are required for error correction. With this overhead, the ACA adder can improve data throughput by 24.6% over the conventional CLA adder.
3.4 Approximate Adder Comparison
We evaluate each approximate adder with respect to the pass rate and the accuracy metrics which we have proposed. We use gate-level simulation at each possible clock period to compare five adders: CLA, Lu's adder [7], ETAI, ETAIIM [14] and the proposed ACA adder (without error correction). In the experiment, the same carry-chain width (8-bit) is selected for the four approximate adders. In the implementation, a register (flip-flop) is inserted in each output port to detect timing errors.
Table 4 shows area, pass rate, accuracy, minimum clock period and EDC overhead for each adder design. According to the results, the ETAI adder has the smallest design area, but has a low pass rate and limited accuracy with respect to the $ACC_{inf}$ metric. Therefore, the ETAI adder is preferred for applications which allow low accuracy in results. The ETAIIM adder shows fairly high accuracy, but does not have speed (clock period) benefit. Lu's adder shows a smaller error rate and high accuracy with respect to both $ACC_{amp}$ and $ACC_{inf}$ metrics. However, it requires larger area than the other designs. The proposed adder shows similar results for both metrics as Lu's adder. However, the area of the ACA adder is smaller than that of Lu's adder, and EDC is possible with small area overhead (28%). With the ACA adder, the minimum clock period can be reduced by 26% compared to the accurate CLA.
[Figure 7 plots: two panels showing $ACC_{amp}$ and $ACC_{inf}$ (y-axis, 0.400 to 1.000) versus total power in W (x-axis, 2.00E-04 to 1.20E-03) for the ACA adder, CLA, Lu's adder, ETAI and ETAIIM under voltage scaling (1.0V~0.6V); each panel has an inset that zooms in on the high-accuracy region.]
Figure 7: Accuracy (y-axis) vs. power consumption (x-axis) under fixed clock period (0.25ns) and scaled voltage (from 1.0V to 0.6V).
Figure 7 shows a power vs. accuracy tradeoff in a voltage scaling scenario: the x-axis shows total power consumption, and the y-axis shows the accuracy ($ACC_{amp}$, $ACC_{inf}$). The power consumption and the accuracy are measured with different voltage libraries characterized using Cadence Library Characterizer [19]. The clock period is fixed at 0.30ns during the simulations. In the results, Lu's adder does not show power benefits due to its design size. ETAI shows low power consumption and high $ACC_{amp}$ accuracy, but has low $ACC_{inf}$ accuracy, and cannot detect and correct errors. ETAIIM shows similar characteristics to ACA in the voltage scaling case, but the adder cannot be used for a high-performance (high-frequency) design, as shown in Table 4. The results in Figure 7 imply that our proposed adder can provide a significant power reduction with small accuracy penalty. When the required accuracy is 0.970 ($ACC_{amp}$), the ACA adder shows 37.0%, 36.4% and 15.9% total power reduction over CLA, Lu's adder and ETAIIM, respectively.
We have tested our approximate adder on a real application: a Gaussian smoothing filter used in [6]. Gaussian smoothing is performed on the input image by convolving with a matrix in the spatial domain. In the convolution, the addition operation is done with approximate 16-bit adders. Other operations, such as multiplication and division, are accurate computations. Figure 8 shows results for various approximate adders when they consume 50% of the power of accurate CLA. From the results, the ACA adder has PSNR of 24.5dB, and this suggests that image processing/filtering applications could employ our proposed adder with significant power savings and only small loss in image quality.
[Figure 8 images: panels (a) to (f).]
Figure 8: Image smoothing: (a) original image with noise; (b) accurate adder; (c) ACA, PSNR: 24.5 dB; (d) ETAI, PSNR: 25.3 dB; (e) ETAIIM, PSNR: 16.2 dB; (f) Lu's adder, PSNR: 11.1 dB.
Table 5: Comparison between conventional and approximate (2-stage) pipelined adders at the accurate mode.
                  conventional pipelined                             approximate pipelined
adder width (N)   area (um^2)   clock period (ns)   total power (mW)   k   area (um^2)   clock period (ns)   total power (mW)
8                 459           0.313               0.557              2   576           0.312               0.564
16                1082          0.357               1.558              4   1171          0.358               1.669
32                2252          0.404               2.860              8   2420          0.414               2.914
Table 6: Implementation results of 32-bit ACA adder with 4-stage pipeline (power consumption of each mode and power reduction over conventional pipelined adder).
config.   power-gating    $ACC_{amp}$ (max.)   $ACC_{inf}$ (max.)   total power (mW)   reduction (%)
mode-1    none            1.000                1.000                5.962              -11.5%
mode-2    stage-4         0.998                0.960                4.683              12.4%
mode-3    stage-3, 4      0.991                0.925                3.691              31.0%
mode-4    stage-2, 3, 4   0.983                0.900                2.588              51.6%
3.5 Accuracy Configuration and Power Savings
When the architecture allows pipelining for addition, our pro-
posed adder can be implemented as shown in Figure 5. We imple-
ment both the conventional pipelined adder and the approximate
pipelined adder to compare the designs in terms of area, timing and
power. In the implementation, registers (flip-flops) are included at
each pipeline stage (before stage-1, between stage-1 and stage-2,
and after stage-2).
Table 5 shows the implementation results for the conventional and approximate pipelined adders. The parameter k has been selected as N/4 for a two-stage pipelined implementation. In the table, minimum clock period is measured at a fixed voltage (1.0V), and total power is measured at a fixed frequency (2.5GHz) with voltage scaling. In the ACA adder case, timing and power overheads from power gating cells, output MUXes, and IR drop are included. We can see that area, timing and power of both designs are similar when the ACA adder operates in the accurate mode. Total power of the approximate adder is comparable to that of the conventional adder, even though ACA has additional EDC circuits. This is because ACA has fewer registers between stage-1 and stage-2 than the conventional pipelined adder. (In Figure 5, the conventional adder requires registers for $A_H$, $B_H$, $SUM_L$ and carry at the first stage. For a 16-bit adder, 25 registers (8 + 8 + 8 + 1) are required. On the other hand, ACA requires 18 registers (16 for $SUM_{approx}$ and 2 for error indication).)
Figure 9: Accuracy metric $ACC_{amp}$ (above) and $ACC_{inf}$ (below) vs. power consumption for conventional pipelined adder, ACA adder in accurate mode, and ACA adder in approximate mode (4-stage, 32-bit adder).
In the pipelined architecture, the ACA adder can provide various
configurable modes according to the pipeline depth. To improve
the design performance, we increase the pipeline depth; the deeper
pipeline reduces the path depth of the design. In the conventional
pipelined adder, bit-width of the adder in each stage can be reduced
to N/#stage, where N is the entire bit-width and #stage is the
depth (number) of the pipeline stages. In the ACA adder, we can re-
duce the value of parameter k with deeper pipeline depth as shown
in Figure 6. To show the benefit of accuracy configuration, we have
implemented a 32-bit ACA adder (N = 32, k = 4) with 4-stage
pipeline, and compared it with a conventional pipelined adder with
an 8-bit CLA in each stage. Table 6 shows the implemented results
for the 32-bit ACA adder. For the accuracy estimation, one million
cycles of random patterns are used. The ACA adder can operate
in four different modes, based on the power gating of each st age.
We can see that the modes show different power consumptions and
different achievable accuracies. The ACA adder consumes 11.5%
more power than the conventional adder i n accurate mode (mode-1)
due to the presence of recovery circuits. At the same time, it shows
a significant power reduction in the approximate modes: 12.4%,
31.0% and 51.6% in mode-2, mode-3 and mode-4, respectively.
Figure 9 shows detailed results for power consumption versus
accuracy metrics in each configuration. From the results, we can see
that accuracy configuration with the mode change is much more
effective than with voltage scaling, in terms of the tradeoff between
accuracy and power.
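
The figures quoted above can be put side by side with a small sketch. It is illustrative only: the helper names `stage_width` and `relative_power` are hypothetical, and the percentages are the values reported in the text for the 32-bit, 4-stage design, expressed relative to the conventional pipelined adder.

```python
# Illustrative sketch only; helper names are hypothetical, not from the paper.

def stage_width(n_bits: int, n_stages: int) -> int:
    """Per-stage adder width N/#stage of a conventional pipelined adder."""
    if n_stages <= 0 or n_bits % n_stages != 0:
        raise ValueError("n_bits must be a positive multiple of n_stages")
    return n_bits // n_stages


# Power change versus the conventional pipelined adder, in percent,
# as reported in the text (positive = more power, negative = less power).
POWER_DELTA_PCT = {
    "mode-1": +11.5,  # accurate mode; overhead attributed to recovery circuits
    "mode-2": -12.4,
    "mode-3": -31.0,
    "mode-4": -51.6,
}


def relative_power(mode: str) -> float:
    """Power of the ACA adder in `mode`, normalized to the conventional adder."""
    return 1.0 + POWER_DELTA_PCT[mode] / 100.0


if __name__ == "__main__":
    print("conventional stage width:", stage_width(32, 4), "bits")
    for mode in POWER_DELTA_PCT:
        print(mode, f"{relative_power(mode):.3f}")
```

With N = 32 and four stages, `stage_width` gives the 8-bit per-stage width used for the conventional baseline. The accuracy reached in each mode is not modeled here, since those values are given in Table 6 and Figure 9 rather than in the text.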

References
Proceedings Article

Trading Accuracy for Power with an Underdesigned Multiplier Architecture

TL;DR: A novel multiplier architecture with tunable error characteristics, that leverages a modified inaccurate 2x2 building block, that can achieve 2X - 8X better Signal-Noise-Ratio (SNR) for the same power savings when compared to recent voltage over-scaling based power-error tradeoff methods is proposed.
Proceedings Article

Variable latency speculative addition: a new paradigm for arithmetic circuit design

TL;DR: A novel adder design is presented that is exponentially faster than traditional adders; however, it produces incorrect results, deterministically, for a very small fraction of input combinations.
Journal Article

Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing

TL;DR: A novel error-tolerant adder (ETA) is proposed that is able to ease the strict restriction on accuracy, and at the same time achieve tremendous improvements in both the power consumption and speed performance.
Proceedings Article

Enhanced low-power high-speed adder for error-tolerant application

TL;DR: In this paper, the tradeoff between power consumption and speed performance has become a major design consideration when devices approach the sub-100 nm regime, especially when dealing with large data set, whereby the system is degraded in terms of power and speed.
Journal Article

Speeding up processing with approximation circuits

Shih-Lien Lu, 01 Mar 2004
TL;DR: Approximation circuits can increase clock frequency by reducing the number of cycles a function requires by implementing the complete logic function using rough calculations to predict results.
Frequently Asked Questions (13)
Q1. What have the authors contributed in "Accuracy-configurable adder for approximate arithmetic designs"?

In this paper, the authors propose an accuracy-configurable approximate (ACA) adder for which the accuracy of results is configurable during runtime.

The error signal holds the input pattern during the error correction and chooses the error-corrected value (SUMcorrect) as an output. 

The approximate designs produce almost-correct results at the given required accuracy, and obtain power reductions or performance improvements in return. 

The authors propose another accuracy metric, ACC_inf, which measures error significance as Hamming distance, where B_e is the number of error bits and B_w is the bit-width of the data.
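
A minimal sketch of how such a Hamming-distance metric could be evaluated. The normalization used here, 1 - Be/Bw, is an assumption made for illustration (the sentence above names only the two quantities), and the function names `error_bits` and `acc_inf` are hypothetical.

```python
# Illustrative sketch; the 1 - Be/Bw normalization is an assumption.

def error_bits(correct: int, approx: int, bit_width: int) -> int:
    """Be: number of bit positions in which the two results differ."""
    mask = (1 << bit_width) - 1
    return bin((correct ^ approx) & mask).count("1")


def acc_inf(correct: int, approx: int, bit_width: int) -> float:
    """Hamming-distance-based accuracy, assumed here to be 1 - Be/Bw."""
    if bit_width <= 0:
        raise ValueError("bit_width must be positive")
    return 1.0 - error_bits(correct, approx, bit_width) / bit_width


if __name__ == "__main__":
    # One wrong bit out of four.
    print(acc_inf(0b1111, 0b1110, 4))
```

Unlike an amplitude-based metric, this measure weights an error in the least significant bit the same as one in the most significant bit, which is why it suits data that is later protected by an error-correcting code.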

The delay of the critical path can be expressed with Equation (2) and the area can be estimated with Equation (3), where C_delay and C_area are constants for delay and area, respectively.

With these simple error detection and correction circuits, their proposed adder can be implemented to have variable latency like the previous VLSA adder [12], with a small overhead for an error detection and correction (EDC) system. 

From the results, the ACA adder has a PSNR of 24.5 dB, and this suggests that image processing/filtering applications could employ their proposed adder with significant power savings and only small loss in image quality.

In the ith sub-adder, errors occur when (1) the LSB part of the result (SUM_i[k − 1 : 0]) has all ‘1’ values (probability $P = 1/2^k$) and (2) the LSB part ([k − 1 : 0]) of the (i + 1)th sub-adder produces a carry bit (probability $P = \frac{1}{4} + \frac{1}{2}\cdot\frac{1}{4} + \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{4} + \cdots$).
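
The two probabilities can be checked numerically with a short sketch. It rests on assumptions not stated in the sentence above: inputs are uniformly random, the series is truncated after k terms (one per bit of a k-bit field with no carry-in), and the two conditions are treated as independent when multiplied. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def p_all_ones(k: int) -> Fraction:
    """Condition (1): a k-bit field is all '1' with probability 1/2^k."""
    return Fraction(1, 2 ** k)


def p_carry_series(k: int) -> Fraction:
    """Condition (2): first k terms of 1/4 + 1/2*1/4 + 1/2*1/2*1/4 + ...

    Term j: the top j bit positions propagate (1/2 each) and the next
    position generates a carry (1/4). Truncating at k terms is an assumption.
    """
    return sum((Fraction(1, 4) * Fraction(1, 2) ** j for j in range(k)),
               Fraction(0))


def p_carry_bruteforce(k: int) -> Fraction:
    """Exact carry-out probability of a k-bit addition without carry-in."""
    carries = sum(1 for a, b in product(range(2 ** k), repeat=2)
                  if a + b >= 2 ** k)
    return Fraction(carries, 2 ** (2 * k))


def p_both(k: int) -> Fraction:
    """Product of the two probabilities, assuming independence."""
    return p_all_ones(k) * p_carry_series(k)


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, p_all_ones(k), p_carry_series(k), p_carry_bruteforce(k), p_both(k))
```

The truncated series matches the brute-force count and approaches 1/2 as k grows, so under these assumptions the product is close to 1/2^(k+1): each additional bit of carry-chain depth roughly halves the chance of an error.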

When k is less than N/4, it is impossible to correct all errors and achieve 100% correct results within one clock cycle, since the error-correction paths become critical.

In communication systems that mainly handle information data, the number of incorrect bits (Hamming distance) is a more meaningful metric for accuracy; e.g., a (32,28) Reed-Solomon code can correct up to 2-byte errors.

In the proposed adder implementation, to achieve higher performance or lower power consumption, the authors can reduce the carry chain depth (k) of sub-adders (see Table 1). 

To show the benefit of accuracy configuration, the authors have implemented a 32-bit ACA adder (N = 32, k = 4) with 4-stage pipeline, and compared it with a conventional pipelined adder with an 8-bit CLA in each stage. 

To overcome consequences of overdesign, several recent mechanisms for variation-resilient design [4] allow timing errors and manage design reliability dynamically.