What have the authors contributed in "Accuracy-configurable adder for approximate arithmetic designs" ?

In this paper, the authors propose an accuracy-configurable approximate ( ACA ) adder for which the accuracy of results is configurable during runtime.

What accuracy metric is used to measure error significance?

The authors propose another accuracy metric, ACCinf , which measures error significance as Hamming distance, where Be is the number of error bits and Bw is the bit-width of the data.

What is the PSNR of the ACA adder?

From the results, the ACA adder has PSNR of 24.5dB, and this suggests that image processing/filtering applications could employ their proposed adder with significant power savings and only small loss in image quality.

What is the probability of having a correct result in the ith sub-adder?

In the i-th sub-adder, errors occur when (1) the LSB part of the result (SUM_i[k − 1 : 0]) has all '1' values (probability P = 1/2^k) and (2) the LSB part ([k − 1 : 0]) of the (i + 1)-th sub-adder produces a carry bit (probability P = 1/4 + 1/2 · 1/4 + 1/2 · 1/2 · 1/4 + ...).

What is the metric of accuracy for a DSP?

In communication systems that mainly handle information data, the number of incorrect bits (Hamming distance) is a more meaningful metric for accuracy; e.g., a (32,28) Reed-Solomon code can correct up to 2-byte errors.

How can the authors reduce the carry chain depth of sub-adders?

In the proposed adder implementation, to achieve higher performance or lower power consumption, the authors can reduce the carry chain depth (k) of sub-adders (see Table 1).

(Open Access) Accuracy-configurable adder for approximate arithmetic designs (2012) | Andrew B. Kahng

Q: What is the effect of a metric on the accuracy of an approximate circuit?

The approximate designs produce almost-correct results at the given required accuracy, and obtain power reductions or performance improvements in return.

Q: What is the overhead for an error detection and correction system?

With these simple error detection and correction circuits, their proposed adder can be implemented to have variable latency like the previous VLSA adder [12], with a small overhead for an error detection and correction (EDC) system.

Q: How can the authors achieve 100% correct results when k is less than N/4?

When k is less than N/4, it is impossible to correct all errors and achieve 100% correct results within one clock cycle, since the error-correction paths become critical. The error-correction stage then has to be extended to multiple pipeline stages.

Accuracy-Configurable Adder for Approximate Arithmetic Designs

Andrew B. Kahng†‡ and Seokhyeong Kang†

†ECE and ‡CSE Departments, University of California at San Diego

abk@cs.ucsd.edu, shkang@vlsicad.ucsd.edu

ABSTRACT

Approximation can increase performance or reduce power consumption with a simplified or inaccurate circuit in application contexts where strict requirements are relaxed. For applications related to human senses, approximate arithmetic can be used to generate sufficient results rather than absolutely accurate results. Approximate design exploits a tradeoff of accuracy in computation versus performance and power. However, required accuracy varies according to applications, and 100% accurate results are still required in some situations. In this paper, we propose an accuracy-configurable approximate (ACA) adder for which the accuracy of results is configurable during runtime. Because of its configurability, the ACA adder can adaptively operate in both approximate (inaccurate) mode and accurate mode. The proposed adder can achieve significant throughput improvement and total power reduction over conventional adder designs. It can be used in accuracy-configurable applications, and improves the achievable tradeoff between performance/power and quality. The ACA adder achieves approximately 30% power reduction versus the conventional pipelined adder at the relaxed accuracy requirement.

Categories and Subject Descriptors

B.7.2 [Hardware]: INTEGRATED CIRCUITS - Design Aids; J.6 [Computer Applications]: COMPUTER-AIDED ENGINEERING

General Terms

Algorithms, Design, Performance

Keywords

Approximate Arithmetic, Error-Tolerance, Power Minimization,

Accuracy-Conﬁgurable Adder

1. INTRODUCTION

Guardbands for dynamic variations severely limit performance and energy efficiency of conventional IC designs. To overcome consequences of overdesign, several recent mechanisms for variation-resilient design [4] allow timing errors and manage design reliability dynamically. Relaxing the requirement of correctness for designs may dramatically reduce costs of manufacturing, verification and test [16]. In resilient designs, errors can be corrected with redundancy techniques (error-tolerance), or accepted in some applications relating to human senses such as hearing and sight (error-acceptance). In the error-acceptance regime, approximation via a simplified or inaccurate circuit can increase performance and/or reduce power consumption.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DAC 2012, June 3-7, 2012, San Francisco, California, USA.

Various approximate arithmetic designs have been previously proposed. Lu [7] introduces a faster adder which has shorter carry chains and considers only the previous k bits of input in computing a carry bit. Verma et al. [12] provide a variable latency speculative adder (VLSA), which is a reliable version of the Lu adder [7] with error detection and correction. Shin et al. [10] also propose a data path redesign technique for various adders which cuts the critical path in the carry chain. Zhu et al. [14] [13] propose three approximate adders: ETAI, ETAII and ETAIIM. ETAI is divided into an accurate part and an inaccurate part to achieve approximate results. ETAII cuts carry propagation to speed up the adder, and ETAIIM modifies ETAII by connecting carry chains in accurate MSB parts. Kulkarni et al. [5] present a 2x2 underdesigned multiplier, and use it to build large power-efficient approximate multipliers. George et al. [3] define the concept of probabilistic CMOS (PCMOS), and implement efficient arithmetic using PCMOS. Shin et al. [11] propose a logic synthesis approach to design an approximate circuit.

The approximate designs produce almost-correct results at the given required accuracy, and obtain power reductions or performance improvements in return. In some applications, however, more accurate or totally accurate results are required under certain conditions; e.g., image processing in security cameras would require cleaner images after detecting a motion. In contexts where the required accuracy changes during runtime, the accuracy of results should be configurable to maximize the benefit of approximate operations. Figure 1 illustrates how power benefits can be achieved with an accuracy-configurable design. The accuracy-configurable design can adapt to changing accuracy constraints by using different modes in each situation. To our knowledge, no previous work can configure the output accuracy during runtime, and each is thus restricted (or, best-suited) to particular application contexts. In contexts where the accuracy requirement can change dynamically, the previous methods' benefits from the accuracy tradeoff are reduced since the implementation must be targeted to the maximum accuracy requirement.

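Figure 1: Power benefits from accuracy-configurable design. [Plot: normalized power (with 1.0 marked) versus time for an accurate design and an accuracy-configurable design; the required accuracy is labeled 80%, 100%, 90% and 80% over time, and annotations mark "event occurred", "accurate mode" and "approximate mode".]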

In this paper, we propose an accuracy-configurable approximate (ACA) adder, which can configure the accuracy of results during runtime. The main contributions of our work are the following.


• The proposed ACA adder has runtime-configurable accuracy to better enable tradeoff of accuracy in computation versus performance and power.

• We provide quantitative metrics for an approximate arithmetic design. We compare the ACA adder to previous approximate adders based on these metrics.

• We demonstrate the power benefits of the ACA adder over previous approximate and conventional adder designs for accuracy-configurable applications.

The rest of the paper is organized as follows. Section 2 presents the proposed ACA adder design. Section 3 provides experimental results and analysis. Section 4 summarizes and concludes the paper.

2. ACCURACY-CONFIGURABLE ADDER

2.1 Approximate Adder Implementation

Figure 2: Proposed approximate adder – 16-bit adder case. [Diagram: three 8-bit sub-adders take the slices [15:8], [11:4] and [7:0] of the inputs A[15:0] and B[15:0], and produce SUM[16] (carry), SUM[15:12], SUM[11:8], SUM[7:4] and SUM[3:0].]

Previous approximate adders [7] [10] [14] have difficulty detecting and correcting errors since they are designed for error-acceptable applications with a target accuracy. However, accurate computations are still required at certain times, according to the application. VLSA [12] can provide accurate results, but has large delay and area overhead for the error detection and correction. The central contribution of our present work is to propose an approximate adder which supports both accurate and inaccurate computation with error-correction and accuracy-configuration capability. Figure 2 shows our proposed approximate circuit for the case of a 16-bit adder. In the adder, the carry chain is cut to reduce critical-path delay, and three sub-adders generate results of partial summations. With the reduced critical-path delay, high performance (by increasing the clock frequency) or low power consumption (by decreasing the operating voltage) is obtained. A middle sub-adder (A[11:4] + B[11:4]) is introduced to increase accuracy. Without the middle sub-adder (as in ETAII [13]), error occurs when the eighth carry bit is high, and for random input patterns the error rate is 50.1%. On the other hand, with the introduction of the middle sub-adder, the error rate for random input patterns is reduced to 5.5%. (In the real implementation, all redundant parts (the four-LSB outputs of the A[15:8] + B[15:8] and A[11:4] + B[11:4] sub-adders) are optimized only for carry generation.)

Figure 3: General implementation for the proposed adder (N: bit width, k: ½ carry-chain depth). [Diagram: sub-adders take the k-bit input slices A/B[N-1:N-k], A/B[N-k-1:N-2k], A/B[N-2k-1:N-3k], ... and produce SUM[N-1:N-k], SUM[N-k-1:N-2k], SUM[N-2k-1:N-3k], ... together with a carry output.]

We can generalize the implementation of the proposed approximate adder. Figure 3 shows the general implementation of an N-bit adder with a parameter k, which is the bit-width of the sub-adder result. In the adder, each divided sub-module produces a k-bit result except for the last sub-module, which produces a 2k-bit result. The approximate adder thus consists of the (N/k − 1) sub-modules as described in Equation (1).

$$\mathrm{SUM}[N - ik - 1 : N - (i+1)k] = A[N - ik - 1 : N - (i+2)k] + B[N - ik - 1 : N - (i+2)k], \quad i = 0, \ldots, N/k - 2 \tag{1}$$
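Equation (1) can also be read as a bit-level software model. The following is a minimal Python sketch, not the authors' Verilog: the function name `aca_add` and the decision to keep the top sub-adder's carry-out as SUM[N] (as Figure 2 does with SUM[16]) are assumptions made for illustration.

```python
def aca_add(a, b, n=16, k=4):
    """Approximate n-bit addition with overlapping 2k-bit sub-adders.

    Each sub-adder sums a 2k-bit window of a and b with no carry-in.
    The lowest window contributes all 2k result bits; every other
    window contributes only its upper k bits.
    """
    if n % k != 0 or n < 2 * k:
        raise ValueError("n must be a multiple of k and at least 2k")
    window_mask = (1 << (2 * k)) - 1
    upper_mask = (1 << k) - 1
    total = 0
    # Window i covers bits [lo + 2k - 1 : lo]; windows overlap by k bits.
    for lo in range(0, n - 2 * k + 1, k):
        part = ((a >> lo) & window_mask) + ((b >> lo) & window_mask)
        if lo == 0:
            total |= part & window_mask                   # SUM[2k-1:0]
        else:
            total |= ((part >> k) & upper_mask) << (lo + k)
        if lo == n - 2 * k:
            total |= (part >> (2 * k)) << n               # carry-out, SUM[n]
    return total
```

With N = 16 and k = 4, `aca_add(0x1234, 0x0101)` returns the exact sum 0x1335, while `aca_add(0x00F8, 0x0008)` returns 0x0000 instead of 0x0100, because the carry out of bit 3 is not visible to the middle sub-adder.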

In modern adder designs, such as carry-lookahead (CLA), carry-select and Kogge-Stone adders, the path depth and area are asymptotically proportional to $\log_2 N$ and $N \log_2 N$ respectively, where N is the bit-width of the adder [15]. Based on this, we can express delay, area and power consumption of the proposed adder in terms of the parameters N and k. The proposed ACA adder has (N/k − 1) sub-adders, each of which is a 2k-bit adder. Therefore, delay of the critical path can be expressed with Equation (2) and area can be estimated with Equation (3), where $C_{delay}$ and $C_{area}$ are constants for delay and area, respectively.

$$\mathrm{delay} = C_{delay}\,(\log_2 k + 1) \tag{2}$$

$$\mathrm{area} = C_{area}\,(N - 2k)(\log_2 k + 1) \tag{3}$$

$$\mathrm{Power}_{dyn} = C_{power}\,(N - 2k)(\log_2 k + 1)^2 \tag{4}$$

Power consumption of the ACA adder can be roughly estimated as follows. Dynamic power consumption with voltage scaling at a fixed frequency is proportional to capacitance · $V_{dd}^2$, where the capacitance is proportional to the area. Cell delay is proportional to $1/(V_{dd} - V_t)^{\beta}$, and $V_{dd}^2$ is roughly proportional to 1/(cell delay) if we assume that β is 2. Since (cell delay) × (path depth) is constant at a fixed frequency, $V_{dd}^2$ is proportional to the path depth, which is $\log_2 k + 1$. Consequently, dynamic power with voltage scaling can be expressed using Equation (4), where $C_{power}$ is a constant fixed for given $V_t$ for dynamic power consumption. Static power consumption of the adder can be roughly estimated as proportional to the area in Equation (3).

In our proposed adder design, the output of each sub-adder (except the last sub-adder) is incorrect when a carry input should be propagated to the results. In Figure 2, when carry[4] (a carry bit generated within the lowest sub-adder, A[7:0] + B[7:0]) is '1' and the four LSBs of the middle sub-adder's result are $1111_{(2)}$, the output result has an error in SUM[11:8]. In the general implementation, the output result will be correct when there are no errors in all (N/k − 1) sub-adders. In the i-th sub-adder, errors occur when (1) the LSB part of the result ($\mathrm{SUM}_i[k-1:0]$) has all '1' values (probability $P = 1/2^k$) and (2) the LSB part ([k − 1 : 0]) of the (i + 1)-th sub-adder produces a carry bit (probability $P = \frac{1}{4} + \frac{1}{2}\cdot\frac{1}{4} + \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{4} + \ldots$). Therefore, with a random input vector, the probability of having a correct result in the proposed adder is

$$P(N, k) = \left(1 - \frac{2^k - 1}{2^{2k+1}}\right)^{N/k - 2} \tag{5}$$
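The two probabilities above can be combined step by step. This is a worked sketch under stated assumptions: input bits are independent and uniformly random, and the carry series is truncated after k terms (a carry generated at one of the k bit positions and propagated through the positions above it). The symbols $P_{ones}$, $P_{carry}$ and $P_{err}$ are introduced here only for illustration.

```latex
\begin{align*}
P_{ones}  &= \frac{1}{2^{k}} \\
P_{carry} &= \sum_{j=1}^{k} \frac{1}{4}\left(\frac{1}{2}\right)^{j-1}
           = \frac{1}{2} - \frac{1}{2^{k+1}} \\
P_{err}   &= P_{ones}\,P_{carry}
           = \frac{2^{k}-1}{2^{2k+1}} \\
P(N,k)    &= \left(1 - P_{err}\right)^{N/k-2}
\end{align*}
```

For N = 16 and k = 4 this gives $(1 - 15/512)^2 \approx 0.942$, which agrees with the pass rate listed for k = 4 in Table 1.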

Table 1 shows the estimated results of 16-bit ACA adders with different parameter values k. With smaller k value, the minimum clock period and dynamic power can be reduced, but the pass rate (probability of having a correct result) will be decreased. The estimations come from Equations (2), (3), (4) and (5). In Section 3.3 below, we validate the above estimation with real implementations.

Table 1: Estimated minimum clock cycle, area, dynamic power and pass rate for each k value when N = 16 (normalized to the conventional CLA 16-bit adder).

                    k=2     k=3     k=4     k=5     k=6
min. clock period   0.5     0.65    0.75    0.83    0.89
area                0.87    1.05    1.12    1.15    1.12
dynamic power       0.44    0.68    0.84    0.95    1.00
pass rate           0.554   0.829   0.942   0.982   0.995
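Two rows of Table 1 can be reproduced with a few lines of Python. This is a sketch under assumptions rather than the authors' script: the CLA baseline delay is taken as $C_{delay} \log_2 N$ with the same constant as in Equation (2), and the per-sub-adder error probability is taken as $(2^k - 1)/2^{2k+1}$, i.e. the product of the all-ones probability $1/2^k$ and the k-term carry series. The function names are hypothetical.

```python
import math


def normalized_clock_period(n, k):
    """Equation (2) divided by an assumed CLA baseline of C_delay * log2(n)."""
    return (math.log2(k) + 1) / math.log2(n)


def pass_rate(n, k):
    """Probability that no sub-adder errs for uniformly random inputs."""
    p_err = (2 ** k - 1) / 2 ** (2 * k + 1)  # per-sub-adder error probability
    return (1 - p_err) ** (n / k - 2)


# Print one line per k: k, normalized clock period, pass rate.
for k in range(2, 7):
    print(k, round(normalized_clock_period(16, k), 2), round(pass_rate(16, k), 3))
```

For N = 16 the pass-rate values round to 0.554, 0.829, 0.942, 0.982 and 0.995 for k = 2 to 6, matching the last row of Table 1. The clock-period values are close to the first row; for k = 6 the formula gives about 0.896 against the tabulated 0.89.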


2.2 Error Detection and Correction for Accurate Computation

As described in Section 2.1, our proposed adder is incorrect when a carry bit is propagated between sub-adders. However, the error can be detected and corrected with a small overhead. We detect an error for each sub-adder by checking the output of the sub-adder and the carry-in signal that comes from the previous sub-adder. Error detection can be implemented with several 'and' gates. To correct the error, '1' should be added to the approximate (inaccurate) output, and the error correction can be implemented with an incrementor circuit.

Figure 4: Error detection and correction with the approximate adder. [Diagram: the approximate adder produces SUM_approx; the EDC circuit takes the sum of sub-adder i+1 and the carry signal, raises an error signal that stalls the input data, and an incrementor produces SUM_correct.]

With these simple error detection and correction circuits, our proposed adder can be implemented to have variable latency like the previous VLSA adder [12], with a small overhead for an error detection and correction (EDC) system. Figure 4 shows an EDC system with our proposed adder. The error detection circuit ('and' gates) checks the carry propagation and generates an error signal. The error correction (incrementor) circuit produces an error-free output by adding compensation data, and requires an additional clock cycle. When errors are detected from input patterns, the error signal is activated. The error signal holds the input pattern during the error correction and chooses the error-corrected value ($\mathrm{SUM}_{correct}$) as an output. With this approach, our approximate adder can provide accurate results at a higher clock frequency than that of conventional adders (e.g., CLA). According to the estimated results in Table 1, clock period can be reduced by 25% with 6% (= error rate) recovery-cycle overhead (16-bit ACA, k = 4).
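The detection and correction steps can be sketched in software for the 16-bit, k = 4 case of Figure 2. This is an illustrative behavioral model, not the gate-level EDC circuit from the paper; the function name `aca16_with_edc` and the exact signals checked (the true carry into each sub-adder window plus the all-ones condition on the sub-adder's unused low nibble) are assumptions consistent with the description above.

```python
def aca16_with_edc(a, b):
    """Return (approximate sum, corrected sum, error flag) for 16-bit a, b."""
    low = (a & 0xFF) + (b & 0xFF)                 # A[7:0]  + B[7:0]
    mid = ((a >> 4) & 0xFF) + ((b >> 4) & 0xFF)   # A[11:4] + B[11:4]
    high = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)  # A[15:8] + B[15:8]

    # Approximate result: SUM[7:0], SUM[11:8], SUM[16:12].
    approx = (low & 0xFF) | (((mid >> 4) & 0xF) << 8) | ((high >> 4) << 12)

    # Detection: a carry arrives from below AND the sub-adder's unused low
    # nibble is all ones, so the carry would have rippled into its output.
    carry4 = ((a & 0xF) + (b & 0xF)) >> 4
    carry8 = low >> 8
    err_mid = int(carry4 == 1 and (mid & 0xF) == 0xF)
    err_high = int(carry8 == 1 and (high & 0xF) == 0xF)

    # Correction: an incrementor adds '1' to each affected output slice.
    sum_mid = ((mid >> 4) + err_mid) & 0xF
    sum_high = (high >> 4) + err_high             # 5 bits: SUM[16:12]
    corrected = (low & 0xFF) | (sum_mid << 8) | (sum_high << 12)
    return approx, corrected, bool(err_mid or err_high)
```

For example, `aca16_with_edc(0x00F8, 0x0008)` flags an error: the approximate result is 0x0000 and the corrected result is 0x0100. In this model a single correction step suffices because k = N/4, so the carry into each window is already exact.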

2.3 Accuracy Configuration with Pipelined Architecture

When our proposed adder is combined with a pipelined architecture, we can obtain accurate results with the same throughput as a conventional adder. In the pipelined architecture, approximate additions are computed at the first pipeline stage, and error correction can be completed at the second stage. Figure 5 shows the conventional pipelined adder (above) and the approximate adder (below). The pipelined implementation of approximate adder has a structural analogy with the pipelined adder of the 2006 U.S. patent [8] in which partial summations are performed at the first stage and carry bits are added at the later stages. However, the patent is clearly directed to accurate operations, not approximate computations. In addition, we use our approximate adder (Figure 3) in the first stage. In the pipelined approach, there is no improvement of the clock frequency since the achievable clock period is the same as that of the conventional adder. However, power benefits are obtained through configuration of accuracy: in the approximate mode, the error correction stage is power-gated with foot (or, head) switches in Figure 5, and power reduction over the conventional adder design can be achieved. We compare the conventional and approximate pipelined adders in Section 3.

In the proposed adder implementation, to achieve higher performance or lower power consumption, we can reduce the carry chain depth (k) of sub-adders (see Table 1). However, when k is less than N/4, it is impossible to correct all errors and achieve 100% correct results within one clock cycle since the error-correction paths become critical. To achieve correct results in the pipelined implementation, the error-correction stage should be extended to multiple stages. Figure 6 shows the pipelined adder implementation (k = N/8 case), in which four pipeline stages are required to achieve a 100% accurate result. In the pipelined adder, each stage generates a result with different accuracy; the output accuracy increases as the number of pipeline stages increases. According to the accuracy requirement, we can turn off the later stages with a power gating technique, and we can reduce the power consumption further with the accuracy tradeoff.

Since the proposed adder supports both approximate and accurate results, it can be used in applications that require accurate results only under certain conditions. Conventional accurate designs are energy-inefficient in the error-acceptable application context, because they always compute the exact function. Previous approximate designs cannot handle a varying accuracy requirement, and this limits the benefit of the accuracy tradeoff: as noted above, the approximate function must meet the maximum accuracy threshold across all applications. Moreover, if the application requests an exact computation, additional accurate circuits must be added to the previous approximate designs. By contrast, the ACA design efficiently exploits a tradeoff between accuracy and power/performance with its runtime accuracy configurability.

Figure 5: Pipelined adder implementation – conventional adder (above) and approximate adder (below). In approximate operation, the error correction stage is power-gated. [Diagram: the conventional adder uses an N/2-bit adder in each of Stage 1 and Stage 2, linked by a carry; the approximate adder produces SUM_approx and an error signal in Stage 1, and an error correction block behind power gating switches produces SUM_correct in Stage 2 (accurate mode).]

3. EXPERIMENTAL SETUP AND RESULTS

3.1 Experimental Setup

To test approximate designs, we have written each design in Verilog and synthesized it to a TSMC 65GP cell library with Synopsys DesignCompiler [17]. We then perform gate-level simulations using Cadence NC-Sim [18]. In the simulation, gate delay is taken from an SDF (standard delay format) file. For voltage scaling experiments, we prepare Synopsys Liberty (.lib) files for each voltage from 1.00V to 0.60V in 0.01V increments, using Cadence Library Characterizer v9.1 [19]. The prepared libraries are used for SDF file generation and power estimation at each voltage. Each simulation is performed with input patterns for one million cycles. During the simulation, each output value is compared with a reference (correct) value to produce the accuracy metrics. For the input patterns, we use random data, as well as actual data from SPEC 2006 [20] benchmarks. We extract operand data from ADD instructions in the SPEC benchmarks.

3.2 Metric for Approximate Design

To quantify errors in approximate designs, two metrics have been previously proposed [1]. Error rate (ER) is the percentage of cycles in which output value is different from the correct value. Error significance (ES) is the numerical difference between correct and output results; this quantifies the amount of error. In image/video applications, [2] uses the product of ES and ER as a metric of error tolerance. [10] introduces a criterion for acceptability: ES × ER ≤ acceptance threshold, where the acceptance threshold is specified according to the application. For the error significance (ES) metric, [14] considers only amplitude of error. This is useful for many digital signal processing (DSP) systems that process, e.g., sound and image data. However, in communication systems that mainly handle information data, the number of incorrect bits (Hamming distance) is a more meaningful metric for accuracy; e.g., a (32,28) Reed-Solomon code can correct up to 2-byte errors. This consideration for the ES metric is required when approximate arithmetic is applied to error-tolerant systems with a redundancy technique.

Figure 6: Accuracy-configurable implementation for pipelined adder. [Diagram: Stage 1 is the approximate adder; Stages 2, 3 and 4 apply corrections on S1, S2 and S3 respectively, driven by the errors detected on S1, S2 and S3, so that the result segments S3 S2 S1 S0 move from approximate to correct stage by stage.]

Table 2 shows two accuracy metrics for amplitude data and information data. $ACC_{amp}$ used in [14] quantifies the amplitude of errors, where $R_c$ and $R_e$ are the correct and obtained results, respectively. We propose another accuracy metric, $ACC_{inf}$, which measures error significance as Hamming distance, where $B_e$ is the number of error bits and $B_w$ is the bit-width of the data. For example, when the correct (reference) data is $1000\_0000_{(2)}$ and the result data is $1100\_0000_{(2)}$, accuracy with $ACC_{amp}$ and $ACC_{inf}$ will be 1/2 and 7/8, respectively. To evaluate the approximate circuits, we obtain average values of accuracy metrics $ACC_{amp}$ and $ACC_{inf}$ over the entire simulation to consider both ER and ES.
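The two metrics are easy to compute per sample. The sketch below assumes the definitions $ACC_{amp} = 1 - |R_c - R_e| / R_c$ and $ACC_{inf} = 1 - B_e / B_w$; the function names are hypothetical, and rejecting a zero reference value in `acc_amp` is a choice made here, not something the paper specifies.

```python
def acc_amp(correct, obtained):
    """Amplitude accuracy: 1 - |R_c - R_e| / R_c."""
    if correct == 0:
        # The ratio is undefined for a zero reference; this sketch rejects it.
        raise ValueError("reference value must be non-zero")
    return 1 - abs(correct - obtained) / correct


def acc_inf(correct, obtained, bit_width):
    """Information accuracy: 1 - (number of differing bits) / bit-width."""
    error_bits = bin(correct ^ obtained).count("1")  # Hamming distance
    return 1 - error_bits / bit_width


# Worked example: reference 1000_0000, result 1100_0000 (8-bit data).
print(acc_amp(0b1000_0000, 0b1100_0000))     # 0.5
print(acc_inf(0b1000_0000, 0b1100_0000, 8))  # 0.875
```

Averaging these per-sample values over a whole simulation run gives a single figure that reflects both how often errors occur and how large they are.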

Table 2: Accuracy metrics for error significance (ES).

metric     definition              data type
ACC_amp    1 − |R_c − R_e| / R_c   amplitude data
ACC_inf    1 − B_e / B_w           information data

Table 3: ACA adder results with different k values.

k                             2      3      4      5
min. clock period (ps)        180    190    220    230
area (um^2)                   550    990    920    840
pass rate (%)                 55.3   82.8   94.0   98.1
throughput improvement (%)    11.3   24.6   22.3   21.4

Table 4: Design comparison for each adder design.

                          CLA     LU      ACA     ETAI    ETAIIM
area (um^2)               910     1356    923     576     678
min. clock period (ps)    280     210     200     200     260
pass rate (%)             100     99.2    94.1    10.0    97.0
ACC_amp (maximum)         1.000   0.998   0.997   0.999   0.999
ACC_inf (maximum)         1.000   0.999   0.993   0.694   0.996
area overhead for EDC     N/A     75%     28%     N/A     15%

3.3 Approximate Adder with Different Parameters

We explore the proposed adder with different parameters (k: half of carry-chain depth). Table 3 summarizes results (minimum clock period, area, error rate and throughput improvements) for each implementation of the 16-bit adder with different k values. According to the results, with smaller k, the maximum operating frequency increases, but the error rate increases as well. With higher k, the error rate is reduced significantly, but the benefit of the approximate circuit, i.e., clock period reduction, is small. In the table, throughput improvement over conventional design is calculated including error recovery overhead. From the implementations, a maximum throughput improvement is achieved when k = 3. If we correct erroneous results with EDC as in Figure 4, then 17.2% additional clock cycles are required for error correction. With this overhead, the ACA adder can improve data throughput by 24.6% over the conventional CLA adder.

3.4 Approximate Adder Comparison

We evaluate each approximate adder with respect to the pass rate and the accuracy metrics which we have proposed. We use gate-level simulation at each possible clock period to compare five adders: CLA, Lu's adder [7], ETAI, ETAIIM [14] and the proposed ACA adder (without error correction). In the experiment, the same carry-chain width (8-bit) is selected for the four approximate adders. In the implementation, a register (flip-flop) is inserted in each output port to detect timing errors.

Table 4 shows area, pass rate, accuracy, minimum clock period and EDC overhead for each adder design. According to the results, the ETAI adder has the smallest design area, but has a low pass rate and limited accuracy with respect to the $ACC_{inf}$ metric. Therefore, the ETAI adder is preferred for applications which allow low accuracy in results. The ETAIIM adder shows fairly high accuracy, but does not have speed (clock period) benefit. Lu's adder shows a smaller error rate and high accuracy with respect to both $ACC_{amp}$ and $ACC_{inf}$ metrics. However, it requires larger area than the other designs. The proposed adder shows similar results for both metrics as Lu's adder. However, the area of the ACA adder is smaller than that of Lu's adder, and EDC is possible with small area overhead (28%). With the ACA adder, the minimum clock period can be reduced by 26% compared to the accurate CLA.

Figure 7: Accuracy (y-axis) vs. power consumption (x-axis) under fixed clock period (0.25ns) and scaled voltage (from 1.0V to 0.6V). [Two plots: ACC_amp and ACC_inf versus total power (W) for the ACA adder, CLA, Lu's adder, ETAI and ETAIIM, each with a zoomed inset near accuracy 1.000; both are annotated "Voltage scaling (1.0V~0.6V)".]

Figure 7 shows a power vs. accuracy tradeoff in a voltage scaling scenario: the x-axis shows total power consumption, and the y-axis shows the accuracy ($ACC_{amp}$, $ACC_{inf}$). The power consumption and the accuracy are measured with different voltage libraries characterized using Cadence Library Characterizer [19]. The clock period is fixed at 0.30ns during the simulations. In the results, Lu's adder does not show power benefits due to its design size. ETAI shows low power consumption and high $ACC_{amp}$ accuracy, but has low $ACC_{inf}$ accuracy, and cannot detect and correct errors. ETAIIM shows similar characteristics to ACA in the voltage scaling case, but the adder cannot be used for a high-performance (high-frequency) design, as shown in Table 4. The results in Figure 7 imply that our proposed adder can provide a significant power reduction with small accuracy penalty. When the required accuracy is 0.970 ($ACC_{amp}$), the ACA adder shows 37.0%, 36.4% and 15.9% total power reduction over CLA, Lu's adder and ETAIIM, respectively.

We have tested our approximate adder on a real application: a Gaussian smoothing filter used in [6]. Gaussian smoothing is performed on the input image by convolving with a matrix in the spatial domain. In the convolution, the addition operation is done with approximate 16-bit adders. Other operations, such as multiplication and division, are accurate computations. Figure 8 shows results for various approximate adders when they consume 50% of the power of accurate CLA. From the results, the ACA adder has PSNR of 24.5dB, and this suggests that image processing/filtering applications could employ our proposed adder with significant power savings and only small loss in image quality.
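The paper does not spell out how PSNR is computed. The sketch below uses the standard definition for 8-bit images, $10 \log_{10}(255^2 / \mathrm{MSE})$, as an assumption; the function name `psnr` and the list-of-rows image representation are hypothetical choices for illustration.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as lists of rows of integer pixel values.
    """
    squared_errors = [
        (r - t) ** 2
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

In this setting the reference would be the image filtered with the accurate adder and the test image the one filtered with an approximate adder, although which reference the paper used is not stated.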

Figure 8: Image smoothing: (a) original image with noise; (b) accurate adder; (c) ACA, PSNR: 24.5 dB; (d) ETAI, PSNR: 25.3 dB; (e) ETAIIM, PSNR: 16.2 dB; (f) Lu's adder, PSNR: 11.1 dB.

Table 5: Comparison between conventional and approximate (2-stage) pipelined adders at the accurate mode.

              conventional pipelined                 approximate pipelined
adder width   area     clock period   total power   k   area     clock period   total power
(N)           (um^2)   (ns)           (mW)              (um^2)   (ns)           (mW)
8             459      0.313          0.557         2   576      0.312          0.564
16            1082     0.357          1.558         4   1171     0.358          1.669
32            2252     0.404          2.860         8   2420     0.414          2.914

Table 6: Implementation results of 32-bit ACA adder with 4-stage pipeline (power consumption of each mode and power reduction over conventional pipelined adder).

config.   power-gating     ACC_amp (max.)   ACC_inf (max.)   total power (mW)   reduction (%)
mode-1    none             1.000            1.000            5.962              -11.5%
mode-2    stage-4          0.998            0.960            4.683              12.4%
mode-3    stage-3, 4       0.991            0.925            3.691              31.0%
mode-4    stage-2, 3, 4    0.983            0.900            2.588              51.6%

3.5 Accuracy Configuration and Power Savings

When the architecture allows pipelining for addition, our proposed adder can be implemented as shown in Figure 5. We implement both the conventional pipelined adder and the approximate pipelined adder to compare the designs in terms of area, timing and power. In the implementation, registers (flip-flops) are included at each pipeline stage (before stage-1, between stage-1 and stage-2, and after stage-2).

Table 5 shows the implementation results for the conventional and approximate pipelined adders. The parameter k has been selected as N/4 for a two-stage pipelined implementation. In the table, minimum clock period is measured at a fixed voltage (1.0V), and total power is measured at a fixed frequency (2.5GHz) with voltage scaling. In the ACA adder case, timing and power overheads from power gating cells, output MUXes, and IR drop are included. We can see that area, timing and power of both designs are similar when the ACA adder operates in the accurate mode. Total power of the approximate adder is comparable to that of the conventional adder, even though ACA has additional EDC circuits. This is because ACA has fewer registers between stage-1 and stage-2 than the conventional pipelined adder. (In Figure 5, the conventional adder requires registers for the upper halves of A and B, the lower half of SUM, and the carry at the first stage. For a 16-bit adder, 25 registers (8 + 8 + 8 + 1) are required. On the other hand, ACA requires 18 registers (16 for $\mathrm{SUM}_{approx}$ and 2 for error indication).)

Figure 9: Accuracy metric ACC_amp (above) and ACC_inf (below) vs. power consumption for conventional pipelined adder, ACA adder in accurate mode, and ACA adder in approximate mode (4-stage, 32-bit adder). [Two plots: total power consumption (W) versus ACC_amp and versus ACC_inf for the conventional pipelined adder and the ACA adder in modes 1 to 4; annotations mark "accurate result", "mode change" and "voltage scaling".]

In the pipelined architecture, the ACA adder can provide various configurable modes according to the pipeline depth. To improve design performance, we increase the pipeline depth; a deeper pipeline reduces the path depth of the design. In the conventional pipelined adder, the bit-width of the adder in each stage can be reduced to N/#stage, where N is the total bit-width and #stage is the number of pipeline stages. In the ACA adder, we can reduce the value of parameter k with a deeper pipeline, as shown in Figure 6. To show the benefit of accuracy configuration, we have implemented a 32-bit ACA adder (N = 32, k = 4) with a 4-stage pipeline, and compared it with a conventional pipelined adder with an 8-bit CLA in each stage. Table 6 shows the implementation results for the 32-bit ACA adder. For the accuracy estimation, one million cycles of random patterns are used. The ACA adder can operate in four different modes, based on the power gating of each stage. The modes show different power consumption and different achievable accuracy. The ACA adder consumes 11.5% more power than the conventional adder in accurate mode (mode-1) due to the presence of recovery circuits. At the same time, it shows a significant power reduction in the approximate modes: 12.4%, 31.0% and 51.6% in mode-2, mode-3 and mode-4, respectively. Figure 9 shows detailed results for power consumption versus the accuracy metrics in each configuration. From the results, we can see that accuracy configuration through mode change is much more effective than voltage scaling, in terms of the tradeoff between accuracy and power.
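The random-pattern accuracy estimation mentioned above can be sketched with a small functional model. This is a minimal sketch under stated assumptions, not the implementation evaluated in Table 6: it assumes overlapping 2k-bit sub-adders with carry-in 0 in which every sub-adder after the first contributes only its upper k bits, it takes ACCinf = 1 - Be/Bw (Be error bits, Bw bit-width), and it models only the uncorrected approximate sum. Pipeline stages, power gating, the four modes and the recovery circuits are not modeled, and all function names are hypothetical.

```python
import random


def exact_add(a: int, b: int, n: int) -> int:
    """Reference N-bit sum with the carry-out discarded."""
    return (a + b) & ((1 << n) - 1)


def aca_add(a: int, b: int, n: int, k: int) -> int:
    """Approximate sum from overlapping 2k-bit sub-adders, no error correction.

    Assumed structure: sub-adder i adds operand bits [i*k + 2k - 1 : i*k]
    with carry-in 0. The first sub-adder supplies result bits [2k - 1 : 0];
    every later sub-adder supplies only its upper k bits.
    """
    window = (1 << (2 * k)) - 1
    low = (1 << k) - 1
    result = ((a & window) + (b & window)) & window
    for i in range(1, n // k - 1):
        shift = i * k
        s = ((a >> shift) & window) + ((b >> shift) & window)
        result |= ((s >> k) & low) << (shift + k)
    return result


def acc_inf(correct: int, approx: int, n: int) -> float:
    """Assumed metric: 1 - (number of differing bits) / (bit-width)."""
    return 1.0 - bin(correct ^ approx).count("1") / n


def estimate(n: int = 32, k: int = 4, cycles: int = 100_000, seed: int = 0):
    """Return (mean ACCinf, fraction of exactly correct sums) over random inputs."""
    rng = random.Random(seed)
    total = 0.0
    exact_hits = 0
    for _ in range(cycles):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        ref, apx = exact_add(a, b, n), aca_add(a, b, n, k)
        total += acc_inf(ref, apx, n)
        exact_hits += ref == apx
    return total / cycles, exact_hits / cycles


if __name__ == "__main__":
    mean_acc, frac_exact = estimate()
    print(f"mean ACCinf = {mean_acc:.4f}, exact results = {frac_exact:.4f}")
```

The default cycle count is lower than the one million cycles used in the text to keep the run short; raising `cycles` tightens the estimate. The numbers this model produces are not expected to match Table 6, since the modes there depend on which correction stages are power-gated.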


