Q: What are the future works mentioned in the paper "Design and analysis of approximate compressors for multiplication"?

Current and future research addresses the tradeoffs of the different figures of merit in the proposed designs to establish conditions by which combined metrics can be attained. Moreover, physical designs of the approximate multipliers are being pursued to further confirm the analysis presented in this paper. In conclusion, this paper has shown that by an appropriate design of an approximate compressor, multipliers can be designed for inexact computing; these multipliers offer significant advantages in terms of both circuit-level and error figures of merit. Although not discussed and beyond the scope of this manuscript, the proposed designs may also be useful in other arithmetic circuits for applications in which inexact computing can be used.

(Open Access) Design and Analysis of Approximate Compressors for Multiplication (2015) | Amir Momeni

Q: What are the contributions mentioned in the paper "Design and analysis of approximate compressors for multiplication"?

This paper deals with the analysis and design of two new approximate 4-2 compressors for utilization in a multiplier. Extensive simulation results are provided and an application of the approximate multipliers to image processing is presented. The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).

For Peer Review Only

> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) <

Design and Analysis of

Approximate Compressors for Multiplication

A. Momeni, J. Han, Member, P. Montuschi, Senior Member and F. Lombardi, Fellow

Abstract—Inexact (or approximate) computing is an attractive

paradigm for digital processing at nanometric scales. Inexact

computing is particularly interesting for computer arithmetic

designs. This paper deals with the analysis and design of two new

approximate 4-2 compressors for utilization in a multiplier.

These designs rely on different features of compression, such that

imprecision in computation (as measured by the error rate and

the so-called normalized error distance) can meet with respect to

circuit-based figures of merit of a design (number of transistors,

delay and power consumption). Four different schemes for

utilizing the proposed approximate compressors are proposed

and analyzed for a Dadda multiplier. Extensive simulation results

are provided and an application of the approximate multipliers

to image processing is presented. The results show that the

proposed designs accomplish significant reductions in power

dissipation, delay and transistor count compared to an exact

design; moreover, two of the proposed multiplier designs provide

excellent capabilities for image multiplication with respect to

average normalized error distance and peak signal-to-noise ratio

(more than 50dB for the considered image examples).

Index Terms—Compressor, Dadda Multiplier, Inexact

Computing, Approximate Circuits

I. INTRODUCTION

MOST

computer arithmetic applications are

implemented using digital logic circuits, thus

operating with a high degree of reliability and

precision. However, many applications such as in multimedia

and image processing can tolerate errors and imprecision in

computation and still produce meaningful and useful results.

Accurate and precise models and algorithms are not always

suitable or efficient for use in these applications. The

paradigm of inexact computation relies on relaxing fully

precise and completely deterministic building modules when

for example, designing energy-efficient systems. This allows

imprecise computation to redirect the existing design process

of digital circuits and systems by taking advantage of a

decrease in complexity and cost with possibly a potential

increase in performance and power efficiency. Approximate

(or inexact) computing relies on using this property to design

simplified, yet approximate circuits operating at higher

performance and/or lower power consumption compared with

precise (exact) logic circuits [1].

___________________________________________

A Momeni and F. Lombardi are with the Department of Electrical and

Computer Engineering, Northeastern University, Boston, MA 02115, USA;

{lombardi@ece.neu.edu, momeni.a@husky.neu.edu}. J. Han is with the

Department of Electrical and Computer Engineering, University of Alberta,

Edmonton, Canada; {jhan8@ualberta.ca}. P. Montuschi is with the

Department of Control and Computer Engineering, Politecnico di Torino,

Turin, Italy; {paolo.montuschi@polito.it}.

Addition and multiplication are widely used operations in

computer arithmetic; for addition full-adder cells have been

extensively analyzed for approximate computing [2-4]. [1] has

compared these adders and proposed several new metrics for

evaluating approximate and probabilistic adders with respect

to unified figures of merit for design assessment for inexact

computing applications. For each input to a circuit, the error

distance (ED) is defined as the arithmetic distance between an

erroneous output and the correct one [1]. The mean error

distance (MED) and normalized error distance (NED) are

proposed by considering the averaging effect of multiple

inputs and the normalization of multiple-bit adders. The NED

is nearly invariant with the size of an implementation and is

therefore useful in the reliability assessment of a specific

design. The tradeoff between precision and power has also

been quantitatively evaluated in [1].
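The error metrics described above can be illustrated with a minimal Python sketch. It assumes a hypothetical 4-bit adder, `approx_add`, that replaces the addition of bit 0 by an OR (this is not one of the designs of [1] or [2]), takes the MED as the mean ED over all input pairs, and normalizes by the largest exact output to obtain an NED. The exact normalization used in [1] may differ, so the last value should be read as an illustration only.

```python
from itertools import product

def exact_add(a, b):
    return a + b

def approx_add(a, b):
    # Hypothetical approximate adder: the upper bits are added exactly,
    # bit 0 is replaced by an OR, so the carry out of bit 0 is dropped.
    return (((a >> 1) + (b >> 1)) << 1) | ((a | b) & 1)

def error_metrics(approx, exact, n_bits):
    """Return (error rate, MED, NED) over all pairs of n_bits-wide inputs."""
    inputs = list(product(range(1 << n_bits), repeat=2))
    # Error distance (ED): arithmetic distance between approximate and exact output.
    eds = [abs(approx(a, b) - exact(a, b)) for a, b in inputs]
    error_rate = sum(ed != 0 for ed in eds) / len(eds)
    med = sum(eds) / len(eds)
    # Assumption: normalize the MED by the largest exact output.
    max_out = max(exact(a, b) for a, b in inputs)
    return error_rate, med, med / max_out

error_rate, med, ned = error_metrics(approx_add, exact_add, 4)
print(error_rate, med, ned)
```

For this toy adder an error occurs only when both least significant bits are 1, so one quarter of the input pairs are affected and each error has a distance of 1.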

However, the design of approximate multipliers has

received less attention. Multiplication can be thought of as the

repeated sum of partial products; however, the straightforward

application of approximate adders when designing an

approximate multiplier is not viable, because it would be very

inefficient in terms of precision, hardware complexity and

other performance metrics. Several approximate multipliers

have been proposed in the literature [4] [5] [6] [7]. Most of

these designs use a truncated multiplication method; they

estimate the least significant columns of the partial products as

a constant. In [4], an imprecise array multiplier is used for

neural network applications by omitting some of the least

significant bits in the partial products (and thus removing

some adders in the array). A truncated multiplier with a

correction constant is proposed in [5]. For an n×n multiplier,

this design calculates the sum of the n+k most significant

columns of the partial products and truncates the other n-k

columns. The n+k bit result is then rounded to n bits. The

reduction error (i.e. the error generated by truncating the n-k

least significant bits) and rounding error (i.e. the error

generated by rounding the result to n bits) are found in the

next step. The correction constant (n+k bits) is selected to be

as close as possible to the estimated value of the sum of these

errors to reduce the error distance.
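The truncation scheme described above can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not the circuit of [5]: the function name `truncated_multiply`, the way the `correction` constant is scaled (in units of the lowest kept column) and the round-half-up step are all choices made here for illustration.

```python
def truncated_multiply(a, b, n, k, correction=0):
    """Behavioural sketch of an n x n truncated multiplier with an n-bit result.

    Columns 0 .. n-k-1 of the partial-product matrix are discarded and only
    the n+k most significant columns are summed. `correction` is a
    hypothetical constant expressed in units of the lowest kept column.
    """
    lowest_kept = n - k
    kept = 0
    for i in range(n):
        for j in range(n):
            # Partial product a_i AND b_j has weight 2^(i+j), i.e. column i+j.
            if i + j >= lowest_kept and (a >> i) & 1 and (b >> j) & 1:
                kept += 1 << (i + j)
    kept += correction << lowest_kept
    # Round the 2n-bit sum to its n most significant bits.
    return (kept + (1 << (n - 1))) >> n

def exact_rounded(a, b, n):
    """Exact product rounded to its n most significant bits, for comparison."""
    return (a * b + (1 << (n - 1))) >> n

print(truncated_multiply(13, 11, 4, 1), exact_rounded(13, 11, 4))
```

With `k = n` no column is dropped and the sketch reduces to the exact rounded product; with a smaller `k` and no correction the result can only be smaller than or equal to the exact one, which is the bias the correction constant is meant to offset.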

A truncated multiplier with constant correction has the

maximum error if the partial products in the n-k least

significant columns are all ones or all zeros. A variable

correction truncated multiplier has been proposed in [6]. This

method changes the correction term based on column n-k-1. If

all partial products in column n-k-1 are one, then the correction

term is increased. Similarly, if all partial products in this

column are zero, the correction term is decreased.

In [7], a simplified (and thus inaccurate) 2x2 multiplier

block is proposed for building larger multiplier arrays. In the

design of a fast multiplier, compressors have been widely used

Page 1 of 13 Transactions on Computers


[8-10] to speed up the partial product reduction tree and

decrease power dissipation. Optimized designs of 4-2 exact

compressors have been proposed in [8, 11 - 16]. [17] [18] have

also considered compression for approximate multiplication.

In [17], an approximate signed multiplier has been proposed

for use in arithmetic data value speculation (AVDS);

multiplication is performed using the Baugh-Wooley

algorithm. However, no new design is proposed for the

compressors for the inexact computation. Designs of

approximate compressors have been proposed in [18];

however, these designs do not target multiplication. It should

be noted that the approach of [7] improves over [17] [18] by

utilizing a simplified multiplier block that is amenable to

approximate multiplication.

Initially in this paper, two novel approximate 4-2

compressors are proposed and analyzed. It is shown that these

simplified compressors have better delay and power

consumption than the optimized (exact) 4-2 compressor

designs found in the technical literature [8]. These

approximate compressors are then used in the restoration

module of a Dadda multiplier; four different schemes are

proposed for inexact multiplication. Extensive simulation

results are provided at circuit-level for figures of merit, such

as delay, transistor count, power dissipation, error rate and

normalized error distance under CMOS feature sizes of 32, 22

and 16 nm. The application of these multipliers to image

processing is then presented. The results of two examples of

multiplication of two images are reported; these results show

that the third and fourth approximate multipliers yield an

output product image that has a very high quality and

resemblance to the image generated by an exact multiplier, i.e.

excellent values for the average NED and the Peak Signal-to-

Noise Ratio (PSNR) are found (for the PSNR more than

50 dB). The analysis and simulation results show that the

proposed approximate designs for both the compressor and the

multiplier are viable candidates for inexact computing.

This paper is organized as follows. Section 2 is a review of

existing schemes for (exact) compressors. The two new

designs of an approximate 4-2 compressor are presented in

Section 3. Multiplication and four different approximate

multipliers are proposed in Section 4. Simulation results for

the approximate compressors and multipliers are provided in

Section 5. The application of the proposed approximate

multipliers to image processing is presented in Section 6.

Section 7 concludes the manuscript.

II. EXACT COMPRESSORS

The main goal of either multi-operand carry-save addition

or parallel multiplication is to reduce n numbers to two

numbers; therefore, n-2 compressors (or n-2 counters) have

been widely used in computer arithmetic. An n-2 compressor

(Figure 1) is usually a slice of a circuit that reduces n numbers

to two numbers when properly replicated. In slice i of the

circuit, the n-2 compressor receives n bits in position i and one

or more carry bits from the positions to the right, such as i – 1

or i – 2. It produces two output bits in positions i and i + 1 and

one or more carry bits into the higher positions, such as i + 1

or i + 2. For the correct operation of the circuit shown in

Figure 1, the following inequality must be satisfied:

$$n + \psi_1 + \psi_2 + \psi_3 + \cdots \le 3 + 2\psi_1 + 4\psi_2 + 8\psi_3 + \cdots \qquad (1)$$

Figure 1. Schematic diagram of n-2 compressors in a multi-operand addition

circuit [13]

where $\psi_j$ denotes the number of carry bits from slice i to

slice i + j.
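A short Python sketch can make inequality (1) concrete. It assumes that the inequality has the form n + ψ1 + ψ2 + ... ≤ 3 + 2ψ1 + 4ψ2 + ..., i.e. the largest number of ones entering a slice (n operand bits plus the incoming carries) must be representable by the two output bits (weights 1 and 2) plus the outgoing carries (weight 2^j for a carry sent j positions ahead). The function name `slice_is_feasible` is hypothetical.

```python
def slice_is_feasible(n, psi):
    """Check inequality (1) for an n-2 compressor slice.

    psi[j - 1] is the number of carry bits sent from slice i to slice i + j
    (and, by regularity, received from slice i - j).
    """
    # Largest possible count of ones entering the slice.
    max_inputs = n + sum(psi)
    # Largest value the outputs can encode: 1 + 2 for the two output bits,
    # plus 2^j for every carry sent j positions ahead.
    capacity = 3 + sum((2 ** j) * count for j, count in enumerate(psi, start=1))
    return max_inputs <= capacity

print(slice_is_feasible(3, []))    # a 3-2 slice (full adder) needs no carry
print(slice_is_feasible(4, [1]))   # a 4-2 slice with one carry to slice i + 1
print(slice_is_feasible(5, [1]))   # a 5-2 slice with a single carry
```

Under this reading a 4-2 compressor with one carry between adjacent slices meets the bound with equality (5 ≤ 5), whereas a 5-2 slice needs more than one carry bit.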

A widely used structure for compression is the 4-2

compressor; a 4-2 compressor (Figure 2) can be implemented

with a carry bit between adjacent slices ($\psi_1 = 1$). The carry bit

from the position to the right is denoted as $c_{in}$ while the carry

bit into the higher position is denoted as $c_{out}$. The two output

bits in positions i and i + 1 are also referred to as the sum and

carry respectively.

Figure 2. 4-2 compressor

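The following equations give the outputs of the 4-2

compressor, while Table I shows its truth table.

$$\mathit{sum} = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus c_{in} \qquad (2)$$

$$c_{out} = (x_1 \oplus x_2)\,x_3 + \overline{(x_1 \oplus x_2)}\,x_1 \qquad (3)$$

$$\mathit{carry} = (x_1 \oplus x_2 \oplus x_3 \oplus x_4)\,c_{in} + \overline{(x_1 \oplus x_2 \oplus x_3 \oplus x_4)}\,x_4 \qquad (4)$$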

The common implementation of a 4-2 compressor is

accomplished by utilizing two full-adder (FA) cells (Figure 3)

[8]. Different designs have been proposed in the literature for

4-2 compressor [8, 11-16].
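The behaviour of the exact 4-2 compressor can be checked with a small Python sketch. It models two views: a cascade of two full adders (the wiring is an assumption, since Figure 3 is not reproduced here) and a multiplexer-style reading of (3) and (4), in which $c_{out}$ selects between x3 and x1 and carry selects between $c_{in}$ and x4. The function names are hypothetical.

```python
from itertools import product

def full_adder(a, b, c):
    """Exact full adder returning (sum, carry)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))

def compressor_two_fa(x1, x2, x3, x4, cin):
    # Assumed wiring: the first FA adds x1, x2, x3 and produces c_out;
    # the second FA adds the intermediate sum, x4 and c_in.
    s, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s, x4, cin)
    return cout, carry, total

def compressor_mux(x1, x2, x3, x4, cin):
    # Multiplexer-style reading of the output equations.
    p = x1 ^ x2
    q = p ^ x3 ^ x4
    total = q ^ cin
    cout = x3 if p else x1
    carry = cin if q else x4
    return cout, carry, total

for cin, x4, x3, x2, x1 in product((0, 1), repeat=5):
    cout, carry, total = compressor_mux(x1, x2, x3, x4, cin)
    # Both views agree, and the outputs encode the number of ones at the inputs.
    assert (cout, carry, total) == compressor_two_fa(x1, x2, x3, x4, cin)
    assert 2 * (cout + carry) + total == x1 + x2 + x3 + x4 + cin
print("all 32 input patterns verified")
```

The second assertion states the defining property of the exact compressor: carry and $c_{out}$ both have weight 2, sum has weight 1, and together they always equal the count of ones among the five inputs.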

Figure 4 shows the optimized design of an exact 4-2

compressor based on the so-called XOR-XNOR gates [8]; a

XOR-XNOR gate simultaneously generates the XOR and

XNOR output signals. The design of [8] consists of three

XOR-XNOR (denoted by XOR*) gates, one XOR and two 2-1

MUXes. The critical path of this design has a delay of 3Δ,

where Δ is the unitary delay through any gate in the design.


Figure 3. Implementation of 4-2 Compressor

TABLE I

TRUTH TABLE OF 4-2 COMPRESSOR

$c_{in}$ $x_4$ $x_3$ $x_2$ $x_1$ $c_{out}$ carry sum

0 0 0 0 0 0 0 0

0 0 0 0 1 0 0 1

0 0 0 1 0 0 0 1

0 0 0 1 1 1 0 0

0 0 1 0 0 0 0 1

0 0 1 0 1 1 0 0

0 0 1 1 0 1 0 0

0 0 1 1 1 1 0 1

0 1 0 0 0 0 0 1

0 1 0 0 1 0 1 0

0 1 0 1 0 0 1 0

0 1 0 1 1 1 0 1

0 1 1 0 0 0 1 0

0 1 1 0 1 1 0 1

0 1 1 1 0 1 0 1

0 1 1 1 1 1 1 0

1 0 0 0 0 0 0 1

1 0 0 0 1 0 1 0

1 0 0 1 0 0 1 0

1 0 0 1 1 1 0 1

1 0 1 0 0 0 1 0

1 0 1 0 1 1 0 1

1 0 1 1 0 1 0 1

1 0 1 1 1 1 1 0

1 1 0 0 0 0 1 0

1 1 0 0 1 0 1 1

1 1 0 1 0 0 1 1

1 1 0 1 1 1 1 0

1 1 1 0 0 0 1 1

1 1 1 0 1 1 1 0

1 1 1 1 0 1 1 0

1 1 1 1 1 1 1 1

III. PROPOSED APPROXIMATE COMPRESSORS

In this section, two designs of an approximate compressor

are proposed. Intuitively to design an approximate 4-2

compressor, it is possible to substitute the exact full-adder

cells in Figure 3 by an approximate full-adder cell (such as the

first design proposed in [2]). However, this is not very

efficient, because it produces at least 17 incorrect results out

of 32 possible outputs, i.e. the error rate of this inexact

compressor is more than 53% (where the error rate is given

by the ratio of the number of erroneous outputs over the total

number of outputs). Two different designs are proposed next

to reduce the error rate; these designs offer significant

performance improvement compared to an exact compressor

with respect to delay, number of transistors and power

consumption.

Figure 4. Optimized 4-2 compressor of [8]

A. Design 1

As shown in Table I, the carry output in an exact compressor has the same value as the input $c_{in}$ in 24 out of 32 states. Therefore, an approximate design must consider this feature. In Design 1, the carry is simplified to $c_{in}$ by changing the value of the other 8 outputs.

$$\mathit{carry}' = c_{in} \qquad (5)$$

Since the carry output has the higher weight of a binary bit, an erroneous value of this signal will produce a difference value of two in the output. For example, if the input pattern is “01001” (row 10 of Table II), the correct output is “010”, which is equal to 2. By simplifying the carry output to $c_{in}$, the approximate compressor will generate the “000” pattern at the output (i.e. a value of 0). This substantial difference may not be acceptable; however, it can be compensated or reduced by simplifying the $c_{out}$ and sum signals. In particular, the simplification of sum to a value of 0 (second half of Table II) reduces the difference between the approximate and the exact outputs as well as the complexity of its design. Also, the presence of some errors in the sum signal will result in a reduction of the delay of producing the approximate sum and of the overall delay of the design (because it is on the critical path).

$$\mathit{sum}' = \overline{c_{in}}\left(\overline{x_1 \oplus x_2} + \overline{x_3 \oplus x_4}\right) \qquad (6)$$

In the last step, changing the value of $c_{out}$ in some states may reduce the error distance introduced by the approximate carry and sum, and also allows further simplification of the proposed design.

$$c_{out}' = \overline{\overline{(x_1 + x_2)} + \overline{(x_3 + x_4)}} \qquad (7)$$


Although the above mentioned simplifications of carry and

sum increase the error rate in the proposed approximate

compressor, its design complexity and therefore the power

consumption are considerably decreased. This can be realized

by comparing (2)-(4) and (5)-(7). Table II shows the truth table

of the first proposed approximate compressor. It also shows

the difference between the inexact output of the proposed

approximate compressor and the output of the exact

compressor. As shown in Table II, the proposed design has 12

incorrect outputs out of 32 outputs (thus yielding an error rate

of 37.5%). This is less than the error rate using the best

approximate full-adder cell of [2].

TABLE II

TRUTH TABLE OF THE FIRST APPROXIMATE 4-2 COMPRESSOR

$c_{in}$ $x_4$ $x_3$ $x_2$ $x_1$ $c_{out}'$ carry' sum' Difference

0 0 0 0 0 0 0 1 1

0 0 0 0 1 0 0 1 0

0 0 0 1 0 0 0 1 0

0 0 0 1 1 0 0 1 -1

0 0 1 0 0 0 0 1 0

0 0 1 0 1 1 0 0 0

0 0 1 1 0 1 0 0 0

0 0 1 1 1 1 0 1 0

0 1 0 0 0 0 0 1 0

0 1 0 0 1 1 0 0 0

0 1 0 1 0 1 0 0 0

0 1 0 1 1 1 0 1 0

0 1 1 0 0 0 0 1 -1

0 1 1 0 1 1 0 1 0

0 1 1 1 0 1 0 1 0

0 1 1 1 1 1 0 1 -1

1 0 0 0 0 0 1 0 1

1 0 0 0 1 0 1 0 0

1 0 0 1 0 0 1 0 0

1 0 0 1 1 0 1 0 -1

1 0 1 0 0 0 1 0 0

1 0 1 0 1 1 1 0 1

1 0 1 1 0 1 1 0 1

1 0 1 1 1 1 1 0 0

1 1 0 0 0 0 1 0 0

1 1 0 0 1 1 1 0 1

1 1 0 1 0 1 1 0 1

1 1 0 1 1 1 1 0 0

1 1 1 0 0 0 1 0 -1

1 1 1 0 1 1 1 0 0

1 1 1 1 0 1 1 0 0

1 1 1 1 1 1 1 0 -1

(5)-(7) are the logic expressions for the outputs of the first

design of the approximate 4-2 compressor proposed in this

manuscript.
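The error rate of Design 1 can be reproduced with a short Python sketch. The Boolean forms used below are read off Table II and are therefore an assumption about (5)-(7): carry' equals $c_{in}$, sum' is the complement of $c_{in}$ ANDed with the OR of the two XNORs, and $c_{out}'$ is (x1 OR x2) AND (x3 OR x4). The function name `design1` is hypothetical.

```python
from itertools import product

def design1(x1, x2, x3, x4, cin):
    """First approximate 4-2 compressor, as read from Table II."""
    carry = cin
    xnor12 = 1 - (x1 ^ x2)
    xnor34 = 1 - (x3 ^ x4)
    total = (1 - cin) & (xnor12 | xnor34)
    cout = (x1 | x2) & (x3 | x4)
    return cout, carry, total

errors = {}
for cin, x4, x3, x2, x1 in product((0, 1), repeat=5):
    cout, carry, total = design1(x1, x2, x3, x4, cin)
    # Difference between the approximate output value and the exact count of ones.
    diff = 2 * (cout + carry) + total - (x1 + x2 + x3 + x4 + cin)
    if diff != 0:
        errors[(cin, x4, x3, x2, x1)] = diff

print(len(errors), len(errors) / 32)
```

Under these assumptions the sketch finds 12 erroneous patterns out of 32 (an error rate of 37.5%), each with a difference of +1 or -1, in agreement with the Difference column of Table II.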

The gate level structure of the first proposed design (Figure

6) shows that the critical path of this compressor has still a

delay of 3Δ, so it is the same as for the exact compressor of

Figure 5. However, the propagation delay through the gates of

this design is lower than the one for the exact compressor. For

example, the propagation delay in the XOR* gate that

generates both the XOR and XNOR signals in [8], is higher

than the delay through a XNOR gate of the proposed design.

Therefore, the critical path delay in the proposed design is

lower than in the exact design and moreover, the total number

of gates in the proposed design is significantly less than that in

the optimized exact compressor of [8].

B. Design 2

A second design of an approximate compressor is proposed to further increase performance as well as to reduce the error rate. Since the carry and $c_{out}$ outputs have the same weight, the proposed equations for the approximate carry and $c_{out}$ in the previous part can be interchanged. In this new design, carry uses the right hand side of (7) and $c_{out}$ is always equal to $c_{in}$; since $c_{in}$ is zero in the first stage, $c_{out}$ and $c_{in}$ will be zero in all stages. So, $c_{in}$ and $c_{out}$ can be ignored in the hardware design. Figure 7 shows the block diagram of this approximate 4-2 compressor and the expressions below describe its outputs.

$$\mathit{sum}' = \overline{x_1 \oplus x_2} + \overline{x_3 \oplus x_4} \qquad (8)$$

$$\mathit{carry}' = \overline{\overline{(x_1 + x_2)} + \overline{(x_3 + x_4)}} \qquad (9)$$

Figure 6. Gate level implementation of Design 1

Figure 7. Approximate 4-2 compressor, Design 2

Note that (9) is the same as (7) and (8) is the same as (6) for

$c_{in}$ = 0. Figure 8 shows the gate level implementation of the

second proposed design. The delay of the critical path of this

approximate design is 2Δ, so it is 1Δ less than the previous

designs; moreover, a further reduction in the number of gates

is accomplished.

Figure 8. Gate level implementation of Design 2

Table III shows the truth table of the second approximate

design for a 4-2 compressor; this Table also shows the

difference between the exact decimal value of the addition of

the inputs and the decimal value of the outputs produced by

the approximate compressor. For example when all inputs are


1, the decimal value of the addition of the inputs is 4.

However, the approximate compressor produces a 1 for the

carry and sum. The decimal value of the outputs in this case is

3; Table III shows that the difference is -1.

TABLE III

TRUTH TABLE OF THE SECOND PROPOSED 4-2 COMPRESSOR

$x_4$ $x_3$ $x_2$ $x_1$ carry' sum' Difference

0 0 0 0 0 1 1

0 0 0 1 0 1 0

0 0 1 0 0 1 0

0 0 1 1 0 1 -1

0 1 0 0 0 1 0

0 1 0 1 1 0 0

0 1 1 0 1 0 0

0 1 1 1 1 1 0

1 0 0 0 0 1 0

1 0 0 1 1 0 0

1 0 1 0 1 0 0

1 0 1 1 1 1 0

1 1 0 0 0 1 -1

1 1 0 1 1 1 0

1 1 1 0 1 1 0

1 1 1 1 1 1 -1

This design has therefore 4 incorrect outputs out of 16

outputs, so its error rate is now reduced to 25%. This is a very

positive feature, because it shows that on a probabilistic basis,

the imprecision of the proposed design is smaller than the

other available schemes.
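The 25% error rate of Design 2 can be checked in the same spirit with a minimal Python sketch. The Boolean forms are read off Table III and are an assumption about (8) and (9): sum' is the OR of the two XNORs and carry' is (x1 OR x2) AND (x3 OR x4). The function name `design2` is hypothetical.

```python
from itertools import product

def design2(x1, x2, x3, x4):
    """Second approximate 4-2 compressor (no c_in, no c_out), as read from Table III."""
    total = (1 - (x1 ^ x2)) | (1 - (x3 ^ x4))
    carry = (x1 | x2) & (x3 | x4)
    return carry, total

differences = {}
for x4, x3, x2, x1 in product((0, 1), repeat=4):
    carry, total = design2(x1, x2, x3, x4)
    # Approximate output value minus the exact count of ones at the inputs.
    differences[(x4, x3, x2, x1)] = 2 * carry + total - (x1 + x2 + x3 + x4)

wrong = {pattern: d for pattern, d in differences.items() if d != 0}
print(wrong, len(wrong) / 16)
```

Under these assumptions only the patterns 0000, 0011, 1100 and 1111 are erroneous, each with an error distance of 1, which gives the 4 out of 16 stated above.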

IV. MULTIPLICATION

In this section, the impact of using the proposed

compressors for multiplication is investigated. A fast (exact)

multiplier is usually composed of three parts (or modules) [8].

• Partial product generation.

• A Carry Save Adder (CSA) tree to reduce the partial

products’ matrix to an addition of only two operands

• A Carry Propagation Adder (CPA) for the final

computation of the binary result.

In the design of a multiplier, the second module plays a

pivotal role in terms of delay, power consumption and circuit

complexity. Compressors have been widely used [9, 10] to

speed up the CSA tree and decrease its power dissipation, so

to achieve fast and low-power operation. The use of

approximate compressors in the CSA tree of a multiplier

results in an approximate multiplier.

An 8×8 unsigned Dadda tree multiplier is considered to

assess the impact of using the proposed compressors in

approximate multipliers. The proposed multiplier uses in the

first part AND gates to generate all partial products. In the

second part, the approximate compressors proposed in the

previous section are utilized in the CSA tree to reduce the

partial products. The last part is an exact CPA to compute the

final binary result. Figure 9(a) shows the reduction circuitry of

an exact multiplier for n=8. In this figure, the reduction part

uses half-adders, full-adders and 4-2 compressors; each partial

product bit is represented by a dot. In the first stage, 2 half-

adders, 2 full-adders and 8 compressors are utilized to reduce

the partial products into at most four rows. In the second or

final stage, 1 half-adder, 1 full-adder and 10 compressors are

used to compute the two final rows of partial products.

Therefore, two stages of reduction and 3 half-adders, 3 full-

adders and 18 compressors are needed in the reduction

circuitry of an 8×8 Dadda multiplier.
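The first module of the multiplier, and the dot matrix that the reduction circuitry operates on, can be sketched in Python as follows. This is a behavioural illustration only: the function names are hypothetical, and the sketch sums each column exactly instead of modelling the half-adders, full-adders and compressors of Figure 9.

```python
def partial_product_columns(a, b, n=8):
    """Generate the AND-gate partial products of an n x n unsigned multiplier.

    Column w collects every bit a_i AND b_j with i + j == w, i.e. every dot
    of weight 2^w in the partial-product matrix.
    """
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns

def column_sum(columns):
    """Exact weighted sum of all dots; an exact reduction tree plus CPA yields this value."""
    return sum(sum(bits) << weight for weight, bits in enumerate(columns))

cols = partial_product_columns(0b10110101, 0b01101110)
print([len(c) for c in cols])   # column heights of the 8x8 dot matrix
print(column_sum(cols) == 0b10110101 * 0b01101110)
```

The column heights rise from 1 to 8 and fall back to 1, which is why the 4-2 compressors are concentrated in the middle columns and why two reduction stages (8 rows to at most 4, then 4 rows to 2) suffice for n = 8.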

In this paper, four cases are considered for designing an

approximate multiplier.

Figure 9. Reduction circuitry of an 8×8 Dadda multiplier, (a) using Design

1 compressors, (b) using Design 2 compressors

• In the first case (Multiplier 1), Design 1 is used for all 4-2

compressors in Figure 9(a).

• In the second case (Multiplier 2), Design 2 is used for the

4-2 compressors. Since Design 2 does not have $c_{in}$ and

$c_{out}$, the reduction circuitry of this multiplier requires a

lower number of compressors (Figure 9(b)). Multiplier 2

uses 6 half-adders, 1 full-adder and 17 compressors.

• In the third case (Multiplier 3), Design 1 is used for the

compressors in the n-1 least significant columns. The other

n most significant columns in the reduction circuitry use

exact 4-2 compressors.


References

Computer Arithmetic: Algorithms and Hardware Designs

Digital arithmetic

Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications

New Metrics for the Reliability of Approximate and Probabilistic Adders

IMPACT: imprecise adders for low-power approximate computing
