Error detection in arrays via dependency graphs

doi:10.1007/BF00930644

Journal of VLSI Signal Processing, 4, 331-342 (1992)

© 1992 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Error Detection in Arrays via Dependency Graphs*

EDWIN HSING-MEAN SHA AND KENNETH STEIGLITZ

Department of Computer Science, Princeton University, Princeton, NJ 08544

Received August 9, 1991; Revised December 3, 1991.

Abstract. This paper describes a methodology based on dependency graphs for doing concurrent run-time error

detection in systolic arrays and wavefront processors. It combines the projection method of deriving systolic arrays

from dependency graphs with the idea of input-triggered testing. We call the method ITRED, for Input-driven

Time-Redundancy Error Detection. Tests are triggered by inserting special symbols in the input, and so the approach

gives the user flexibility in trading off throughput for error coverage. Correctness of timing is proved at the dependency

graph level. The method requires no extra PEs and little extra hardware. We propose several variations of the

general approach and derive corresponding constraints on the modified dependency graphs that guarantee correct-

ness. One variation performs run-time error correction using majority voting. Examples are given, including a

dynamic programming algorithm, convolution, and matrix multiplication.

1. Introduction

Reliability is often a critical issue in applications of

high-performance systolic or wavefront array processors,

and for that reason much recent work has addressed

the problems of on-line error detection (see, for exam-

ple, [1]). We consider in this paper a flexible and general

methodology for incorporating error detection in array

design.

The two general approaches pursued in the literature

for error detection are hardware and time redundancy.

That is, one can detect errors by introducing additional

computing hardware, perhaps duplicating PEs, or one

can do duplicate computations using the same hard-

ware. In general, there is a tradeoff between the de-

crease in throughput caused by the time redundancy,

and the cost of the extra hardware used for hardware

redundancy. A high degree of time redundancy can

achieve good error detection, but at the cost of de-

creased throughput; a high degree of hardware redun-

dancy can do the same without the attendant decrease

in throughput, but at the cost of more hardware.

Much previous work takes advantage of the regu-

larity of systolic arrays. For example [1] describes

algorithm-based techniques that are especially suited

to systolic arrays, but these are applicable only to a

subset of linear systems, and it is unclear how to use

*This work was supported in part by NSF Grant MIP-8912100, and

U.S. Army Research Office-Durham Grant DAAL03-89-K-0074.

them on problems like the substring comparison we

consider in Section 2. The work in [2], [3] uses dual-

module redundancy to detect errors; the essentially

time-redundant technique of [4] applies only to uni-

lateral linear arrays and results in a slowdown by a fac-

tor of two; [5] also deals with special classes of systolic

arrays and again halves the throughput rate using time

redundancy. The method of roving spares described in

[6] uses limited hardware redundancy, but it is not clear

how to extend the method to bilateral arrays or more

complicated structures.

This idea of using tokens to trigger error detection

appears to have been introduced in [7]. They use both

time and space redundancy, and a fixed periodic pattern

of inserting tokens. In the case of unilateral linear ar-

rays, the number of inserted tokens in the array at any

instant cannot exceed the number of extra PEs. Thus,

the frequency of token insertion is predetermined by

the number of extra PEs. In the case of bilateral linear

arrays, they make use of the idle PEs and idle cycles

in the original computations for space and time redun-

dancy, so only one extra PE is needed.

We will combine two ideas to achieve run-time error

detection: First, as in [7], we introduce special symbols

in the input that signal the processors to perform com-

parisons for the purposes of detecting discrepancies.

Typically, this is done by having two (or more) adjacent

processors perform the same computation and compar-

ing results. In contrast with [7], however, the frequency

of insertion of these special symbols is determined by

the user at run time, rather than being pre-determined

by hardware constraints. Second, we introduce the

special symbols at the level of the dependency graph,

and follow the effect through the projections used to

arrive at a systolic or wavefront array [8].

There are several advantages to this general approach

over more specialized or ad hoc approaches. First, it

allows the user to determine the frequency of error

checking at run time. Thus more error checking can

be done when a lower throughput is acceptable. A sec-

ond advantage stems from the fact that the method is

expressed in terms of the dependency graph. This

allows us to use previous work [8] on scheduling and

projection to prove the correctness of the resulting

working architectures. A third advantage is that the approach requires no extra PEs, and little extra hardware.

In the next section we briefly describe dependency

graphs using the problem of finding minimum substring-

distance as an example. In Section 3 we describe the

general methodology of ITRED. In Section 4 we discuss

our fault model at the level of array nodes, nodes in

the signal flow graph that are mapped to the working

architecture. The details of implementing ITRED for

unilateral linear arrays, which include the minimum

substring-distance problem and convolution, are dis-

cussed in Section 5. Section 6 then shows how to extend

ITRED to more general problems, using matrix multi-

plication as an example. We prove correctness in Sec-

tion 7. Finally, in Section 8 we show how ITRED can

be adapted to handle some special design requirements.

2. Minimum Substring-Distance

In this section, we introduce as a working example the

problem of finding minimum substring-distance. We

use this problem to illustrate the dependency graph DG and the mapping method for transforming a DG to an array architecture [8]. String comparison is a time-consuming and important operation in many applications, such as information retrieval, databases, artificial intelligence, pattern recognition, and DNA pattern matching.

The edit distance between two strings is the minimum number of basic operations (insertion, deletion and substitution) necessary to transform one string to the other. For example, chao can be transformed to sha by a sequence of three operations as follows:

chao (delete c) --> hao (delete o) --> ha (insert s) --> sha.

But two transformations suffice:

chao (substitute s for c) --> shao (delete o) --> sha.

In fact this is minimum, so the edit distance between the two strings is two.
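The definition above can be checked with a short dynamic program. The following is a minimal sketch of the standard row-by-row edit-distance computation, not code from the paper; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("chao", "sha"))  # 2
```

Only two rows of the table are kept at a time, which is enough when just the final distance is needed.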

Systolic arrays for computing edit distance between two strings have been described in [9]-[11]. In [12], Landau and Vishkin consider the problem of finding a substring of a string S most similar to a given pattern P. Given string S and pattern P, let $S(i : j)$ be the substring of S from position i to position j and let $dis(S(i : j), P)$ be the edit distance between $S(i : j)$ and P. The minimum substring-distance is the minimum distance $dis(S(i : j), P)$, where i and j range from 1 to the length of S. Thus, the minimum substring-distance between the string "I like Systolic VLSI arrays," and "Systolic arrays" is five.

The problem of minimum substring-distance can be solved by two-dimensional dynamic programming, which in turn can be implemented by a one-dimensional systolic array.

An input instance of the problem is

$S = s_1 s_2 \ldots s_n$: a (long) string

$P = p_1 p_2 \ldots p_m$: a (short) string

The output of the problem is the minimum of all edit distances of substrings

$S(i - k : i) = s_{i-k} s_{i-k+1} \cdots s_i$

from the pattern P, where $1 \le i \le n$, $0 \le k \le i - 1$.

The dynamic programming algorithm proceeds as follows. Let $D[i, j]$ denote the minimum distance of all substrings ending at $s_i$ from the prefix $P(1 : j)$, where $1 \le i \le n$, $1 \le j \le m$. Initially, $D[i, 0] = 0$ for every i and $D[0, j] = j$ for every j. If we think of the $D[i, j]$ as being in a two-dimensional array, each $D[i, j]$ can be computed from the entries above, to the left, and above and to the left, as follows:

for i = 1 to n do
    for j = 1 to m do
        D[i, j] = min( D[i - 1, j] + 1, D[i, j - 1] + 1,
                       D[i - 1, j - 1] if s_i = p_j, or
                       D[i - 1, j - 1] + 1 otherwise )

When this double loop is completed, the entries $D[i, m]$ contain the minimum distance of all substrings ending at $s_i$ from the pattern P. If we consider each min operation as a node and represent each dependence of an operation on data as a directed edge between two nodes, the resulting dependency graph DG is as shown in figure 1. The graph DG is acyclic and therefore computable.

Fig. 1. Dependency graph for minimum substring-dist.
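The recurrence and boundary conditions described above can be sketched sequentially as follows. This is an illustrative software version only, not the systolic implementation. The function name `min_substring_distance` is an assumption made here, and a substitution cost of 1 is assumed when $s_i \ne p_j$.

```python
def min_substring_distance(s: str, p: str) -> int:
    """Smallest edit distance between the pattern p and any substring of s."""
    n, m = len(s), len(p)
    # D[i][j]: minimum distance from the prefix p[:j] over all substrings of s
    # that end at position i. D[i][0] = 0 because the empty prefix matches an
    # empty substring ending anywhere.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = j  # nothing of s consumed yet: j pattern characters unmatched
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = D[i - 1][j - 1] + (0 if s[i - 1] == p[j - 1] else 1)
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, diagonal)
    # Row 0 is included so that an empty s still returns a value (namely m).
    return min(D[i][m] for i in range(n + 1))


print(min_substring_distance("I like Systolic VLSI arrays,", "Systolic arrays"))  # 5
```

Each entry depends only on its upper, left, and upper-left neighbors, which is exactly the edge structure of the dependency graph.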

We call a node in DG a computation cell, or cell. As described in [8], the two design steps of processor assignment and scheduling can be used to map such a DG to a lower dimensional signal flow graph SFG. We call a node of the signal flow graph a Processor Element (PE), this being justified because the signal flow graph is very close to a hardware specification for a SIMD systolic or wavefront array. Let an equiprocessor curve be a curve containing all the cells of the dependency graph that are projected onto one PE of the signal flow graph of lower dimension, and let an equitemporal surface be a surface containing all the computation cells that are active at a given time. Usually, the equiprocessor curves are parallel straight lines, in which case we let $\vec{d}$ be a vector parallel to the equiprocessor lines, called the projection vector. Further, it is often the case that the dependency graph has a linear schedule; that is, all equitemporal surfaces are parallel hyperplanes, and so have a unique normal direction. Let $\vec{s}$ be a vector in this normal direction, called the schedule vector.

Kung [8] showed that given a projection vector $\vec{d}$, necessary and sufficient conditions for a linear schedule $\vec{s}$ to be permissible, that is, represent a realizable computation in the signal flow graph, are the following:

(1) $\forall$ edge $\vec{e}$ in DG, $\vec{s}^T \vec{e} \ge 0$.

(2) $\vec{s}^T \vec{d} > 0$.

In our example of the minimum substring-distance

problem, we can choose the projection vector $\vec{d}$ = (1, 0)

and the permissible linear schedule $\vec{s}$ = (1, 1), as shown

in figure 1. This leads to a signal flow graph with m

processors, where m is the size of the pattern P, and

that is reasonable since n, the size of the string S, is

usually very much larger than m.
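The two permissibility conditions can be checked mechanically for this example. The sketch below assumes cell coordinates (i, j), so that the three data dependences of the recurrence correspond to the edge vectors (1, 0), (0, 1) and (1, 1); that edge list is read off the recurrence here rather than quoted from the paper, and the helper names are illustrative.

```python
from typing import Iterable, Tuple

Vec = Tuple[int, int]


def dot(u: Vec, v: Vec) -> int:
    return u[0] * v[0] + u[1] * v[1]


def is_permissible(schedule: Vec, projection: Vec, edges: Iterable[Vec]) -> bool:
    """Condition (1): schedule . edge >= 0 for every edge; (2): schedule . projection > 0."""
    return all(dot(schedule, e) >= 0 for e in edges) and dot(schedule, projection) > 0


# Assumed edge vectors: D[i, j] depends on D[i-1, j], D[i, j-1] and D[i-1, j-1].
EDGES = [(1, 0), (0, 1), (1, 1)]
PROJECTION = (1, 0)  # cells that differ only in i share a PE
SCHEDULE = (1, 1)    # cell (i, j) is active at time i + j


def pe_of(cell: Vec) -> int:
    """Processor assignment under the projection (1, 0): the column index j."""
    return cell[1]


def time_of(cell: Vec) -> int:
    """Activation time under the linear schedule (1, 1)."""
    return dot(SCHEDULE, cell)


print(is_permissible(SCHEDULE, PROJECTION, EDGES))  # True
print(pe_of((3, 2)), time_of((3, 2)))               # 2 5
```

With this assignment every column of the graph collapses onto one PE, which gives the m processors mentioned above.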

3. ITRED: General Approach

In this section we discuss ways of modifying dependency

graphs to achieve error detection, and we will call a

specific algorithm for doing so a strategy. The strategy

determines the way in which special symbols are inserted

in the input data stream. We propose two approaches.

In the first, we derive some strategies that allow every

PE to be tested if the user chooses to provide the right

inputs. In the second approach not only can every PE

be tested consecutively by choice of the input stream,

but the computation results themselves can be produced

by majority vote. We begin with the first approach,

which is actually a special case of the second.

We use a special input symbol, called $\alpha$, which serves the purpose of informing a PE to do error detection (as in [13]). When $PE_i$ receives an $\alpha$ symbol, $PE_i$ will do the same operation as $PE_{i-1}$ and compare its result with that of $PE_{i-1}$. (We assume here that $PE_i$ is in fact capable of performing the same operation as $PE_{i-1}$. If all processors are not identical, this requirement might require augmenting the capabilities of some of the processors.) If the results are not the same, an error has been detected. The user has the freedom to decide how frequently an $\alpha$ symbol is inserted in the original input. At one extreme, the user inserts no $\alpha$ symbols, in which case there is no decrease in throughput. At the other extreme, the user inserts an $\alpha$ symbol before each input data point in the original input stream, so the throughput becomes at most half the original speed. Thus, the tradeoff between speed and error coverage is under user control.
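The throughput side of this tradeoff can be illustrated with a small sketch. It assumes, purely for illustration, that each $\alpha$ symbol occupies exactly one input slot, so the computed ratio is an upper bound on the achievable throughput. The marker `ALPHA`, the `period` parameter and both function names are hypothetical and not taken from the paper.

```python
ALPHA = "alpha"  # hypothetical marker standing in for the special input symbol


def insert_alpha(stream, period=None):
    """Insert one ALPHA before every `period`-th data item (None inserts nothing)."""
    out = []
    for index, item in enumerate(stream):
        if period is not None and index % period == 0:
            out.append(ALPHA)
        out.append(item)
    return out


def relative_throughput(stream, period=None):
    """Fraction of input slots that still carry data after ALPHA insertion."""
    padded = insert_alpha(stream, period)
    return len(stream) / len(padded) if padded else 1.0


data = list(range(8))
print(insert_alpha(data, period=4))
print(relative_throughput(data, period=None))  # 1.0
print(relative_throughput(data, period=1))     # 0.5
```

A period of 1 corresponds to the extreme in which a test symbol precedes every data point, and `None` corresponds to no testing at all.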

DEFINITION 3.1. We say a strategy for inserting $\alpha$'s into the input stream is $\alpha$-successful if all PEs are tested at least once and all computation cells have the correct timing.

Actually, ITRED can be easily extended so that

every computation cell is tested, but sometimes we may

need to add extra PEs so the computation cells on the

border can be tested.

We want to think of adding the $\alpha$ symbols into the original dependency graph; to do this we add special cells called $\alpha$ cells. In the dependency graph, the effect of an $\alpha$ symbol is similar to a delay, since when $PE_i$ receives an $\alpha$ symbol, it will save its state, discard what it produces after it simulates $PE_{i-1}$'s computation, and then restore its previous state.
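The save, simulate, compare, and restore sequence can be sketched with a toy processing element. Everything here is a hypothetical illustration: the `PE` class, the running-sum operation `accumulate`, and in particular the way the neighbor's state, input and output are handed to `on_alpha` are assumptions, since the text above does not specify how that data reaches the PE.

```python
def accumulate(state, x):
    """Toy PE operation (assumed for illustration): a running sum."""
    new_state = state + x
    return new_state, new_state


class PE:
    def __init__(self, op, state=0):
        self.op = op
        self.state = state
        self.error_detected = False

    def on_data(self, x):
        """Ordinary step: update the local state and emit an output."""
        self.state, out = self.op(self.state, x)
        return out

    def on_alpha(self, neighbor_state, neighbor_input, neighbor_output):
        """Test step: redo the neighbor's computation and compare the results."""
        saved = self.state                     # save own state
        self.state = neighbor_state            # take on the neighbor's role
        shadow = self.on_data(neighbor_input)  # simulate the neighbor's computation
        if shadow != neighbor_output:
            self.error_detected = True         # discrepancy: an error is flagged
        self.state = saved                     # restore; the shadow result is discarded


pe = PE(accumulate, state=10)
pe.on_alpha(neighbor_state=3, neighbor_input=4, neighbor_output=7)
print(pe.state, pe.error_detected)  # 10 False
```

Because the PE's own state is unchanged after the test, the $\alpha$ step behaves like a pure delay from the point of view of the ordinary computation.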

For simplicity, we first consider the case of a two-dimensional dependency graph G like the one in figure 1, with m columns and n rows. Without loss of generality, we assume that data for a particular problem instance enters along a row (row input), and flows from column to column. Let $g_{i,j}$ be a computation cell, where $1 \le i \le n$, and $1 \le j \le m$.

To insert an $\alpha$ symbol in the input stream that travels from PE to PE, insert a complete row of $\alpha$ cells in the dependency graph, as shown in figure 2. If this row is inserted before row i, this splits G into two parts, the part from row 1 to row i - 1, and the part from row i to the last row. Keep the edges that went from row i - 1 to i in the first part. Let $\vec{\alpha}$ be the vector normal to the added row, so $\vec{\alpha}$ is (0, 1). Note that in other, more general situations the inserted $\alpha$ symbols may not form a hyperplane, and therefore there may not be a well defined $\vec{\alpha}$ vector. We will see an example of this in a later section.

Let $\alpha^j$, $1 \le j \le m$, be the row of added $\alpha$ cells, ordered in the direction of increasing time. If column j is projected to $PE^j$, add the directed edge $(\alpha^j, g_{i,j})$. Call these edges delay edges and denote by $c^j$ the computation cell pointed to by the delay edge leaving $\alpha^j$. Since $\alpha^j$ and $c^j$ project to the same PE, the difference between their coordinate vectors is a vector parallel to $\vec{d}$. Figure 1 shows the original dependency graph for the minimum substring-distance problem and figure 2 shows the dependency graph modified in the way just discussed.

An $\alpha$ stream inserted into the dependency graph in this way can be regarded as a surface, which we call an $\alpha$-surface. When the $\alpha$-surface is a hyperplane, we can call it an $\alpha$-hyperplane. We say that an $\alpha$-surface is a cutting surface if removing it separates the dependency graph into disconnected pieces. We say that a cutting surface is unicutting if all the edges crossing this surface cross it in the same direction. Cutting or unicutting hyperplanes are defined analogously.

We next derive constraints on the way in which the original dependency graph should be modified so that testing takes place correctly. We prove later that these conditions are sufficient to ensure that a strategy is $\alpha$-successful. Observe first that since we need to test every PE, the vector $\vec{\alpha}$ cannot be perpendicular to the vector $\vec{d}$, and in fact every PE should be the image under projection of at least one $\alpha$ cell. Furthermore, because we do not intend to increase the number of PEs, we also require that each PE be the image under projection of at least one computation cell.

Fig. 2. Modified dependency graph for minimum substring-dist.

We know that different PEs should be tested at different times, so the vector $\vec{\alpha}$ cannot be parallel to the vector $\vec{s}$. (When the working architecture is a wavefront array, this sequential property of the testing will be naturally ensured by the fact that the testing is data-driven.) Since each $\alpha^j$ is basically a delay for some later operation $c^j$ by the same PE, the delay edge should be in the same direction as the vector $\vec{d}$.

Let $PE^j$ be the PE to which $\alpha^j$ is projected. We know that whenever a PE receives an $\alpha$, this PE needs to do the same operation as its neighboring PE will do. Thus, for each $\alpha^j$ there should exist a computation cell (not an $\alpha$ cell) that is projected to $PE^j$'s neighbor at the same time that the $\alpha$ cell is projected to $PE^j$. We summarize the constraints discussed above in the following, which we call the $\mathcal{E}$ constraints for hyperplanes.

$\mathcal{E}$ constraints for hyperplanes:

0. $\vec{\alpha}$ is not parallel to $\vec{s}$

1. $\exists$ an $\alpha$ cell on the border at which data arrives

2. all delay edges are parallel to $\vec{d}$

3. $\forall PE$, $\exists$ an $\alpha$ cell which is projected to $PE$

4. $\forall PE$, $\exists$ a computation cell which is projected to $PE$

5. $\forall \alpha^j$, $\exists$ a non-$\alpha$ computation cell that is in the same equitemporal hyperplane as $\alpha^j$ and is projected to a neighboring PE of $PE^j$

6. The $\alpha$-hyperplane is unicutting

As noted above, the zeroth constraint is not needed at all when the working architecture is a wavefront array, so we assume without loss of generality that the working architecture is a synchronous, systolic array, rather than a wavefront array. Actually, the zeroth constraint is implied by the fifth constraint, so it is redundant and can be omitted. If the equitemporal surface or the $\alpha$-surface is not a hyperplane, we can generalize the above constraints easily as follows:

$\mathcal{E}$ constraints:

1. $\exists$ an $\alpha$ cell on the border at which data arrives

2. all delay edges are parallel to $\vec{d}$

3. $\forall PE$, $\exists$ an $\alpha$ cell which is projected to $PE$

4. $\forall PE$, $\exists$ a computation cell which is projected to $PE$

5. $\forall \alpha^j$, $\exists$ a non-$\alpha$ computation cell that is in the same equitemporal surface as $\alpha^j$ and is projected to a neighboring PE of $PE^j$

6. The $\alpha$-hyperplane is unicutting

If the projection, schedule, and modified depend-

ency graph satisfy the above constraints, we say that

this dependency graph is correctly modified. We leave

for Section 7 a proof that a correctly modified dependency graph is $\alpha$-successful.
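Some of these constraints can be checked mechanically on a small example. The sketch below is an illustration under stated assumptions, not the paper's construction: cells sit at coordinates (row, column), the PE of a cell is its column, its time is row + column, and one complete row of $\alpha$ cells is inserted with the later rows shifted down by one. Only constraints 3, 4 and 5 and the requirement of distinct test times are checked; constraints 1, 2 and 6 are not modelled. All names are hypothetical.

```python
ALPHA, COMPUTE = "alpha", "compute"


def modified_graph(n, m, before_row):
    """Cells of an n-by-m graph after inserting one alpha row before `before_row`."""
    cells = {}
    for row in range(1, n + 2):  # n original rows plus the inserted row
        kind = ALPHA if row == before_row else COMPUTE
        for col in range(1, m + 1):
            cells[(row, col)] = kind
    return cells


def pe_of(cell):
    return cell[1]            # assumed projection: one PE per column


def time_of(cell):
    return cell[0] + cell[1]  # assumed linear schedule (1, 1)


def check(cells):
    pes = {pe_of(c) for c in cells}
    alphas = [c for c, kind in cells.items() if kind == ALPHA]
    computes = [c for c, kind in cells.items() if kind == COMPUTE]
    busy = {(time_of(c), pe_of(c)) for c in computes}
    return {
        "every PE has an alpha cell": {pe_of(a) for a in alphas} == pes,
        "every PE has a computation cell": {pe_of(c) for c in computes} == pes,
        "each alpha has a simultaneous computing neighbor": all(
            (time_of(a), pe_of(a) - 1) in busy or (time_of(a), pe_of(a) + 1) in busy
            for a in alphas
        ),
        "PEs are tested at distinct times": len({time_of(a) for a in alphas}) == len(alphas),
    }


for name, ok in check(modified_graph(n=4, m=3, before_row=2)).items():
    print(name, ok)
```

In this toy model, inserting the $\alpha$ row before the very first row leaves the first PE with no neighbor computing at the test time, so the third check fails for `before_row=1`.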

In the second approach to modifying the dependency graph, majority voting is applied. In this scheme k adjacent PEs will perform the same operation, the output will be the majority result, and error detection will be performed at the same time. We introduce k - 1 special symbols $\alpha_1, \ldots, \alpha_{k-1}$, which play roles similar to the $\alpha$ symbol. For simplicity, we assume that k is 3, but it is straightforward to extend k to be any odd number. When $PE_i$ receives an $\alpha_1$ symbol, it performs the same action as before: it simulates a computation in the adjacent PE, say $PE_{i-1}$. If $PE_{i+1}$ receives an $\alpha_2$ symbol, it simulates the computation of a PE which is distance-2 from it, say $PE_{i-1}$. We need to guarantee that $PE_{i+1}$ receives $\alpha_2$ and $PE_i$ receives $\alpha_1$ at the same time, and at a time when they can both simulate the same computation by $PE_{i-1}$, do the error detection, and output the majority result.
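The voting step itself is simple to sketch. The function below is an illustrative assumption rather than the paper's hardware mechanism: it takes the k results produced by the k PEs for one computation, returns the majority value, and reports whether any disagreement was seen.

```python
from collections import Counter


def majority_vote(results):
    """Return (majority value, error_detected) for an odd number of redundant results."""
    if len(results) % 2 == 0:
        raise ValueError("expected an odd number of results")
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise ValueError("no value holds a majority")
    # An error is detected whenever the results are not unanimous.
    return value, count != len(results)


print(majority_vote([7, 7, 7]))  # (7, False)
print(majority_vote([7, 9, 7]))  # (7, True)
```

With k = 3 a single faulty result is both detected and outvoted, which is the error-correcting behavior described above.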

Therefore, $\alpha_2$ should immediately precede $\alpha_1$ in the $\alpha$ stream. The constraints analogous to the $\mathcal{E}$ constraints for performing majority voting are given below, with all terms previously used now indexed by the same index i as the corresponding symbol $\alpha_i$. For example, $\vec{\alpha}_i$ is the normal vector for the $\alpha_i$ hyperplane.

$\mathcal{E}_{maj\text{-}k}$ constraints for hyperplanes:

1. all the $\vec{\alpha}_i$ are parallel to each other

2. the $\alpha_{k-1}, \ldots, \alpha_1$-symbols are in the same equitemporal hyperplane, and are projected to k - 1 adjacent PEs

3. the $\alpha_1$-hyperplane satisfies the $\mathcal{E}$ constraints

The corresponding more general constraints for the case of surfaces are:

$\mathcal{E}_{maj\text{-}k}$ constraints:

1. all the $\alpha_i$-surfaces are parallel to each other

2. the $\alpha_{k-1}, \ldots, \alpha_1$-symbols are in the same equitemporal surface, and are projected to k - 1 adjacent PEs

3. the $\alpha_1$-surface satisfies the $\mathcal{E}$ constraints

For example, the modified dependency graph in figure 3 satisfies the above $\mathcal{E}_{maj\text{-}k}$ constraints. Note that if we want every computation cell in the dependency graph to be tested by k PEs, we may need to add some

Fig. 3. Modified dependency graph for the minimum substring-distance problem (approach 2).

References

VLSI Array processors

VLSI array processors

Fault tolerant and fault testable hardware design

Error detecting codes, self-checking circuits and applications

Introducing efficient parallelism into approximate string matching and a new serial algorithm
