Error detection in arrays via dependency graphs

Edwin H.-M. Sha, Kenneth Steiglitz
- Vol. 4, Iss: 4, pp 331-342
TLDR
This paper combines the projection method of deriving systolic arrays from dependency graphs with the idea of input-triggered testing, and calls the method ITRED, for Input-driven Time-Redundancy Error Detection.
Abstract: 
This paper describes a methodology based on dependency graphs for doing concurrent run-time error detection in systolic arrays and wavefront processors. It combines the projection method of deriving systolic arrays from dependency graphs with the idea of input-triggered testing. We call the method ITRED, for Input-driven Time-Redundancy Error Detection. Tests are triggered by inserting special symbols in the input, and so the approach gives the user flexibility in trading off throughput for error coverage. Correctness of timing is proved at the dependency graph level. The method requires no extra PEs and little extra hardware. We propose several variations of the general approach and derive corresponding constraints on the modified dependency graphs that guarantee correctness. One variation performs run-time error correction using majority voting. Examples are given, including a dynamic programming algorithm, convolution, and matrix multiplication.


Journal of VLSI Signal Processing, 4, 331-342 (1992)
© 1992 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
Error Detection in Arrays via Dependency Graphs*
EDWIN HSING-MEAN SHA AND KENNETH STEIGLITZ
Department of Computer Science, Princeton University, Princeton, NJ 08544
Received August 9, 1991; Revised December 3, 1991.
Abstract. This paper describes a methodology based on dependency graphs for doing concurrent run-time error
detection in systolic arrays and wavefront processors. It combines the projection method of deriving systolic arrays
from dependency graphs with the idea of input-triggered testing. We call the method ITRED, for Input-driven
Time-Redundancy Error Detection. Tests are triggered by inserting special symbols in the input, and so the approach
gives the user flexibility in trading off throughput for error coverage. Correctness of timing is proved at the dependency
graph level. The method requires no extra PEs and little extra hardware. We propose several variations of the
general approach and derive corresponding constraints on the modified dependency graphs that guarantee correct-
ness. One variation performs run-time error correction using majority voting. Examples are given, including a
dynamic programming algorithm, convolution, and matrix multiplication.
1. Introduction
Reliability is often a critical issue in applications of
high-performance systolic or wavefront array processors,
and for that reason much recent work has addressed
the problems of on-line error detection (see, for exam-
ple, [1]). We consider in this paper a flexible and general
methodology for incorporating error detection in array
design.
The two general approaches pursued in the literature
for error detection are hardware and time redundancy.
That is, one can detect errors by introducing additional
computing hardware, perhaps duplicating PEs, or one
can do duplicate computations using the same hard-
ware. In general, there is a tradeoff between the de-
crease in throughput caused by the time redundancy,
and the cost of the extra hardware used for hardware
redundancy. A high degree of time redundancy can
achieve good error detection, but at the cost of de-
creased throughput; a high degree of hardware redun-
dancy can do the same without the attendant decrease
in throughput, but at the cost of more hardware.
Much previous work takes advantage of the regularity
of systolic arrays. For example, [1] describes
algorithm-based techniques that are especially suited
to systolic arrays, but these are applicable only to a
subset of linear systems, and it is unclear how to use
them on problems like the substring comparison we
consider in Section 2. The work in [2], [3] uses
dual-module redundancy to detect errors; the essentially
time-redundant technique of [4] applies only to
unilateral linear arrays and results in a slowdown by a
factor of two; [5] also deals with special classes of systolic
arrays and again halves the throughput rate using time
redundancy. The method of roving spares described in
[6] uses limited hardware redundancy, but it is not clear
how to extend the method to bilateral arrays or more
complicated structures.
*This work was supported in part by NSF Grant MIP-8912100, and
U.S. Army Research Office-Durham Grant DAAL03-89-K-0074.
This idea of using tokens to trigger error detection
appears to have been introduced in [7]. They use both
time and space redundancy, and a fixed periodic pattern
of inserting tokens. In the case of unilateral linear ar-
rays, the number of inserted tokens in the array at any
instant cannot exceed the number of extra PEs. Thus,
the frequency of token insertion is predetermined by
the number of extra PEs. In the case of bilateral linear
arrays, they make use of the idle PEs and idle cycles
in the original computations for space and time redun-
dancy, so only one extra PE is needed.
We will combine two ideas to achieve run-time error
detection. First, as in [7], we introduce special symbols
in the input that signal the processors to perform
comparisons for the purposes of detecting discrepancies.
Typically, this is done by having two (or more) adjacent
processors perform the same computation and comparing
results. In contrast with [7], however, the frequency
of insertion of these special symbols is determined by
the user at run time, rather than being pre-determined
by hardware constraints. Second, we introduce the
special symbols at the level of the dependency graph,
and follow the effect through the projections used to
arrive at a systolic or wavefront array [8].
There are several advantages to this general approach
over more specialized or ad hoc approaches. First, it
allows the user to determine the frequency of error
checking at run time. Thus more error checking can
be done when a lower throughput is acceptable. A second
advantage stems from the fact that the method is
expressed in terms of the dependency graph. This
allows us to use previous work [8] on scheduling and
projection to prove the correctness of the resulting
working architectures. A third advantage is that the
approach requires no extra PEs, and little extra hardware.
In the next section we briefly describe dependency
graphs using the problem of finding minimum substring-
distance as an example. In Section 3 we describe the
general methodology of ITRED. In Section 4 we discuss
our fault model at the level of array nodes, nodes in
the signal flow graph that are mapped to the working
architecture. The details of implementing ITRED for
unilateral linear arrays, which include the minimum
substring-distance problem and convolution, are dis-
cussed in Section 5. Section 6 then shows how to extend
ITRED to more general problems, using matrix multi-
plication as an example. We prove correctness in Sec-
tion 7. Finally, in Section 8 we show how ITRED can
be adapted to handle some special design requirements.
2. Minimum Substring-Distance
In this section, we introduce as a working example the
problem of finding minimum substring-distance. We
use this problem to illustrate the dependency graph DG
and the mapping method for transforming a DG to an
array architecture [8]. String comparison is a
time-consuming and important operation in many
applications, such as information retrieval, databases,
artificial intelligence, pattern recognition, and DNA pattern
matching.
The edit distance between two strings is the minimum
number of basic operations (insertion, deletion
and substitution) necessary to transform one string to
the other. For example, chao can be transformed to
sha by a sequence of three operations as follows:
chao (delete c) --> hao (delete o) -->
ha (insert s) --> sha.
But two transformations suffice:
chao (substitute s for c) -->
shao (delete o) --> sha.
In fact this is minimum, so the edit distance between
the two strings is two.
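As a cross-check of the example above, here is a minimal sketch of the textbook dynamic program for edit distance. It is not taken from the paper; the function name `edit_distance` and the row-by-row layout are illustrative assumptions.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            cur[j] = min(prev[j] + 1,      # delete a[i-1]
                         cur[j - 1] + 1,   # insert b[j-1]
                         substitute)       # match or substitute
        prev = cur
    return prev[len(b)]


print(edit_distance("chao", "sha"))
```

Under these assumptions the call reproduces the distance of two derived by hand in the text.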
Systolic arrays for computing edit distance between
two strings have been described in [9]-[11]. In [12],
Landau and Vishkin consider the problem of finding
a substring of a string S most similar to a given pattern
P. Given string S and pattern P, let $S(i : j)$ be the
substring of S from position i to position j and let
$dis(S(i : j), P)$ be the edit distance between $S(i : j)$ and P.
The minimum substring-distance is the minimum distance
$dis(S(i : j), P)$, where i and j range from 1 to the length
of S. Thus, the minimum substring-distance between
the string "I like Systolic VLSI arrays," and "Systolic
arrays" is five.
The problem of minimum substring-distance can be
solved by two-dimensional dynamic programming,
which in turn can be implemented by a one-dimensional
systolic array.
An input instance of the problem is
$S = s_1 s_2 \cdots s_n$: a (long) string
$P = p_1 p_2 \cdots p_m$: a (short) string
The output of the problem is the minimum of all edit
distances of substrings
$S(i - k : i) = s_{i-k} s_{i-k+1} \cdots s_i$
from the pattern P, where $1 \le i \le n$, $0 \le k \le i - 1$.
The dynamic programming algorithm proceeds as
follows. Let $D[i, j]$ denote the minimum distance of
all substrings ending at $s_i$ from the prefix $P(1 : j)$,
where $1 \le i \le n$, $1 \le j \le m$. Initially,
$D[i, 0] = 0$ for every i and $D[0, j] = j$ for every j.
If we think of the $D[i, j]$ as being in a two-dimensional
array, each $D[i, j]$ can be computed from the entries
above, to the left, and above and to the left, as follows:
for i = 1 to n do
  for j = 1 to m do
    D[i, j] = min ( D[i - 1, j] + 1,
                    D[i, j - 1] + 1,
                    D[i - 1, j - 1]      if s_i = p_j, or
                    D[i - 1, j - 1] + 1  otherwise )
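The recurrence above can be sketched in Python as follows. This is a sequential illustration only, not the systolic implementation; the name `min_substring_distance` is an assumption, and the sketch keeps two rows of D at a time instead of the full table.

```python
def min_substring_distance(s, p):
    """Smallest edit distance between the pattern p and any substring of s."""
    n, m = len(s), len(p)
    prev = list(range(m + 1))        # row 0: D[0, j] = j
    best = prev[m]                   # covers the degenerate case n = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)          # D[i, 0] = 0: a substring may start anywhere
        for j in range(1, m + 1):
            diagonal = prev[j - 1] + (0 if s[i - 1] == p[j - 1] else 1)
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, diagonal)
        best = min(best, cur[m])     # D[i, m]: best substring ending at s_i
        prev = cur
    return best


print(min_substring_distance("I like Systolic VLSI arrays,", "Systolic arrays"))
```

The call uses the example strings from the text, for which the stated minimum substring-distance is five.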
When this double loop is completed, the entries
$D[i, m]$ contain the minimum distance of all substrings
ending at $s_i$ from the pattern P. If we consider each
min operation as a node and represent each dependence
of an operation on data as a directed edge between two
nodes, the resulting dependency graph DG is as shown
in figure 1. The graph DG is acyclic and therefore
computable.

Fig. 1. Dependency graph for minimum substring-dist.
We call a node in DG a computation cell, or cell.
As described in [8], the two design steps of processor
assignment and scheduling can be used to map such
a DG to a lower dimensional signal flow graph SFG.
We call a node of the signal flow graph a Processor
Element (PE), this being justified because the signal
flow graph is very close to a hardware specification for
a SIMD systolic or wavefront array. Let an equiprocessor
curve be a curve containing all the cells of the
dependency graph that are projected onto one PE of
the signal flow graph of lower dimension, and let an
equitemporal surface be a surface containing all the
computation cells that are active at a given time.
Usually, the equiprocessor curves are parallel straight
lines, in which case we let $\vec{p}$ be a vector parallel to
the equiprocessor lines, called the projection vector.
Further, it is often the case that the dependency graph
has a linear schedule; that is, all equitemporal surfaces
are parallel hyperplanes, and so have a unique normal
direction. Let $\vec{s}$ be a vector in this normal direction,
called the schedule vector.
Kung [8] showed that given a projection vector $\vec{p}$,
necessary and sufficient conditions for a linear schedule
$\vec{s}$ to be permissible, that is, to represent a realizable
computation in the signal flow graph, are the following:
(1) $\forall$ edge $\vec{e}$ in DG, $\vec{s}^T \vec{e} \ge 0$.
(2) $\vec{s}^T \vec{p} > 0$.
In our example of the minimum substring-distance
problem, we can choose the projection vector $\vec{p} = (1, 0)$
and the permissible linear schedule $\vec{s} = (1, 1)$, as shown
in figure 1. This leads to a signal flow graph with m
processors, where m is the size of the pattern P, and
that is reasonable since n, the size of the string S, is
usually very much larger than m.
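The two permissibility conditions can be checked mechanically for this example. The sketch below is illustrative: the edge directions are read off the recurrence for D[i, j] (each cell depends on its neighbours at offsets (1, 0), (0, 1) and (1, 1)), and the names `is_permissible` and `map_cell` are assumptions, not notation from [8].

```python
def dot(u, v):
    """Inner product of two integer vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def is_permissible(schedule, projection, edges):
    """Condition (1): schedule . edge >= 0 for every edge of the DG.
    Condition (2): schedule . projection > 0."""
    return all(dot(schedule, e) >= 0 for e in edges) and dot(schedule, projection) > 0


# Assumed edge directions of the DG in figure 1, in (i, j) coordinates.
EDGES = [(1, 0), (0, 1), (1, 1)]
SCHEDULE = (1, 1)      # schedule vector of the example
PROJECTION = (1, 0)    # projection vector of the example


def map_cell(i, j):
    """PE index and time step of cell (i, j): projecting along (1, 0)
    collapses each column j onto one PE; time is schedule . (i, j)."""
    return j, dot(SCHEDULE, (i, j))


print(is_permissible(SCHEDULE, PROJECTION, EDGES))
print(map_cell(3, 2))
```

A schedule such as (1, -1) fails condition (1) on the edge (0, 1), which is one way to see why the choice (1, 1) matters.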
3. ITRED: General Approach
In this section we discuss ways of modifying dependency
graphs to achieve error detection, and we will call a
specific algorithm for doing so a strategy. The strategy
determines the way in which special symbols are inserted
in the input data stream. We propose two approaches.
In the first, we derive some strategies that allow every
PE to be tested if the user chooses to provide the right
inputs. In the second approach not only can every PE
be tested consecutively by choice of the input stream,
but the computation results themselves can be produced
by majority vote. We begin with the first approach,
which is actually a special case of the second.
We use a special input symbol, called $\alpha$, which
serves the purpose of informing a PE to do error detection
(as in [13]). When $PE_i$ receives an $\alpha$ symbol, $PE_i$
will do the same operation as $PE_{i-1}$ and compare its
result with that of $PE_{i-1}$. (We assume here that $PE_i$ is
in fact capable of performing the same operation as
$PE_{i-1}$. If all processors are not identical, this
requirement might require augmenting the capabilities of some
of the processors.) If the results are not the same, an
error has been detected. The user has the freedom to
decide how frequently an $\alpha$ symbol is inserted in the
original input. At one extreme, the user inserts no $\alpha$
symbols, in which case there is no decrease in throughput.
At the other extreme, the user inserts an $\alpha$ symbol
before each input data point in the original input stream,
so the throughput becomes at most half the original
speed. Thus, the tradeoff between speed and error
coverage is under user control.
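The throughput side of this tradeoff can be illustrated with a small sketch. It models only the input stream, not the array; the sentinel `ALPHA`, the `period` parameter, and both function names are hypothetical and introduced here for illustration.

```python
ALPHA = object()  # hypothetical stand-in for the special test symbol


def insert_alpha(data, period=None):
    """Return the input stream with one ALPHA placed before every
    `period`-th data item; period=None inserts no test symbols."""
    stream = []
    for index, item in enumerate(data):
        if period is not None and index % period == 0:
            stream.append(ALPHA)
        stream.append(item)
    return stream


def relative_throughput(stream):
    """Fraction of input slots that carry data rather than test symbols."""
    if not stream:
        return 1.0
    data_slots = sum(1 for item in stream if item is not ALPHA)
    return data_slots / len(stream)


data = list("abcd")
print(relative_throughput(insert_alpha(data)))            # no testing
print(relative_throughput(insert_alpha(data, period=1)))  # test before every item
```

With `period=1` every data item is preceded by a test symbol, which corresponds to the extreme in the text where throughput drops to at most half.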
DEFINITION 3.1. We say a strategy for inserting $\alpha$'s into
the input stream is $\alpha$-successful if all PEs are tested
at least once and all computation cells have the correct
timing.
Actually, ITRED can be easily extended so that
every computation cell is tested, but sometimes we may
need to add extra PEs so the computation cells on the
border can be tested.
We want to think of adding the $\alpha$ symbols into the
original dependency graph; to do this we add special
cells called $\alpha$ cells. In the dependency graph, the effect
of an $\alpha$ symbol is similar to a delay, since when $PE_i$
receives an $\alpha$ symbol, it will save its state, discard what
it produces after it simulates $PE_{i-1}$'s computation, and
then restore its previous state.
For simplicity, we first consider the case of a
two-dimensional dependency graph G like the one in figure
1, with m columns and n rows. Without loss of generality,
we assume that data for a particular problem instance
enters along a row (row input), and flows from
column to column. Let $g_{i,j}$ be a computation cell,
where $1 \le i \le n$, and $1 \le j \le m$.
To insert an $\alpha$ symbol in the input stream that travels
from PE to PE, insert a complete row of $\alpha$ cells in the
dependency graph, as shown in figure 2. If this row
is inserted before row i, this splits G into two parts,
the part from row 1 to row i - 1, and the part from
row i to the last row. Keep the edges that went from
row i - 1 to i in the first part. Let $\vec{a}$ be the vector
normal to the added row, so $\vec{a}$ is (0, 1). Note that in
other, more general situations the inserted $\alpha$ symbols
may not form a hyperplane, and therefore there may
not be a well defined $\vec{a}$ vector. We will see an example
of this in a later section.
Let $\alpha^j$, $1 \le j \le m$, be the row of added $\alpha$ cells,
ordered in the direction of increasing time. If column
j is projected to $PE^j$, add the directed edge
$(\alpha^j, g_{i,j})$. Call these edges delay edges and denote
by $c^j$ the computation cell pointed to by the delay edge
leaving $\alpha^j$. Since $\alpha^j$ and $c^j$ project to the same
PE, the difference between their coordinate vectors is a
vector parallel to the projection vector $\vec{p}$. Figure 1
shows the original dependency graph
for the minimum substring-distance problem and figure
2 shows the dependency graph modified in the way just
discussed.
An $\alpha$ stream inserted into the dependency graph in
this way can be regarded as a surface, which we call
an $\alpha$-surface. When the $\alpha$-surface is a hyperplane, we
can call it an $\alpha$-hyperplane. We say that an $\alpha$-surface
is a cutting surface if removing it separates the
dependency graph into disconnected pieces. We say that a
cutting surface is unicutting if all the edges crossing this
surface cross it in the same direction. Cutting or
unicutting hyperplanes are defined analogously.
We next derive constraints on the way in which the
original dependency graph should be modified so that
testing takes place correctly. We prove later that these
conditions are sufficient to ensure that a strategy is
$\alpha$-successful. Observe first that since we need to test every
PE, the normal vector $\vec{a}$ of the added row cannot be
perpendicular to the projection vector $\vec{p}$, and in fact every
PE should be the image under projection of at least one
$\alpha$ cell. Furthermore, because we
do not intend to increase the number of PEs, we also
require that each PE be the image under projection of
at least one computation cell.

Fig. 2. Modified dependency graph for minimum substring-dist.
We know that different PEs should be tested at different
times, so the normal vector $\vec{a}$ cannot be parallel to the
schedule vector $\vec{s}$. (When the working architecture is a
wavefront array, this sequential property of the testing will be
naturally ensured by the fact that the testing is
data-driven.) Since each $\alpha^j$ is basically a delay for some
later operation $c^j$ by the same PE, the delay edge
should be in the same direction as the projection vector $\vec{p}$.
Let $PE^j$ be the PE to which $\alpha^j$ is projected. We
know that whenever a PE receives an $\alpha$, this PE needs
to do the same operation as its neighboring PE will do.
Thus, for each $\alpha^j$ there should exist a computation cell
(not an $\alpha$ cell) that is projected to $PE^j$'s neighbor at
the same time that the $\alpha$ cell is projected to $PE^j$. We
summarize the constraints discussed above in the
following, which we call the $\mathcal{C}$ constraints for hyperplanes.
$\mathcal{C}$ constraints for hyperplanes:
0. the normal vector $\vec{a}$ is not parallel to the schedule vector $\vec{s}$
1. $\exists$ an $\alpha$ cell on the border at which data arrives
2. all delay edges are parallel to the projection vector $\vec{p}$
3. $\forall PE$, $\exists$ an $\alpha$ cell which is projected to $PE$
4. $\forall PE$, $\exists$ a computation cell which is projected to $PE$
5. $\forall \alpha^j$, $\exists$ a non-$\alpha$ computation cell that is
in the same equitemporal hyperplane as $\alpha^j$ and is projected
to a neighboring PE of $PE^j$
6. The $\alpha$-hyperplane is unicutting
As noted above, the zeroth constraint is not needed
at all when the working architecture is a wavefront
array, so we assume without loss of generality that the
working architecture is a synchronous, systolic array,
rather than a wavefront array. Actually, the zeroth
constraint is implied by the fifth constraint, so it is
redundant and can be omitted. If the equitemporal surface
or the $\alpha$-surface is not a hyperplane, we can generalize
the above constraints easily as follows:
$\mathcal{C}$ constraints:
1. $\exists$ an $\alpha$ cell on the border at which data arrives
2. all delay edges are parallel to the projection vector $\vec{p}$
3. $\forall PE$, $\exists$ an $\alpha$ cell which is projected to $PE$
4. $\forall PE$, $\exists$ a computation cell which is projected to $PE$
5. $\forall \alpha^j$, $\exists$ a non-$\alpha$ computation cell that is
in the same equitemporal surface as $\alpha^j$ and is projected
to a neighboring PE of $PE^j$
6. The $\alpha$-surface is unicutting
If the projection, schedule, and modified dependency
graph satisfy the above constraints, we say that
this dependency graph is correctly modified. We leave
for Section 7 a proof that a correctly modified
dependency graph is $\alpha$-successful.
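The fifth constraint can be illustrated for the simple row-insertion case. The sketch below rests on assumptions that are not spelled out in this excerpt: rows of the modified graph are numbered r = 1, 2, ... in input order, the cell in row r and column j runs on the j-th PE at time r + j, and a "neighboring PE" may be either the left or the right neighbor. The function name `untestable_alpha_cells` is hypothetical.

```python
def untestable_alpha_cells(rows, m):
    """List the alpha cells (r, j) for which no computation cell is active
    on a neighboring PE at the same time step.

    rows: sequence of "data" / "alpha", one entry per row of the modified
          dependency graph, in input order.
    m:    number of columns, that is, number of PEs.
    """
    n_rows = len(rows)

    def is_computation(r, j):
        # True only for an in-range cell that lies in a data row.
        return 1 <= r <= n_rows and 1 <= j <= m and rows[r - 1] == "data"

    failures = []
    for r in range(1, n_rows + 1):
        if rows[r - 1] != "alpha":
            continue
        for j in range(1, m + 1):
            # With time = r + j, cells (r + 1, j - 1) and (r - 1, j + 1)
            # are active at the same time as (r, j), on the two neighbors.
            if not (is_computation(r + 1, j - 1) or is_computation(r - 1, j + 1)):
                failures.append((r, j))
    return failures


print(untestable_alpha_cells(["data", "alpha", "data"], m=3))
print(untestable_alpha_cells(["data", "alpha", "alpha", "data"], m=2))
```

Under this model a single row of test cells between two data rows leaves nothing untestable, while two consecutive test rows do. How border PEs are handled when a test row comes first is not specified here, so the sketch simply reports such cells.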
In the second approach to modifying the dependency
graph, majority voting is applied. In this scheme k adjacent
PEs will perform the same operation, the output
will be the majority result, and error detection will be
performed at the same time. We introduce k - 1 special
symbols $\alpha_1, \ldots, \alpha_{k-1}$, which play roles similar to the
$\alpha$ symbol. For simplicity, we assume that k is 3, but
it is straightforward to extend k to be any odd number.
When $PE_i$ receives an $\alpha_1$ symbol, it performs the same
action as before: it simulates a computation in the adjacent
PE, say $PE_{i-1}$. If $PE_{i+1}$ receives an $\alpha_2$ symbol,
it simulates the computation of a PE which is distance-2
from it, say $PE_{i-1}$. We need to guarantee that $PE_{i+1}$
receives $\alpha_2$ and $PE_i$ receives $\alpha_1$ at the same time, and
at a time when they can both simulate the same
computation by $PE_{i-1}$, do the error detection, and output
the majority result.
Therefore, $\alpha_2$ should immediately precede $\alpha_1$ in the
$\alpha$ stream. The constraints analogous to the $\mathcal{C}$ constraints
for performing majority voting are given below, with
all terms previously used now indexed by the same index
i as the corresponding symbol $\alpha_i$. For example, $\vec{a}_i$
is the normal vector for the $\alpha_i$ hyperplane.
$\mathcal{C}_{maj\_k}$ constraints for hyperplanes:
1. all the normal vectors $\vec{a}_i$ are parallel to each other
2. the $\alpha_{k-1}, \ldots, \alpha_1$ symbols are in the same
equitemporal hyperplane, and are projected to k - 1 adjacent PEs
3. the $\alpha_1$-hyperplane satisfies the $\mathcal{C}$ constraints
The corresponding more general constraints for the
case of surfaces are:
$\mathcal{C}_{maj\_k}$ constraints:
1. all the $\alpha_i$-surfaces are parallel to each other
2. the $\alpha_{k-1}, \ldots, \alpha_1$ symbols are in the same
equitemporal surface, and are projected to k - 1 adjacent PEs
3. the $\alpha_1$-surface satisfies the $\mathcal{C}$ constraints
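The voting step itself, in which k PEs have performed the same operation and the results are compared, can be sketched as below. This illustrates only the comparison logic, not the array timing; the function name `vote` and the convention of returning `None` when no strict majority exists are assumptions.

```python
from collections import Counter


def vote(results):
    """Combine the outputs of k PEs that performed the same operation (k odd).

    Returns (majority_value, error_detected). majority_value is None when
    no value is produced by more than half of the PEs.
    """
    value, count = Counter(results).most_common(1)[0]
    error_detected = count < len(results)   # any disagreement counts as an error
    majority = value if count > len(results) // 2 else None
    return majority, error_detected


print(vote([7, 7, 7]))   # all three PEs agree
print(vote([7, 3, 7]))   # one faulty result is outvoted and flagged
```

For k = 3 a single faulty result is both detected and masked, which is the error-correcting behaviour the second approach aims for.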
For example, the modified dependency graph in figure 3
satisfies the above $\mathcal{C}_{maj\_k}$ constraints. Note that
if we want every computation cell in the dependency
graph to be tested by k PEs, we may need to add some

Fig. 3. Modified dependency graph for the minimum substring-
distance problem (approach 2).
