
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

Abstract
Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of programmer convenience. This is particularly true when the target machine is a programmable DSP chip. However, the runtime overhead inherent in most LGDF implementations is not required for most signal processing systems because such systems are mostly synchronous (in the DSP sense). Synchronous data flow (SDF) differs from traditional data flow in that the amount of data produced and consumed by a data flow node is specified a priori for each input and output. This is equivalent to specifying the relative sample rates in a signal processing system. This means that the scheduling of SDF nodes need not be done at runtime, but can be done at compile time (statically), so the runtime overhead evaporates. The sample rates can all be different, which is not true of most current data-driven digital signal processing programming methodologies. Synchronous data flow is closely related to computation graphs, a special case of Petri nets. This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors. A class of static (compile time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single or multiple processors.


IEEE TRANSACTIONS ON COMPUTERS, VOL. C-36, NO. 1, JANUARY 1987
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

EDWARD ASHFORD LEE, MEMBER, IEEE, AND DAVID G. MESSERSCHMITT, FELLOW, IEEE
Index Terms-Block diagram, computation graphs, data flow, digital signal processing, hard real-time systems, multiprocessing, Petri nets, static scheduling, synchronous data flow.
I. INTRODUCTION

To achieve high performance in a processor specialized for signal processing, the need to depart from the simplicity of von Neumann computer architectures is axiomatic. Yet, in the software realm, deviations from von Neumann programming are often viewed with suspicion. For example, in the design of most successful commercial signal processors today [1]-[5], compromises are made to preserve sequential programming. Two notable exceptions are the Bell Labs DSP family [6], [7] and the NEC data flow chip [8], both of which are programmed with concurrency in mind. For the majority, however, preserving von Neumann programming style is given priority.
This practice has a long and distinguished history. Often, a new non-von Neumann architecture has elaborate hardware and software techniques enabling a programmer to write sequential code irrespective of the parallel nature of the underlying hardware. For example, in machines with multiple function units, such as the CDC 6600 and Cray family, so-called "scoreboarding" hardware resolves conflicts to ensure the integrity of sequential code. In deeply pipelined machines such as the IBM 360 Model 91, interlocking mechanisms [9] resolve pipeline conflicts. In the M.I.T. Lincoln Labs signal processor [10], specialized associative memories are used to ensure the integrity of data precedences.

Manuscript received August 15, 1985; revised March 17, 1986. This work was supported in part by the National Science Foundation under Grant ECS-8211071, an IBM Fellowship, and a grant from the Shell Development Corporation. The authors are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720. IEEE Log Number 8611442.
The affinity for von Neumann programming is not at all surprising, stemming from familiarity and a proven track record, but the cost is high in the design of specialized digital signal processors. Comparing two pipelined chips that differ radically only in programming methodology, the TI TMS32010 [2] and the Bell Labs DSP20, a faster version of the DSP1 [6], we find that they achieve exactly the same performance on the most basic benchmark, the FIR (finite impulse response) filter. But the Bell Labs chip outperforms the TI chip on the next most basic benchmark, the IIR (infinite impulse response) filter. Surprisingly, close examination reveals that the arithmetic hardware (multiplier and ALU) of the Bell Labs chip is half as fast as in the TI chip. The performance gain appears to follow from the departure from conventional sequential programming.
However, programming the Bell Labs chip is not easy. The code more closely resembles horizontal microcode than assembly language. Programmers invariably adhere to the quaint custom of programming these processors in assembler-level languages, for maximum use of hardware resources. Satisfactory compilers have failed to appear.

In this paper, we propose programming signal processors using a technique based on large grain data flow (LGDF) languages [11], which should ease the programming task by enhancing the modularity of code and permitting algorithms to be described more naturally. In addition, concurrency is immediately evident in the program description, so parallel hardware resources can be used more effectively. We begin by reviewing the data flow paradigm and its relationship with previous methods applied to signal processing. Synchronous data flow (SDF) is introduced, with its suitability for describing signal processing systems explained. The advantage of SDF over conventional data flow is that more efficient runtime code can be generated because the data flow nodes can be scheduled at compile time, rather than at runtime. A class of algorithms for constructing sequential (single processor) schedules is proven valid, and a simple heuristic for constructing parallel (multiprocessor) schedules is described. Finally, the limitations of the model are considered.
II. THE DATA FLOW PARADIGM

In data flow, a program is divided into pieces (nodes or blocks) which can execute (fire) whenever input data are available [12], [13]. An algorithm is described as a dataflow graph, a directed graph where the nodes represent functions and the arcs represent data paths, as shown in Fig. 1. Signal processing algorithms are usually described in the literature by a combination of mathematical expressions and block diagrams. Block diagrams are large grain dataflow (LGDF) graphs [14]-[16], in which the nodes or blocks may be atomic (from the Greek atomos, or indivisible), such as adders or multipliers, or nonatomic (large grain), such as digital filters, FFT units, modulators, or phase locked loops. The arcs connecting blocks show the signal paths, where a signal is simply an infinite stream of data, and each data token is called a sample.

Fig. 1. A three node data flow graph with one input and two outputs. The nodes represent functions of arbitrary complexity, and the arcs represent paths on which sequences of data (tokens or samples) flow.

The complexity of the functions (the granularity) will determine the amount of parallelism available because, while the blocks can sometimes be executed concurrently, we make no attempt to exploit the concurrency inside a block. The functions within the blocks can be specified using conventional von Neumann programming techniques. If the granularity is at the level of signal processing subsystems (second-order sections, butterfly units, etc.), then the specification of a system will be extremely natural and enough concurrency will be evident to exploit at least small-scale parallel processors. The blocks can themselves represent another data flow graph, so the specification can be hierarchical. This is consistent with the general practice in signal processing where, for example, an adaptive equalizer may be treated as a block in a large system, and may itself be a network of simpler blocks.

LGDF is ideally suited for signal processing, and has been adopted in simulators in the past [17]. Other signal processing systems use a data-driven paradigm to partition a task among cooperating processors [18], and many so-called "block diagram languages" have been developed to permit programmers to describe signal processing systems more naturally. Some examples are Blodi [19], Patsi [20], Blodib [21], Lotus [22], Dare [23], Mitsyn [24], Circus [25], and Topsim [26]. But these simulators are based on the principle of "next state simulation" [20], [27] and thus have difficulty with multiple sample rates, not to mention asynchronous systems. (We use the term "asynchronous" here in the DSP sense to refer to systems with sample rates that are not related by a rational multiplicative factor.) Although true asynchrony is rare in signal processing, multiple sample rates are common, stemming from the frequent use of decimation and interpolation. The technique we propose here handles multiple sample rates easily.

In addition to being natural for DSP, large grain data flow has another significant advantage for signal processing. As long as the integrity of the flow of data is preserved, any implementation of a data flow description will produce the same results. This means that the same software description of a signal processing system can be simulated on a single processor or multiple processors, implemented in specialized hardware, or even, ultimately, compiled into a VLSI chip [28].
III. SYNCHRONOUS DATA FLOW GRAPHS

In this paper we concentrate on synchronous systems. At the risk of being pedantic, we define this precisely. A block is a function that is invoked when there is enough input available to perform a computation (blocks lacking inputs can be invoked at any time). When a block is invoked, it will consume a fixed number of new input samples on each input path. These samples may remain in the system for some time to be used as old samples [17], but they will never again be considered new samples. A block is said to be synchronous if we can specify a priori the number of input samples consumed on each input and the number of output samples produced on each output each time the block is invoked. Thus, a synchronous block is shown in Fig. 2(a) with a number associated with each input or output specifying the number of inputs consumed or the number of outputs produced. These numbers are part of the block definition. For example, a digital filter block would have one input and one output, and the number of input samples consumed or output samples produced would be one. A 2:1 decimator block would also have one input and one output, but would consume two samples for every sample produced. A synchronous data flow (SDF) graph is a network of synchronous blocks, as in Fig. 2(b).
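The definition above can be made concrete in a few lines of code. The sketch below (illustrative only, not from the paper) represents a synchronous block by its fixed produce/consume counts, with a filter and a 2:1 decimator as the two examples just described; the class and arc names are assumptions.

```python
# Minimal sketch of a synchronous block: the number of samples consumed on
# each input and produced on each output is fixed a priori, as part of the
# block definition.
from dataclasses import dataclass, field

@dataclass
class SyncBlock:
    name: str
    consumes: dict = field(default_factory=dict)  # input arc -> samples per firing
    produces: dict = field(default_factory=dict)  # output arc -> samples per firing

# A digital filter consumes one sample and produces one sample per firing.
fir = SyncBlock("filter", consumes={"in": 1}, produces={"out": 1})

# A 2:1 decimator consumes two samples for every one sample it produces.
decimator = SyncBlock("decimate", consumes={"in": 2}, produces={"out": 1})
```

An SDF graph is then just a network of such blocks, with arcs recording which output feeds which input.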
SDF graphs are closely related to computation graphs, introduced in 1966 by Karp and Miller [29] and further explored by Reiter [30]. Computation graphs are slightly more elaborate than SDF graphs, in that each input to a block has two numbers associated with it, a threshold and the number of samples consumed. The threshold specifies the number of samples required to invoke the block, and may be different from the number of samples consumed by the block. It cannot, of course, be smaller than the number of samples consumed. The use of a distinct threshold in the model, however, does not significantly change the results presented in this paper, so for simplicity, we assume these two numbers are the same. Karp and Miller [29] show that computations specified by a computation graph are determinate, meaning that the same computations are performed by any proper execution. This type of theorem, of course, also underlies the validity of data flow descriptions. They also give a test to determine whether a computation terminates, which is potentially useful because in signal processing we are mainly interested in computations that do not terminate.

Fig. 2. (a) A synchronous node. (b) A synchronous data flow graph.

We assume that signal processing
systems repetitively apply an algorithm to an infinite sequence of data. To make it easier to describe such applications, we expand the model slightly to allow nodes with no inputs. These can fire at any time. Other results presented in [29] are only applicable to computations that terminate, and therefore are not useful in our application.

Computation graphs have been shown to be a special case of Petri nets [31]-[33] or vector addition systems [34]. These more general models can be used to describe asynchronous systems. There has also been work with models that are special cases of computation graphs. In 1971, Commoner and Holt [35] described marked directed graphs, and reached some conclusions similar to those presented in this paper. However, marked directed graphs are much more restricted than SDF graphs because they constrain the number of samples produced or consumed on any arc to unity. This excessively restricts the sample rates in the system, reducing the utility of the model. In 1968, Reiter [36] simplified the computation graph model in much the same way (with minor variations), and tackled a scheduling problem. However, his scheduling problem assumes that each node in the graph is a processor, and the only unknown is the firing time for the invocation of each associated function. In this paper we preserve the generality of computation graphs and solve a different scheduling problem, relevant to data flow programming, in which nodes represent functions that must be mapped onto processors.
Implementing the signal processing system described by an SDF graph requires buffering the data samples passed between blocks and scheduling blocks so that they are executed when data are available. This could be done dynamically, in which case a runtime supervisor determines when blocks are ready for execution and schedules them onto processors as they become free. This runtime supervisor may be a software routine or specialized hardware, and is the same as the control mechanisms generally associated with data flow. It is a costly approach, however, in that the supervisory overhead can become severe, particularly if relatively little computation is done each time a block is invoked.

SDF graphs, however, can be scheduled statically (at compile time), regardless of the number of processors, and the overhead associated with dynamic control evaporates. Specifically, a large grain compiler determines the order in which nodes can be executed and constructs sequential code for each
processor. Communication between nodes and between processors is set up by the compiler, so no runtime control is required beyond the traditional sequential control in the processors. The LGDF paradigm gives the programmer a natural interface for easily constructing well structured signal processing programs, with evident concurrency, and the large grain compiler maps this concurrency onto parallel processors. This paper is dedicated mainly to demonstrating the feasibility of such a large grain compiler.
IV. A SYNCHRONOUS LARGE GRAIN COMPILER

We need a methodology for translating from an SDF graph to a set of sequential programs running on a number of processors. Such a compiler has the two following basic tasks.

1) Allocation of shared memory for the passing of data between blocks, if shared memory exists, or setting up communication paths if not.
2) Scheduling blocks onto processors in such a way that data is available for a block when that block is invoked.

The first task is not an unfamiliar one. A single processor solution (which also handles asynchronous systems) is given by the buffer management techniques in Blosim [17]. Simplifications of these techniques that use the synchrony of the system are easy to imagine, as are generalizations to multiple processors, so this paper will concentrate on the second task, that of scheduling blocks onto processors so that data are available when a block is invoked.
Some assumptions are necessary.

1) The SDF graph is nonterminating (cf. [29], [30]), meaning that it can run forever without deadlock. As mentioned earlier, this assumption is natural for signal processing.
2) The SDF graph is connected. If not, the separate graphs can be scheduled separately using subsets of the processors.
Specifically, our ultimate goal is a periodic admissible parallel schedule, designated PAPS. The schedule should be periodic because of the assumption that we are repetitively applying the same program on an infinite stream of data. The desired schedule is admissible, meaning that blocks will be scheduled to run only when data are available, and that a finite amount of memory is required. It is parallel in that more than one processing resource can be used. A special case is a periodic admissible sequential schedule, or PASS, which implements an SDF graph on a single processor. The method for constructing a PASS leads to a simple solution to the problem of constructing a PAPS, so we begin with the sequential schedule.
A. Construction of a PASS

A simple SDF graph is shown in Fig. 3, with each block and each arc labeled with a number. (The connections to the outside world are not considered, and for the remainder of this paper, will not be shown. Thus, a block with one input from the outside will be considered a block with no inputs, which can therefore be scheduled at any time. The limitations of this approximation are discussed in Section V.)

Fig. 3. SDF graph showing the numbering of the nodes and arcs. The input and output arcs are ignored for now.

An SDF graph can
be characterized by a matrix similar to the incidence matrix associated with directed graphs in graph theory. It is constructed by first numbering each node and arc, as in Fig. 3, and assigning a column to each node and a row to each arc. The (i, j)th entry in the matrix is the amount of data produced by node j on arc i each time it is invoked. If node j consumes data from arc i, the number is negative, and if it is not connected to arc i, then the number is zero. This matrix can be called a topology matrix Γ, and need not be square, in general.

If a node has a connection to itself (a self-loop), then only one entry in Γ describes this link. This entry gives the net difference between the amount of data produced on this link and the amount consumed each time the block is invoked. This difference should clearly be zero for a correctly constructed graph, so the Γ entry describing a self-loop should be zero.
We can replace each arc with a FIFO queue (buffer) to pass data from one block to another. The size of the queue will vary at different times in the execution. Define the vector b(n) to contain the queue sizes of all the buffers at time n. In Blosim [17], buffers are also used to store old samples (samples that have been "consumed"), making implementations of delay lines particularly easy. These past samples are not considered part of the buffer size here.
For the sequential schedule, only one block can be invoked at a time, and for the purposes of scheduling it does not matter how long it runs. Thus, the index n can simply be incremented each time a block finishes and a new block is begun. We specify the block invoked at time n with a vector v(n), which has a one in the position corresponding to the number of the block that is invoked at time n and zeros for each block that is not invoked. For the system in Fig. 3, in a sequential schedule, v(n) can take one of three values, depending on which of the three blocks is invoked. Each time a block is invoked, it will consume data from zero or more input arcs and produce data on zero or more output arcs. The change in the size of the buffer queues caused by invoking a node is given by

b(n + 1) = b(n) + Γv(n).    (3)

Fig. 4. An example of an SDF graph with delays on the arcs.

The topology matrix Γ characterizes the effect on the buffers of running a node program.
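The recurrence (3) is easy to exercise in code. The sketch below is illustrative, not from the paper: the topology matrix is for a hypothetical three-node, two-arc graph (node 1 feeds node 2, which feeds node 3, with assumed rates), and a short firing sequence is simulated while checking that no buffer ever goes negative.

```python
import numpy as np

# Topology matrix for a hypothetical 3-node graph (rates are illustrative
# assumptions): entry (i, j) is the data node j produces on arc i per
# invocation, negative when node j consumes from arc i.
# Arc 1: node 1 produces 1, node 2 consumes 1.
# Arc 2: node 2 produces 2, node 3 consumes 1.
gamma = np.array([[1, -1,  0],
                  [0,  2, -1]])

def invoke(b, node, gamma):
    """Apply b(n+1) = b(n) + Gamma v(n), where v(n) selects one node."""
    v = np.zeros(gamma.shape[1], dtype=int)
    v[node] = 1
    b_next = b + gamma @ v
    assert (b_next >= 0).all(), "node fired before its input data arrived"
    return b_next

b = np.zeros(2, dtype=int)      # empty buffers at time 0
for node in [0, 1, 2, 2]:       # fire nodes 1, 2, 3, 3 (0-indexed here)
    b = invoke(b, node, gamma)
print(b)  # buffers return to their initial state: [0 0]
```

Firing each node the right number of times returns the buffers to their initial state, which is exactly the property a periodic schedule needs.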
The simple computation model is powerful. First we note that the computation model handles delays. The term delay is used in the signal processing sense, corresponding to a sample offset between the input and the output. We define a unit delay on an arc from node A to node B to mean that the nth sample consumed by B will be the (n - 1)th sample produced by A. This implies that the first sample the destination block consumes is not produced by the source block at all, but is part of the initial state of the arc buffer. Indeed, a delay of d samples on an arc is implemented in our model simply by setting an initial condition for (3). Specifically, the initial buffer state, b(0), should have a d in the position corresponding to the arc with the delay of d units.

To make this idea firm, consider the example system in Fig. 4. The symbol "D" on an arc means a single sample delay, while "2D" means a two-sample delay. The initial condition for the buffers is thus given by (4). Because of these initial conditions, block 2 can be invoked once and block 3 twice before block 1 is invoked at all. Delays, therefore, affect the way the system starts up.
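The fact that a delay is nothing but an initial condition on b(0) can be shown with a small sketch. The graph and its rates below are assumptions for illustration (they are not the paper's Fig. 4): a chain of three nodes, one sample per firing on each arc, with delays of one and two samples.

```python
import numpy as np

# Illustrative two-arc chain (rates assumed): node 1 -> arc 1 -> node 2 ->
# arc 2 -> node 3, each producing and consuming one sample per firing.
gamma = np.array([[1, -1,  0],
                  [0,  1, -1]])

# A delay of d samples on an arc enters the model only through b(0), which
# holds d in that arc's position. With one unit of delay on arc 1 and two
# units on arc 2, blocks 2 and 3 can fire before block 1 ever runs.
b = np.array([1, 2])

def can_fire(b, node, gamma):
    """A node can fire when every arc it consumes from holds enough samples."""
    needs = -np.minimum(gamma[:, node], 0)   # samples consumed on each arc
    return bool((b >= needs).all())

print(can_fire(b, 1, gamma), can_fire(b, 2, gamma))  # True True
b = b + gamma[:, 2]   # fire node 3 once, using a delayed (initial) sample
print(b)              # [1 1]
```

With empty buffers instead of these initial conditions, only the source node could fire; the delays change nothing but the start-up behavior.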
Given this computation model, we can

1) find necessary and sufficient conditions for the existence of a PASS, and hence a PAPS;
2) find practical algorithms that provably find a PASS if one exists;
3) find practical algorithms that construct a reasonable (but not necessarily optimal) PAPS, if a PASS exists.

We begin by showing that a necessary condition for the existence of a PASS is

rank(Γ) = s - 1    (5)

where s is the number of blocks in the graph. We need a series of lemmas before we can prove this. The word "node" is used below to refer to the blocks because it is traditional in graph theory.
Lemma 1: All topology matrices for a given SDF graph have the same rank.

Proof: Topology matrices are related by renumbering of nodes and arcs, which translates into row and column permutations in the topology matrix. Such operations preserve the rank. Q.E.D.
Lemma 2: A topology matrix for a tree graph has rank s - 1, where s is the number of nodes (a tree is a connected graph without cycles, where we ignore the directions of the arcs).

Proof: Proof is by induction. The lemma is clearly true for a two-node tree. Assume that for an N-node tree, rank(Γ_N) = N - 1. Adding one node and one link connecting that node to our graph will yield an N + 1 node tree. A topology matrix for the new graph can be written

Γ_(N+1) = [ Γ_N  0 ]
          [  ρ^T   ]

where 0 is a column vector full of zeros, and ρ^T is a row vector corresponding to the arc we just added. The last entry in the vector ρ is nonzero because the node we just added corresponds to the last column, and it must be connected to the graph. Hence, the last row is linearly independent from the other rows, so rank(Γ_(N+1)) = rank(Γ_N) + 1. Q.E.D.

Fig. 5. Example of a defective SDF graph with sample rate inconsistencies. Its topology matrix has full rank: rank(Γ) = s = 3.
Lemma 3: For a connected SDF graph with topology matrix Γ,

rank(Γ) >= s - 1

where s is the number of nodes in the graph.

Proof: Consider any spanning tree τ of the connected SDF graph (a spanning tree is a tree that includes every node in the graph). Now define Γ_τ to be the topology matrix for this subgraph. By Lemma 2, rank(Γ_τ) = s - 1. Adding arcs to the subgraph simply adds rows to the topology matrix. Adding rows to a matrix can increase the rank, if the rows are linearly independent of existing rows, but cannot decrease it. Q.E.D.

Corollary: rank(Γ) is s - 1 or s.

Proof: Γ has only s columns, so its rank cannot exceed s. Therefore, by Lemma 3, s and s - 1 are the only possibilities. Q.E.D.
Definition 1: An admissible sequential schedule φ is a nonempty ordered list of nodes such that if the nodes are executed in the sequence given by φ, the amount of data in the buffers ("buffer sizes") will remain nonnegative and bounded. Each node must appear in φ at least once. A periodic admissible sequential schedule (PASS) is a periodic and infinite admissible sequential schedule. It is specified by a list φ that is the list of nodes in one period. For the example in Fig. 6, φ = {1, 2, 3, 3} is a PASS, but φ = {2, 1, 3, 3} is not, because node 2 cannot be run before node 1. The list φ = {1, 2, 3} is not a PASS because the infinite schedule resulting from repetitions of this list will result in an infinite accumulation of data samples on the arcs leading into node 3.
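Definition 1 can be checked mechanically by simulating one period of a candidate schedule under (3). The sketch below is illustrative: the topology matrix encodes a graph whose rates are assumptions chosen to be consistent with the discussion of Fig. 6 (node 1 feeds node 2 one sample per firing; node 2 produces two samples per firing on the arc into node 3, which consumes one).

```python
import numpy as np

# Topology matrix for an assumed graph consistent with the Fig. 6 discussion:
# arc 1: node 1 produces 1, node 2 consumes 1;
# arc 2: node 2 produces 2, node 3 consumes 1.
gamma = np.array([[1, -1,  0],
                  [0,  2, -1]])

def is_pass(schedule, gamma):
    """Check one period of a candidate PASS: buffer sizes must stay
    nonnegative, and must return to the initial state so the period can
    repeat forever with bounded memory."""
    b = np.zeros(gamma.shape[0], dtype=int)
    for node in schedule:                   # nodes are 1-indexed, as in the text
        b = b + gamma[:, node - 1]          # apply b(n+1) = b(n) + Gamma v(n)
        if (b < 0).any():
            return False                    # a node fired before data arrived
    return bool((b == 0).all())             # periodic, hence bounded

print(is_pass([1, 2, 3, 3], gamma))  # True:  a valid PASS
print(is_pass([2, 1, 3, 3], gamma))  # False: node 2 runs before node 1
print(is_pass([1, 2, 3], gamma))     # False: samples accumulate on arc 2
```

The three candidate schedules from the text behave exactly as Definition 1 predicts.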
Theorem 1: For a connected SDF graph with s nodes and topology matrix Γ, rank(Γ) = s - 1 is a necessary condition for a PASS to exist.

Proof: We must prove that the existence of a PASS of period p implies rank(Γ) = s - 1. Observe from (3) that we can write

b(p) = b(0) + Γq,  where  q = v(0) + v(1) + ··· + v(p - 1).

Since the PASS is periodic, we can write

b(np) = b(0) + nΓq.

Since the PASS is admissible, the buffers must remain bounded, by Definition 1. The buffers remain bounded if and only if

Γq = 0

where 0 is a vector full of zeros. For q ≠ 0, this implies that rank(Γ) < s, where s is the dimension of q. From the corollary of Lemma 3, rank(Γ) is either s or s - 1, and so it must be s - 1. Q.E.D.

Fig. 6. An SDF graph with consistent sample rates has a positive integer vector q in the nullspace of the topology matrix Γ.
This theorem tells us that if we have an SDF graph with a topology matrix of rank s, the graph is somehow defective, because no PASS can be found for it. Fig. 5 illustrates such a graph and its topology matrix. Any schedule for this graph will result either in deadlock or unbounded buffer sizes, as the reader can easily verify. The rank of the topology matrix indicates a sample rate inconsistency in the graph.
In Fig. 6, by contrast, a graph without this defect is shown. The topology matrix has rank s - 1 = 2, so we can find a vector q such that Γq = 0. Furthermore, the following theorem shows that we can find a positive integer vector q in the nullspace of Γ. This vector tells us how many times we should invoke each node in one period of a PASS. Referring again to Fig. 6, the reader can easily verify that if we invoke node 1 once, node 2 once, followed by node 3 twice, the buffers will end up once again in their initial state. As before, we prove some lemmas before getting to the theorem.
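The positive integer vector q promised here can be computed directly, without general linear algebra, by propagating rates along arcs in the manner of the path argument in Lemma 4: each arc imposes the balance relation produced × q_source = consumed × q_sink. The sketch below is illustrative; the arc list is an assumption chosen so that q = [1, 1, 2], matching the Fig. 6 discussion (node 1 once, node 2 once, node 3 twice).

```python
from fractions import Fraction
from math import gcd, lcm

# Arcs of an assumed connected SDF graph: (source, sink, produced, consumed).
arcs = [(0, 1, 1, 1),   # node 1 -> node 2: produces 1, node 2 consumes 1
        (1, 2, 2, 1)]   # node 2 -> node 3: produces 2, node 3 consumes 1

def repetitions(n_nodes, arcs):
    """Smallest positive integer q with Gamma q = 0, found by fixing q[0] = 1
    and propagating the balance equation prod * q[src] = cons * q[dst]."""
    q = [None] * n_nodes
    q[0] = Fraction(1)
    changed = True
    while changed:                  # propagate until every node has a rate
        changed = False
        for src, dst, prod, cons in arcs:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * prod / cons
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * cons / prod
                changed = True
    scale = lcm(*(f.denominator for f in q))     # clear denominators
    ints = [int(f * scale) for f in q]
    g = ints[0]
    for v in ints[1:]:                           # reduce to smallest integers
        g = gcd(g, v)
    return [v // g for v in ints]

print(repetitions(3, arcs))  # [1, 1, 2]
```

If the graph had inconsistent sample rates (rank s, as in Fig. 5), the propagated balance equations would contradict each other and no such q would exist.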
Lemma 4: Assume a connected SDF graph with topology matrix Γ. Let q be any vector such that Γq = 0. Denote a connected path through the graph by the set B = {b_1, ..., b_L}, where each entry designates a node, and node b_1 is connected to node b_2, node b_2 to node b_3, and so on up to b_L. Then all q_i, i ∈ B, are zero, or all are strictly positive, or all are strictly negative. Furthermore, if any q_i is rational, then all q_i are rational.

Citations
More filters
Journal ArticleDOI

Synchronous data flow

TL;DR: A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described, and two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
Journal ArticleDOI

Dataflow process networks

TL;DR: Dataflow process networks are shown to be a special case of Kahn process networks, a model of computation where a number of concurrent processes communicate through unidirectional FIFO channels, where writes to the channel are nonblocking, and reads are blocking.
Journal ArticleDOI

Big data

TL;DR: This paper presents a comprehensive discussion on state-of-the-art big data technologies based on batch and stream data processing based on structuralism and functionalism paradigms and strengths and weaknesses of these technologies are analyzed.
Journal ArticleDOI

A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures

TL;DR: The authors present a compile-time scheduling heuristic called dynamic level scheduling, which accounts for interprocessor communication overhead when mapping precedence-constrained, communicating tasks onto heterogeneous processor architectures with limited or possibly irregular interconnection structures.
Journal ArticleDOI

The synchronous approach to reactive and real-time systems

TL;DR: The authors present and discuss the application fields and the principles of synchronous programming, which makes it possible to handle compilation, logical correctness proofs, and verification of real-time programs in a formal way, leading to a clean and precise methodology for design and programming.
References
More filters
Journal ArticleDOI

Petri Nets

TL;DR: The structure of Petr i nets, thei r markings and execution, several examples of Petm net models of computer hardware and software, and research into the analysis of Pet m nets are presented, as are the use of the reachabil i ty tree and the decidability and complexity of some Petr i net problems.
Journal ArticleDOI

Parallel program schemata

TL;DR: This paper introduces a model called the parallel program schema for the representation and study of programs containing parallel sequencing, related to Ianov's program schema, but extends it, both by modelling memory structure in more detail and by admitting parallel computation.
Journal ArticleDOI

Parallel Sequencing and Assembly Line Problems

T. C. Hu
- 01 Dec 1961 - 
TL;DR: This paper deals with a new sequencing problem in which n jobs with ordering restrictions have to be done by men of equal ability, and how to arrange a schedule that requires the minimum number of men to complete all jobs within a prescribed time T.
Book

Fast Algorithms for Digital Signal Processing

TL;DR: Fast algorithms for digital signal processing, Fast algorithms fordigital signal processing , and so on.
Journal ArticleDOI

A comparison of list schedules for parallel processing systems

TL;DR: The problem of scheduling two or more processors to minimize the execution time of a program which consists of a set of partially ordered tasks and a dynamic programming solution for the case in which execution times are random variables is presented.