
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

Abstract
Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of programmer convenience. This is particularly true when the target machine is a programmable DSP chip. However, the runtime overhead inherent in most LGDF implementations is not required for most signal processing systems because such systems are mostly synchronous (in the DSP sense). Synchronous data flow (SDF) differs from traditional data flow in that the amount of data produced and consumed by a data flow node is specified a priori for each input and output. This is equivalent to specifying the relative sample rates in a signal processing system. This means that the scheduling of SDF nodes need not be done at runtime, but can be done at compile time (statically), so the runtime overhead evaporates. The sample rates can all be different, which is not true of most current data-driven digital signal processing programming methodologies. Synchronous data flow is closely related to computation graphs, a special case of Petri nets. This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors. A class of static (compile time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single or multiple processors.


IEEE TRANSACTIONS ON COMPUTERS, VOL. C-36, NO. 1, JANUARY 1987
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

EDWARD ASHFORD LEE, MEMBER, IEEE, AND DAVID G. MESSERSCHMITT, FELLOW, IEEE
Index Terms-Block diagram, computation graphs, data flow, digital signal processing, hard real-time systems, multiprocessing, Petri nets, static scheduling, synchronous data flow.
I. INTRODUCTION

To achieve high performance in a processor specialized for signal processing, the need to depart from the simplicity of von Neumann computer architectures is axiomatic. Yet, in the software realm, deviations from von Neumann programming are often viewed with suspicion. For example, in the design of most successful commercial signal processors today [1]-[5], compromises are made to preserve sequential programming. Two notable exceptions are the Bell Labs DSP family [6], [7] and the NEC data flow chip [8], both of which are programmed with concurrency in mind. For the majority, however, preserving von Neumann programming style is given priority.
This practice has a long and distinguished history. Often, a new non-von Neumann architecture has elaborate hardware and software techniques enabling a programmer to write sequential code irrespective of the parallel nature of the underlying hardware. For example, in machines with multiple function units, such as the CDC 6600 and Cray family, so-called "scoreboarding" hardware resolves conflicts to ensure the integrity of sequential code. In deeply pipelined machines such as the IBM 360 Model 91, interlocking mechanisms [9] resolve pipeline conflicts. In the M.I.T. Lincoln Labs signal processor [10], specialized associative memories are used to ensure the integrity of data precedences.

Manuscript received August 15, 1985; revised March 17, 1986. This work was supported in part by the National Science Foundation under Grant ECS-8211071, an IBM Fellowship, and a grant from the Shell Development Corporation. The authors are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720. IEEE Log Number 8611442.
The affinity for von Neumann programming is not at all surprising, stemming from familiarity and a proven track record, but the cost is high in the design of specialized digital signal processors. Comparing two pipelined chips that differ radically only in programming methodology, the TI TMS32010 [2] and the Bell Labs DSP20, a faster version of the DSP1 [6], we find that they achieve exactly the same performance on the most basic benchmark, the FIR (finite impulse response) filter. But the Bell Labs chip outperforms the TI chip on the next most basic benchmark, the IIR (infinite impulse response) filter. Surprisingly, close examination reveals that the arithmetic hardware (multiplier and ALU) of the Bell Labs chip is half as fast as in the TI chip. The performance gain appears to follow from the departure from conventional sequential programming.
However, programming the Bell Labs chip is not easy. The code more closely resembles horizontal microcode than assembly language. Programmers invariably adhere to the quaint custom of programming these processors in assembler-level languages, for maximum use of hardware resources. Satisfactory compilers have failed to appear.

In this paper, we propose programming signal processors using a technique based on large grain data flow (LGDF) languages [11], which should ease the programming task by enhancing the modularity of code and permitting algorithms to be described more naturally. In addition, concurrency is immediately evident in the program description, so parallel hardware resources can be used more effectively. We begin by reviewing the data flow paradigm and its relationship with previous methods applied to signal processing. Synchronous data flow (SDF) is introduced, with its suitability for describing signal processing systems explained. The advantage of SDF over conventional data flow is that more efficient runtime code can be generated because the data flow nodes can be scheduled at compile time, rather than at runtime. A class of algorithms for constructing sequential (single processor) schedules is proven valid, and a simple heuristic for constructing parallel (multiprocessor) schedules is described. Finally, the limitations of the model are considered.
II. THE DATA FLOW PARADIGM

In data flow, a program is divided into pieces (nodes or blocks) which can execute (fire) whenever input data are available [12], [13]. An algorithm is described as a dataflow graph, a directed graph where the nodes represent functions and the arcs represent data paths, as shown in Fig. 1. Signal processing algorithms are usually described in the literature by a combination of mathematical expressions and block diagrams. Block diagrams are large grain dataflow (LGDF) graphs [14]-[16], in which the nodes or blocks may be atomic (from the Greek atomos, or indivisible), such as adders or multipliers, or nonatomic (large grain), such as digital filters, FFT units, modulators, or phase locked loops. The arcs connecting blocks show the signal paths, where a signal is simply an infinite stream of data, and each data token is called a sample.

Fig. 1. A three node data flow graph with one input and two outputs. The nodes represent functions of arbitrary complexity, and the arcs represent paths on which sequences of data (tokens or samples) flow.

The complexity of the functions (the granularity) will determine the amount of parallelism available because, while the blocks can sometimes be executed concurrently, we make no attempt to exploit the concurrency inside a block. The functions within the blocks can be specified using conventional von Neumann programming techniques. If the granularity is at the level of signal processing subsystems (second-order sections, butterfly units, etc.), then the specification of a system will be extremely natural and enough concurrency will be evident to exploit at least small-scale parallel processors. The blocks can themselves represent another data flow graph, so the specification can be hierarchical. This is consistent with the general practice in signal processing where, for example, an adaptive equalizer may be treated as a block in a large system, and may itself be a network of simpler blocks.

LGDF is ideally suited for signal processing, and has been adopted in simulators in the past [17]. Other signal processing systems use a data-driven paradigm to partition a task among cooperating processors [18], and many so-called "block diagram languages" have been developed to permit programmers to describe signal processing systems more naturally. Some examples are Blodi [19], Patsi [20], Blodib [21], Lotus [22], Dare [23], Mitsyn [24], Circus [25], and Topsim [26]. But these simulators are based on the principle of "next state simulation" [20], [27] and thus have difficulty with multiple sample rates, not to mention asynchronous systems. (We use the term "asynchronous" here in the DSP sense to refer to systems with sample rates that are not related by a rational multiplicative factor.) Although true asynchrony is rare in signal processing, multiple sample rates are common, stemming from the frequent use of decimation and interpolation. The technique we propose here handles multiple sample rates easily.

In addition to being natural for DSP, large grain data flow has another significant advantage for signal processing. As long as the integrity of the flow of data is preserved, any implementation of a data flow description will produce the same results. This means that the same software description of a signal processing system can be simulated on a single processor or multiple processors, implemented in specialized hardware, or even, ultimately, compiled into a VLSI chip [28].
III. SYNCHRONOUS DATA FLOW GRAPHS

In this paper we concentrate on synchronous systems. At the risk of being pedantic, we define this precisely. A block is a function that is invoked when there is enough input available to perform a computation (blocks lacking inputs can be invoked at any time). When a block is invoked, it will consume a fixed number of new input samples on each input path. These samples may remain in the system for some time to be used as old samples [17], but they will never again be considered new samples. A block is said to be synchronous if we can specify a priori the number of input samples consumed on each input and the number of output samples produced on each output each time the block is invoked. Thus, a synchronous block is shown in Fig. 2(a) with a number associated with each input or output specifying the number of inputs consumed or the number of outputs produced. These numbers are part of the block definition. For example, a digital filter block would have one input and one output, and the number of input samples consumed or output samples produced would be one. A 2:1 decimator block would also have one input and one output, but would consume two samples for every sample produced. A synchronous data flow (SDF) graph is a network of synchronous blocks, as in Fig. 2(b).
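The definition above can be made concrete in a few lines of code. The sketch below (illustrative only, not from the paper) represents a synchronous block by its fixed produce/consume counts, with a filter and a 2:1 decimator as the two examples just described; the class and arc names are assumptions.

```python
# Minimal sketch of a synchronous block: the number of samples consumed on
# each input and produced on each output is fixed a priori, as part of the
# block definition.
from dataclasses import dataclass, field

@dataclass
class SyncBlock:
    name: str
    consumes: dict = field(default_factory=dict)  # input arc -> samples per firing
    produces: dict = field(default_factory=dict)  # output arc -> samples per firing

# A digital filter consumes one sample and produces one sample per firing.
fir = SyncBlock("filter", consumes={"in": 1}, produces={"out": 1})

# A 2:1 decimator consumes two samples for every one sample it produces.
decimator = SyncBlock("decimate", consumes={"in": 2}, produces={"out": 1})
```

An SDF graph is then just a network of such blocks, with arcs recording which output feeds which input.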
SDF graphs are closely related to computation graphs, introduced in 1966 by Karp and Miller [29] and further explored by Reiter [30]. Computation graphs are slightly more elaborate than SDF graphs, in that each input to a block has two numbers associated with it, a threshold and the number of samples consumed. The threshold specifies the number of samples required to invoke the block, and may be different from the number of samples consumed by the block. It cannot, of course, be smaller than the number of samples consumed. The use of a distinct threshold in the model, however, does not significantly change the results presented in this paper, so for simplicity, we assume these two numbers are the same. Karp and Miller [29] show that computations specified by a computation graph are determinate, meaning that the same computations are performed by any proper execution. This type of theorem, of course, also underlies the validity of data flow descriptions. They also give a test to determine whether a computation terminates, which is potentially useful because in signal processing we are mainly interested in computations that do not terminate.

Fig. 2. (a) A synchronous node. (b) A synchronous data flow graph.

We assume that signal processing
systems repetitively apply an algorithm to an infinite sequence of data. To make it easier to describe such applications, we expand the model slightly to allow nodes with no inputs. These can fire at any time. Other results presented in [29] are only applicable to computations that terminate, and therefore are not useful in our application.

Computation graphs have been shown to be a special case of Petri nets [31]-[33] or vector addition systems [34]. These more general models can be used to describe asynchronous systems. There has also been work with models that are special cases of computation graphs. In 1971, Commoner and Holt [35] described marked directed graphs, and reached some conclusions similar to those presented in this paper. However, marked directed graphs are much more restricted than SDF graphs because they constrain the number of samples produced or consumed on any arc to unity. This excessively restricts the sample rates in the system, reducing the utility of the model. In 1968, Reiter [36] simplified the computation graph model in much the same way (with minor variations), and tackled a scheduling problem. However, his scheduling problem assumes that each node in the graph is a processor, and the only unknown is the firing time for the invocation of each associated function. In this paper we preserve the generality of computation graphs and solve a different scheduling problem, relevant to data flow programming, in which nodes represent functions that must be mapped onto processors.
Implementing the signal processing system described by an SDF graph requires buffering the data samples passed between blocks and scheduling blocks so that they are executed when data are available. This could be done dynamically, in which case a runtime supervisor determines when blocks are ready for execution and schedules them onto processors as they become free. This runtime supervisor may be a software routine or specialized hardware, and is the same as the control mechanisms generally associated with data flow. It is a costly approach, however, in that the supervisory overhead can become severe, particularly if relatively little computation is done each time a block is invoked.

SDF graphs, however, can be scheduled statically (at compile time), regardless of the number of processors, and the overhead associated with dynamic control evaporates. Specifically, a large grain compiler determines the order in which nodes can be executed and constructs sequential code for each
processor. Communication between nodes and between processors is set up by the compiler, so no runtime control is required beyond the traditional sequential control in the processors. The LGDF paradigm gives the programmer a natural interface for easily constructing well structured signal processing programs, with evident concurrency, and the large grain compiler maps this concurrency onto parallel processors. This paper is dedicated mainly to demonstrating the feasibility of such a large grain compiler.
IV. A SYNCHRONOUS LARGE GRAIN COMPILER

We need a methodology for translating from an SDF graph to a set of sequential programs running on a number of processors. Such a compiler has the two following basic tasks.

1) Allocation of shared memory for the passing of data between blocks, if shared memory exists, or setting up communication paths if not.
2) Scheduling blocks onto processors in such a way that data is available for a block when that block is invoked.

The first task is not an unfamiliar one. A single processor solution (which also handles asynchronous systems) is given by the buffer management techniques in Blosim [17]. Simplifications of these techniques that use the synchrony of the system are easy to imagine, as are generalizations to multiple processors, so this paper will concentrate on the second task, that of scheduling blocks onto processors so that data are available when a block is invoked.
Some assumptions are necessary.

1) The SDF graph is nonterminating (cf. [29], [30]), meaning that it can run forever without deadlock. As mentioned earlier, this assumption is natural for signal processing.
2) The SDF graph is connected. If not, the separate graphs can be scheduled separately using subsets of the processors.
Specifically, our ultimate goal is a periodic admissible parallel schedule, designated PAPS. The schedule should be periodic because of the assumption that we are repetitively applying the same program on an infinite stream of data. The desired schedule is admissible, meaning that blocks will be scheduled to run only when data are available, and that a finite amount of memory is required. It is parallel in that more than one processing resource can be used. A special case is a periodic admissible sequential schedule, or PASS, which implements an SDF graph on a single processor. The method for constructing a PASS leads to a simple solution to the problem of constructing a PAPS, so we begin with the sequential schedule.
A. Construction of a PASS

A simple SDF graph is shown in Fig. 3, with each block and each arc labeled with a number. (The connections to the outside world are not considered, and for the remainder of this paper, will not be shown. Thus, a block with one input from the outside will be considered a block with no inputs, which can therefore be scheduled at any time. The limitations of this approximation are discussed in Section V.)

Fig. 3. SDF graph showing the numbering of the nodes and arcs. The input and output arcs are ignored for now.

An SDF graph can
be characterized by a matrix similar to the incidence matrix associated with directed graphs in graph theory. It is constructed by first numbering each node and arc, as in Fig. 3, and assigning a column to each node and a row to each arc. The (i, j)th entry in the matrix is the amount of data produced by node j on arc i each time it is invoked. If node j consumes data from arc i, the number is negative, and if it is not connected to arc i, then the number is zero. This matrix can be called a topology matrix Γ, and need not be square, in general.

If a node has a connection to itself (a self-loop), then only one entry in Γ describes this link. This entry gives the net difference between the amount of data produced on this link and the amount consumed each time the block is invoked. This difference should clearly be zero for a correctly constructed graph, so the Γ entry describing a self-loop should be zero.
We can replace each arc with a FIFO queue (buffer) to pass data from one block to another. The size of the queue will vary at different times in the execution. Define the vector b(n) to contain the queue sizes of all the buffers at time n. In Blosim [17], buffers are also used to store old samples (samples that have been "consumed"), making implementations of delay lines particularly easy. These past samples are not considered part of the buffer size here.
For the sequential schedule, only one block can be invoked at a time, and for the purposes of scheduling it does not matter how long it runs. Thus, the index n can simply be incremented each time a block finishes and a new block is begun. We specify the block invoked at time n with a vector v(n), which has a one in the position corresponding to the number of the block that is invoked at time n and zeros for each block that is not invoked. For the system in Fig. 3, in a sequential schedule, v(n) can take one of three values, depending on which of the three blocks is invoked. Each time a block is invoked, it will consume data from zero or more input arcs and produce data on zero or more output arcs. The change in the size of the buffer queues caused by invoking a node is given by

b(n + 1) = b(n) + Γv(n).    (3)

Fig. 4. An example of an SDF graph with delays on the arcs.

The topology matrix Γ characterizes the effect on the buffers of running a node program.
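The recurrence (3) is easy to exercise in code. The sketch below is illustrative, not from the paper: the topology matrix is for a hypothetical three-node, two-arc graph (node 1 feeds node 2, which feeds node 3, with assumed rates), and a short firing sequence is simulated while checking that no buffer ever goes negative.

```python
import numpy as np

# Topology matrix for a hypothetical 3-node graph (rates are illustrative
# assumptions): entry (i, j) is the data node j produces on arc i per
# invocation, negative when node j consumes from arc i.
# Arc 1: node 1 produces 1, node 2 consumes 1.
# Arc 2: node 2 produces 2, node 3 consumes 1.
gamma = np.array([[1, -1,  0],
                  [0,  2, -1]])

def invoke(b, node, gamma):
    """Apply b(n+1) = b(n) + Gamma v(n), where v(n) selects one node."""
    v = np.zeros(gamma.shape[1], dtype=int)
    v[node] = 1
    b_next = b + gamma @ v
    assert (b_next >= 0).all(), "node fired before its input data arrived"
    return b_next

b = np.zeros(2, dtype=int)      # empty buffers at time 0
for node in [0, 1, 2, 2]:       # fire nodes 1, 2, 3, 3 (0-indexed here)
    b = invoke(b, node, gamma)
print(b)  # buffers return to their initial state: [0 0]
```

Firing each node the right number of times returns the buffers to their initial state, which is exactly the property a periodic schedule needs.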
The simple computation model is powerful. First we note that the computation model handles delays. The term delay is used in the signal processing sense, corresponding to a sample offset between the input and the output. We define a unit delay on an arc from node A to node B to mean that the nth sample consumed by B will be the (n - 1)th sample produced by A. This implies that the first sample the destination block consumes is not produced by the source block at all, but is part of the initial state of the arc buffer. Indeed, a delay of d samples on an arc is implemented in our model simply by setting an initial condition for (3). Specifically, the initial buffer state, b(0), should have a d in the position corresponding to the arc with the delay of d units.

To make this idea firm, consider the example system in Fig. 4. The symbol "D" on an arc means a single sample delay, while "2D" means a two-sample delay. The initial condition for the buffers is thus given by (4). Because of these initial conditions, block 2 can be invoked once and block 3 twice before block 1 is invoked at all. Delays, therefore, affect the way the system starts up.
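The fact that a delay is nothing but an initial condition on b(0) can be shown with a small sketch. The graph and its rates below are assumptions for illustration (they are not the paper's Fig. 4): a chain of three nodes, one sample per firing on each arc, with delays of one and two samples.

```python
import numpy as np

# Illustrative two-arc chain (rates assumed): node 1 -> arc 1 -> node 2 ->
# arc 2 -> node 3, each producing and consuming one sample per firing.
gamma = np.array([[1, -1,  0],
                  [0,  1, -1]])

# A delay of d samples on an arc enters the model only through b(0), which
# holds d in that arc's position. With one unit of delay on arc 1 and two
# units on arc 2, blocks 2 and 3 can fire before block 1 ever runs.
b = np.array([1, 2])

def can_fire(b, node, gamma):
    """A node can fire when every arc it consumes from holds enough samples."""
    needs = -np.minimum(gamma[:, node], 0)   # samples consumed on each arc
    return bool((b >= needs).all())

print(can_fire(b, 1, gamma), can_fire(b, 2, gamma))  # True True
b = b + gamma[:, 2]   # fire node 3 once, using a delayed (initial) sample
print(b)              # [1 1]
```

With empty buffers instead of these initial conditions, only the source node could fire; the delays change nothing but the start-up behavior.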
Given this computation model, we can

1) find necessary and sufficient conditions for the existence of a PASS, and hence a PAPS;
2) find practical algorithms that provably find a PASS if one exists;
3) find practical algorithms that construct a reasonable (but not necessarily optimal) PAPS, if a PASS exists.

We begin by showing that a necessary condition for the existence of a PASS is

rank(Γ) = s - 1    (5)

where s is the number of blocks in the graph. We need a series of lemmas before we can prove this. The word "node" is used below to refer to the blocks because it is traditional in graph theory.
Lemma 1: All topology matrices for a given SDF graph have the same rank.

Proof: Topology matrices are related by renumbering of nodes and arcs, which translates into row and column permutations in the topology matrix. Such operations preserve the rank. Q.E.D.
Lemma 2: A topology matrix for a tree graph has rank s - 1, where s is the number of nodes (a tree is a connected graph without cycles, where we ignore the directions of the arcs).

Proof: Proof is by induction. The lemma is clearly true for a two-node tree. Assume that for an N-node tree, rank(Γ_N) = N - 1. Adding one node and one link connecting that node to our graph will yield an N + 1 node tree. A topology matrix for the new graph can be written

Γ_(N+1) = [ Γ_N  0 ]
          [  ρ^T   ]

where 0 is a column vector full of zeros, and ρ^T is a row vector corresponding to the arc we just added. The last entry in the vector ρ is nonzero because the node we just added corresponds to the last column, and it must be connected to the graph. Hence, the last row is linearly independent from the other rows, so rank(Γ_(N+1)) = rank(Γ_N) + 1. Q.E.D.

Fig. 5. Example of a defective SDF graph with sample rate inconsistencies. Its topology matrix has full rank: rank(Γ) = s = 3.
Lemma 3: For a connected SDF graph with topology matrix Γ,

rank(Γ) >= s - 1

where s is the number of nodes in the graph.

Proof: Consider any spanning tree τ of the connected SDF graph (a spanning tree is a tree that includes every node in the graph). Now define Γ_τ to be the topology matrix for this subgraph. By Lemma 2, rank(Γ_τ) = s - 1. Adding arcs to the subgraph simply adds rows to the topology matrix. Adding rows to a matrix can increase the rank, if the rows are linearly independent of existing rows, but cannot decrease it. Q.E.D.

Corollary: rank(Γ) is s - 1 or s.

Proof: Γ has only s columns, so its rank cannot exceed s. Therefore, by Lemma 3, s and s - 1 are the only possibilities. Q.E.D.
Definition 1: An admissible sequential schedule φ is a nonempty ordered list of nodes such that if the nodes are executed in the sequence given by φ, the amount of data in the buffers ("buffer sizes") will remain nonnegative and bounded. Each node must appear in φ at least once. A periodic admissible sequential schedule (PASS) is a periodic and infinite admissible sequential schedule. It is specified by a list φ that is the list of nodes in one period. For the example in Fig. 6, φ = {1, 2, 3, 3} is a PASS, but φ = {2, 1, 3, 3} is not, because node 2 cannot be run before node 1. The list φ = {1, 2, 3} is not a PASS because the infinite schedule resulting from repetitions of this list will result in an infinite accumulation of data samples on the arcs leading into node 3.
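Definition 1 can be checked mechanically by simulating one period of a candidate schedule under (3). The sketch below is illustrative: the topology matrix encodes a graph whose rates are assumptions chosen to be consistent with the discussion of Fig. 6 (node 1 feeds node 2 one sample per firing; node 2 produces two samples per firing on the arc into node 3, which consumes one).

```python
import numpy as np

# Topology matrix for an assumed graph consistent with the Fig. 6 discussion:
# arc 1: node 1 produces 1, node 2 consumes 1;
# arc 2: node 2 produces 2, node 3 consumes 1.
gamma = np.array([[1, -1,  0],
                  [0,  2, -1]])

def is_pass(schedule, gamma):
    """Check one period of a candidate PASS: buffer sizes must stay
    nonnegative, and must return to the initial state so the period can
    repeat forever with bounded memory."""
    b = np.zeros(gamma.shape[0], dtype=int)
    for node in schedule:                   # nodes are 1-indexed, as in the text
        b = b + gamma[:, node - 1]          # apply b(n+1) = b(n) + Gamma v(n)
        if (b < 0).any():
            return False                    # a node fired before data arrived
    return bool((b == 0).all())             # periodic, hence bounded

print(is_pass([1, 2, 3, 3], gamma))  # True:  a valid PASS
print(is_pass([2, 1, 3, 3], gamma))  # False: node 2 runs before node 1
print(is_pass([1, 2, 3], gamma))     # False: samples accumulate on arc 2
```

The three candidate schedules from the text behave exactly as Definition 1 predicts.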
Theorem 1: For a connected SDF graph with s nodes and topology matrix Γ, rank(Γ) = s - 1 is a necessary condition for a PASS to exist.

Proof: We must prove that the existence of a PASS of period p implies rank(Γ) = s - 1. Observe from (3) that we can write

b(p) = b(0) + Γq,  where  q = v(0) + v(1) + ··· + v(p - 1).

Since the PASS is periodic, we can write

b(np) = b(0) + nΓq.

Since the PASS is admissible, the buffers must remain bounded, by Definition 1. The buffers remain bounded if and only if

Γq = 0

where 0 is a vector full of zeros. For q ≠ 0, this implies that rank(Γ) < s, where s is the dimension of q. From the corollary of Lemma 3, rank(Γ) is either s or s - 1, and so it must be s - 1. Q.E.D.

Fig. 6. An SDF graph with consistent sample rates has a positive integer vector q in the nullspace of the topology matrix Γ.
This theorem tells us that if we have an SDF graph with a topology matrix of rank s, the graph is somehow defective, because no PASS can be found for it. Fig. 5 illustrates such a graph and its topology matrix. Any schedule for this graph will result either in deadlock or unbounded buffer sizes, as the reader can easily verify. The rank of the topology matrix indicates a sample rate inconsistency in the graph.
In Fig. 6, by contrast, a graph without this defect is shown. The topology matrix has rank s - 1 = 2, so we can find a vector q such that Γq = 0. Furthermore, the following theorem shows that we can find a positive integer vector q in the nullspace of Γ. This vector tells us how many times we should invoke each node in one period of a PASS. Referring again to Fig. 6, the reader can easily verify that if we invoke node 1 once, node 2 once, followed by node 3 twice, the buffers will end up once again in their initial state. As before, we prove some lemmas before getting to the theorem.
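The positive integer vector q promised here can be computed directly, without general linear algebra, by propagating rates along arcs in the manner of the path argument in Lemma 4: each arc imposes the balance relation produced × q_source = consumed × q_sink. The sketch below is illustrative; the arc list is an assumption chosen so that q = [1, 1, 2], matching the Fig. 6 discussion (node 1 once, node 2 once, node 3 twice).

```python
from fractions import Fraction
from math import gcd, lcm

# Arcs of an assumed connected SDF graph: (source, sink, produced, consumed).
arcs = [(0, 1, 1, 1),   # node 1 -> node 2: produces 1, node 2 consumes 1
        (1, 2, 2, 1)]   # node 2 -> node 3: produces 2, node 3 consumes 1

def repetitions(n_nodes, arcs):
    """Smallest positive integer q with Gamma q = 0, found by fixing q[0] = 1
    and propagating the balance equation prod * q[src] = cons * q[dst]."""
    q = [None] * n_nodes
    q[0] = Fraction(1)
    changed = True
    while changed:                  # propagate until every node has a rate
        changed = False
        for src, dst, prod, cons in arcs:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * prod / cons
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * cons / prod
                changed = True
    scale = lcm(*(f.denominator for f in q))     # clear denominators
    ints = [int(f * scale) for f in q]
    g = ints[0]
    for v in ints[1:]:                           # reduce to smallest integers
        g = gcd(g, v)
    return [v // g for v in ints]

print(repetitions(3, arcs))  # [1, 1, 2]
```

If the graph had inconsistent sample rates (rank s, as in Fig. 5), the propagated balance equations would contradict each other and no such q would exist.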
Lemma 4: Assume a connected SDF graph with topology matrix Γ. Let q be any vector such that Γq = 0. Denote a connected path through the graph by the set B = {b_1, ..., b_L}, where each entry designates a node, and node b_1 is connected to node b_2, node b_2 to node b_3, and so on up to b_L. Then all q_i, i ∈ B, are zero, or all are strictly positive, or all are strictly negative. Furthermore, if any q_i is rational, then all q_i are rational.

Citations
More filters
Journal ArticleDOI

Synchronous data flow

TL;DR: A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described, and two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
Journal ArticleDOI

Dataflow process networks

TL;DR: Dataflow process networks are shown to be a special case of Kahn process networks, a model of computation where a number of concurrent processes communicate through unidirectional FIFO channels, where writes to the channel are nonblocking, and reads are blocking.
Journal ArticleDOI

Big data

TL;DR: This paper presents a comprehensive discussion on state-of-the-art big data technologies based on batch and stream data processing based on structuralism and functionalism paradigms and strengths and weaknesses of these technologies are analyzed.
Journal ArticleDOI

A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures

TL;DR: The authors present a compile-time scheduling heuristic called dynamic level scheduling, which accounts for interprocessor communication overhead when mapping precedence-constrained, communicating tasks onto heterogeneous processor architectures with limited or possibly irregular interconnection structures.
Journal ArticleDOI

The synchronous approach to reactive and real-time systems

TL;DR: The authors present and discuss the application fields and the principles of synchronous programming, which makes it possible to handle compilation, logical correctness proofs, and verification of real-time programs in a formal way, leading to a clean and precise methodology for design and programming.
References
More filters
Journal ArticleDOI

Petri Nets

TL;DR: The structure of Petr i nets, thei r markings and execution, several examples of Petm net models of computer hardware and software, and research into the analysis of Pet m nets are presented, as are the use of the reachabil i ty tree and the decidability and complexity of some Petr i net problems.
Journal ArticleDOI

Parallel program schemata

TL;DR: This paper introduces a model called the parallel program schema for the representation and study of programs containing parallel sequencing, related to Ianov's program schema, but extends it, both by modelling memory structure in more detail and by admitting parallel computation.
Journal ArticleDOI

Parallel Sequencing and Assembly Line Problems

T. C. Hu
- 01 Dec 1961 - 
TL;DR: This paper deals with a new sequencing problem in which n jobs with ordering restrictions have to be done by men of equal ability, and how to arrange a schedule that requires the minimum number of men to complete all jobs within a prescribed time T.
Book

Fast Algorithms for Digital Signal Processing

TL;DR: Fast algorithms for digital signal processing, Fast algorithms fordigital signal processing , and so on.
Journal ArticleDOI

A comparison of list schedules for parallel processing systems

TL;DR: The problem of scheduling two or more processors to minimize the execution time of a program which consists of a set of partially ordered tasks and a dynamic programming solution for the case in which execution times are random variables is presented.