
Micropreemption synthesis: an enabling mechanism for multitask VLSI systems

TLDR
Techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented, including algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints.
Abstract
Task preemption is a critical enabling mechanism in multitask very large scale integration (VLSI) systems. On preemption, data in the register files must be preserved for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented. Specifically, algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and a controller-based scheme to preclude the preemption-related performance degradation by: 1) partitioning the states of a task into critical sections; 2) executing the critical sections atomically; and 3) preserving atomicity by rolling forward to the end of the critical sections on preemption have been developed. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples. Validation of all the results is complete in the sense that functional simulation is conducted down to the complete layout implementation.



0-89791-993-9/97 $10.00 © 1997 IEEE
Micro-Preemption Synthesis: An Enabling Mechanism for Multi-Task VLSI Systems

Kyosun Kim, Ramesh Karri                    Miodrag Potkonjak
Department of ECE                           Department of Computer Science
University of Massachusetts                 University of California
Amherst, MA 01002                           Los Angeles, CA 90095
{karri,kkim}@ecs.umass.edu                  miodrag@cs.ucla.edu
Abstract - Task preemption is a critical enabling mechanism in multi-task VLSI systems. On preemption, data in the register files must be preserved in order for the task to be resumed. This entails extra memory to save the context and additional clock cycles to restore the context. In this paper, we present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis. Specifically, we have developed: (i) algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, (ii) techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and (iii) a controller based scheme to preclude preemption related performance degradation.
1 Introduction

1.1 Motivation

Task preemption is a critical enabling mechanism in a variety of application scenarios. Hard real-time computer systems have stringent timing requirements. In such systems, the deadlines for critical tasks are enforced by preempting less critical tasks. In soft real-time systems, where infrequent deadline violations are tolerated, less important tasks are preempted to execute more important ones so as to meet some quality-of-service requirements. For example, in multimedia systems, video, voice, and data streams are scheduled and occasionally interrupted and resumed according to priority strategies to enforce soft end-to-end deadlines.
Along a dierent dimension, multi-task VLSI systems
are becoming commonplace. For example, Motorola
oers numerous DSP ASPPs [3]. An ASPP can be dy-
namically congured to one of the implemented tasks.
Although the reconguration time of an ASPP maybe
very low (b ecause reconguration entails moving from
the nal state of the current task to the start state of
another task), it may not be acceptable for a critical
task in a real-time system to wait until the current
task is completed.
This researchwas supp orted by an NSF CAREER grant
MIP-9702676
On receiving a preemption request, the state of the active task must be saved, and the context of the new task must be loaded and then executed. On completion of task execution, the state of the preempted task must be restored and the interrupted task resumed to completion. Important factors that should be considered while implementing task preemption include:

Preemption latency: It is defined as the maximum time it takes from the instant a preemption request is received to the instant the task state is saved.

Context switch cost: Hardware overhead incurred by the installation of a preemption handling scheme must be considered. A saved state should contain only enough information (and no more) so that the preempted task can be resumed at the precise point where it was interrupted. The task state should consist of the contents of the general purpose registers, the condition registers, and the relevant portion of background memory.

Performance degradation: There are two main sources of performance degradation: (i) on a preemption request, some task states that have already been executed may be aborted. Retracing these aborted states adds to the finish time of the aborted task. (ii) Any scheme that saves the context of a preempted task in background memory may stall execution units.
In this paper we will present a systematic methodology for incorporating preemption constraints in multi-task VLSI systems. Specifically, we will show how context switch cost and performance degradation can be minimized while satisfying task specific throughput and preemption latency constraints.
1.2 A Motivating Example

Consider a system implementing two tasks A and B. Task A takes four clock cycles and task B takes six clock cycles for one iteration. Let A1, A2, A3, and A4 denote the parts of task A that are executed in the first, second, third and fourth clock cycles respectively. Similarly, let B1, B2, B3, B4, B5 and B6 denote the parts of task B that are executed in the first, second, third, fourth, fifth and sixth clock cycles respectively. The following assumptions were made while designing the micro-preemption controller.
1. A non-overlapping two-phase clock (clk1, clk2).
2. Activation of a new task (i.e. changing the selected task signal) and transition of a task from one state to the next are synchronized with the falling edge of clk2.
3. Setting and resetting of preempt mask is synchronized with the falling edge of clk1.
4. A task preemption request is serviced when preempt mask is low, selected task is high, and clk2 is falling.
A simulation snapshot of micro-preemption in the two-task VLSI system is shown in figure 1.

Figure 1: Simulation snapshot showing preemption request and servicing mechanism in a two-task VLSI system. (The figure plots the CLK1, CLK2, SELECTED TASK, PREEMPT MASK, ACTIVE TASK, and STATE traces over time.)

To minimize the controller and context switch overhead, we mandate that task A can be preempted in states A1 and A3 alone. These are called task preemption points. Initially the system is in state A4. When the system goes to state A1, execution of a new task is requested by setting the selected task to B and the data inputs to appropriate data values (these have not been shown here for simplicity). However, it is not a valid preemption request (since preempt mask is high). Even when a valid preemption request arrives in state A2 (i.e. all conditions in item 4 are satisfied), task A is not aborted immediately. Rather, the computation rolls forward to the end of state A3 (the next preemption point) before the preemption request is serviced. Notice that it has taken two clock cycles from the time a valid preemption request arrived (beginning of state A2) to the time the new task (task B) became active (end of state A3). From the point of view of task B, this is its preemption latency. From the point of view of the multi-task VLSI system as a whole (and task A in particular), rolling forward of the computation has eliminated the performance degradation due to an immediate abort. In a nutshell, preemption points A1 and A3 have partitioned the execution of task A into two critical sections {A2, A3} and {A4, A1}. Similarly, task preemption points B2, B4 and B6 for task B partition it into three critical sections {B1, B2}, {B3, B4}, and {B5, B6}.
(Footnote 1: The controller has been implemented using a 1µ SCMOS standard cell library and simulated using IRSIM.)
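The roll-forward behavior walked through above can be sketched in a few lines. The following Python model is an illustrative reconstruction, not the authors' controller logic; all function and variable names are invented. A pending request is serviced only when the task reaches the end of a state designated as a preemption point:

```python
# Hypothetical sketch of roll-forward preemption servicing. State names
# follow the two-task example: task A is preemptable in states A1 and A3.

def run_with_roll_forward(states, preemption_points, start, request_at, n_cycles):
    """Simulate up to n_cycles of a task; return (trace, cycle request was served).

    states: cyclic state sequence, e.g. ["A1", "A2", "A3", "A4"]
    preemption_points: states at whose *end* a pending request may be served
    request_at: cycle index at which a valid preemption request arrives
    """
    idx = states.index(start)
    trace, pending, served_at = [], False, None
    for cycle in range(n_cycles):
        state = states[idx]
        trace.append(state)
        if cycle == request_at:
            pending = True                    # request arrives during this cycle
        if pending and state in preemption_points:
            served_at = cycle                 # end of critical section reached
            break                             # context switch happens here
        idx = (idx + 1) % len(states)         # otherwise roll forward
    return trace, served_at

# Request arrives at the beginning of A2; the computation rolls forward to
# the end of A3 (the next preemption point), a latency of two clock cycles.
trace, served = run_with_roll_forward(
    ["A1", "A2", "A3", "A4"], {"A1", "A3"}, start="A2", request_at=0, n_cycles=8)
```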
Figure 2: Controller for a multi-task VLSI system supporting micro-preemption. (The figure shows control logic with task select and interrupt disable inputs, a state register file driven by write enable and read select, a task ID queue holding ID(T) and ID(T-1), State(T)/State(T+1) paths, and a pipeline register driving the control signals to the datapaths, clocked by CLK1 and CLK2.)

The controller is shown in figure 2. It is a collection of finite state machines (one for each implemented task) and has a state register file that holds the identification of the currently active task. At every clock cycle, a different task can be initiated or a preempted task resumed by the task select signal. The controller signals are pipelined so that the controller delay does not affect the critical path.
The rest of the paper is organized in the following way. We first briefly survey the related work along several dimensions. Next, we will discuss the computational and hardware models. In sections 3 and 4, we introduce our approach, formulate the micro-preemption synthesis problems, and describe the proposed algorithms for micro-preemption synthesis. Experimental results are presented in section 5. In section 6, we conclude by summarizing the results.
1.3 Related Research

Reconfigurable computing platforms are attracting a lot of attention recently. A fast growing billion dollar Field Programmable Gate Array (FPGA) industry is supported by a number of commercial and research tools [12]. A number of special purpose reconfigurable computers have been built. Early work in this direction includes the systems realized at University of Texas, Austin (TRAC) [5]. The Splash system enables reconfigurability to more than 100 different configurations which are well suited for several computational tasks in molecular biology [7]. Several generations of data path reconfigurable video-processors with accompanying compilation support have been developed at University of California, Berkeley [13]. Recently, Application Specific Programmable Processors (ASPPs) [16] have been introduced as an excellent candidate for multifunctional datapaths with frequent context switching. Though their functionalities must be determined in the design phase, a single ASPP implementing multiple functions obtains significant area savings when compared with the dedicated ASIC implementations of the functions.
Research in implementing interrupts is outlined next. The IBM 360/91 supports precise and imprecise interrupt handling [1]. Hwu and Patt [2] proposed a checkpointing approach to handling interrupts. The checkpoints (which incur some penalty in processor performance) are used to divide the sequential instruction stream into smaller units to reduce the cost of resumption. Sohi [9] integrated the functions of reservation stations and reorder buffers into the register update unit to realize precise interrupts. In addition, Smith and Pleszkun [6] presented architectural solutions such as saving the intermediate state of vector instructions and saving a sequence of instructions that must be executed before saving the program counter. Mosberger et al. [15] presented a software-only solution to the synchronization problem in uniprocessors. Their idea was to execute atomic sequences without any hardware protection, and to roll the sequence forward to the end, thereby preserving atomicity.
Behavioral synthesis has been an active research area for more than two decades [8], and numerous outstanding systems have been built targeting both data path oriented and control oriented applications [8]. Synthesis systems that optimize power, testability and fault-tolerance [14] have been developed.
2 Computational/Hardware Models

Our computational model for a single task is homogeneous synchronous data flow [4]. Within this model, a task is represented as a hierarchical Control Data Flow Graph G(N, E, T) (or CDFG), with nodes N representing the flow graph operations, and the edges E and T respectively the data and timing dependences between the operations.
In modern designs a variety of register file models have been used [10]. From among them we have selected the dedicated register file hardware model. This model clusters all registers in register files and each file is then connected only to the inputs of the corresponding execution units. An important benefit of the chosen hardware model is that it reduces the interconnect at the expense of additional registers.
3 Issues and Our Approach

On preemption, the data in the register files must be preserved somewhere in order for a task to resume. In general purpose microprocessors, these values are transferred to background memory before an interrupt is serviced. This technique is not acceptable in multi-task VLSI systems due to the attendant performance penalty. Alternately, a register windowing technique is used in the Sparc architecture [11]. In this scheme, data is saved in registers within the processor even when a new computation environment is required. However, it entails non-negligible area overheads for duplicated registers. In contrast, we propose an intuitively simple technique by classifying the edges in the CDFG, and the registers that hold them, into two groups:
Dedicated registers (R_d^t) store the values of edges of a task that straddle preemption points. These edges that straddle a preemption point are called the red edges, and represent intermediate values essential to resume the task if preempted. (Coefficient registers (R_c^t) that hold the constants used by the task are not targeted during the context switch optimization. This is because, generally, these constants differ from one task to another and cannot share registers.)

Shared registers (R_s^t) are shared by the values associated with the remaining edges (of all tasks) in the system. These edges that do not straddle a preemption point are called the green edges. The dedicated registers of a task can also be used to store the values associated with the green edges in the task. However, the shared registers cannot be assigned to red edges.
Since dedicated registers cannot be shared between tasks, the associated context switch overhead is the sum of the dedicated registers over all tasks. On the other hand, the context switch overhead due to shared registers is the maximum value across all tasks. Overall, the context switch cost of a multi-task VLSI system with task set T is:

    |R| = Σ_{t ∈ T} |R_d^t| + max_{t ∈ T} |R_s^t|        (1)
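Equation (1) is straightforward to evaluate once the per-task register counts are known. A small sketch (the helper name is invented; the (dedicated, shared) counts are the winning configuration from the refinement example in section 3.2):

```python
# Equation (1): dedicated registers accumulate across tasks, while shared
# registers form a single pool sized by the most demanding task.

def context_switch_cost(tasks):
    """tasks maps task name -> (dedicated, shared) register counts."""
    dedicated = sum(d for d, _ in tasks.values())   # Σ |R_d^t|
    shared = max(s for _, s in tasks.values())      # max |R_s^t|
    return dedicated + shared

cost = context_switch_cost({"t1": (2, 5), "t2": (3, 5), "t3": (1, 5)})
# (2 + 3 + 1) + max(5, 5, 5) = 11
```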
Performance degradation resulting from aborting a task is eliminated by (i) partitioning the task states into critical sections, (ii) executing critical sections atomically, and (iii) preserving atomicity of a critical section by rolling forward to the end of a critical section on preemption. This is analogous to the classical approach to precise interrupts.

Next, we present algorithms for (i) preemption point insertion and (ii) preemption context synthesis that minimizes the context switch overhead during multi-task VLSI system synthesis. The optimization problem can be defined as follows:
Given an underlying hardware model and N scheduled tasks, each with its own time bound (τ) and maximum preemption latency (Λ), insert preemption points, and bind edges to registers, so that the context switch overhead is minimized.
Initially, all tasks are scheduled in an integrated fashion by considering their word length, precision, hardware and topological similarities. Using the number of edges straddling a clock cycle as an estimate of the context switch overhead, preemption points are inserted. The resulting preemption point set for each task will have more than the minimum number of preemption points. In the next step these preemption point sets are refined. Finally, the preemption context is synthesized by binding edges to registers subject to preemption constraints. The output is then passed through hardware mapping and layout generation tools to synthesize a multi-task VLSI system.
3.1 Preemption Point Insertion

Towards investigating preemption point insertion, consider a task with five edges (e1, ..., e5), an application latency of eight clock cycles and an edge-to-register binding shown in figure 3. The register file has one input port and one output port which are accessed at the first half cycle and the last half cycle, respectively.

Figure 3: Preemption point insertion and register binding. (The figure plots edges e1-e5 bound to registers r1-r3 over clock cycles 0-7, with preemption points marked 'P'. The dotted line shows a cyclic dependency.)

The register overhead of a preemption point can be estimated as the number of edges straddling it. For instance, assuming a preemption latency of three yields clock cycles 1, 4, and 7 as preemption points as shown in figure 3. The preemption points are marked by 'P'. On preemption point insertion, e1 and e5 become red edges and are assigned to two dedicated registers r1 and r2. e3 becomes a green edge and is assigned to shared register r3. e2 and e4 become green edges but are assigned to dedicated register r2. Initially, preemption points are inserted one task at a time using a polynomial time heuristic algorithm InsertPreemptionPoints(). It incrementally inserts preemption points (into each task) such that the number of edges straddling the preemption points is minimized, until the preemption latency constraint is satisfied.
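The insertion heuristic can be sketched as a greedy loop, under simplifying assumptions: edges are modeled as [birth, death) lifetime intervals, a point's register overhead is estimated by the number of straddling edges, and the latency constraint is taken as the maximum cyclic gap between consecutive points. The edge lifetimes below are illustrative, not the paper's data:

```python
# Hedged sketch of greedy preemption point insertion: repeatedly add the
# cycle boundary crossed by the fewest edge lifetimes until every gap
# between consecutive points meets the preemption latency bound.

def straddling(edges, p):
    """Number of edge lifetimes [birth, death) alive across boundary p."""
    return sum(1 for birth, death in edges if birth < p < death)

def max_gap(points, n_cycles):
    """Worst-case wait until the next preemption point (schedule is cyclic)."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0] + n_cycles - pts[-1])    # wrap-around gap
    return max(gaps)

def insert_preemption_points(edges, latency_bound, n_cycles):
    points = []
    while not points or max_gap(points, n_cycles) > latency_bound:
        # cheapest remaining boundary by estimated register overhead
        best = min((c for c in range(n_cycles) if c not in points),
                   key=lambda c: straddling(edges, c))
        points.append(best)
    return sorted(points)

edges = [(0, 3), (2, 5), (4, 7), (1, 2), (6, 8)]    # illustrative lifetimes
pts = insert_preemption_points(edges, latency_bound=3, n_cycles=8)
```

Consistent with the paper's observation, this greedy step typically over-inserts points; the refinement step of section 3.2 then prunes them.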
3.2 Preemption Point Refinement

Minimizing dedicated registers alone does not reduce the context switch overhead. Instead, it may increase the number of shared registers and hence the total context switch overhead. For example, assume preemption points are inserted at clock cycles 0 and 5. Consider the following scenarios:

Figure 4: Shared register overhead. (Parts (a) and (b) show the two edge-to-register bindings of e1-e5 over clock cycles 0-5.)

Scenario 1 (figure 4 (a)): Red edges e1 and e5 are assigned to dedicated registers r1 and r2, respectively. This results in two dedicated registers and zero shared registers for the task, with a context switch overhead of two registers.

Scenario 2 (figure 4 (b)): Red edges e1 and e5 are bound to dedicated register r1. The green edges e2, e3 and e4 are bound to shared registers r2 and r3. This results in one dedicated register and two shared registers, with a context switch overhead of three registers.

Scenario 1 is superior to scenario 2 if all other tasks in the system do not require shared registers. Scenario 2 is superior to scenario 1 if at least one of the remaining tasks uses more than two shared registers. Based on these observations it is clear that both the shared and dedicated registers must be considered in an integrated manner to optimize the context switch overhead.
RenePreemptionPoints (
T; P
)
f
1: for each
t
i
2
T
f
2: for (
j
0;
j<
j
P
j
;
j
++)
f
3:
R
[
i
][
j
]
PreemptionContextSynthesis(
E
i
;P
);
4:
p
p
k
s.t. NumberOfEdges(
p
k
)is
max
p
l
2
P
NumberOfEdges(
p
l
) and
MaxPreemptionLatency(
P
,f
p
k
g
)
i
;
5: if (
p
=
) break;
6:
P
P
,f
p
g
; /* Prune a preemption p oint*/
g
g
7:
P C C ost
best
inf inite
;
8: while ((
C
GenerateConguration())
6
=
)
f
9:
P C C ost
PreemptionContextCost(
R; C
);
10: if (
P C C ost < P C C ost
best
)
P C C ost
best
P C C ost
;
g
g
Figure 5: Algorithm for preemption point renement-
For each task
t
i
in task set
T
,
E
i
is the set of edges,
i
is the input latency, and
i
is the preemption latency.
For each task, we start from the preemption point sets generated by the insertion step. We then generate a list of candidate preemption point sets by pruning preemption points with large context switch overhead (steps 2-6 in RefinePreemptionPoints() in figure 5). Both dedicated and shared registers are used to compute the context switch overhead. Since the peak usage of shared registers cannot be known a priori, edges are bound to registers (using PreemptionContextSynthesis()) to evaluate the context switch overhead exactly. This pruning technique is possible because, for each task, preemption point insertion usually inserts more preemption points than are necessary. Finally, the best preemption point set, one for each of the tasks, is obtained by using the context switch cost function given by equation 1. This is summarized in steps 7-10.
Consider a multi-task VLSI system implementing three tasks, t1, t2, t3, shown in figure 6. Following the preceding steps, task t1 has two candidate preemption point sets (PPS) with context switch overheads (CSO) (3, 4) and (2, 5). Similarly, tasks t2 and t3 have four and three preemption point sets, respectively. The context switch overhead for each preemption point set is given as the two-tuple (# of dedicated registers, # of shared registers). Selecting preemption point set 2 for t1, preemption point set 4 for t2 and preemption point set 3 for t3 will result in a context switch cost of (2 + 2 + 1) + max(5, 7, 5) = 12. The context switch cost of selecting preemption point set 2 for t1, preemption point set 3 for t2 and preemption point set 2 for t3 is (2 + 3 + 1) + max(5, 5, 5) = 11. From among the 2 × 4 × 3 = 24 configurations, this has the lowest context switch overhead.

       t1             t2             t3
PPS    CSO     PPS    CSO     PPS    CSO
1      (3, 4)  1      (5, 4)  1      (2, 4)
2      (2, 5)  2      (4, 5)  2      (1, 5)
               3      (3, 5)  3      (1, 5)
               4      (2, 7)

Figure 6: Candidate preemption point sets for tasks t1, t2 and t3
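Steps 7-10 of RefinePreemptionPoints() can be mimicked by brute-force enumeration over the figure 6 candidates, scoring each configuration with equation (1). A sketch (the dictionary layout is invented; the overhead tuples are taken from figure 6):

```python
# Enumerate all 2 x 4 x 3 = 24 configurations, one candidate preemption
# point set per task, and keep the cheapest under equation (1).
from itertools import product

candidates = {                     # (dedicated, shared) per candidate set
    "t1": [(3, 4), (2, 5)],
    "t2": [(5, 4), (4, 5), (3, 5), (2, 7)],
    "t3": [(2, 4), (1, 5), (1, 5)],
}

best = min(sum(d for d, _ in combo) + max(s for _, s in combo)
           for combo in product(*candidates.values()))
# cheapest configuration: (2 + 3 + 1) + max(5, 5, 5) = 11
```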
3.3 Preemption Context Synthesis

The optimization problem associated with preemption context binding can be defined as: Given a scheduled task and a set of preemption points, bind the edges to the registers so that (i) the red edges are bound to dedicated registers, and (ii) the total number of registers is minimized.

PreemptionContextSynthesis(E, P) {
  1. Classify edges into red and green
  2. Bind red edges to dedicated registers
  3. Bind green edges to dedicated or shared registers
}
(a)
Classify(E, P) {
  Red ← ∅; Green ← E;
  foreach e ∈ E
    foreach p ∈ P
      if (lifetime of e overlaps p) {
        Green ← Green − {e}; Red ← Red ∪ {e};
      }
}
(b)
Bind(E) {
  repeat {
    e ← e_k s.t. |e_k.nbr| is max_{e_i ∈ E} |e_i.nbr|;
    e.reg ← min r s.t. r ≠ n.reg ∀ n ∈ e.nbr;
  } until ((E ← E − {e}) = ∅);
}
(c)
Figure 7: Algorithms for preemption context synthesis

The algorithm, outlined in figure 7, minimizes the number of dedicated registers first and then minimizes the number of shared registers. Initially, the algorithm groups the edges into red and green edges using Classify(). Then the red edges are bound to dedicated registers. Finally, the green edges are bound. The ordering is important since, while green edges can be bound to either the dedicated or the shared registers, red edges can only be bound to dedicated registers. A graph coloring heuristic Bind() (outlined in figure 7 (c)) is used for binding. The edge with the largest number of bound neighbors (nbr) is selected and bound to a register (reg) which is not bound to any of its neighbors.
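A minimal sketch of the Classify()/Bind() pair, assuming edge lifetimes are [birth, death) intervals and that two edges are neighbors when their lifetimes overlap; binding red edges before green ones mirrors the ordering argument above. All names and the sample data are illustrative, and the neighbor-selection order is simplified to a fixed red-then-green pass:

```python
# Hedged sketch of preemption context synthesis via greedy interval coloring.

def classify(edges, points):
    """Edges whose lifetime straddles any preemption point become red."""
    red = {e for e, (birth, death) in edges.items()
           if any(birth < p < death for p in points)}
    return red, set(edges) - red

def overlap(a, b):
    """Two [birth, death) lifetimes overlap when neither ends before the other starts."""
    return a[0] < b[1] and b[0] < a[1]

def bind(edges, order):
    """Greedy coloring: each edge takes the lowest register unused by its bound neighbors."""
    reg = {}
    for e in order:
        taken = {reg[n] for n in reg if overlap(edges[e], edges[n])}
        reg[e] = min(r for r in range(len(edges)) if r not in taken)
    return reg

edges = {"e1": (0, 2), "e2": (2, 4), "e3": (2, 4), "e4": (4, 6), "e5": (6, 8)}
red, green = classify(edges, points={1, 4, 7})
regs = bind(edges, order=sorted(red) + sorted(green))   # red edges first
```

Registers touched by red edges become the dedicated set; the remaining registers are shareable across tasks, matching the red/green distinction of section 3.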
4 Experimental Results

Micro-preemption synthesis techniques proposed in this paper were validated on a set of DSP, video, control and communication applications. The selected applications span a wide range of complexities in computational structures and include Arai's fast DCT algorithm (ARAI), decimate-by-four wave digital filter (DECBY4), four-state linear controller (FSLC), Winograd's DFT for N = 8 (FFT8), digital wavelet transform (WAVELET) and ninth degree bireciprocal WDF with Butterworth response (WDF9). Synthesis modules for hardware mapping and layout generation from the HYPER high level synthesis system were used to complete the synthesis trajectory.
4.1 Register Overhead Evaluation
multi-task allocation # of registers
VLSI system + - * 0-p 1-p all-p
f
ARAI, FFT8, 2 2 1 64 65 86
WAVELET
g
2% 34%
f
FIR20, VETT, 2 2 2 81 88 105
VOLTERRA
g
9% 30%
f
DECBY4, FSLC, 2 3 2 119 134 170
NC, WANG
g
13% 43%
f
WDF7, WDF9, 2 4 2 62 80 91
WDFB
g
29% 47%
f
DIF, LDI LP, 2 2 1 40 54 68
WDF5
g
35% 70%
f
ADAPT, LEE, 2 2 2 57 79 92
CASCADE
g
39% 61%
Table 1: Register overhead asso ciated with micro-
preemption
The results of six multi-task VLSI systems are summarized in table 1. The first column shows the applications implemented in each system. The next three columns summarize the hardware allocation. The last three columns give the number of registers for the case when no preemption points are inserted (0-p), when one preemption point is inserted (1-p), and when preemption points are inserted at all clock cycles (all-p). Using the 0-p case as the baseline, the register overhead for the 1-p case varies from 2% to 39%. At the other extreme, the register overhead for the all-p case varies from 30% to 70%.
multi-task VLSI system         area (mm²)      over-
                               0-p     all-p   head
{ARAI, FFT8, WAVELET}          40.8    43.2     6%
{FIR20, VETT, VOLTERRA}        94.2    98.0     4%
{DECBY4, FSLC, NC, WANG}       84.6    90.6     7%
{WDF7, WDF9, WDFB}             90.9    96.1     6%
{DIF, LDI LP, WDF5}            28.4    31.4    11%
{ADAPT, CASCADE, LEE}          50.3    54.4     8%

Table 2: Area overhead associated with micro-preemption
We completed the synthesis trajectory by passing these designs through the hardware mapping and layout synthesis phase. The area overheads for the six designs using actual layouts are summarized in table 2. The areas are reported for the 0-p case and the all-p case. Again using the 0-p case as the basis, the area overhead for the all-p case varies from 4% to 11%, as shown in the last column.

Citations
More filters
Proceedings ArticleDOI

A Hardware Preemptive Multitasking Mechanism Based on Scan-path Register Structure for FPGA-based Reconfigurable Systems

TL;DR: A hardware preemptive multitasking mechanism which uses a scan-path register structure and allows identifying a task's total register size for FPGA-based reconfigurable systems, and shows its feasibility through the design of a simple computing example as well as the implementation of the AES-128 encryption algorithm.
Patent

Controller support device, simulation method of control program, support program of controller and computer-readable storage medium storing support program of controller

TL;DR: In this paper, a sequence control portion of a control program is configured to execute simulation for one period to generate an execution result related to the motion control portion, and if determined as the resumable control period, the content of a resuming data buffer (828) updated in the previous control period is saved in a resuming data storage section.
Book ChapterDOI

A Preemption Algorithm for a Multitasking Environment on Dynamically Reconfigurable Processor

TL;DR: Evaluation results show that the proposed method for saving and restoring the state data of a hardware task, executing on a dynamically reconfigurable processing array, achieves a reasonable hardware overhead while satisfying a given preemption latency.
References
More filters
Book

Synthesis and optimization of digital circuits

TL;DR: This book covers techniques for synthesis and optimization of digital circuits at the architectural and logic levels, i.e., the generation of performance-and-or area-optimal circuits representations from models in hardware description languages.
Proceedings ArticleDOI

Simultaneous multithreading: maximizing on-chip parallelism

TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multi-threading, and is an attractive alternative to single-chip multiprocessors.
Journal ArticleDOI

Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

TL;DR: This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors, and a class of static (compile time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single ormultiple processors.
Proceedings ArticleDOI

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.
Proceedings ArticleDOI

The Tera computer system

TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.
Frequently Asked Questions (16)
Q1. What contributions have the authors mentioned in the paper "Micro-preemption synthesis: an enabling mechanism for multi-task VLSI systems"?

In this paper, the authors present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis. Specifically, the authors have developed: (i) algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, (ii) techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and (iii) a controller based scheme to preclude preemption related performance degradation.

As the preemption latency increases, the number of dedicated registers decreases, the number of shared registers increases, and the total number of registers decreases monotonically.

Synthesis modules for hardware mapping and layout generation from the HYPER high level synthesis system were used to complete the synthesis trajectory.

Since the peak usage of shared registers cannot be known a priori, edges are bound to registers (using PreemptionContextSynthesis()) to evaluate the context switch overhead exactly.

The optimization problem associated with preemption context binding can be defined as: Given a scheduled task and a set of preemption points, bind the edges to the registers so that (i) the red edges are bound to dedicated registers, and (ii) the total number of registers is minimized.

The checkpoints (which incur some penalty in processor performance) are used to divide the sequential instruction stream into smaller units to reduce the cost of resumption.

The authors have also implemented a controller based scheme to eliminate the performance degradation by (i) partitioning the task states into critical sections, (ii) executing critical sections atomically, and (iii) preserving atomicity by rolling forward to the end of the critical sections on preemption.

The context switch cost of a multi-task VLSI system with task set T is: |R| = Σ_{t ∈ T} |R_d^t| + max_{t ∈ T} |R_s^t| (1). Performance degradation resulting from aborting a task is eliminated by (i) partitioning the task states into critical sections, (ii) executing critical sections atomically, and (iii) preserving atomicity of a critical section by rolling forward to the end of a critical section on preemption.

The best preemption point set, one for each of the tasks, is obtained by using the context switch cost function given by equation 1.

Consider a system implementing two tasks A and B. Task A takes four clock cycles and task B takes six clock cycles for one iteration.

Towards investigating preemption point insertion, consider a task with five edges (e1, ..., e5), an application latency of eight clock cycles and an edge-to-register binding shown in figure 3.

Within this model, a task is represented as a hierarchical Control Data Flow Graph G(N, E, T) (or CDFG), with nodes N representing the flow graph operations, and the edges E and T respectively the data and timing dependences between the operations.

Notice that it has taken two clock cycles from the time a valid preemption request arrived (beginning of state A2) to the time the new task (task B) became active (end of state A3).

The register file has one input port and one output port which are accessed at the first half cycle and the last half cycle, respectively.

The optimization problem can be defined as follows: Given an underlying hardware model and N scheduled tasks, each with its own time bound (τ) and maximum preemption latency (Λ), insert preemption points, and bind edges to registers, so that the context switch overhead is minimized.

When compared to the background memory based schemes, the proposed scheme does not need (i) additional ports on the register files which are used to save/restore data to/from the background memories without stalling the currently running task, (ii) additional buses to interconnect the register files to the background memories, and (iii) additional control logic to compute memory addresses.