
Micropreemption synthesis: an enabling mechanism for multitask VLSI systems

TLDR
Techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented, including algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints.
Abstract
Task preemption is a critical enabling mechanism in multitask very large scale integration (VLSI) systems. On preemption, data in the register files must be preserved for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented. Specifically, algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and a controller-based scheme to preclude the preemption-related performance degradation by: 1) partitioning the states of a task into critical sections; 2) executing the critical sections atomically; and 3) preserving atomicity by rolling forward to the end of the critical sections on preemption have been developed. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples. Validation of all the results is complete in the sense that functional simulation is conducted down to the complete layout implementation.



0-89791-993-9/97 $10.00 © 1997 IEEE
Micro-Preemption Synthesis: An Enabling Mechanism for Multi-Task VLSI Systems

Kyosun Kim, Ramesh Karri                    Miodrag Potkonjak
Department of ECE                           Department of Computer Science
University of Massachusetts                 University of California
Amherst, MA 01002                           Los Angeles, CA 90095
{karri,kkim}@ecs.umass.edu                  miodrag@cs.ucla.edu
Abstract - Task preemption is a critical enabling mechanism in multi-task VLSI systems. On preemption, data in the register files must be preserved in order for the task to be resumed. This entails extra memory to save the context and additional clock cycles to restore the context. In this paper, we present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis. Specifically, we have developed: (i) algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, (ii) techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and (iii) a controller based scheme to preclude preemption related performance degradation.
1 Introduction

1.1 Motivation

Task preemption is a critical enabling mechanism in a variety of application scenarios. Hard real-time computer systems have stringent timing requirements. In such systems, the deadlines for critical tasks are enforced by preempting less critical tasks. In soft real-time systems, where infrequent deadline violations are tolerated, less important tasks are preempted to execute more important ones so as to meet some quality-of-service requirements. For example, in multimedia systems, video, voice, and data streams are scheduled and occasionally interrupted and resumed according to priority strategies to enforce soft end-to-end deadlines.
Along a dierent dimension, multi-task VLSI systems
are becoming commonplace. For example, Motorola
oers numerous DSP ASPPs [3]. An ASPP can be dy-
namically congured to one of the implemented tasks.
Although the reconguration time of an ASPP maybe
very low (b ecause reconguration entails moving from
the nal state of the current task to the start state of
another task), it may not be acceptable for a critical
task in a real-time system to wait until the current
task is completed.
This researchwas supp orted by an NSF CAREER grant
MIP-9702676
On receiving a preemption request, the state of the active task must be saved, and the context of the new task must be loaded and then executed. On completion of task execution, the state of the preempted task must be restored and the interrupted task resumed to completion. Important factors that should be considered while implementing task preemption include:

Preemption latency: It is defined as the maximum time it takes from the instant a preemption request is received to the instant the task state is saved.

Context switch cost: Hardware overhead incurred by the installation of a preemption handling scheme must be considered. A saved state should contain only enough information (and no more) so that the preempted task can be resumed at the precise point where it was interrupted. The task state should consist of the contents of the general purpose registers, the condition registers, and the relevant portion of background memory.

Performance degradation: There are two main sources of performance degradation: (i) on a preemption request, some task states that have already been executed may be aborted. Retracing these aborted states adds to the finish time of the aborted task. (ii) Any scheme that saves the context of a preempted task in background memory may stall execution units.
In this paper we will present a systematic methodology for incorporating preemption constraints in multi-task VLSI systems. Specifically, we will show how context switch cost and performance degradation can be minimized while satisfying task specific throughput and preemption latency constraints.
1.2 A Motivating Example

Consider a system implementing two tasks A and B. Task A takes four clock cycles and task B takes six clock cycles for one iteration. Let A1, A2, A3, and A4 denote the parts of task A that are executed in the first, second, third and fourth clock cycles respectively. Similarly, let B1, B2, B3, B4, B5 and B6 denote the parts of task B that are executed in the first, second, third, fourth, fifth and sixth clock cycles respectively. The following assumptions were made while designing the micro-preemption controller.
1. A non-overlapping two-phase clock (clk1, clk2).
2. Activation of a new task (i.e. changing the selected task signal) and transition of a task from one state to the next are synchronized with the falling edge of clk2.
3. Setting and resetting of preempt mask is synchronized with the falling edge of clk1.
4. A task preemption request is serviced when preempt mask is low, selected task is high, and clk2 is falling.
A simulation snapshot of micro-preemption in the two-task VLSI system is shown in figure 1.

Figure 1: Simulation snapshot showing preemption request and servicing mechanism in a two-task VLSI system. (The figure plots the CLK1, CLK2, SELECTED TASK, PREEMPT MASK, ACTIVE TASK, and STATE traces over time.)

To minimize the controller and context switch overhead, we mandate that task A can be preempted in states A1 and A3 alone. These are called task preemption points. Initially the system is in state A4. When the system goes to state A1, execution of a new task is requested by setting the selected task to B and the data inputs to appropriate data values (these have not been shown here for simplicity). However, it is not a valid preemption request (since preempt mask is high). Even when a valid preemption request arrives in state A2 (i.e. all conditions in item 4 are satisfied), task A is not aborted immediately. Rather, the computation rolls forward to the end of state A3 (the next preemption point) before the preemption request is serviced. Notice that it has taken two clock cycles from the time a valid preemption request arrived (beginning of state A2) to the time the new task (task B) became active (end of state A3). From the point of view of task B, this is its preemption latency. From the point of view of the multi-task VLSI system as a whole (and task A in particular), rolling forward of the computation has eliminated the performance degradation due to an immediate abort. In a nutshell, preemption points A1 and A3 have partitioned the execution of task A into two critical sections {A2, A3} and {A4, A1}. Similarly, task preemption points B2, B4 and B6 for task B partition it into three critical sections {B1, B2}, {B3, B4}, and {B5, B6}.
(Footnote 1: The controller has been implemented using a 1µ SCMOS standard cell library and simulated using IRSIM.)
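The roll-forward behavior walked through above can be sketched in a few lines. The following Python model is an illustrative reconstruction, not the authors' controller logic; all function and variable names are invented. A pending request is serviced only when the task reaches the end of a state designated as a preemption point:

```python
# Hypothetical sketch of roll-forward preemption servicing. State names
# follow the two-task example: task A is preemptable in states A1 and A3.

def run_with_roll_forward(states, preemption_points, start, request_at, n_cycles):
    """Simulate up to n_cycles of a task; return (trace, cycle request was served).

    states: cyclic state sequence, e.g. ["A1", "A2", "A3", "A4"]
    preemption_points: states at whose *end* a pending request may be served
    request_at: cycle index at which a valid preemption request arrives
    """
    idx = states.index(start)
    trace, pending, served_at = [], False, None
    for cycle in range(n_cycles):
        state = states[idx]
        trace.append(state)
        if cycle == request_at:
            pending = True                    # request arrives during this cycle
        if pending and state in preemption_points:
            served_at = cycle                 # end of critical section reached
            break                             # context switch happens here
        idx = (idx + 1) % len(states)         # otherwise roll forward
    return trace, served_at

# Request arrives at the beginning of A2; the computation rolls forward to
# the end of A3 (the next preemption point), a latency of two clock cycles.
trace, served = run_with_roll_forward(
    ["A1", "A2", "A3", "A4"], {"A1", "A3"}, start="A2", request_at=0, n_cycles=8)
```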
Figure 2: Controller for a multi-task VLSI system supporting micro-preemption. (The figure shows control logic with task select and interrupt disable inputs, a state register file driven by write enable and read select, a task ID queue holding ID(T) and ID(T-1), State(T)/State(T+1) paths, and a pipeline register driving the control signals to the datapaths, clocked by CLK1 and CLK2.)

The controller is shown in figure 2. It is a collection of finite state machines (one for each implemented task) and has a state register file that holds the identification of the currently active task. At every clock cycle, a different task can be initiated or a preempted task resumed by the task select signal. The controller signals are pipelined so that the controller delay does not affect the critical path.
The rest of the paper is organized in the following way. We first briefly survey the related work along several dimensions. Next, we will discuss the computational and hardware models. In sections 3 and 4, we introduce our approach, formulate the micro-preemption synthesis problems, and describe the proposed algorithms for micro-preemption synthesis. Experimental results are presented in section 5. In section 6, we conclude by summarizing the results.
1.3 Related Research

Reconfigurable computing platforms are attracting a lot of attention recently. A fast growing billion dollar Field Programmable Gate Array (FPGA) industry is supported by a number of commercial and research tools [12]. A number of special purpose reconfigurable computers have been built. Early work in this direction includes the systems realized at University of Texas, Austin (TRAC) [5]. The Splash system enables reconfigurability to more than 100 different configurations which are well suited for several computational tasks in molecular biology [7]. Several generations of data path reconfigurable video-processors with accompanying compilation support have been developed at University of California, Berkeley [13]. Recently, Application Specific Programmable Processors (ASPPs) [16] have been introduced as an excellent candidate for multifunctional datapaths with frequent context switching. Though their functionalities must be determined in the design phase, a single ASPP implementing multiple functions obtains significant area savings when compared with the dedicated ASIC implementations of the functions.
Research in implementing interrupts is outlined next. The IBM 360/91 supports precise and imprecise interrupt handling [1]. Hwu and Patt [2] proposed a checkpointing approach to handling interrupts. The checkpoints (which incur some penalty in processor performance) are used to divide the sequential instruction stream into smaller units to reduce the cost of resumption. Sohi [9] integrated the functions of reservation stations and reorder buffers into the register update unit to realize precise interrupts. In addition, Smith and Pleszkun [6] presented architectural solutions such as saving the intermediate state of vector instructions and saving a sequence of instructions that must be executed before saving the program counter. Mosberger et al. [15] presented a software-only solution to the synchronization problem in uniprocessors. Their idea was to execute atomic sequences without any hardware protection, and to roll the sequence forward to the end, thereby preserving atomicity.
Behavioral synthesis has been an active research area for more than two decades [8], and numerous outstanding systems have been built targeting both data path oriented and control oriented applications [8]. Synthesis systems that optimize power, testability and fault-tolerance [14] have been developed.
2 Computational/Hardware Models

Our computational model for a single task is homogeneous synchronous data flow [4]. Within this model, a task is represented as a hierarchical Control Data Flow Graph G(N, E, T) (or CDFG), with nodes N representing the flow graph operations, and the edges E and T respectively the data and timing dependences between the operations.
In modern designs a variety of register file models have been used [10]. From among them we have selected the dedicated register file hardware model. This model clusters all registers in register files and each file is then connected only to the inputs of the corresponding execution units. An important benefit of the chosen hardware model is that it reduces the interconnect at the expense of additional registers.
3 Issues and Our Approach

On preemption, the data in the register files must be preserved somewhere in order for a task to resume. In general purpose microprocessors, these values are transferred to background memory before an interrupt is serviced. This technique is not acceptable in multi-task VLSI systems due to the attendant performance penalty. Alternately, a register windowing technique is used in the Sparc architecture [11]. In this scheme, data is saved in registers within the processor even when a new computation environment is required. However, it entails non-negligible area overheads for duplicated registers. In contrast, we propose an intuitively simple technique by classifying the edges in the CDFG, and the registers that hold them, into two groups:
Dedicated registers (R_d^t) store the values of edges of a task that straddle preemption points. These edges that straddle a preemption point are called the red edges, and represent intermediate values essential to resume the task if preempted. (Coefficient registers (R_c^t) that hold the constants used by the task are not targeted during the context switch optimization. This is because, generally, these constants differ from one task to another and cannot share registers.)

Shared registers (R_s^t) are shared by the values associated with the remaining edges (of all tasks) in the system. These edges that do not straddle a preemption point are called the green edges. The dedicated registers of a task can also be used to store the values associated with the green edges in the task. However, the shared registers cannot be assigned to red edges.
Since dedicated registers cannot be shared between tasks, the associated context switch overhead is the sum of the dedicated registers over all tasks. On the other hand, the context switch overhead due to shared registers is the maximum value across all tasks. Overall, the context switch cost of a multi-task VLSI system with task set T is:

    |R| = Σ_{t ∈ T} |R_d^t| + max_{t ∈ T} |R_s^t|        (1)
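Equation (1) is straightforward to evaluate once the per-task register counts are known. A small sketch (the helper name is invented; the (dedicated, shared) counts are the winning configuration from the refinement example in section 3.2):

```python
# Equation (1): dedicated registers accumulate across tasks, while shared
# registers form a single pool sized by the most demanding task.

def context_switch_cost(tasks):
    """tasks maps task name -> (dedicated, shared) register counts."""
    dedicated = sum(d for d, _ in tasks.values())   # Σ |R_d^t|
    shared = max(s for _, s in tasks.values())      # max |R_s^t|
    return dedicated + shared

cost = context_switch_cost({"t1": (2, 5), "t2": (3, 5), "t3": (1, 5)})
# (2 + 3 + 1) + max(5, 5, 5) = 11
```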
Performance degradation resulting from aborting a task is eliminated by (i) partitioning the task states into critical sections, (ii) executing critical sections atomically, and (iii) preserving atomicity of a critical section by rolling forward to the end of a critical section on preemption. This is analogous to the classical approach to precise interrupts.

Next, we present algorithms for (i) preemption point insertion and (ii) preemption context synthesis that minimizes the context switch overhead during multi-task VLSI system synthesis. The optimization problem can be defined as follows:
Given an underlying hardware model and N scheduled tasks, each with its own time bound (τ) and maximum preemption latency (Λ), insert preemption points, and bind edges to registers, so that the context switch overhead is minimized.
Initially, all tasks are scheduled in an integrated fashion by considering their word length, precision, hardware and topological similarities. Using the number of edges straddling a clock cycle as an estimate of the context switch overhead, preemption points are inserted. The resulting preemption point set for each task will have more than the minimum number of preemption points. In the next step these preemption point sets are refined. Finally, the preemption context is synthesized by binding edges to registers subject to preemption constraints. The output is then passed through hardware mapping and layout generation tools to synthesize a multi-task VLSI system.
3.1 Preemption Point Insertion

Towards investigating preemption point insertion, consider a task with five edges (e1, ..., e5), an application latency of eight clock cycles and an edge-to-register binding shown in figure 3. The register file has one input port and one output port which are accessed at the first half cycle and the last half cycle, respectively.

Figure 3: Preemption point insertion and register binding. (The figure plots edges e1-e5 bound to registers r1-r3 over clock cycles 0-7, with preemption points marked 'P'. The dotted line shows a cyclic dependency.)

The register overhead of a preemption point can be estimated as the number of edges straddling it. For instance, assuming a preemption latency of three yields clock cycles 1, 4, and 7 as preemption points as shown in figure 3. The preemption points are marked by 'P'. On preemption point insertion, e1 and e5 become red edges and are assigned to two dedicated registers r1 and r2. e3 becomes a green edge and is assigned to shared register r3. e2 and e4 become green edges but are assigned to dedicated register r2. Initially, preemption points are inserted one task at a time using a polynomial time heuristic algorithm InsertPreemptionPoints(). It incrementally inserts preemption points (into each task) such that the number of edges straddling the preemption points is minimized, until the preemption latency constraint is satisfied.
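The insertion heuristic can be sketched as a greedy loop, under simplifying assumptions: edges are modeled as [birth, death) lifetime intervals, a point's register overhead is estimated by the number of straddling edges, and the latency constraint is taken as the maximum cyclic gap between consecutive points. The edge lifetimes below are illustrative, not the paper's data:

```python
# Hedged sketch of greedy preemption point insertion: repeatedly add the
# cycle boundary crossed by the fewest edge lifetimes until every gap
# between consecutive points meets the preemption latency bound.

def straddling(edges, p):
    """Number of edge lifetimes [birth, death) alive across boundary p."""
    return sum(1 for birth, death in edges if birth < p < death)

def max_gap(points, n_cycles):
    """Worst-case wait until the next preemption point (schedule is cyclic)."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0] + n_cycles - pts[-1])    # wrap-around gap
    return max(gaps)

def insert_preemption_points(edges, latency_bound, n_cycles):
    points = []
    while not points or max_gap(points, n_cycles) > latency_bound:
        # cheapest remaining boundary by estimated register overhead
        best = min((c for c in range(n_cycles) if c not in points),
                   key=lambda c: straddling(edges, c))
        points.append(best)
    return sorted(points)

edges = [(0, 3), (2, 5), (4, 7), (1, 2), (6, 8)]    # illustrative lifetimes
pts = insert_preemption_points(edges, latency_bound=3, n_cycles=8)
```

Consistent with the paper's observation, this greedy step typically over-inserts points; the refinement step of section 3.2 then prunes them.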
3.2 Preemption Point Refinement

Minimizing dedicated registers alone does not reduce the context switch overhead. Instead, it may increase the number of shared registers and hence the total context switch overhead. For example, assume preemption points are inserted at clock cycles 0 and 5. Consider the following scenarios:

Figure 4: Shared register overhead. (Parts (a) and (b) show the two edge-to-register bindings of e1-e5 over clock cycles 0-5.)

Scenario 1 (figure 4 (a)): Red edges e1 and e5 are assigned to dedicated registers r1 and r2, respectively. This results in two dedicated registers and zero shared registers for the task, with a context switch overhead of two registers.

Scenario 2 (figure 4 (b)): Red edges e1 and e5 are bound to dedicated register r1. The green edges e2, e3 and e4 are bound to shared registers r2 and r3. This results in one dedicated register and two shared registers, with a context switch overhead of three registers.

Scenario 1 is superior to scenario 2 if all other tasks in the system do not require shared registers. Scenario 2 is superior to scenario 1 if at least one of the remaining tasks uses more than two shared registers. Based on these observations it is clear that both the shared and dedicated registers must be considered in an integrated manner to optimize the context switch overhead.
RenePreemptionPoints (
T; P
)
f
1: for each
t
i
2
T
f
2: for (
j
0;
j<
j
P
j
;
j
++)
f
3:
R
[
i
][
j
]
PreemptionContextSynthesis(
E
i
;P
);
4:
p
p
k
s.t. NumberOfEdges(
p
k
)is
max
p
l
2
P
NumberOfEdges(
p
l
) and
MaxPreemptionLatency(
P
,f
p
k
g
)
i
;
5: if (
p
=
) break;
6:
P
P
,f
p
g
; /* Prune a preemption p oint*/
g
g
7:
P C C ost
best
inf inite
;
8: while ((
C
GenerateConguration())
6
=
)
f
9:
P C C ost
PreemptionContextCost(
R; C
);
10: if (
P C C ost < P C C ost
best
)
P C C ost
best
P C C ost
;
g
g
Figure 5: Algorithm for preemption point renement-
For each task
t
i
in task set
T
,
E
i
is the set of edges,
i
is the input latency, and
i
is the preemption latency.
For each task, we start from the preemption point sets generated by the insertion step. We then generate a list of candidate preemption point sets by pruning preemption points with large context switch overhead (steps 2-6 in RefinePreemptionPoints() in figure 5). Both dedicated and shared registers are used to compute the context switch overhead. Since the peak usage of shared registers cannot be known a priori, edges are bound to registers (using PreemptionContextSynthesis()) to evaluate the context switch overhead exactly. This pruning technique is possible because, for each task, preemption point insertion usually inserts more preemption points than are necessary. Finally, the best preemption point set, one for each of the tasks, is obtained by using the context switch cost function given by equation 1. This is summarized in steps 7-10.
Consider a multi-task VLSI system implementing three tasks, t1, t2, t3, shown in figure 6. Following the preceding steps, task t1 has two candidate preemption point sets (PPS) with context switch overheads (CSO) (3, 4) and (2, 5). Similarly, tasks t2 and t3 have four and three preemption point sets, respectively. The context switch overhead for each preemption point set is given as the two-tuple (# of dedicated registers, # of shared registers). Selecting preemption point set 2 for t1, preemption point set 4 for t2 and preemption point set 3 for t3 will result in a context switch cost of (2 + 2 + 1) + max(5, 7, 5) = 12. The context switch cost of selecting preemption point set 2 for t1, preemption point set 3 for t2 and preemption point set 2 for t3 is (2 + 3 + 1) + max(5, 5, 5) = 11. From among the 2 × 4 × 3 = 24 configurations, this has the lowest context switch overhead.

       t1             t2             t3
PPS    CSO     PPS    CSO     PPS    CSO
1      (3, 4)  1      (5, 4)  1      (2, 4)
2      (2, 5)  2      (4, 5)  2      (1, 5)
               3      (3, 5)  3      (1, 5)
               4      (2, 7)

Figure 6: Candidate preemption point sets for tasks t1, t2 and t3
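Steps 7-10 of RefinePreemptionPoints() can be mimicked by brute-force enumeration over the figure 6 candidates, scoring each configuration with equation (1). A sketch (the dictionary layout is invented; the overhead tuples are taken from figure 6):

```python
# Enumerate all 2 x 4 x 3 = 24 configurations, one candidate preemption
# point set per task, and keep the cheapest under equation (1).
from itertools import product

candidates = {                     # (dedicated, shared) per candidate set
    "t1": [(3, 4), (2, 5)],
    "t2": [(5, 4), (4, 5), (3, 5), (2, 7)],
    "t3": [(2, 4), (1, 5), (1, 5)],
}

best = min(sum(d for d, _ in combo) + max(s for _, s in combo)
           for combo in product(*candidates.values()))
# cheapest configuration: (2 + 3 + 1) + max(5, 5, 5) = 11
```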
3.3 Preemption Context Synthesis

The optimization problem associated with preemption context binding can be defined as: Given a scheduled task and a set of preemption points, bind the edges to the registers so that (i) the red edges are bound to dedicated registers, and (ii) the total number of registers is minimized.

PreemptionContextSynthesis(E, P) {
  1. Classify edges into red and green
  2. Bind red edges to dedicated registers
  3. Bind green edges to dedicated or shared registers
}
(a)
Classify(E, P) {
  Red ← ∅; Green ← E;
  foreach e ∈ E
    foreach p ∈ P
      if (lifetime of e overlaps p) {
        Green ← Green − {e}; Red ← Red ∪ {e};
      }
}
(b)
Bind(E) {
  repeat {
    e ← e_k s.t. |e_k.nbr| is max_{e_i ∈ E} |e_i.nbr|;
    e.reg ← min r s.t. r ≠ n.reg ∀ n ∈ e.nbr;
  } until ((E ← E − {e}) = ∅);
}
(c)
Figure 7: Algorithms for preemption context synthesis

The algorithm, outlined in figure 7, minimizes the number of dedicated registers first and then minimizes the number of shared registers. Initially, the algorithm groups the edges into red and green edges using Classify(). Then the red edges are bound to dedicated registers. Finally, the green edges are bound. The ordering is important since, while green edges can be bound to either the dedicated or the shared registers, red edges can only be bound to dedicated registers. A graph coloring heuristic Bind() (outlined in figure 7 (c)) is used for binding. The edge with the largest number of bound neighbors (nbr) is selected and bound to a register (reg) which is not bound to any of its neighbors.
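A minimal sketch of the Classify()/Bind() pair, assuming edge lifetimes are [birth, death) intervals and that two edges are neighbors when their lifetimes overlap; binding red edges before green ones mirrors the ordering argument above. All names and the sample data are illustrative, and the neighbor-selection order is simplified to a fixed red-then-green pass:

```python
# Hedged sketch of preemption context synthesis via greedy interval coloring.

def classify(edges, points):
    """Edges whose lifetime straddles any preemption point become red."""
    red = {e for e, (birth, death) in edges.items()
           if any(birth < p < death for p in points)}
    return red, set(edges) - red

def overlap(a, b):
    """Two [birth, death) lifetimes overlap when neither ends before the other starts."""
    return a[0] < b[1] and b[0] < a[1]

def bind(edges, order):
    """Greedy coloring: each edge takes the lowest register unused by its bound neighbors."""
    reg = {}
    for e in order:
        taken = {reg[n] for n in reg if overlap(edges[e], edges[n])}
        reg[e] = min(r for r in range(len(edges)) if r not in taken)
    return reg

edges = {"e1": (0, 2), "e2": (2, 4), "e3": (2, 4), "e4": (4, 6), "e5": (6, 8)}
red, green = classify(edges, points={1, 4, 7})
regs = bind(edges, order=sorted(red) + sorted(green))   # red edges first
```

Registers touched by red edges become the dedicated set; the remaining registers are shareable across tasks, matching the red/green distinction of section 3.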
4 Experimental Results

Micro-preemption synthesis techniques proposed in this paper were validated on a set of DSP, video, control and communication applications. The selected applications span a wide range of complexities in computational structures and include Arai's fast DCT algorithm (ARAI), decimate-by-four wave digital filter (DECBY4), four-state linear controller (FSLC), Winograd's DFT for N = 8 (FFT8), digital wavelet transform (WAVELET) and ninth degree bireciprocal WDF with Butterworth response (WDF9). Synthesis modules for hardware mapping and layout generation from the HYPER high level synthesis system were used to complete the synthesis trajectory.
4.1 Register Overhead Evaluation
multi-task allocation # of registers
VLSI system + - * 0-p 1-p all-p
f
ARAI, FFT8, 2 2 1 64 65 86
WAVELET
g
2% 34%
f
FIR20, VETT, 2 2 2 81 88 105
VOLTERRA
g
9% 30%
f
DECBY4, FSLC, 2 3 2 119 134 170
NC, WANG
g
13% 43%
f
WDF7, WDF9, 2 4 2 62 80 91
WDFB
g
29% 47%
f
DIF, LDI LP, 2 2 1 40 54 68
WDF5
g
35% 70%
f
ADAPT, LEE, 2 2 2 57 79 92
CASCADE
g
39% 61%
Table 1: Register overhead asso ciated with micro-
preemption
The results of six multi-task VLSI systems are summarized in table 1. The first column shows the applications implemented in each system. The next three columns summarize the hardware allocation. The last three columns give the number of registers for the case when no preemption points are inserted (0-p), when one preemption point is inserted (1-p), and when preemption points are inserted at all clock cycles (all-p). Using the 0-p case as the baseline, the register overhead for the 1-p case varies from 2% to 39%. At the other extreme, the register overhead for the all-p case varies from 30% to 70%.
multi-task VLSI system         area (mm²)      over-
                               0-p     all-p   head
{ARAI, FFT8, WAVELET}          40.8    43.2     6%
{FIR20, VETT, VOLTERRA}        94.2    98.0     4%
{DECBY4, FSLC, NC, WANG}       84.6    90.6     7%
{WDF7, WDF9, WDFB}             90.9    96.1     6%
{DIF, LDI LP, WDF5}            28.4    31.4    11%
{ADAPT, CASCADE, LEE}          50.3    54.4     8%

Table 2: Area overhead associated with micro-preemption
We completed the synthesis trajectory by passing these designs through the hardware mapping and layout synthesis phase. The area overheads for the six designs using actual layouts are summarized in table 2. The areas are reported for the 0-p case and the all-p case. Again using the 0-p case as the basis, the area overhead for the all-p case varies from 4% to 11%, as shown in the last column.

Citations
More filters
Proceedings ArticleDOI

A Hardware Preemptive Multitasking Mechanism Based on Scan-path Register Structure for FPGA-based Reconfigurable Systems

TL;DR: A hardware preemptive multitasking mechanism which uses a scan-path register structure and allows identifying a task's total register size for FPGA-based reconfigurable systems, and shows its feasibility through the design of a simple computing example as well as the implementation of the AES-128 encryption algorithm.
Patent

Controller support device, simulation method of control program, support program of controller and computer-readable storage medium storing support program of controller

TL;DR: In this paper, a sequence control portion of a control program is configured to execute simulation for one period to generate an execution result related to the motion control portion, and if determined as the resumable control period, the content of a resuming data buffer (828) updated in the previous control period is saved in a resuming data storage section.
Book ChapterDOI

A Preemption Algorithm for a Multitasking Environment on Dynamically Reconfigurable Processor

TL;DR: Evaluation results show that the proposed method for saving and restoring the state data of a hardware task, executing on a dynamically reconfigurable processing array, achieves a reasonable hardware overhead while satisfying a given preemption latency.
References
More filters
Book

Synthesis and optimization of digital circuits

TL;DR: This book covers techniques for synthesis and optimization of digital circuits at the architectural and logic levels, i.e., the generation of performance-and-or area-optimal circuits representations from models in hardware description languages.
Proceedings ArticleDOI

Simultaneous multithreading: maximizing on-chip parallelism

TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multi-threading, and is an attractive alternative to single-chip multiprocessors.
Journal ArticleDOI

Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

TL;DR: This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors, and a class of static (compile time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single ormultiple processors.
Proceedings ArticleDOI

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.
Proceedings ArticleDOI

The Tera computer system

TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.
Frequently Asked Questions (16)
Q1. What contributions have the authors mentioned in the paper "Micro-preemption synthesis: an enabling mechanism for multi-task VLSI systems"?

In this paper, the authors present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis. Specifically, the authors have developed: (i) algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, (ii) techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and (iii) a controller based scheme to preclude preemption related performance degradation.

As the preemption latency increases, the number of dedicated registers decreases, the number of shared registers increases, and the total number of registers decreases monotonically.

Synthesis modules for hardware mapping and layout generation from the HYPER high level synthesis system were used to complete the synthesis trajectory.

Since the peak usage of shared registers cannot be known a priori, edges are bound to registers (using PreemptionContextSynthesis()) to evaluate the context switch overhead exactly.

The optimization problem associated with preemption context binding can be defined as: Given a scheduled task and a set of preemption points, bind the edges to the registers so that (i) the red edges are bound to dedicated registers, and (ii) the total number of registers is minimized.

The checkpoints (which incur some penalty in processor performance) are used to divide the sequential instruction stream into smaller units to reduce the cost of resumption.

The authors have also implemented a controller based scheme to eliminate the performance degradation by (i) partitioning the task states into critical sections, (ii) executing critical sections atomically, and (iii) preserving atomicity by rolling forward to the end of the critical sections on preemption.

The context switch cost of a multi-task VLSI system with task set T is: |R| = Σ_{t ∈ T} |R_d^t| + max_{t ∈ T} |R_s^t| (1). Performance degradation resulting from aborting a task is eliminated by (i) partitioning the task states into critical sections, (ii) executing critical sections atomically, and (iii) preserving atomicity of a critical section by rolling forward to the end of a critical section on preemption.

The best preemption point set, one for each of the tasks, is obtained by using the context switch cost function given by equation 1.

Consider a system implementing two tasks A and B. Task A takes four clock cycles and task B takes six clock cycles for one iteration.

Towards investigating preemption point insertion, consider a task with five edges (e1, ..., e5), an application latency of eight clock cycles and an edge-to-register binding shown in figure 3.

Within this model, a task is represented as a hierarchical Control Data Flow Graph G(N, E, T) (or CDFG), with nodes N representing the flow graph operations, and the edges E and T respectively the data and timing dependences between the operations.

Notice that it has taken two clock cycles from the time a valid preemption request arrived (beginning of state A2) to the time the new task (task B) became active (end of state A3).

The register file has one input port and one output port which are accessed at the first half cycle and the last half cycle, respectively.

The optimization problem can be defined as follows: Given an underlying hardware model and N scheduled tasks, each with its own time bound (τ) and maximum preemption latency (Λ), insert preemption points, and bind edges to registers, so that the context switch overhead is minimized.

When compared to the background memory based schemes, the proposed scheme does not need (i) additional ports on the register files which are used to save/restore data to/from the background memories without stalling the currently running task, (ii) additional buses to interconnect the register files to the background memories, and (iii) additional control logic to compute memory addresses.