
(Open Access) Thermal-aware compilation for system-on-chip processing architectures (2010) | Mohamed M. Sabry


Thermal-Aware Compilation for System-on-Chip Processing Architectures

Mohamed M. Sabry
Embedded Systems Laboratory, EPFL
EPFL-STI-IEL-ESL
1015 Lausanne, Switzerland
mohamed.sabry@epfl.ch

José L. Ayala
DACYA
Complutense University of Madrid
28040 Madrid, Spain
jayala@fdi.ucm.es

David Atienza
Embedded Systems Laboratory, EPFL
EPFL-STI-IEL-ESL
1015 Lausanne, Switzerland
david.atienza@epfl.ch

ABSTRACT

The development of compiler-based mechanisms to reduce the percentage of hotspots and optimize the thermal profile of large register files has become an important issue. Thermal hotspots are known to cause severe reliability issues, while the thermal profile of the devices is also related to leakage power consumption and cooling cost. In this paper we propose several compilation techniques that, based on an efficient register allocation mechanism, reduce the percentage of hotspots in the register file and distribute the heat uniformly. As a result, the thermal profile and reliability of the device are clearly improved. Simulation results show that the proposed flow achieved up to 91% reduction of hotspots and up to 11% reduction of the peak temperature.

Categories and Subject Descriptors: D.3.4 Programming Languages: Processors [Compilers]; B.8 Performance and Reliability.

General Terms: Algorithms, Management, Reliability.

Keywords: Thermal-aware, Compiler, Register-file.

1. INTRODUCTION

Heat dissipation is an important factor in the performance and reliability of embedded systems. With the advent of new technologies and the scaling of design parameters, thermal issues have emerged as one of the key design concerns that need to be addressed.

GLSVLSI’10, May 16–18, 2010, Providence, Rhode Island, USA.

Thermal dissipation in integrated circuits has a negative effect on multiple aspects. First, leakage current (one of the main sources of power dissipation in today's sub-micron technologies) depends exponentially on temperature [1]. Second, temperature has a direct impact on the reliability of the system, because several degradation processes are driven by the increase of temperature or by the spatial and temporal gradients that appear during normal operation. Temperatures over a threshold in localized areas of the chip (hotspots) can produce timing delay variations, transient reductions in overall system performance, or even permanent damage to the devices [2]. Moreover, reliability does not depend only on the average temperature of the chip: spatial and temporal variations have a strong influence on phenomena like electromigration, negative bias temperature instability, or thermal cycling [3]. It has been shown that the Mean Time Between Failures (MTBF) of an IC is divided by 10 for every 30 °C rise in the junction temperature [4]. Finally, system performance is strongly determined by temperature. With increasing temperature, phonon concentration increases and causes increased scattering; thus, the carrier mobility limited by lattice scattering decreases.
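The MTBF rule of thumb from [4] compounds quickly. The Python sketch below is only an illustration: it assumes the tenfold-per-30 °C relation can be interpolated exponentially, MTBF(T0 + ΔT) = MTBF(T0) · 10^(−ΔT/30), which the text itself does not claim, and the function name is hypothetical.

```python
def mtbf_factor(delta_t_celsius: float) -> float:
    """Fraction of the original MTBF left after a junction-temperature rise.

    Assumes a tenfold MTBF reduction per 30 °C, interpolated exponentially.
    """
    return 10 ** (-delta_t_celsius / 30.0)


for rise in (10, 30, 60):
    print(f"+{rise} °C -> MTBF x {mtbf_factor(rise):.3f}")
```

Under that assumption, a 10 °C rise already leaves only about 46% of the original MTBF, and a 60 °C rise leaves 1%.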

These facts explain the strong efforts currently being made in the area of thermal optimization of electronic circuits. Some of these efforts look for expensive heat dissipaters and sinks that improve thermal dissipation but increase the cost per chip by more than $1/W [5]. Other research works tackle the thermal problem at different levels of abstraction. Computer architects develop thermally efficient processor architectures that optimize thermal behavior by proposing smart ways of sharing the computing resources [6]. Temperature also depends on the placement of the units in the chip. Placing heavy power-consuming units close together will intuitively generate an even hotter area in the chip, as the heat contributions of neighboring units add up. In contrast, placing power-consuming units close to units with moderate power consumption allows the generated heat to dissipate through these units. Therefore, thermal-aware floorplanning is an intense area of research [7]. Finally, software can control the thermal profile of many processor-based systems through the careful ordering of task execution, the assignment of resources, and the code generation phase. In this area, compilers can play an important role.

Due to its high utilization and relatively small area, the register file has been shown to have the highest peak temperature in several studies [8]. Reducing the power density spots of the register file would reduce the peak temperatures of both the entire chip and the register file, which in turn would improve reliability and reduce leakage power.

The thermal response of the register file is largely determined by the assignment of registers to the variables defined in the source code, as well as by the profile of accesses to this device. Both parameters can be controlled by the compiler from a software perspective and lead to the definition of our optimization policies. These techniques should be conceived with minimal impact on the code size and execution time of the application.

This paper proposes a thermal-aware compilation flow that is embedded in a state-of-the-art compiler. The proposed flow introduces a thermal-aware register reallocation in the compiler, based on application-specific information regarding register accesses and frequency of execution, as well as the control flow graph (CFG) of the application. This technique is shown to be an effective stand-alone mechanism for temperature optimization in the microprocessor architecture.

The main contributions of this paper are the following:

• Analysis of the thermal effects that current register allocators and performance-oriented compiler optimizations have on the register file of register window-based architectures.

• Development of compiler techniques that improve the thermal profile of the register file in terms of mean and peak temperature, as well as percentage of hotspots.

• Integration of the proposed mechanisms in the CoSy compilation flow [9], a retargetable compiler for the generation of high-quality compiled code.

Simulation results show significant improvements, with hotspots reduced by 73% on average, as will be shown in Section 4.1.

2. RELATED WORK

In recent years, there has been intense work at the compiler level on power-aware scheduling for VLIW processors, proposing to turn off unused units to save leakage power [10, 11, 12]. Some of these works [13] explicitly target the register file for energy saving. However, these approaches do not consider temperature as the metric to be optimized and, therefore, the resulting thermal profile is not optimal. Some of the first static approaches to thermal optimization are found in [14, 15], where load-balancing heuristics and high-level synthesis techniques are considered.

In Narayanan et al. [16], several techniques are proposed to minimize thermal emergencies in NoC-based systems through compiler-directed power density reduction. Also, several thermal management techniques for multicore architectures are explored in Donald and Martonosi [17] and Patel et al. [18], where register temperature reduction through register file and […]

Like ours, the very recent work by Zhou et al. [19] proposes a register reallocation algorithm for power-density minimization in the register file. However, it targets a few specific cases (high-power-density registers) in VLIW architectures. VLIW architectures have also been considered in [20], where a thermal-aware instruction generation algorithm is proposed.

Our proposed work differs from previous static approaches in: first, the minimization of several thermal- and reliability-related metrics like the mean and peak temperature of the […] negligible penalty; second, the development of techniques that cope with the limitations exhibited by register files with register windows; and finally, the integration into a high-quality industrial compilation flow.

3. THERMAL-AWARE REGISTER ASSIGNMENT FLOW

Registers in a register file can exhibit a high thermal profile due to:

1. High frequency of accesses (self-heating effect).

2. Proximity of hot registers (mutual diffusion effect).

Based on the thermal profiles collected for different benchmarks, the following observations were made:

1. The thermal response of a register reaches a steady state if the frequency of access to that register exceeds a certain threshold, provided the adjacent registers are not allocated. This observation implies that it is more thermally efficient to assign the same register to two variables (iff the frequency of accesses to one of these two variables is above the mentioned threshold) than to allocate two different registers.

2. The thermal profile of the device is improved when the registers are assigned from spread-out locations of the register file, thereby reducing the mutual diffusion effect.

Based on these observations, register reallocation policies should be designed to minimize the number of assigned registers, as well as to allocate physically nonadjacent registers. However, if the application exhibits high register pressure, both constraints cannot be satisfied jointly.

Thus, a new thermal-aware compilation flow is proposed to reduce the high thermal profile of the register file. This flow is designed to minimize both the mutual diffusion and the self-heating effect. The proposed flow is applicable to various processing architectures, but it mainly targets register window-based or register bank-based processing architectures. As shown in Figure 1, the proposed flow is integrated into the compilation process before the code generation (code emission) phase, where the pseudo-code generated in the previous phases is translated into target code. The flow is divided into two stages that appear only in register window-based architectures, multi-window context switching and BBCS, followed by a third stage, DIST_MAX, that can be applied to any processing architecture.

[Figure 1 flowchart: Start compilation process → Scanning, parsing, code analysis and CFG generation → Window-based architecture? Yes: Multi-window context switching → BBCS → DIST_MAX; No: DIST_MAX → Code generation → End compilation]

Figure 1: Proposed thermal-aware compilation flow.

Each of these techniques is deployed to minimize the thermal profile of the register file, but from a different perspective, as will be shown in the following subsections.

3.1 Multi-window context switching

The multi-window context switching technique aims to reduce the mutual thermal diffusion between two adjacent windows allocated due to function (or sub-function) calls. For example, assume a function F1 is composed of two main loops: the first loop contains a call to another function F2, while the other loop contains no function calls. At execution time, register window i is allocated to F1. However, since the first loop calls F2, F2 is executed in the adjacent window i − 1. This leads to the use of two adjacent register windows that have a thermal diffusion impact on each other, so the overall thermal profile of the register file is worsened.

The multi-window context switching technique is proposed so that each called function is executed in a window that is not adjacent to the most recently used one; hence, the thermal diffusion between register windows is diminished in such scenarios. The proposed technique shifts from the working register window i to a new one, k, in case of a function call.

For a register file with N register windows, k is calculated from Equation 1:

$$k = \left(i - 3 + (N \bmod 2) + N\right) \bmod N \tag{1}$$

This new window reallocation allows the called function to use window i − 3 (modulo N) in case of an even number of register windows within the register file, and window i − 2 in case of an odd number. These specific windows are selected since they are the first windows that come in normal sequence after the adjacent window i − 1. However, if i − 2 is chosen for a register file with an even number of register windows, only half of the register file will be utilized. Besides that, selecting a different value for the next window might have a larger overhead with a similar outcome.

This technique improves the sequence in which register windows are deployed, such that the spatial distance between two consecutively used windows is increased, as well as the temporal separation between two physically adjacent windows. For example, the sequence of accesses to the windows of a register file with 8 windows would be 0−5−2−7−4−1−6−3−0, while the sequence for a register file with 9 windows would be 0−7−5−3−1−8−6−4−2−0.

The enhancements resulting from this technique have an overhead cost, since additional instructions are needed for such movement. The available instructions that shift the register window can only shift by a single window per instruction. Therefore, to shift by more than one window, the shifting instruction must be executed more than once. However, this overhead is found to be negligible, as will be shown in Section 4.1.
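Equation 1 and the statements around it (the visiting sequences, the half-utilization problem of a step of 2 with an even N, and the number of one-window shifts per call) can be checked with a short Python sketch. The function names are hypothetical, and `shifts_per_call` assumes, as stated above, that each window-shift instruction moves by exactly one window towards lower window numbers.

```python
def next_window(i: int, n: int) -> int:
    """Equation (1): window used by a function called from window i (n windows)."""
    return (i - 3 + (n % 2) + n) % n


def window_sequence(n: int, start: int = 0) -> list[int]:
    """Windows visited by a chain of nested calls until the start window recurs."""
    seq = [start]
    while True:
        seq.append(next_window(seq[-1], n))
        if seq[-1] == start:
            return seq


def windows_covered(n: int, step: int) -> int:
    """Distinct windows reached when every call moves `step` windows down."""
    return len({(-step * j) % n for j in range(n)})


def shifts_per_call(n: int) -> int:
    """One-window shift instructions needed to go from window i to window k."""
    return (0 - next_window(0, n)) % n


print(window_sequence(8))   # 8-window register file
print(window_sequence(9))   # 9-window register file
print(windows_covered(8, 2), shifts_per_call(8), shifts_per_call(9))
```

The sequences match the examples in the text, and a step of 2 with N = 8 cycles through only four of the eight windows. The same check shows that coverage depends on the greatest common divisor of the step and N: with N = 6, for instance, a step of 3 reaches only two windows.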

3.2 Basic-Block Code Splitter (BBCS)

Basic-Block Code Splitter (BBCS) aims to reduce the self-heating effect of a register window by allocating more than a single register window to the same function, regardless of the existence of sub-function calls in that function. This technique allows a procedure to use two register windows, i and i − 1, instead of just one. However, these windows are used sequentially, not simultaneously (i.e., a portion of the procedure is executed using register window i, and the rest is executed using i − 1).

This technique explores the whole procedure via its control flow graph (CFG). Starting from the entry basic block, the graph is explored in breadth-first fashion. For each block, the predecessor and successor blocks are identified and stored in different lists: the predecessor list and the successor list.

Each of the block's predecessors is checked against the predecessor list. If it is not there, its index is inserted into another list, the notfound list. After all predecessor blocks have been processed, the current block index is inserted into the predecessor list.

After that, each of the block's successors is checked against the notfound list. If it is found, its index is removed from that list; if not, its index is inserted into the successor list. The splitting condition is fulfilled when the notfound list is empty and the successor list contains exactly one block index. When splitting occurs, micro-code is injected to move the live registers to the new window, in addition to the context switching instruction.

This condition can be elaborated as follows: the blocks executed before the split should be dead (i.e., they will not be executed again) by the time the splitting point is reached. Equivalently, for a successful split, all the nodes of the control flow graph (CFG) executed so far should lead into the same basic block BB, with no dependency on a block that will be executed after BB.
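The list bookkeeping above is described only at a high level (for instance, the text does not say when an index leaves the successor list), so the Python sketch below checks the splitting condition in its set form, as characterized in the previous paragraph, rather than reproducing the authors' exact lists. Walking the CFG breadth-first, a block BB is reported as a candidate split point when every edge leaving the blocks visited so far goes to BB and no unvisited block can jump back into them. The CFG encoding (a dict from block name to successor names), the function names, and the example graph are assumptions for illustration.

```python
from collections import deque


def bfs_order(cfg, entry):
    """Basic blocks reachable from `entry`, in breadth-first order."""
    order, seen, queue = [], {entry}, deque([entry])
    while queue:
        block = queue.popleft()
        order.append(block)
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def split_points(cfg, entry):
    """Blocks BB at which every block visited before BB is dead (BBCS condition)."""
    preds = {}
    for block, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, set()).add(block)

    order = bfs_order(cfg, entry)
    points = []
    for idx in range(1, len(order)):
        done = set(order[:idx])          # blocks executed before the candidate BB
        candidate = order[idx]
        # Successors of executed blocks that are still pending: must be exactly {BB}.
        frontier = {s for b in done for s in cfg.get(b, []) if s not in done}
        # Unvisited blocks that jump back into the executed region (e.g. a loop
        # latch): if any exist, an executed block could run again after BB.
        back_sources = {p for b in done for p in preds.get(b, ()) if p not in done}
        if frontier == {candidate} and not back_sources:
            points.append(candidate)
    return points


# Hypothetical CFG: B1/B2 form a loop, B3 branches to B4/B5, which join at B6.
cfg = {
    "B0": ["B1"],
    "B1": ["B2"],
    "B2": ["B1", "B3"],
    "B3": ["B4", "B5"],
    "B4": ["B6"],
    "B5": ["B6"],
    "B6": [],
}
print(split_points(cfg, "B0"))
```

With this CFG the sketch reports B1, B3 and B6: B2 is rejected because the latch B2 can still jump back to B1, and B4 and B5 are rejected because two paths are still open. In BBCS, each candidate would then be checked against the live-register threshold TH described next.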

When the splitting condition is satisfied, the compiler counts the number of input live registers that should be available in the new window, named $N_{\text{liveR}}$. If $N_{\text{liveR}}$ does not exceed a certain threshold ($TH$), the splitting occurs. If not, the algorithm continues looking for another splitting point.

The threshold is related to the number of output registers of the register file, $N_{\text{out}}$, and the number of instructions remaining after the potential splitting block, $N_{\text{rem}}$. $TH$ can be calculated using Equation 2:

$$TH = \begin{cases} 0.05\,N_{\text{rem}} & \text{when } 0.05\,N_{\text{rem}} \le N_{\text{out}} \\ N_{\text{out}} & \text{otherwise} \end{cases} \tag{2}$$

This equation can be interpreted as follows: the instruction overhead resulting from moving the live registers from the old window to the new one should not exceed 10% of the number of instructions remaining until the end of the procedure, $N_{\text{rem}}$. This limit on the instruction overhead is program independent and was set to this value to minimize the overhead introduced by context switching, from both a thermal and a code-size point of view. Moreover, the number of moved registers is limited by the number of output registers in the window, $N_{\text{out}}$, since this is an architectural limitation and it would not be efficient to use memory to move the live registers to the new window.

The overhead resulting from moving one live register is 2 instructions: one instruction is required for moving the live […] after switching to move the input register to its proper location. Therefore, for $N_{\text{liveR}} \le N_{\text{out}}$ registers, the overhead $OV$ equals:

$$OV = 2N_{\text{liveR}} \le 2N_{\text{out}} \tag{3}$$

Since this overhead cannot exceed 10% of the remaining instructions:

$$OV \le 0.1\,N_{\text{rem}} \tag{4}$$

Substituting Equation 3 into Equation 4:

$$2N_{\text{liveR}} \le 0.1\,N_{\text{rem}} \tag{5}$$

Therefore:

$$N_{\text{liveR}} \le 0.05\,N_{\text{rem}} \tag{6}$$

And since

$$N_{\text{liveR}} \le N_{\text{out}} \tag{7}$$

it follows that

$$N_{\text{liveR}} \le TH \tag{8}$$

where $TH$ is the value defined in Equation 2. For example, if there are 7 output registers and the number of remaining instructions is at least 140, then the threshold is 7. However, if the number of remaining instructions is 100, then the threshold is 5, and it decreases further for shorter remainders.
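Equation 2 and the derivation leading to Equation 8 can be checked numerically. In this hypothetical Python sketch, `split_allowed` applies the two constraints directly (two instructions per moved register, at most 10% of the remaining instructions, and no more live registers than output registers) using integer arithmetic to avoid rounding, while `split_threshold` evaluates Equation 2 as the minimum of its two branches.

```python
def split_threshold(n_out: int, n_rem: int) -> float:
    """Equation (2): TH = 0.05 * N_rem, capped at N_out."""
    return min(n_rem / 20, n_out)


def split_allowed(n_live: int, n_out: int, n_rem: int) -> bool:
    """Equations (3)-(7) checked directly, in integer arithmetic."""
    overhead = 2 * n_live                      # Equation (3): two instructions per register
    within_budget = 10 * overhead <= n_rem     # Equation (4): OV <= 0.1 * N_rem
    return n_live <= n_out and within_budget   # Equation (7): N_liveR <= N_out


print(split_threshold(7, 140), split_threshold(7, 100))
print(split_allowed(5, 7, 100), split_allowed(6, 7, 100))
```

For 7 output registers this reproduces the example in the text: a threshold of 7 once at least 140 instructions remain, and 5 with 100 remaining instructions. Checking `split_allowed` against `n_live <= split_threshold(...)` over a range of inputs confirms that Equation 8 is equivalent to the two original constraints.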

3.3 DIST_MAX

DIST_MAX aims to reduce the thermal diffusion effect between registers within the same register window in a register window-based architecture, or within any register file in other architectures. This technique groups the registers into several classes. The registers used in a function or procedure are classified based on an estimate of the number of accesses to each register: heavy use, medium use, low use, zero, and system. The zero class includes the registers with no accesses (i.e., unused registers), while the system class contains registers used by the system that cannot be reallocated, such as the stack pointer and the registers used for passing parameters.

Register access statistics are evaluated by the compiler and used to classify registers based on a standard deviation analysis. Assume that K registers are used, and that each register $R_i$ has $N_i$ accesses. The mean number of accesses, $M$, is defined as:

$$M = \frac{1}{K} \sum_{i=1}^{K} N_i \tag{9}$$

and the standard deviation of the number of accesses, $\sigma$, is calculated as:

$$\sigma = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left(N_i - M\right)^2} \tag{10}$$

Using these values, the registers are classified into the mentioned groups, where:

1. Heavy used registers have a number of accesses $\ge M$.

2. Medium used registers have a number of accesses $\ge M - \sigma$ and $< M$.

3. Low used registers have a number of accesses $< M - \sigma$.
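The classification step can be sketched in Python as follows. Two details are assumptions not fixed by the text: M and σ are computed over the K used registers only (at least one access and not a system register), and σ uses the 1/K (population) normalization. Function and register names are hypothetical.

```python
from statistics import mean, pstdev


def classify_registers(accesses, system_regs=frozenset()):
    """DIST_MAX classes from per-register access counts (Equations 9 and 10).

    Assumes at least one used register; M and sigma are taken over registers
    with a nonzero access count that are not system registers.
    """
    used = [n for reg, n in accesses.items() if reg not in system_regs and n > 0]
    m = mean(used)        # Equation (9)
    sigma = pstdev(used)  # Equation (10), population standard deviation
    classes = {}
    for reg, n in accesses.items():
        if reg in system_regs:
            classes[reg] = "system"
        elif n == 0:
            classes[reg] = "zero"
        elif n >= m:
            classes[reg] = "heavy"
        elif n >= m - sigma:
            classes[reg] = "medium"
        else:
            classes[reg] = "low"
    return classes


accesses = {"r1": 100, "r2": 90, "r3": 50, "r4": 40, "r5": 10, "r6": 0, "sp": 300}
print(classify_registers(accesses, system_regs={"sp"}))
```

Here M = 58 and σ ≈ 33.1, so r1 and r2 are heavy, r3 and r4 are medium (at least M − σ ≈ 24.9), r5 is low, r6 is zero, and sp is a system register.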

After classifying the registers, DIST_MAX reallocates them as follows:

1. Heavy used registers are placed at the maximum distance from each other.

2. Each of these registers is surrounded by zero or low used registers.

3. Medium used registers are reallocated as a second or third surrounding layer around the heavy used ones.

4. The remaining zero and low used registers are placed in the locations that have not yet been reallocated.

This reallocation mechanism clearly depends directly on the physical layout of the register file: the same set of registers used in a program might have different reallocation maps depending on the layout of the register file. Generally, the layout of the register file can be viewed as a 2D mesh [21]. Under this assumption, the maximum-distance objective could be achieved with numerous mapping techniques. In this paper, a simple (yet effective) method has been applied, where a register is identified by the row and column it belongs to. First, heavy used registers are reallocated such that the target locations lie in different rows and columns. Then, the reallocation continues as described above. This method would reallocate the registers from their preallocated positions in Figure 2(a) to the new positions shown in Figure 2(b). This distribution guarantees that each row contains at most a single heavy used register, while the numbers of medium used, low used, and zero registers depend on the number of registers used in each program.

Z L Z Z

M M M Z

H H M Z

L L M Sys

(a)

M L

H

Z

Z M Z M

H

Z M L

L M Z Sys

(b)

Figure 2: Register reallocation map using DIST MAX.

DIST_MAX can reallocate any register, provided there are no restrictions on accessing it. However, some registers cannot be reallocated. For instance, the Sys register in Figure 2(a) is a system register (similar to a stack pointer) and cannot be reallocated to any other location, because doing so could affect system execution and could lead to misuse of memory contents through a wrong stack address.
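The four placement rules and the row/column criterion can be sketched as a greedy procedure on a rows × cols mesh. This is not the authors' implementation: the farthest-point choice for heavy registers, the Manhattan distance, the 4-neighbour notion of "surrounding", and all names are assumptions made for illustration. Registers that cannot move (such as Sys in Figure 2) are passed in `fixed` together with their positions.

```python
from itertools import product


def manhattan(a, b):
    """Manhattan distance between two (row, col) cells of the register mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def dist_max_layout(classes, rows, cols, fixed):
    """Greedy DIST_MAX-style placement of registers on a rows x cols mesh.

    classes: register -> "heavy" | "medium" | "low" | "zero" | "system"
    fixed:   register -> (row, col) for registers that must not move; every
             system register is expected to appear here.
    """
    layout = dict(fixed)
    taken = set(fixed.values())
    free = [cell for cell in product(range(rows), range(cols)) if cell not in taken]

    def of_class(name):
        return sorted(r for r, c in classes.items() if c == name and r not in fixed)

    # Rule 1: heavy registers in distinct rows and columns, each placed as far as
    # possible from the heavy registers already placed (greedy farthest point).
    heavy_cells = []
    for reg in of_class("heavy"):
        rows_used = {r for r, _ in heavy_cells}
        cols_used = {c for _, c in heavy_cells}
        options = [x for x in free if x[0] not in rows_used and x[1] not in cols_used] or free
        best = max(options, key=lambda x: min((manhattan(x, h) for h in heavy_cells), default=0))
        layout[reg] = best
        heavy_cells.append(best)
        free.remove(best)

    def nearest_heavy(cell):
        return min((manhattan(cell, h) for h in heavy_cells), default=0)

    # Rule 2: fill the cells touching a heavy register with zero/low registers.
    cold = of_class("zero") + of_class("low")
    for cell in [x for x in free if nearest_heavy(x) == 1]:
        if not cold:
            break
        layout[cold.pop(0)] = cell
        free.remove(cell)

    # Rules 3 and 4: medium registers take the next-closest layers around the heavy
    # ones; the remaining zero/low registers fill whatever cells are left.
    free.sort(key=nearest_heavy)
    for reg in of_class("medium") + cold:
        layout[reg] = free.pop(0)
    return layout


# Register mix of Figure 2(a); the register names are hypothetical.
classes = {"sys": "system", "h0": "heavy", "h1": "heavy"}
classes.update({f"m{i}": "medium" for i in range(5)})
classes.update({f"l{i}": "low" for i in range(3)})
classes.update({f"z{i}": "zero" for i in range(5)})
layout = dist_max_layout(classes, rows=4, cols=4, fixed={"sys": (3, 3)})
print(layout)
```

On the register mix of Figure 2(a) (2 heavy, 5 medium, 3 low and 5 zero registers, with Sys fixed in a corner), the sketch produces a map with the same properties as Figure 2(b), namely heavy registers in distinct rows and columns whose neighbours are only zero, low, or system registers, although not necessarily the identical arrangement.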

4. CASE STUDY: SPARC V8

SPARC V8 architecture has been selected as one of the […] such architecture are AMD 29k and Intel i960). SPARC V8 is a 32-bit RISC machine with separate integer and floating-point register files. Register windows are found only in the integer register file, while the floating-point register file is a single-window register file of 32 registers.

In the SPARC architecture, the instruction format only allows the assignment of registers within the same window, i.e., within the same instruction, the source(s) and destination registers must belong to the same register window. Moreover, the registers within a single window are classified into global, output, local, and input registers. The input and output registers are used for passing and returning parameters in case of a function call. Thus, these registers cannot be reallocated because, in case of a function call, wrong parameters could be passed or returned.

In those cases, DIST_MAX has very little chance of achieving a major enhancement in the thermal profile. This is illustrated in Figure 3, which shows the location of the registers within a register file, assuming a 2D layout with 8 registers per row. Also, the specifications of SPARC V8 [22] show that there are many limitations on the reallocation of registers. Even within the same window, various constraints work against the reassignment of registers:

1. The stack pointer (o6) and the frame pointer (i6) cannot be reallocated to other registers.

2. Register i7 is used to save the return address for a called procedure.

3. Register o7 contains the address of the calling instruction.

4. The output registers used for passing parameters cannot be reallocated (their number varies from a single register to all the existing 5 output registers).

5. The input registers containing the incoming parameters cannot be reallocated.

6. The global registers contain global variables used within more than one procedure. Thus, they cannot be interchanged with local, input, or output registers.

[Figure 3 schematic: register windows i−1, i+1 and i+2 laid out in rows of 8 registers, with the locations labeled "sp (o6) / fp (i6)" and "o7 / i7" marked in each window]

Figure 3: Schematic diagram showing the register file of the SPARC V8. Black registers are irreplaceable; gray registers are replaceable within the same group.

These constraints limit the benefit of DIST_MAX when reallocating registers within the same window. Fortunately, multi-window context switching and BBCS are not affected by these constraints. Hence, the percentage of hotspots and the peak temperature are still reduced, as shown in the simulation results.
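To make the SPARC V8 constraints concrete, the Python sketch below lists, for one window, which registers a DIST_MAX-style pass could still permute, and in which group. It is a hypothetical illustration: it assumes parameters occupy the lowest-numbered output and input registers (o0 upwards, i0 upwards), treats g1 to g7 as permutable among themselves by reading constraint 6 literally (the text does not say whether globals are moved at all), and additionally fixes g0, which is hardwired to zero in SPARC V8.

```python
def reallocatable_groups(n_out_params: int, n_in_params: int) -> dict[str, list[str]]:
    """Registers of one SPARC V8 window that could still be permuted, per group.

    Swaps are only allowed inside a group (g, o, l, i). Fixed registers: g0
    (hardwired zero), sp (o6), fp (i6), o7, i7, and the output/input registers
    that carry parameters (assumed to be o0.. and i0.. in order).
    """
    fixed = {"g0", "o6", "o7", "i6", "i7"}
    fixed |= {f"o{k}" for k in range(n_out_params)}
    fixed |= {f"i{k}" for k in range(n_in_params)}
    return {
        group: [f"{group}{k}" for k in range(8) if f"{group}{k}" not in fixed]
        for group in "goli"
    }


print(reallocatable_groups(n_out_params=2, n_in_params=3))
```

With two outgoing and three incoming parameters, only four output and three input registers remain movable, which illustrates why DIST_MAX gains little on SPARC while the window-level techniques are unaffected.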

4.1 Simulation results

The experimental work has been performed using the HW-SW emulation platform presented in [23]. This platform is used to extract the power traces corresponding to the execution of the application. This emulation environment allows implementing the core of the SPARC architecture and extracting the required thermal statistics, such as the profile of accesses to the register file.

The proposed compilation techniques have been embedded in the professional CoSy compilation framework provided by ACE [9]. All the results have been obtained assuming a hotspot threshold of 51 °C.

The deployed SPARC processor contained an 8-window register file with 136 registers in total: 8 global registers and 16 registers per window [22]. […] suite have been used to evaluate the performance of the proposed flow.

Figure 4 shows the rate change of hotspots raised during the execution of the MPEG2 decoding benchmark. This figure shows the execution using the default compilation flow, the combined BBCS and DIST_MAX, multi-window context switching, and the complete proposed compilation flow. Although combining all the techniques reduced the rate of hotspots significantly, there is no significant improvement when each technique is used separately. This can be explained by the limitations of the window-based register file, as previously discussed. Multi-window context switching spreads the usage of windows, but it does not modify the behavior of register-file accesses within a single window. However, it allows BBCS to make use of multiple windows for the same procedure. Thus, the overall number of accesses to each register is diminished, which results in a significant reduction of the hotspots.

Figure 4: Rate change of hotspots in the execution of the MPEG2 benchmark using various compilation techniques.

Besides the reduction of hotspots, the proposed flow also succeeded in balancing the thermal profile of the register file by reducing the thermal gradients. Figure 5 shows the rate change of the thermal gradient, computed as the difference between the maximum and minimum temperatures found on the chip surface per unit area. This figure shows that the thermal gradient is lowered by 38%, which means that the variation of temperature within the register file is reduced. This observation, along with the reduction of hotspots, implies a more uniform distribution of temperature within the register file.

Figure 5: Rate change of the thermal gradient of the register file with MPEG.
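The metrics used in this section can be computed as in the Python sketch below, under stated assumptions. The text does not define exactly what is counted for the "percentage of hotspots", so `hotspot_percentage` assumes it is the share of sampled register temperatures strictly above the threshold; `thermal_gradient` follows the stated definition (maximum minus minimum temperature, per unit area); and `rate_change` is the relative change behind numbers such as the 38% gradient reduction. All names and input values are hypothetical.

```python
def hotspot_percentage(samples, threshold_c):
    """Percentage of temperature samples strictly above the hotspot threshold (°C)."""
    if not samples:
        raise ValueError("no temperature samples")
    hot = sum(1 for t in samples if t > threshold_c)
    return 100.0 * hot / len(samples)


def thermal_gradient(temps, area):
    """(T_max - T_min) divided by the area over which the temperatures were sampled."""
    return (max(temps) - min(temps)) / area


def rate_change(before, after):
    """Relative change in percent; a negative result is a reduction."""
    return 100.0 * (after - before) / before


# Hypothetical register-file temperatures (°C), before and after recompilation.
threshold = 51.0  # hypothetical hotspot threshold (°C)
baseline = [49.0, 52.5, 53.0, 50.0]
optimized = [49.5, 50.5, 51.5, 50.0]
print(hotspot_percentage(baseline, threshold), hotspot_percentage(optimized, threshold))
print(rate_change(thermal_gradient(baseline, 1.0), thermal_gradient(optimized, 1.0)))
```

In this made-up example, half of the baseline samples are hotspots versus a quarter after recompilation, and the gradient halves, i.e. a rate change of −50%.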

The proposed compilation flow achieves a significant reduction in both the percentage of hotspots and the peak temperature, as shown in Figures 6 and 7, respectively. It can be noticed that, although the peak temperature of both G711 encode and G711 decode is reduced to values similar to those exhibited by the other applications, the percentage of hotspots is not much diminished. This indicates that the hotspots were found in a single window of the register file, and that neighboring windows had no appreciable impact. On average, the percentage of hotspots is reduced by 73% with respect to the original compilation, and the peak temperature is reduced by 8% with respect to the original peak temperature. The maximum reductions in the percentage of hotspots and the peak temperature reached 91% and 11%, respectively, with the MPEG benchmark.

The proposed techniques have a small impact on code size, which was analyzed in the experimental setup. Table 1 shows the code size of the benchmarks used and the increase due to the extra instructions included by the proposed compilation techniques. The increase in code size, as seen in the table, can be considered negligible because it did not exceed 0.2%. Since these instructions do not access memory, there is no overhead in the dynamic memory size. It is also worth noticing that the proposed technique introduced a […]


References

Design challenges of technology scaling

Techniques for Multicore Thermal Management: Classification and New Exploration

Standby and Active Leakage Current Control and Minimization in CMOS VLSI Circuits

The SPARC architecture manual: version 8

