Thermal-aware compilation for system-on-chip processing architectures

Mohamed M. Sabry
Embedded Systems
Laboratory, EPFL
EPFL-STI-IEL-ESL
1015 Lausanne, Switzerland
mohamed.sabry@epfl.ch
José L. Ayala
DACYA
Complutense University of
Madrid
28040 Madrid, Spain
jayala@fdi.ucm.es
David Atienza
Embedded Systems
Laboratory, EPFL
EPFL-STI-IEL-ESL
1015 Lausanne, Switzerland
david.atienza@epfl.ch
ABSTRACT
The development of compiler-based mechanisms to reduce the
percentage of hotspots and optimize the thermal profile of
large register files has become an important issue. Thermal
hotspots have been known to cause severe reliability issues,
while the thermal profile of the devices is also related to the
leakage power consumption and the cooling cost. In this paper
we propose several compilation techniques that, based on an
efficient register allocation mechanism, reduce the percentage
of hotspots in the register file and uniformly distribute the
heat. As a result, the thermal profile and reliability of the
device are clearly improved. Simulation results show that the
proposed flow achieves up to a 91% reduction of hotspots and
up to an 11% reduction of the peak temperature.
Categories and Subject Descriptors: D.3.4 Programming
Languages: Processors [Compilers]; B.8 Performance and
Reliability.
General Terms: Algorithms, Management, Reliability.
Keywords: Thermal-aware, Compiler, Register-file.
1. INTRODUCTION
Temperature dissipation is an important factor in the per-
formance and reliability of embedded systems. With the ad-
vent of new technologies and scaling design parameters, ther-
mal issues have emerged as one of the key design parameters
that need to be addressed.
Thermal dissipation in integrated circuits has a negative
effect on multiple aspects. First, leakage current (one
of the main sources of power dissipation in today's sub-micron
technologies) presents an exponential dependency on
temperature [1]. Second, temperature has a direct impact on the
reliability of the system, because several processes are driven
by the increase of temperature or the spatial and temporal
gradients that appear during normal functioning. Tempera-
tures over a threshold in localized areas of the chip (hotspots)
can produce timing delay variations, transient reduction in
overall system performance, or even permanent damages in
the devices [2]. Moreover, the reliability factors do not only
depend on the average temperature of the chip, but the spatial
and temporal variations have a strong influence on phenomena
like electromigration, negative bias temperature instability,
or thermal cycles [3]. It has been shown how the Mean Time
Between Failure (MTBF) of an IC is divided by 10 for every
300 °C rise in the junction temperature [4]. Finally, system
performance is strongly determined by the temperature. With
increasing temperature, phonon concentration increases and
causes increased scattering. Thus the carrier mobility due to
lattice scattering decreases.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GLSVLSI’10, May 16–18, 2010, Providence, Rhode Island, USA.
Copyright 2010 ACM 978-1-4503-0012-4/10/06 ...$10.00.
These facts explain the strong efforts currently being made
in the area of thermal optimization of electronic circuits.
Some of these efforts rely on expensive heat dissipaters
and sinks that improve the thermal dissipation but increase
the cost per chip by more than $1/W [5]. Other research
works tackle the thermal problem at different levels of
abstraction. Computer architects develop thermally efficient
processor architectures that optimize the thermal behavior
by proposing smart ways of sharing the computer resources [6].
Temperature also depends on the placement of the units in
the chip. Placing units with heavy power consumption close
together will intuitively generate an even hotter area in the
chip, as temperature is additive in nature. In contrast,
placing such units close to units that have a moderate power
consumption allows the generated heat to dissipate through
these units. Therefore, thermal-aware floorplanning is an
intense area of research [7]. Finally, software can control
the thermal profile of many processor-based systems through
the careful ordering of task execution, the assignment of
resources, and the code generation phase. In this area,
compilers can play an important role.
Due to its high utilization and relatively small area, the reg-
ister file has been shown to have the highest peak temperature
in several studies [8]. Reducing the register file power density
spots would lead to reduction in peak temperatures for both
the entire chip and the register file, which in turn would result
in improved reliability and reduced leakage power.
The thermal response of the register file is clearly deter-
mined by the assignment of registers to the variables defined
in the source code, as well as by the profile of accesses to this
device. Both parameters can be controlled by the compiler
from a software perspective and will lead to the definition
of our optimization policies. These techniques should be con-
ceived with a minimal impact on code size and execution time
of the application.
This paper proposes a thermal-aware compilation flow that
is embedded in a state-of-the-art compiler. The proposed
flow introduces a thermal-aware compiler register reallocation
based on application-specific information regarding register
accesses and frequency of execution, as well as the control
flow graph (CFG) of such application. This technique is
shown to be an effective stand-alone mechanism for temperature
optimization in the microprocessor architecture.
The main contributions of this paper are the following:
• Analysis of the thermal effects that current register
  allocators and performance-oriented compiler optimizations
  have on the register file of register window-based
  architectures.
• Development of compiler techniques that improve the
  thermal profile of the register file in terms of mean and
  peak temperature, as well as percentage of hotspots.
• Integration of the proposed mechanisms in the CoSy
  compilation flow [9], a retargetable compiler for the
  generation of high-quality compiled code.
Simulation results show a significant enhancement: hotspots
are reduced by 73% on average, as will be shown
in Section 4.1.
2. RELATED WORK
In recent years, there has been intense work at the compiler
level on power-aware scheduling for VLIW processors,
which proposes turning off unused units to save leakage
power [10, 11, 12]. Some of these works [13] explicitly target
the register file for energy saving. However, these approaches
do not consider temperature as the metric to be optimized
and, therefore, the thermal profile is not optimal. Some of
the first static approaches to thermal optimization are found
in [14, 15], where load balancing heuristics and high-level
synthesis techniques are considered.
In Narayanan et al. [16], several techniques are proposed
to minimize the thermal emergencies in NoC-based systems
through compiler-directed power density reduction. Also, sev-
eral thermal managing techniques for multicore architectures
are explored in Donald and Martonosi [17], and Patel et al. [18]
where register temperature reduction through register file and
register duplication are investigated.
Like ours, the very recent work by Zhou et al. [19] proposes
a register reallocation algorithm for power-density minimization
in the register file. However, they target a few specific
cases (high-power density registers) in VLIW architectures.
VLIW architectures have been also considered in [20], where
a thermal-aware instruction generation algorithm is proposed.
Our proposed work differs from previous static approaches
in: first, the minimization of several thermal- and
reliability-related metrics like the mean and peak temperature
of the register file, as well as the percentage of hotspots,
with a negligible penalty; second, the development of
techniques that cope with the limitations exhibited by register
files with register windows; and finally, the integration in a
high-quality industrial compilation flow.
3. THERMAL-AWARE REGISTER ASSIGN-
MENT FLOW
Registers in the register file can exhibit a high thermal
profile due to:
1. High frequency of accesses (self-heating effect).
2. Proximity of hot registers (mutual diffusion effect).
Based on the thermal profiles collected for different
benchmarks, the following observations were made:
1. The thermal response of a register will reach a steady
state if the frequency of access to such register exceeds
a certain threshold, provided the adjacent registers are
not allocated. This observation implies that it is more
thermally efficient to assign the same register to two
variables (if and only if the frequency of accesses to one of
these two variables is above the mentioned threshold) than to
allocate two different registers.
2. The thermal profile of the device is improved when the
registers are assigned from spread spots of the register
file, reducing in this way the mutual diffusion effect.
Based on these observations, register reallocation policies
should be designed to minimize the number of assigned
registers, as well as to select physically nonadjacent
registers. However, if the application exhibits a high register
pressure, both constraints cannot be satisfied jointly.
Thus, a new thermal-aware compilation flow is proposed to
minimize the high thermal profile of the register file. Such
flow is designed to minimize the mutual diffusion, as well as
the self-heating effect. The proposed flow is applicable to
various processing architectures, but it mainly targets register
window-based or register bank-based processing architectures.
As shown in Figure 1, the proposed flow is integrated with
the whole compilation process before the code generation or
code emission process, where the pseudo-code generated in
the previous phases is translated to target code. This flow is
divided into two stages that appear only in register
window-based architectures: Multi-window context switching and
BBCS. Then, a third stage that can be applied to any processing
architecture, DIST_MAX, takes place.
[Flowchart: Start compilation process → Scanning, parsing,
code analysis and CFG generation → Register window-based
architecture? (Yes: Multi-window context switching → BBCS;
No: skip both) → DIST_MAX → Code generation → End compilation]
Figure 1: Proposed thermal-aware compilation flow.
Each one of these techniques is deployed to minimize the
thermal profile of the register file, but with a different per-
spective, as will be shown in the following subsections.
3.1 Multi-window context switching
The Multi-window context switching technique aims to reduce
the mutual thermal diffusion between two adjacent windows
allocated due to function (or sub-function) calls. For
example, assume a function F1 is composed of two main
loops: the first loop contains a call to another function F2,
while the other loop contains no function calls. In execution,
register window i will be allocated to F1. However, since
the first loop contains a call to F2, F2 will be executed
in the adjacent window i − 1. This will lead to the
usage of two adjacent register windows that will have a thermal
diffusion impact on each other. Thus, the overall thermal
profile of the register file will be worsened.
The multi-window context switching technique is proposed
such that each called function is executed in a window that is
not adjacent to the most recently used one; hence, the thermal
diffusion between register windows is diminished in such a
scenario. The proposed technique shifts from the working
register window i to a new one, k, in case of a function call.
For a register file having N register windows, k is calculated
from Equation 1:
$$k = (i - 3 + (N \bmod 2) + N) \bmod N \qquad (1)$$
This new window reallocation allows the called function to use
window i − 3 in case of an even number of register windows
within the register file, and window i − 2 in case of an odd
number. These specific windows are selected since they are
the first windows that come in normal sequence after the
adjacent window, i − 1. However, if i − 2 is chosen for a
register file with an even number of register windows, only
half of the register file will be utilized. Besides that,
selecting a different value for the next window instead of the
chosen values might have a larger overhead impact with only
a similar performance outcome.
This technique improves the sequence in which register
windows are deployed, such that the spatial distance between
two consecutively used windows is increased, as well as the
temporal separation between two physically adjacent windows.
For example, the sequence of accesses to the register windows
of a register file with 8 windows would be
0 → 5 → 2 → 7 → 4 → 1 → 6 → 3 → 0, while the sequence for a
register file with 9 windows would be
0 → 7 → 5 → 3 → 1 → 8 → 6 → 4 → 2 → 0.
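As an illustration, Equation 1 can be evaluated directly to reproduce the two window sequences quoted above. The following is only a minimal sketch: the function names `next_window` and `window_sequence` are hypothetical and do not come from the paper or from the CoSy framework.

```python
def next_window(i, n):
    """Equation 1: window k used by a callee when the caller runs in window i.

    n is the number of register windows in the register file.
    """
    return (i - 3 + (n % 2) + n) % n


def window_sequence(n, start=0):
    """Follow Equation 1 from `start` until the starting window is reached again."""
    seq = [start]
    w = next_window(start, n)
    while w != start:
        seq.append(w)
        w = next_window(w, n)
    seq.append(start)
    return seq


if __name__ == "__main__":
    print(window_sequence(8))  # even number of windows: step of i - 3
    print(window_sequence(9))  # odd number of windows: step of i - 2
```

With 8 and 9 windows the traversal visits every window once before returning to window 0, which matches the sequences given in the text.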
The enhancements resulting from applying this technique
have an overhead cost, since additional instructions are
needed for such movement. The available instructions¹ that
shift the register window can only manage one single window
per instruction. Therefore, to shift more than one window,
the shifting instruction must be executed more than once.
However, this overhead is found to be negligible, as will be
shown in Section 4.1.
3.2 Basic-Block Code Splitter (BBCS)
Basic-Block Code Splitter (or BBCS) aims to reduce the
self-heating effect of a register window by allocating more
than a single register window to the same function, regardless
of the existence of sub-function calls in such function. This
technique allows a procedure to use two register windows,
i and i − 1, instead of just one window. However, these windows
are used sequentially, not simultaneously (i.e., a portion of the
procedure will be executed using register window i, and the
rest will be executed using i − 1).
This technique explores the whole procedure via its control
flow graph (CFG). From the entry basic block, the graph is
explored in a breadth-first fashion. For each block, the
predecessor and the successor blocks are identified and stored
in different lists: predecessor list and successor list.
Each block in the predecessor blocks is checked to be in
predecessor list. If it is not, such block index is inserted in
another list, notfound list. After finishing the processing of
predecessor blocks, the current block index is inserted in
predecessor list.
¹ In SPARC-like architectures.
After that, each block in the successor blocks is checked
to be in notfound list. If it is found, its index is removed
from that list. If it is not, such block index is inserted in
successor list. The splitting condition is fulfilled when there
are no block indices in notfound list and there is only one
block index in successor list. When splitting occurs, a micro-
code is injected to move the live registers to the new window,
in addition to the context switching instruction.
Such condition could be elaborated as follows: the blocks
executed before the splitting should be dead (i.e. they will not
be executed again) by the time the splitting point is reached.
This condition could also be rephrased as follows: in order to
make a successful splitting, all the nodes in the control flow
graph (CFG) should lead into the same basic block BB with
no dependency on a block that will be executed after BB.
When the splitting condition is satisfied, the compiler counts
the number of input live registers that should be available at
the new window, named $N_{liveR}$. If $N_{liveR}$ is lower than a
certain threshold ($TH$), then the splitting occurs. If not, the
algorithm will continue looking for another splitting point.
The mentioned threshold is related to the number of output
registers of the register file, $N_{OR}$, and the number of
instructions remaining after the potential splitting block,
$N_{iB}$. $TH$ can be calculated using Equation 2.
$$TH = \begin{cases} 0.05\,N_{iB} & \text{when } 0.05\,N_{iB} \le N_{OR} \\ N_{OR} & \text{otherwise} \end{cases} \qquad (2)$$
This equation can be interpreted as follows: the instruction
overhead resulting from moving the live registers from the old
window to the new one should not exceed 10% of the number of
instructions remaining until the end of the procedure, $N_{iB}$.
This limit on the instruction overhead is program independent
and is assumed with this value to diminish the overhead
introduced by context switching from both the thermal and the
code size points of view. Moreover, the instruction overhead is
limited by the available number of output registers in the
window, $N_{OR}$, since this is an architecture-based limitation
and it would not be efficient to use the memory to move the
live registers to the new window.
The overhead resulting from moving one live register is 2
instructions: one instruction is required for moving the live
register to an output register, while another is executed after
switching to move the input register to its proper location.
Therefore, for $N \le N_{OR}$ registers, the overhead $OV$ equals:
$$OV = 2N \le 2N_{OR} \qquad (3)$$
And since this overhead cannot exceed 10% of the remaining
instructions, therefore:
$$OV \le 0.1\,N_{iB} \qquad (4)$$
By substituting the value from Equation 3 into Equation 4:
$$2N \le 0.1\,N_{iB} \qquad (5)$$
$$\text{Therefore } N \le 0.05\,N_{iB} \qquad (6)$$
$$\text{And since } N \le N_{OR} \qquad (7)$$
$$\text{Therefore } N \le TH \qquad (8)$$
Where $TH$ is the value defined in Equation 2. For example,
if there are 7 output registers and the remaining number of
instructions is greater than 140, then the threshold is 7.
However, if only 100 instructions remain, then the
threshold is 5.
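The threshold of Equation 2 and the overhead bound of Equations 3 to 8 can be checked with a few lines of code. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the 5% bound is rounded down to a whole number of registers, and the split test uses the non-strict inequality of Equation 8.

```python
def split_threshold(n_ib, n_or):
    """Equation 2: TH = min(0.05 * N_iB, N_OR).

    n_ib: instructions remaining after the candidate splitting block.
    n_or: number of output registers available in a window.
    Assumption: the 5% bound is floored to an integer register count.
    """
    return min(n_ib * 5 // 100, n_or)


def move_overhead(n_live):
    """Equation 3: two instructions per live register moved to the new window."""
    return 2 * n_live


def should_split(n_live, n_ib, n_or):
    """Equation 8: split only if the live registers fit under the threshold."""
    return n_live <= split_threshold(n_ib, n_or)


if __name__ == "__main__":
    print(split_threshold(200, 7))  # more than 140 remaining instructions
    print(split_threshold(100, 7))  # exactly 100 remaining instructions
    print(should_split(6, 100, 7), move_overhead(6))
```

For 7 output registers this reproduces the thresholds of the example above: 7 when more than 140 instructions remain, and 5 when 100 remain.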
3.3 DIST_MAX
DIST_MAX aims to reduce the thermal diffusion effect between
registers within the same register window in a register
window-based architecture, or within any register file in other
architectures. This technique groups the registers into several
classes. The registers used in a function or a procedure are
classified into these classes based on an estimation of the
number of accesses to such registers. They are classified
as heavy use, medium use, low use, zeros, and system. The zeros
class includes the registers with zero accesses (i.e.,
unused registers), while the system class contains registers
used by the system that cannot be reallocated, such as the stack
pointer and the registers used in passing parameters.
Statistics of register accesses are evaluated by the compiler
and used to classify registers based on standard deviation
analysis. Assume that $K$ registers are used, and that each
register $R_i$ has a number of accesses $N_{R_i}$. The mean of
the number of accesses, $M_R$, is defined as:
$$M_R = \frac{\sum_{i=1}^{K} N_{R_i}}{K} \qquad (9)$$
And the standard deviation of the number of accesses, $\sigma_R$,
can be calculated as follows:
$$\sigma_R = \sqrt{\frac{\sum_{i=1}^{K} \left(N_{R_i} - M_R\right)^2}{K}} \qquad (10)$$
Using these values, the registers are classified into the
mentioned groups, where:
1. Heavy used registers have a number of accesses
$\ge M_R + \frac{\sigma_R}{2}$.
2. Medium used registers have a number of accesses
$\ge M_R - \frac{\sigma_R}{2}$ and $< M_R + \frac{\sigma_R}{2}$.
3. Low used registers have a number of accesses
$< M_R - \frac{\sigma_R}{2}$.
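The classification rule can be sketched as follows. This is only an illustrative example: the function name, the register names, and the access counts are hypothetical, and it is assumed that the mean and the (population) standard deviation are taken over the used, non-system registers only, since $K$ is described as the number of used registers.

```python
from statistics import mean, pstdev


def classify_registers(accesses, system=frozenset()):
    """Assign each register to heavy / medium / low / zeros / system.

    accesses: mapping register name -> estimated number of accesses.
    system:   registers that must never be reallocated (assumed input).
    """
    classes = {}
    used = {}
    for reg, count in accesses.items():
        if reg in system:
            classes[reg] = "system"
        elif count == 0:
            classes[reg] = "zeros"
        else:
            used[reg] = count
    if not used:
        return classes

    counts = list(used.values())
    m = mean(counts)                 # Equation 9
    half_sigma = pstdev(counts) / 2  # Equation 10 divides by K, i.e. population std

    for reg, count in used.items():
        if count >= m + half_sigma:
            classes[reg] = "heavy"
        elif count >= m - half_sigma:
            classes[reg] = "medium"
        else:
            classes[reg] = "low"
    return classes


if __name__ == "__main__":
    demo = {"l0": 10, "l1": 20, "l2": 30, "l3": 40, "l4": 0, "o6": 500}
    print(classify_registers(demo, system={"o6"}))
```

In this made-up example the mean is 25 and half the standard deviation is about 5.6, so only `l3` is heavy, `l1` and `l2` are medium, and `l0` is low.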
After the classification of the registers, DIST_MAX reallocates
the registers as follows:
1. Heavy used registers are placed at a maximum distance
between each other.
2. Each one of these registers is surrounded by zero or low
used registers.
3. Medium used registers are reallocated as a second or
third surrounding layer to the heavy used ones.
4. The remaining zero and low used registers are placed in
the remaining locations that have not been reallocated.
It is clear that this reallocation mechanism has a direct
dependency on the physical layout of the register file. This
means that the same number of registers used in a program
might have different reallocation maps depending on the lay-
out of the register file. Generally, the layout of the register
file could be viewed as a 2D mesh [21]. With this assumption,
the term maximum distance could be achieved with numer-
ous mapping techniques. In this paper, a simple (yet effective)
method has been applied, where a register is identified by the
row and column it belongs to. First, heavy used registers are
reallocated such that the targeted locations are identified by
different rows and columns. Then, the reallocation is continued
as mentioned before. This method would reallocate the
registers from their preallocated positions in Figure 2(a) to
the new positions shown in Figure 2(b). This distribution
guarantees that each row has at most a single heavy
used register, but the numbers of medium used, low used, and
zero registers depend on the number of used registers in each
program.
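The first placement step, giving every heavy used register its own row and its own column of the 2D mesh, can be sketched as below. The paper does not specify how the target cells are chosen, so the greedy selection by Manhattan distance, the function name `place_heavy`, and the `fixed` parameter for non-reallocatable cells are all assumptions made for illustration.

```python
from itertools import product


def place_heavy(heavy, rows, cols, fixed=frozenset()):
    """Place heavy registers on distinct rows and columns of a rows x cols mesh.

    heavy: names of the heavy used registers.
    fixed: cells (row, col) occupied by registers that cannot move.
    Assumption: among the allowed cells, pick the one farthest (Manhattan
    distance) from the heavy registers that are already placed.
    """
    placed = {}
    for reg in heavy:
        used_rows = {r for r, _ in placed.values()}
        used_cols = {c for _, c in placed.values()}
        candidates = [
            (r, c)
            for r, c in product(range(rows), range(cols))
            if r not in used_rows and c not in used_cols and (r, c) not in fixed
        ]
        if not candidates:
            raise ValueError("no free row/column pair left for " + reg)

        def spread(cell):
            # Distance to the closest heavy register placed so far.
            return min(
                (abs(cell[0] - r) + abs(cell[1] - c) for r, c in placed.values()),
                default=0,
            )

        placed[reg] = max(candidates, key=spread)
    return placed


if __name__ == "__main__":
    # 4 x 4 mesh with a system register fixed in the bottom-right corner.
    print(place_heavy(["H1", "H2"], 4, 4, fixed={(3, 3)}))
```

The medium, low, and zero registers would then be filled in around these cells following the four steps listed earlier; that part is omitted here.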
Z L Z Z
M M M Z
H H M Z
L L M Sys
(a)
M L
H
Z
Z M Z M
H
Z M L
L M Z Sys
(b)
Figure 2: Register reallocation map using DIST MAX.
DIST_MAX could reallocate any register provided there are
no restrictions on the access to such register. However, there
are some registers that cannot be reallocated. For instance,
the Sys register in Figure 2(a) is a system register (similar
to a stack pointer) and cannot be reallocated to any other
location, because doing so could affect the system execution
and could lead to misuse of memory contents through the use of
a wrong stack address.
4. CASE STUDY: SPARC V8
The SPARC V8 architecture has been selected as a representative
register window-based architecture [22] (other examples of
such architectures are the AMD 29k and the Intel i960). SPARC V8
is a 32-bit RISC machine with separate integer and floating
point register files. Register windows are found only in the
integer register file, while the floating point register file is a
single-window file of 32 registers.
In the SPARC architecture, the instruction format only allows
the assignment of registers within the same window, i.e.,
within the same instruction, the source(s) and the destination
registers should belong to the same register window. Moreover,
the registers within a single window are classified into
global, output, local, and input registers. The input and output
registers are used for passing and returning parameters in
case of a function call. Thus, these registers cannot be
reallocated because, in case of a function call, wrong parameters
could be passed/returned.
In those cases, DIST_MAX has a very small chance of achieving
a major enhancement in the thermal profile. This is
illustrated in Figure 3, which shows the location of the registers
within a register file assuming a 2D layout with 8 registers
per row. Also, the specifications of SPARC V8 [22] show that
there are many limitations on the reallocation of the registers.
Even within the same window, various constraints do
not work in favor of the reassignment of registers:
1. The stack pointer (o6) and the frame pointer (i6) cannot
be reallocated to other registers.
2. Register i7 is used to save the return address for a called
procedure.
3. Register o7 contains the address of the calling instruction.
4. The output registers used for passing parameters cannot
be reallocated (this varies from a single register to all the
existing 5 output registers).
5. The input registers containing the incoming parameters
cannot be reallocated.
6. The global registers contain global variables used within
more than one procedure. Thus, they cannot be interchanged
with local, input, or output registers.
[Diagram: 2D layout of the register file; labeled registers are
fp (i6) and i7, the shared sp (o6) / fp (i6) and o7 / i7 pairs of
overlapping windows, sp (o6) and o7, and the globals g0 to g7.]
Figure 3: Schematic diagram showing the register file of
the SPARC V8. Black registers are irreplaceable, gray
registers are replaceable within the same group.
These constraints limit the capability of DIST_MAX to
beneficially reallocate the registers within the same window.
Fortunately, multi-window context switching and BBCS are not
affected by the mentioned constraints. Hence, the percentage
of hotspots and the peak temperature have been reduced,
as will be shown in the simulation results.
4.1 Simulation results
The experimental work in this paper has been
performed using the HW-SW emulation platform presented
in [23]. This platform is required to extract the power traces
corresponding to the execution of the application. This
emulation environment makes it possible to implement the core
of the SPARC architecture and extract the required thermal
statistics, like the profile of accesses to the register file.
The proposed compilation techniques have been embed-
ded in the professional CoSy compilation framework provided
by ACE [9]. All the results have been acquired assuming a
threshold of 51
for hotspots.
The deployed SPARC processor contained an 8-window register
file with a total of 136 registers: 8 global registers and 16
registers per window [22]. Benchmarks from the MediaBench [24]
suite have been used to measure the performance of the
proposed flow.
Figure 4 shows the rate change of hotspots raised during
the execution of the MPEG2 decoding benchmark. This figure
shows the execution using the default compilation flow,
the combined BBCS and DIST_MAX, multi-window context
switching, and the complete proposed compilation flow. Although
combining all the techniques reduced the rate of hotspots
significantly, there is no significant improvement when using each
technique separately. This can be explained by the limitations
of the window-based register file, as previously discussed. The
multi-window context switching spread the usage of windows,
but it did not modify the behavior of the register file accesses
within a single window. However, it allowed BBCS to make
use of multiple windows for the same procedure. Thus, the
overall number of accesses to each register is diminished, which
results in a significant reduction of the hotspots.
Figure 4: Rate change of hotspots in execution of MPEG2
benchmarks using various compilation techniques.
Besides the reduction of hotspots, the proposed flow also
succeeded in balancing the thermal profile of the register
file by reducing the thermal gradients. Figure 5 shows the rate
change of the thermal gradient, computed as the difference
between the maximum and minimum temperatures found on
the chip surface per unit area. This figure shows that the
thermal gradient is lowered by 38%, which means that the
variation of temperature within the register file is reduced.
This observation, along with the reduction of hotspots, implies
a more uniform distribution of temperature within the register
file.
Figure 5: Rate change of thermal gradient of the register
file with MPEG.
The proposed compilation flow achieves a significant
reduction in both the percentage of hotspots and the peak
temperature, as shown in Figures 6 and 7, respectively. It
can be noticed that, although the peak temperature of both
G711 encode and G711 decode is reduced to values similar to
those exhibited by the other applications, the percentage of
hotspots is not diminished very much. This indicates that the
hotspots were found in a single window of the register file, and
that there was not an appreciable impact of neighboring
windows. On average, the percentage of hotspots is reduced by
73% with respect to the original compilation, and the peak
temperature is reduced by 8% with respect
to the original peak temperature. The maximum reductions
in percentage of hotspots and peak temperature reached 91%
and 11%, respectively, with the MPEG benchmark.
The proposed techniques have a small impact on the code
size, which was analyzed in the experimental setup. Table 1
shows the code size of the benchmarks used and the increase
due to the extra instructions included by the proposed
compilation techniques. The increase in the code size, as seen
in the table, can be considered negligible because it did not
exceed 0.2%. Since these instructions do not access the
memory, there is no overhead in the dynamic memory size. It
is also worth noticing that the proposed technique introduced a

References
Shekhar Borkar. Design challenges of technology scaling. 01 Jul 1999.
Techniques for Multicore Thermal Management: Classification and New Exploration.
Standby and Active Leakage Current Control and Minimization in CMOS VLSI Circuits.