scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Active bank switching for temperature control of the register file in a microprocessor

11 Mar 2007-pp 231-234
TL;DR: Experimental results show that this periodic active bank switching scheme achieves 3.4°C of steady-state temperature reduction, with a mere 0.75% average performance penalty.
Abstract: An effective thermal management scheme, called active bank switching, for temperature control in the register file of a microprocessor is presented. The idea is to divide the physical register file into two equal-sized banks, and to alternate between the two banks when allocating new registers to the instruction operands. Experimental results show that this periodic active bank switching scheme achieves 3.4°C of steady-state temperature reduction, with a mere 0.75% average performance penalty.

Summary (3 min read)

1. INTRODUCTION

  • Peak power dissipation and the resulting temperature rise have become a limiting factor to microprocessor performance and a significant component of its cost.
  • Dynamic thermal management (DTM) has been proposed as a class of microarchitectural solutions and software strategies to achieve the highest processor performance under a peak temperature limit.
  • A DTM method specifically targeted toward temperature control in the register file was presented in [7].
  • This method, called activity migration, is quite effective, albeit it has a large area overhead.
  • The authors idea is based on the observation that the register file is not fully utilized over a program’s execution, i.e., the lifetime of registers/operands are short such that the authors only need a rather small number of physical registers to be active during most of the cpu cycles.

2.1 Register File Utilization

  • Many 32-bit instruction set architectures (ISA) are designed to have 32 architectural registers although modern superscalar processors have more than 32 physical registers.
  • Note that on average for about 90% the time, less than a half of the physical registers (32) are actually allocated.
  • This is because although a new instruction is dispatched and allocated to a physical register, much of the time this instruction is not issued and executed due to the data dependencies among instructions.

2.2 Periodic Bank Switching: The Idea

  • Based on the above observation, the authors propose to divide the register file into two equal-sized banks and use only one bank at a time, i.e., the number of physical registers available at any time is one half of the original count and registers are allocated from one of these two banks.
  • Here the authors designate the active bank as a primary bank and the other one as a secondary bank.
  • Registers are allocated first from the primary bank and only if the primary bank is full, the allocation is done from the secondary bank.
  • When bank switching occurs, there might still be some references to the nonactive bank.
  • These pending references will be relatively small compared to the number of references to the active bank.

2.3 Thermal Zones and Thermal Gradients

  • The authors have carried out detailed analysis of the temperature regions in terms of thermal gradients and classified them into two zones: 1) Fast Temperature Rise (FTR) zone:.
  • The rising thermal gradient is higher than the falling thermal gradient i.e., the temperature rises faster than it falls (when the chip is allowed to cool off).
  • Based on their simulations , the FTF zone is above the FTR zone.
  • This is fortunate because the temperature profile of a microprocessor chip is such that DTM techniques become more effective as the chip temperature rises.
  • If the ST lies in the FTF zone, then the DTM methods tend to work very well and the new ST of the chip will be significantly lower.

2.4 Thermal Model

  • To mathematically support the periodic active bank switching idea, the authors use a thermal model developed by Skadron et al. in [3].
  • After a time interval, the new temperature becomes: new oldT T T= + Δ (2) Let tinitial and tfinal denote two instances of time (and their difference be denoted by ∆t), respectively.
  • More precisely, a simple DTM policy where the authors regularly (i.e., at fixed timing intervals) switch between the primary and secondary banks is sufficient.
  • The authors have found that a fixed interval of 10M CPU cycles is adequate for their purposes and that the overall reduction in ST is not sensitive to the exact length of this interval.
  • Similarly, the actual rising thermal gradient in the newly active bank is smaller than equation (3) since some of the registers previously mapped to the sleep bank are alive and accessed from that bank for certain cycles.

2.5 Overhead

  • It is expected that the banked structure in physical register file needs extra control logics and the renaming logic need to be changed to allocate new registers from the active bank only.
  • These area penalties are much smaller than those for the activity migration method, which duplicates the entire register file.
  • Furthermore, the periodic active bank switching scheme does not have self-producing performance penalty as is the case for the activity migration method since the authors do not need to transfer the content of registers from one bank to the other.

3.2 Methodology

  • For the experiments, the authors integrate SimpleScalar [9], Wattch [10] and Hotspot [11] into one simulator.
  • The authors position this half sized register file in the center of the register file area and the surrounding area is kept empty ).
  • For the first 200K cycles of each benchmark program run, the authors obtain the typical power figure for the register file (along with other functional units).
  • The thermal simulation is carried out in order to find the steady-state temperature for the register file.
  • For the test applications, the authors used SPEC2000INT benchmarks [12] with reference/train input files, Mediabench program [13] and MPEG-2 decoder program [14].

3.3 Experimental Results

  • Next, the authors ran the application with a banked register file with active bank switching.
  • Note the relationship between the steady-state temperature and the IPC of each program: Compared to the upper curve, note that (a) the application program’s thermal behavior is maintained in the lower curves, and (b) the periodic active bank switching is observed between the two lower curves.
  • Note also that two lower curves lay one upon another with very small thermal differences.
  • The performance penalties reported correspond half sized (32) register file.

4. CONCLUSION

  • The authors presented an effective steady-state temperature reduction method by adopting a banked structure in the register file.
  • In their scheme, only one bank is active at a time and the authors keep switching between the two available banks.
  • With banking, the authors achieve a sizeable steady-state temperature reduction with a small performance penalty.

Did you find this useful? Give us your feedback

Content maybe subject to copyright    Report

Active Bank Switching for Temperature Control of the
Register File in a Microprocessor
Kimish Patel, Wonbok Lee, Massoud Pedram
Department of Electrical Engineering
University of Southern California
Los Angeles CA 90089
{Kimishpa, wonbokle, pedram}@usc.edu
ABSTRACT
An effective thermal management scheme, called active bank
switching, for temperature control in the register file of a
microprocessor is presented. The idea is to divide the physical
register file into two equal-sized banks, and to alternate between
the two banks when allocating new registers to the instruction
operands. Experimental results show that this periodic active bank
switching scheme achieves 3.4 of steady-state temperature
reduction, with a mere 0.75% average performance penalty.
Categories and Subject Descriptors
B.7.2 [Hardware]: Design Aids
General Terms:
Design, Reliability
Keywords
Thermal model, Temperature-aware Design, Register File
1. INTRODUCTION
Peak power dissipation and the resulting temperature rise have
become a limiting factor to microprocessor performance and a
significant component of its cost. Expensive packaging and heat
removal solutions are needed to achieve acceptable substrate and
interconnect temperatures in high-performance microprocessors.
Current thermal solutions are designed to limit the peak processor
power dissipation to ensure its reliable operation under worst-case
scenarios. However, the peak processor power and ensuing peak
temperature are hardly ever observed. Dynamic thermal
management (DTM) has been proposed as a class of micro-
architectural solutions and software strategies to achieve the
highest processor performance under a peak temperature limit.
Most DTM methods are reactive due to the complex nature of
temperature variation in a processor; when a certain triggering
temperature is reached, DTM mechanisms become operational.
For example, in [1], Skadron et al. introduced a number of DTM
methods such as temperature driven frequency scaling, localized
toggling and computation migration to spare hardware units. The
same authors presented a hybrid DTM technique that combines
fetch gating and dynamic voltage scaling (DVS) in [2]. Reference
[3] described a feedback control theory based DTM method,
which determines the aggressiveness of the DTM methods based
on the distance of triggering temperature from the emergency
temperature. Recently, reference
[4] introduced a predictive DTM
method for multi-media applications whereby instruction window
resizing and switching among active functional blocks were
utilized to achieve the desired temperature control. All of these
methods characterize and/or predict the thermal behavior of a
processor typically on a functional block basis, calculate the
power density of functional blocks within a fixed time period, and
apply their temperature control policies as needed.
It is known that the register file is the hottest block in a modern
microprocessor chip
[1] [4]. As such, full-chip DTM methods,
such as fetch-toggling and instruction cache throttling (where the
number of fetched instruction is reduced as needed), [5] [6], have
been utilized to control this register file temperature. A DTM
method specifically targeted toward temperature control in the
register file was presented in
[7]. This method, called activity
migration, is quite effective, albeit it has a large area overhead.
In this paper, we present a DTM method that targets and
effectively reduces temperature of the register file. Our idea is
based on the observation that the register file is not fully utilized
over a program’s execution, i.e., the lifetime of registers/operands
are short such that we only need a rather small number of physical
registers to be active during most of the cpu cycles. Therefore, by
introducing two equal-sized banked structures in the physical
register file (one active bank and another sleep bank) and
alternately using these two banks, temperature of both banks can
be reduced while little performance penalty is incurred. This is
similar to what the authors proposed in
[7] except that we do not
introduce a redundant register file structure. Instead we divide the
existing register file structure into two banks and alternate
between the two while monitoring and respond to register file
utilization of the application program. In addition to area savings,
our method also avoids processor-wide performance penalty in the
sense of IPC degradation.
2. ACTIVE BANK SWITCHING BASED DTM
2.1 Register File Utilization
Many 32-bit instruction set architectures (ISA) are designed to
have 32 architectural registers although modern superscalar
processors have more than 32 physical registers. This discrepancy
is handled by register renaming, which assigns architecture
registers to physical registers while considering data/control
dependencies among the instructions. In practice, not all of the
physical registers are used all the time. In [8], Tran et al. showed
that physical register usages are typically in the range of 40% to
60%. This low utilization phenomenon arises mainly from the
dependencies among instructions in the instruction window.
This work was sponsored in part by a grant from the CISE directorate of
the National Science Foundation.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
GLSVLSI’07, March 11-13, 2007, Stresa-Lago Maggiore, Italy.
Copyright 2007 ACM 978-1-59593-605-9/07/0003...$5.00.

0
20
40
60
80
100
25 50 75
Utilization
mcf gcc bzip cjpeg
djpeg mpeg2dec gzip
Figure 1 Physical Register File Utilization
Figure 1 shows the utilization of physical registers according
to our own simulation data (the simulation methodology will be
explained later). Here the x-axis represents the number of physical
registers that are being utilized as a percentage of the original
register file size (64), whereas the y-axis represents the utilization
ratio as a percentage of total execution time. For example, for mcf,
25% of physical registers are actually in use during 42% of the
execution time. Note that on average for about 90% the time, less
than a half of the physical registers (32) are actually allocated.
Figure 2 shows the performance penalty if the register file size
is cut in half (from 64 to 32). Notice that for djpeg even though
for 25% of the execution time more than 32 registers are used, the
respective performance penalty is only 3%. This is because
although a new instruction is dispatched and allocated to a
physical register, much of the time this instruction is not issued
and executed due to the data dependencies among instructions.
0
0.5
1
1.5
2
2.5
3
mcf
g
c
c
bz
i
p
c
j
p
e
g
djpeg
mpeg2dec
gz
i
p
Performance Loss (%)
Figure 2 Performance Penalty When the Register Bank
Size Is
Cut in Half
2.2 Periodic Bank Switching: The Idea
Based on the above observation, we propose to divide the register
file into two equal-sized banks and use only one bank at a time,
i.e., the number of physical registers available at any time is one
half of the original count and registers are allocated from one of
these two banks. Here we designate the active bank as a primary
bank and the other one as a secondary bank. Registers are
allocated first from the primary bank and only if the primary bank
is full, the allocation is done from the secondary bank. Since only
a small number of physical registers are used during most of the
execution time of typical programs, the duration for which the
secondary bank will be in-use is relatively small. When bank
switching occurs, there might still be some references to the non-
active bank. However, these pending references will be relatively
small compared to the number of references to the active bank.
2.3 Thermal Zones and Thermal Gradients
We have carried out detailed analysis of the temperature regions
in terms of thermal gradients and classified them into two zones:
1) Fast Temperature Rise (FTR) zone: The rising thermal gradient
is higher than the falling thermal gradient i.e., the temperature
rises faster than it falls (when the chip is allowed to cool off). 2)
Fast Temperature Fall (FTF) zone: The falling thermal gradient is
equal to or higher than the rising thermal gradient i.e., the
temperature drops faster than it rises. Note that the DTM methods
are most effective in the FTF zone. Based on our simulations
(cf. Figure 3), the FTF zone is above the FTR zone. This is
fortunate because the temperature profile of a microprocessor chip
is such that DTM techniques become more effective as the chip
temperature rises.
Figure 3 Thermal Gradients and Temperature Zones
Depending on the type of packaging and cooling solutions, the
chip’s critical temperature (CT, the temperature beyond which
chip may not function correctly or may even get damaged) may lie
in any of these two zones. In the absence of a DTM technique,
any application program running on a microprocessor chip will
give rise to a steady-state temperature (ST) depending on the
program behavior e.g., in terms of its CPI. The goal of our
proposed DTM method is to minimize the chip ST while meeting a
performance loss constraint. If the ST lies in the FTF zone, then
the DTM methods tend to work very well and the new ST of the
chip will be significantly lower. Otherwise, the DTM techniques
are expected to be less effective.
2.4 Thermal Model
To mathematically support the periodic active bank switching idea,
we use a thermal model developed by Skadron et al. in [3]. Based
on this model, the temperature increase in the processor is
represented by:
(
)
old
th th th
T
P
Tt
CRC
Δ= Δ
(1)
where t is a time interval, P is the average power dissipated in an
interval, R
th
is a thermal resistance, C
th
is a thermal capacitance
and T
old
is the initial temperature of a time period, respectively.
After a time interval, the new temperature becomes:
new old
T
T
T
=+Δ
(2)
Let t
initial
and t
final
denote two instances of time (and their
difference be denoted by t), respectively. Then, the rising
thermal gradient with respect to time is represented as:
()
oldr
t
hthth
TTP
CRC
Δ
=−
Δ⋅
(3)
Hence when the active bank is switched, the new active bank’s
temperature rises according to equation (3). Whereas the other
bank experiences a temperature drop, and this temperature drop

follows:
()
f
old
t
hth
T
T
t
RC
Δ
=−
Δ⋅
(4)
If
f
r
T
T
tt
Δ
Δ
ΔΔ
(i.e., we are operating in the FTF zone), then active
bank switching will be quite effective in reducing the temperature.
The breakeven temperature (BT or T
BE
) above which the active
bank switching will be beneficial is obtained by solving the
following equation:
1
()()
2
f
rBEBE
B
Et
h
th th th th th
T
TTTP
T
PR
ttCRCRC
Δ
Δ
= −= =⋅
ΔΔ
(5)
Consider the case where the ST is above the BT. Conceptually,
we would like to identify a trigger temperature (TT) such that BT
TT ST and then switch between the two banks as soon as the
temperature of the active bank is about to go above the TT.
However, in practice, we have found it to be unnecessary to
identify such a trigger temperature level. More precisely, a simple
DTM policy where we regularly (i.e., at fixed timing intervals)
switch between the primary and secondary banks is sufficient. We
have found that a fixed interval of 10M CPU cycles is adequate
for our purposes and that the overall reduction in ST is not
sensitive to the exact length of this interval.
Note that the actual falling thermal gradient in the sleep bank
is smaller than equation (4) since some of the registers previously
mapped to this bank are alive for certain cycles even after
switching. Similarly, the actual rising thermal gradient in the
newly active bank is smaller than equation (3) since some of the
registers previously mapped to the sleep bank are alive and
accessed from that bank for certain cycles. However, the idea is
still the same.
2.5 Overhead
It is expected that the banked structure in physical register file
needs extra control logics and the renaming logic need to be
changed to allocate new registers from the active bank only.
However, these area penalties are much smaller than those for the
activity migration method, which duplicates the entire register file.
Furthermore, the periodic active bank switching scheme does not
have self-producing performance penalty as is the case for the
activity migration method since we do not need to transfer the
content of registers from one bank to the other.
3. EXPERIMENTAL RESULTS
3.1 Micro-architecture Simulation Data
Table 1 reports the architectural configuration that was assumed
in our simulations.
Table 1 Micro-Architecture Parameters
Main Memory Latency 32 cycles
L1 I/D Cache
32KB 32-way 32Byte block
1 cycle hit latency
I/D-TLB 4-way 1K entries 32 cycles miss latency
Branch Predictor Bimodal 128 Table
Functional Units
4 INT. ALU, 1 INT. MULT/DIV
4 FP ALU, 1 FP MULT/DIV
RUU/LSQ size 8/8
Instruction Fetch Queue
8
3.2 Methodology
For the experiments, we integrate SimpleScalar [9], Wattch [10]
and Hotspot [11] into one simulator The temperature data is
generated every 50K cycles and the initial/ambient temperatures
are set by 60/45, respectively. For the floor-plan in our thermal
simulation, we obtain a 2.6GHz Pentium IV 130nm floor-plan
from [15], estimate/extract the area information for each of the
functional unit, and provide this information to our integrated
simulator.
Figure 4 Detailed Floor-plan for the Register File
Figure 4 (a) shows the ‘integer execution core’ part of the
tagged die-photo obtained from [15]. As shown, the register file
area is in reality smaller than in the original floor-plan and is
roughly half of the original size. Hence, we divide the original
register file area into half to match our floor-plan with more
detailed description (cf. Figure 4 (a)). We position this half sized
register file in the center of the register file area and the
surrounding area is kept empty (cf. Figure 4 (b)). Since the
original register file area corresponds to the size for 128 registers
whereas in our experiments the register file has a size of 64, we
further divide this area into half (cf. Figure 4 (c)). For the banked
structure, we further cut the original register file area (cf. Figure 4
(c)) into half to denote two banks of size 32 (cf. Figure 4 (d)).
Our simulation setup is as follows. For the first 200K cycles of
each benchmark program run, we obtain the typical power figure
for the register file (along with other functional units). Next, we
use this power figure to mimic the thermal simulation without
actually simulating the application by continuously feeding this
typical power value to each functional unit. The thermal
simulation is carried out in order to find the steady-state
temperature for the register file. Once the steady-state is found, we
resume the actual thermal simulation of the application.
For the test applications, we used SPEC2000INT benchmarks
[12] with reference/train input files, Mediabench program [13]
and MPEG-2 decoder program [14]. Input files for mediabench
are custom made, input file for the MPEG-2 decoder program is
obtained from [14] and the input files of all programs are shown
in Table 3. Each program is compiled with the PISA compiler
using default optimization option. For the test platforms, two
Linux machines were used: Intel Pentium IV 2.8GHz with

512MB memory and Intel Pentium IV 1.8GHz with 2GB memory.
3.3 Experimental Results
At first, we ran each application in a monolithic physical register
file of size 64 and record the ST. Next, we ran the application
with a banked register file with active bank switching. In a banked
register file, the total number of physical registers is 64 but they
are divided into two banks, each of size 32.
Table 2 Steady-State Temperature and IPC
Steady-state Temp ()
Program
Monolithic
RF (64)
Banked
RF (2*32)
Thermal
reduction
()
IPC
mcf - inp.in in train 68.0 66.7 1.3 0.7707
gcc - input.source in ref
76.5 73.7 2.8 1.2748
bzip - input.log in ref 78.2 75.0 3.2 1.5022
gzip - input.log in ref 81.7 77.5 4.2 2.1069
cjpeg - custom.gif 83.0 79.0 4.0 2.2553
mpeg2dec-hhilong.m2v
82.0 77.7 4.3 2.2729
djpeg - custom.jpg 82.0 77.0 5.0 2.3825
In Table 2, the difference of steady-state temperatures between
the monolithic and the banked register file is shown (cf. the
thermal reduction column). The average steady-state temperature
reduction of the active bank switching scheme is 3.4. Note the
relationship between the steady-state temperature and the IPC of
each program: As a program workload increases, its steady-state
temperature increases as well.
Figure 5 An Example of Thermal Behaviors in gcc
Figure 5 shows the steady-state temperature behavior of the
gcc program. The upper thermal curve corresponds to the
monolithic register file and the lower two thermal curves
correspond to each bank in the banked register file. Compared to
the upper curve, note that (a) the application program’s thermal
behavior is maintained in the lower curves, and (b) the periodic
active bank switching is observed between the two lower curves.
Note also that two lower curves lay one upon another with very
small thermal differences. Each point in the x-axis corresponds to
10M cycles.
Table 3 shows the register file utilization in terms of percentage
of total execution cycles spent using 1/4, 1/2, 3/4 of the register
file, respectively. The performance penalties reported correspond
half sized (32) register file. Note that low performance penalty is
due to the lower utilization of register file.
Table 3 Register File Utilization and Performance
Register File Utilization (%)
Program
25% 50% 75%
Performance
Penalty (%)
mcf 42 92 96 0
gcc 43 86 98 0.16
bzip 54 93 99 0
djpeg 20 95 99 0.47
cjpeg 32 75 90 1.25
mpeg2de
c
38 91 99 0.69
gzip 32 75 90 2.68
4. CONCLUSION
We presented an effective steady-state temperature reduction
method by adopting a banked structure in the register file. In our
scheme, only one bank is active at a time and we keep switching
between the two available banks. With banking, we achieve a
sizeable steady-state temperature reduction with a small
performance penalty.
5. REFERENCES
[1] K. Skadron et al., “Temperature-Aware Micro-architecture,”
Proc. of Int’l Symp. on Computer Architecture, Jun. 2003.
[2] K. Skadron, “Hybrid Architectural Dynamic Thermal
Management,” Proc. of the Design Automation and Test in
Europe, 2004.
[3] K. Skadron et al., “Control-Theoretic Techniques and
Thermal-RC Modeling for Accurate and Localized Dynamic
Thermal Management,” Proc. of the Int’l Symp. on High-
Performance Computer Architecture, 2002.
[4] J. Srinivasan, S. V. Adve, “Predictive Dynamic Thermal
Management for Multimedia Application,” Proc. of Int’l
Conference on Supercomputing, Jun. 2003.
[5] D. Brooks et al., “Dynamic Thermal Management for High-
Performance Microprocessors,” Proc. of Int’l Symp. on
High-Performance Computer Architecture, 2001.
[6] H. Sanchez et al., “Thermal Management System for High
Performance PowerPC Microprocessor,” Proc. of IEEE
Computer Society Int’l Conference, 1997.
[7] S. Heo, K. Barr, K. Asanovic, “Reducing Power Density
through Activity Migration,” Proceedings of Int’l Symp. on
Low Power Electronics and Design, Aug. 2003.
[8] L. Tran et al., “Dynamically Reducing Pressure on the
Physical Register File through Simple Register Sharing,
Proc. of the Int’l Symp. on Performance Analysis of Systems
and Software, 2004.
[9] Simplescalar at: http://www.simplescalar.com
[10] D. Brooks et al., “Wattch: A Framework for Architectural-
Level Power Analysis and Optimizations,” Proc. of the Int’l
Symp. on Computer Architecture, Jun. 2000.
[11] HotSpot at: http://lava.cs.virginia.edu/HotSopt/
[12] SPEC2000INT benchmark at: http://www.spec.org/cpu
[13] Mediabench at: http://euler.sluedu/~fritts/mediabench
[14] MPEG-2 Programs at: http://www.mpeg2.de/video/
[15] Pentium IV floor-plan at: http://www.chip-architect.com
Citations
More filters
Journal ArticleDOI
TL;DR: In this paper, a rotator hardware in the instruction decoder is introduced to facilitate a balanced solution for the two conflicting objectives of energy and thermal issues for register accesses in embedded systems.
Abstract: Energy and thermal issues are two important concerns for embedded system design. Diminished energy dissipation leads to a longer battery life, while reduced temperature hotspots decelerate the physical failure mechanisms. The instruction fetch logic associated with register access has a significant contribution towards the total energy consumption. Meanwhile, the register file has also been previously shown to exhibit the highest temperature compared to the rest of the components in an embedded processor. Therefore, the optimization of energy and the resolution of the thermal issue for register accesses are of great significance. In this article, register allocation techniques are studied to simultaneously reduce energy consumption and heat buildup on register accesses for embedded systems. Contrary to prevailing intuition, we observe that optimizing energy and optimizing temperature on register accesses conflict with each other. We introduce a rotator hardware in the instruction decoder to facilitate a balanced solution for the two conflicting objectives. Algorithms for register allocation and refinement are proposed based on the access patterns and the effects of the rotator. Experimental results show that the proposed algorithms obtain notable improvements of energy and peak temperature for embedded applications.

7 citations


Additional excerpts

  • ...Some work [Patel et al. 2007; Homayoun et al. 2010] try to optimize both power and thermal objectives....

    [...]

  • ...Some work [Patel et al. 2007; Homayoun et al. 2010] try to optimize both power and thermal objectives....

    [...]

Proceedings ArticleDOI
22 Mar 2010
TL;DR: This work demonstrated that the traditional renaming unit causes high temperature since it allocates registers imbalanced, so the overall power density is lowered and then the temperature is reduced.
Abstract: The manufacturing process of microprocessors becomes increasingly fine and the clock frequency is rapidly growing. Since the corresponding power consumption, however, is not reduced, the power density is increased dramatically. The high temperature and heat generated by the power density causes many problems: malfunction, aging, leakage power and cooling costs. The register file produces the highest temperature in the microprocessor because of extremely high access frequency and its small area. We demonstrated that the traditional renaming unit causes high temperature since it allocates registers imbalanced. Our idea is to redistribute evenly register allocations and accesses across the full range of the register file; consequently, the overall power density is lowered and then the temperature is reduced. The proposed method can be implemented by adding a small logic to the traditional renaming unit with around 1.5% overheads. The results are as follows. Temperature drop was up to 11% on average 6%; leakage power saving was up to 24% on average 13%; performance improvement was up to 103% on average 84%.

4 citations


Cites background from "Active bank switching for temperatu..."

  • ...Accordingly, current microprocessors show more features and faster performance even the smaller area compared to past microprocessors....

    [...]

Journal ArticleDOI
TL;DR: This letter proposes a compilation flow that utilizes the register windows to optimize the thermal profile and to reduce the hotspots and results show that the thermal Profile and reliability of the device is clearly improved.
Abstract: The development of compiler-based mechanisms to optimize the thermal profile of large register files to improve the processor reliability has become an important issue. Thermal hotspots have been known to cause severe reliability issues, while the thermal profile of the devices is also related to the leakage power consumption and the cooling cost. Register window-based architectures provide a relatively large register files. However, such large register files are not designed or utilized for thermal balancing or reliability enhancement. In this letter, we propose a compilation flow that utilizes the register windows to optimize the thermal profile and to reduce the hotspots. As a result, the thermal profile and reliability of the device is clearly improved. Simulation results show that the proposed flow achieves up to 91% reduction of hotspots and 11% reduction of the peak temperature in embedded processors.

3 citations


Cites background from "Active bank switching for temperatu..."

  • ...• Reliability of the system, since several processes are driven by the increase of temperature or the spatial and temporal gradients that appear during normal functioning....

    [...]

Journal ArticleDOI
TL;DR: This work demonstrates that the existing renaming unit causes high temperature since it allocates registers imbalanced and redistributes evenly register allocations and accesses across the full range of the register file; consequently, the overall power density is reduced and then the temperature is lowered.
Abstract: The manufacturing process of microprocessors becomes increasingly fine and the clock frequency is rapidly growing. Since the corresponding power consumption, however, is not reduced, power density is increased dramatically. The generated heat by the power density induces high temperature. The high temperature causes many problems: calculation errors, aging, leakage power, and cooling costs. Register file produces the highest temperature in a microprocessor because of extremely high access frequency and its small area. We demonstrated that the existing renaming unit causes high temperature since it allocates registers imbalanced. Our idea is to redistribute evenly register allocations and accesses across the full range of the register file; consequently, the overall power density is reduced and then the temperature is lowered. The proposed method can be implemented by adding a small logic to the traditional renaming unit with around 1.5% overheads. As a result, temperature drop was up to 10.4°C (11%) on average 4.6°C (6%). We also achieved leakage power savings and performance improvements by the temperature drop.

2 citations


Cites background from "Active bank switching for temperatu..."

  • ...Thus, the power density on the microprocessor is rapidly increased, and the high temperature caused by this increase brings about many problems: malfunction, poor durability, increased leakage power, and cooling costs....

    [...]

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A temperature reduction methodology that leverages recently introduced configurable caches is proposed, in order to not only assign to the task a cache partition commensurate to its current demand but also to minimize the associated power density and temperature.
Abstract: On-chip caches have been known to be a major contributor to leakage power as they occupy a sizable fraction of the chip's real estate and as such have been the target of power optimization techniques. However, many of these techniques do not consider the effects of temperature on leakage power and can hence be suboptimal since leakage power rises rapidly with temperature. When large fractions of the cache are disabled and only a small partition is used, the power density increases significantly which leads to increased temperature and leakage. We propose a temperature reduction methodology that leverages recently introduced configurable caches, in order to not only assign to the task a cache partition commensurate to its current demand but also to minimize the associated power density and temperature. In order to counteract the effect of elevated power density and achieve temperature reductions, in the proposed technique each such cache partition is replicated and only one of the replicas is active at any time. The inactive partition replicas are placed into a low-power drowsy mode while the primary partition services the task's instruction requests. By periodically switching the tasks association between replica cache partitions, the power density and hence the temperature are reduced.

1 citations


Cites background from "Active bank switching for temperatu..."

  • ...Thermal-aware architectures have been proposed for register files and caches [11], [5] where registers or cache blocks are maintained...

    [...]

References
More filters
Proceedings ArticleDOI
01 May 2000
TL;DR: Wattch is presented, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level and opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
Abstract: Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level. Wattch is 1000X or more faster than existing layout-level power tools, and yet maintains accuracy within 10% of their estimates as verified using industry tools on leading-edge designs. This paper presents several validations of Wattch's accuracy. In addition, we present three examples that demonstrate how architects or compiler writers might use Wattch to evaluate power consumption in their design process.We see Wattch as a complement to existing lower-level tools; it allows architects to explore and cull the design space early on, using faster, higher-level tools. It also opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

2,848 citations

Proceedings ArticleDOI
01 May 2003
TL;DR: HotSpot is described, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package that shows that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.
Abstract: With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package's capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical for architectural studies.This paper describes HotSpot, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. Validation was performed using finite-element simulation. The paper also introduces several effective methods for dynamic thermal management (DTM): "temperature-tracking" frequency scaling, localized toggling, and migrating computation to spare hardware units. Modeling temperature at the microarchitecture level also shows that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.

1,252 citations

Proceedings ArticleDOI
20 Jan 2001
TL;DR: This work investigates dynamic thermal management as a technique to control CPU power dissipation and explores the tradeoffs between several mechanisms for responding to periods of thermal trauma and the effects of hardware and software implementations.
Abstract: With the increasing clock rate and transistor count of today's microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chip's rated maximum power dissipation. However system designers still must design thermal heat sinks to withstand the worse-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma and we consider the effects of hardware and software implementations. With approximate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.

882 citations


"Active bank switching for temperatu..." refers methods in this paper

  • ...The same authors presented a hybrid DTM technique that combines fetch gating and dynamic voltage scaling (DVS) in [2]....

    [...]

Proceedings ArticleDOI
02 Feb 2002
TL;DR: A thermal model based on lumped thermal resistances and thermal capacitances is developed, which cuts the performance loss of DTM by 65% compared to the previously described fetch toggling technique that uses a response of fixed magnitude.
Abstract: This paper proposes the use of formal feedback control theory as a way to implement adaptive techniques in the processor architecture. Dynamic thermal management (DTM) is used as a test vehicle, and variations of a PID controller (Proportional-Integral-Differential) are developed and tested for adaptive control of fetch "toggling." To accurately test the DTM mechanism being proposed, this paper also develops a thermal model based on lumped thermal resistances and thermal capacitances. This model is computationally efficient and tracks temperature at the granularity of individual functional blocks within the processor. Because localized heating occurs much faster than chip-wide heating, some parts of the processor are more likely, to be "hot spots" than others. Experiments using Wattch and the SPEC2000 benchmarks show that the thermal trigger threshold can be set within 0.2/spl deg/ of the maximum temperature and yet never enter thermal emergency. This cuts the performance loss of DTM by 65% compared to the previously described fetch toggling technique that uses a response of fixed magnitude.

415 citations


"Active bank switching for temperatu..." refers background in this paper

  • ...Expensive packaging and heat removal solutions are needed to achieve acceptable substrate and interconnect temperatures in high-performance microprocessors....

    [...]

Proceedings ArticleDOI
25 Aug 2003
TL;DR: In this article, the authors examined the use of activity migration which reduces peak junction temperature by moving computation between multiple replicated units using a thermal model that includes the temperature dependence of leakage power, and showed that sustainable power dissipation can be increased by nearly a factor of two for a given junction temperature limit.
Abstract: Power dissipation is unevenly distributed in modern microprocessors leading to localized hot spots with significantly greater die temperature than surrounding cooler regions Excessive junction temperature reduces reliability and can lead to catastrophic failure We examine the use of activity migration which reduces peak junction temperature by moving computation between multiple replicated units Using a thermal model that includes the temperature dependence of leakage power, we show that sustainable power dissipation can be increased by nearly a factor of two for a given junction temperature limit Alternatively, peak die temperature can be reduced by 124/spl deg/C at the same clock frequency The model predicts that migration intervals of around 20-200 /spl mu/s are required to achieve the maximum sustainable power increase We evaluate several different forms of replication and migration policy control

317 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "Active bank switching for temperature control of the register file in a microprocessor" ?

An effective thermal management scheme, called active bank switching, for temperature control in the register file of a microprocessor is presented.