Compilers for Leakage Power Reduction
YI-PING YOU, CHINGREN LEE, and JENQ KUEN LEE
National Tsing Hua University
Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts indicate that architectures, compilers, and software can be optimized so as to reduce the switching power (also known as dynamic power) in microprocessors. This has led to interest in using architecture and compiler optimization to reduce leakage power (also known as static power) in microprocessors. In this article, we investigate compiler-analysis techniques that are related to reducing leakage power. The architecture model in our design is a system with an instruction set to support the control of power gating at the component level. Our compiler provides an analysis framework for utilizing instructions to reduce the leakage power. We present a framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures. We also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components when executing given program routines is related to the number and complexity of program branches, we propose a set of scheduling policies and evaluate their effectiveness. We performed experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing leakage power in microprocessors.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—Compilers,
optimization
General Terms: Algorithms, Experimentation, Languages
Additional Key Words and Phrases: Compilers for low power, leakage-power reduction, power-
gating mechanisms
This work was supported in part by the National Science Council (under grant nos. NSC-93-2213-E-007-025, NSC-93-2220-E-077-020, NSC-93-2220-E-007-019, and NSC-93-2752-E-007-004-PAE), Ministry of Economic Affairs (under grant nos. 93-ED-17-A-03-S1-0002 and 94-EC-17-A-01-S1-034), and ITRI (under an ITRI/NTHU research grant).
Authors’ address: Department of Computer Science, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan; email: {ypyou,crlee}@pllab.cs.nthu.edu.tw; jklee@cs.nthu.edu.tw.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.
© 2006 ACM 1084-4309/06/0100-0147 $5.00
ACM Transactions on Design Automation of Electronic Systems, Vol. 11, No. 1, January 2006, Pages 147–164.
1. INTRODUCTION
The demands of power-constrained mobile and embedded computing applications are increasing rapidly, which makes the reduction of power consumption a crucial challenge for software and hardware developers. The continuing size reductions and increasing speeds of transistors increase the importance of leakage-power dissipation in the absence of any switching activities. Recent theoretical analyses have attempted to characterize engineering equations and cost models for analyzing static power [Thompson et al. 1998; De and Borkar 1999; Doyle et al. 2002]. One such analysis produced the following relationship: $P_{static} = V_{CC} \cdot N \cdot k_{design} \cdot \hat{I}_{leak}$, where $V_{CC}$ is the supply voltage, $N$ is the number of transistors in the design, $k_{design}$ is the characteristic of an average device, and $\hat{I}_{leak}$ is a technology parameter describing the per-device subthreshold leakage [Butts and Sohi 2000].
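Because the static-power relationship is a plain product, static power scales linearly with the number of powered devices N. The following minimal sketch illustrates that dependence. The function name and every numeric value are illustrative assumptions and do not come from the article.

```python
def static_power(v_cc, n_devices, k_design, i_leak):
    """Evaluate P_static = V_CC * N * k_design * I_leak."""
    return v_cc * n_devices * k_design * i_leak


# Illustrative values only (assumptions, not measurements from the article).
V_CC = 1.0       # supply voltage, volts
K_DESIGN = 1.0   # design-dependent characteristic of an average device
I_LEAK = 1e-9    # per-device subthreshold leakage, amperes

all_on = static_power(V_CC, 1_000_000, K_DESIGN, I_LEAK)
# Suppose power gating keeps a quarter of the devices switched off.
gated = static_power(V_CC, 750_000, K_DESIGN, I_LEAK)

# Fraction of static power saved; it equals the fraction of devices gated off.
saving_fraction = 1 - gated / all_on
```

Under these assumed values, gating off a quarter of the devices removes a quarter of the static power, which is the lever the compiler techniques in this article try to pull.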
In this article, we discuss compiler analysis techniques used to reduce the number of devices, N, in the static power equation above to ease the problem of leakage power. The architecture model in our design is a system with an instruction set that supports the control of power gating at the component level. We attempt to reduce the number of devices by turning devices off when they are not being used. Our work provides compiler solutions for the analysis and scheduling of the power-gating control at the component level. A data-flow analysis framework is given that estimates the component activities at fixed points in programs whilst considering pipeline architectures. We also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components in given program routines is related to the number and complexity of program branches, we propose a set of scheduling policies (Basic_Blk_Sched, MIN_Path_Sched, and AVG_Path_Sched) and evaluate their effectiveness. Our proposed framework is effective for machines with in-order execution. Additional care has to be taken when one deals with out-of-order issue. For out-of-order issue, we suggest that power-gating operations on a function unit should be considered dependent on normal operations on this unit. Our experiments are performed by incorporating our compiler analysis and scheduling policy into SUIF compiler tools [Smith 1998; Stanford Compiler Group 1995] and by simulating the energy consumptions on Wattch [Brooks et al. 2000] toolkits. We also revised Wattch/SimpleScalar to incorporate our proposed schemes for dealing with out-of-order issue. The experimental results demonstrate that our mechanisms are very effective in reducing leakage power in microprocessors. In summary, the key contributions of our work include the presentation of a data-flow analysis framework for component activities, the scheduling policies for power-gating instructions going beyond basic blocks, and the suggestions of hardware refinements for out-of-order issue to work with our proposed methods.
The remainder of this article is organized as follows: Section 2 presents
our machine architectures with power-gating controls. Section 3 presents
our data-flow analysis framework for component activities. Next, Section 4
provides scheduling policies for leakage power reductions by utilizing gath-
ered component information. Experimental results will then be presented in
Section 5. Finally, Section 6 describes related work and Section 7 concludes this
article.
Fig. 1. Machine architecture model with power-gating control.
2. MACHINE ARCHITECTURE
The architecture model in our design is a system with an instruction set that
supports the control of power gating at the component level. Figure 1 shows an
example of our target machine architecture on which our optimization is based.
We focus on reducing the power consumption of certain function units
by invoking the “power-gating” technology. Power gating is analogous to clock
gating—power gating turns off devices by switching off their supply voltage
rather than switching off the clock. This can be achieved by forcing transistors
to turn off or using multithreshold voltage CMOS technology (MTCMOS) to
increase the threshold voltage [Butts and Sohi 2000; Kao and Chandrakasan
2000; Roy 1998].
We built the experimental architecture within the Wattch simulation envi-
ronment [Brooks et al. 2000]. In this simulation environment we can measure
the power consumption of every microprocessor component throughout the ex-
perimental program. This architecture is essentially compatible with the DEC
Alpha 21264 processor [Compaq Computer Corporation 1999]; the major differ-
ence between these two architectures is the additional power-gating design and
the static pipeline scheduling in our experimental architecture. The compiler approach proposed in this article is primarily for in-order issue processors, but we also propose a solution, shown later in Section 5.3, that makes our methodology feasible for out-of-order issue processors. We implemented the proposed mechanism in SimpleScalar and evaluated our approach with out-of-order issue processors.
The power-gated function units in our experimental architecture are Inte-
ger Multiplier, Floating-Point Adder, Floating-Point Multiplier, and Floating-
Point Divider. The power gating of each function unit can be controlled by the
“power-gating control register” (PGCR). The PGCR is a 64-bit integer register.
In this case, only the lowest four bits of this register affect the power-gating status. Bit 0 of the PGCR controls the power gating of the Integer Multiplier: setting this bit turns the Integer Multiplier on, and clearing it turns the function unit off in the next clock cycle. Bits 1, 2, and 3 are used for the Floating-Point Adder, Floating-Point Multiplier, and Floating-Point Divider, respectively. It is worth mentioning that the integer ALU within the architecture is also involved in general program execution, since it also performs data movements to the PGCR. This means that the integer ALU is always required, and so this function unit is always on. In addition, we introduce a new instruction in the simulation environment to specify the access direction of the PGCR. This instruction can control all four power-gated function units at once by moving the appropriate value from a general-purpose register to the PGCR.
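The bit assignment described above can be sketched as a small encoder for the value that would be moved into the PGCR. The constant and function names below are hypothetical and chosen for illustration. Only the bit positions (0 to 3) and the "set means on" convention come from the text.

```python
# Bit positions in the PGCR, as described in the text (bit set = unit powered on).
INT_MUL = 1 << 0   # Integer Multiplier
FP_ADD = 1 << 1    # Floating-Point Adder
FP_MUL = 1 << 2    # Floating-Point Multiplier
FP_DIV = 1 << 3    # Floating-Point Divider

PGCR_MASK = 0xF    # only the lowest four bits affect the power-gating status


def pgcr_value(units_on):
    """Build the register value that keeps exactly the given units on."""
    value = 0
    for unit in units_on:
        value |= unit
    return value & PGCR_MASK


def is_on(pgcr, unit):
    """Return True if the given unit's bit is set in a PGCR value."""
    return bool(pgcr & unit)


# Keep the Integer Multiplier and Floating-Point Multiplier on, gate the rest.
value = pgcr_value([INT_MUL, FP_MUL])
```

A single register move with such a value updates the state of all four units at once, which matches the one-instruction control described in the text.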
3. COMPONENT-ACTIVITY DATA-FLOW ANALYSIS
In this section, we investigate the compiler analysis techniques used to reduce the leakage power. We present a data-flow analysis framework for a compiler to analyze the state of components in a microprocessor. The process collects information on the utilization of components at various points in a program. We first construct basic blocks and control flow graphs of given programs, and then develop a data-flow equation for the summary of component usages at given program points. To gather the data-flow information, we define comp_gen[B], comp_kill[B], comp_in[B], and comp_out[B] for each block B.
We say that a component-activity c is generated at a block B if a component is required for this execution, symbolized as comp_gen[B], and that it is killed if the component is released by the last request, symbolized as comp_kill[B]. We then create the two groups of equations shown below. The first group of equations follows from the observation that comp_in[B] is the union of activities arriving from all the predecessors of B. The second group is the activities at the end of a block that are either generated within the block, or those entering at the beginning but not killed as control flows through the block. The data-flow equations for these two groups are as follows:
$$\mathit{comp\_in}[B] = \bigcup_{P \text{ a predecessor of } B} \mathit{comp\_out}[P]$$
$$\mathit{comp\_out}[B] = \mathit{comp\_gen}[B] \cup (\mathit{comp\_in}[B] - \mathit{comp\_kill}[B])$$
We use an iterative approach to compute the desired results of comp_in and comp_out after comp_gen has been computed for each block. The algorithm is sketched in Figure 2. This is an iterative algorithm for data-flow equations [Aho et al. 1986] with the addition of resource management structures. A two-dimensional array, called RemainingCycle, is used to maintain the number of cycles that are required to fulfill requests for each component and block. In addition, a resource-utilization table is adopted to give the resource requirement for each instruction of the given microprocessor. The resource-utilization table can be used to give the initial values of RemainingCycle. The remaining cycles of a component decrease by one for each propagation. Initially, both comp_in and comp_kill are set to be empty. The iteration continues until comp_in (and hence comp_out) converges. As comp_out[B] never decreases in size for any B, the algorithm will eventually halt when all comp_out are in the steady state. Intuitively, the algorithm propagates activities of components as far as they will go by simulating all possible execution paths of the program. This algorithm provides the state of utilization of components for each point of a program.
Fig. 2. Data-flow analysis algorithm for component activities.
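The two data-flow equations (comp_in as the union of the predecessors' comp_out, and comp_out as comp_gen together with whatever enters and is not killed) can be solved with a standard iterative fixed-point loop. The sketch below is only that textbook loop, not the algorithm of Figure 2. It assumes comp_gen and comp_kill are already known per block and omits the RemainingCycle bookkeeping and the resource-utilization table. The function name, block names, and component names are hypothetical.

```python
def component_activity(preds, comp_gen, comp_kill):
    """Iterate comp_in/comp_out to a fixed point over a control flow graph.

    preds:     dict mapping block -> list of predecessor blocks
    comp_gen:  dict mapping block -> set of components requested in the block
    comp_kill: dict mapping block -> set of components released in the block
    """
    blocks = list(comp_gen)
    comp_in = {b: set() for b in blocks}
    comp_out = {b: set(comp_gen[b]) for b in blocks}

    changed = True
    while changed:
        changed = False
        for b in blocks:
            # comp_in[B] is the union of comp_out[P] over all predecessors P.
            new_in = set()
            for p in preds.get(b, ()):
                new_in |= comp_out[p]
            # comp_out[B] = comp_gen[B] U (comp_in[B] - comp_kill[B]).
            new_out = comp_gen[b] | (new_in - comp_kill[b])
            if new_in != comp_in[b] or new_out != comp_out[b]:
                comp_in[b], comp_out[b] = new_in, new_out
                changed = True
    return comp_in, comp_out


# Hypothetical diamond-shaped CFG: B1 -> {B2, B3} -> B4.
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
comp_gen = {"B1": {"fp_add"}, "B2": {"int_mul"}, "B3": set(), "B4": set()}
comp_kill = {"B1": set(), "B2": {"fp_add"}, "B3": set(), "B4": set()}

comp_in, comp_out = component_activity(preds, comp_gen, comp_kill)
```

In this example the floating-point adder activity is released in B2 but survives along the path through B3, so it still reaches B4. This is the "all possible execution paths" behavior the text describes.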
4. LEAKAGE-POWER REDUCTION
In this section, we present a cost model for the compiler to determine whether
power-gating control should be applied, and a set of scheduling policies to place
power-gating instructions within given programs.
4.1 Cost Model
With the utilization of components obtained from Section 3, we can insert power-gating instructions into programs at the appropriate points (i.e., the beginning and end of an inactive block) to turn off and on unused components so as to reduce the leakage power. However, both shut-down and wake-up procedures are
References
Book

Compilers: Principles, Techniques, and Tools

TL;DR: This book discusses the design of a Code Generator, the role of the Lexical Analyzer, and other topics related to code generation and optimization.
Journal ArticleDOI

The SimpleScalar tool set, version 2.0

TL;DR: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors.
Proceedings ArticleDOI

Wattch: a framework for architectural-level power analysis and optimizations

TL;DR: Wattch is presented, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level and opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
Journal ArticleDOI

Low-power CMOS digital design

TL;DR: In this paper, techniques for low power operation are presented which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations to reduce power consumption in CMOS digital circuits while maintaining computational throughput.
Journal Article

Low-Power CMOS Digital Design

TL;DR: An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations, and is achieved by trading increased silicon area for reduced power consumption.
Frequently Asked Questions (12)
Q1. What have the authors contributed in "Compilers for leakage power reduction" ?

In this article, the authors investigate compiler-analysis techniques that are related to reducing leakage power. The authors present a framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures. The authors also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components when executing given program routines is related to the number and complexity of program branches, the authors propose a set of scheduling policies and evaluate their effectiveness. The authors performed experiments by incorporating their compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits.

Future research directions include investigating the effects of using the AVG_Path_Sched mechanism with path profiling and edge profiling schemes in experiments.

The architecture model in their design is a system with an instruction set that supports the control of power gating at the component level. 

In summary, the key contributions of their work include the presentations of data flow analysis framework for component activities, the scheduling policies for power-gating instructions going beyond basic blocks, and the suggestions of hardware refinements for out-of-order issues to work with their proposed methods. 

Their cost model after incorporating latency becomes the following: $Threshold_C = \mathrm{MAX}(BreakEven_C, Latency_C)$, where $Latency_C$ is the power-gating latency of component C. In addition, the authors attempt to insert the wake-up operations of power-gating control ahead of the time at which the corresponding components are required, in order to avoid program stalling whilst waiting for the wake-up latency.
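The threshold rule can be sketched as follows. The MAX of the break-even value and the latency comes from the text. The decision rule (gate only when the idle interval exceeds the threshold), the use of cycles as the unit, the function names, and the numbers are assumptions made for illustration.

```python
def threshold(break_even_cycles, latency_cycles):
    """Threshold_C = MAX(BreakEven_C, Latency_C) for one component C."""
    return max(break_even_cycles, latency_cycles)


def should_power_gate(idle_cycles, break_even_cycles, latency_cycles):
    """Assumed decision rule: gate a component only if its idle interval
    is strictly longer than the threshold for that component."""
    return idle_cycles > threshold(break_even_cycles, latency_cycles)


# Hypothetical component with a 10-cycle break-even point and 4-cycle latency.
gate_long_idle = should_power_gate(12, 10, 4)
gate_short_idle = should_power_gate(8, 10, 4)
```

Taking the maximum means that a component with a long wake-up latency is not gated for intervals too short to hide that latency, even when the energy break-even point alone would allow it.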

A two-dimensional array, called RemainingCycle, is used to maintain the number of cycles that are required to fulfill requests for each component and block.

To ensure the execution order of power-on and power-off instructions, the authors require that a power-gating instruction be stalled until the power-gating instructions preceding it have been issued.

With regard to the impact on performance, the cycle counts of execution provided by Wattch (i.e., SimpleScalar) show that their approach has a slight impact (less than 2%) on performance.

The arguments (C, B, Branched, Edge, and Count) represent the type of the component in analysis for power-gating control, the node ID of the CFG, a Boolean variable that shows whether the current traverse comes through a branch, the type of the outgoing edge, and the accumulated inactive length so far, respectively. 

This instruction can operate those four power-gated function units at once by moving the appropriate value from a general-purpose register to the PGCR. 

The authors use a DEC-Alpha-compatible architecture with power-gating control and instruction sets described in Figure 1 as the target architecture for their experiments. 

The work done by Rele et al. [2002] is concurrent with ours and uses compiler techniques and microarchitecture support to guide power-gating controls.