What have the authors stated for future works in "Compilers for leakage power reduction" ?

Future research directions include investigating the effects of using the AVG_Path_Sched mechanism with path-profiling and edge-profiling schemes in experiments.

What is the cost model after incorporating latency?

Their cost model after incorporating latency becomes the following: $\mathit{Threshold}_C = \mathrm{MAX}(\mathit{BreakEven}_C, \mathit{Latency}_C)$, where $\mathit{Latency}_C$ is the power-gating latency of component $C$. In addition, the authors attempt to insert the wake-up operations of power-gating control ahead of the time at which the corresponding components are required, in order to avoid program stalling whilst waiting for the wake-up latency.
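The threshold rule above can be illustrated with a minimal Python sketch. The function names, the cycle-based units, and the `>=` comparison against the inactive length are assumptions made for illustration; the source only gives the MAX relation and the idea of issuing wake-ups early.

```python
def gating_threshold(break_even_cycles: int, latency_cycles: int) -> int:
    """Threshold_C = MAX(BreakEven_C, Latency_C) for one component C."""
    return max(break_even_cycles, latency_cycles)


def should_power_gate(inactive_cycles: int,
                      break_even_cycles: int,
                      latency_cycles: int) -> bool:
    """Assumption: gate a component only if its inactive interval
    is at least as long as the threshold."""
    return inactive_cycles >= gating_threshold(break_even_cycles, latency_cycles)


def wake_up_issue_cycle(next_use_cycle: int, latency_cycles: int) -> int:
    """Issue the wake-up early enough that the component is ready
    when it is next required (clamped at cycle 0)."""
    return max(0, next_use_cycle - latency_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: break-even of 32 cycles, wake-up latency of 10 cycles.
    print(gating_threshold(32, 10))
    print(should_power_gate(40, 32, 10))
    print(wake_up_issue_cycle(100, 10))
```

Taking the maximum means a component is never gated for an interval shorter than its own wake-up latency, even when the energy break-even point alone would allow it.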

What is the purpose of the proposed compiler framework for power-gating control?

To ensure the execution order of power-on and power-off instructions, the authors enforce that a power-gating instruction be stalled until the power-gating instructions preceding it have been issued.

How much impact does the Wattch have on performance?

With regard to the impact on performance, the cycle counts of execution provided by the Wattch (i.e., SimpleScalar) show that their approach has a light impact (less than 2%) on performance.

What is the type of the component in analysis for power gating control?

The arguments (C, B, Branched, Edge, and Count) represent the type of the component in analysis for power-gating control, the node ID of the CFG, a Boolean variable that shows whether the current traversal comes through a branch, the type of the outgoing edge, and the accumulated inactive length so far, respectively.

What is the target architecture for their experiments?

The authors use a DEC-Alpha-compatible architecture with power-gating control and instruction sets described in Figure 1 as the target architecture for their experiments.

What is the work of Rele et al.?

The work by Rele et al. [2002] is concurrent with that of the authors; it uses compiler techniques and microarchitecture support to guide power-gating controls.

(Open Access) Compilers for leakage power reduction (2006) | Yi-Ping You

Q: What have the authors contributed in "Compilers for leakage power reduction" ?

In this article, the authors investigate compiler-analysis techniques that are related to reducing leakage power. The authors present a framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures. The authors also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components when executing given program routines is related to the number and complexity of program branches, the authors propose a set of scheduling policies and evaluate their effectiveness. The authors performed experiments by incorporating their compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits.

Q: What is the architecture model in your design?

The architecture model in their design is a system with an instruction set that supports the control of power gating at the component level.

Q: What are the key contributions of the work?

In summary, the key contributions of their work include the presentations of data flow analysis framework for component activities, the scheduling policies for power-gating instructions going beyond basic blocks, and the suggestions of hardware refinements for out-of-order issues to work with their proposed methods.

Compilers for Leakage Power Reduction

YI-PING YOU, CHINGREN LEE, and JENQ KUEN LEE

National Tsing Hua University

Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts indicate that architectures, compilers, and software can be optimized so as to reduce the switching power (also known as dynamic power) in microprocessors. This has led to interest in using architecture and compiler optimization to reduce leakage power (also known as static power) in microprocessors. In this article, we investigate compiler-analysis techniques that are related to reducing leakage power. The architecture model in our design is a system with an instruction set to support the control of power gating at the component level. Our compiler provides an analysis framework for utilizing instructions to reduce the leakage power. We present a framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures. We also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components when executing given program routines is related to the number and complexity of program branches, we propose a set of scheduling policies and evaluate their effectiveness. We performed experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumptions on Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing leakage power in microprocessors.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—Compilers,

optimization

General Terms: Algorithms, Experimentation, Languages

Additional Key Words and Phrases: Compilers for low power, leakage-power reduction, power-gating mechanisms

This work was supported in part by the National Science Council (under grant nos. NSC-93-2213-E-007-025, NSC-93-2220-E-077-020, NSC-93-2220-E-007-019, and NSC-93-2752-E-007-004-PAE), Ministry of Economic Affairs (under grant nos. 93-ED-17-A-03-S1-0002 and 94-EC-17-A-01-S1-034), and ITRI (under an ITRI/NTHU research grant).

Authors’ address: Department of Computer Science, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan; email: {ypyou,crlee}@pllab.cs.nthu.edu.tw; jklee@cs.nthu.edu.tw.

2006 ACM 1084-4309/06/0100-0147 $5.00

ACM Transactions on Design Automation of Electronic Systems, Vol. 11, No. 1, January 2006, Pages 147–164.

1. INTRODUCTION

The demands of power-constrained mobile and embedded computing applications are increasing rapidly, which makes the reduction of power consumption a crucial challenge for software and hardware developers. The continuing size reductions and increasing speeds of transistors increase the importance of leakage-power dissipation in the absence of any switching activities. Recent theoretical analyses have attempted to characterize engineering equations and cost models for analyzing static power [Thompson et al. 1998; De and Borkar 1999; Doyle et al. 2002]. One such analysis produced the following relationship: $P_{static} = V_{CC} \cdot N \cdot k_{design} \cdot \hat{I}_{leak}$, where $V_{CC}$ is the supply voltage, $N$ is the number of transistors in the design, $k_{design}$ is the characteristic of an average device, and $\hat{I}_{leak}$ is a technology parameter describing the per-device subthreshold leakage [Butts and Sohi 2000].
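As a rough numerical illustration of the static power relationship, the sketch below evaluates the product of the four factors and shows that the result scales linearly with the number of powered devices. The function name and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the article.

```python
def static_power(v_cc: float, n_devices: int,
                 k_design: float, i_leak: float) -> float:
    """Static power as the product: supply voltage * device count *
    design characteristic * per-device subthreshold leakage."""
    return v_cc * n_devices * k_design * i_leak


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    all_on = static_power(v_cc=1.0, n_devices=1_000_000, k_design=10.0, i_leak=1e-9)
    half_gated = static_power(v_cc=1.0, n_devices=500_000, k_design=10.0, i_leak=1e-9)
    # Halving the number of powered devices halves the static power.
    print(all_on, half_gated, half_gated / all_on)
```

Because the relationship is a plain product, any reduction in the number of powered devices translates directly into a proportional reduction in static power, which is the lever that power gating uses.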

In this article, we discuss compiler analysis techniques used to reduce the number of devices, $N$, in the static power equation above to ease the problem of leakage power. The architecture model in our design is a system with an instruction set that supports the control of power gating at the component level. We attempt to reduce the number of devices by turning devices off when they are not being used. Our work provides compiler solutions for the analysis and scheduling of the power-gating control at the component level. A data-flow analysis framework is given that estimates the component activities at fixed points in programs whilst considering pipeline architectures. We also provide equations that can be used by the compiler to determine whether employing power-gating instructions in given program blocks will reduce the total energy requirements. As the duration of power gating on components in given program routines is related to the number and complexity of program branches, we propose a set of scheduling policies (Basic_Blk_Sched, MIN_Path_Sched, and AVG_Path_Sched) and evaluate their effectiveness. Our proposed framework is effective for machines with in-order execution. Additional care has to be taken when dealing with out-of-order issue. For out-of-order issue, we suggest that power-gating operations on a function unit be considered dependent on normal operations on this unit. Our experiments are performed by incorporating our compiler analysis and scheduling policy into SUIF compiler tools [Smith 1998; Stanford Compiler Group 1995] and by simulating the energy consumptions on Wattch [Brooks et al. 2000] toolkits. We also revise Wattch/SimpleScalar to adopt our proposed schemes to deal with out-of-order issue. The experimental results demonstrate that our mechanisms are effective in reducing leakage power in microprocessors. In summary, the key contributions of our work include the presentation of a data-flow analysis framework for component activities, the scheduling policies for power-gating instructions going beyond basic blocks, and the suggestions of hardware refinements for out-of-order issue to work with our proposed methods.

The remainder of this article is organized as follows: Section 2 presents our machine architectures with power-gating controls. Section 3 presents our data-flow analysis framework for component activities. Next, Section 4 provides scheduling policies for leakage power reductions by utilizing gathered component information. Experimental results will then be presented in Section 5. Finally, Section 6 describes related work and Section 7 concludes this article.


Fig. 1. Machine architecture model with power-gating control.

2. MACHINE ARCHITECTURE

The architecture model in our design is a system with an instruction set that supports the control of power gating at the component level. Figure 1 shows an example of our target machine architecture on which our optimization is based. We focus on reducing the power consumption of certain function units by invoking the “power-gating” technology. Power gating is analogous to clock gating: power gating turns off devices by switching off their supply voltage rather than switching off the clock. This can be achieved by forcing transistors to turn off or by using multithreshold voltage CMOS technology (MTCMOS) to increase the threshold voltage [Butts and Sohi 2000; Kao and Chandrakasan 2000; Roy 1998].

We built the experimental architecture within the Wattch simulation environment [Brooks et al. 2000]. In this simulation environment we can measure the power consumption of every microprocessor component throughout the experimental program. This architecture is essentially compatible with the DEC Alpha 21264 processor [Compaq Computer Corporation 1999]; the major difference between these two architectures is the additional power-gating design and the static pipeline scheduling in our experimental architecture. The compiler approach proposed in this article is basically for in-order issue processors, but we also propose a solution that makes our methodology feasible for out-of-order issue processors, as shown later in Section 5.3. We implemented the proposed mechanism in SimpleScalar and evaluated our approach with out-of-order issue processors.

The power-gated function units in our experimental architecture are Integer Multiplier, Floating-Point Adder, Floating-Point Multiplier, and Floating-Point Divider. The power gating of each function unit can be controlled by the “power-gating control register” (PGCR). The PGCR is a 64-bit integer register.


In this case, only the lowest four bits of this register can affect the power-gating status. Bit 0 of the PGCR controls the power gating of the Integer Multiplier: setting this bit turns the Integer Multiplier on, and clearing it turns the function unit off in the next clock cycle. Bits 1, 2, and 3 are used for the Floating-Point Adder, Floating-Point Multiplier, and Floating-Point Divider, respectively. It is worth mentioning that the integer ALU within the architecture is also involved in general program execution, since it also performs data movements to the PGCR. This means that the integer ALU is always required, and so this function unit is always on. In addition, we introduce a new instruction in the simulation environment to specify the access direction of the PGCR. This instruction can operate all four power-gated function units at once by moving the appropriate value from a general-purpose register to the PGCR.
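The bit assignment described above can be sketched in Python as follows. The class name `PowerGatedUnit` and the helper functions are hypothetical names introduced for illustration; only the bit positions (0 to 3) and the set-means-on convention come from the text.

```python
from enum import IntFlag
from typing import Iterable


class PowerGatedUnit(IntFlag):
    """Bit positions in the lowest four bits of the PGCR (set = unit on)."""
    INT_MUL = 1 << 0   # Integer Multiplier
    FP_ADD = 1 << 1    # Floating-Point Adder
    FP_MUL = 1 << 2    # Floating-Point Multiplier
    FP_DIV = 1 << 3    # Floating-Point Divider


PGCR_MASK = 0xF  # only the lowest four bits affect power-gating status


def pgcr_value(units_on: Iterable[PowerGatedUnit]) -> int:
    """Build the value that would be moved into the PGCR in one instruction."""
    value = 0
    for unit in units_on:
        value |= unit
    return int(value) & PGCR_MASK


def is_on(pgcr: int, unit: PowerGatedUnit) -> bool:
    """Check whether a unit's bit is set in a PGCR value."""
    return bool(pgcr & unit)


if __name__ == "__main__":
    # Keep the Integer Multiplier and Floating-Point Multiplier on; gate the rest.
    value = pgcr_value([PowerGatedUnit.INT_MUL, PowerGatedUnit.FP_MUL])
    print(bin(value))
    print(is_on(value, PowerGatedUnit.FP_DIV))
```

Packing all four units into one register value is what lets a single move instruction switch several units at once.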

3. COMPONENT-ACTIVITY DATA-FLOW ANALYSIS

In this section, we investigate the compiler analysis techniques used to reduce the leakage power. We present a data-flow analysis framework for a compiler to analyze the state of components in a microprocessor. The process collects the information of the utilization of components at various points in a program. We first construct basic blocks and control flow graphs of given programs, and then develop a data-flow equation for the summary of component usages at given program points. To gather the data-flow information, we define comp_gen[B], comp_kill[B], comp_in[B], and comp_out[B] for each block B.

We say that a component activity c is generated at a block B if a component is required for this execution, symbolized as comp_gen[B], and that it is killed if the component is released by the last request, symbolized as comp_kill[B]. We then create the two groups of equations shown below. The first group of equations follows from the observation that comp_in[B] is the union of activities arriving from all the predecessors of B. The second group states that the activities at the end of a block are either those generated within the block or those entering at the beginning but not killed as control flows through the block. The data-flow equations for these two groups are as follows:

$$\mathit{comp\_in}[B] = \bigcup_{P \text{ a predecessor of } B} \mathit{comp\_out}[P]$$

$$\mathit{comp\_out}[B] = \mathit{comp\_gen}[B] \cup (\mathit{comp\_in}[B] - \mathit{comp\_kill}[B]).$$

Fig. 2. Data-flow analysis algorithm for component activities.

We use an iterative approach to compute the desired results of comp_in and comp_out after comp_gen has been computed for each block. The algorithm is sketched in Figure 2. This is an iterative algorithm for data-flow equations [Aho et al. 1986] with the addition of resource management structures. A two-dimensional array, called RemainingCycle, is used to maintain the number of cycles that are required to fulfill requests for each component and block. In addition, a resource-utilization table is adopted to give the resource requirement for each instruction of the given microprocessor. The resource-utilization table can be used to give the initial values of RemainingCycle. The remaining cycles of a component decrease by one for each propagation. Initially, both comp_in and comp_kill are set to be empty. The iteration continues until comp_in (and hence comp_out) converges. As comp_out[B] never decreases in size for any B, the algorithm will eventually halt when all comp_out are in the steady state. Intuitively, the algorithm propagates activities of components as far as they will go by simulating all possible execution paths of the program. This algorithm provides the state of utilization of components for each point of a program.
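The fixed-point iteration over the two data-flow equations can be sketched in Python as below. This is a simplified illustration, not the algorithm of Figure 2: it takes comp_gen and comp_kill as given sets and omits the RemainingCycle array and the resource-utilization table. The function name, the dictionary-based CFG encoding, and the three-block example are assumptions.

```python
from typing import Dict, List, Set, Tuple


def component_activity(
    preds: Dict[str, List[str]],
    comp_gen: Dict[str, Set[str]],
    comp_kill: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Iterate comp_in/comp_out to a fixed point.

    comp_in[B]  = union of comp_out[P] over predecessors P of B
    comp_out[B] = comp_gen[B] | (comp_in[B] - comp_kill[B])
    """
    blocks = list(comp_gen)
    comp_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    comp_out: Dict[str, Set[str]] = {b: set(comp_gen[b]) for b in blocks}

    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in: Set[str] = set()
            for p in preds.get(b, []):
                new_in |= comp_out[p]
            new_out = comp_gen[b] | (new_in - comp_kill[b])
            if new_in != comp_in[b] or new_out != comp_out[b]:
                comp_in[b], comp_out[b] = new_in, new_out
                changed = True
    return comp_in, comp_out


if __name__ == "__main__":
    # Hypothetical CFG: B1 -> B2 -> B3, with a back edge B3 -> B2.
    preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
    gen = {"B1": {"fp_mul"}, "B2": set(), "B3": {"int_mul"}}
    kill = {"B1": set(), "B2": {"fp_mul"}, "B3": set()}
    cin, cout = component_activity(preds, gen, kill)
    for b in preds:
        print(b, sorted(cin[b]), sorted(cout[b]))
```

Because each comp_out set can only grow and the set of components is finite, the loop terminates, which mirrors the convergence argument in the text.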

4. LEAKAGE-POWER REDUCTION

In this section, we present a cost model for the compiler to determine whether power-gating control should be applied, and a set of scheduling policies to place power-gating instructions within given programs.

4.1 Cost Model

With the utilization of components obtained from Section 3, we can insert power-gating instructions into programs at the appropriate points (i.e., the beginning and end of an inactive block) to turn off and on unused components so as to reduce the leakage power. However, both shut-down and wake-up procedures are
