Journal ArticleDOI

Compilation for compact power-gating controls

TL;DR: This article presents a sink-n-hoist framework for a compiler to generate balanced scheduling of power-gating instructions. The framework attempts to merge several power-gating instructions into a single compound instruction, thereby reducing the number of power-gating instructions issued.
Abstract: Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies due to the continuing size reductions and increasing speeds of transistors. Recent studies have attempted to reduce leakage power using integrated architecture and compiler power-gating mechanisms. This approach involves compilers inserting instructions into programs to shut down and wake up components, as appropriate. While early studies showed this approach to be effective, there are concerns about the large amount of power-control instructions being added to programs due to the increasing amount of components equipped with power-gating controls in SoC design platforms. In this article we present a sink-n-hoist framework for a compiler to generate balanced scheduling of power-gating instructions. Our solution attempts to merge several power-gating instructions into a single compound instruction, thereby reducing the amount of power-gating instructions issued. We performed experiments by incorporating our compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumption using Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in reducing the amount of power-gating instructions while further reducing leakage power compared to previous methods.

Summary (5 min read)

1. INTRODUCTION

  • Minimizing power dissipation can be considered at algorithmic, architectural, logic, and circuit levels [Chandrakasan et al. 1992].
  • Leakage power is coming to represent a greater proportion of total power dissipation as the feature size of semiconductor technology continues to reduce as shown in Figure 1.
  • The authors' framework attempts to merge several power-gating instructions into a single compound instruction, thereby reducing the number of power-gating instructions issued.
  • The lefthand panel of Figure 2 shows two different components in use, the center panel illustrates the current practice of attempting to issue power-on and power-off instructions for these two hardware components separately, and the righthand panel shows the authors' scheme, which attempts to merge these instructions.
  • Section 2 describes a machine architecture for the target platform, and Section 3 overviews the leakage-power-reduction framework.

2. MACHINE ARCHITECTURE

  • The architecture model in their design has an instruction set that supports power-gating control at the component level.
  • Power gating is analogous to clock gating, except that devices are powered off by switching off their supply voltage, rather than the clock.
  • This can be implemented by forcing transistors to be off or using MTCMOS (multithreshold voltage CMOS technology) to increase the threshold voltage [Butts and Sohi 2000; Kao and Chandrakasan 2000; Roy and Prasad 1992; Hu et al. 2004].
  • Figure 3 illustrates an example of their target machine architecture based on a DEC Alpha 21264 processor with an instruction fetch, issue, and retire unit (Ibox), a block of integer-function units (Ebox), a block of floating-point-function units (Fbox), a memory reference unit (Mbox), and an external cache and system interface unit (Cbox) [Compaq 1999].
  • The power state of each unit is controlled by the 64-bit integer power-gating control register (PGCR).

3. LEAKAGE-POWER-REDUCTION FRAMEWORK

  • This section presents the compiler framework for implementing power-gating mechanisms to reduce leakage-power dissipation.
  • The authors have previously presented a data-flow analysis framework, called component-activity data-flow analysis (CADFA), to estimate the component activities on a microprocessor within a given program [You et al. 2002, 2006].
  • Power-gating-instruction scheduling is then performed to determine whether, where, and when power-gating controls should be employed so as to produce power reduction.
  • The authors' solution attempts to merge several power-gating instructions into a single compound instruction.
  • Three scenarios are considered in Figure 5: leftmost items show the case without power-gating controls; middle items show the case when steps I, II, III, and V in the framework are applied; and the rightmost items show the case when all phases in the framework are applied.

3.1 Component-Activity Data-Flow Analysis

  • The goal of CADFA is to determine the utilization of components at each point in a program using a set of data-flow equations.
  • The predicates of the data-flow equations for collecting component-activity information are given as follows: —COMPONENTloc(b) is a set of components that are required for the first cycle of execution.
  • INACTIVITY(b) is a set of components that are not active at block b.
  • In fact, INACTIVITY(b) is the complementary set to COMPONENTout(b), that is, $\text{INACTIVITY}(b) = U - \text{COMPONENT}_{out}(b)$, where $U$ is the universal set.
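
The complement relation between INACTIVITY(b) and COMPONENTout(b) can be illustrated with ordinary set operations. The sketch below is only an illustration under stated assumptions: the component names and the `ALL_COMPONENTS` universe are hypothetical and are not taken from the paper.

```python
# Hypothetical universe of power-gatable components (names are illustrative).
ALL_COMPONENTS = frozenset({"int_mul", "fp_add", "fp_mul", "fp_div"})


def inactivity(component_out):
    """Return the components not active at a block.

    This is the universe minus COMPONENT_out(b), i.e. the set complement.
    """
    return ALL_COMPONENTS - frozenset(component_out)


# A block whose COMPONENT_out set contains only the floating-point adder.
idle = inactivity({"fp_add"})
print(sorted(idle))
```

Every component in the printed set is a candidate for power gating at that block, subject to the cost model described in Section 3.2.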

3.2 Power-Gating-Instruction Scheduling

  • Once the utilization information of components has been obtained, the authors can insert power-gating instructions into programs at the appropriate points (i.e., beginning and end of an inactive block) to power off and on unused components so as to reduce the leakage power.
  • Accordingly, the authors have a break-even length of idle intervals for each component C, called $\text{BE-ITVL}^{idle}_{C}$, that sustains the aforementioned inequality: $\text{BE-ITVL}^{idle}_{C} = \left\lceil \dfrac{E_{off}(C) + E_{on}(C)}{P_{leak}(C) - P^{r}_{leak}(C)} \right\rceil$.
  • The obtained component-activity information and cost model for deciding whether power-gating instructions should be employed allow us to consider scheduling mechanisms when inserting the power-gating instructions into given programs.
  • Only one of the branchings may benefit from power gating, in which case instigating power-gating control in one branch when the other is instead taken may not reduce the power requirements.
  • To accommodate this, the authors propose an eclectic policy, called AVG Path Sched, to schedule power-gating instructions.
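
The break-even length above is a simple ceiling division, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and parameter names are hypothetical, `p_leak_gated` reflects a reading of Prleak(C) as the residual leakage while the component is gated, and treating an interval exactly equal to the break-even length as worthwhile is also an assumption.

```python
import math


def break_even_interval(e_off, e_on, p_leak, p_leak_gated):
    """Idle length (in cycles) at which power gating starts to pay off.

    Mirrors ceil((E_off + E_on) / (P_leak - P_leak_gated)). All arguments
    are illustrative numbers in consistent units (energy, energy per cycle).
    """
    saving_per_cycle = p_leak - p_leak_gated
    if saving_per_cycle <= 0:
        raise ValueError("gating must lower leakage for a break-even point to exist")
    return math.ceil((e_off + e_on) / saving_per_cycle)


def worth_gating(idle_cycles, e_off, e_on, p_leak, p_leak_gated):
    """Assumed policy: gate only when the idle interval reaches break-even."""
    return idle_cycles >= break_even_interval(e_off, e_on, p_leak, p_leak_gated)
```

For instance, with switching costs of 10 units each and a per-cycle saving of 0.5 units, the interval must last at least 40 cycles before gating is considered.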

4. SINK-N-HOIST ANALYSIS

  • The main idea of sink-n-hoist analysis is to reduce the problem of excessive addition of instructions with code-motion techniques.
  • The approach attempts to merge several power-gating instructions into one compound instruction by “sinking” power-off instructions and “hoisting” power-on instructions; that is, postponing the issuing of power-off instructions and bringing forward the issuing of power-on.
  • A cost model is given next to determine the feasibility.
  • In consequence, the authors have a maximum sinkable slack for each component C, called MAX−SINK−SLKC. In the following context, “statement” and “instruction” are used interchangeably, since a statement at the assembly code level means an instruction.
  • Figure 6 shows the algorithm for sink-n-hoist analysis.
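
The merging idea can be illustrated on a straight-line timeline. The sketch below is a deliberate simplification and not the data-flow algorithm of Figure 6: it assumes each power-off request may be postponed by at most a given slack, and it greedily groups requests whose allowed issue windows overlap so that each group can be issued as one compound instruction. The function name, the tuple layout, and the choice to issue each group as late as possible are all assumptions made for illustration.

```python
def merge_power_offs(requests):
    """Greedily group power-off requests whose sink windows overlap.

    requests: list of (component, earliest_cycle, max_sink_slack).
    Each request may be issued anywhere in [earliest, earliest + slack].
    Returns a list of (issue_cycle, sorted component names), one entry
    per compound instruction.
    """
    # Sort by the latest cycle at which each request may still be issued.
    windows = sorted(
        (start + slack, start, comp) for comp, start, slack in requests
    )
    groups = []
    for deadline, start, comp in windows:
        if groups and start <= groups[-1][0]:
            # The last group's issue cycle lies inside this window: join it.
            groups[-1][1].append(comp)
        else:
            # Open a new group issued as late as this request allows.
            groups.append((deadline, [comp]))
    return [(cycle, sorted(comps)) for cycle, comps in groups]


requests = [
    ("fp_add", 10, 4),
    ("fp_mul", 12, 3),
    ("fp_div", 13, 2),
    ("int_mul", 30, 2),
]
print(merge_power_offs(requests))
```

Hoisting power-on instructions is the mirror image: windows extend backward in time instead of forward. Sinking a power-off later costs extra leakage, which is why the authors bound the movement with a cost model rather than merging unconditionally.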

4.1 Sinkable Analysis and Grouping-Off Analysis

  • The predicates for collecting SINKABLE and GROUP−OFF information are given as follows.
  • Moreover, the value of each $\text{SINK-SLK}^{b}_{C}$ is decreased by one in accordance with the following definition.
  • In fact, SINKABLEout(b) presents the set of power-off statements (whether sunk or not) that can be issued at block b.
  • Block b belongs to the group it enumerates and is the beginning block of a set of successive blocks if GROUP−OFFloc(b) is not empty.
  • To reduce the amount of power-gating instructions issued, the authors apply sinkable analysis.

4.2 Hoistable and Grouping-On Analysis

  • Hoistable and grouping-on analyses are similar to sinkable and grouping-off analyses, except that hoistable analysis is a backward data-flow analysis.
  • Moreover, the value of each $\text{HOIST-SLK}^{b}_{C}$ is decreased by one in accordance with the following definition: $\text{HOIST-SLK}^{b}_{C} = \min_{s \in Succ(b)}\left(\text{HOIST-SLK}^{s}_{C}\right) - 1$.
  • HOISTABLEin(b) is a set of power-on statements that can be safely moved to the start of block b: $\text{HOISTABLE}_{in}(b) = \text{HOISTABLE}_{loc}(b) \cup \left(\text{HOISTABLE}_{out}(b) - \text{HOISTABLE}_{blk}(b)\right)$.
  • Block b belongs to the group it enumerates and is the beginning block of a set of successive blocks if GROUP−ONloc(b) is not empty.
  • In addition, the authors can replace all of the GROUP−ONout set of its predecessors by GROUP−ONin(b) if the GROUP−ONout set of the predecessor of b is not empty.
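
The two hoistable equations quoted above translate directly into a backward transfer step. The sketch below shows a single evaluation for one block, assuming the successor information is already available. The set names follow the summary, while the function names, the control-flow graph, and the statement labels are hypothetical. The summary does not define HOISTABLEblk(b), so it is treated here simply as the set subtracted from HOISTABLEout(b).

```python
def hoist_slack(block, succ, slack):
    """HOIST-SLK for one component at `block`: min over successors, minus one."""
    return min(slack[s] for s in succ[block]) - 1


def hoistable_in(loc, out, blk):
    """HOISTABLE_in(b) = HOISTABLE_loc(b) | (HOISTABLE_out(b) - HOISTABLE_blk(b))."""
    return set(loc) | (set(out) - set(blk))


# Hypothetical control-flow fragment: b1 branches to b2 and b3.
succ = {"b1": ["b2", "b3"]}
slack = {"b2": 3, "b3": 5}
print(hoist_slack("b1", succ, slack))
print(sorted(hoistable_in({"on_fp_mul"}, {"on_fp_add", "on_fp_div"}, {"on_fp_div"})))
```

Because information flows from successors to predecessors, a full analysis would iterate these steps over the control-flow graph in reverse order until the sets stop changing.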

4.3 Grouping-Switch Analysis

  • In order to collect more grouping information for later analysis, the authors introduce grouping-switch analysis, which groups together all power-on and power-off instructions that might be merged.
  • The analysis is similar to grouping-off and grouping-on analyses.
  • The predicates for computing GROUP−SWH are as follows: —GROUP−SWHloc(b) is a set with at most one element (i.e., a singleton or empty set) in which the element (if it exists) is an integer representing a group number and never appears in other sets of GROUP−SWHloc.
  • Block b belongs to the group it enumerates and is the beginning block of a set of successive blocks if GROUP−SWHloc(b) is not empty.
  • In addition, the authors can also replace all of the GROUP−SWHout set of its predecessors by GROUP−ONin(b) if the GROUP−SWHout set of the predecessor of b is not empty.

4.4 Power-Gating-Instruction Placement

  • The authors use information from the SINKABLEout, HOISTABLEin, GROUP−OFFout, GROUP−ONout, and GROUP−SWHout predicates described in Sections 4.1, 4.2, and 4.3 to determine how to place power-gating instructions, that is, whether power-gating instructions should be combined or issued separately.
  • Figure 9 outlines an algorithm for placing power-gating instructions in a group-by-group manner.
  • It then uses an energy-cost model (including leakage energy, the energy associated with issuing power-off instructions, etc.) to determine which policy results in the lowest energy consumption.
  • In their experiments, this process accounts for only a very small fraction of the actual time spent: less than 0.6% of the time taken by their proposed framework.
  • In the following, the authors elaborate the idea by continuing the example presented in Section 4.1.

5.1 Platform

  • The authors used a DEC-Alpha-compatible architecture with the power-gating controls and instruction sets as described in Figure 3 as the target architecture for their experiments.
  • By default, the simulator performed out-of-order executions.
  • The benchmarks used in their experiments were from the floating-point version of the DSPstone benchmark suite [Zivojnovic et al. 1994].
  • The instruction stores the value of register $24 into the memory address below zero, which is an invalid memory address ($31 is a constant zero register) and should never be generated by standard compilers.
  • The energy consumption of fetching and decoding a power-gating instruction was assumed to be 2 times the leakage power.

5.2 Results and Discussion

  • The results from three types of experiment are compared: (1) no power-gating mechanism; (2) CADFA as in previous work [You et al. 2006, 2002], in which only steps I, II, and III of Figure 4 were performed; and (3) sink-n-hoist analysis involving all phases in Figure 4.
  • Figures 12–14 give the compilation and simulation results of two approaches, CADFA and CADFA with sink-n-hoist, when the integer multiplier, floating-point adder, and floating-point multiplier are considered for power gating; the comparison baseline in these figures is the one without power-gating controls.
  • The energy consumption was measured in five categories: the dynamic energy dissipated by clock circuits and that by the whole processor except for clock circuits, the leakage energy dissipated by power-gatable units and that by the whole processor except for power-gatable units, and the overhead energy consumption due to extra power-gating instructions.
  • Therefore, fir2dim and matrix execute more power-gating operations, and thus consume more execution cycles.
  • It shows that their technique is effective in helping leakage control at/beyond new technology generations.

7. CONCLUSION

  • In summary, their experiments have demonstrated that the sink-n-hoist analysis framework proposed in this article improves code size, energy consumption, and performance.
  • It reduces the overall energy consumption and code size growth by an average of about 0.9% and 47.8%, respectively, compared with the CADFA scheme without their sink-n-hoist approach, and impacts performance by an average of less than 1%.
  • As the compiler phase is done one phase after another, their framework provides a sound theoretical foundation capable of working with other improvements, such as adding more slackness for low power.
  • The authors are currently in the process of incorporating more components (such as cryptography modules) into their architecture and simulator.
  • The authors expect that their scheme will be even more beneficial as more extensible modules are equipped with power-gating controls in SoC design platforms.


Compilation for Compact Power-Gating Controls
YI-PING YOU, CHUNG-WEN HUANG, and JENQ KUEN LEE
National Tsing Hua University
Power leakage constitutes an increasing fraction of the total power consumption in modern semi-
conductor technologies due to the continuing size reductions and increasing speeds of transistors.
Recent studies have attempted to reduce leakage power using integrated architecture and compiler
power-gating mechanisms. This approach involves compilers inserting instructions into programs
to shut down and wake up components, as appropriate. While early studies showed this approach
to be effective, there are concerns about the large amount of power-control instructions being added
to programs due to the increasing amount of components equipped with power-gating controls in
SoC design platforms. In this article we present a sink-n-hoist framework for a compiler to gen-
erate balanced scheduling of power-gating instructions. Our solution attempts to merge several
power-gating instructions into a single compound instruction, thereby reducing the amount of
power-gating instructions issued. We performed experiments by incorporating our compiler anal-
ysis and scheduling policies into SUIF compiler tools and by simulating the energy consumption
using Wattch toolkits. The experimental results demonstrate that our mechanisms are effective in
reducing the amount of power-gating instructions while further reducing leakage power compared
to previous methods.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—Compilers;
optimization
General Terms: Algorithms, Experimentation, Languages
Additional Key Words and Phrases: Compilers for low power, data-flow analysis, leakage-power
reduction, balanced scheduling, power-gating mechanisms
ACM Reference Format:
You, Y.-P., Huang, C.-W., and Lee, J. K. 2007. Compilation for compact power-gating controls.
ACM Trans. Des. Automat. Electron. Syst. 12, 4, Article 51 (September 2007), 26 pages. DOI =
10.1145/1278349.1278364 http://doi.acm.org/10.1145/1278349.1278364
This work was supported in part by the National Science Council Grants NSC 95-2220-E-007-001
and NSC 95-2220-E-007-002, the Ministry of Economic Affairs Grants 95-EC-17-A-01-S1-034 and
96-EC-17-A-01-S1-034, and ITRI under an ITRI/NTHU research grant.
Authors’ addresses: Y.-P. You, C.-W. Huang, J. K. Lee, (corresponding author), Department
of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan; email: {ypyou,
cwhuang}@pllab.cs.nthu.edu.tw; jklee@cs.nthu.edu.tw.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or direct commercial
advantage and that copies show this notice on the first page or initial screen of a display along
with the full citation. Copyrights for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,
to redistribute to lists, or to use any component of this work in other works requires prior specific
permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn
Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.
© 2007 ACM 1084-4309/2007/09-ART51 $5.00 DOI 10.1145/1278349.1278364 http://doi.acm.org/10.1145/1278349.1278364
ACM Transactions on Design Automation of Electronic Systems, Vol. 12, No. 4, Article 51, Pub. date: Sept. 2007.

1. INTRODUCTION
Minimizing power dissipation can be considered at algorithmic, architectural,
logic, and circuit levels [Chandrakasan et al. 1992]. Numerous studies in the
literature on low-power design have proposed various techniques for synthe-
sizing designs with reduced transitional activities. Recently, the prospect of
combining architecture design and software arrangement at the instruction
level has been addressed to help reduce power consumption [Bellas et al. 2000;
Chang and Pedram 1995; Horowitz et al. 1994; Lee et al. 2003; 1997; Su and
Despain 1995; Tiwari et al. 1998, 1997]. For example, several types of software
rearrangement have been used to reduce the dynamic power, such as utilizing
the value locality of registers [Chang and Pedram 1995], swapping operands for
Booth multipliers [Lee et al. 1997], scheduling VLIW instructions to reduce the
power consumption on the instruction bus [Lee et al. 2003], gating the clock to
reduce workloads [Horowitz et al. 1994; Tiwari et al. 1998, 1997], utilizing cache
subbanking mechanisms [Su and Despain 1995], and an instruction cache for
loops [Bellas et al. 2000].
Leakage power is coming to represent a greater proportion of total power
dissipation as the feature size of semiconductor technology continues to shrink,
as shown in Figure 1. It is predicted that leakage power will become comparable
to dynamic power within only a few generations [Doyle et al. 2002; Karnik et al.
2002; Kim et al. 2003; Semiconductor Industry 2004; Jones 2004]. Therefore,
power gating to reduce leakage power should be used in addition to clock gating,
which is only able to reduce the dynamic power [Kao and Chandrakasan 2000;
Butts and Sohi 2000; Hu et al. 2004]. Recent studies have attempted to reduce
leakage power using integrated architecture and compiler power-gating mech-
anisms [Dropsho et al. 2002; Yang et al. 2002; You et al. 2002, 2006; Rele et al.
2002; Zhang et al. 2003]. This approach involves compilers inserting instruc-
tions into programs to shut down and wake up components whenever appro-
priate, based on a data-flow analysis or profiling analysis. While early studies
showed this approach to be effective, there are concerns about the amount of
power-control instructions being added to programs with increasing numbers
of components being equipped with power-gating controls in system-on-a-chip
(SoC) design platforms for embedded systems. Note that architecture designers
can customize the processor with unique operation functions [Ip et al. 2002;
Gonzalez 2000; Tsutsui et al. 2002]. For example, one may have extensible
instructions for modules such as cryptography, 3D graphics, and motion
estimation, as well as a variety of wireless communication modules.
In this article we present a sink-n-hoist framework for a compiler to generate
balanced scheduling of power-gating instructions. Our framework attempts to
merge several power-gating instructions into a single compound instruction,
thereby reducing the amount of power-gating instructions issued. Note that
whilst power-gating instructions can significantly reduce leakage power, they
produce recovery penalties and increase the execution time and code size of
programs. Figure 2 illustrates an example of power-gating control. The lefthand
panel of the figure shows two different components in use, the center panel
illustrates the current practice of attempting to issue power-on and power-off
instructions for these two hardware components separately, and the righthand
panel shows our scheme that attempts to merge these instructions. In this
article we provide a cost model and software foundation to guide this process.
Our solution includes a set of data-flow equations for code motion of
power-gating instructions. Our work combines a theoretical foundation and
step-by-step framework for moving, grouping, and merging power-gating
instructions. We have performed experiments that incorporate our compiler
analysis and scheduling policies into SUIF compiler tools, and simulate the
energy consumption using Wattch toolkits [Brooks et al. 2000]. Experimental
results obtained using the DSPstone benchmark suite demonstrate that our
mechanisms are effective in reducing both the amount of power-gating
instructions and the power consumption relative to previous methods. Our
sink-n-hoist framework for merging power-gating instructions reduces the code
size by an average of 47.8%, and also further reduces the energy consumption
due to the block version of power-gating instructions, giving better power and
performance than the pointwise power-gating instructions.
Fig. 1. Leakage power trend.
Fig. 2. Scenarios of power-gating controls (the shaded components are those in use).
The remainder of this article is organized as follows. Section 2 describes
a machine architecture for the target platform, Section 3 overviews the
leakage-power-reduction framework, Section 4 presents our analysis and merging
techniques for reducing the amount of power-gating instructions, Section 5
gives the experimental results of our study, Section 6 describes related work,
and Section 7 concludes.
Fig. 3. DEC Alpha 21264 architecture with power-gating support.
2. MACHINE ARCHITECTURE
The architecture model in our design has an instruction set that supports power-
gating control at the component level. We focus on reducing the power consump-
tion of certain components by invoking power-gating technology. Power gating
is analogous to clock gating, except that devices are powered off by switching
off their supply voltage, rather than the clock. This can be implemented by
forcing transistors to be off or using MTCMOS (multithreshold voltage CMOS
technology) to increase the threshold voltage [Butts and Sohi 2000; Kao and
Chandrakasan 2000; Roy and Prasad 1992; Hu et al. 2004].
Figure 3 illustrates an example of our target machine architecture based on
a DEC Alpha 21264 processor with an instruction fetch, issue, and retire unit
(Ibox), a block of integer-function units (Ebox), a block of floating-point-function
units (Fbox), a memory reference unit (Mbox), and an external cache and sys-
tem interface unit (Cbox) [Compaq 1999]. In the adapted DEC Alpha 21264
architecture model, Ebox and Fbox were equipped with power-gated functions.
The power state of each unit is controlled by the 64-bit integer power-gating
control register (PGCR). In this case, 1 bit is used for the integer multiplier
unit and 3 for the floating-point function units. Setting the power-gating bit
to 1 powers on the corresponding module, and clearing the bit to 0 powers
off the corresponding module immediately in the following clock cycle. A new
instruction was implemented to control units with the power-gated function by
moving the appropriate value from a general-purpose register to the PGCR.
The integer ALU unit is always powered on, since it takes the responsibility for
moving data to the PGCR.
Fig. 4. The leakage-power-reduction framework.
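
Because every gated unit maps to one bit of the PGCR, a single register move can change the power state of several units at once, which is what makes compound power-gating instructions possible. The following sketch models the register as an integer bitmask. It is an illustration only: the text states that 1 bit serves the integer multiplier and 3 bits serve the floating-point units, but the bit positions, unit names, and function names below are assumptions.

```python
# Hypothetical bit positions inside the power-gating control register (PGCR).
# Only the bit counts (1 integer-multiplier bit, 3 floating-point bits) come
# from the text; the positions and unit names are assumptions.
PGCR_BITS = {"int_mul": 0, "fp_add": 1, "fp_mul": 2, "fp_div": 3}


def pgcr_value(powered_on):
    """Build the register value: bit set = unit on, bit clear = unit off."""
    value = 0
    for unit in powered_on:
        value |= 1 << PGCR_BITS[unit]
    return value


def power_off(pgcr, units):
    """Clear the bits of several units at once (one compound update)."""
    mask = 0
    for unit in units:
        mask |= 1 << PGCR_BITS[unit]
    return pgcr & ~mask


all_on = pgcr_value(PGCR_BITS)          # every gated unit powered on
print(bin(power_off(all_on, ["fp_mul", "fp_div"])))
```

Powering off two units separately would need two register moves; the compound update above needs one.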
3. LEAKAGE-POWER-REDUCTION FRAMEWORK
This section presents the compiler framework for implementing power-gating
mechanisms to reduce leakage-power dissipation. We have previously pre-
sented a data-flow analysis framework, called component-activity data-flow
analysis (CADFA), to estimate the component activities on a microprocessor
within a given program [You et al. 2002, 2006]. The analysis collects the infor-
mation of the utilization of components at each point in the program. Power-
gating-instruction scheduling is then performed to determine whether, where,
and when power-gating controls should be employed so as to produce power
reduction. Finally, power-gating instructions are inserted into the program ac-
cordingly. In the current study, we present a sink-n-hoist framework, applied
in the phase immediately before power-gating instructions are inserted, to gen-
erate balanced scheduling of power-gating instructions. Our solution attempts
to merge several power-gating instructions into a single compound instruction.
Figure 4 presents the compiler flow of the leakage-power-reduction framework.
In the figure, steps I, II, and III are conventional [You et al. 2006, 2002], and
steps IV and V are proposed in this article to merge power-gating instruc-
tions. Steps I and II involve performing a component-activity data-flow analy-
sis, step III decides if and where power-gating instructions should be inserted,
step IV attempts to merge the power-gating instructions with our proposed
sink-n-hoist framework, and step V produces the power-gating instructions. A
motivating example of power-gating control in three floating-point units (ALU,
multiplier, and divider) with this framework is illustrated in Figure 5, where
each item shows the status of a component on a timeline, and a shaded item
represents one that is in use. Three scenarios are considered: leftmost items
show the case without power-gating controls; middle items show the case when
steps I, II, III, and V in the framework are applied; and the rightmost items
show the case when all phases in the framework are applied. The number of
power-gating instructions inserted can be decreased from six to two when the
sink-n-hoist analysis is applied.

Citations
Proceedings ArticleDOI
28 Oct 2010
TL;DR: Feedback from students shows that these laboratory modules for Android systems in embedded system software are interesting to students and give them essential training in adopting Android components for embedded software development.
Abstract: Technologies for handheld devices with open platforms have made rapid progress recently, which gives rise to the necessity of bringing embedded system education and training material up to date. The Android system plays a leading role among all of the open platforms for embedded systems and has an impact on daily usage of mobile devices. In this paper, we present our experience of incorporating Android-based lab modules in embedded system courses. Our lab modules include system software labs and embedded application labs. The Android embedded application lab modules contain computer vision, audio signal processing and speech recognition, and 3D graphics materials. Lab modules for Android systems in embedded system software cover topics on embedded compilers, HW/SW co-design, and power optimization. We also illustrate how these laboratory modules can be integrated into an embedded system curriculum. Feedback from students shows that these laboratory modules are interesting to students and give them essential training in adopting Android components for embedded software development.

8 citations

Journal ArticleDOI
TL;DR: A multithread power-gating framework composed of multithread power-gating analysis (MTPGA) and predicated power-gating (PPG) energy management mechanisms for reducing the leakage power when executing multithread programs on simultaneous multithreading (SMT) machines is presented.
Abstract: Multithread programming is widely adopted in novel embedded system applications due to its high performance and flexibility. This article addresses compiler optimization for reducing the power consumption of multithread programs. A traditional compiler employs energy management techniques that analyze component usage in control-flow graphs with a focus on single-thread programs. In this environment the leakage power can be controlled by inserting on and off instructions based on component usage information generated by flow equations. However, these methods cannot be directly extended to a multithread environment due to concurrent execution issues. This article presents a multithread power-gating framework composed of multithread power-gating analysis (MTPGA) and predicated power-gating (PPG) energy management mechanisms for reducing the leakage power when executing multithread programs on simultaneous multithreading (SMT) machines. Our multithread programming model is based on hierarchical bulk-synchronous parallel (BSP) models. Based on a multithread component analysis with dataflow equations, our MTPGA framework estimates the energy usage of multithread programs and inserts PPG operations as power controls for energy management. We performed experiments by incorporating our power optimization framework into SUIF compiler tools and by simulating the energy consumption with a post-estimated SMT simulator based on Wattch toolkits. The experimental results show that the total energy consumption of a system with PPG support and our power optimization method is reduced by an average of 10.09% for BSP programs relative to a system without a power-gating mechanism when the leakage contribution is set to 30%, and by an average of 4.27% when the leakage contribution is set to 10%. The results demonstrate that our mechanisms are effective in reducing the leakage energy of BSP multithread programs.

7 citations


Cites background or methods from "Compilation for compact power-gatin..."

  • ...Various studies have attempted to reduce the leakage power using integrated architectures and compiler-based power gating mechanisms [Dropsho et al. 2002; Yang et al. 2002; You et al. 2002, 2007; Rele et al. 2002; Zhang et al. 2003; Li and Xue 2004]....

    [...]

  • ...Memory access latency is caused by the memory hierarchy, such as cache miss. Pipelining latency and memory access latency are both discussed in traditional power-gating analyses for single-thread environments such as CADFA [You et al. 2006] and sink-n-hoist [You et al. 2005, 2007]....

    [...]

  • ...A conventional power-gating optimization framework [You et al. 2005, 2007] can be employed for candidates used by a single thread, with the compiler inserting instructions into the program to shut down and wake up components as appropriate....

    [...]

  • ...Steps 5 and 6 further merge the generated power-gating controls into a single compound instruction based on the sink-n-hoist framework [You et al. 2005, 2007]....

    [...]

  • ...The Sink-N-Hoist framework [You et al. 2005, 2007] has been used to reduce the number of power-gating instructions generated by compilers....

    [...]

Proceedings ArticleDOI
24 Oct 2010
TL;DR: In this paper, the authors propose a power-aware simulation framework on embedded multicore DSP subsystems for the SID framework, which includes two phases: IP-level power modeling and system-level power profiling.
Abstract: Embedded multicore DSP systems are playing an increasingly important role in consumer electronics design. Such systems aim to optimize both performance and power on mobile devices. Embedded application developers therefore devise designs that optimize embedded applications for not only performance but also power. However, there is currently no power-metrics support for popular application design platforms such as QEMU and SID, where application developers develop their applications. This hinders application developers from tuning optimizations for power. In this paper, we propose a power-aware simulation framework on embedded multicore DSP subsystems for the SID framework. To the best of our knowledge, this is the first work to attempt to build a power-aware simulator based on the SID simulation framework. The power estimation flow includes two phases: IP-level power modeling and system-level power profiling. In the IP-level power modeling, PowerMixerIP is employed to build up the power model for PAC DSP and major IPs. In the system-level power profiling, we provide a power profiling hierarchy that meets the demand of embedded software developers. The granularity of power profiling can be configured to the whole simulation stage or any specific time slot in the simulation, such as a dedicated function loop. In our experiments, DSP programs with SIMD intrinsics for the DSPStone benchmark are examined with our proposed power-aware simulator. In addition, a face detection application is deployed as a running example on multicore DSP systems to show how our power simulator can help developers in the optimization process by illustrating views of the power dissipation of applications.

6 citations

Journal ArticleDOI
TL;DR: This article presents an energy-aware code-motion framework for a compiler to generate concentrated accesses to input and output (I/O) buffers inside a GPU, and attempts to gather the I/O buffer accesses into clusters, thereby extending the time period during which the I/O buffers are clock or power gated.
Abstract: Graphics processing units (GPUs) are now being widely adopted in system-on-a-chip designs, and they are often used in embedded systems for manipulating computer graphics or even for general-purpose computation. Energy management is of concern to both hardware and software designers. In this article, we present an energy-aware code-motion framework for a compiler to generate concentrated accesses to input and output (I/O) buffers inside a GPU. Our solution attempts to gather the I/O buffer accesses into clusters, thereby extending the time period during which the I/O buffers are clock or power gated. We performed experiments in which the energy consumption was simulated by incorporating our compiler-analysis and code-motion framework into an in-house compiler tool. The experimental results demonstrated that our mechanisms were effective in reducing the energy consumption of the shader processor by an average of 13.1% and decreasing the energy-delay product by 2.2%.

4 citations


Cites background from "Compilation for compact power-gatin..."

  • ...Recent studies have attempted to reduce the leakage power consumption using integrated architecture and compiler power-gating mechanisms [You et al. 2002, 2005, 2006, 2007; Rele et al. 2002; Dropsho et al. 2002; Yang et al. 2002; Zhang et al. 2003]....

    [...]

Journal ArticleDOI
TL;DR: The design and experiments of a SID-based power-aware simulation framework for embedded multicore systems are presented and it is demonstrated via case studies and experiments how application developers can use the SID-based power simulator for optimizing the power consumption of their applications.
Abstract: Embedded multicore systems are playing increasingly important roles in the design of consumer electronics. The objective of such systems is to optimize both performance and power characteristics of mobile devices. However, currently there are no power metrics supporting popular application design platforms (such as SID) that application developers use to develop their applications. This hinders the ability of application developers to optimize power consumption. In this article we present the design and experiments of a SID-based power-aware simulation framework for embedded multicore systems. The proposed power estimation flow includes two phases: IP-level power modeling and power-aware system simulation. The first phase employs PowerMixerIP to construct the power model for the processor IP and other major IPs, while the second phase involves a power abstract interpretation method for summarizing the simulation trace, then, with a CPE module, estimating the power consumption based on the summarized trace information and the input of IP power models. In addition, a Manager component is devised to map each digital signal processor (DSP) component to a host thread and maintain the access to shared resources. The aim is to maintain the simulation performance as the number of simulated DSP components increases. A power-profiling API is also supported that developers of embedded software can use to tune the granularity of power-profiling for a specific code section of the target application. We demonstrate via case studies and experiments how application developers can use our SID-based power simulator for optimizing the power consumption of their applications. We characterize the power consumption of DSP applications with the DSPstone benchmark and discuss how compiler optimization levels with SIMD intrinsics influence the performance and power consumption. A histogram application and an augmented-reality application based on human-face-based RMS (recognition, mining, and synthesis) application are deployed as running examples on multicore systems to demonstrate how our power simulator can be used by developers in the optimization process to illustrate different views of power dissipations of applications.

3 citations


Cites methods from "Compilation for compact power-gatin..."

  • ...The work on leakage power optimization includes using power gating alone [Butts and Sohi 2000; Hu et al. 2004] and integrated architecture and compiler power-gating mechanisms [Semeraro et al. 2002; Yang et al. 2002; You et al. 2006, 2007; Rele et al. 2002; Zhang et al. 2003]....

    [...]

References
Proceedings ArticleDOI
01 May 2000
TL;DR: Wattch is presented, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level and opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
Abstract: Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities. This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level. Wattch is 1000X or more faster than existing layout-level power tools, and yet maintains accuracy within 10% of their estimates as verified using industry tools on leading-edge designs. This paper presents several validations of Wattch's accuracy. In addition, we present three examples that demonstrate how architects or compiler writers might use Wattch to evaluate power consumption in their design process. We see Wattch as a complement to existing lower-level tools; it allows architects to explore and cull the design space early on, using faster, higher-level tools. It also opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

2,848 citations


"Compilation for compact power-gatin..." refers methods in this paper

  • ...We have performed experiments that incorporate our compiler analysis and scheduling policies into SUIF compiler tools, and simulate the energy consumption using Wattch toolkits [Brooks et al. 2000]....

    [...]

  • ...V supply voltage [Brooks et al. 2000]....

    [...]


Journal ArticleDOI
TL;DR: In this paper, techniques for low power operation are presented which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations to reduce power consumption in CMOS digital circuits while maintaining computational throughput.
Abstract: Motivated by emerging battery-operated applications that demand intensive computation in portable environments, techniques are investigated which reduce power consumption in CMOS digital circuits while maintaining computational throughput. Techniques for low-power operation are shown which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations. An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations. This optimum is achieved by trading increased silicon area for reduced power consumption.

2,690 citations

Journal Article
TL;DR: An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations, and is achieved by trading increased silicon area for reduced power consumption.
Abstract: Motivated by emerging battery-operated applications that demand intensive computation in portable environments, techniques are investigated which reduce power consumption in CMOS digital circuits while maintaining computational throughput. Techniques for low-power operation are shown which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations. An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations. This optimum is achieved by trading increased silicon area for reduced power consumption.

2,337 citations


"Compilation for compact power-gatin..." refers background in this paper

  • ...Minimizing power dissipation can be considered at algorithmic, architectural, logic, and circuit levels [Chandrakasan et al. 1992]....

    [...]


Journal ArticleDOI
TL;DR: The other source of power dissipation in microprocessors, dynamic power, arises from the repeated capacitance charge and discharge on the output of the hundreds of millions of gates in today's chips.
Abstract: Off-state leakage is static power, current that leaks through transistors even when they are turned off. The other source of power dissipation in today's microprocessors, dynamic power, arises from the repeated capacitance charge and discharge on the output of the hundreds of millions of gates in today's chips. Until recently, only dynamic power has been a significant source of power consumption, and Moore's law helped control it. However, power consumption has now become a primary microprocessor design constraint; one that researchers in both industry and academia will struggle to overcome in the next few years. Microprocessor design has traditionally focused on dynamic power consumption as a limiting factor in system integration. As feature sizes shrink below 0.1 micron, static power is posing new low-power design challenges.

1,233 citations

Journal ArticleDOI
TL;DR: A general model of partial constraint satisfaction is proposed and standard backtracking and local consistency techniques for solving constraint satisfaction problems can be adapted to cope with, and take advantage of, the differences between partial and complete constraint satisfaction.

686 citations

Frequently Asked Questions (16)
Q1. What contributions have the authors mentioned in the paper "Compilation for compact power-gating controls"?

In this article the authors present a sink-n-hoist framework for a compiler to generate balanced scheduling of power-gating instructions. The authors performed experiments by incorporating their compiler analysis and scheduling policies into SUIF compiler tools and by simulating the energy consumption using Wattch toolkits. The experimental results demonstrate that their mechanisms are effective in reducing the amount of power-gating instructions while further reducing leakage power compared to previous methods. 

Moreover, their scheme also further reduces total energy consumption compared to that without the sink-n-hoist framework, which is due to the block version of the power-gating instructions giving better power and performance characteristics than the pointwise version. 

Their sink-n-hoist framework for a compiler solution attempts to merge several power-gating instructions into a single compound instruction so as to reduce the amount of power-gating instructions. 

Since the time required to instigate power-gating controls on components is influenced by conditional branches in programs, the authors propose the following set of scheduling policies with power-gating instructions: Basic Blk Sched, MIN Path Sched, and AVG Path Sched. 
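The names MIN Path Sched and AVG Path Sched suggest that the policies differ in how they summarize the cycle counts of alternative branch paths. The following Python sketch illustrates that idea under stated assumptions: the control-flow graph is acyclic, each basic block has a known cycle count, and the helper names (`path_lengths`, `min_path`, `avg_path`) and the example graph are hypothetical and not taken from the paper.

```python
def path_lengths(cfg, cycles, start, end):
    """Return the cycle count of every path from start to end in an acyclic CFG."""
    if start == end:
        return [cycles[start]]
    lengths = []
    for succ in cfg.get(start, []):
        # Prepend the cost of the current block to each path through succ.
        lengths.extend(cycles[start] + rest
                       for rest in path_lengths(cfg, cycles, succ, end))
    return lengths


def min_path(cfg, cycles, start, end):
    """Shortest path in cycles (assumed reading of a MIN-path estimate)."""
    return min(path_lengths(cfg, cycles, start, end))


def avg_path(cfg, cycles, start, end):
    """Mean path length in cycles (assumed reading of an AVG-path estimate)."""
    lengths = path_lengths(cfg, cycles, start, end)
    return sum(lengths) / len(lengths)


# Hypothetical if/else diamond: entry -> {then, else} -> join.
cfg = {"entry": ["then", "else"], "then": ["join"], "else": ["join"]}
cycles = {"entry": 2, "then": 10, "else": 4, "join": 1}

print(min_path(cfg, cycles, "entry", "join"))  # 7
print(avg_path(cfg, cycles, "entry", "join"))  # 10.0
```

In this reading, the minimum is the conservative estimate of how long a component stays idle across a branch, while the average trades some risk for more gating opportunities. A policy restricted to basic blocks would not look across the branch at all.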

The additional phase has little or no influence on performance; it only inserts power-gating instructions and thus barely affects execution behavior.

The SINKABLE predicate collects the information required to determine how far the power-off instructions of component activities can be sunk, and the GROUP−OFF predicate partitions power-off instructions into groups.

Power-gating instruction scheduling is then performed to determine whether, where, and when power-gating controls should be employed so as to produce power reduction.

In the current study, the authors present a sink-n-hoist framework, applied in the phase immediately before power-gating instructions are inserted, to generate balanced scheduling of power-gating instructions. 

The predicates for computing GROUP−SWH are as follows: GROUP−SWHloc(b) is a set with at most one element (i.e., a singleton or empty set) in which the element (if it exists) is an integer that represents a group number and never appears in other sets of GROUP−SWHloc.

Since Wattch does not model leakage at the component level per se, the authors assumed that leakage power contributes 10% of the total power consumption.

In terms of the actual time spent in their experiments, the process contributes only a very small fraction: less than 0.6% of their proposed framework.

The authors are currently in the process of incorporating more components (such as cryptography modules) into their architecture and simulator. 

There are concerns about the amount of power-control instructions being added to programs as increasing numbers of components are equipped with power-gating controls in SoC design platforms.

A maximum number of cycles to be sunk or hoisted should be set, since sinking or hoisting a power-gating instruction will increase leakage dissipation.
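As a rough illustration of how such a bound limits merging, the Python sketch below greedily groups power-off instructions into compound instructions so that no instruction is sunk by more than a fixed number of cycles. It is a simplified stand-in that works on a straight-line list of instructions, not the paper's dataflow formulation; the `PowerOff` type, the `merge_power_offs` function, and the `max_sink` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerOff:
    component: str
    cycle: int  # earliest cycle at which the component may be powered off


def merge_power_offs(offs, max_sink):
    """Greedily merge power-off instructions into compound instructions.

    Every member of a group is sunk to the cycle of the group's last member,
    and no member is sunk by more than max_sink cycles.
    """
    groups = []
    for off in sorted(offs, key=lambda o: o.cycle):
        # Join the open group only if its earliest member stays within the bound.
        if groups and off.cycle - groups[-1][0].cycle <= max_sink:
            groups[-1].append(off)
        else:
            groups.append([off])
    # One compound instruction per group: (issue cycle, components turned off).
    return [(g[-1].cycle, sorted(o.component for o in g)) for g in groups]


offs = [PowerOff("alu", 10), PowerOff("fpu", 12), PowerOff("mul", 30)]
print(merge_power_offs(offs, max_sink=4))
# [(12, ['alu', 'fpu']), (30, ['mul'])]
```

With `max_sink=4`, three separate instructions become two: the `alu` power-off is delayed by two cycles (leaking slightly longer) in exchange for sharing one compound instruction with `fpu`, while `mul` is too far away to merge.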

The authors used a DEC-Alpha-compatible architecture with the power-gating controls and instruction sets as described in Figure 3 as the target architecture for their experiments. 

Figure 14 shows that the performance impact of power-gating mechanisms is less than 5% for most of the benchmarks for both CADFA and CADFA with sink-n-hoist.