Proceedings ArticleDOI

Mobile multicores: use them or waste them

03 Nov 2013-pp 12
TL;DR: This work shows that core offlining leads to very modest savings in the best circumstances, with a heavy penalty in others, traces the cause to low per-core idle power, and develops a Linux policy that exploits this fact.
Abstract: Energy management is a primary consideration in the design of modern smartphones, made more interesting by the recent proliferation of multi-core processors in this space. We investigate how core offlining and DVFS can be used together on these systems to reduce energy consumption. We show that core offlining leads to very modest savings in the best circumstances, with a heavy penalty in others, and show the cause of this to be low per-core idle power. We develop a policy in Linux that exploits this fact, and show that it improves up to 25% on existing implementations.
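The abstract's key observation, that low per-core idle power makes offlining barely worthwhile, can be illustrated with a back-of-the-envelope energy model. All constants below are hypothetical placeholders, not measurements from the paper:

```python
P_CORE_ACTIVE = 0.50   # W per active core (hypothetical)
P_CORE_IDLE = 0.01     # W per idle-but-online core in a deep idle state (hypothetical)
P_STATIC = 0.30        # W for the rest of the SoC, paid regardless of core count
WORK = 2.0             # seconds of single-core work, assumed perfectly parallel

def energy(cores_online, cores_used, window):
    """Energy over a fixed window: used cores run until the work is done,
    then every online core idles for the rest of the window."""
    runtime = WORK / cores_used
    active = cores_used * P_CORE_ACTIVE * runtime
    idle = cores_used * P_CORE_IDLE * (window - runtime)
    idle += (cores_online - cores_used) * P_CORE_IDLE * window
    return active + idle + P_STATIC * window

both_online = energy(cores_online=2, cores_used=2, window=2.0)
one_offlined = energy(cores_online=1, cores_used=1, window=2.0)
print(f"2 cores online, both used: {both_online:.2f} J")
print(f"1 core offlined:           {one_offlined:.2f} J")
```

With idle power this low, offlining the second core saves only about 1% here, while forfeiting the option to finish parallel work sooner, which is the shape of the tradeoff the paper measures.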
Citations
Proceedings ArticleDOI
13 Apr 2015
TL;DR: POET is an open-source C library and runtime system that takes a specification of the platform resources and optimizes the application execution to achieve predictable timing and energy reduction.
Abstract: Embedded real-time systems must meet timing constraints while minimizing energy consumption. To this end, many energy optimizations are introduced for specific platforms or specific applications. These solutions are not portable, however, and when the application or the platform changes, these solutions must be redesigned. Portable techniques are hard to develop due to the varying tradeoffs experienced with different application/platform configurations. This paper addresses the problem of finding and exploiting general tradeoffs, using control theory and mathematical optimization to achieve energy minimization under soft real-time application constraints. The paper presents POET, an open-source C library and runtime system that takes a specification of the platform resources and optimizes the application execution. We test POET’s ability to portably deliver predictable timing and energy reduction on two embedded systems with different tradeoff spaces – the first with a mobile Intel Haswell processor, and the second with an ARM big.LITTLE System on Chip. POET achieves the desired latency goals with small error while consuming, on average, only 1.3% more energy than the dynamic optimal oracle on the Haswell and 2.9% more on the ARM. We believe this open-source, library-based approach to resource management will simplify the process of writing portable, energy-efficient code for embedded systems.
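The resource-allocation step such a runtime performs can be sketched roughly as a constrained minimization: pick the cheapest configuration that still meets the speedup required by the deadline. This deliberately simplifies POET's actual approach (which time-multiplexes configurations via a linear program), and the configuration table is invented for illustration:

```python
# (name, speedup relative to the slowest config, power in W) -- hypothetical
CONFIGS = [
    ("1 core @ 0.6 GHz", 1.0, 1.0),
    ("2 cores @ 0.6 GHz", 1.8, 1.6),
    ("2 cores @ 1.2 GHz", 3.2, 3.5),
    ("4 cores @ 1.2 GHz", 5.5, 7.0),
]

def choose(work, deadline):
    """work: seconds the job needs in the base config; deadline: seconds."""
    required = work / deadline
    feasible = [c for c in CONFIGS if c[1] >= required]
    if not feasible:
        return max(CONFIGS, key=lambda c: c[1])[0]   # best effort
    # Job energy in config c = power * (work / speedup); minimize it.
    return min(feasible, key=lambda c: c[2] * work / c[1])[0]

print(choose(work=3.0, deadline=2.0))   # needs speedup >= 1.5
```

Under these toy numbers the cheapest feasible choice is two slow cores rather than any faster configuration, which mirrors the tradeoff-space dependence the abstract emphasizes.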

104 citations

Proceedings ArticleDOI
14 Mar 2015
TL;DR: This paper proposes LEO, a probabilistic graphical model-based learning system that provides accurate online estimates of an application's power and performance as a function of system configuration, and finds that LEO produces the most accurate estimates and near-optimal energy savings.
Abstract: In many deployments, computer systems are underutilized -- meaning that applications have performance requirements that demand less than full system capacity. Ideally, we would take advantage of this under-utilization by allocating system resources so that the performance requirements are met and energy is minimized. This optimization problem is complicated by the fact that the performance and power consumption of various system configurations are often application -- or even input -- dependent. Thus, practically, minimizing energy for a performance constraint requires fast, accurate estimations of application-dependent performance and power tradeoffs. This paper investigates machine learning techniques that enable energy savings by learning Pareto-optimal power and performance tradeoffs. Specifically, we propose LEO, a probabilistic graphical model-based learning system that provides accurate online estimates of an application's power and performance as a function of system configuration. We compare LEO to (1) offline learning, (2) online learning, (3) a heuristic approach, and (4) the true optimal solution. We find that LEO produces the most accurate estimates and near optimal energy savings.

90 citations


Cites background from "Mobile multicores: use them or wast..."

  • ...This strategy incurs almost no runtime overhead, but may be suboptimal in terms of energy, since maximum resource allocation is not always the best solution to the energy minimization equation (1) [7, 21, 32]....

Proceedings ArticleDOI
Duc Hoang Bui, Yunxin Liu, Hyosu Kim, Insik Shin, Feng Zhao
07 Sep 2015
TL;DR: This work aims to reduce the energy consumed to load web pages on smartphones, preferably without increasing page load time and compromising user experience, and derives general design principles for energy-efficient web page loading, and applies these principles to the open-source Chromium browser.
Abstract: Web browsing is a key application on mobile devices. However, mobile browsers are largely optimized for performance, imposing a significant burden on power-hungry mobile devices. In this work, we aim to reduce the energy consumed to load web pages on smartphones, preferably without increasing page load time and compromising user experience. To this end, we first study the internals of web page loading on smartphones and identify its energy-inefficient behaviors. Based on our findings, we then derive general design principles for energy-efficient web page loading, and apply these principles to the open-source Chromium browser and implement our techniques on commercial smartphones. Experimental results show that our techniques are able to achieve a 24.4% average system energy saving for Chromium on a latest-generation big.LITTLE smartphone using WiFi (a 22.5% saving when using 3G), while not increasing average page load time. We also show that our proposed techniques can bring a 10.5% system energy saving on average with a small 1.69% increase in page load time for the mobile Firefox web browser. User study results indicate that such a small increase in page load time is hardly perceivable.

65 citations


Cites background from "Mobile multicores: use them or wast..."

  • ...If a thread can finish its task faster, the CPU can go to sleep sooner to save energy [16]....

Proceedings ArticleDOI
19 Aug 2015
TL;DR: A geometrical framework for analyzing the energy optimality of resource allocation under performance constraints is presented and it is found that race-to-idle is near optimal on older systems, but can consume as much as 3× more energy than the optimal strategy.
Abstract: The problem of minimizing energy for a performance constraint (e.g., a real-time deadline or quality-of-service requirement) has been widely studied, both in theory and in practice. Theoretical models have indicated large potential energy savings, but practical concerns have made these savings hard to realize. Instead, practitioners often rely on heuristic solutions, which achieve good results in practice but tend to be system-specific in efficacy. An example is the race-to-idle heuristic, which makes all resources available until a task completes and then idles. Theory predicts poor energy savings, but practitioners have reported good empirical results. To help bridge the gap between theory and practice, this paper presents a geometrical framework for analyzing the energy optimality of resource allocation under performance constraints. The geometry of the problem allows us to derive an optimal strategy and three commonly used heuristics: 1) race-to-idle, 2) pace-to-idle, a near-optimal idling strategy, and 3) no-idle, which never idles. We then implement all strategies and test them empirically for seven benchmarks on four different multicore systems, including both x86 and ARM. We find that race-to-idle is near optimal on older systems, but can consume as much as 3× more energy than the optimal strategy. In contrast, pace-to-idle is never more than 12% worse than optimal.
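The tension the abstract describes between racing and never idling can be reproduced with a toy power model, P(f) = P_idle + c·f³ while active. All constants are hypothetical, chosen so that dynamic power grows steeply with frequency, which is the regime where racing loses:

```python
P_IDLE = 0.1      # W while idling (hypothetical)
C = 1.0           # W/GHz^3, dynamic-power coefficient (hypothetical)
F_MAX = 2.0       # GHz
WORK = 2.0        # G-cycles to execute
DEADLINE = 2.0    # seconds

def active_power(f):
    return P_IDLE + C * f ** 3

def race_to_idle():
    """Run flat out, then idle until the deadline."""
    t_busy = WORK / F_MAX
    return active_power(F_MAX) * t_busy + P_IDLE * (DEADLINE - t_busy)

def no_idle():
    """Run at the slowest frequency that just meets the deadline."""
    f = WORK / DEADLINE
    return active_power(f) * DEADLINE

print(f"race-to-idle: {race_to_idle():.1f} J")
print(f"no-idle:      {no_idle():.1f} J")
```

Here racing wastes several times the energy of running slowly, consistent with the paper's worst-case finding; with a flatter power curve or higher idle power, racing can instead be near optimal.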

55 citations


Cites background from "Mobile multicores: use them or wast..."

  • ...We then discuss some practical concerns and empirical studies....

  • ...While theoretical models have long demonstrated the potential energy savings of careful resource orchestration, the assumptions required to realize these savings often could not be implemented in practice....

Proceedings ArticleDOI
Aaron Carroll, Gernot Heiser
15 Apr 2014
TL;DR: This paper proposes a simple policy that integrates core offlining with frequency scaling, implements it in a Linux-based frequency governor called medusa, and shows that it obtains energy savings as good as or better than governors presently shipping on the studied phones and approaches the static optimal setting.
Abstract: Energy efficiency is a primary design criterion of the modern smartphone due to limitations in battery capacity. Multi-core processors are now commonplace in these devices, which adds a new dimension, the number of cores used, to energy management. In this paper we investigate how the mechanisms of frequency scaling and core offlining interact, and how to use them to reduce energy consumption. We find surprising differences in the characteristics of latest-generation smartphones, specifically in the importance of static power. This implies that policies that work well on one processor can lead to poor results on another. We propose a simple policy that integrates core offlining with frequency scaling and implement it in a Linux-based frequency governor called medusa. We show that, despite its simplicity, medusa obtains energy savings that are as good as or better than governors presently shipping on the studied phones and approaches the static optimal setting.

42 citations


Cites background from "Mobile multicores: use them or wast..."

  • ...In particular, deep sleep states mean that the race-to-halt approach may be beneficial, as it minimises the accumulation of static energy loss....

References
Proceedings ArticleDOI
27 Feb 2006
TL;DR: This work addresses the problem of dynamically optimizing power consumption of a parallel application that executes on a many-core CMP under a given performance constraint by presenting simple, low-overhead heuristics for dynamic optimization that significantly cut down on the search effort along both dimensions of the optimization space.
Abstract: Previous proposals for power-aware thread-level parallelism on chip multiprocessors (CMPs) mostly focus on multiprogrammed workloads. Nonetheless, parallel computation of a single application is critical in light of the expanding performance demands of important future workloads. This work addresses the problem of dynamically optimizing power consumption of a parallel application that executes on a many-core CMP under a given performance constraint. The optimization space is two-dimensional, allowing changes in the number of active processors and applying dynamic voltage/frequency scaling. We demonstrate that the particular optimum operating point depends nontrivially on the power-performance characteristics of the CMP, the application's behavior, and the particular performance target. We present simple, low-overhead heuristics for dynamic optimization that significantly cut down on the search effort along both dimensions of the optimization space. In our evaluation of several parallel applications with different performance targets, these heuristics quickly lock onto a configuration that yields optimal power savings in virtually all cases.
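The two-dimensional optimization space described above (active core count × DVFS level) can be made concrete with an exhaustive scan; the paper's heuristics exist precisely to avoid this kind of brute-force search. The performance and power models below are toy stand-ins, not the paper's:

```python
from itertools import product

CORES = [1, 2, 4, 8]
FREQS = [0.6, 1.0, 1.5, 2.0]            # GHz

def perf(n, f):
    """Toy throughput model with sublinear core scaling."""
    return f * n ** 0.8

def power(n, f):
    """Toy power model: static base plus per-core dynamic power cubic in f."""
    return 0.5 + n * (0.1 + 0.4 * f ** 3)

def best_operating_point(target):
    """Cheapest (cores, GHz) point meeting the performance target, if any."""
    ok = [(n, f) for n, f in product(CORES, FREQS) if perf(n, f) >= target]
    return min(ok, key=lambda nf: power(*nf), default=None)

print(best_operating_point(target=3.0))
```

Under these toy curves the scan picks many slow cores over a few fast ones; on real hardware the winner depends nontrivially on the CMP's power-performance characteristics, which is exactly the abstract's point.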

286 citations


"Mobile multicores: use them or wast..." refers background in this paper

  • ...Li and Martinez [LM06] develop a number of heuristics to reduce the optimisation search space and algorithms to search for the optimal operating point....

  • ...[LM06] J. Li and J.F. Martinez....

Proceedings ArticleDOI
22 Jun 2002
TL;DR: The critical power slope concept is introduced to explain and capture the power-performance characteristics of systems with power management features, and it is shown that in some cases, it may be energy efficient not to reduce voltage below a certain point.
Abstract: Energy efficiency is becoming an increasingly important feature for both mobile and high-performance server systems. Most processors designed today include power management features that provide processor operating points which can be used in power management algorithms. However, existing power management algorithms implicitly assume that lower performance points are more energy efficient than higher performance points. Our empirical observations indicate that for many systems, this assumption is not valid. We introduce a new concept called critical power slope to explain and capture the power-performance characteristics of systems with power management features. We evaluate three systems - a clock-throttled Pentium laptop, a frequency-scaled PowerPC platform, and a voltage-scaled system - to demonstrate the benefits of our approach. Our evaluation is based on empirical measurements of the first two systems, and publicly available data for the third. Using critical power slope, we explain why on the Pentium-based system, it is energy efficient to run only at the highest frequency, while on the PowerPC-based system, it is energy efficient to run at the lowest frequency point. We confirm our results by measuring the behavior of a web serving benchmark. Furthermore, we extend the critical power slope concept to understand the benefits of voltage scaling when combined with frequency scaling. We show that in some cases, it may be energy efficient not to reduce voltage below a certain point.
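The abstract's contrast between the clock-throttled Pentium and the voltage-scaled PowerPC can be reproduced with two toy power curves: roughly linear in frequency when only the clock is throttled, roughly cubic when voltage scales too. Whether running slowly saves energy then hinges on idle power, which is the critical-power-slope idea. All constants are hypothetical:

```python
F_MAX, F_MIN = 2.0, 1.0      # GHz
WORK = 1.0                   # G-cycles
WINDOW = WORK / F_MIN        # s: long enough to finish even at F_MIN
P_IDLE = 0.5                 # W while idle (hypothetical)

def p_throttle(f):
    """Voltage fixed (clock throttling): power roughly linear in f."""
    return 1.0 + 2.0 * f

def p_dvfs(f):
    """Voltage scaled with frequency: power roughly cubic in f."""
    return 1.0 + 0.5 * f ** 3

def energy(p_active, f):
    """Finish the work at frequency f, then idle out the window."""
    t_busy = WORK / f
    return p_active(f) * t_busy + P_IDLE * (WINDOW - t_busy)

for name, model in [("throttling", p_throttle), ("DVFS", p_dvfs)]:
    e_hi, e_lo = energy(model, F_MAX), energy(model, F_MIN)
    best = "highest" if e_hi < e_lo else "lowest"
    print(f"{name}: run at the {best} frequency ({e_hi:.2f} J vs {e_lo:.2f} J)")
```

With these numbers the linear (throttled) curve favors the highest frequency and the cubic (voltage-scaled) curve favors the lowest, matching the abstract's qualitative finding for the two real systems.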

273 citations

Proceedings ArticleDOI
01 Apr 2009
TL;DR: This work presents Koala, a platform which uses a pre-characterised model at run-time to predict the performance and energy consumption of a piece of software, and an arbitrary policy can then be applied in order to dynamically trade performance andEnergy consumption.
Abstract: Managing the power consumption of computing platforms is a complicated problem thanks to a multitude of hardware configuration options and characteristics. Much of the academic research is based on unrealistic assumptions, and has, therefore, seen little practical uptake. We provide an overview of the difficulties facing power management schemes when used in real systems. We present Koala, a platform which uses a pre-characterised model at run-time to predict the performance and energy consumption of a piece of software. An arbitrary policy can then be applied in order to dynamically trade performance and energy consumption. We have implemented this system in a recent Linux kernel, and evaluated it by running a variety of benchmarks on a number of different platforms. Under some conditions, we observe energy savings of 26% for a 1% performance loss.

210 citations


"Mobile multicores: use them or wast..." refers methods in this paper

  • ...The most promising path seems to be to extend the approach taken by Koala [SLSPH09]: use a parameterised hardware model, characterised offline, which observes the application behaviour and uses performance counters to predict on-line the system’s performance and energy response to changes in operating points....

  • ...Indeed, this result is well known from the single-core DVFS literature [SLSPH09]....

  • ...Koala: A platform for OS-level power management....

Proceedings ArticleDOI
25 Jul 2011
TL;DR: The model is used to implement various DVFS policies as Linux “green” governors to continuously optimize for various power-efficiency metrics such as EDP or ED2P, or achieve energy savings with a user-specified limit on performance loss.
Abstract: We present Continuously Adaptive Dynamic Voltage/Frequency scaling in Linux systems running on Intel i7 and AMD Phenom II processors. By exploiting slack, inherent in memory-bound programs, our approach aims to improve power efficiency even when the processor does not sit idle. Our underlying methodology is based on a simple first-order processor performance model in which frequency scaling is expressed as a change (in cycles) of the main memory latency. Utilizing available monitoring hardware we show that our model is powerful enough to i) predict with reasonable accuracy the effect of frequency scaling (in terms of performance loss) and ii) predict the core energy under different V/f combinations. To validate our approach we perform highly accurate, fine-grained power measurements directly on the off-chip voltage regulators. We use our model to implement various DVFS policies as Linux “green” governors to continuously optimize for various power-efficiency metrics such as EDP or ED2P, or achieve energy savings with a user-specified limit on performance loss. Our evaluation shows that, for SPEC2006 workloads, our governors achieve dynamically the same optimal EDP or ED2P (within 2% on avg.) as an exhaustive search of all possible frequencies. Energy savings can reach up to 56% in memory-bound workloads with corresponding improvements of about 55% for EDP or ED2P.
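The first-order model the abstract relies on can be written down directly: off-chip memory latency is fixed in wall-clock time, so its cost in cycles grows with frequency, and runtime is roughly T(f) = C_exec/f + N_mem·L_mem. The workload mixes below are invented to show where the exploitable slack comes from:

```python
L_MEM = 80e-9     # s, main-memory latency (hypothetical, fixed in wall time)

def runtime(f_hz, c_exec, n_mem):
    """c_exec: compute cycles; n_mem: exposed off-chip memory accesses."""
    return c_exec / f_hz + n_mem * L_MEM

for name, c_exec, n_mem in [("compute-bound", 4e9, 1e6),
                            ("memory-bound", 4e8, 4e7)]:
    slowdown = runtime(1.0e9, c_exec, n_mem) / runtime(2.0e9, c_exec, n_mem)
    print(f"{name}: halving the clock costs {100 * (slowdown - 1):.0f}% runtime")
```

Memory-bound code barely slows when the clock drops, and that slack is what a governor built on this model can convert into energy savings at small performance loss.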

123 citations


"Mobile multicores: use them or wast..." refers background in this paper

  • ...The scalability requirement can be relaxed by replacing the assumption of workload-independent dynamic power by the approximation that Pdynamic is proportional to instructions per cycle [SKK11], from which it follows that Edynamic is proportional to the number of executed instructions, which is constant for a fixed workload....

Proceedings Article
13 Jun 2012
TL;DR: Experimental evaluations use client applications and usage scenarios seen on mobile devices and a unique testbed comprised of heterogeneous cores to highlight the need for uncore-awareness and uncore scalability to maximize intended efficiency gains from heterogeneous cores.
Abstract: Heterogeneous multicore processors (HMPs), consisting of cores with different performance/power characteristics, have been proposed to deliver higher energy efficiency than symmetric multicores. This paper investigates the opportunities and limitations in using HMPs to gain energy-efficiency. Unlike previous work focused on server systems, we focus on the client workloads typically seen in modern end-user devices. Further, beyond considering core power usage, we also consider the 'uncore' subsystem shared by all cores, which in modern platforms, is an increasingly important contributor to total SoC power. Experimental evaluations use client applications and usage scenarios seen on mobile devices and a unique testbed comprised of heterogeneous cores, with results that highlight the need for uncore-awareness and uncore scalability to maximize intended efficiency gains from heterogeneous cores.

49 citations