Abstract:
This paper derives simple yet fundamental formulas that describe the interplay among an application's parallelism, program performance, and energy consumption. Given the ratio of the serial and parallel portions of an application and the number of processors, we derive the optimal frequencies to allocate to the serial and parallel regions so as to minimize either the total energy consumption or the energy-delay product. The impact of static power is revealed by considering the ratio between static and dynamic power and by quantifying the advantages of adding to the architecture the capability to turn off individual processors and save static energy. We further determine the conditions under which one can obtain both energy and speed improvements, as well as the amount of improvement. While the formulas we obtain rely on simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management. Our results form a basis for several interesting research directions in the area of energy-aware multicore processor architectures.
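The abstract does not spell out the underlying model. The following minimal Python sketch therefore assumes an Amdahl-style formulation that is not necessarily the paper's exact one. Work is normalized so that the whole program takes one time unit on one processor at the maximum frequency 1. A fraction `s` is serial, and the parallel remainder is split evenly across `n` processors. A busy processor at frequency `f` draws dynamic power `f**alpha`, and every powered-on processor draws static power `lam`. The function names `time_and_energy`, `optimal_frequencies`, and `grid_minimum` are hypothetical. When processors cannot be turned off, setting both partial derivatives of the energy to zero gives a closed form in which the parallel-region frequency is lower than the serial-region frequency by a factor of `n ** (1 / alpha)`. The grid search cross-checks that result numerically.

```python
import itertools


def time_and_energy(s, n, fs, fp, alpha=3.0, lam=0.1):
    """Normalized time and energy of one run (assumed model, processors always on).

    s: serial fraction; n: processors used in the parallel region;
    fs, fp: serial and parallel frequencies (maximum 1);
    alpha: dynamic power exponent; lam: static power per processor.
    """
    p = 1.0 - s
    time = s / fs + p / (n * fp)
    # Busy time times dynamic power f**alpha; the n processors share work p.
    dynamic = s * fs ** (alpha - 1) + p * fp ** (alpha - 1)
    # All n processors stay powered on and leak for the whole run.
    static = lam * n * time
    return time, dynamic + static


def optimal_frequencies(n, alpha=3.0, lam=0.1):
    """Closed-form energy-minimizing (fs, fp) when processors cannot be turned off.

    dE/dfs = 0 gives fs**alpha = lam * n / (alpha - 1), and dE/dfp = 0 gives
    fp**alpha = lam / (alpha - 1), so fp / fs = n ** (-1 / alpha).
    Values above 1 are not clamped to the maximum frequency in this sketch.
    """
    fs = (lam * n / (alpha - 1)) ** (1 / alpha)
    fp = (lam / (alpha - 1)) ** (1 / alpha)
    return fs, fp


def grid_minimum(s, n, alpha=3.0, lam=0.1, steps=200):
    """Brute-force cross-check: the (fs, fp) grid point with the lowest energy."""
    grid = [i / steps for i in range(1, steps + 1)]
    return min(itertools.product(grid, grid),
               key=lambda f: time_and_energy(s, n, f[0], f[1], alpha, lam)[1])


if __name__ == "__main__":
    print(optimal_frequencies(4))   # closed-form optimum
    print(grid_minimum(0.2, 4))     # numerical optimum on a 0.005-spaced grid
```

In this model the optimal frequencies do not depend on `s`, because `s` and `1 - s` only scale the serial and parallel terms of the energy, which can be minimized independently.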
Q1. What are the contributions mentioned in the paper "On the interplay of parallelization, program performance, and energy consumption" ?
This paper derives simple, yet fundamental formulas to describe the interplay between parallelism of an application, program performance, and energy consumption. The authors further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas the authors obtain use simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management.
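A brute-force Python sketch can make the "both energy and speed improvement" condition concrete. It assumes a hypothetical Amdahl-style model that is not necessarily the paper's. The model has a serial fraction `s` and `n` processors that stay powered on. A processor at frequency `f` (maximum 1) draws dynamic power `f**alpha`, and each processor draws static power `lam`. The baseline is the whole program on one processor at maximum frequency. The function `improves_both` is an illustrative name. It enumerates frequency pairs instead of reproducing the paper's analytical conditions.

```python
import itertools


def time_and_energy(s, n, fs, fp, alpha=3.0, lam=0.1):
    """Normalized time and energy when all n processors stay powered on (assumed model)."""
    p = 1.0 - s
    time = s / fs + p / (n * fp)
    dynamic = s * fs ** (alpha - 1) + p * fp ** (alpha - 1)
    static = lam * n * time  # every processor leaks for the whole run
    return time, dynamic + static


def improves_both(s, n, alpha=3.0, lam=0.1, steps=100):
    """Grid points (fs, fp) that are strictly faster AND use strictly less energy
    than running the whole program on one processor at maximum frequency 1."""
    base_time, base_energy = time_and_energy(s, 1, 1.0, 1.0, alpha, lam)
    grid = [i / steps for i in range(1, steps + 1)]
    winners = []
    for fs, fp in itertools.product(grid, grid):
        time, energy = time_and_energy(s, n, fs, fp, alpha, lam)
        if time < base_time and energy < base_energy:
            winners.append((fs, fp))
    return winners


if __name__ == "__main__":
    # A mostly parallel program has many winning settings; a fully serial one has none,
    # because with s = 1 no frequency at or below the maximum can beat the baseline time.
    print(len(improves_both(0.1, 8)), len(improves_both(1.0, 8)))
```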
Q2. What have the authors stated for future works in "On the interplay of parallelization, program performance, and energy consumption" ?
In this paper, the authors developed an analytical framework to study the trade-offs among parallelization, program performance, and energy consumption. The authors considered two machine models: one assumes that individual processors cannot be turned off independently, and the other assumes that they can. When processors can be turned off individually, the analysis indicates that the minimum total energy is independent of the number of processors used to execute the parallel section, while the energy-delay product is minimized when the maximum number of available processors is used during the parallel section. The substantial power advantage demonstrated for turning off individual processors is a strong incentive to design multicore processors with that capability.
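The two machine models can be compared numerically with a small Python sketch. It assumes a hypothetical Amdahl-style model: serial fraction `s`, `n` processors, dynamic power `f**alpha`, static power `lam` per powered-on processor, and frequencies normalized to a maximum of 1. All function names are illustrative, and a coarse grid search stands in for the paper's analytical derivation. When idle processors are switched off, `n` cancels out of the energy expression in this model. The minimum energy is therefore the same for every `n`, while the minimum energy-delay product keeps falling as `n` grows, because the run gets shorter at no extra energy cost. When processors stay on, the minimum energy rises with `n`, because idle processors leak static power during the serial region.

```python
import itertools


def time_and_energy(s, n, fs, fp, alpha=3.0, lam=0.1, can_turn_off=False):
    """Normalized time and energy of one run under an assumed Amdahl-style model."""
    p = 1.0 - s
    t_serial = s / fs
    t_parallel = p / (n * fp)
    time = t_serial + t_parallel
    dynamic = s * fs ** (alpha - 1) + p * fp ** (alpha - 1)
    if can_turn_off:
        # Idle processors are off: 1 leaks in the serial region, n in the parallel one.
        static = lam * (t_serial + n * t_parallel)
    else:
        # All n processors leak for the whole run.
        static = lam * n * time
    return time, dynamic + static


def min_over_grid(objective, steps=100):
    """Smallest objective(fs, fp) over a uniform frequency grid in (0, 1]."""
    grid = [i / steps for i in range(1, steps + 1)]
    return min(objective(fs, fp) for fs, fp in itertools.product(grid, grid))


def min_energy(s, n, can_turn_off, alpha=3.0, lam=0.1):
    return min_over_grid(lambda fs, fp: time_and_energy(
        s, n, fs, fp, alpha, lam, can_turn_off)[1])


def min_edp(s, n, can_turn_off, alpha=3.0, lam=0.1):
    def edp(fs, fp):
        time, energy = time_and_energy(s, n, fs, fp, alpha, lam, can_turn_off)
        return energy * time
    return min_over_grid(edp)


if __name__ == "__main__":
    s = 0.2  # hypothetical serial fraction
    # Columns: n, min energy (always on), min energy (turn-off), min EDP (turn-off)
    for n in (1, 2, 4, 8, 16):
        print(n,
              round(min_energy(s, n, can_turn_off=False), 4),
              round(min_energy(s, n, can_turn_off=True), 4),
              round(min_edp(s, n, can_turn_off=True), 4))
```

Under these assumptions the printed table shows a constant turn-off energy column, a rising always-on energy column, and a shrinking EDP column. This mirrors the qualitative conclusions stated above.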