This paper derives simple yet fundamental formulas describing the interplay between an application's parallelism, program performance, and energy consumption, and derives the optimal frequencies to allocate to the serial and parallel regions of an application to minimize either the total energy consumption or the energy-delay product.
Abstract:
This paper derives simple yet fundamental formulas to describe the interplay between the parallelism of an application, program performance, and energy consumption. Given the ratio of serial to parallel portions in an application and the number of processors, we derive the optimal frequencies to allocate to the serial and parallel regions of the application to minimize either the total energy consumption or the energy-delay product. The impact of static power is revealed by considering the ratio between static and dynamic power and by quantifying the advantage of adding to the architecture the capability to turn off individual processors and save static energy. We further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas we obtain rely on simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management. Our results form a basis for several interesting research directions in the area of energy-aware multicore processor architectures.
TL;DR: In this article, the relative merits between different approaches in the face of technology constraints are analyzed for U-cores and the predictive power of their model depends upon U-core-specific parameters derived by measuring performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs.
TL;DR: This work explores how the energy-cognizant scheduler's role has been extended beyond simple energy minimization to include related issues such as avoiding negative thermal effects and addressing asymmetric multicore architectures.
TL;DR: A new FPGA memory architecture called Connected RAM (CoRAM) is proposed to serve as a portable bridge between the distributed computation kernels and the external memory interfaces to improve performance and efficiency and to improve an application's portability and scalability.
TL;DR: An investigation of energy-efficient scheduling of sequential tasks with precedence constraints on multiprocessor computers with dynamically variable voltage and speed, making an initial contribution to the analytical performance study of heuristic power allocation and scheduling algorithms for precedence-constrained sequential tasks.
TL;DR: This work analyzes which application and platform characteristics are necessary for a successful energy-performance trade-off in large-scale parallel applications, and how cluster power consumption characteristics, together with application sensitivity to frequency scaling, determine the energy effectiveness of the DVFS technique.
TL;DR: It is shown that dynamic energy improvement due to parallelization has a function rising faster with the increasing number of processors than the speed improvement function given by the well-known Amdahl's Law.
TL;DR: This paper contributes novel techniques for tight and flexible static timing analysis, particularly well-suited for dynamic scheduling schemes, and proposes a parametric approach toward bounding the WCET statically with respect to the frequency.
TL;DR: An analytical model that puts together parallel efficiency, granularity of parallelism, and voltage/frequency scaling, to establish a formal connection with the power consumption and performance of a parallel code running on a CMP is developed and experiments show the effect of a limited power budget on the application's scalability curve.
TL;DR: The noise measurements and simulation both show that the shorted core power grid design has less noise and a higher maximum frequency than the split core power supply design.
TL;DR: A new model accounts for parallel overhead and predicts the power-aware performance and energy-delay products for various system configurations (i.e. processor counts and frequencies) on NAS parallel benchmark codes.
Q1. What are the contributions mentioned in the paper "On the interplay of parallelization, program performance, and energy consumption"?
This paper derives simple, yet fundamental formulas to describe the interplay between parallelism of an application, program performance, and energy consumption. The authors further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas the authors obtain use simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management.
Q2. What have the authors stated for future works in "On the interplay of parallelization, program performance, and energy consumption"?
In this paper, the authors developed an analytical framework to study the trade-offs between parallelization, program performance, and energy consumption. The authors considered two machine models: one assumes that individual processors cannot be turned off independently, and the other assumes that they can. When processors can be individually turned off, the analysis indicates that the minimum total energy is independent of the number of processors used to execute the parallel section, while the energy-delay product is minimized when the maximum number of available processors is used during the parallel section. The substantial power advantage demonstrated for turning off individual processors is a strong incentive to design multicore processors with that capability.
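The two machine models described above can be illustrated with a small numerical sketch. This is not the paper's own derivation. It assumes a normalized model in which a processor running at frequency f draws dynamic power f**3 plus a constant static power `lam`, the serial fraction `s` runs on one processor, and the remaining work is split evenly over `n` processors. All function and parameter names are hypothetical.

```python
def energy_turn_off(s, n, f_s, f_p, lam):
    """Total energy when idle processors can be switched off (assumed model)."""
    t_serial = s / f_s                       # one processor at frequency f_s
    t_parallel = (1.0 - s) / (n * f_p)       # n processors at frequency f_p
    e_serial = t_serial * (f_s**3 + lam)     # only the active processor draws power
    e_parallel = t_parallel * n * (f_p**3 + lam)
    return e_serial + e_parallel


def energy_always_on(s, n, f_s, f_p, lam):
    """Total energy when all n processors keep drawing static power (assumed model)."""
    t_serial = s / f_s
    t_parallel = (1.0 - s) / (n * f_p)
    e_serial = t_serial * (f_s**3 + n * lam)  # idle processors still leak
    e_parallel = t_parallel * n * (f_p**3 + lam)
    return e_serial + e_parallel


def min_energy_frequency(lam):
    """Frequency minimizing f**2 + lam / f, the per-unit-work energy in the
    turn-off model under the cubic dynamic power assumption."""
    return (lam / 2.0) ** (1.0 / 3.0)


def energy_delay_turn_off(s, n, f_s, f_p, lam):
    """Energy-delay product for the turn-off model (assumed model)."""
    delay = s / f_s + (1.0 - s) / (n * f_p)
    return energy_turn_off(s, n, f_s, f_p, lam) * delay
```

Under these assumptions the turn-off energy reduces to s(f_s^2 + lam/f_s) + (1 - s)(f_p^2 + lam/f_p), which does not contain n. This matches the statement that the minimum total energy is independent of the number of processors, while the delay term still shrinks as n grows, so the energy-delay product favors using every available processor.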