This paper derives simple yet fundamental formulas that describe the interplay between an application's parallelism, program performance, and energy consumption, and uses them to obtain the optimal frequencies for the serial and parallel regions of an application that minimize either the total energy consumption or the energy-delay product.
Abstract:
This paper derives simple yet fundamental formulas to describe the interplay between parallelism of an application, program performance, and energy consumption. Given the ratio of serial and parallel portions in an application and the number of processors, we derive optimal frequencies allocated to the serial and parallel regions in an application to either minimize the total energy consumption or minimize the energy-delay product. The impact of static power is revealed by considering the ratio between static and dynamic power and by quantifying the advantage of adding to the architecture the capability to turn off individual processors and save static energy. We further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas we obtain use simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management. Our results form a basis for several interesting research directions in the area of energy-aware multicore processor architectures.
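The frequency-allocation problem described in the abstract can be illustrated with a minimal sketch. The model below is an assumption made for illustration and is not taken from the text above: total work is normalized to 1 with serial fraction `s`, the maximum frequency is normalized to 1, dynamic power scales as the cube of frequency, static power is ignored, and the program must finish no later than a sequential run at maximum frequency (a deadline of 1). Under these assumptions a Lagrange-multiplier argument gives the closed form used in `optimal_frequencies`; the function names are hypothetical.

```python
def optimal_frequencies(s, n):
    """Energy-minimizing frequencies for the serial and parallel regions.

    Assumed model (illustrative only): dynamic power = f**3, no static
    power, deadline = 1 (sequential time at maximum frequency).
    """
    p = 1.0 - s                       # parallel fraction of the work
    f_s = s + p / n ** (2.0 / 3.0)    # serial-region frequency
    f_p = f_s / n ** (1.0 / 3.0)      # parallel-region frequency
    return f_s, f_p


def execution_time(s, n, f_s, f_p):
    """Serial time plus parallel time on n processors."""
    return s / f_s + (1.0 - s) / (n * f_p)


def dynamic_energy(s, n, f_s, f_p):
    """Power times time, summed over the serial region and n parallel processors."""
    t_s = s / f_s
    t_p = (1.0 - s) / (n * f_p)
    return f_s ** 3 * t_s + n * f_p ** 3 * t_p


if __name__ == "__main__":
    s, n = 0.25, 8
    f_s, f_p = optimal_frequencies(s, n)
    print("f_s =", f_s, "f_p =", f_p)
    print("time =", execution_time(s, n, f_s, f_p))
    print("energy =", dynamic_energy(s, n, f_s, f_p))
```

In this sketch the parallel region runs slower than the serial region by a factor of `n ** (1/3)`, and the resulting energy equals `f_s ** 3`, so more parallelism lowers the energy needed to meet the same deadline. A different power exponent or a nonzero static power would change the closed form.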
TL;DR: Analytical models based on an energy consumption metric show clearly that greater parallelism is the most important factor affecting energy saving.
TL;DR: This paper presents an accurate energy model for many-core systems that includes the switching latency of modern power-saving techniques, and demonstrates that the model accurately forecasts behavior on an ARM multicore platform and is not significantly influenced by variance in common workload types.
TL;DR: Compared to existing methods, these algorithms can reduce the energy consumption by up to 54% for the considered multimedia workloads, and the evaluation shows that these algorithms are near optimal even with inaccurate predictions.
TL;DR: This work presents an application-aware, multi-dimensional power allocation framework to support power-bounded parallel computing on NUMA-enabled multicore systems and implements a hierarchical power coordination method that leverages applications' performance and power scalability to efficiently identify an ideal power distribution.
TL;DR: A new model and evaluation show that from a software development perspective, Turbo Boost aggravates the speedup limitations obtained under Amdahl’s law by making parallelization of sequential codes less profitable.
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
TL;DR: In this paper, the authors argue that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution.
TL;DR: The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly: the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems; the target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
TL;DR: This paper proposes a simple model of job scheduling aimed at capturing some key aspects of energy minimization, and gives an off-line algorithm that computes, for any set of jobs, a minimum-energy schedule.
Q1. What are the contributions mentioned in the paper "On the interplay of parallelization, program performance, and energy consumption"?
This paper derives simple, yet fundamental formulas to describe the interplay between parallelism of an application, program performance, and energy consumption. The authors further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas the authors obtain use simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management.
Q2. What have the authors stated for future work in "On the interplay of parallelization, program performance, and energy consumption"?
In this paper, the authors developed an analytical framework to study the trade-offs between parallelization, program performance, and energy consumption. The authors considered two machine models: one assumes that individual processors cannot be turned off independently, and the other assumes that they can. When processors can be individually turned off, the analysis indicates that the minimum total energy is independent of the number of processors used for executing the parallel section, while the energy-delay product is minimized when the maximum number of available processors is used during the parallel section. The substantial power advantage gained from turning off individual processors is a strong incentive for designing multicore processors with that capability.
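The two claims about individually switchable processors can be checked numerically with a small sketch. It rests on assumptions that are not stated in the text above: work is normalized to 1 with serial fraction `s`, each active processor draws dynamic power `f**3` plus a constant static power `lam`, only one processor is powered during the serial section, and `n` processors are powered during the parallel section. The function name and parameters are hypothetical.

```python
def energy_and_delay(s, n, f_s, f_p, lam):
    """Total energy and delay when idle processors can be switched off.

    Assumed model (illustrative only): per-processor power is
    f**3 (dynamic) + lam (static) while the processor is on, 0 when off.
    """
    p = 1.0 - s
    t_s = s / f_s                 # serial section, one processor on
    t_p = p / (n * f_p)           # parallel section, n processors on
    e_serial = (f_s ** 3 + lam) * t_s
    e_parallel = n * (f_p ** 3 + lam) * t_p
    return e_serial + e_parallel, t_s + t_p


if __name__ == "__main__":
    s, f_s, f_p, lam = 0.25, 0.8, 0.6, 0.2
    for n in (2, 4, 16):
        energy, delay = energy_and_delay(s, n, f_s, f_p, lam)
        print(n, energy, delay, energy * delay)
```

In this model the factor `n` cancels in the parallel-section energy, so for any fixed pair of frequencies the total energy does not depend on `n`, and neither does its minimum over frequencies. The delay shrinks as `n` grows, which is why the energy-delay product favors using every available processor. If processors cannot be switched off, the static term of the serial section would be multiplied by `n` and this cancellation would no longer hold.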