# On the Interplay of Parallelization, Program Performance, and Energy Consumption

TL;DR: This paper derives simple yet fundamental formulas describing the interplay between an application's parallelism, program performance, and energy consumption, and uses them to find the optimal frequencies for the serial and parallel regions of an application that minimize either the total energy consumption or the energy-delay product.

Abstract: This paper derives simple, yet fundamental formulas to describe the interplay between parallelism of an application, program performance, and energy consumption. Given the ratio of serial and parallel portions in an application and the number of processors, we derive optimal frequencies allocated to the serial and parallel regions in an application to either minimize the total energy consumption or minimize the energy-delay product. The impact of static power is revealed by considering the ratio between static and dynamic power and quantifying the advantages of adding to the architecture the capability to turn off individual processors and save static energy. We further determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. While the formulas we obtain use simplifying assumptions, they provide valuable theoretical insights into energy-aware processor resource management. Our results form a basis for several interesting research directions in the area of energy-aware multicore processor architectures.

## Summary

### 1 INTRODUCTION

- A surge of attention is being paid to parallel processing with the recent emergence of commodity multicore processors.
- While the increased amount of on-chip computing resources promises higher performance through parallel execution of applications, suppressing power and energy consumption remains an even more stringent constraint on the design and management of such processors [5], [26].
- Under the constraint that program execution time is unchanged, the ratio between the processor speeds of the serial and parallel sections that minimizes energy is N^(1/α), where α is the exponent relating dynamic power to frequency.

### 2.1 Problem Formulation and Assumptions

- Therefore, the maximum clock frequency, F_max, has a relative speed of 1; the program's serial portion has an amount of work s, and its parallel portion p (= 1 − s).
- To be general, the authors also assume that each processor's power consumption has two components: a frequency-dependent component that can be controlled by changing the frequency of the processor (DVFS), and a frequency-independent component that cannot.
- The authors call these two components "dynamic" and "static," respectively.
- Specifically, for systems with a relatively large ρ (the static-to-dynamic power ratio), the frequency-independent component of power dominates the total energy consumption, and thus, applying DVFS techniques will not produce any appreciable energy savings.
- For clear presentation and intuitive discussion, the authors will not consider the impact of the constant-speed operations on program execution time (and hence, energy consumption) in the following three sections.
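The formulation above can be sketched in a few lines. This is a minimal model under the summary's stated assumptions: speeds normalized so F_max = 1, serial work s on one processor at speed fs, parallel work p = 1 − s spread evenly over N processors at speed fp, and dynamic power f^α per active processor. The symbol α is a reconstructed assumption (the text extraction dropped the Greek letters), and the parameter values in the tests are illustrative.

```python
def exec_time(s: float, N: int, fs: float, fp: float) -> float:
    """Program execution time, ignoring constant-speed operations
    (memory access, I/O) as the bullet above states."""
    p = 1.0 - s
    return s / fs + p / (N * fp)

def dynamic_energy(s: float, fs: float, fp: float, alpha: float) -> float:
    """Frequency-dependent energy: power f**alpha integrated over time.
    Serial region: one processor at power fs**alpha for time s/fs.
    Parallel region: N processors at fp**alpha for time p/(N*fp); N cancels."""
    p = 1.0 - s
    return s * fs**(alpha - 1) + p * fp**(alpha - 1)
```

At fs = fp = F_max = 1 the execution time reduces to Amdahl's s + p/N and the dynamic energy to s + p = 1, the sequential baseline used throughout the summary.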

### 2.2 Two Machine Models

- In the problem formulation in (5), the authors assumed that processors consume static energy in both the serial and the parallel regions.
- Naturally, one would regard an idle processor as an opportunity to save energy consumption if processors can be turned off when not busy, given a mechanism to turn off individual processors.
- Hence, the authors will study in this work two machine models: one without and one with the capability to turn off individual processors.
- Throughout this paper, the authors refer to these two machine models as MA and MB.
- Given the same processor speed setting, the dynamic energy consumption of MB is the same as that of MA: sum of the first two terms in (5).
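The distinction between the two machine models can be made concrete with a small sketch. Under MA all N processors draw static power for the whole run, while under MB only busy processors do, so MB's static term loses its factor of N in the serial region. The symbols ρ (per-processor static power) and α (dynamic-power exponent) are reconstructed assumptions; values in the tests are illustrative.

```python
def total_energy_MA(s, N, fs, fp, alpha, rho):
    """Machine model MA: all N processors stay powered on, so static
    power rho*N is drawn for the entire execution time."""
    p = 1.0 - s
    T = s / fs + p / (N * fp)
    dynamic = s * fs**(alpha - 1) + p * fp**(alpha - 1)
    return dynamic + rho * N * T

def total_energy_MB(s, N, fs, fp, alpha, rho):
    """Machine model MB: idle processors are turned off, so only one
    processor draws static power during the serial region."""
    p = 1.0 - s
    dynamic = s * fs**(alpha - 1) + p * fp**(alpha - 1)
    static = rho * (s / fs + p / fp)   # N cancels in the parallel term
    return dynamic + static
```

The dynamic term is identical in both models, as the bullet above states; the static gap works out to ρ(N − 1)s/fs, which grows with N, ρ, and the serial fraction.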

### 3.1 The Case of x = 1

- While one can further reduce the dynamic energy by slowing down processors, considering the case of x = 1 provides us with interesting insights as well as a basis for later discussions.
- The curves are also higher than those given by Amdahl’s law (not shown).
- Note that the optimal solution obtained for f_s and f_p is feasible, since both frequencies are smaller than the maximum frequency F_max = 1.
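The x = 1 result can be checked numerically. Pinning the execution time to the sequential time (s/fs + p/(N·fp) = 1) and brute-force searching over fs recovers the speed ratio fs/fp = N^(1/α), with both optimal speeds below F_max = 1 as the feasibility remark states. α is a reconstructed symbol and the parameter values are illustrative, not from the paper.

```python
# Verify that, with execution time fixed at the sequential time 1
# (the x = 1 case), dynamic energy is minimized at fs/fp = N**(1/alpha).
s, N, alpha = 0.2, 4, 3.0
p = 1.0 - s

def fp_from_fs(fs):
    """Solve the time constraint s/fs + p/(N*fp) = 1 for fp."""
    return p / (N * (1.0 - s / fs))

best = None
fs = s + 1e-3            # need fs > s so the constraint is satisfiable
while fs <= 1.0:
    fp = fp_from_fs(fs)
    if 0.0 < fp <= 1.0:  # keep only feasible speed settings
        e = s * fs**(alpha - 1) + p * fp**(alpha - 1)
        if best is None or e < best[0]:
            best = (e, fs, fp)
    fs += 1e-4

_, fs_opt, fp_opt = best
ratio = fs_opt / fp_opt   # close to N**(1/alpha)
```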

### 3.2 The Case of Unrestricted x

- Amdahl’s law explores the effect of parallelization on speedup, and the authors have described the effect of parallelization on energy consumption when the program execution time is unchanged, i.e., x = 1.
- The authors relax the program execution time constraint and revisit the same problem of minimizing the total energy consumption.
- In other words, the total energy consumption is minimized when the dynamic energy consumption is 1/(α − 1) times the static energy.
- In this figure, the values of ρ are divided into three regions.
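When the execution-time constraint is dropped, setting the derivatives of the total MA energy to zero yields closed forms, and the 1/(α − 1) relationship between dynamic and static energy falls out directly. This sketch uses the reconstructed symbols α (power exponent) and ρ (static-to-dynamic power ratio) with illustrative parameter values.

```python
# Closed-form optimum for machine model MA with unrestricted x, from
# dE/dfs = dE/dfp = 0:
#   fs = (rho*N/(alpha-1))**(1/alpha),  fp = (rho/(alpha-1))**(1/alpha)
# so fs/fp = N**(1/alpha) again, and the dynamic energy is exactly
# 1/(alpha-1) times the static energy.
s, N, alpha, rho = 0.2, 4, 3.0, 0.1
p = 1.0 - s

fs = (rho * N / (alpha - 1.0)) ** (1.0 / alpha)
fp = (rho / (alpha - 1.0)) ** (1.0 / alpha)

dynamic = s * fs**(alpha - 1) + p * fp**(alpha - 1)
static = rho * N * (s / fs + p / (N * fp))
```

Here `dynamic * (alpha - 1) == static` holds to floating-point precision; in fact the relationship holds region by region, not just in aggregate.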

### 3.3 Optimal Energy Consumption Given a Speedup

- The authors have thus far considered the problem of calculating the optimal speeds of processors (hence, program speedup) to minimize the total energy consumption given p, ρ, and N.
- Moreover, the largest energy improvement ratio occurs at a smaller program speedup.
- For machine model MA considered in the previous section, there was no reason for using fewer than the available N processors during parallel execution.
- While n assumes discrete values, the authors treat it as a continuous variable when they derive the conditions for optimal energy consumption.

### 4.1 Conditions for Optimal Energy Consumption

- That is, the energy-optimal time allocated to the serial section (thus the speed of the single processor used) does not depend on the number of processors, n.
- The above equations illustrate that the optimal energy is obtained when f_s = f_p, i.e., all the processors are given the same speed during program execution, regardless of the serial or the parallel section.
- This condition is greatly relaxed compared with that of MA, where the condition was ρ < (α − 1)/N.
- Fig. 8 depicts the minimum energy consumption of machine model MB at different ρ values.
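For MB the static term is ρ(s/fs + p/fp) no matter how many processors n run the parallel section: the n in the power and the n in the time cancel. This is why the optimum has f_s = f_p and why the minimum energy does not depend on n. A quick numeric check, with α and ρ as reconstructed symbols and illustrative values:

```python
import itertools

s, alpha, rho = 0.2, 3.0, 0.1
p = 1.0 - s

def energy_MB(fs, fp):
    """Total MB energy; n has cancelled out of both terms."""
    return (s * fs**(alpha - 1) + p * fp**(alpha - 1)
            + rho * (s / fs + p / fp))

f_star = (rho / (alpha - 1.0)) ** (1.0 / alpha)  # analytic optimum
e_star = energy_MB(f_star, f_star)

# Brute-force check: no (fs, fp) pair on a speed grid does better.
grid = [0.05 * k for k in range(1, 21)]          # speeds 0.05 .. 1.0
assert all(energy_MB(a, b) >= e_star - 1e-9
           for a, b in itertools.product(grid, grid))
```

Feasibility (f_star ≤ 1) only needs ρ ≤ α − 1, matching the "greatly relaxed" condition in the bullet above.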

### 4.2 Optimal Energy Consumption Given a Speedup

- The authors consider the problem of obtaining the minimum total energy consumption when a desired speedup x (equivalently, program execution time y) is specified.
- Fig. 9 shows how the minimum energy consumption changes as the authors target a different program speedup, x, along with the contributions of the dynamic and static energy consumption.
- At the maximum speedup dictated by Amdahl’s law, when f_s = f_p = F_max = 1, the dynamic energy consumption reaches 1 (i.e., the same as the dynamic energy consumption of sequential execution) and the total energy reaches 1 + ρ (i.e., the same as the total energy consumption of sequential execution).
- Fig. 10 further shows how the improvement ratio of the minimum energy changes with different values of ρ.
- (Figure caption: running a program having a serial portion s with (a) two processors and (b) one processor.)
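The endpoint quoted above follows from plugging fs = fp = F_max = 1 into the MB energy: the dynamic energy is s + p = 1 and the static energy is ρ(s + p) = ρ. A one-line check under the reconstructed symbols ρ and α, with illustrative values:

```python
s, N, rho, alpha = 0.2, 4, 0.1, 3.0
p = 1.0 - s

fs = fp = 1.0                       # F_max, the Amdahl's-law endpoint
speedup = 1.0 / (s + p / N)         # maximum speedup, 1/(s + p/N)

dynamic = s * fs**(alpha - 1) + p * fp**(alpha - 1)   # sequential baseline
total_MB = dynamic + rho * (s / fs + p / fp)          # 1 + rho
```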

### 4.3 Energy Advantage of Turning Off Idle Processors

- Fig. 11 compares the two machine models, MA and MB, with respect to the minimum energy consumption at different program speedups.
- It is shown that the energy consumption of MB is strictly lower than that of MA at any desired speedup.
- At the maximum program speedup, MB has the same energy consumption as sequential execution, regardless of the number of processors used.
- Finally, the authors consider the energy consumption of the two machine models at the maximum program speedup given by Amdahl’s law, 1/(s + p/N).
- That is, as expected, the advantage of turning off processors increases when the static power is larger.
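At the maximum speedup the gap between the models takes a simple form: MA pays static energy ρ·N·(s + p/N) = ρ(Ns + p), while MB pays only ρ(s + p) = ρ, so the advantage of turning processors off is ρ·s·(N − 1) and grows linearly in the static power ratio, as the bullet above notes. A sketch (ρ is a reconstructed symbol; values are illustrative):

```python
s, N, rho = 0.2, 8, 0.1
p = 1.0 - s

T_max = s + p / N                  # execution time at fs = fp = 1
static_MA = rho * N * T_max        # all N processors on the whole time
static_MB = rho * (s + p)          # idle processors turned off
advantage = static_MA - static_MB  # = rho * s * (N - 1)
```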

### 5 MINIMIZING ENERGY-DELAY PRODUCT

- In the previous sections, the authors have considered the problem of obtaining the minimum energy given the two machine models MA and MB.
- In many systems, however, it is desirable to strike a trade-off between energy consumption and performance by minimizing the energy-delay product rather than the total energy.
- The case in which processors cannot be turned off individually (machine model MA) is considered first.
- This similarity also appears in the calculation of energy.
- Numeric techniques are needed to obtain a solution.
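Since the bullet notes that minimizing the energy-delay product requires numeric techniques, here is one way such a search might look: a brute-force grid over (fs, fp) for machine model MA. Everything below, including the parameter values, is an illustrative assumption rather than the paper's method.

```python
s, N, alpha, rho = 0.2, 4, 3.0, 0.1
p = 1.0 - s

def edp_MA(fs, fp):
    """Energy-delay product for machine model MA."""
    T = s / fs + p / (N * fp)
    E = s * fs**(alpha - 1) + p * fp**(alpha - 1) + rho * N * T
    return E * T

# Exhaustive search over a speed grid in (0, F_max].
grid = [0.01 * k for k in range(1, 101)]
best = min(((edp_MA(a, b), a, b) for a in grid for b in grid))
edp_min, fs_opt, fp_opt = best
```

The grid contains fs = fp = 1, so the result can only improve on running flat out; a real implementation would refine the grid solution, e.g., with a Newton step on the first-order conditions.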

### 6 EFFECT OF CONSTANT-SPEED OPERATIONS

- In their problem formulation in Section 2 and derivations in the previous three sections, the authors assumed that processor speed (processor’s clock frequency) solely determines the runtime of a program region.
- The authors will discuss how constant-speed operations, such as memory access and I/O processing, affect program execution time and, hence, their derivations.
- Let us assume that the constant-speed operations (e.g., memory access) are distributed uniformly across the work and their aggregated amount is m (m < 1).
- If the authors assume that memory accesses from multiple processor cores can be overlapped, then the total amount of time spent on the constant-speed operations (t_m) is m · (s + p/N).
- Given that translating a problem formulated using Fig. 13 into a problem based on their original problem formulation (Fig. 2) is straightforward, the authors do not pursue in this paper the task of deriving new formulas that incorporate the impact of constant-speed operations.

### 8 CONCLUSIONS

- The authors developed an analytical framework to study the trade-offs between parallelization, program performance, and energy consumption.
- The main simplification inherited from Amdahl’s law is that the parallel section of an application is fully parallelizable.
- Both the minimum energy and the minimum energy-delay product are obtained when the speed of the serial section, f_s, is N^(1/α) times the speed of the parallel section, f_p.
- It also provides a simple way to determine the effect of the static/dynamic power ratio on the aforementioned trade-offs.
- When processors can be individually turned off, the analysis indicates that the minimum total energy is independent of the number of processors used for executing the parallel section, while the energy-delay product is minimized when the maximum number of available processors are used during the parallel execution section.

##### Frequently Asked Questions (2)

###### Q2. What have the authors stated for future works in "On the interplay of parallelization, program performance, and energy consumption" ?

In this paper, the authors developed an analytical framework to study the trade-offs between parallelization, program performance, and energy consumption. The authors considered two machine models: one assumes that individual processors cannot be turned off independently, and the other assumes that they can. When processors can be individually turned off, the analysis indicates that the minimum total energy is independent of the number of processors used for executing the parallel section, while the energy-delay product is minimized when the maximum number of available processors is used during the parallel execution section. The demonstrated substantial power advantage that can be gained from turning off individual processors is a strong incentive to design multicore processors with the capability of turning off individual processors.