Proceedings ArticleDOI

Using early phase termination to eliminate load imbalances at barrier synchronization points

Martin Rinard
- Vol. 42, Iss: 10, pp 369-386
TLDR
The paper identifies a general computational pattern that works well with early phase termination and explains why computations that exhibit this pattern can tolerate the early termination of parallel tasks without producing unacceptable results.
Abstract
We present a new technique, early phase termination, for eliminating idle processors in parallel computations that use barrier synchronization. This technique simply terminates each parallel phase as soon as there are too few remaining tasks to keep all of the processors busy. Although this technique completely eliminates the idling that would otherwise occur at barrier synchronization points, it may also change the computation and therefore the result that the computation produces. We address this issue by providing probabilistic distortion models that characterize how the use of early phase termination distorts the result that the computation produces. Our experimental results show that for our set of benchmark applications, 1) early phase termination can improve the performance of the parallel computation, 2) the distortion is small (or can be made to be small with the use of an appropriate compensation technique) and 3) the distortion models provide accurate and tight distortion bounds. These bounds can enable users to evaluate the effect of early phase termination and confidently accept results from parallel computations that use this technique if they find the distortion bounds to be acceptable. Finally, we identify a general computational pattern that works well with early phase termination and explain why computations that exhibit this pattern can tolerate the early termination of parallel tasks without producing unacceptable results.
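To make the mechanism concrete, here is a minimal sketch of the idea, not the paper's implementation: workers drain a shared task pool for one phase and meet at a barrier, and with early termination enabled they stop pulling work as soon as fewer tasks remain than there are workers, so no processor idles waiting at the barrier. The queue-based task pool, worker count, and function names are illustrative assumptions.

```python
import threading
from queue import Queue, Empty

NUM_WORKERS = 4  # stand-in for the number of processors

def run_phase(tasks, process_task, early_termination=True):
    """Run one parallel phase over `tasks`, then meet at a barrier."""
    work = Queue()
    for t in tasks:
        work.put(t)
    barrier = threading.Barrier(NUM_WORKERS)

    def worker():
        while True:
            # Early phase termination: stop handing out work as soon as
            # the remaining tasks can no longer keep every worker busy.
            if early_termination and work.qsize() < NUM_WORKERS:
                break
            try:
                task = work.get_nowait()
            except Empty:
                break
            process_task(task)
        barrier.wait()  # the barrier synchronization point

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

results = []
run_phase(range(100), lambda t: results.append(t * t))
print(len(results), "of 100 tasks executed")  # a few tasks may be skipped
```

Skipping the last few tasks is exactly the source of the distortion that the paper's probabilistic distortion models are designed to bound.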



Citations
Proceedings ArticleDOI

Managing performance vs. accuracy trade-offs with loop perforation

TL;DR: The results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%.
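Loop perforation, as named in the title, transforms a loop so that it executes only a subset of its iterations. A minimal sketch under that assumption, with an arbitrary perforation rate and a hypothetical function name:

```python
def perforated_sum(data, perforation=2):
    """Execute only every `perforation`-th iteration of the loop and
    scale the partial result to approximate the exact sum."""
    partial = 0.0
    for i in range(0, len(data), perforation):  # skipped iterations are never run
        partial += data[i]
    return partial * perforation  # extrapolate over the skipped work

data = [float(i) for i in range(1000)]
exact = sum(data)
approx = perforated_sum(data, perforation=2)
print(exact, approx, abs(exact - approx) / exact)  # small relative error
```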
Proceedings ArticleDOI

Green: a framework for supporting energy-conscious programming using controlled approximation

TL;DR: Green enables programmers to approximate expensive functions and loops; it operates in two phases and can produce significant improvements in performance and energy consumption with small and controlled QoS degradation.
Proceedings ArticleDOI

Dynamic knobs for responsive power-aware computing

TL;DR: The experimental results show that PowerDial can enable benchmark applications to execute responsively in the face of power caps that would otherwise significantly impair responsiveness, and can significantly reduce the number of machines required to service intermittent load spikes, enabling reductions in power and capital costs.
Proceedings ArticleDOI

SAGE: self-tuning approximation for graphics engines

TL;DR: Across a set of machine learning and image processing kernels, SAGE's approximation yields an average of 2.5× speedup with less than 10% quality loss compared to the accurate execution on an NVIDIA GTX 560 GPU.
Proceedings ArticleDOI

Quality of service profiling

TL;DR: The experimental results from applying the implemented quality of service profiler to a challenging set of benchmark applications show that it can enable developers to identify promising optimization opportunities and deliver successful optimizations that substantially increase the performance with only small quality of service losses.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
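The MapReduce programming model reduces to two user-supplied functions. A toy, single-process word-count sketch of that interface (the real system distributes these phases across a cluster and handles failures; the helper names here are illustrative, not the paper's API):

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user's map function to every input record,
    collecting the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def reduce_phase(pairs, reduce_fn):
    """Group intermediate pairs by key and apply the user's
    reduce function to each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Word count: emit (word, 1) in map, sum the counts in reduce.
docs = ["early phase termination", "barrier synchronization", "early termination"]
pairs = map_phase(docs, lambda doc: [(word, 1) for word in doc.split()])
counts = reduce_phase(pairs, lambda word, values: sum(values))
print(counts)
```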
Journal ArticleDOI

A hierarchical O(N log N) force-calculation algorithm

TL;DR: A novel method of directly calculating the force on N bodies that grows only as N log N is described, using a tree-structured hierarchical subdivision of space into cubic cells, each of which is recursively divided into eight subcells whenever more than one particle is found to occupy the same cell.
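A toy version of that recursive subdivision (insertion only; the force evaluation and opening-angle test are omitted, and all names are illustrative, not the authors' code):

```python
class Cell:
    """Cubic cell of the hierarchical subdivision."""
    def __init__(self, center, half_size):
        self.center = center        # (x, y, z) of the cell's center
        self.half_size = half_size  # half the cell's edge length
        self.bodies = []            # bodies stored in a leaf cell
        self.children = []          # eight subcells once subdivided

def subdivide(cell):
    """Split a cell into its eight octant subcells."""
    h = cell.half_size / 2.0
    cx, cy, cz = cell.center
    cell.children = [Cell((cx + dx * h, cy + dy * h, cz + dz * h), h)
                     for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)]

def child_for(cell, body):
    """Pick the subcell whose octant contains the body's position."""
    x, y, z = body
    cx, cy, cz = cell.center
    # children are ordered (-,-,-) ... (+,+,+) by the comprehension above
    return cell.children[(x >= cx) * 4 + (y >= cy) * 2 + (z >= cz)]

def insert(cell, body):
    """Insert a body, subdividing a cell into eight subcells whenever
    more than one body would occupy it."""
    if not cell.children and not cell.bodies:
        cell.bodies.append(body)      # empty leaf: just store the body
        return
    if not cell.children:             # occupied leaf: subdivide and push down
        existing = cell.bodies.pop()
        subdivide(cell)
        insert(child_for(cell, existing), existing)
    insert(child_for(cell, body), body)

root = Cell((0.0, 0.0, 0.0), 1.0)
for b in [(0.1, 0.2, 0.3), (-0.4, 0.5, -0.6), (0.7, -0.8, 0.9)]:
    insert(root, b)
```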
Proceedings ArticleDOI

Transactional memory: architectural support for lock-free data structures

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Journal ArticleDOI

A Hierarchical O(N) Force Calculation Algorithm

TL;DR: A novel code for the approximate computation of long-range forces between N mutually interacting bodies based on a hierarchical tree of cubic cells and features mutual cell–cell interactions which are calculated via a Cartesian Taylor expansion in a symmetric way, such that total momentum is conserved.