Low-Span Parallel Algorithms for the Binary-Forking Model

doi:10.1145/3409964.3461802

Proceedings Article


pp. 22-34

TLDR

This paper proposes a randomized comparison-based sorting algorithm for the binary-forking model with optimal O(log n) span and O(n log n) work.

Abstract:

The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of Θ(log n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads; as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model, and the non-constant synchronization cost makes this task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blow-up in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT).

Our major results are as follows:

(1) A randomized comparison-based sorting algorithm with optimal O(log n) span and O(n log n) work, both w.h.p. in n.

(2) An optimal O(log n) span algorithm for Strassen's MM with only a log log n-factor blow-up in work, as well as a near-optimal O(log n log log log n) span algorithm with no asymptotic blow-up in work.

(3) A near-optimal O(log n log log log n) span FFT algorithm with less than a log n-factor blow-up in work for all practical values of n (i.e., n ≤ 10^10,000).
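To make the Θ(log n) spawning cost concrete, here is a minimal cost-model sketch. It assumes a simplified accounting that is ours, not the paper's formal definition: each fork, each join, and each leaf task costs one unit. Because a thread may fork only one child at a time, n tasks are launched by recursively splitting the range in half, which yields a fork tree of depth ⌈log2 n⌉. Span therefore grows as Θ(log n) while work stays Θ(n). The function name `fork_tree_cost` is hypothetical.

```python
from typing import Tuple


def fork_tree_cost(n: int) -> Tuple[int, int]:
    """Return (work, span) for spawning and joining n unit-cost tasks by
    recursive binary forking.

    Simplified cost model (an assumption for illustration): a fork costs 1,
    a join costs 1, and each leaf task costs 1.
    """
    if n < 1:
        raise ValueError("n must be positive")
    if n == 1:
        return 1, 1  # leaf: just run the task
    left = n // 2
    right = n - left  # right half gets the extra task when n is odd
    work_l, span_l = fork_tree_cost(left)
    work_r, span_r = fork_tree_cost(right)
    # One fork before the halves run concurrently, one join after they finish.
    # Work adds up across both halves; span follows the slower (deeper) half.
    return work_l + work_r + 2, max(span_l, span_r) + 2


if __name__ == "__main__":
    for n in (1, 2, 8, 1024):
        work, span = fork_tree_cost(n)
        depth = (n - 1).bit_length()  # ceil(log2 n) for n >= 1
        print(f"n={n}: work={work}, span={span}, fork depth={depth}")
```

Under this model, work is exactly 3n - 2 and span is exactly 2⌈log2 n⌉ + 1. For example, `fork_tree_cost(1024)` returns `(3070, 21)`. A PRAM-style analysis would instead charge O(1) to start all n tasks, which is the gap the binary-forking model is meant to expose.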


