Journal ArticleDOI

Parallel Quicksort using fetch-and-add

Philip Heidelberger, +2 more
01 Jan 1990 · Vol. 39, Iss. 1, pp. 133-138
Abstract
A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented. The partitioning phase of Quicksort, which has been considered a serial bottleneck, is cooperatively executed in parallel by many processors through the use of fetch-and-add. The parallel algorithm maintains the in-place nature of Quicksort, thereby allowing internal sorting of large arrays. A class of fetch-and-add-based algorithms for dynamically scheduling processors to subproblems is presented. Adaptive scheduling algorithms in this class have low overhead and achieve effective processor load balancing. The basic algorithm is shown to execute in an average of O(log(N)) time on an N-processor PRAM (parallel random-access machine) assuming a constant-time fetch-and-add. Estimated speedups, based on simulations, are also presented for cases when the number of items to be sorted is much greater than the number of processors.
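The core idea of cooperative partitioning can be illustrated with a minimal sketch: each processor claims a unique output slot per element via fetch-and-add on two shared counters, so no two processors ever write the same index. This is a simplified, out-of-place illustration (the paper's actual algorithm partitions in place), with fetch-and-add simulated by a lock since Python exposes no hardware atomic; all names here are illustrative, not from the paper.

```python
import threading

class FetchAndAdd:
    """Fetch-and-add simulated with a lock; real hardware (e.g. the NYU
    Ultracomputer) performs this atomically in the memory/network."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old           # return the value *before* the add

def parallel_partition(data, pivot, num_threads=4):
    """Cooperatively partition `data` around `pivot` with two shared counters.

    Elements smaller than the pivot claim slots from the left end, the rest
    from the right end. Fetch-and-add guarantees each claimed index is unique,
    so the threads never conflict.
    """
    n = len(data)
    out = [None] * n
    left = FetchAndAdd(0)        # next free slot from the left
    right = FetchAndAdd(n - 1)   # next free slot from the right

    def worker(chunk):
        for x in chunk:
            if x < pivot:
                out[left.fetch_and_add(1)] = x
            else:
                out[right.fetch_and_add(-1)] = x

    step = (n + num_threads - 1) // num_threads
    threads = [threading.Thread(target=worker, args=(data[i:i + step],))
               for i in range(0, n, step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Final left counter = number of elements below the pivot = split point.
    return out, left.fetch_and_add(0)

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
out, split = parallel_partition(data, pivot=5)
print(split, sorted(out[:split]), sorted(out[split:]))
```

Within each half the order depends on thread interleaving, which is why Quicksort then recurses on the two halves; the paper's scheduling algorithms decide which processors work on which subproblem.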


Citations
Proceedings ArticleDOI

TurKit: tools for iterative tasks on mechanical Turk

TL;DR: Proposes an alternative iterative paradigm for crowdsourcing, in which workers build on or evaluate each other's work, and presents TurKit, a toolkit that facilitates deployment of iterative tasks on MTurk.
Journal ArticleDOI

GPU-Quicksort: A practical Quicksort algorithm for graphics processors

TL;DR: GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multicore graphics processors, is described and shown, in CUDA, NVIDIA's programming platform for general-purpose computations on graphics processors, to perform better than the fastest-known sorting implementations for graphics processors.
Proceedings ArticleDOI

A highly-efficient wait-free universal construction

TL;DR: A new, simple wait-free universal construction, called Sim, that uses just a Fetch&Add and an LL/SC object and performs a constant number of shared memory accesses; it has been implemented on a real shared-memory machine.
Book ChapterDOI

A Practical Quicksort Algorithm for Graphics Processors

TL;DR: GPU-Quicksort is presented, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors that often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort.
Proceedings ArticleDOI

A simple, fast parallel implementation of Quicksort and its performance evaluation on SUN Enterprise 10000

TL;DR: Implements sample sort and a parallel version of Quicksort on a cache-coherent shared-address-space multiprocessor, the SUN ENTERPRISE 10000, and shows that parallel Quicksort outperforms sample sort.
References
Journal ArticleDOI

Data parallel algorithms

TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
Journal ArticleDOI

Parallel merge sort

TL;DR: A parallel implementation of merge sort on a CREW PRAM that uses n processors and O(log n) time; the constant in the running time is small.
Journal ArticleDOI

The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel Computer

TL;DR: The design for the NYU Ultracomputer is presented, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements that uses an enhanced message-switching network with the geometry of an Omega network to approximate the ideal behavior of Schwartz's paracomputer model of computation.
Journal ArticleDOI

“Hot spot” contention and combining in multistage interconnection networks

TL;DR: "Hot spot" contention was found to severely degrade all memory access, not just access to shared lock locations, due to an effect the authors call tree saturation; the technique of message combining was found to be an effective means of eliminating this problem when it arises from lock or synchronization contention.