Proceedings ArticleDOI

# Brief announcement: efficient cache oblivious algorithms for randomized divide-and-conquer on the multicore model

25 Jun 2012, pp. 74–76

TL;DR: In this paper, a cache-oblivious framework for randomized divide and conquer algorithms on the multicore model with private cache is presented, where the number of processors, the size of an individual cache memory and the block size are assumed to be fixed.

Abstract: In this paper we present a cache-oblivious framework for randomized divide-and-conquer algorithms on the multicore model with private caches. We first derive an O(n/p log n + log n log log n) expected parallel depth algorithm for sorting n numbers with expected O(n/B log_M n) cache misses, where p, M and B denote the number of processors, the size of an individual cache memory and the block size. Although similar results have been obtained recently for sorting, we feel that our approach is simpler and more general, and we apply it to obtain an algorithm for 3D convex hulls with similar bounds. We also present a simple randomized processor allocation technique, without explicit knowledge of the number of processors, that is likely to find additional applications in resource-oblivious environments.
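The randomized divide-and-conquer strategy the abstract refers to can be illustrated by a randomized sample sort: pick a random (oversampled) set of splitters, partition the input into buckets, and recurse on each bucket independently. The sketch below is a minimal sequential version for intuition only; the parameter names (`threshold`, `oversample`) are illustrative choices, not the paper's tuned parameters, and the paper's parallel, cache-oblivious scheduling of the recursive calls is not shown.

```python
import random

def sample_sort(a, threshold=32, oversample=4):
    """Randomized sample sort, a sequential sketch of the
    divide-and-conquer pattern. `threshold` and `oversample`
    are illustrative, not the paper's parameters."""
    n = len(a)
    if n <= threshold:
        return sorted(a)
    # Aim for roughly sqrt(n) buckets; oversampling the random
    # sample makes the bucket sizes balanced with high probability.
    k = max(2, int(n ** 0.5))
    sample = sorted(random.choices(a, k=k * oversample))
    splitters = sample[oversample::oversample][:k - 1]
    # Partition into buckets delimited by the splitters.
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in a:
        lo, hi = 0, len(splitters)
        while lo < hi:  # binary search for the bucket of x
            mid = (lo + hi) // 2
            if x < splitters[mid]:
                hi = mid
            else:
                lo = mid + 1
        buckets[lo].append(x)
    # Recurse on each bucket; in the parallel setting these are
    # independent subproblems handed to separate processors.
    out = []
    for b in buckets:
        out.extend(sample_sort(b, threshold, oversample))
    return out
```

Because the buckets are independent, the recursive calls can be executed in parallel, which is where the expected-depth and cache-miss analysis of the paper applies.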

##### References
Journal ArticleDOI
TL;DR: This paper presents an optimal parallel randomized algorithm for computing the intersection of half-spaces in three dimensions; it is randomized in the sense that it uses only a polylogarithmic number of random bits in total and terminates within the claimed time bound with probability $1 - n^{-\alpha}$ for any fixed $\alpha > 0$.
Abstract: Further applications of random sampling techniques which have been used for deriving efficient parallel algorithms are presented by J. H. Reif and S. Sen [Proc. 16th International Conference on Parallel Processing, 1987]. This paper presents an optimal parallel randomized algorithm for computing the intersection of half-spaces in three dimensions. Because of well-known reductions, these methods also yield equally efficient algorithms for fundamental problems such as the convex hull in three dimensions, the Voronoi diagram of point sites in the plane, and the Euclidean minimum spanning tree. The algorithms run in time $T = O(\log n)$ for worst-case inputs and use $P = O(n)$ processors in a CREW PRAM model, where n is the input size. They are randomized in the sense that they use a total of only a polylogarithmic number of random bits and terminate in the claimed time bound with probability $1 - n^{-\alpha}$ for any fixed $\alpha > 0$. They are also optimal in the $P \cdot T$ product, since the sequential time bound for all these...

48 citations

Book ChapterDOI
06 Jul 2010
TL;DR: A new deterministic sorting algorithm that interleaves the partitioning of a sample sort with merging with an optimal number of cache misses is presented, which improves on previous bounds for deterministic sample sort.
Abstract: We present a new deterministic sorting algorithm that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort. Given a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, our algorithm can be scheduled effectively on these p cores in a cache-oblivious manner. We improve on the above cache-oblivious processor-aware parallel implementation by using the Priority Work Stealing Scheduler (PWS) that we presented recently in a companion paper [12]. The PWS scheduler is both processor- and cache-oblivious (i.e., resource oblivious), and it tolerates asynchrony among the cores. Using PWS, we obtain a resource oblivious scheduling of our sorting algorithm that matches the performance of the processor-aware version. Our analysis includes the delay incurred by false-sharing. We also establish good bounds for our algorithm with the randomized work stealing scheduler.

38 citations

Posted Content
Neeraj Sharma
TL;DR: A cache-oblivious framework for randomized divide and conquer algorithms on the multicore model with private cache and a simple randomized processor allocation technique without the explicit knowledge of the number of processors that is likely to find additional applications in resource oblivious environments are presented.
Abstract: In this paper we present randomized algorithms for sorting and convex hulls that achieve optimal performance (for speed-up and cache misses) on the multicore model with private caches. Our algorithms are cache-oblivious and generalize the randomized divide-and-conquer strategy given by Reischuk and by Reif and Sen. Although the approach yielded optimal speed-up in the PRAM model, we require additional techniques to optimize cache misses in an oblivious setting. Under a mild assumption on the input and the number of processors, our algorithms achieve optimal time and cache misses with high probability. Although similar results have been obtained recently for sorting, we feel that our approach is simpler and more general, and we apply it to obtain an optimal parallel algorithm for 3D convex hulls with similar bounds. We also present a simple randomized processor allocation technique, without explicit knowledge of the number of processors, that is likely to find additional applications in resource-oblivious environments.

1 citation
