Low depth cache-oblivious algorithms
Citations
816 citations
143 citations
141 citations
Cites methods from "Low depth cache-oblivious algorithm..."
...Comparison Sort: We use a low-depth cache-efficient sample sort [9]....
[...]
87 citations
Cites methods from "Low depth cache-oblivious algorithm..."
...There have been various techniques proposed to address these algorithmic changes, either using compiler assisted optimization [27], using cache-oblivious algorithms [6] or specialized languages like Sequoia [21]....
[...]
84 citations
Cites background from "Low depth cache-oblivious algorithm..."
...However, all future references fit into cache until reaching a supertask that does not fit in cache, at which point the...

Problem | Span | Cache complexity $Q^*$
Scan (prefix sums, etc.) | $O(\log n)$ | $O(\lceil n/B \rceil)$
Matrix Transpose ($n \times m$ matrix) [20] | $O(\log(n + m))$ | $O(\lceil nm/B \rceil)$
Matrix Multiplication ($\sqrt{n} \times \sqrt{n}$ matrix) [20] | $O(\sqrt{n})$ | $O(\lceil n^{1.5}/B \rceil / \sqrt{M + 1})$
Matrix Inversion ($\sqrt{n} \times \sqrt{n}$ matrix) | $O(\sqrt{n})$ | $O(\lceil n^{1.5}/B \rceil / \sqrt{M + 1})$
Quicksort [22] | $O(\log^2 n)$ | $O(\lceil n/B \rceil (1 + \log \lceil n/(M + 1) \rceil))$
Sample Sort [10] | $O(\log^2 n)$ | $O(\lceil n/B \rceil \lceil \log_{M+2} n \rceil)$
Sparse-Matrix Vector Multiply [10] ($m$ nonzeros, $n^{\epsilon}$ edge separators) | $O(\log^2 n)$ | $O(\lceil m/B + n/(M + 1)^{1-\epsilon} \rceil)$
Convex Hull (e.g., see [8]) | $O(\log^2 n)$ | $O(\lceil n/B \rceil \lceil \log_{M+2} n \rceil)$
Barnes Hut tree (e.g., see [8]) | $O(\log^2 n)$ | $O(\lceil n/B \rceil (1 + \log \lceil n/(M + 1) \rceil))$

Table 1: Cache complexities of some algorithms analyzed in the PCO model....
[...]
...Unfortunately, current dynamic parallelism approaches have important limitations: they either apply to hierarchies of only private or only shared caches [1,9,10,16,21], require some strict balance criteria [7,15], or require a joint algorithm/scheduler analysis [7, 13–16]....
[...]
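...Sample Sort [10]: span $O(\log^2 n)$, cache complexity $O(\lceil n/B \rceil \lceil \log_{M+2} n \rceil)$. Sparse-Matrix Vector Multiply [10] ($m$ nonzeros, $n^{\epsilon}$ edge separators): span $O(\log^2 n)$, cache complexity $O(\lceil m/B + n/(M + 1)^{1-\epsilon} \rceil)$...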
[...]
...A pair of common abstract measures for capturing parallel cache based locality are the number of misses given a sequential ordering of a parallel computation [1, 9, 10, 21], and the depth (span, critical path length) of the computation....
[...]
References
203 citations
"Low depth cache-oblivious algorithm..." refers methods in this paper
...Vishkin’s deterministic coin tossing [32] to find an O(log log n)-ruling set and then convert the ruling set to an independent set of size at least n/3 in O(log log n) rounds....
[...]
185 citations
179 citations
"Low depth cache-oblivious algorithm..." refers methods in this paper
...Our parallel sorting algorithm is based on a version of sample sort [37, 45], and has optimal cache complexity....
[...]
175 citations
"Low depth cache-oblivious algorithm..." refers methods in this paper
...Our parallel sorting algorithm is based on a version of sample sort [37, 45], and has optimal cache complexity....
[...]
175 citations