Low depth cache-oblivious algorithms
Citations
141 citations
Cites methods from "Low depth cache-oblivious algorithm..."
...Comparison Sort: We use a low-depth cache-efficient sample sort [9]....
[...]
87 citations
Cites methods from "Low depth cache-oblivious algorithm..."
...There have been various techniques proposed to address these algorithmic changes, either using compiler assisted optimization [27], using cache-oblivious algorithms [6] or specialized languages like Sequoia [21]....
[...]
84 citations
Cites background from "Low depth cache-oblivious algorithm..."
...However, all future references fit into cache until reaching a supertask that does not fit in cache, at which point the...
Table 1: Cache complexities of some algorithms analyzed in the PCO model.
Problem | Span | Cache Complexity Q*
Scan (prefix sums, etc.) | O(log n) | O(⌈n/B⌉)
Matrix Transpose (n × m matrix) [20] | O(log(n + m)) | O(⌈nm/B⌉)
Matrix Multiplication (√n × √n matrix) [20] | O(√n) | O(⌈n^1.5/B⌉/√(M + 1))
Matrix Inversion (√n × √n matrix) | O(√n) | O(⌈n^1.5/B⌉/√(M + 1))
Quicksort [22] | O(log² n) | O(⌈n/B⌉(1 + log⌈n/(M + 1)⌉))
Sample Sort [10] | O(log² n) | O(⌈n/B⌉⌈log_{M+2} n⌉)
Sparse-Matrix Vector Multiply [10] (m nonzeros, n^ε edge separators) | O(log² n) | O(⌈m/B + n/(M + 1)^(1−ε)⌉)
Convex Hull (e.g., see [8]) | O(log² n) | O(⌈n/B⌉⌈log_{M+2} n⌉)
Barnes Hut tree (e.g., see [8]) | O(log² n) | O(⌈n/B⌉(1 + log⌈n/(M + 1)⌉))
[...]
...Unfortunately, current dynamic parallelism approaches have important limitations: they either apply to hierarchies of only private or only shared caches [1,9,10,16,21], require some strict balance criteria [7,15], or require a joint algorithm/scheduler analysis [7, 13–16]....
[...]
...Sample Sort [10] O(log² n) O(⌈n/B⌉⌈log_{M+2} n⌉) Sparse-Matrix Vector Multiply [10] O(log² n) O(⌈m/B + n/(M + 1)^(1−ε)⌉) (m nonzeros, n^ε edge separators)...
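To get a feel for the sample sort bound quoted above, the sketch below evaluates the expression ⌈n/B⌉⌈log_{M+2} n⌉ for concrete values of n (input size), M (cache size) and B (block size). It is only an illustration: the function names are hypothetical, and the constant hidden by the O(·) notation is ignored, so the result is a growth term and not a predicted miss count.

```python
def ceil_log(n, base):
    """Smallest integer k >= 0 with base**k >= n (exact integer arithmetic)."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k


def sample_sort_cache_term(n, M, B):
    """Evaluate ceil(n/B) * ceil(log_{M+2} n), without the constant hidden by O(.)."""
    blocks = -(-n // B)  # integer ceiling of n / B
    return blocks * ceil_log(n, M + 2)


# Example: n = 1000 elements, cache size M = 8, block size B = 10.
# ceil(1000/10) = 100 blocks and ceil(log_10 1000) = 3, so the term is 300.
print(sample_sort_cache_term(1000, 8, 10))
```

Integer arithmetic is used for both ceilings so that exact powers of the base are not misrounded by floating-point logarithms.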
[...]
...A pair of common abstract measures for capturing parallel cache based locality are the number of misses given a sequential ordering of a parallel computation [1, 9, 10, 21], and the depth (span, critical path length) of the computation....
[...]
References
1,469 citations
"Low depth cache-oblivious algorithm..." refers background in this paper
...The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose....
[...]
1,264 citations
"Low depth cache-oblivious algorithm..." refers background in this paper
...G = (V, E) in S with n vertices can be partitioned into three sets of vertices V_a, V_s, V_b such that |V_s| ≤ βf(n), |V_a|, |V_b| ≤ αn, and {(u, v) ∈ E | (u ∈ V_a ∧ v ∈ V_b) ∨ (u ∈ V_b ∧ v ∈ V_a)} = ∅ [43]....
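The separator condition quoted above can be checked mechanically. The following is a minimal sketch, assuming vertices are numbered 0 to n−1 and edges are given as pairs; the function name and argument layout are hypothetical and not taken from the cited paper. It only verifies a given partition against the three conditions and does not compute a separator.

```python
import math


def is_vertex_separator(n, edges, Va, Vs, Vb, alpha, beta, f):
    """Return True if (Va, Vs, Vb) satisfies the quoted separator conditions."""
    Va, Vs, Vb = set(Va), set(Vs), set(Vb)
    # The three sets must partition the vertex set {0, ..., n-1}.
    if Va | Vs | Vb != set(range(n)) or len(Va) + len(Vs) + len(Vb) != n:
        return False
    # Size bound on the separator: |Vs| <= beta * f(n).
    if len(Vs) > beta * f(n):
        return False
    # Balance bound on both sides: |Va|, |Vb| <= alpha * n.
    if len(Va) > alpha * n or len(Vb) > alpha * n:
        return False
    # No edge may connect Va directly to Vb.
    for u, v in edges:
        if (u in Va and v in Vb) or (u in Vb and v in Va):
            return False
    return True


# Example: the path 0-1-2-3-4 is split by its middle vertex.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(is_vertex_separator(5, path_edges, {0, 1}, {2}, {3, 4}, 2 / 3, 1.0, math.sqrt))
```

Here f is passed in as a function so the same check covers different graph classes; math.sqrt is used only as an illustrative choice.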
[...]
...For planar graphs this can be done in linear time [43]....
[...]