Low depth cache-oblivious algorithms
Citations
2 citations
Cites background or methods from "Low depth cache-oblivious algorithm..."
...Similarly, when scheduled on a PMSH using a PDF scheduler, the cache at each level $i$ incurs fewer than $Q(p(M_i - B_i d), B_i)$, and the computation takes at most $W'/p + d'$ steps, where $d'$ and $W'$ are, respectively, the depth and work of the computation including the latencies of data misses [Blelloch et al., 2010]....
[...]
...show that for a nested-parallel computation of depth $d$ scheduled with a work-stealing scheduler, the cache complexity at each level of the hierarchy satisfies $Q_p(n, M_i, B_i) \le Q(n, M_i, B_i) + O(p M_i d / B_i)$ with probability $1 - \delta$, where $M_i$ and $B_i$ are the cache and line sizes at level $i$, respectively, $Q(n, M_i, B_i)$ is the sequential cache complexity of the computation at each level, and $\delta$ is an arbitrarily small positive constant [Blelloch et al., 2010]....
[...]
...which each processor has a private multi-level cache hierarchy, as well as a Parallel Multi-level Shared Hierarchy (PMSH), where all processors share a multi-level cache hierarchy [Blelloch et al., 2010]....
[...]
...While there exist many sequential cache-oblivious algorithms with good cache complexity, several of these do not show natural parallelizations with low depth [Blelloch et al., 2010]....
[...]
1 citation
Cites background from "Low depth cache-oblivious algorithm..."
...Particularly, nested parallel algorithms for which the natural sequential execution has low cache complexity will also attain good cache complexity on parallel machines with private or shared caches [4]....
[...]
...The implications for parallel performance can be captured using the results from [4], which reveal that nested parallel algorithms for which the natural sequential execution has low cache complexity will also attain good cache complexity on parallel machines with private or shared caches....
[...]
References
3,885 citations
Additional excerpts
...7] and distributed memory machines [48, 33, 12]....
[...]
2,378 citations
"Low depth cache-oblivious algorithm..." refers background in this paper
...It follows from [47] that the number of cache misses at each level under the multi-level LRU policy is within a factor of two of the number of misses for a cache half the size running the optimal replacement policy....
[...]
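The factor-of-two comparison in the excerpt above can be checked on small traces. The following is a minimal single-level sketch, not the multi-level setting of the paper: `lru_misses` and `opt_misses` are hypothetical helper names, the traces are made up for illustration, and the optimal policy is simulated with the standard farthest-next-use (Belady) rule.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of an LRU cache holding `capacity` blocks (starts empty)."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return misses


def opt_misses(trace, capacity):
    """Count misses of the offline optimal policy (evict farthest next use)."""
    n = len(trace)
    next_use = [float("inf")] * n
    last_seen = {}
    for i in range(n - 1, -1, -1):         # precompute next use of each access
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i
    cache = {}                             # block -> index of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return misses


if __name__ == "__main__":
    trace = [0, 1, 2, 3, 4] * 8            # illustrative cyclic access pattern
    print(lru_misses(trace, 4), opt_misses(trace, 2))
```

On such traces, the LRU miss count with capacity 4 should stay within twice the optimal miss count with capacity 2, which is the single-level form of the claim.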
1,688 citations
"Low depth cache-oblivious algorithm..." refers background in this paper
...A common form of programming in this model is based on nested parallelism—consisting of nested parallel loops and/or fork-join constructs [13, 26, 20, 35, 44]....
[...]
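As an illustration of the fork-join style mentioned in the excerpt above, here is a minimal sketch of a nested-parallel divide-and-conquer sum. The function name `parallel_sum` and the `cutoff` parameter are assumptions made for this example; it uses plain threads only to show the fork and join structure, and CPython's global interpreter lock means it is not expected to run faster than a sequential sum.

```python
import threading


def parallel_sum(xs, lo=0, hi=None, cutoff=1024):
    """Sum xs[lo:hi] with a recursive fork-join structure."""
    if hi is None:
        hi = len(xs)
    if hi - lo <= cutoff:
        return sum(xs[lo:hi])              # small subproblem: run sequentially
    mid = (lo + hi) // 2
    result = {}

    def left_task():
        result["left"] = parallel_sum(xs, lo, mid, cutoff)

    t = threading.Thread(target=left_task)
    t.start()                              # fork: left half runs in a child task
    right = parallel_sum(xs, mid, hi, cutoff)  # parent continues with right half
    t.join()                               # join: wait for the child to finish
    return result["left"] + right


if __name__ == "__main__":
    print(parallel_sum(list(range(10_000))))
```

The recursion depth is logarithmic in the input size, which is the sense in which such a computation has low depth relative to its linear work.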
1,577 citations
Additional excerpts
...A basic strategy for list ranking [40] is the following: (i) shrink the list to size O(n/ log n), and (ii) apply pointer jumping on this shorter list....
[...]
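Step (ii) of the strategy in the excerpt above, pointer jumping, can be sketched as follows. This is a sequential simulation of the synchronous parallel rounds, under the assumption that the list is given as a successor array with `-1` marking the tail; the name `list_rank_pointer_jumping` is hypothetical, and the shrinking step (i) is not shown.

```python
def list_rank_pointer_jumping(succ):
    """Return (rank, rounds), where rank[i] is the number of links from
    node i to the tail. succ[i] is the successor index of node i, or -1."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if s == -1 else 1 for s in succ]
    rounds = 0
    while any(s != -1 for s in nxt):
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):                 # conceptually one parallel step
            j = nxt[i]
            if j != -1:
                new_rank[i] = rank[i] + rank[j]  # absorb successor's distance
                new_nxt[i] = nxt[j]              # jump over the successor
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds


if __name__ == "__main__":
    # List order 2 -> 0 -> 3 -> 1, stored as a successor array.
    print(list_rank_pointer_jumping([3, -1, 0, 1]))
```

Each round doubles the distance every pointer spans, so the number of rounds is logarithmic in the list length, while every round touches all nodes; that extra work is what shrinking the list first is meant to reduce.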
1,515 citations