An introduction to parallel algorithms
Citations
49 citations
48 citations
Cites background from "An introduction to parallel algorit..."
...Multiple updates to a label are resolved as in the parallel random access machine models (such as by using a reduction operation to combine the updates into a single update).(13) Each round thus consists of updating the graph based on values from the previous round, performing activities, and sending values to the next round....
[...]
48 citations
48 citations
Cites methods from "An introduction to parallel algorit..."
...– Step 4 requires O(log n) time using a parallel maximum-finding algorithm [10]: For 1 ≤ i ≤ k in parallel compute the maximum element of ∆i using m/log n processors....
[...]
...– Step 5 needs no more than O(log n) time using a prefix summation algorithm [10]: For 1 ≤ i ≤ k in parallel compute the prefix sums and suffix sums of Ri using n processors; – Step 6 can be done in O(√k log k) = O(k) time: The while-loop executes at most √(2k) times as shown in the analysis of the sequential algorithm....
[...]
...– Step 2 can be completed in O(n) time: For every edge e′ = (u, v) ∈ E − E(MSTG) in parallel (processor Pi) assign e′ to every e ∈ E(MSTG) on path P(u, v) (at cell R(e)[i]) by Euler tour technique [10] in O(log n) time as |P(u, v)| ≤ n − 1 (m ≤ n(n − 1)/2 processors in total)....
[...]
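The prefix summation step quoted above can be illustrated with a small simulation. The following is a minimal Python sketch, assuming a doubling-stride scan in which each synchronous round stands in for one parallel step on n processors; the function name `prefix_sums_rounds` is hypothetical, and this is not necessarily the algorithm used in [10] or in the citing paper.

```python
def prefix_sums_rounds(values):
    """Inclusive prefix sums computed in doubling-stride rounds.

    Each pass of the while-loop models one synchronous parallel step:
    every position i >= stride reads the previous round's values and
    adds the element `stride` positions to its left.
    Returns the prefix sums and the number of rounds used.
    """
    sums = list(values)
    n = len(sums)
    rounds = 0
    stride = 1
    while stride < n:
        prev = sums[:]  # snapshot: all reads see the previous round
        for i in range(stride, n):  # conceptually done in parallel
            sums[i] = prev[i] + prev[i - stride]
        stride *= 2
        rounds += 1
    return sums, rounds


result, rounds = prefix_sums_rounds([1, 2, 3, 4, 5])
print(result, rounds)  # [1, 3, 6, 10, 15] 3
```

For n = 5 the loop runs for strides 1, 2, and 4, so the number of rounds grows as ⌈log₂ n⌉, which is where an O(log n) time bound with n processors comes from. Suffix sums follow by running the same routine on the reversed input.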
47 citations
Cites background or methods from "An introduction to parallel algorit..."
...IMD architectures and a CPU-based implementation for larger datasets, where the GPU memory does not suffice. The running time of our method is O(n + k³) using nk cores in the parallel computation model [16], where n is the number of nodes and k is the number of communities. Since k ≪ n, we have a linear running time in the number of nodes, which makes it scalable in the context of extremely large network...
[...]
...Lemma 6 (JáJá, 1992) Addition of s numbers in serial takes O(s) time; with Ω(s/ log s) cores, this can be improved to O(log s) time in the best case....
[...]
...Lemma 7 (JáJá, 1992) Consider M ∈ Rp×q and N ∈ Rq×r with s non-zeros per row/column....
[...]
... and the STGD column denotes the per-iteration complexity. in Table 2. The theoretical asymptotic complexity (Table 3) of our method is best addressed by considering the parallel model of computation [16]. This is justified considering that we implement our method on GPUs and matrix products are embarrassingly parallel. Memory Issues: The main bottleneck for our implementation is device storage, since ...
[...]
...The theoretical asymptotic complexity of our method is summarized in Table 2 and is best addressed by considering the parallel model of computation (JáJá, 1992), i.e., wherein a number of processors or compute cores are operating on the data simultaneously in parallel....
[...]
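The logarithmic-time addition stated in Lemma 6 above (and the maximum-finding step quoted earlier) rests on combining values pairwise along a balanced binary tree. The following is a minimal Python sketch under that assumption; `tree_reduce` is a hypothetical name, and each pass of the loop stands in for one parallel round.

```python
import operator


def tree_reduce(values, op=operator.add):
    """Combine values pairwise, level by level, as in a balanced binary tree.

    Each pass of the while-loop models one parallel round in which
    disjoint adjacent pairs are combined simultaneously.
    Returns the combined value and the number of rounds used.
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    rounds = 0
    while len(level) > 1:
        # Combine elements (0,1), (2,3), ...; conceptually done in parallel.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last element carries over unchanged
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


print(tree_reduce(range(1, 9)))          # (36, 3)
print(tree_reduce([3, 9, 2, 7, 5], max))  # (9, 3)
```

This sketch uses up to s/2 simultaneous combinations in the first round. Reaching the same O(log s) time with only about s/log s cores additionally requires each core to first sum a block of roughly log s elements sequentially, which the sketch does not model.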
References
2,895 citations
"An introduction to parallel algorit..." refers background in this paper
...Multiprocessor-based computers have been around for decades and various types of computer architectures [2] have been implemented in hardware throughout the years with different types of advantages/performance gains depending on the application....
[...]
...Every location in the array represents a node of the tree: T [1] is the root, with children at T [2] and T [3]....
[...]
...The text by [2] is a good start as it contains a comprehensive description of algorithms and different architecture topologies for the network model (tree, hypercube, mesh, and butterfly)....
[...]
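The array layout in the excerpt above (T[1] as the root, with children at T[2] and T[3]) is commonly generalized so that the node at index i has its children at indices 2i and 2i + 1. The following is a minimal Python sketch assuming that standard 1-indexed convention; the helper names are hypothetical and not taken from the cited paper.

```python
def children(i):
    """Indices of the left and right child of node i (1-indexed layout)."""
    return 2 * i, 2 * i + 1


def parent(i):
    """Index of the parent of node i; the root (i == 1) has no parent."""
    if i <= 1:
        raise ValueError("the root has no parent")
    return i // 2


# A small tree stored in a list, leaving slot 0 unused so that T[1] is the root.
T = [None, "root", "left", "right", "left.left", "left.right"]
left, right = children(1)
print(T[left], T[right])  # left right
print(T[parent(5)])       # left
```

Because the links are implicit in the indices, no pointers need to be stored, and the processor handling node i can locate its neighbours in constant time.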
1,410 citations
"An introduction to parallel algorit..." refers background in this paper
...Parallel architectures have been described in several books (see, for example, [18, 29])....
[...]
1,000 citations
"An introduction to parallel algorit..." refers background in this paper
...Recent work on the mapping of PRAM algorithms on bounded-degree networks is described in [3, 13, 14, 20, 25]. Our presentation on the communication complexity of the matrix-multiplication problem in the shared-memory model is taken from [1]. Data-parallel algorithms are described in [15]....
[...]
951 citations
"An introduction to parallel algorit..." refers background in this paper
...Rigorous descriptions of shared-memory models were introduced later in [11,12]....
[...]
864 citations
"An introduction to parallel algorit..." refers methods in this paper
...The WT scheduling principle is derived from a theorem in [7]. In the literature, this principle is commonly referred to as Brent's theorem or Brent's scheduling principle....
[...]
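For reference, the bound usually associated with the WT scheduling principle can be written as below. The symbols are assumptions introduced here: W(n) is the total number of operations (work), T(n) is the parallel running time with unlimited processors, and p is the number of processors actually available. This is the standard textbook form, given as a sketch and not as a quotation from [7].

```latex
T_p(n) = O\!\left(\frac{W(n)}{p} + T(n)\right)
```

For example, summing n numbers with W(n) = O(n) and T(n) = O(log n) on p = n/log n processors gives O(log n) time overall.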