A scalable processing-in-memory accelerator for parallel graph processing
Citations
40 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...Tesseract [3] accelerates graph workloads by applying processing-in-memory, where in-order core, prefetcher, and message queue are added to each vault, and memory network is used to transfer the data between memory modules....
[...]
...In Figure 6(d), next[15] is attempted to be relocated to next[3], but fails since next[11] is already relocated to next[3]....
[...]
39 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...[29] focus on graph processing applications....
[...]
...TESSERACT [29] 2015 MM C3D S CPU F A CPU S API Y N Graph processing...
[...]
...have proposed various NMC designs and proved their potential in enhancing performance in many applications [29]–[32]....
[...]
...TESSERACT (2015) Ahn et al. [29] focus on graph processing applications....
[...]
...[29] has used a similar approach for graph processing algorithms....
[...]
38 citations
Cites background from "A scalable processing-in-memory acc..."
...Prior works propose a range of hardware-software cooperative mechanisms [4, 5, 11, 28, 36, 60, 73, 76, 80, 81, 84, 85, 95, 100] to accelerate memory-bound operations and can be applied to accelerate sparse matrix computations....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]