A scalable processing-in-memory accelerator for parallel graph processing
Citations
28 citations
Cites background from "A scalable processing-in-memory acc..."
...Many prior works [2, 27, 67, 101, 136] propose logic layers with various compute capabilities to minimize data movement to the CPU core....
[...]
28 citations
27 citations
Cites background from "A scalable processing-in-memory acc..."
...ate data-intensive applications in some existing work [1, 4]....
[...]
...Fixed-function PIMs offer simple computing/logic functions and are accessed through assembly-level intrinsics or simple library calls [1, 32]....
[...]
...Previous work shows that neither type of PIMs is an obvious performance winner, given the variety of applications that are likely to benefit from PIMs [19, 1, 2, 32, 51, 33, 65, 20, 4, 13, 65, 33]....
[...]
...Recent studies [19, 1, 2, 32, 51, 33, 65, 20, 4, 13, 65, 33] demonstrate that such integration technologies are likely to enable PIM in a practical manner....
[...]
27 citations
Cites background from "A scalable processing-in-memory acc..."
...Newer 3D-stacked memory packages [7, 8, 24, 60] enable higher bandwidth by increasing the memory interface width....
[...]
27 citations
Cites background from "A scalable processing-in-memory acc..."
...Moreover, contrary to static graph processing, little research exists into accelerating streaming graph processing using hardware acceleration such as FPGAs [30], [41], [66], high-performance networking hardware and associated abstractions [72], [35], [31], [202], [32], [97], low-cost atomics [183], [203], hardware transactions [34], and others [31], [9]....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]