A scalable processing-in-memory accelerator for parallel graph processing
Citations
10 citations
Cites methods from "A scalable processing-in-memory acc..."
...Our experiments employ 17 benchmarks from a wide range of application based on prior works [8], [11], [16], [19], [34], as summarized in Table IV....
[...]
10 citations
Cites background from "A scalable processing-in-memory acc..."
...In [1, 7, 8], the proposed solutions require the programmers to identify the code that should be run near to memory....
[...]
...Several PIM architectures and programming models have been proposed by academic projects [1, 2, 11, 38] and multiple memory vendors are starting to adopt 3D die stacking in mass-produced memory....
[...]
...+ c[1] * (grid_a[i-1][j][k] + grid_a[i+1][j][k]...
[...]
10 citations
Cites methods from "A scalable processing-in-memory acc..."
...In [14], a special-purpose graph-processing architecture is presented, utilizing the logic layer of stacked memories to achieve very high performance....
[...]
9 citations
Cites background from "A scalable processing-in-memory acc..."
...On the hardware side, domain-specific accelerators dedicated towards graph processing has also been proposed [10], [11] to achieve high performance....
[...]
...[10] purposes to alleviate conventional concept of processing-in-memory (PIM) to design a programmable PIM accelerator that can achieve memory-capacity-proportional performance for large scale graph processing....
[...]
9 citations
References
14,696 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]