A scalable processing-in-memory accelerator for parallel graph processing
Citations
1,197 citations
Cites background from "A scalable processing-in-memory acc..."
...Recent efforts [2], [3], [4], [5], [58] decouple logic and memory designs in different dies, adopting 3D stacked memories with a logic layer that encapsulates processing units to perform computation, as shown in Figure 3(b)....
[...]
...Also, while previous work focused on database and graph processing applications [3], [5], PRIME aims at accelerating NN applications....
[...]
...introduce promising solutions to the challenges [2], [3], [4], [5], by leveraging 3D memory technologies [6] to integrate computation logic with the memory....
[...]
633 citations
Cites background from "A scalable processing-in-memory acc..."
...Recent efforts [57, 58] decouple logic and memory designs in different dies, adopting 3D stacked memories with a logic layer that encapsulates processing units to perform computation....
[...]
453 citations
444 citations
Cites background from "A scalable processing-in-memory acc..."
..., [16, 17, 24, 37, 42, 47, 48, 115]) propose processing in the logic layer of 3D-stacked DRAM, which stacks DRAM layers on top of a logic layer (e....
[...]
..., [16, 17, 24, 25, 37, 42, 47, 48, 80, 89, 115]) propose mechanisms to perform computation in the logic layer of 3D-stacked memory architectures....
[...]
415 citations
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]