A scalable processing-in-memory accelerator for parallel graph processing
Citations
1,197 citations
Cites background from "A scalable processing-in-memory acc..."
...Recent efforts [2], [3], [4], [5], [58] decouple logic and memory designs in different dies, adopting 3D stacked memories with a logic layer that encapsulates processing units to perform computation, as shown in Figure 3(b)....
[...]
...Also, while previous work focused on database and graph processing applications [3], [5], PRIME aims at accelerating NN applications....
[...]
...introduce promising solutions to the challenges [2], [3], [4], [5], by leveraging 3D memory technologies [6] to integrate computation logic with the memory....
[...]
633 citations
Cites background from "A scalable processing-in-memory acc..."
...Recent efforts [57, 58] decouple logic and memory designs in different dies, adopting 3D stacked memories with a logic layer that encapsulates processing units to perform computation....
[...]
453 citations
444 citations
Cites background from "A scalable processing-in-memory acc..."
..., [16, 17, 24, 37, 42, 47, 48, 115]) propose processing in the logic layer of 3D-stacked DRAM, which stacks DRAM layers on top of a logic layer (e....
[...]
..., [16, 17, 24, 25, 37, 42, 47, 48, 80, 89, 115]) propose mechanisms to perform computation in the logic layer of 3D-stacked memory architectures....
[...]
415 citations
References
558 citations
543 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...On the other hand, a 64-bit vertical interface for each DRAM partition (or vault, see Section 3.1 for details), 32 vaults per cube, and 2 Gb/s of TSV signaling rate [24] together achieve an internal memory bandwidth of 512 GB/s per cube....
[...]
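The 512 GB/s figure in the excerpt above is the product of the three quoted parameters, assuming the 2 Gb/s TSV signaling rate applies to each bit of the 64-bit vertical interface. A minimal sketch of that arithmetic follows; the function and parameter names are hypothetical and chosen only for illustration.

```python
def internal_bandwidth_gbytes_per_s(interface_bits, vaults, signaling_gbits_per_s):
    """Aggregate internal bandwidth of one memory cube in GB/s.

    Assumes every bit of each vault's vertical interface signals at the
    given rate and that all vaults transfer concurrently.
    """
    total_gbits_per_s = interface_bits * vaults * signaling_gbits_per_s
    return total_gbits_per_s / 8  # 8 bits per byte


# Values quoted in the excerpt: 64-bit interface, 32 vaults, 2 Gb/s per bit.
bandwidth = internal_bandwidth_gbytes_per_s(64, 32, 2)
print(bandwidth)  # 512.0
```

That is, 64 bits x 32 vaults x 2 Gb/s = 4096 Gb/s, which is 512 GB/s per cube.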
541 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
541 citations
"A scalable processing-in-memory acc..." refers to background or methods in this paper
...Although the use of multiple GPGPUs alleviates this problem to some extent, relatively low bandwidth and high latency of PCIe-based interconnect may not be sufficient for fast graph processing, which generates a massive amount of random memory accesses across the entire graph [40]....
[...]
...Some works use GPGPUs to accelerate graph processing [15, 18, 19, 40]....
[...]
504 citations