A scalable processing-in-memory accelerator for parallel graph processing
Citations
84 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...In addition, we introduce a simple hardware structure that monitors the locality of data accessed by a PIM-enabled instruction at runtime to adaptively execute the instruction at the host processor (instead of in memory) when the instruction can benefit from large on-chip caches....
[...]
...Moreover, most prior approaches perform in-memory computation on noncacheable, physically addressed memory regions, which inevitably sacrifices efficiency and safety of all memory accesses from host processors to memory regions that can potentially be accessed by PIM....
[...]
77 citations
Cites background or methods from "A scalable processing-in-memory acc..."
..., [15, 14, 154, 41, 49, 59, 58, 25, 77, 24, 102, 119]) propose processing in the logic layer of 3D-stacked DRAM, which stacks DRAM layers on top of a logic layer (e....
[...]
...In contrast to Processing in Memory architectures [15, 14, 154, 49, 157, 141, 58, 43, 108, 57, 42, 41, 20, 16, 88, 139, 137, 118, 85, 115, 46, 39, 132, 133, 59, 53, 52, 24, 102, 25, 112, 26] that add extra computational logic closer to main memory, the idea behind Processing using Memory is to exploit the existing structure and organization of memory devices with minimal changes to provide additional functionality....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]