A scalable processing-in-memory accelerator for parallel graph processing
Citations
16 citations
16 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...In [3], the authors demonstrate that increasing computation cores is inefficient because higher performance would require bigger memory bandwidth....
[...]
...The location pointer of Vertex 4 is [2, 2], so the cells after location [2, 2] to [3, 1] are adjacent vertices of Vertex 5....
[...]
...Driven by the 3D-stacking technology in recent years, PIM is resurgent by putting logic layer into 3D stacked memories [3]....
[...]
...The next step is to attain the adjacent vertices of Vertex 5 from the coordinate [2, 2] to [3, 1] in crossbar by activating wordline No....
[...]
...To maximize the available memory bandwidth, [3] integrates PIM technology into 3D-stacked memory....
[...]
16 citations
Cites background from "A scalable processing-in-memory acc..."
...[6] propose Tesseract, a programmable PIM accelerator for large scale graph processing using 3D integration....
[...]
16 citations
16 citations
Cites background or result from "A scalable processing-in-memory acc..."
...The performance of the previous studies implementing PIM on a base die of a 3D-stacked memory [19]–[21] cannot be better than that provided by the external full memory bandwidth...
[...]
...First, the standard memory commands need to be neither blocked nor handled differently during the PIM execution; thus, at any time during the PIM computation, we can service high priority standard memory requests and naturally satisfy their performance requirement, which was not presented in the previous PIM studies [19], [21]–[23]....
[...]
...Tesseract [21] focused on the scalability of PIM memory for large-scale graph analysis [32], [33], [51]....
[...]
...the standard memory requests are assumed to be not received when the PIM operation is in progress [19], [21]–[23], [42]....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]