A scalable processing-in-memory accelerator for parallel graph processing
Citations
3 citations
Cites background from "A scalable processing-in-memory acc..."
...Over the past few years, there have been numerous works targeting moving processing to in or near memories like DRAM ([54], [55], [33]), 3D-stacked HBM [9] or HMC [51] ([29], [64], [32], [46], [42], [30]) and even SSDs ([48])....
2 citations
Cites background from "A scalable processing-in-memory acc..."
...A spectrum of near-data processing (NDP) work [1,3,6,10,17,16,24,18,27,29,30,24] have been proposed recently....
2 citations
2 citations
Cites background from "A scalable processing-in-memory acc..."
...Tesseract [12] places small computing units on the logic die of 3D stacked memories, and multiple blocks work together as an accelerator for graph processing....
2 citations
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....