A scalable processing-in-memory accelerator for parallel graph processing
Citations
46 citations
Cites background from "A scalable processing-in-memory acc..."
...Tesseract [6] is a scalable NDP accelerator for parallel graph processing....
[...]
...a promising paradigm to reduce the data movement between CPUs and memory by placing simple general-purpose processors [6, 16, 42] or application-specific accelerators [7, 16, 19, 43, 52, 111] in or close to the logic layer of 3D-stacked memory....
[...]
...efficiency when they are carefully designed with low-cost and low-overhead near data processing cores for memory-bound applications [6, 7, 16, 17, 30, 31, 33, 34, 38, 43, 52, 57, 68, 70, 72]....
[...]
46 citations
44 citations
Cites background from "A scalable processing-in-memory acc..."
...stacking technology, reopening opportunities for near-DRAM processing architectures [2], [6]–[8], [24], [63]–[70]....
[...]
...As such, the high-level processing model of the recent near-memory processing architectures was inspired by the distributed computing frameworks [2], [6]....
[...]
...nonetheless, require significant changes in target applications especially to orchestrate the communication between the host and near-memory processors [2], [5], [13], [14]....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]