A scalable processing-in-memory accelerator for parallel graph processing
Citations
3 citations
Cites methods from "A scalable processing-in-memory acc..."
...Ahn et al. [8] tuned a PIM accelerator for large-scale graph processing called Tesseract....
[...]
...Apart from the advancements in designing high-performance many-core processors, progress in tightly stacking logic and memory elements in a package has enabled researchers to redesign the conventional Processing-In-Memory (PIM) concept for in-memory big-data processing [8]....
[...]
...Given the specialized memory access characteristics of graph processing workloads, new hardware prefetchers are used in Tesseract to improve the utilization of memory bandwidth....
[...]
...[Figure residue: % of clean victim blocks, binned by number of re-references: [1, 8), [8, 16), [16, 32), [32, 64), > 64]...
[...]
3 citations
Additional excerpts
...computation [2, 5, 6, 13, 20, 25, 32, 59, 60, 71, 78] continues....
[...]
3 citations
Cites background from "A scalable processing-in-memory acc..."
...The exploitation of the in-memory bandwidth available on 3D-stacked memory devices is presented in [18], in which processing-in-memory is used for parallel processing of big-data graphs....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]