A scalable processing-in-memory accelerator for parallel graph processing
Citations
4 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...For comparison purposes, we use GPGPUsim to simulate PNM architectures based on a GPGPU, Variable Warp Sizing (VWS) [41] which is currently the best branch-optimized GPGPU (for BMLAs’ branches), and SSMC (representing previous multicores without row-orientedness [11], [10], [12])....
[...]
...Further, Tesseract is not row-oriented and would incur straying similar to conventional multicores and plain SSMC....
[...]
...While processing-in-memory (PIM) has been around for decades [5], [6], [7], [8], [9], [10], [11], [12], [13], there have been three problems....
[...]
...While Tesseract [12] targets graph workloads via MIMD and inter-core communication, such workloads are not row-dense or compact....
[...]
4 citations
Cites background from "A scalable processing-in-memory acc..."
...In addition, new emerging data-intensive applications further increase memory traffic [4, 5, 47]....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]