A scalable processing-in-memory accelerator for parallel graph processing
Citations
4 citations
Cites background from "A scalable processing-in-memory acc..."
...PIM has been described in several hardware architecture papers [2, 3, 12, 14, 18, 19, 28, 32] which have explored various parameters in the design space....
[...]
4 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...Tesseract is an application-specific architecture but TUPIM is universal....
[...]
...Therefore, we choose several typical benchmarks of scale-out applications from GraphBIG [22] used in [1, 5, 9, 11, 12], and implement them in C++....
[...]
...According to the prior works [1, 5, 7, 9, 11, 12], PFIs should meet the following three requirements: 1) The selection of PFIs should minimize unnecessary intermediate data movements (a....
[...]
...For example, the performance and energy improvements of Tesseract [1] on some graph processing applications are obtained by rewriting applications using the dedicated programming model for graph processing, which is not applicable to other PIM architectures or applications....
[...]
...Tesseract heavily needs code rewriting and re-compiling but TUPIM does not....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]