A scalable processing-in-memory accelerator for parallel graph processing
Citations
5 citations
Cites background from "A scalable processing-in-memory acc..."
...There are also many studies that explore emerging processing-in-memory architectures to accelerate graph processing [1, 46, 59]....
[...]
...Therefore, there have been some research efforts aiming to reduce the performance impact of data conflicts in on-chip BRAM during graph processing [1, 11, 21, 46, 79]: some [1, 46] focus on reducing the inherent overheads in providing atomic data accesses, while others [11, 21, 79] aim to reduce the number and frequency of data conflicts in BRAM....
[...]
...four residue classes: [0] = {0, 4}, [1] = {1, 5}, [2] = {2, 6}, and [3] = {3, 7}....
[...]
..., reducing the number of conflicts incurred [11, 79], alleviating the atomicity overhead involved [1, 46], and employing a parallel conflict management scheme [72]....
[...]
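The residue-class excerpt above lists four classes over the integers 0 through 7. A minimal sketch of that grouping follows, assuming the classes are taken modulo 4 over the range 0..7; the excerpt itself states neither the modulus nor the range, and the helper name `residue_classes` is hypothetical.

```python
from collections import defaultdict


def residue_classes(values, modulus):
    """Group integers by their remainder modulo `modulus` (hypothetical helper)."""
    classes = defaultdict(list)
    for v in values:
        classes[v % modulus].append(v)
    return dict(classes)


# Assumption: modulus 4 over 0..7, inferred from the classes listed in the excerpt.
classes = residue_classes(range(8), 4)
for r in sorted(classes):
    print(f"[{r}] = {classes[r]}")
```

Under these assumptions, each class collects the integers that share a remainder, which matches the sets quoted in the excerpt.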
5 citations
Cites background from "A scalable processing-in-memory acc..."
..., NDP) in storage level [3] or processing-in-memory in memory level [12][19], respectively....
[...]
5 citations
Cites background from "A scalable processing-in-memory acc..."
..., [1, 2]), or with the non-volatile memory (e....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]