A scalable processing-in-memory accelerator for parallel graph processing
Citations
Cites background or methods from "A scalable processing-in-memory acc..."
..., HMC [14, 32, 73, 81] and ReRAM [70, 82] as described previously....
[...]
...As a consequence, the appropriate graph partition methods are required and are important to reduce the communication overhead [28, 32, 81]....
[...]
...Tesseract [32] integrates common instructions of graph algorithms and achieves high performance through multiple HMCs....
[...]
...For example, Tesseract [32] distributes the graphs to multiple vaults on HMCs to process in parallel....
[...]
...Evaluation on these accelerators has also demonstrated the efficiency and effectiveness of DSA design [16, 28, 32]....
[...]
Additional excerpts
..., in or near memory structures), as described in detail in [4-6, 8, 38] and exemplified by [7-12, 14, 19, 20, 24, 27, 30, 34, 84, 108-113]....
[...]
Cites background from "A scalable processing-in-memory acc..."
...As both systolic arrays and graph processing are heavily investigated techniques in literature [20], [21], [19], [22], [23], [24], we omit details of implementation for the sake of brevity....
[...]
Cites background from "A scalable processing-in-memory acc..."
...There is prior work on graph processing hardware accelerators [4, 48, 124, 126], but we believe there is still room for improvement....
[...]
Cites background from "A scalable processing-in-memory acc..."
...Processing-in-memory (PIM) techniques have been proven promising in solving the computational and memory challenges [2]–[6]....
[...]
References
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
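The three balance constraints named above (vertices, outgoing edges, incoming edges per partition) can be made concrete with a small measurement sketch. It does not reimplement METIS; it only scores a vertex-to-partition assignment that has already been produced, and also counts cut edges as a rough proxy for inter-partition communication. The function names `partition_loads`, `imbalance`, and `cut_edges` are hypothetical and chosen for illustration only.

```python
# Hypothetical helpers: score an existing vertex-to-partition assignment
# against the three balance constraints named in the excerpt. This does NOT
# implement METIS; it only measures a partition produced by some other tool.

def partition_loads(edges, part, n_parts):
    """Return per-partition counts of vertices, outgoing edges and incoming edges.

    edges:   iterable of (src, dst) pairs of a directed graph
    part:    dict mapping each vertex to a partition id in range(n_parts)
    """
    vertices = [0] * n_parts
    out_edges = [0] * n_parts
    in_edges = [0] * n_parts
    for v, p in part.items():
        vertices[p] += 1
    for src, dst in edges:
        out_edges[part[src]] += 1  # outgoing edge charged to the source's partition
        in_edges[part[dst]] += 1   # incoming edge charged to the destination's partition
    return vertices, out_edges, in_edges


def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def cut_edges(edges, part):
    """Count edges whose endpoints are in different partitions.

    In this simplified model each such edge stands for one remote
    (inter-partition) communication.
    """
    return sum(1 for src, dst in edges if part[src] != part[dst])


if __name__ == "__main__":
    # A directed 4-cycle split into two partitions of two vertices each.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    part = {0: 0, 1: 0, 2: 1, 3: 1}
    loads = partition_loads(edges, part, n_parts=2)
    print([imbalance(l) for l in loads])  # [1.0, 1.0, 1.0]
    print(cut_edges(edges, part))         # 2
```

Balancing all three counts at once is what makes the problem "multi-constraint": a partition can hold an equal share of vertices yet still receive far more incoming edges than its peers.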
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
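Why uneven work shows up as barrier waiting can be illustrated with a simplified bulk-synchronous model, which is an assumption here and not Tesseract's measured behavior: each superstep ends at a global barrier, so it lasts as long as the slowest partition, and every faster partition idles until the barrier releases. The function `barrier_wait_fraction` is hypothetical, and the sketch does not attempt to reproduce the 59% figure.

```python
# Simplified bulk-synchronous model (an assumption, not a Tesseract measurement):
# each superstep ends at a global barrier, so it lasts as long as the slowest
# partition, and faster partitions sit idle until the barrier releases.

def barrier_wait_fraction(supersteps):
    """Fraction of total partition-time spent idle at barriers.

    supersteps: list of supersteps, each a list of per-partition busy times.
    """
    total = 0.0
    idle = 0.0
    for busy in supersteps:
        span = max(busy)                     # superstep length = slowest partition
        total += span * len(busy)            # time all partitions are occupied
        idle += sum(span - t for t in busy)  # time each partition waits at the barrier
    return idle / total if total else 0.0


if __name__ == "__main__":
    # One partition has 10 units of work; the others have far less.
    print(barrier_wait_fraction([[10, 4, 4, 2]]))  # 0.5
```

In this model, a partition that is balanced in vertex and edge counts can still spend much of its time waiting whenever the actual per-superstep work is uneven.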
[...]
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
"A scalable processing-in-memory acc..." refers to methods in this paper
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]