A scalable processing-in-memory accelerator for parallel graph processing
Citations
8 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...SnipSnap frees snapshotting from the shackles of the consistency-performance tradeoff by leveraging two related hardware trends—the emergence of high-bandwidth DRAM placed on the same package as the CPU [15, 41, 60, 61], and the resurgence of near-memory processing [6, 7, 44]....
[...]
...Consequently, near-memory processing logic for machine learning, graph processing, and general-purpose processing has been proposed [6, 7, 44] for better system performance and energy....
[...]
8 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...We use typical PIM applications which are also used in [3-7, 11-12] to evaluate CuckooPIM, as listed in Table II....
[...]
...The ever-growing processing ability of PIM cores even brings more pressure on the design of PIM coherence since more and more instructions can be offloaded to PIM cores [3-8, 11-12]....
[...]
8 citations
Cites methods from "A scalable processing-in-memory acc..."
...[15] Tesseract Without prefetching: 9× DDR3-OoO, HMC-OoO,...
[...]
...To mitigate this, a programmable accelerator (called Tesseract) having 3D stacked memory has been designed with PIM technology [15]....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]