A scalable processing-in-memory accelerator for parallel graph processing
Citations
10 citations
Cites background from "A scalable processing-in-memory acc..."
...This potentially increases the number of accesses to the main memory, which is around 1000x more costly than floating-point operations [1, 12]....
[...]
...However, both issues can potentially be solved: previous works [1, 12] have already proposed solutions for such algorithms targeting improvements on locality....
[...]
10 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...The first major innovation is 3D-stacked memory [5, 37-40]....
[...]
...In fact, unlike PIM logic that is added to server or desktop environments, consumer devices may not be able to afford the addition of full-blown general-purpose PIM cores [22-24, 68, 120], GPU PIM cores [75, 85, 90], or complex PIM accelerators [5, 62, 119] to 3D-stacked memory....
[...]
...Today, the total cost of computation, in terms of performance and in terms of energy, is dominated by the cost of data movement for modern data-intensive workloads such as machine learning and data analytics [5, 15, 16, 21-25]....
[...]
...To solve this second challenge, we develop a series of interfaces and mechanisms that are designed specifically to allow programmers to use PIM in a way that preserves conventional programming models [5, 16-24, 62, 75]....
[...]
...Innovations such as (1) 3D-stacked memory dies that combine a logic layer with DRAM layers [5, 37-40], (2) the ability to perform logic operations using memory cells themselves inside a memory chip [18, 20, 41-49], and (3) the emergence of potentially more computation-friendly resistive memory technologies [50-61] provide new opportunities to embed general-purpose computation directly within the memory [5, 16-19, 21, 22, 24, 25, 41-43, 47-49, 62-100]....
[...]
References
14,696 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]