A scalable processing-in-memory accelerator for parallel graph processing
Citations
17 citations
16 citations
Additional excerpts
...cessing [9], [10], and biomedical applications [11]....
[...]
16 citations
Cites background from "A scalable processing-in-memory acc..."
...Figure 5 provides an example code for regex “[1-9]\dapples”, where the character class “[1-9]” and the escape character “\d” are not specific nodes....
[...]
...Thus, the matching candidates finder checks the input stream for “a” instead of the start character “[1-9]”, and REMU searches “apples” first....
[...]
...When REMU executes line 7, it will reset the current position to the start offset (the position of “a” in the input stream) and then scans the data stream backward to match “\d[1-9]....
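The anchor-first strategy these excerpts describe can be illustrated with a minimal Python sketch. It is not REMU's implementation: the function name `find_anchor_first` and its `literal` parameter are hypothetical, and the candidates-finder stage is approximated here with `str.find`. The sketch locates the literal "apples" first, then checks the two preceding characters backward against "\d" and "[1-9]".

```python
def find_anchor_first(text, literal="apples"):
    """Return start offsets of matches for the regex [1-9]\\dapples.

    Hypothetical helper: searches for the literal suffix first, then
    verifies the preceding characters by scanning backward.
    """
    matches = []
    pos = text.find(literal)  # start offset of the literal ("a" of "apples")
    while pos != -1:
        # Backward scan from the start offset: the character immediately
        # before the literal must match \d (ASCII digits only in this sketch),
        # and the one before that must match [1-9].
        if (pos >= 2
                and text[pos - 1] in "0123456789"
                and text[pos - 2] in "123456789"):
            matches.append(pos - 2)
        pos = text.find(literal, pos + 1)  # continue with the next candidate
    return matches


if __name__ == "__main__":
    print(find_anchor_first("buy 12apples and 05apples or 7apples"))
```

Searching for the literal first means the character classes are evaluated only at positions where "apples" actually occurs, rather than at every digit in the input stream.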
[...]
...The benefits of near data processing (NDP) have been demonstrated by many researchers at different levels of system hierarchy such as in-memory computing [1, 16, 28, 60] and processing in storage [5, 19, 24, 46, 54, 55]....
[...]
...An example code for search regex “[1-9]\dapples....
[...]
16 citations
Cites background or methods from "A scalable processing-in-memory acc..."
...Many recent studies are conducted based on this device to accelerate diverse applications [2], [3], [12], [13], [22], [24], [25], [28], [32]–[34], [40], [51], [54], [60], [64], [82]....
[...]
...This has been proposed for both very fine-grain NDA operations within single cache lines [2], [3], [41], [48], [59] and NDA operations within a virtual memory page [61]....
[...]
..., [2], [3], [6]–[8], [12], [23], [25], [26], [28], [39], [43], [44], [51], [63], [77])....
[...]
...NDA execution of graph processing has also been proposed because graph processing can be bottlenecked by peak memory bandwidth because of low temporal and spatial locality [2], [3], [59], [75], [83]....
[...]
...Such fine-grain NDA operations have indeed been discussed in prior work [2], [3], [48], [59]....
[...]
16 citations
References
14,696 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
13,327 citations
5,629 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...For this purpose, we use METIS [27] to perform 512-way multi-constraint partitioning to balance the number of vertices, outgoing edges, and incoming edges of each partition, as done in a recent previous work [51]....
[...]
...This is confirmed by the observation that Tesseract with METIS spends 59% of execution time waiting for synchronization barriers....
[...]
4,019 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...We evaluate our architecture using an in-house cycle-accurate x86-64 simulator whose frontend is Pin [38]....
[...]
3,840 citations
"A scalable processing-in-memory acc..." refers to methods in this paper
...Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems....
[...]
...It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model....
[...]