Proceedings ArticleDOI
NDMiner: accelerating graph pattern mining using near data processing
TL;DR: This paper presents NDMiner, a Near Data Processing (NDP) architecture that improves the performance of GPM workloads, along with a new graph remapping scheme in memory and a hardware-based set-operation reordering technique to optimize bank-, rank-, and channel-level parallelism in DRAM.
Abstract:
Graph Pattern Mining (GPM) algorithms mine structural patterns in graphs. The performance of GPM workloads is bottlenecked by control flow and memory stalls. This is because of data-dependent branches used in set intersection and difference operations that dominate the execution time. This paper first conducts a systematic GPM workload analysis and uncovers four new observations to inform the optimization effort. First, GPM workloads mostly fetch inputs of costly set operations from different memory banks. Second, to avoid redundant computation, modern GPM workloads employ symmetry breaking that discards several data reads, resulting in cache pollution and wasted DRAM bandwidth. Third, sparse pattern mining algorithms perform redundant memory reads and computations. Fourth, GPM workloads do not fully utilize the in-DRAM data parallelism. Based on these observations, this paper presents NDMiner, a Near Data Processing (NDP) architecture that improves the performance of GPM workloads. To reduce in-memory data transfer when fetching data from different memory banks, NDMiner integrates compute units into the buffer chip of DRAM to offload set operations. To alleviate the memory bandwidth wasted by symmetry breaking, NDMiner integrates a load elision unit in hardware that detects the satisfiability of symmetry breaking constraints and terminates unnecessary loads. To optimize the performance of sparse pattern mining, NDMiner employs compiler optimizations and maps reduced reads and composite computation to NDP hardware, improving the algorithmic efficiency of sparse GPM. Finally, NDMiner proposes a new graph remapping scheme in memory and a hardware-based set-operation reordering technique to best optimize bank-, rank-, and channel-level parallelism in DRAM. To orchestrate NDP computation, this paper presents design modifications at the host ISA, compiler, and memory controller.
We compare the performance of NDMiner with state-of-the-art software and hardware baselines using a mix of dense and sparse GPM algorithms. Our evaluation shows that NDMiner significantly outperforms software and hardware baselines by 6.4X and 2.5X, on average, while incurring a negligible area overhead on CPU and DRAM.
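The set-operation bottleneck and symmetry-breaking waste described in the abstract can be illustrated with a minimal sketch, a software triangle-counting kernel in Python over an illustrative example graph. This is not NDMiner's implementation; it only shows the neighbor-set intersections that dominate GPM execution time and the symmetry-breaking constraint (w > v > u) whose early checks NDMiner's load elision unit exploits in hardware.

```python
def count_triangles(adj):
    """Count triangles in an undirected graph.

    adj: dict mapping each vertex to the set of its neighbors.
    """
    count = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                # Symmetry breaking: this branch contributes nothing new.
                # In software, v's neighbor list may already have been
                # fetched before the check fails -- the wasted DRAM reads
                # that a hardware load elision unit can terminate early.
                continue
            # This set intersection is the costly operation that NDMiner
            # offloads to compute units in the DRAM buffer chip.
            common = adj[u] & adj[v]
            # Second symmetry-breaking constraint (w > v) so each
            # triangle {u, v, w} is counted exactly once.
            count += sum(1 for w in common if w > v)
    return count

# Illustrative graph with two triangles: {0, 1, 2} and {0, 2, 3}.
graph = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}
print(count_triangles(graph))  # prints 2
```

Larger patterns (4-cliques, motifs) nest more of these intersection and difference operations, which is why set operations and their data-dependent branches dominate GPM runtime.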
Citations
Proceedings ArticleDOI
Mint: An Accelerator For Mining Temporal Motifs
Nishil Talati, Haojie Ye, Sanketh Vedula, Kuan-Yu Chen, Yuhan Chen, Daniel Liu, Yichao Yuan, David Blaauw, Alex C. Bronstein, Trevor Mudge, Ronald G. Dreslinski, et al.
TL;DR: In this paper, the authors propose a task-centric programming model that enables decoupled, asynchronous execution by keeping task context information on-chip, and design a domain-specific hardware accelerator around its data path and memory subsystem.
Journal ArticleDOI
Software Systems Implementation and Domain-Specific Architectures towards Graph Analytics
Hai Jin, Hao Qi, Jin Zhao, Xinyu Jiang, Yu Huang, Chuangyi Gui, Qinggang Wang, Xinyang Shen, Yi Zhang, Ao Hu, Dan Chen, Chaoqiang Liu, Haifeng Liu, Haiheng He, Xiangyu Ye, Runze Wang, Jingrui Yuan, Pengcheng Yao, Yu Zhang, Long Zheng, Xiaofei Liao, et al.
TL;DR: In this article, the authors discuss the future challenges of graph analytics and present several programming models, execution modes, and messaging strategies to improve the utilization of traditional hardware and the performance of graph applications.
Proceedings Article
Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling
Zeying Zhu, Kan Wu, Zaoxing Liu, et al.
TL;DR: Arya combines graph decomposition theory with edge-sampling-based approximation to reduce the complexity of mining complex patterns on graphs with up to tens of billions of edges, a scale previously possible only on supercomputers.
Journal ArticleDOI
PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework
Jiya Su, Peng Jiang, Rujia Wang, et al.
TL;DR: PIMMiner is a high-performance, PIM-architecture-aware graph mining framework that enhances locality and internal bandwidth utilization, and reduces remote bank accesses and load imbalance, through cohesive algorithm-architecture co-design.
Proceedings ArticleDOI
Shogun: A Task Scheduling Framework for Graph Mining Accelerators
TL;DR: Shogun enables adaptive, locality-aware, out-of-order task scheduling by deploying a task tree to decouple the task-generation and execution pipeline stages; it further develops accelerator optimizations, including task-tree splitting for load balance and search-tree merging to explore multiple search trees in parallel on one PE.
References
Journal ArticleDOI
ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars
Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar, et al.
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
Journal ArticleDOI
PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory, and distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.
Proceedings ArticleDOI
A scalable processing-in-memory accelerator for parallel graph processing
TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Journal ArticleDOI
A case for intelligent RAM
David A. Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christos Kozyrakis, R. Thomas, Katherine Yelick, et al.
TL;DR: This work reviews the state of microprocessors and DRAMs today, explores some of the opportunities and challenges for IRAMs, and estimates the performance and energy efficiency of three IRAM designs.
Journal ArticleDOI
Ramulator: A Fast and Extensible DRAM Simulator
TL;DR: This paper presents Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility, and is able to provide out-of-the-box support for a wide array of DRAM standards.