Proceedings Article

NDMiner: accelerating graph pattern mining using near data processing

TLDR
This paper presents NDMiner, a Near Data Processing (NDP) architecture that improves the performance of GPM workloads. It proposes a new in-memory graph remapping scheme and a hardware-based set operation reordering technique to exploit bank-, rank-, and channel-level parallelism in DRAM.
Abstract
Graph Pattern Mining (GPM) algorithms mine structural patterns in graphs. The performance of GPM workloads is bottlenecked by control flow and memory stalls. This is because of data-dependent branches used in set intersection and difference operations that dominate the execution time. This paper first conducts a systematic GPM workload analysis and uncovers four new observations to inform the optimization effort. First, GPM workloads mostly fetch inputs of costly set operations from different memory banks. Second, to avoid redundant computation, modern GPM workloads employ symmetry breaking that discards several data reads, resulting in cache pollution and wasted DRAM bandwidth. Third, sparse pattern mining algorithms perform redundant memory reads and computations. Fourth, GPM workloads do not fully utilize the in-DRAM data parallelism. Based on these observations, this paper presents NDMiner, a Near Data Processing (NDP) architecture that improves the performance of GPM workloads. To reduce in-memory data transfer of fetching data from different memory banks, NDMiner integrates compute units to offload set operations in the buffer chip of DRAM. To alleviate the wasted memory bandwidth caused by symmetry breaking, NDMiner integrates a load elision unit in hardware that detects the satisfiability of symmetry breaking constraints and terminates unnecessary loads. To optimize the performance of sparse pattern mining, NDMiner employs compiler optimizations and maps reduced reads and composite computation to NDP hardware that improves algorithmic efficiency of sparse GPM. Finally, NDMiner proposes a new graph remapping scheme in memory and a hardware-based set operation reordering technique to best optimize bank, rank, and channel-level parallelism in DRAM. To orchestrate NDP computation, this paper presents design modifications at the host ISA, compiler, and memory controller. 
We compare the performance of NDMiner with state-of-the-art software and hardware baselines using a mix of dense and sparse GPM algorithms. Our evaluation shows that NDMiner outperforms the software and hardware baselines by 6.4X and 2.5X on average, respectively, while incurring a negligible area overhead on CPU and DRAM.
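To make the abstract's first two observations concrete, the following is a minimal software sketch, not NDMiner's hardware design. It shows a merge-based set intersection whose control flow depends on the data being compared, plus a symmetry-breaking bound that stops reading both neighbor lists as soon as no further element can qualify. The function names `bounded_intersect` and `count_triangles`, the adjacency-dictionary layout, and the choice of triangle counting as the pattern are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def bounded_intersect(a: List[int], b: List[int],
                      upper: Optional[int] = None) -> List[int]:
    """Intersect two sorted id lists, keeping only ids strictly below `upper`."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        # Both lists are sorted, so once either cursor reaches `upper`
        # no later element can satisfy the constraint: stop reading.
        # This is a software analogue of terminating unnecessary loads.
        if upper is not None and (x >= upper or y >= upper):
            break
        # Data-dependent branches: which cursor advances depends on the values.
        if x == y:
            out.append(x)
            i += 1
            j += 1
        elif x < y:
            i += 1
        else:
            j += 1
    return out


def count_triangles(adj: Dict[int, List[int]]) -> int:
    """Count each triangle once by enforcing the ordering u > v > w."""
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if v >= u:
                break  # neighbor lists are sorted, so later v also violate v < u
            total += len(bounded_intersect(nbrs, adj[v], upper=v))
    return total


# Hypothetical input: a 4-clique stored as sorted adjacency lists.
k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(count_triangles(k4))  # 4
```

Without the `upper` bound, the intersection would read both lists to the end and discard the ids that violate the ordering afterwards, which is the wasted bandwidth the abstract attributes to symmetry breaking.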


Citations
Proceedings Article

Mint: An Accelerator For Mining Temporal Motifs

TL;DR: In this paper, the authors propose a task-centric programming model that enables decoupled, asynchronous execution of task context information on-chip and design a domain-specific hardware accelerator using its data path and memory subsystem design.
Journal Article

Software Systems Implementation and Domain-Specific Architectures towards Graph Analytics

TL;DR: In this article, the authors discuss the future challenges of graph analytics and present several programming models, execution modes, and messaging strategies to improve the utilization of traditional hardware and the performance of graph applications.
Proceedings Article

Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling

TL;DR: Arya combines graph decomposition theory with edge sampling-based approximation to reduce the complexity of mining complex patterns on graphs with up to tens of billions of edges, a scale that was only possible on supercomputers.
Journal Article

PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework

Jiya Su, +2 more
- 17 Jun 2023 - 
TL;DR: PIMMiner is a high-performance, PIM architecture-aware graph mining framework that enhances locality and internal bandwidth utilization and reduces remote bank accesses and load imbalance through cohesive algorithm and architecture co-design.
Proceedings Article

Shogun: A Task Scheduling Framework for Graph Mining Accelerators

TL;DR: Shogun enables adaptive, locality-aware, out-of-order task scheduling by deploying a task tree to decouple the task generation and execution pipeline stages. It further develops accelerator optimizations, including task tree splitting for load balance and search tree merging to explore multiple search trees in parallel on one PE.