Proceedings ArticleDOI
Application-transparent near-memory processing architecture with memory channel network
Mohammad Alian, Seungwon Min, Hadi Asghari-Moghaddam, Ashutosh Dhar, Dong Kai Wang, Thomas Roewer, Adam J. McPadden, Oliver O'Halloran, Deming Chen, Jinjun Xiong, Daehoon Kim, Wen-mei W. Hwu, Nam Sung Kim
pp. 802–814
TL;DR
Memory Channel Network (MCN) can serve as an application-transparent framework that seamlessly unifies near-memory processing within a server and distributed computing across such servers for data-intensive applications.
Abstract
The physical memory capacity of servers is expected to increase drastically with the deployment of forthcoming non-volatile memory technologies, a welcome improvement for emerging data-intensive applications. For such servers to be cost-effective, however, we must also cost-effectively increase compute throughput and memory bandwidth commensurate with the increase in memory capacity, without compromising application readiness. Tackling this challenge, we present the Memory Channel Network (MCN) architecture. First, we propose the MCN DIMM, an extension of a buffered DIMM in which a small but capable processor, called the MCN processor, is integrated with a buffer device on the DIMM for near-memory processing. Second, we implement device drivers that give the host and MCN processors in a server the illusion that they are independent heterogeneous nodes connected through an Ethernet link. This allows the host and MCN processors in a server to run a given data-intensive application together using popular distributed computing frameworks such as MPI and Spark, without any change to the host processor hardware or its application software, while offering the benefits of high-bandwidth, low-latency communication between the host and MCN processors over the memory channels. As such, MCN can serve as an application-transparent framework that seamlessly unifies near-memory processing within a server and distributed computing across such servers for data-intensive applications. Our simulation running the full software stack shows that a server with 8 MCN DIMMs offers 4.56× higher throughput and consumes 47.5% less energy than a cluster with 9 conventional nodes connected through Ethernet links, as it facilitates up to 8.17× higher aggregate DRAM bandwidth utilization. Lastly, we demonstrate the feasibility of MCN with an IBM POWER8 system and an experimental buffered DIMM.
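The abstract's key mechanism is that device drivers make a region of DIMM memory behave like an Ethernet link between the host and the MCN processor. The toy sketch below illustrates that idea only conceptually; it is not the authors' driver code. The ring layout, the names `host_driver` and `mcn_driver`, and the use of two threads sharing one buffer are all illustrative assumptions standing in for two processors sharing DIMM memory.

```python
# Conceptual sketch: a polling "virtual NIC" whose wire is shared memory.
# The host-side driver writes frames into a ring buffer; the MCN-processor
# side polls the ring and delivers the frames, as over a network link.
import threading
import time

RING_SLOTS = 4    # frame slots in the shared "NIC" ring (toy size)
SLOT_SIZE = 64    # bytes per slot in this toy model

ring = bytearray(RING_SLOTS * SLOT_SIZE)  # stands in for DIMM memory
head = 0  # next slot the host driver writes (producer index)
tail = 0  # next slot the MCN driver reads (consumer index)

def host_driver(frames):
    """Host NIC driver: enqueue outgoing frames into the memory ring."""
    global head
    for payload in frames:
        while (head + 1) % RING_SLOTS == tail:
            time.sleep(0)  # ring full: yield, as a polling driver would
        base = head * SLOT_SIZE
        data = payload.encode()[:SLOT_SIZE]
        ring[base:base + SLOT_SIZE] = data.ljust(SLOT_SIZE, b"\x00")
        head = (head + 1) % RING_SLOTS  # publish the frame

def mcn_driver(n_frames, received):
    """MCN-processor driver: poll the ring and deliver incoming frames."""
    global tail
    for _ in range(n_frames):
        while tail == head:
            time.sleep(0)  # ring empty: keep polling
        base = tail * SLOT_SIZE
        frame = bytes(ring[base:base + SLOT_SIZE]).rstrip(b"\x00")
        received.append(frame.decode())
        tail = (tail + 1) % RING_SLOTS  # free the slot

received = []
frames = ["SYN", "payload-0", "payload-1", "FIN"]
consumer = threading.Thread(target=mcn_driver, args=(len(frames), received))
consumer.start()
host_driver(frames)
consumer.join()
print(received)  # frames arrive in order, as over a point-to-point link
```

In the real design, each side's driver registers as a standard network interface, so unmodified TCP/IP-based stacks such as MPI and Spark run on top of it unchanged; that application transparency is the point of the architecture.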
Citations
Proceedings ArticleDOI
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
TL;DR: In this article, the authors present a vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-memory processing cores tailored for DL tensor operations.
Journal ArticleDOI
Processing-in-memory: A workload-driven perspective
TL;DR: This article describes the work on systematically identifying opportunities for PIM in real applications and quantifies potential gains for popular emerging applications (e.g., machine learning, data analytics, genome analysis) and describes challenges that remain for the widespread adoption of PIM.
Proceedings ArticleDOI
Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product
Sukhan Lee, Shin-haeng Kang, Jae-Hoon Lee, Hyeon-Su Kim, Eojin Lee, Seung-Woo Seo, Hosang Yoon, Seung-Won Lee, Kyoung-Hwan Lim, Hyun-Sung Shin, Jin-Hyun Kim, Seongil O, Anand Iyer, David T. Wang, Kyomin Sohn, Nam Sung Kim
TL;DR: The authors propose an innovative yet practical processing-in-memory (PIM) architecture based on commercial DRAM technology, which improves the performance of memory-bound neural network kernels and applications by 11.2× and 3.5×, respectively.
Proceedings ArticleDOI
NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling
Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gómez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal
TL;DR: NERO, an FPGA+HBM-based accelerator connected to an IBM POWER9 host system through IBM CAPI2 (Coherent Accelerator Processor Interface), is developed; the authors conclude that near-memory acceleration is a promising means of achieving both high performance and high energy efficiency for weather prediction modeling.
References
Proceedings ArticleDOI
The Hadoop Distributed File System
TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.
Proceedings Article
Spark: cluster computing with working sets
TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Journal ArticleDOI
The gem5 simulator
Nathan Binkert, Bradford M. Beckmann, Gabriel Black, Steven K. Reinhardt, Ali G. Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, David A. Wood
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Proceedings ArticleDOI
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures
TL;DR: Combining the power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Journal ArticleDOI
The NAS Parallel Benchmarks
David H. Bailey, Eric Barszcz, John T. Barton, D. S. Browning, Russell Carter, Leonardo Dagum, Rod Fatoohi, Paul O. Frederickson, T. A. Lasinski, Robert Schreiber, Horst D. Simon, V. Venkatakrishnan, Sisira Weeratunga
TL;DR: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers; the benchmarks mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications.