PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture

doi:10.1145/2749469.2750385

Proceedings Article

Junwhan Ahn et al.

Vol. 43, Iss. 3, pp. 336-348

TLDR

The authors propose a new PIM architecture that leaves existing sequential programming models unchanged and automatically decides whether to execute each PIM operation in memory or on the host processor, depending on the locality of the data it accesses.

Abstract:

Processing-in-memory (PIM) is rapidly rising as a viable solution for the memory wall crisis, rebounding from its unsuccessful attempts in the 1990s due to practicality concerns, which are alleviated with recent advances in 3D stacking technologies. However, it is still challenging to integrate PIM architectures with existing systems in a seamless manner due to two common characteristics: unconventional programming models for in-memory computation units and lack of ability to utilize large on-chip caches. In this paper, we propose a new PIM architecture that (1) does not change the existing sequential programming models and (2) automatically decides whether to execute PIM operations in memory or in processors depending on the locality of data. The key idea is to implement simple in-memory computation using compute-capable memory commands and use specialized instructions, which we call PIM-enabled instructions, to invoke in-memory computation. This allows PIM operations to be interoperable with existing programming models, cache coherence protocols, and virtual memory mechanisms with no modification. In addition, we introduce a simple hardware structure that monitors the locality of data accessed by a PIM-enabled instruction at runtime to adaptively execute the instruction at the host processor (instead of in memory) when the instruction can benefit from large on-chip caches. Consequently, our architecture provides the illusion that PIM operations are executed as if they were host processor instructions. We provide a case study of how ten emerging data-intensive workloads can benefit from our new PIM abstraction and its hardware implementation. Evaluations show that our architecture significantly improves system performance and, more importantly, combines the best parts of conventional and PIM architectures by adapting to data locality of applications.
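
The locality-aware dispatch described in the abstract can be illustrated with a small software simulation. This is a minimal sketch, not the paper's hardware design. It assumes a small set-associative tag array with LRU replacement as a stand-in for the locality monitor. A hit in that tag array is read as a sign that the target block is likely cached on-chip, so the operation runs on the host. A miss sends the operation to memory. The names `LocalityMonitor`, `PeiDispatcher`, and `pim_add`, as well as the set, way, and block sizes, are hypothetical.

```python
from collections import OrderedDict


class LocalityMonitor:
    """Hypothetical locality monitor: a set-associative tag array with LRU replacement.

    Assumption for illustration only; sizes and policy are not taken from the paper.
    """

    def __init__(self, num_sets=4, ways=2, block_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _index_and_tag(self, addr):
        block = addr // self.block_size
        return block % self.num_sets, block // self.num_sets

    def access(self, addr):
        """Record an access to addr's block; return True if the block was already tracked."""
        index, tag = self._index_and_tag(addr)
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used tag
        entries[tag] = True
        return False


class PeiDispatcher:
    """Hypothetical dispatcher for a PIM-enabled 'add' operation."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.memory = {}  # simulated memory: addr -> int
        self.log = []     # execution location chosen for each operation

    def pim_add(self, addr, value):
        # Assumed policy: monitor hit -> host (data likely cached), miss -> memory.
        where = "host" if self.monitor.access(addr) else "memory"
        # Same read-modify-write semantics on either path.
        self.memory[addr] = self.memory.get(addr, 0) + value
        self.log.append(where)
        return where


if __name__ == "__main__":
    d = PeiDispatcher(LocalityMonitor())
    print(d.pim_add(0, 1))  # first touch of block 0: memory
    print(d.pim_add(8, 2))  # same 64-byte block: host
```

Both paths perform the same read-modify-write on the same address. The caller therefore sees the same result wherever the operation runs, and only the execution location recorded in `log` changes with locality. Addresses that thrash a set, such as 0, 256, and 512 with the default sizes, keep missing and stay in memory.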

Citations

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology

Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory

Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories

DRISA: a DRAM-based Reconfigurable In-Situ Accelerator

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks

Related Papers (5)

A scalable processing-in-memory accelerator for parallel graph processing

TOP-PIM: throughput-oriented programmable processing in memory

A case for intelligent RAM

NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory