Journal Article
Accelerating dependent cache misses with an enhanced memory controller
Milad Hashemi, Khubaib, Eiman Ebrahimi, Onur Mutlu, Yale N. Patt
- Vol. 44, Iss: 3, pp 444-455
TLDR
This work proposes adding just enough functionality to dynamically identify instructions at the core and migrate them to the memory controller for execution as soon as source data arrives from DRAM, allowing memory requests issued by the new Enhanced Memory Controller to experience a 20% lower latency than if issued by the core.
Abstract:
On-chip contention increases memory access latency for multicore processors. We identify that this additional latency has a substantial effect on performance for an important class of latency-critical memory operations: those that result in a cache miss and are dependent on data from a prior cache miss. We observe that the number of instructions between the first cache miss and its dependent cache miss is usually small. To minimize dependent cache miss latency, we propose adding just enough functionality to dynamically identify these instructions at the core and migrate them to the memory controller for execution as soon as source data arrives from DRAM. This migration allows memory requests issued by our new Enhanced Memory Controller (EMC) to experience a 20% lower latency than if issued by the core. On a set of memory-intensive quad-core workloads, the EMC results in a 13% improvement in system performance and a 5% reduction in energy consumption over a system with a Global History Buffer prefetcher, the highest performing prefetcher in our evaluation.
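The dependence described in the abstract can be illustrated with a toy pointer chase. The sketch below is a hypothetical model, not the paper's simulator or the EMC itself: the names `memory`, `cached`, and `chase` are invented here, and as a simplifying assumption a miss counts as dependent only when the load that produced its address also missed.

```python
# Toy model (hypothetical, not the paper's simulator): memory is a dict
# mapping an address to the value stored there. Each node stores the
# address of the next node, so every load's address comes from the data
# returned by the previous load.

def chase(memory, head, cached):
    """Walk a linked list; return (misses, dependent_misses)."""
    misses = 0
    dependent_misses = 0
    prev_missed = False  # did the load that produced this address miss?
    addr = head
    while addr is not None:
        missed = addr not in cached
        if missed:
            misses += 1
            if prev_missed:
                # This address was only known after a prior miss returned,
                # so the two DRAM accesses are serialized.
                dependent_misses += 1
            cached.add(addr)
        prev_missed = missed
        addr = memory[addr]  # the loaded value is the next address
    return misses, dependent_misses

# Four nodes scattered across memory; None marks the end of the list.
memory = {0x100: 0x940, 0x940: 0x2C0, 0x2C0: 0x7F0, 0x7F0: None}

cold = chase(memory, 0x100, cached=set())      # nothing cached
warm = chase(memory, 0x100, cached={0x940})    # second node already cached
print(cold, warm)
```

With nothing cached, every miss after the first waits on the data of the miss before it, which is the latency-critical pattern the abstract targets. Only one instruction (the load itself) separates each miss from its dependent miss here, in line with the abstract's observation that this distance is usually small.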
Citations
Proceedings Article
Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation
Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu
TL;DR: The In-Memory PoInter Chasing Accelerator (IMPICA) leverages the logic layer within 3D-stacked memory for linked data structure traversal and addresses two key challenges: how to achieve high parallelism in the presence of serial accesses in pointer chasing, and how to effectively perform virtual-to-physical address translation on the memory side without requiring expensive accesses to the CPU's memory management unit.
Proceedings Article
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities
Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut Kandemir, Onur Mutlu, Chita R. Das
TL;DR: Two new runtime techniques are developed: a regression-based affinity prediction model and mechanism that accurately identifies which kernels would benefit from PIM and offloads them to GPU cores in memory, and a concurrent kernel management mechanism that uses the affinity prediction model, a new kernel execution time prediction model, and kernel dependency information to decide which kernels to schedule concurrently on the main GPU cores and the GPU cores in memory.
Journal Article
Processing data where it makes sense: Enabling in-memory computation
Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun
TL;DR: In this paper, the authors discuss some recent research that aims to practically enable computation close to data and discuss at least two promising directions for processing-in-memory (PIM): (1) performing massively-parallel bulk operations in memory by exploiting the analog operational properties of DRAM, with low-cost changes, and (2) exploiting the logic layer in 3D-stacked memory technology to accelerate important data-intensive applications.
Journal Article
RowHammer: A Retrospective
Onur Mutlu, Jeremie S. Kim
TL;DR: This paper comprehensively surveys the scientific literature on RowHammer-based attacks as well as mitigation techniques to prevent RowHammer, and discusses what other related vulnerabilities may be lurking in DRAM and other types of memories, e.g., NAND flash memory or phase change memory, that can potentially threaten the foundations of secure systems.
Posted Content
The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser
TL;DR: This work discusses the RowHammer problem in DRAM, which is a prime (and perhaps the first) example of how a circuit-level failure mechanism can cause a practical and widespread system security vulnerability, and describes and advocates a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.
References
Proceedings Article
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.
Proceedings Article
Automatically characterizing large scale program behavior
TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.
Proceedings Article
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
TL;DR: This article presents hardware techniques to improve cache performance: a small fully-associative cache is placed between a cache and its refill path, and prefetched data is placed in a separate buffer rather than in the cache.
CACTI 6.0: A Tool to Model Large Caches
TL;DR: This report details the analytical model assumed for the newly added modules along with their validation analysis of CACTI 6.0, a significantly enhanced version of the tool that primarily focuses on interconnect design for large caches.
Proceedings Article
A scalable processing-in-memory accelerator for parallel graph processing
TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.