
Showing papers by "Moinuddin K. Qureshi published in 2017"


Proceedings Article
27 Feb 2017
TL;DR: Proposes letting the wear of different channels and dies diverge at fine time granularities in favor of isolation, and correcting that imbalance at a coarse time granularity in a principled manner.
Abstract: A longstanding goal of SSD virtualization has been to provide performance isolation between multiple tenants sharing the device. Virtualizing SSDs, however, has traditionally been a challenge because of the fundamental tussle between resource isolation and the lifetime of the device - existing SSDs aim to uniformly age all the regions of flash and this hurts isolation. We propose utilizing flash parallelism to improve isolation between virtual SSDs by running them on dedicated channels and dies. Furthermore, we offer a complete solution by also managing the wear. We propose allowing the wear of different channels and dies to diverge at fine time granularities in favor of isolation and adjusting that imbalance at a coarse time granularity in a principled manner. Our experiments show that the new SSD wears uniformly while the 99th percentile latencies of storage operations in a variety of multi-tenant settings are reduced by up to 3.1x compared to software isolated virtual SSDs.
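The coarse-grained wear adjustment described above can be sketched as follows. This is a minimal illustrative model, not the paper's mechanism: the class name, epoch length, and skew threshold are all assumptions.

```python
# Illustrative sketch: per-die wear is allowed to diverge freely between
# epochs; only at coarse epoch boundaries is imbalance checked and a
# rebalancing action (e.g., swapping tenant-to-die mappings) suggested.

class WearLeveler:
    def __init__(self, num_dies, epoch_erases=100_000, max_skew=0.10):
        self.erases = [0] * num_dies        # per-die erase counts
        self.total = 0
        self.epoch_erases = epoch_erases    # coarse granularity: rebalance period
        self.max_skew = max_skew            # tolerated wear imbalance (10%)

    def record_erase(self, die):
        self.erases[die] += 1
        self.total += 1
        if self.total % self.epoch_erases == 0:
            return self.rebalance_plan()    # check wear only at epoch boundaries
        return None                         # within an epoch, wear may diverge

    def rebalance_plan(self):
        # Pair the most-worn and least-worn dies; swapping their tenants
        # (hot traffic onto the cold die) evens out future wear.
        mean = sum(self.erases) / len(self.erases)
        hot = max(range(len(self.erases)), key=self.erases.__getitem__)
        cold = min(range(len(self.erases)), key=self.erases.__getitem__)
        if self.erases[hot] > mean * (1 + self.max_skew):
            return (hot, cold)              # swap the mapping of these two dies
        return None
```

Because the check runs only at epoch boundaries, isolation is undisturbed at fine granularity while long-run wear stays uniform.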

86 citations


Proceedings ArticleDOI
30 Oct 2017
TL;DR: FlashGuard is proposed, a ransomware-tolerant Solid State Drive (SSD) with a firmware-level recovery system that enables quick and effective recovery from encryption ransomware without relying on explicit backups, while having a negligible impact on the performance and lifetime of the SSD.
Abstract: Encryption ransomware is malicious software that stealthily encrypts user files and demands a ransom to provide access to these files. Several prior studies have developed systems to detect ransomware by monitoring the activities that typically occur during a ransomware attack. Unfortunately, by the time the ransomware is detected, some files have already been encrypted and the user is still required to pay a ransom to access those files. Furthermore, ransomware variants can obtain kernel privilege, which allows them to terminate software-based defense systems, such as anti-virus. While periodic backups have been explored as a means to mitigate ransomware, such backups incur storage overheads and are still vulnerable, as ransomware can obtain kernel privilege to stop or destroy backups. Ideally, we would like to defend against ransomware without relying on software-based solutions and without incurring the storage overheads of backups. To that end, this paper proposes FlashGuard, a ransomware-tolerant Solid State Drive (SSD) with a firmware-level recovery system that allows quick and effective recovery from encryption ransomware without relying on explicit backups. FlashGuard leverages the observation that the existing SSD already performs out-of-place writes in order to mitigate the long erase latency of flash memories. Therefore, when a page is updated or deleted, the older copy of that page is still present in the SSD. FlashGuard slightly modifies the garbage collection mechanism of the SSD to retain the copies of the data encrypted by ransomware and ensure effective data recovery. Our experiments with 1,447 manually labeled ransomware samples show that FlashGuard can efficiently restore files encrypted by ransomware. In addition, we demonstrate that FlashGuard has a negligible impact on the performance and lifetime of the SSD.
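The core idea — out-of-place writes mean the old copy of an overwritten page still exists, and garbage collection can be told to keep it — can be sketched as below. The class, the read-then-overwrite heuristic, and the retention window are illustrative simplifications, not FlashGuard's actual FTL logic.

```python
# Sketch of a flash translation layer that retains old physical copies of
# pages that were read and then overwritten (the access pattern encryption
# ransomware must produce: it reads a file to encrypt it, then writes back).
# Retained pages are exempt from garbage collection until a window expires.

import time

class RetainingFTL:
    def __init__(self, retention_secs=7 * 24 * 3600):
        self.mapping = {}        # logical page -> current physical page
        self.read_at = {}        # logical page -> last read timestamp
        self.retained = {}       # old physical page -> retention expiry time
        self.next_phys = 0
        self.retention_secs = retention_secs

    def read(self, lpn, now=None):
        self.read_at[lpn] = now if now is not None else time.time()
        return self.mapping.get(lpn)

    def write(self, lpn, now=None):
        now = now if now is not None else time.time()
        old = self.mapping.get(lpn)
        # Out-of-place write: allocate a fresh physical page for the new data.
        self.mapping[lpn] = self.next_phys
        self.next_phys += 1
        # Suspicious pattern: the old copy was read before being overwritten.
        if old is not None and lpn in self.read_at:
            self.retained[old] = now + self.retention_secs

    def gc_can_erase(self, phys, now=None):
        # Garbage collection must skip retained pages until they expire.
        now = now if now is not None else time.time()
        expiry = self.retained.get(phys)
        return expiry is None or now > expiry
```

Recovery then amounts to restoring a logical page's mapping to its retained old physical copy.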

68 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Shows that 99.999% of the instructions in the instruction stream of a typical quantum workload stem from error correction, and proposes QuEST (Quantum Error-Correction Substrate), an architecture that delegates the task of quantum error correction to the hardware and reduces the instruction bandwidth demand of several key workloads by five orders of magnitude.
Abstract: A quantum computer consists of quantum bits (qubits) and a control processor that acts as an interface between the programmer and the qubits. As qubits are very sensitive to noise, they rely on continuous error correction to maintain the correct state. Current proposals rely on software-managed error correction and require large instruction bandwidth, which must scale in proportion to the number of qubits. While such a design may be reasonable for small-scale quantum computers, we show that instruction bandwidth tends to become a critical bottleneck for scaling quantum computers. In this paper, we show that 99.999% of the instructions in the instruction stream of a typical quantum workload stem from error correction. Using this observation, we propose QuEST (Quantum Error-Correction Substrate), an architecture that delegates the task of quantum error correction to the hardware. QuEST uses a dedicated programmable micro-coded engine to continuously replay the instruction stream associated with error correction. The instruction bandwidth requirement of QuEST scales in proportion to the number of active qubits (typically < 0.1%) rather than the total number of qubits. We analyze the effectiveness of QuEST with area and thermal constraints and propose a scalable microarchitecture using typical Quantum Error Correction Code (QECC) execution patterns. Our evaluations show that QuEST reduces instruction bandwidth demand of several key workloads by five orders of magnitude while ensuring deterministic instruction delivery. Apart from error correction, we also observe a large instruction bandwidth requirement for fault tolerant quantum instructions (magic state distillation). We extend QuEST to manage these instructions in hardware and provide additional reduction in bandwidth. With QuEST, we reduce the total instruction bandwidth by eight orders of magnitude.
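The scaling argument can be made concrete with a back-of-the-envelope model: under software-managed QEC the control processor must stream instructions for every qubit every cycle, whereas with a hardware replay engine only the active qubits still need instructions. All numbers below are illustrative assumptions, not figures from the paper.

```python
# Toy model of control-processor instruction bandwidth, before and after
# delegating quantum error correction (QEC) to a hardware replay engine.

def software_qec_bw(total_qubits, instrs_per_qubit_cycle, cycles_per_sec):
    """Instructions/sec when software issues QEC for every qubit."""
    return total_qubits * instrs_per_qubit_cycle * cycles_per_sec

def hardware_qec_bw(total_qubits, active_fraction,
                    instrs_per_qubit_cycle, cycles_per_sec):
    """Instructions/sec when hardware replays the QEC loop; only the
    active qubits still need instructions from the control processor."""
    active = total_qubits * active_fraction
    return active * instrs_per_qubit_cycle * cycles_per_sec

# Assumed parameters: 1M qubits, 10 instructions/qubit/cycle, 1 MHz QEC
# cycle rate, and 0.1% of qubits logically active at any time.
software = software_qec_bw(1_000_000, 10, 1_000_000)
hardware = hardware_qec_bw(1_000_000, 0.001, 10, 1_000_000)
print(f"reduction: {software / hardware:,.0f}x")  # 1,000x for these numbers
```

The reduction factor is simply the inverse of the active fraction; the paper's larger reported savings additionally come from replaying the repetitive QEC instruction stream from on-chip microcode rather than fetching it at all.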

52 citations


Proceedings ArticleDOI
02 Oct 2017
TL;DR: Reports the minimum operational temperature for 55 DIMMs and shows that a significant fraction of DRAM chips continue to work at temperatures as low as 80K, an initial step towards evaluating the effectiveness of cryogenic DRAM as main memory for quantum computers.
Abstract: A quantum computer can solve fundamentally difficult problems by utilizing properties of quantum bits (qubits). It consists of a quantum substrate connected to a conventional computer, termed the control processor. A control processor can manipulate and measure the state of the qubits and act as an interface between qubits and the programmer. Unfortunately, qubits are extremely noise-sensitive, and to minimize noise, qubits are operated at cryogenic temperatures. To build a scalable quantum computer, a control processor which can work at cryogenic temperatures is essential [3, 14]. In this paper, we focus on the challenges of building a memory system for a cryogenic control processor. A scalable quantum computer will require large memory capacity for storing the program and the data generated by quantum error correction. To this end, we evaluate the feasibility of a cryogenic DRAM-based memory system by characterizing commercial DRAM modules at cryogenic temperatures. In this paper, we report the minimum operational temperature for 55 DIMMs (consisting of a total of 750 DRAM chips) and analyze the error patterns in commodity DRAM devices operated at cryogenic temperatures. Our study shows that a significant fraction of DRAM chips continue to work at temperatures as low as 80K. This study is an initial step towards evaluating the effectiveness of cryogenic DRAM as a main memory for quantum computers.

43 citations


Proceedings ArticleDOI
24 Jun 2017
TL;DR: Proposes DICE, a dynamic design that adapts between spatial indexing and traditional set indexing depending on the compressibility of the data, along with low-cost Cache Index Predictors (CIP) that accurately predict the indexing scheme on access, avoiding probes of both indices to retrieve a given cache line.
Abstract: This paper investigates compression for DRAM caches. As the capacity of a DRAM cache is typically large, prior techniques on cache compression, which solely focus on improving cache capacity, provide only a marginal benefit. We show that more performance benefit can be obtained if the compression of the DRAM cache is tailored to provide higher bandwidth. If a DRAM cache can provide two compressed lines in a single access, and both lines are useful, the effective bandwidth of the DRAM cache would double. Unfortunately, it is not straightforward to compress DRAM caches for bandwidth. The typically used Traditional Set Indexing (TSI) maps consecutive lines to consecutive sets, so the multiple compressed lines obtained from the set are from spatially distant locations and unlikely to be used within a short period of each other. We can change the indexing of the cache to place consecutive lines in the same set to improve bandwidth; however, when the data is incompressible, such spatial indexing reduces effective capacity and causes significant slowdown. Ideally, we would like to have spatial indexing when the data is compressible and TSI otherwise. To this end, we propose Dynamic-Indexing Cache comprEssion (DICE), a dynamic design that can adapt between spatial indexing and TSI, depending on the compressibility of the data. We also propose low-cost Cache Index Predictors (CIP) that can accurately predict the cache indexing scheme on access in order to avoid probing both indices for retrieving a given cache line. Our studies with a 1GB DRAM cache, on a wide range of workloads (including SPEC and Graph), show that DICE improves performance by 19.0% and reduces energy-delay-product by 36% on average. DICE is within 3% of a design that has double the capacity and double the bandwidth. DICE incurs a storage overhead of less than 1KB and does not rely on any OS support.
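The difference between the two indexing schemes DICE adapts between can be shown in a few lines. The toy cache geometry below is an assumption for illustration; a real 1GB DRAM cache has far more sets.

```python
# Sketch of the two set-index functions: Traditional Set Indexing (TSI)
# spreads consecutive lines across consecutive sets, while spatial
# indexing packs consecutive lines into the same set, so one access can
# return multiple compressed lines that exhibit spatial locality.

NUM_SETS = 8          # toy size for illustration
LINES_PER_SET = 2     # two compressed lines fetched in a single access

def tsi_index(line_addr):
    """TSI: consecutive lines map to consecutive sets."""
    return line_addr % NUM_SETS

def spatial_index(line_addr):
    """Spatial indexing: consecutive lines share a set."""
    return (line_addr // LINES_PER_SET) % NUM_SETS

# Consecutive lines 4 and 5:
assert tsi_index(4) != tsi_index(5)          # different sets under TSI
assert spatial_index(4) == spatial_index(5)  # same set under spatial indexing
```

Under spatial indexing, fetching the set for line 4 also yields compressed line 5 "for free", which is exactly the bandwidth benefit the paper targets when data is compressible; when it is not, DICE falls back to TSI.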

41 citations


Proceedings ArticleDOI
02 Oct 2017
TL;DR: Proposes Bandwidth-Aware Tiered-Memory Management (BATMAN), a runtime mechanism that manages the distribution of memory accesses in a tiered-memory system by explicitly controlling data movement, incurring only an eight-byte hardware overhead and requiring negligible software modification.
Abstract: Tiered-memory systems consist of high-bandwidth 3D-DRAM and high-capacity commodity-DRAM. Conventional designs attempt to improve system performance by maximizing the number of memory accesses serviced by 3D-DRAM. However, when the commodity-DRAM bandwidth is a significant fraction of overall system bandwidth, these techniques inefficiently utilize the total bandwidth offered by the tiered-memory system and yield sub-optimal performance. In such situations, performance can be improved by distributing memory accesses in proportion to the bandwidth of each memory. Ideally, we want a simple and effective runtime mechanism that achieves the desired access distribution without requiring significant hardware or software support. This paper proposes Bandwidth-Aware Tiered-Memory Management (BATMAN), a runtime mechanism that manages the distribution of memory accesses in a tiered-memory system by explicitly controlling data movement. BATMAN monitors the number of accesses to both memories, and when the number of 3D-DRAM accesses exceeds the desired threshold, BATMAN disallows data movement from the commodity-DRAM to 3D-DRAM and proactively moves data from 3D-DRAM to commodity-DRAM. We demonstrate BATMAN on systems that architect the 3D-DRAM as either a hardware-managed cache (cache mode) or a part of the OS-visible memory space (flat mode). Our evaluations on a system with 4GB 3D-DRAM and 32GB commodity-DRAM show that BATMAN improves performance by an average of 11% and 10% and energy-delay product by 13% and 11% for systems in the cache and flat modes, respectively. BATMAN incurs only an eight-byte hardware overhead and requires negligible software modification.
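BATMAN's control decision — compare the observed share of 3D-DRAM accesses against a bandwidth-proportional target, and gate data movement accordingly — can be sketched as below. The class name, counters, and exact policy are illustrative assumptions, not the paper's hardware design.

```python
# Sketch of bandwidth-proportional access distribution: the target share
# of accesses for 3D-DRAM equals its fraction of total system bandwidth.
# While 3D-DRAM is below target, promotion (commodity -> 3D) is allowed;
# once it exceeds target, promotion stops and demotion is favored instead.

class Batman:
    def __init__(self, bw_3d=8, bw_commodity=2):
        # e.g., 8 + 2 bandwidth units -> 3D-DRAM should serve 80% of accesses
        self.target = bw_3d / (bw_3d + bw_commodity)
        self.hits_3d = 0
        self.total = 0

    def record_access(self, in_3d):
        self.hits_3d += bool(in_3d)   # count accesses served by 3D-DRAM
        self.total += 1

    def should_promote(self):
        """Allow commodity->3D data movement only while 3D-DRAM serves at
        most its bandwidth-proportional share of accesses."""
        return (self.hits_3d / self.total) <= self.target
```

The point is that driving the 3D-DRAM hit share above its bandwidth share wastes idle commodity-DRAM bandwidth, so past the threshold the controller deliberately pushes data the other way.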

29 citations


Journal ArticleDOI
TL;DR: The guest editors introduce the Top Picks and Honorable Mentions from the 2016 computer architecture conferences.
Abstract: The guest editors introduce the Top Picks and Honorable Mentions from the 2016 computer architecture conferences.

3 citations