
Showing papers by "Moinuddin K. Qureshi published in 2017"


Proceedings Article
27 Feb 2017
TL;DR: Proposes letting the wear of different channels and dies diverge at fine time granularities in favor of isolation, and correcting that imbalance at a coarse time granularity in a principled manner.
Abstract: A longstanding goal of SSD virtualization has been to provide performance isolation between multiple tenants sharing the device. Virtualizing SSDs, however, has traditionally been a challenge because of the fundamental tussle between resource isolation and the lifetime of the device - existing SSDs aim to uniformly age all the regions of flash and this hurts isolation. We propose utilizing flash parallelism to improve isolation between virtual SSDs by running them on dedicated channels and dies. Furthermore, we offer a complete solution by also managing the wear. We propose allowing the wear of different channels and dies to diverge at fine time granularities in favor of isolation and adjusting that imbalance at a coarse time granularity in a principled manner. Our experiments show that the new SSD wears uniformly while the 99th percentile latencies of storage operations in a variety of multi-tenant settings are reduced by up to 3.1x compared to software isolated virtual SSDs.
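The coarse-grained wear adjustment described above can be sketched as follows. This is a minimal illustrative model, not the paper's mechanism: the class name, epoch length, and skew threshold are all assumptions.

```python
# Illustrative sketch: per-die wear is allowed to diverge freely between
# epochs; only at coarse epoch boundaries is imbalance checked and a
# rebalancing action (e.g., swapping tenant-to-die mappings) suggested.

class WearLeveler:
    def __init__(self, num_dies, epoch_erases=100_000, max_skew=0.10):
        self.erases = [0] * num_dies        # per-die erase counts
        self.total = 0
        self.epoch_erases = epoch_erases    # coarse granularity: rebalance period
        self.max_skew = max_skew            # tolerated wear imbalance (10%)

    def record_erase(self, die):
        self.erases[die] += 1
        self.total += 1
        if self.total % self.epoch_erases == 0:
            return self.rebalance_plan()    # check wear only at epoch boundaries
        return None                         # within an epoch, wear may diverge

    def rebalance_plan(self):
        # Pair the most-worn and least-worn dies; swapping their tenants
        # (hot traffic onto the cold die) evens out future wear.
        mean = sum(self.erases) / len(self.erases)
        hot = max(range(len(self.erases)), key=self.erases.__getitem__)
        cold = min(range(len(self.erases)), key=self.erases.__getitem__)
        if self.erases[hot] > mean * (1 + self.max_skew):
            return (hot, cold)              # swap the mapping of these two dies
        return None
```

Because the check runs only at epoch boundaries, isolation is undisturbed at fine granularity while long-run wear stays uniform.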

86 citations


Proceedings ArticleDOI
30 Oct 2017
TL;DR: FlashGuard is proposed, a ransomware-tolerant Solid State Drive (SSD) with a firmware-level recovery system that enables quick and effective recovery from encryption ransomware without relying on explicit backups, while having a negligible impact on the performance and lifetime of the SSD.
Abstract: Encryption ransomware is malicious software that stealthily encrypts user files and demands a ransom to provide access to these files. Several prior studies have developed systems to detect ransomware by monitoring the activities that typically occur during a ransomware attack. Unfortunately, by the time the ransomware is detected, some files have already been encrypted and the user is still required to pay a ransom to access those files. Furthermore, ransomware variants can obtain kernel privilege, which allows them to terminate software-based defense systems, such as anti-virus. While periodic backups have been explored as a means to mitigate ransomware, such backups incur storage overheads and are still vulnerable, as ransomware can obtain kernel privilege to stop or destroy backups. Ideally, we would like to defend against ransomware without relying on software-based solutions and without incurring the storage overheads of backups. To that end, this paper proposes FlashGuard, a ransomware-tolerant Solid State Drive (SSD) with a firmware-level recovery system that allows quick and effective recovery from encryption ransomware without relying on explicit backups. FlashGuard leverages the observation that the existing SSD already performs out-of-place writes in order to mitigate the long erase latency of flash memories. Therefore, when a page is updated or deleted, the older copy of that page is still present in the SSD. FlashGuard slightly modifies the garbage collection mechanism of the SSD to retain the copies of the data encrypted by ransomware and ensure effective data recovery. Our experiments with 1,447 manually labeled ransomware samples show that FlashGuard can efficiently restore files encrypted by ransomware. In addition, we demonstrate that FlashGuard has a negligible impact on the performance and lifetime of the SSD.
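The core idea — out-of-place writes mean the old copy of an overwritten page still exists, and garbage collection can be told to keep it — can be sketched as below. The class, the read-then-overwrite heuristic, and the retention window are illustrative simplifications, not FlashGuard's actual FTL logic.

```python
# Sketch of a flash translation layer that retains old physical copies of
# pages that were read and then overwritten (the access pattern encryption
# ransomware must produce: it reads a file to encrypt it, then writes back).
# Retained pages are exempt from garbage collection until a window expires.

import time

class RetainingFTL:
    def __init__(self, retention_secs=7 * 24 * 3600):
        self.mapping = {}        # logical page -> current physical page
        self.read_at = {}        # logical page -> last read timestamp
        self.retained = {}       # old physical page -> retention expiry time
        self.next_phys = 0
        self.retention_secs = retention_secs

    def read(self, lpn, now=None):
        self.read_at[lpn] = now if now is not None else time.time()
        return self.mapping.get(lpn)

    def write(self, lpn, now=None):
        now = now if now is not None else time.time()
        old = self.mapping.get(lpn)
        # Out-of-place write: allocate a fresh physical page for the new data.
        self.mapping[lpn] = self.next_phys
        self.next_phys += 1
        # Suspicious pattern: the old copy was read before being overwritten.
        if old is not None and lpn in self.read_at:
            self.retained[old] = now + self.retention_secs

    def gc_can_erase(self, phys, now=None):
        # Garbage collection must skip retained pages until they expire.
        now = now if now is not None else time.time()
        expiry = self.retained.get(phys)
        return expiry is None or now > expiry
```

Recovery then amounts to restoring a logical page's mapping to its retained old physical copy.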

68 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Shows that 99.999% of the instructions in the instruction stream of a typical quantum workload stem from error correction, and proposes QuEST (Quantum Error-Correction Substrate), an architecture that delegates the task of quantum error correction to the hardware and reduces the instruction bandwidth demand of several key workloads by five orders of magnitude.
Abstract: A quantum computer consists of quantum bits (qubits) and a control processor that acts as an interface between the programmer and the qubits. As qubits are very sensitive to noise, they rely on continuous error correction to maintain the correct state. Current proposals rely on software-managed error correction and require large instruction bandwidth, which must scale in proportion to the number of qubits. While such a design may be reasonable for small-scale quantum computers, we show that instruction bandwidth tends to become a critical bottleneck for scaling quantum computers. In this paper, we show that 99.999% of the instructions in the instruction stream of a typical quantum workload stem from error correction. Using this observation, we propose QuEST (Quantum Error-Correction Substrate), an architecture that delegates the task of quantum error correction to the hardware. QuEST uses a dedicated programmable micro-coded engine to continuously replay the instruction stream associated with error correction. The instruction bandwidth requirement of QuEST scales in proportion to the number of active qubits (typically < 0.1%) rather than the total number of qubits. We analyze the effectiveness of QuEST with area and thermal constraints and propose a scalable microarchitecture using typical Quantum Error Correction Code (QECC) execution patterns. Our evaluations show that QuEST reduces instruction bandwidth demand of several key workloads by five orders of magnitude while ensuring deterministic instruction delivery. Apart from error correction, we also observe a large instruction bandwidth requirement for fault tolerant quantum instructions (magic state distillation). We extend QuEST to manage these instructions in hardware and provide additional reduction in bandwidth. With QuEST, we reduce the total instruction bandwidth by eight orders of magnitude.
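The scaling argument can be made concrete with a back-of-the-envelope model: under software-managed QEC the control processor must stream instructions for every qubit every cycle, whereas with a hardware replay engine only the active qubits still need instructions. All numbers below are illustrative assumptions, not figures from the paper.

```python
# Toy model of control-processor instruction bandwidth, before and after
# delegating quantum error correction (QEC) to a hardware replay engine.

def software_qec_bw(total_qubits, instrs_per_qubit_cycle, cycles_per_sec):
    """Instructions/sec when software issues QEC for every qubit."""
    return total_qubits * instrs_per_qubit_cycle * cycles_per_sec

def hardware_qec_bw(total_qubits, active_fraction,
                    instrs_per_qubit_cycle, cycles_per_sec):
    """Instructions/sec when hardware replays the QEC loop; only the
    active qubits still need instructions from the control processor."""
    active = total_qubits * active_fraction
    return active * instrs_per_qubit_cycle * cycles_per_sec

# Assumed parameters: 1M qubits, 10 instructions/qubit/cycle, 1 MHz QEC
# cycle rate, and 0.1% of qubits logically active at any time.
software = software_qec_bw(1_000_000, 10, 1_000_000)
hardware = hardware_qec_bw(1_000_000, 0.001, 10, 1_000_000)
print(f"reduction: {software / hardware:,.0f}x")  # 1,000x for these numbers
```

The reduction factor is simply the inverse of the active fraction; the paper's larger reported savings additionally come from replaying the repetitive QEC instruction stream from on-chip microcode rather than fetching it at all.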

52 citations


Proceedings ArticleDOI
02 Oct 2017
TL;DR: Reports the minimum operational temperature for 55 DIMMs and shows that a significant fraction of DRAM chips continue to work at temperatures as low as 80K, an initial step towards evaluating the effectiveness of cryogenic DRAM as main memory for quantum computers.
Abstract: A quantum computer can solve fundamentally difficult problems by utilizing properties of quantum bits (qubits). It consists of a quantum substrate connected to a conventional computer, termed the control processor. A control processor can manipulate and measure the state of the qubits and act as an interface between qubits and the programmer. Unfortunately, qubits are extremely noise-sensitive, and to minimize noise, qubits are operated at cryogenic temperatures. To build a scalable quantum computer, a control processor which can work at cryogenic temperatures is essential [3, 14]. In this paper, we focus on the challenges of building a memory system for a cryogenic control processor. A scalable quantum computer will require large memory capacity for storing the program and the data generated by quantum error correction. To this end, we evaluate the feasibility of a cryogenic DRAM-based memory system by characterizing commercial DRAM modules at cryogenic temperatures. In this paper, we report the minimum operational temperature for 55 DIMMs (consisting of a total of 750 DRAM chips) and analyze the error patterns in commodity DRAM devices operated at cryogenic temperatures. Our study shows that a significant fraction of DRAM chips continue to work at temperatures as low as 80K. This study is an initial step towards evaluating the effectiveness of cryogenic DRAM as a main memory for quantum computers.

43 citations


Proceedings ArticleDOI
24 Jun 2017
TL;DR: Proposes DICE, a dynamic design that adapts between spatial indexing and traditional set indexing depending on the compressibility of the data, along with low-cost Cache Index Predictors (CIP) that accurately predict the indexing scheme on access, avoiding probes of both indices to retrieve a given cache line.
Abstract: This paper investigates compression for DRAM caches. As the capacity of a DRAM cache is typically large, prior techniques on cache compression, which solely focus on improving cache capacity, provide only a marginal benefit. We show that more performance benefit can be obtained if the compression of the DRAM cache is tailored to provide higher bandwidth. If a DRAM cache can provide two compressed lines in a single access, and both lines are useful, the effective bandwidth of the DRAM cache would double. Unfortunately, it is not straightforward to compress DRAM caches for bandwidth. The typically used Traditional Set Indexing (TSI) maps consecutive lines to consecutive sets, so the multiple compressed lines obtained from the set are from spatially distant locations and unlikely to be used within a short period of each other. We can change the indexing of the cache to place consecutive lines in the same set to improve bandwidth; however, when the data is incompressible, such spatial indexing reduces effective capacity and causes significant slowdown. Ideally, we would like to have spatial indexing when the data is compressible and TSI otherwise. To this end, we propose Dynamic-Indexing Cache comprEssion (DICE), a dynamic design that can adapt between spatial indexing and TSI, depending on the compressibility of the data. We also propose low-cost Cache Index Predictors (CIP) that can accurately predict the cache indexing scheme on access in order to avoid probing both indices for retrieving a given cache line. Our studies with a 1GB DRAM cache, on a wide range of workloads (including SPEC and Graph), show that DICE improves performance by 19.0% and reduces energy-delay-product by 36% on average. DICE is within 3% of a design that has double the capacity and double the bandwidth. DICE incurs a storage overhead of less than 1KB and does not rely on any OS support.
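The difference between the two indexing schemes DICE adapts between can be shown in a few lines. The toy cache geometry below is an assumption for illustration; a real 1GB DRAM cache has far more sets.

```python
# Sketch of the two set-index functions: Traditional Set Indexing (TSI)
# spreads consecutive lines across consecutive sets, while spatial
# indexing packs consecutive lines into the same set, so one access can
# return multiple compressed lines that exhibit spatial locality.

NUM_SETS = 8          # toy size for illustration
LINES_PER_SET = 2     # two compressed lines fetched in a single access

def tsi_index(line_addr):
    """TSI: consecutive lines map to consecutive sets."""
    return line_addr % NUM_SETS

def spatial_index(line_addr):
    """Spatial indexing: consecutive lines share a set."""
    return (line_addr // LINES_PER_SET) % NUM_SETS

# Consecutive lines 4 and 5:
assert tsi_index(4) != tsi_index(5)          # different sets under TSI
assert spatial_index(4) == spatial_index(5)  # same set under spatial indexing
```

Under spatial indexing, fetching the set for line 4 also yields compressed line 5 "for free", which is exactly the bandwidth benefit the paper targets when data is compressible; when it is not, DICE falls back to TSI.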

41 citations


Proceedings ArticleDOI
02 Oct 2017
TL;DR: Proposes Bandwidth-Aware Tiered-Memory Management (BATMAN), a runtime mechanism that manages the distribution of memory accesses in a tiered-memory system by explicitly controlling data movement, incurring only an eight-byte hardware overhead and requiring negligible software modification.
Abstract: Tiered-memory systems consist of high-bandwidth 3D-DRAM and high-capacity commodity-DRAM. Conventional designs attempt to improve system performance by maximizing the number of memory accesses serviced by 3D-DRAM. However, when the commodity-DRAM bandwidth is a significant fraction of overall system bandwidth, these techniques inefficiently utilize the total bandwidth offered by the tiered-memory system and yield sub-optimal performance. In such situations, performance can be improved by distributing memory accesses in proportion to the bandwidth of each memory. Ideally, we want a simple and effective runtime mechanism that achieves the desired access distribution without requiring significant hardware or software support. This paper proposes Bandwidth-Aware Tiered-Memory Management (BATMAN), a runtime mechanism that manages the distribution of memory accesses in a tiered-memory system by explicitly controlling data movement. BATMAN monitors the number of accesses to both memories, and when the number of 3D-DRAM accesses exceeds the desired threshold, BATMAN disallows data movement from the commodity-DRAM to 3D-DRAM and proactively moves data from 3D-DRAM to commodity-DRAM. We demonstrate BATMAN on systems that architect the 3D-DRAM as either a hardware-managed cache (cache mode) or a part of the OS-visible memory space (flat mode). Our evaluations on a system with 4GB 3D-DRAM and 32GB commodity-DRAM show that BATMAN improves performance by an average of 11% and 10% and energy-delay product by 13% and 11% for systems in the cache and flat modes, respectively. BATMAN incurs only an eight-byte hardware overhead and requires negligible software modification.
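BATMAN's control decision — compare the observed share of 3D-DRAM accesses against a bandwidth-proportional target, and gate data movement accordingly — can be sketched as below. The class name, counters, and exact policy are illustrative assumptions, not the paper's hardware design.

```python
# Sketch of bandwidth-proportional access distribution: the target share
# of accesses for 3D-DRAM equals its fraction of total system bandwidth.
# While 3D-DRAM is below target, promotion (commodity -> 3D) is allowed;
# once it exceeds target, promotion stops and demotion is favored instead.

class Batman:
    def __init__(self, bw_3d=8, bw_commodity=2):
        # e.g., 8 + 2 bandwidth units -> 3D-DRAM should serve 80% of accesses
        self.target = bw_3d / (bw_3d + bw_commodity)
        self.hits_3d = 0
        self.total = 0

    def record_access(self, in_3d):
        self.hits_3d += bool(in_3d)   # count accesses served by 3D-DRAM
        self.total += 1

    def should_promote(self):
        """Allow commodity->3D data movement only while 3D-DRAM serves at
        most its bandwidth-proportional share of accesses."""
        return (self.hits_3d / self.total) <= self.target
```

The point is that driving the 3D-DRAM hit share above its bandwidth share wastes idle commodity-DRAM bandwidth, so past the threshold the controller deliberately pushes data the other way.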

29 citations


Journal ArticleDOI
TL;DR: The guest editors introduce the Top Picks and Honorable Mentions from the 2016 computer architecture conferences.
Abstract: The guest editors introduce the Top Picks and Honorable Mentions from the 2016 computer architecture conferences.

3 citations