
Showing papers by "Moinuddin K. Qureshi published in 2021"


Proceedings ArticleDOI
18 Oct 2021
TL;DR: Adaptive Dynamical Decoupling (ADAPT) as discussed by the authors is a software framework that estimates the efficacy of DD for each qubit combination and judiciously applies DD only to the subset of qubits that provide the most benefit.
Abstract: The fidelity of applications on near-term quantum computers is limited by hardware errors. In addition to errors that occur during gate and measurement operations, a qubit is susceptible to idling errors, which occur when the qubit is idle and not actively undergoing any operations. To mitigate idling errors, prior works in the quantum devices community have proposed Dynamical Decoupling (DD), which reduces stray noise on idle qubits by continuously executing a specific sequence of single-qubit operations that effectively behaves as an identity gate. Unfortunately, existing DD protocols have been primarily studied for individual qubits and their efficacy at the application level is not yet fully understood. Our experiments show that naively enabling DD for every idle qubit does not necessarily improve fidelity. While DD reduces the idling error-rates for some qubits, it increases the overall error-rate for others due to the additional operations of the DD protocol. Furthermore, idling errors are program-specific and the set of qubits that benefit from DD changes with each program. To enable robust use of DD, we propose Adaptive Dynamical Decoupling (ADAPT), a software framework that estimates the efficacy of DD for each qubit combination and judiciously applies DD only to the subset of qubits that provide the most benefit. ADAPT employs a Decoy Circuit, which is structurally similar to the original program but with a known solution, to identify the DD sequence that maximizes the fidelity. To avoid an exponential search of all possible DD combinations, ADAPT employs a localized algorithm that has linear complexity in the number of qubits. Our experiments on IBM quantum machines (with 16-27 qubits) show that ADAPT improves the application fidelity by 1.86x on average and up to 5.73x compared to no DD, and by 1.2x compared to DD on all qubits.
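A minimal sketch of the kind of localized, linear-complexity search the abstract describes: toggle DD on one qubit at a time and keep the change only if decoy-circuit fidelity improves. `run_decoy_fidelity` is a hypothetical stand-in for executing the decoy circuit on hardware and scoring it against its known solution; here it just returns a random score so the sketch runs.

```python
import random

def run_decoy_fidelity(dd_qubits):
    """Hypothetical stand-in: execute the decoy circuit with DD enabled on
    `dd_qubits` and score it against the known solution. Faked with a
    random number purely so this sketch is runnable."""
    return random.random()

def adapt_select_dd_qubits(num_qubits):
    """Greedy, localized search (linear in qubit count): try enabling DD on
    each qubit in turn, keeping it only if the decoy fidelity improves."""
    dd_qubits = set()
    best = run_decoy_fidelity(dd_qubits)
    for q in range(num_qubits):
        trial = dd_qubits | {q}
        score = run_decoy_fidelity(trial)
        if score > best:           # DD helps this qubit: keep it enabled
            dd_qubits, best = trial, score
    return dd_qubits, best

if __name__ == "__main__":
    qubits, fidelity = adapt_select_dd_qubits(16)
    print(f"apply DD to qubits {sorted(qubits)} (decoy fidelity {fidelity:.3f})")
```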

32 citations


Proceedings ArticleDOI
19 Apr 2021
TL;DR: This paper presents Streamline, a flush-less covert-channel attack that is faster than all previously known attacks, achieving a bit-rate of 1801 KB/s, which is 3x to 3.6x faster than the previous fastest Take-a-Way and Flush+Flush attacks at comparable error rates.
Abstract: Covert-channel attacks exploit contention on shared hardware resources such as processor caches to transmit information between colluding processes on the same system. In recent years, covert channels leveraging cacheline-flush instructions, such as Flush+Reload and Flush+Flush, have emerged as the fastest cross-core attacks. However, current attacks are limited in their applicability and bit-rate not due to any fundamental hardware limitations, but due to their protocol design requiring flush instructions and tight synchronization between sender and receiver, where both processes synchronize every bit-period to maintain low error-rates. In this paper, we present Streamline, a flush-less covert-channel attack faster than all prior known attacks. The key insight behind the higher channel bandwidth is asynchronous communication. Streamline communicates over a sequence of shared addresses (larger than the cache size), where the sender can move to the next address after transmitting each bit without waiting for the receiver. Furthermore, it ensures that addresses accessed by the sender are preserved in the cache until the receiver has accessed them. Finally, by the time the sender accesses the entire sequence and wraps around, the cache-thrashing property ensures that the previously transmitted addresses are automatically evicted from the cache without any cacheline flushes, which ensures functional correctness while simultaneously improving channel bandwidth. To orchestrate Streamline on a real system, we overcome multiple challenges, such as circumventing hardware optimizations (prefetching and replacement policy), and ensuring that the sender and receiver have similar execution rates. We demonstrate Streamline on an Intel Skylake CPU and show that it achieves a bit-rate of 1801 KB/s, which is 3x to 3.6x faster than the previous fastest Take-a-Way (588 KB/s) and Flush+Flush (496 KB/s) attacks, at comparable error rates. Unlike prior attacks, Streamline only relies on generic properties of caches and is applicable to processors of all ISAs (x86, ARM, etc.) and micro-architectures (Intel, AMD, etc.).
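The asynchrony is the crux of the bandwidth gain. The toy simulation below captures only the protocol logic (a ring of slots standing in for the shared address sequence, a fixed receiver lag standing in for the slower receiver), not the actual cache-timing channel; slot counts and lag are illustrative.

```python
def streamline_sim(bits, slots=16, receiver_lag=4):
    """Toy model of Streamline's asynchronous channel: the sender encodes
    one bit per shared 'address' (slot) and keeps moving without waiting;
    the receiver trails by `receiver_lag` slots. As long as the lag stays
    below the ring size, each bit is read before the sender wraps around
    and overwrites it -- the analogue of old addresses being evicted by
    cache thrashing rather than by explicit flushes."""
    ring = [None] * slots
    received = []
    for i, bit in enumerate(bits):
        ring[i % slots] = bit          # sender: access address => encode bit
        j = i - receiver_lag           # receiver runs behind, asynchronously
        if j >= 0:
            received.append(ring[j % slots])
    for j in range(len(bits) - receiver_lag, len(bits)):   # drain the tail
        received.append(ring[j % slots])
    return received

if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0] * 4
    assert streamline_sim(msg) == msg
    print("all", len(msg), "bits recovered with no per-bit handshake")
```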

30 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed two techniques to improve the resiliency of DNNs: drift regularization (DR) and multiplicative noise training (MNT), which improved model accuracy by up to 12% over one month.
Abstract: Phase change memory (PCM)-based “Analog-AI” accelerators are gaining importance for inference in edge applications due to the energy efficiency offered by in-memory computing. Nevertheless, noise sources inherent to PCM devices cause inaccuracies in the deep neural network (DNN) weight values. Such inaccuracies can lead to severe degradation in model accuracy. To address this, we propose two techniques to improve noise resiliency of DNNs: 1) drift regularization (DR) and 2) multiplicative noise training (MNT). We evaluate convolutional networks trained on image classification and recurrent neural networks trained on language modeling and show that our techniques improve model accuracy by up to 12% over one month.
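A small numpy sketch of the multiplicative-noise-training idea: during each training forward pass the weights are scaled by random noise, so the network learns weights that tolerate PCM-style conductance perturbations. The layer shape and noise level are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, W, noise_std=0.1, train=True):
    """Multiplicative noise training: perturb weights as W * (1 + eps),
    eps ~ N(0, noise_std), mimicking PCM device variation."""
    if train:
        W = W * (1.0 + rng.normal(0.0, noise_std, size=W.shape))
    return np.maximum(x @ W, 0.0)      # a single ReLU layer

# toy usage: the same clean weights, with and without injected noise
x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 3)) * 0.5
print(noisy_forward(x, W, train=False))   # deterministic inference pass
print(noisy_forward(x, W, train=True))    # stochastic training pass
```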

22 citations


Proceedings ArticleDOI
18 Oct 2021
TL;DR: JigSaw as mentioned in this paper reduces the impact of measurement errors by running a program in two modes: measuring all the qubits for half of the trials to produce a global (albeit noisy) histogram, and measuring only subsets of qubits for the remaining trials to produce localized, higher-fidelity histograms that are used to update the global one.
Abstract: Near-term quantum computers contain noisy devices, which makes it difficult to infer the correct answer even if a program is run for thousands of trials. On current machines, qubit measurements tend to be the most error-prone operations (with an average error-rate of 4%) and often limit the size of quantum programs that can be run reliably on these systems. As quantum programs create and manipulate correlated states, all the program qubits are measured in each trial and thus, the severity of measurement errors increases with the program size. The fidelity of quantum programs can be improved by reducing the number of measurement operations. We present JigSaw, a framework that reduces the impact of measurement errors by running a program in two modes. First, running the entire program and measuring all the qubits for half of the trials to produce a global (albeit noisy) histogram. Second, running additional copies of the program and measuring only a subset of qubits in each copy, for the remaining trials, to produce localized (higher fidelity) histograms over the measured qubits. JigSaw then employs a Bayesian post-processing step, whereby the histograms produced by the subset measurements are used to update the global histogram. Our evaluations using three different IBM quantum computers with 27 and 65 qubits show that JigSaw improves the success rate on average by 3.6x and up to 8.4x. Our analysis shows that the storage and time complexity of JigSaw scales linearly with the number of qubits and trials, making JigSaw applicable to programs with hundreds of qubits.
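One plausible reading of the Bayesian post-processing step, sketched below: each global bitstring's count is reweighted by the higher-fidelity marginal probability of its projection onto each measured subset, then renormalized. The subset layout and the exact update rule here are illustrative, not the paper's.

```python
from collections import Counter

def jigsaw_update(global_hist, subset_hists):
    """Reweight the noisy global histogram with localized subset histograms.
    `subset_hists` maps a tuple of qubit indices to a Counter over the
    substrings observed when only those qubits were measured."""
    updated = {}
    for bitstring, count in global_hist.items():
        weight = count
        for qubits, hist in subset_hists.items():
            sub = "".join(bitstring[q] for q in qubits)
            weight *= hist.get(sub, 0) / sum(hist.values())  # marginal likelihood
        updated[bitstring] = weight
    norm = sum(updated.values()) or 1.0
    return {b: w / norm for b, w in updated.items()}

global_hist = Counter({"000": 40, "111": 35, "101": 25})      # noisy full run
subset_hists = {(0, 1): Counter({"00": 48, "11": 52}),        # partial runs
                (1, 2): Counter({"00": 47, "11": 53})}
print(jigsaw_update(global_hist, subset_hists))   # mass shifts to 000 and 111
```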

17 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: MAZE as discussed by the authors is a data-free model stealing attack that uses zeroth-order gradient estimation to produce high-accuracy clones using only synthetic data created with a generative model.
Abstract: High quality Machine Learning (ML) models are often considered valuable intellectual property by companies. Model Stealing (MS) attacks allow an adversary with black-box access to a ML model to replicate its functionality by training a clone model using the predictions of the target model for different inputs. However, the best existing MS attacks fail to produce a high-accuracy clone without access to the target dataset or a representative dataset necessary to query the target model. In this paper, we show that preventing access to the target dataset is not an adequate defense to protect a model. We propose MAZE – a data-free model stealing attack using zeroth-order gradient estimation that produces high-accuracy clones. In contrast to prior works, MAZE uses only synthetic data created using a generative model to perform MS. Our evaluation with four image classification models shows that MAZE provides a normalized clone accuracy in the range of 0.90× to 0.99×, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13× to 0.69×) and on surrogate data (KnockoffNets, clone accuracy 0.52× to 0.97×). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97× to 1.0×) and reduces the query budget required for the attack by 2×-24×.
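The core trick is estimating gradients through a black box using only its outputs. Below is a minimal sketch of random-direction zeroth-order gradient estimation on a toy black-box loss; `blackbox_loss` is a placeholder for a loss derived from the target model's predictions, which the attacker can query but not backpropagate through.

```python
import numpy as np

rng = np.random.default_rng(1)

def blackbox_loss(x):
    """Placeholder: a query-only objective (no gradients available)."""
    return np.sum((x - 3.0) ** 2)

def zeroth_order_grad(f, x, num_dirs=20, eps=1e-3):
    """Estimate grad f(x) with forward differences along random unit
    directions; the x.size/num_dirs factor makes the estimate unbiased."""
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.shape)
        u /= np.linalg.norm(u)
        grad += (f(x + eps * u) - f(x)) / eps * u
    return grad * (x.size / num_dirs)

x = np.zeros(5)
for _ in range(200):
    x -= 0.05 * zeroth_order_grad(blackbox_loss, x)   # query-only descent
print(x)   # approaches the optimum [3, 3, 3, 3, 3]
```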

16 citations



Journal ArticleDOI
TL;DR: In this article, the effect of error on quantum programs is analyzed and it is shown that error can have a significant impact on the behavior of quantum programs. But the analysis is limited to quantum systems.
Abstract: When a fault occurs in a computational system, it is of interest what effects it has. If it is known what can go wrong, it may also be known how to mitigate or correct for it. While complex, it is possible to obtain this information with rigorous fault injection in classical systems. This is also desirable for quantum systems, unfortunately it is much more difficult. The exponential information content of quantum systems makes this likely impossible for larger systems. However, analyses on smaller systems may provide valuable insight into the behavior of larger systems. This letter analyzes the effect of error on different parts of quantum programs.
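A tiny statevector example of the kind of analysis the letter performs: inject a bit-flip (Pauli-X) at different points in a two-qubit Bell-state circuit and compare the output distributions. Pure numpy, no quantum SDK assumed; the circuit and fault sites are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
I2 = np.eye(2)
CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]])  # control = q0

def bell_with_fault(fault_site):
    """|00> -> H(q0) -> CNOT -> Bell state, with an X fault on q0
    injected at `fault_site` ('before_h', 'after_h', 'after_cnot', None).
    Returns measurement probabilities over 00,01,10,11 (q0 is the MSB)."""
    state = np.array([1, 0, 0, 0], dtype=complex)
    if fault_site == "before_h":
        state = np.kron(X, I2) @ state
    state = np.kron(H, I2) @ state
    if fault_site == "after_h":
        state = np.kron(X, I2) @ state
    state = CNOT @ state
    if fault_site == "after_cnot":
        state = np.kron(X, I2) @ state
    return np.abs(state) ** 2

for site in (None, "before_h", "after_h", "after_cnot"):
    print(site, np.round(bell_with_fault(site), 2))
```

Running this shows the letter's point in miniature: the same fault leaves the output distribution unchanged at some sites but flips it to the opposite outcomes when injected after the CNOT.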

12 citations


Proceedings Article
01 Jan 2021
TL;DR: Mirage as mentioned in this paper selects eviction candidates randomly from all lines resident in the cache, making it immune to set-conflicts and offering a principled defense against eviction-set discovery and any potential conflict-based attacks.
Abstract: Shared processor caches are vulnerable to conflict-based side-channel attacks, where an attacker can monitor access patterns of a victim by evicting victim cache lines using cache-set conflicts. Recent mitigations propose randomized mapping of addresses to cache lines to obfuscate the locations of set-conflicts. However, these are vulnerable to new attacks that discover conflicting sets of addresses despite such mitigations, because these designs select eviction-candidates from a small set of conflicting lines. This paper presents Mirage, a practical design for a fully associative cache, wherein eviction candidates are selected randomly from all lines resident in the cache, to be immune to set-conflicts. A key challenge for enabling such designs in large shared caches (containing tens of thousands of cache lines) is the complexity of cache-lookup, as a naive design can require searching through all the resident lines. Mirage achieves full-associativity while retaining practical set-associative lookups by decoupling placement and replacement, using pointer-based indirection from tag-store to data-store to allow a newly installed address to globally evict the data of any random resident line. To eliminate set-conflicts, Mirage provisions extra invalid tags in a skewed-associative tag-store design where lines can be installed without set-conflict, along with a load-aware skew-selection policy that guarantees the availability of sets with invalid tags. Our analysis shows Mirage provides the global eviction property of a fully-associative cache throughout system lifetime (violations of full-associativity, i.e. set-conflicts, occur less than once in 10^4 to 10^17 years), thus offering a principled defense against any eviction-set discovery and any potential conflict-based attacks. Mirage incurs limited slowdown (2%) and 17-20% extra storage compared to a non-secure cache.
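A toy sketch of the decoupled tag/data idea: a tag map holds forward pointers into a global data-store, reverse pointers map back, and on a miss the victim is chosen uniformly from all resident lines. The skewed-associative tag-store with extra invalid tags is elided; sizes are illustrative.

```python
import random

class MirageToyCache:
    """Toy model of Mirage's decoupled design. Only global random
    eviction (no set-conflicts) is modeled; the over-provisioned,
    skewed tag-store and load-aware skew selection are elided."""
    def __init__(self, data_entries=8):
        self.capacity = data_entries
        self.tag = {}    # addr -> data-store index (forward pointer)
        self.rev = {}    # data-store index -> addr (reverse pointer)

    def access(self, addr):
        if addr in self.tag:
            return True                              # hit via tag -> data
        free = [i for i in range(self.capacity) if i not in self.rev]
        if free:
            idx = free[0]
        else:
            idx = random.randrange(self.capacity)    # victim chosen from ALL
            del self.tag[self.rev[idx]]              # resident lines, not a set
        self.tag[addr] = idx
        self.rev[idx] = addr
        return False

cache = MirageToyCache()
for addr in range(20):
    cache.access(addr)          # each miss evicts a uniformly random line
print(sorted(cache.tag))        # the 8 survivors are set-agnostic
```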

4 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe the compiler and hardware support required to enable reliable and scalable quantum computers.
Abstract: Quantum computers are domain-specific accelerators that can solve commercially important problems that are beyond the capability of conventional computers. Quantum computing is at an inflection point where small systems with a few dozen qubits have been demonstrated and the number of qubits is expected to increase to several hundred over the coming years. However, the qubit quality is likely to remain quite low making it difficult to execute programs reliably. Furthermore, scaling the designs to a large number of qubits would require specialized hardware that operates at cryogenic conditions. This article describes the compiler and hardware support for enabling reliable and scalable quantum computers.

3 citations


DOI
01 Sep 2021
TL;DR: BCE as discussed by the authors is a secure cache-partitioning substrate that is scalable in supporting hundreds of isolated cache partitions and flexible in allocating cache space independent of memory allocations, unlike page-coloring, which requires coupled DRAM and LLC allocations in the same ratio.
Abstract: Cache partitioning is a principled defense against side-channel attacks on shared last-level caches (LLCs). Such defenses allocate isolated cache regions to distrusting applications and prevent a spy from monitoring the cache accesses of a victim. But current solutions have severe practical limitations. Way-partitioning is not scalable as the number of partitions is limited by cache associativity and page-coloring is inflexible as it requires coupled DRAM and LLC allocations in the same ratio. For cache partitioning to be practical, we need a scheme that can scale to a large number of fine-grained partitions and places no restrictions on DRAM allocations. This paper proposes Bespoke Cache Enclaves (BCE), a secure cache partitioning substrate that is scalable in supporting hundreds of isolated cache partitions and is flexible in allocating cache space independent of memory allocations. BCE allocates cache space at the granularity of a cluster, a group of a few sets (e.g., 64 KB in size). The key insight of BCE is a configurable cache indexing function (determining the line to set mapping) that guides cache lines of a domain to only the allocated cache sets, enabling flexible set-partitioning independent of memory allocations. BCE achieves this by modifying the cache indexing hardware to include a Cluster-Indirection Module (CIM), which maps logical-to-physical clusters of a domain and a Load-Balancing Hash (LBH), which uniformly distributes lines of a domain among its clusters. Our implementation of BCE with a 32MB 16-way LLC scalably supports up to 512 isolated partitions while incurring negligible storage overheads (<2%) and slowdown (1% on average) compared to a non-secure unpartitioned LLC.
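A sketch of the BCE-style indexing path: a load-balancing hash spreads a domain's lines uniformly across its logical clusters, and a per-domain cluster-indirection table maps each logical cluster to one of the physical clusters the domain was allocated. The hash choice, cluster size, and address-bit split below are illustrative assumptions.

```python
import hashlib

SETS_PER_CLUSTER = 64   # a cluster = a small group of contiguous sets

def lbh(addr, num_clusters, domain_id):
    """Load-Balancing Hash: uniformly pick one of the domain's logical
    clusters (keyed by domain so access patterns don't transfer)."""
    h = hashlib.sha256(f"{domain_id}:{addr // SETS_PER_CLUSTER}".encode())
    return int.from_bytes(h.digest()[:8], "little") % num_clusters

def cache_set(addr, domain_id, cim_table):
    """Cluster-Indirection Module: logical cluster -> allocated physical
    cluster, then the address's low bits pick the set within it."""
    logical = lbh(addr, len(cim_table), domain_id)
    physical = cim_table[logical]
    return physical * SETS_PER_CLUSTER + (addr % SETS_PER_CLUSTER)

# domain 7 owns physical clusters 3 and 12 of the LLC (illustrative)
cim = [3, 12]
for addr in (0x1000, 0x2040, 0x31c0):
    print(hex(addr), "->", cache_set(addr, domain_id=7, cim_table=cim))
```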

Proceedings Article
03 May 2021
TL;DR: Ensemble of Diverse Models (EDM) as mentioned in this paper is proposed to defend against model stealing attacks by using a different member of the ensemble to service different queries, producing predictions that are highly discontinuous in the input space for the adversary's OOD queries.
Abstract: Several recent works have demonstrated highly effective model stealing (MS) attacks on Deep Neural Networks (DNNs) in black-box settings, even when the training data is unavailable. These attacks typically use some form of Out of Distribution (OOD) data to query the target model and use the predictions obtained to train a clone model. Such a clone model learns to approximate the decision boundary of the target model, achieving high accuracy on in-distribution examples. We propose Ensemble of Diverse Models (EDM) to defend against such MS attacks. EDM is made up of models that are trained to produce dissimilar predictions for OOD inputs. By using a different member of the ensemble to service different queries, our defense produces predictions that are highly discontinuous in the input space for the adversary's OOD queries. Such discontinuities cause the clone model trained on these predictions to have poor generalization on in-distribution examples. Our evaluations on several image classification tasks demonstrate that EDM defense can severely degrade the accuracy of clone models (up to 39.7%). Our defense has minimal impact on the target accuracy, negligible computational costs during inference, and is compatible with existing defenses for MS attacks.
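The serving-side mechanism can be sketched in a few lines: deterministically hash each query to pick an ensemble member, so repeated queries get consistent answers while nearby OOD queries can land on different members and see discontinuous predictions. The linear "members" below are stand-ins for trained DNNs.

```python
import hashlib
import numpy as np

def make_member(seed):
    """Stand-in for a trained DNN: a fixed random linear scorer."""
    w = np.random.default_rng(seed).normal(size=(10, 3))
    return lambda x: int(np.argmax(x @ w))

ensemble = [make_member(s) for s in range(5)]

def edm_predict(x):
    """Route the query to one member via a hash of the input bytes; the
    same input always hits the same member (consistent answers), but
    perturbed inputs may hit a different one (discontinuous boundary)."""
    digest = hashlib.sha256(x.tobytes()).digest()
    return ensemble[digest[0] % len(ensemble)](x)

x = np.random.default_rng(0).normal(size=10)
print(edm_predict(x), edm_predict(x))   # deterministic on repeat queries
print(edm_predict(x + 1e-3))            # a nearby query may switch members
```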

Journal ArticleDOI
TL;DR: Smart Quantization (SmaQ) as mentioned in this paper computes the sampled mean and standard deviation of tensors and quantizes each tensor element to 6 or 8 bits based on the z-score of that value.
Abstract: Advancements in modern deep learning have shown that deeper networks with larger datasets can achieve state-of-the-art results in many different tasks. As networks become deeper, the memory requirement of neural network training proves to be the primary bottleneck of single-machine training. In this letter, we first study the characteristics of neural network weight, gradient, feature map, gradient map, and optimizer state distributions for some popular neural network architectures. Our investigation shows that the value distributions of most data structures used by neural networks can be approximated with normal distributions. We then introduce Smart Quantization (SmaQ), a quantization scheme that exploits this observed normality to quantize the data structures. Our dynamic quantization method calculates the sampled mean and standard deviation of tensors and quantizes each tensor element to 6 or 8 bits based on the z-score of that value. Our scheme reduces the memory usage during training by up to 6.7x with minor losses in accuracy.
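A numpy sketch of the z-score quantization idea: normalize a tensor by its sampled mean and standard deviation, clamp the z-scores, and store them as small integers plus two floats. The paper assigns 6 or 8 bits per element by z-score; this sketch simplifies to a single bit-width per tensor, and the clamp range is an illustrative choice.

```python
import numpy as np

def smaq_quantize(t, bits=8, z_clip=4.0):
    """Quantize by z-score: store (mean, std, integer codes)."""
    mean, std = t.mean(), t.std() + 1e-12
    z = np.clip((t - mean) / std, -z_clip, z_clip)
    levels = 2 ** bits - 1
    codes = np.round((z + z_clip) / (2 * z_clip) * levels).astype(np.uint8)
    return mean, std, codes

def smaq_dequantize(mean, std, codes, bits=8, z_clip=4.0):
    levels = 2 ** bits - 1
    z = codes.astype(np.float32) / levels * (2 * z_clip) - z_clip
    return z * std + mean

t = np.random.default_rng(0).normal(2.0, 0.5, size=1000).astype(np.float32)
m, s, c = smaq_quantize(t)              # one width per tensor, for simplicity
rt = smaq_dequantize(m, s, c)
print(f"max reconstruction error: {np.abs(rt - t).max():.4f}")
```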

Posted Content
TL;DR: In this article, an ensemble of quantum machine instructions (QMIs) is generated by adding controlled perturbations to the program QMI to steer the program away from encountering the same bias during all trials.
Abstract: Quantum computing is an information processing paradigm that uses quantum-mechanical properties to speed up computationally hard problems. Although promising, existing gate-based quantum computers consist of only a few dozen qubits and are not large enough for most applications. On the other hand, existing quantum annealers (QAs) with a few thousand qubits have the potential to solve some domain-specific optimization problems. QAs are single-instruction machines: to execute a program, the problem is cast to a Hamiltonian, embedded on the hardware, and a single quantum machine instruction (QMI) is run. Unfortunately, noise and imperfections in hardware result in sub-optimal solutions on QAs even if the QMI is run for thousands of trials. The limited programmability of QAs means that the user executes the same QMI for all trials. This subjects all trials to a similar noise profile throughout the execution, resulting in a systematic bias. We observe that systematic bias leads to sub-optimal solutions and cannot be alleviated by executing more trials or using existing error-mitigation schemes. To address this challenge, we propose EQUAL (Ensemble Quantum Annealing). EQUAL generates an ensemble of QMIs by adding controlled perturbations to the program QMI. When executed on the QA, the ensemble of QMIs steers the program away from encountering the same bias during all trials and thus improves the quality of solutions. Our evaluations using the 2041-qubit D-Wave QA show that EQUAL bridges the difference between the baseline and the ideal by an average of 14% (and up to 26%), without requiring any additional trials. EQUAL can be combined with existing error mitigation schemes to further bridge the difference between the baseline and ideal by an average of 55% (and up to 68%).
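A sketch of the ensemble idea: perturb the problem Hamiltonian's coefficients slightly for each QMI, split the trial budget across the perturbed copies, and score every returned sample under the original problem. `run_annealer` is a hypothetical stand-in for a D-Wave-style sampler (here it just scores random spin assignments so the sketch runs).

```python
import random

def run_annealer(h, J, num_trials):
    """Hypothetical QA stand-in: return (solution, energy) samples for the
    Ising problem (h, J); faked with random spins for illustration."""
    n = len(h)
    samples = []
    for _ in range(num_trials):
        s = [random.choice((-1, 1)) for _ in range(n)]
        e = sum(h[i] * s[i] for i in range(n)) + \
            sum(J[(i, j)] * s[i] * s[j] for (i, j) in J)
        samples.append((s, e))
    return samples

def equal(h, J, num_qmis=5, trials=100, sigma=0.05):
    """Ensemble of perturbed QMIs so no single noise bias dominates all
    trials; candidates are evaluated under the ORIGINAL problem."""
    best = None
    for _ in range(num_qmis):
        hp = {i: v + random.gauss(0, sigma) for i, v in h.items()}
        Jp = {k: v + random.gauss(0, sigma) for k, v in J.items()}
        for s, _ in run_annealer(hp, Jp, trials // num_qmis):
            e = sum(h[i] * s[i] for i in h) + \
                sum(J[k] * s[k[0]] * s[k[1]] for k in J)
            if best is None or e < best[1]:
                best = (s, e)
    return best

h = {0: 0.5, 1: -0.3, 2: 0.1}
J = {(0, 1): -1.0, (1, 2): 0.4}
print(equal(h, J))
```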

Posted Content
TL;DR: In this paper, the authors propose Adaptive Noise Injection (ANI), which uses a light-weight DNN on the client side to inject noise into each input before transmitting it to the service provider for inference.
Abstract: User-facing software services are becoming increasingly reliant on remote servers to host Deep Neural Network (DNN) models, which perform inference tasks for the clients. Such services require the client to send input data to the service provider, who processes it using a DNN and returns the output predictions to the client. Due to the rich nature of the inputs, such as images and speech, the input often contains more information than what is necessary to perform the primary inference task. Consequently, in addition to the primary inference task, a malicious service provider could infer secondary (sensitive) attributes from the input, compromising the client's privacy. The goal of our work is to improve inference privacy by injecting noise into the input to hide the irrelevant features that are not conducive to the primary classification task. To this end, we propose Adaptive Noise Injection (ANI), which uses a light-weight DNN on the client-side to inject noise into each input before transmitting it to the service provider to perform inference. Our key insight is that by customizing the noise to each input, we can achieve a state-of-the-art trade-off between utility and privacy (up to 48.5% degradation in sensitive-task accuracy with <1% degradation in primary accuracy), significantly outperforming existing noise injection schemes. Our method does not require prior knowledge of the sensitive attributes and incurs minimal computational overheads.
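A minimal sketch of the client-side path: a small network maps each input to an input-specific noise pattern, and only the noised input ever leaves the client. The tiny fixed linear "noise net" and the toy server classifier below are stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W_noise = rng.normal(scale=0.1, size=(16, 16))   # stand-in light-weight noise net

def inject_noise(x):
    """Client side: compute noise customized to this input and add it
    before upload, masking features irrelevant to the primary task."""
    noise = np.tanh(x @ W_noise)
    return x + noise

def remote_inference(x_noised):
    """Server side placeholder: the provider only ever sees x_noised."""
    return int(np.argmax(x_noised[:4]))          # pretend primary classifier

x = rng.normal(size=16)                          # raw input stays on the client
print(remote_inference(inject_noise(x)))
```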

Posted Content
TL;DR: Gradient Inversion Attack (GIA) as discussed by the authors is a label leakage attack that allows an adversarial input owner to learn the label owner's private labels by exploiting the gradient information obtained during split learning.
Abstract: Split learning is a popular technique used to perform vertical federated learning, where the goal is to jointly train a model on the private input and label data held by two parties. To preserve privacy of the input and label data, this technique uses a split model and only requires the exchange of intermediate representations (IR) of the inputs and gradients of the IR between the two parties during the learning process. In this paper, we propose Gradient Inversion Attack (GIA), a label leakage attack that allows an adversarial input owner to learn the label owner's private labels by exploiting the gradient information obtained during split learning. GIA frames the label leakage attack as a supervised learning problem by developing a novel loss function using certain key properties of the dataset and models. Our attack can uncover the private label data on several multi-class image classification problems and a binary conversion prediction task with near-perfect accuracy (97.01% - 99.96%), demonstrating that split learning provides negligible privacy benefits to the label owner. Furthermore, we evaluate the use of gradient noise to defend against GIA. While this technique is effective for simpler datasets, it significantly degrades utility for datasets with higher input dimensionality. Our findings underscore the need for better privacy-preserving training techniques for vertically split data.
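GIA itself learns the gradient-to-label mapping with a novel loss; the toy below shows only the underlying leakage in the simplest possible case, a split placed just before the softmax, where the sign pattern of the returned gradient directly reveals the label. This is a simplified illustration, not the paper's attack.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def label_owner_backward(logits, true_label):
    """Label owner: compute cross-entropy on the received representation
    and return its gradient. For softmax + cross-entropy this is
    softmax(logits) - one_hot(true_label)."""
    g = softmax(logits)
    g[true_label] -= 1.0
    return g

def infer_label(grad):
    """Adversarial input owner: the only negative entry marks the label."""
    return int(np.argmin(grad))

logits = rng.normal(size=10)       # IR at a split just before the softmax
secret = 7
leaked = infer_label(label_owner_backward(logits, secret))
print(leaked == secret)            # True: the gradient alone leaks the label
```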

Patent
04 Mar 2021
TL;DR: A quantum computing method is described that includes receiving program instructions, executing them on a quantum machine for a plurality of trials, combining the results of the trials, and determining a program result from the combined results.
Abstract: A quantum computing method including: receiving program instructions; executing the program instructions on a quantum machine for a plurality of trials to generate a plurality of results; combining the plurality of results of the execution for each of the plurality of trials; and determining a program result of the execution based on the combination of the plurality of results.
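A minimal sketch of the claimed flow, with `execute_once` as a hypothetical stand-in for one trial on a quantum machine: run many trials, combine the per-trial outcomes into a histogram, and report the most frequent outcome as the program result.

```python
from collections import Counter
import random

def execute_once(program):
    """Hypothetical stand-in for one trial on a quantum machine; returns a
    bitstring outcome (noisy: the 'right' answer appears most often)."""
    return random.choices(["101", "010", "111"], weights=[6, 2, 2])[0]

def run_program(program, trials=1000):
    results = Counter(execute_once(program) for _ in range(trials))  # combine
    outcome, _ = results.most_common(1)[0]     # determine the program result
    return outcome, results

answer, hist = run_program("my_qasm_program")
print(answer, dict(hist))
```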