Author

Nandakishore Santhi

Bio: Nandakishore Santhi is an academic researcher at Los Alamos National Laboratory. The author has contributed to research topics including memory models and CUDA. The author has an h-index of 13 and has co-authored 53 publications receiving 469 citations. Previous affiliations of Nandakishore Santhi include the University of California, San Diego.


Papers
Posted Content
TL;DR: This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical principles optional.
Abstract: As quantum computers become available to the general public, the need has arisen to train a cohort of quantum programmers, many of whom have been developing classical computer programs for most of their careers. While currently available quantum computers have less than 100 qubits, quantum computing hardware is widely expected to grow in terms of qubit count, quality, and connectivity. This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical principles optional. We give an introduction to quantum computing algorithms and their implementation on real quantum hardware. We survey 20 different quantum algorithms, attempting to describe each in a succinct and self-contained fashion. We show how these algorithms can be implemented on IBM's quantum computer, and in each case, we discuss the results of the implementation with respect to differences between the simulator and the actual hardware runs. This article introduces computer scientists, physicists, and engineers to quantum algorithms and provides a blueprint for their implementations.
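To give a flavor of the kind of program the review describes, below is a minimal sketch (not taken from the paper) that prepares a two-qubit Bell state with Qiskit, the Python SDK for IBM's quantum computers. It assumes Qiskit is installed and inspects the exact state on a simulator rather than submitting a measured circuit as a job to real hardware.

```python
# Minimal sketch: prepare and inspect a two-qubit Bell state with Qiskit.
# Assumes Qiskit is installed (pip install qiskit); illustrative only.
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

# Hadamard on qubit 0, then CNOT entangles qubits 0 and 1.
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

# On a simulator we can look at the exact state; on real hardware we would
# instead append measurements and submit the circuit as a job.
state = Statevector.from_instruction(qc)
print(state)  # amplitudes 1/sqrt(2) on |00> and |11>
```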

173 citations

Journal ArticleDOI
TL;DR: This paper presents PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications on different GPU architectures in a fast and accurate manner.
Abstract: Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications on different GPU architectures in a fast and accurate manner. PPT-GPU is part of the open-source Performance Prediction Toolkit (PPT) developed at Los Alamos National Laboratory. We extend the earlier GPU model in PPT, which predicts the runtimes of computational physics codes, to offer better prediction accuracy; to this end, we add models for the different memory hierarchies found in GPUs and latencies for the different instructions. To further show the utility of PPT-GPU, we compare our model against real GPU devices and the widely used cycle-accurate simulator GPGPU-Sim, using different workloads from the Rodinia and Parboil benchmarks. The results indicate that the performance predicted by PPT-GPU is within 10 percent error compared to the real devices. In addition, PPT-GPU is highly scalable, being up to 450x faster than GPGPU-Sim while producing more accurate results.
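As a rough illustration of the analytical-modeling idea (a hypothetical sketch only, not PPT-GPU's actual API or calibrated latency data), a runtime prediction of this kind combines per-instruction latencies, per-level memory access latencies, and the degree of latency hiding available from resident warps:

```python
# Hypothetical sketch of an analytical GPU runtime model; all latency numbers
# and the model structure below are illustrative, not PPT-GPU's.
ALU_LATENCY_CYCLES = {"fadd": 4, "fmul": 4, "ffma": 4, "iadd": 4}
MEM_LATENCY_CYCLES = {"l1_hit": 30, "l2_hit": 200, "dram": 400}

def predict_kernel_cycles(inst_counts, mem_accesses, warps_in_flight):
    """Roughly estimate kernel cycles from per-warp instruction/memory counts.

    inst_counts:     dict opcode -> dynamic count per warp
    mem_accesses:    dict memory level -> access count per warp
    warps_in_flight: average number of warps available to hide latency
    """
    compute = sum(ALU_LATENCY_CYCLES[op] * n for op, n in inst_counts.items())
    memory = sum(MEM_LATENCY_CYCLES[lvl] * n for lvl, n in mem_accesses.items())
    # Latency hiding: more resident warps overlap more of the memory stall time.
    return compute + memory / max(1, warps_in_flight)

print(predict_kernel_cycles({"ffma": 1000, "iadd": 200},
                            {"l1_hit": 80, "dram": 20},
                            warps_in_flight=16))
```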

31 citations

Posted Content
TL;DR: This paper considers the problem of estimating the probability of error in multi-hypothesis testing when the MAP criterion is used, and proves a lower bound on equivocation valid for most random codes over memoryless channels.
Abstract: We consider the problem of estimating the probability of error in multi-hypothesis testing when the MAP criterion is used. This probability, also known as the Bayes risk, is an important measure in many communication and information theory problems. In general, the exact Bayes risk can be difficult to obtain. Many upper and lower bounds are known in the literature. One such upper bound is the equivocation bound due to Rényi, which is of great philosophical interest because it connects the Bayes risk to conditional entropy. Here we give a simple derivation of an improved equivocation bound. We then give some typical examples of problems where these bounds can be of use. We first consider a binary hypothesis testing problem for which the exact Bayes risk is difficult to derive. In such problems, bounds are of interest. Furthermore, using the bounds on the Bayes risk derived in the paper and a random coding argument, we prove a lower bound on equivocation valid for most random codes over memoryless channels.
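For a concrete, self-contained illustration (not drawn from the paper), the snippet below computes the exact MAP Bayes risk and the equivocation H(X|Y) for a small binary hypothesis test with a discrete observation; the priors and likelihoods are made-up numbers used purely for demonstration.

```python
# Illustrative only: exact MAP Bayes risk and equivocation H(X|Y) for a toy
# binary hypothesis test with a three-valued observation.
import numpy as np

priors = np.array([0.6, 0.4])                      # P(X = 0), P(X = 1)
likelihoods = np.array([[0.7, 0.2, 0.1],           # P(Y = y | X = 0)
                        [0.1, 0.3, 0.6]])          # P(Y = y | X = 1)

joint = priors[:, None] * likelihoods              # P(X = x, Y = y)
p_y = joint.sum(axis=0)                            # P(Y = y)

# MAP decoding picks argmax_x P(x, y); the Bayes risk is the mass it misses.
bayes_risk = 1.0 - joint.max(axis=0).sum()

# Equivocation H(X|Y) = -sum_{x,y} P(x, y) log2 P(x | y)
posterior = joint / p_y
equivocation = -(joint * np.log2(posterior)).sum()

print(f"Bayes risk P_e = {bayes_risk:.4f}, equivocation H(X|Y) = {equivocation:.4f} bits")
```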

28 citations

Proceedings ArticleDOI
06 Dec 2015
TL;DR: Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++.
Abstract: We introduce Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written in Lua and Python. Simian reaps the benefits of interpreted languages (ease of use, fast development time, enhanced readability, and a high degree of portability across platforms) and, through the optional use of Just-In-Time (JIT) compilation, achieves performance comparable with state-of-the-art PDES engines implemented in compiled languages such as C or C++. This paper describes the main design concepts of Simian and presents a benchmark performance study comparing four Simian implementations (written in Python and Lua, with and without JIT) against a traditionally compiled simulator, MiniSSF, written in C++. Our experiments show that Simian in Lua with JIT outperforms MiniSSF, sometimes by a factor of three under high computational workloads.
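For readers unfamiliar with discrete event simulation, the sketch below shows a generic, minimal event-driven engine in Python. It is illustrative only and is not Simian's actual API; the names and structure are invented for this example.

```python
# Generic minimal discrete-event engine sketch (not Simian's API).
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._queue = []          # min-heap of (time, sequence, handler, payload)
        self._seq = 0             # tie-breaker so equal-time events stay ordered

    def schedule(self, delay, handler, payload=None):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, payload))
        self._seq += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)

def ping(sim, hop):
    print(f"t={sim.now:.1f}  ping hop {hop}")
    if hop < 3:
        sim.schedule(1.5, ping, hop + 1)  # schedule the next hop

sim = Simulator()
sim.schedule(0.0, ping, 0)
sim.run(until=10.0)
```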

23 citations

Proceedings ArticleDOI
01 Sep 2019
TL;DR: A very low-overhead and portable analysis is introduced for exposing, at the micro-architecture level, the latency of each instruction executing in the GPU pipeline(s) and the access overhead of the various memory hierarchies found in GPUs.
Abstract: The last decade has seen a shift in the computer systems industry, with heterogeneous computing becoming prevalent. Graphics Processing Units (GPUs) are now present in everything from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPU) to boost the performance of compute-intensive applications. However, many micro-architectural characteristics remain undisclosed beyond what vendors provide. In this paper, we introduce a very low-overhead and portable analysis for exposing, at the micro-architecture level, the latency of each instruction executing in the GPU pipeline(s) and the access overhead of the various memory hierarchies found in GPUs. Furthermore, we show the impact of the various optimizations the CUDA compiler can perform on these latencies. We perform our evaluation on seven different high-end NVIDIA GPUs from five different generations/architectures: Kepler, Maxwell, Pascal, Volta, and Turing. The results in this paper can help architects obtain an accurate characterization of the latencies of these GPUs, which will help in modeling the hardware accurately. Also, software developers can perform informed optimizations of their applications.
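As a hint of how such memory latencies can be exposed (an illustrative sketch only, not the paper's methodology or tool), the snippet below uses CuPy's RawKernel to run a classic pointer-chasing loop on a single GPU thread, timing a chain of dependent loads with the on-chip clock64() counter. The array size, stride, and iteration count are arbitrary choices, and running it requires an NVIDIA GPU with CuPy installed.

```python
# Illustrative pointer-chase latency sketch; numbers and naming are arbitrary.
import cupy as cp

KERNEL = r"""
extern "C" __global__
void pointer_chase(const unsigned int *chain, unsigned long long *out, int iters) {
    unsigned int j = 0;
    unsigned long long start = clock64();
    for (int i = 0; i < iters; ++i) {
        j = chain[j];                 // each load depends on the previous one
    }
    unsigned long long stop = clock64();
    out[0] = stop - start;            // total cycles (includes some loop overhead)
    out[1] = j;                       // keep the chain live so it is not optimized away
}
"""

n = 1 << 20                           # 4 MiB of 32-bit indices: likely beyond L1
stride = 64                           # elements between consecutive hops
chain = ((cp.arange(n, dtype=cp.uint32) + stride) % n).astype(cp.uint32)
out = cp.zeros(2, dtype=cp.uint64)
iters = 10000

kernel = cp.RawKernel(KERNEL, "pointer_chase")
kernel((1,), (1,), (chain, out, cp.int32(iters)))   # one block, one thread
cycles = int(out.get()[0])
print(f"~{cycles / iters:.1f} cycles per dependent load for this footprint")
```

Varying the array footprint moves the chase between the different cache levels and DRAM, which is how the average latency of each level of the memory hierarchy can be teased apart.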

21 citations


Cited by
Journal ArticleDOI
TL;DR: An overview of the field of Variational Quantum Algorithms is presented and strategies to overcome their challenges as well as the exciting prospects for using them as a means to obtain quantum advantage are discussed.
Abstract: Applications such as simulating complicated quantum systems or solving large-scale linear algebra problems are very challenging for classical computers due to the extremely high computational cost. Quantum computers promise a solution, although fault-tolerant quantum computers will likely not be available in the near future. Current quantum devices have serious constraints, including limited numbers of qubits and noise processes that limit circuit depth. Variational Quantum Algorithms (VQAs), which use a classical optimizer to train a parametrized quantum circuit, have emerged as a leading strategy to address these constraints. VQAs have now been proposed for essentially all applications that researchers have envisioned for quantum computers, and they appear to be the best hope for obtaining quantum advantage. Nevertheless, challenges remain, including the trainability, accuracy, and efficiency of VQAs. Here we overview the field of VQAs, discuss strategies to overcome their challenges, and highlight the exciting prospects for using them to obtain quantum advantage.
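As a toy illustration of the VQA loop (illustrative only, not an example from the review), the snippet below simulates a one-parameter ansatz Ry(theta)|0> with NumPy and lets a classical optimizer minimize the expectation value of Z; in a real VQA the cost would be estimated on a quantum device rather than computed exactly.

```python
# Toy VQA loop: classical optimizer trains a one-parameter "circuit" simulated in NumPy.
import numpy as np
from scipy.optimize import minimize

Z = np.array([[1.0, 0.0], [0.0, -1.0]])      # observable to minimize

def ansatz_state(theta):
    """State prepared by Ry(theta) acting on |0>."""
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

def cost(params):
    psi = ansatz_state(params[0])
    return float(psi @ Z @ psi)               # <psi| Z |psi>

result = minimize(cost, x0=[0.1], method="COBYLA")
print(f"optimal theta = {result.x[0]:.3f}, energy = {result.fun:.3f}")  # ~pi, ~-1
```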

842 citations

Book ChapterDOI
01 Jan 1993
TL;DR: The paper is presented in two parts: the first, appearing here, summarizes the major results and treats the case of high transmission rates in detail; the second, to appear in the subsequent issue, treats the case of low transmission rates.
Abstract: New lower bounds are presented for the minimum error probability that can be achieved through the use of block coding on noisy discrete memoryless channels. Like previous upper bounds, these lower bounds decrease exponentially with the block length N. The coefficient of N in the exponent is a convex function of the rate. From a certain rate of transmission up to channel capacity, the exponents of the upper and lower bounds coincide. Below this particular rate, the exponents of the upper and lower bounds differ, although they approach the same limit as the rate approaches zero. Examples are given and various incidental results and techniques relating to coding theory are developed. The paper is presented in two parts: the first, appearing here, summarizes the major results and treats the case of high transmission rates in detail; the second, to appear in the subsequent issue, treats the case of low transmission rates.
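A short numerical sketch of the exponents involved (illustrative only, not reproduced from the paper): for a binary symmetric channel, Gallager's E_0 function yields both the random-coding exponent E_r(R), which governs the exponential upper bound 2^(-N E_r(R)), and the sphere-packing exponent E_sp(R) from the corresponding lower bound; evaluating both on a grid shows the two coinciding above a critical rate and separating below it. The crossover probability and the sample rates below are arbitrary choices.

```python
# Illustrative: random-coding vs sphere-packing exponents for a BSC (base-2 logs).
import numpy as np

def E0(rho, Q, P):
    """Gallager's E_0 function."""
    s = 1.0 / (1.0 + rho)
    inner = (Q[:, None] * P ** s).sum(axis=0)          # sum_x Q(x) P(y|x)^s
    return -np.log2((inner ** (1.0 + rho)).sum())

p = 0.1                                                # BSC crossover probability
P = np.array([[1 - p, p], [p, 1 - p]])                 # P[x, y] = P(y | x)
Q = np.array([0.5, 0.5])                               # uniform input is optimal for the BSC

rhos_r = np.linspace(0.0, 1.0, 2001)                   # random coding: 0 <= rho <= 1
rhos_sp = np.linspace(0.0, 20.0, 8001)                 # sphere packing: rho >= 0

for R in (0.05, 0.20, 0.35, 0.50):
    Er = max(E0(r, Q, P) - r * R for r in rhos_r)
    Esp = max(E0(r, Q, P) - r * R for r in rhos_sp)
    print(f"R = {R:.2f}   E_r = {Er:.4f}   E_sp = {Esp:.4f}")
```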

247 citations

Journal ArticleDOI
TL;DR: In this paper, the stopping redundancy of a code C is defined as the minimum number of rows in a parity-check matrix H for C such that the corresponding stopping distance s(H) attains its largest possible value.
Abstract: It is now well known that the performance of a linear code C under iterative decoding on a binary erasure channel (and other channels) is determined by the size of the smallest stopping set in the Tanner graph for C. Several recent papers refer to this parameter as the stopping distance s of C. This is somewhat of a misnomer since the size of the smallest stopping set in the Tanner graph for C depends on the corresponding choice of a parity-check matrix. It is easy to see that s ≤ d, where d is the minimum Hamming distance of C, and we show that it is always possible to choose a parity-check matrix for C (with sufficiently many dependent rows) such that s = d. We thus introduce a new parameter, the stopping redundancy of C, defined as the minimum number of rows in a parity-check matrix H for C such that the corresponding stopping distance s(H) attains its largest possible value, namely, s(H) = d. We then derive general bounds on the stopping redundancy of linear codes. We also examine several simple ways of constructing codes from other codes, and study the effect of these constructions on the stopping redundancy. Specifically, for the family of binary Reed-Muller codes (of all orders), we prove that their stopping redundancy is at most a constant times their conventional redundancy. We show that the stopping redundancies of the binary and ternary extended Golay codes are at most 34 and 22, respectively. Finally, we provide upper and lower bounds on the stopping redundancy of MDS codes.
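To make the stopping-set notion concrete (an illustrative brute-force sketch, not taken from the paper), the snippet below computes the stopping distance s(H) of a small parity-check matrix by searching for the smallest nonempty set of columns such that no row of H has exactly one nonzero entry within the set; the example matrix is a standard parity-check matrix for the [7,4] Hamming code, chosen only for illustration.

```python
# Brute-force stopping distance of a small parity-check matrix (illustrative only).
from itertools import combinations
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def stopping_distance(H):
    n = H.shape[1]
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            row_weights = H[:, list(cols)].sum(axis=1)
            if not np.any(row_weights == 1):       # no check sees exactly one variable
                return size, cols                  # smallest stopping set found
    return None

print(stopping_distance(H))   # size of the smallest stopping set for this H, and the set
```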

207 citations