Author

Nandakishore Santhi

Bio: Nandakishore Santhi is an academic researcher at Los Alamos National Laboratory. The author has contributed to research topics including memory models and CUDA. The author has an h-index of 13 and has co-authored 53 publications receiving 469 citations. Previous affiliations of Nandakishore Santhi include the University of California, San Diego.


Papers
Posted Content
TL;DR: This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical principles optional.
Abstract: As quantum computers become available to the general public, the need has arisen to train a cohort of quantum programmers, many of whom have been developing classical computer programs for most of their careers. While currently available quantum computers have less than 100 qubits, quantum computing hardware is widely expected to grow in terms of qubit count, quality, and connectivity. This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical principles optional. We give an introduction to quantum computing algorithms and their implementation on real quantum hardware. We survey 20 different quantum algorithms, attempting to describe each in a succinct and self-contained fashion. We show how these algorithms can be implemented on IBM's quantum computer, and in each case, we discuss the results of the implementation with respect to differences between the simulator and the actual hardware runs. This article introduces computer scientists, physicists, and engineers to quantum algorithms and provides a blueprint for their implementations.
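To give a flavor of the kind of program the review describes, below is a minimal sketch (not taken from the paper) that prepares a two-qubit Bell state with Qiskit, the Python SDK for IBM's quantum computers. It assumes Qiskit is installed and inspects the exact state on a simulator rather than submitting a measured circuit as a job to real hardware.

```python
# Minimal sketch: prepare and inspect a two-qubit Bell state with Qiskit.
# Assumes Qiskit is installed (pip install qiskit); illustrative only.
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

# Hadamard on qubit 0, then CNOT entangles qubits 0 and 1.
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

# On a simulator we can look at the exact state; on real hardware we would
# instead append measurements and submit the circuit as a job.
state = Statevector.from_instruction(qc)
print(state)  # amplitudes 1/sqrt(2) on |00> and |11>
```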

173 citations

Journal ArticleDOI
TL;DR: This paper presents PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications on different GPU architectures in a fast and accurate manner.
Abstract: Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications on different GPU architectures in a fast and accurate manner. PPT-GPU is part of the open-source Performance Prediction Toolkit (PPT) developed at Los Alamos National Laboratory. We extend the earlier GPU model in PPT, which predicts the runtimes of computational physics codes, to offer better prediction accuracy; to this end, we add models for the different memory hierarchies found in GPUs and latencies for the different instructions. To further show the utility of PPT-GPU, we compare our model against real GPU devices and the widely used cycle-accurate simulator GPGPU-Sim, using different workloads from the Rodinia and Parboil benchmarks. The results indicate that the performance predicted by PPT-GPU is within 10 percent error compared to the real devices. In addition, PPT-GPU is highly scalable, being up to 450x faster than GPGPU-Sim while producing more accurate results.
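As a rough illustration of the analytical-modeling idea (a hypothetical sketch only, not PPT-GPU's actual API or calibrated latency data), a runtime prediction of this kind combines per-instruction latencies, per-level memory access latencies, and the degree of latency hiding available from resident warps:

```python
# Hypothetical sketch of an analytical GPU runtime model; all latency numbers
# and the model structure below are illustrative, not PPT-GPU's.
ALU_LATENCY_CYCLES = {"fadd": 4, "fmul": 4, "ffma": 4, "iadd": 4}
MEM_LATENCY_CYCLES = {"l1_hit": 30, "l2_hit": 200, "dram": 400}

def predict_kernel_cycles(inst_counts, mem_accesses, warps_in_flight):
    """Roughly estimate kernel cycles from per-warp instruction/memory counts.

    inst_counts:     dict opcode -> dynamic count per warp
    mem_accesses:    dict memory level -> access count per warp
    warps_in_flight: average number of warps available to hide latency
    """
    compute = sum(ALU_LATENCY_CYCLES[op] * n for op, n in inst_counts.items())
    memory = sum(MEM_LATENCY_CYCLES[lvl] * n for lvl, n in mem_accesses.items())
    # Latency hiding: more resident warps overlap more of the memory stall time.
    return compute + memory / max(1, warps_in_flight)

print(predict_kernel_cycles({"ffma": 1000, "iadd": 200},
                            {"l1_hit": 80, "dram": 20},
                            warps_in_flight=16))
```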

31 citations

Posted Content
TL;DR: This paper considers the problem of estimating the probability of error in multi-hypothesis testing when the MAP criterion is used, and proves a lower bound on equivocation valid for most random codes over memoryless channels.
Abstract: We consider the problem of estimating the probability of error in multi-hypothesis testing when the MAP criterion is used. This probability, also known as the Bayes risk, is an important measure in many communication and information theory problems. In general, the exact Bayes risk can be difficult to obtain. Many upper and lower bounds are known in the literature. One such upper bound is the equivocation bound due to Rényi, which is of great philosophical interest because it connects the Bayes risk to conditional entropy. Here we give a simple derivation of an improved equivocation bound. We then give some typical examples of problems where these bounds can be of use. We first consider a binary hypothesis testing problem for which the exact Bayes risk is difficult to derive. In such problems, bounds are of interest. Furthermore, using the bounds on the Bayes risk derived in the paper and a random coding argument, we prove a lower bound on equivocation valid for most random codes over memoryless channels.
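For a concrete, self-contained illustration (not drawn from the paper), the snippet below computes the exact MAP Bayes risk and the equivocation H(X|Y) for a small binary hypothesis test with a discrete observation; the priors and likelihoods are made-up numbers used purely for demonstration.

```python
# Illustrative only: exact MAP Bayes risk and equivocation H(X|Y) for a toy
# binary hypothesis test with a three-valued observation.
import numpy as np

priors = np.array([0.6, 0.4])                      # P(X = 0), P(X = 1)
likelihoods = np.array([[0.7, 0.2, 0.1],           # P(Y = y | X = 0)
                        [0.1, 0.3, 0.6]])          # P(Y = y | X = 1)

joint = priors[:, None] * likelihoods              # P(X = x, Y = y)
p_y = joint.sum(axis=0)                            # P(Y = y)

# MAP decoding picks argmax_x P(x, y); the Bayes risk is the mass it misses.
bayes_risk = 1.0 - joint.max(axis=0).sum()

# Equivocation H(X|Y) = -sum_{x,y} P(x, y) log2 P(x | y)
posterior = joint / p_y
equivocation = -(joint * np.log2(posterior)).sum()

print(f"Bayes risk P_e = {bayes_risk:.4f}, equivocation H(X|Y) = {equivocation:.4f} bits")
```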

28 citations

Proceedings ArticleDOI
06 Dec 2015
TL;DR: Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written using Lua and Python, achieves high performance comparable with the state-of-the-art PDES engines implemented using compiled languages such as C or C++.
Abstract: We introduce Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written in Lua and Python. Simian reaps the benefits of interpreted languages (ease of use, fast development time, enhanced readability, and a high degree of portability across platforms) and, through the optional use of Just-In-Time (JIT) compilation, achieves performance comparable with state-of-the-art PDES engines implemented in compiled languages such as C or C++. This paper describes the main design concepts of Simian and presents a benchmark performance study comparing four Simian implementations (written in Python and Lua, with and without JIT) against a traditionally compiled simulator, MiniSSF, written in C++. Our experiments show that Simian in Lua with JIT outperforms MiniSSF, sometimes by a factor of three under high computational workloads.
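For readers unfamiliar with discrete event simulation, the sketch below shows a generic, minimal event-driven engine in Python. It is illustrative only and is not Simian's actual API; the names and structure are invented for this example.

```python
# Generic minimal discrete-event engine sketch (not Simian's API).
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._queue = []          # min-heap of (time, sequence, handler, payload)
        self._seq = 0             # tie-breaker so equal-time events stay ordered

    def schedule(self, delay, handler, payload=None):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, payload))
        self._seq += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)

def ping(sim, hop):
    print(f"t={sim.now:.1f}  ping hop {hop}")
    if hop < 3:
        sim.schedule(1.5, ping, hop + 1)  # schedule the next hop

sim = Simulator()
sim.schedule(0.0, ping, 0)
sim.run(until=10.0)
```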

23 citations

Proceedings ArticleDOI
01 Sep 2019
TL;DR: A very low-overhead and portable analysis is introduced for exposing, at the micro-architecture level, the latency of each instruction executing in the GPU pipeline(s) and the access overhead of the various memory hierarchies found in GPUs.
Abstract: The last decade has seen a shift in the computer systems industry, with heterogeneous computing becoming prevalent. Graphics Processing Units (GPUs) are now present in everything from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPU) to boost the performance of compute-intensive applications. However, many micro-architectural characteristics remain undisclosed beyond what vendors provide. In this paper, we introduce a very low-overhead and portable analysis for exposing, at the micro-architecture level, the latency of each instruction executing in the GPU pipeline(s) and the access overhead of the various memory hierarchies found in GPUs. Furthermore, we show the impact of the various optimizations the CUDA compiler can perform on these latencies. We perform our evaluation on seven different high-end NVIDIA GPUs from five different generations/architectures: Kepler, Maxwell, Pascal, Volta, and Turing. The results in this paper can help architects obtain an accurate characterization of the latencies of these GPUs, which will help in modeling the hardware accurately. Also, software developers can perform informed optimizations of their applications.
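As a hint of how such memory latencies can be exposed (an illustrative sketch only, not the paper's methodology or tool), the snippet below uses CuPy's RawKernel to run a classic pointer-chasing loop on a single GPU thread, timing a chain of dependent loads with the on-chip clock64() counter. The array size, stride, and iteration count are arbitrary choices, and running it requires an NVIDIA GPU with CuPy installed.

```python
# Illustrative pointer-chase latency sketch; numbers and naming are arbitrary.
import cupy as cp

KERNEL = r"""
extern "C" __global__
void pointer_chase(const unsigned int *chain, unsigned long long *out, int iters) {
    unsigned int j = 0;
    unsigned long long start = clock64();
    for (int i = 0; i < iters; ++i) {
        j = chain[j];                 // each load depends on the previous one
    }
    unsigned long long stop = clock64();
    out[0] = stop - start;            // total cycles (includes some loop overhead)
    out[1] = j;                       // keep the chain live so it is not optimized away
}
"""

n = 1 << 20                           # 4 MiB of 32-bit indices: likely beyond L1
stride = 64                           # elements between consecutive hops
chain = ((cp.arange(n, dtype=cp.uint32) + stride) % n).astype(cp.uint32)
out = cp.zeros(2, dtype=cp.uint64)
iters = 10000

kernel = cp.RawKernel(KERNEL, "pointer_chase")
kernel((1,), (1,), (chain, out, cp.int32(iters)))   # one block, one thread
cycles = int(out.get()[0])
print(f"~{cycles / iters:.1f} cycles per dependent load for this footprint")
```

Varying the array footprint moves the chase between the different cache levels and DRAM, which is how the average latency of each level of the memory hierarchy can be teased apart.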

21 citations


Cited by
Journal ArticleDOI
TL;DR: An overview of the field of Variational Quantum Algorithms is presented and strategies to overcome their challenges as well as the exciting prospects for using them as a means to obtain quantum advantage are discussed.
Abstract: Applications such as simulating complicated quantum systems or solving large-scale linear algebra problems are very challenging for classical computers due to the extremely high computational cost. Quantum computers promise a solution, although fault-tolerant quantum computers will likely not be available in the near future. Current quantum devices have serious constraints, including limited numbers of qubits and noise processes that limit circuit depth. Variational Quantum Algorithms (VQAs), which use a classical optimizer to train a parametrized quantum circuit, have emerged as a leading strategy to address these constraints. VQAs have now been proposed for essentially all applications that researchers have envisioned for quantum computers, and they appear to be the best hope for obtaining quantum advantage. Nevertheless, challenges remain, including the trainability, accuracy, and efficiency of VQAs. Here we overview the field of VQAs, discuss strategies to overcome their challenges, and highlight the exciting prospects for using them to obtain quantum advantage.
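As a toy illustration of the VQA loop (illustrative only, not an example from the review), the snippet below simulates a one-parameter ansatz Ry(theta)|0> with NumPy and lets a classical optimizer minimize the expectation value of Z; in a real VQA the cost would be estimated on a quantum device rather than computed exactly.

```python
# Toy VQA loop: classical optimizer trains a one-parameter "circuit" simulated in NumPy.
import numpy as np
from scipy.optimize import minimize

Z = np.array([[1.0, 0.0], [0.0, -1.0]])      # observable to minimize

def ansatz_state(theta):
    """State prepared by Ry(theta) acting on |0>."""
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

def cost(params):
    psi = ansatz_state(params[0])
    return float(psi @ Z @ psi)               # <psi| Z |psi>

result = minimize(cost, x0=[0.1], method="COBYLA")
print(f"optimal theta = {result.x[0]:.3f}, energy = {result.fun:.3f}")  # ~pi, ~-1
```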

842 citations

Book ChapterDOI
01 Jan 1993
TL;DR: The paper is presented in two parts: the first, appearing here, summarizes the major results and treats the case of high transmission rates in detail; the second, to appear in the subsequent issue, treats the case of low transmission rates.
Abstract: New lower bounds are presented for the minimum error probability that can be achieved through the use of block coding on noisy discrete memoryless channels. Like previous upper bounds, these lower bounds decrease exponentially with the block length N. The coefficient of N in the exponent is a convex function of the rate. From a certain rate of transmission up to channel capacity, the exponents of the upper and lower bounds coincide. Below this particular rate, the exponents of the upper and lower bounds differ, although they approach the same limit as the rate approaches zero. Examples are given and various incidental results and techniques relating to coding theory are developed. The paper is presented in two parts: the first, appearing here, summarizes the major results and treats the case of high transmission rates in detail; the second, to appear in the subsequent issue, treats the case of low transmission rates.
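A short numerical sketch of the exponents involved (illustrative only, not reproduced from the paper): for a binary symmetric channel, Gallager's E_0 function yields both the random-coding exponent E_r(R), which governs the exponential upper bound 2^(-N E_r(R)), and the sphere-packing exponent E_sp(R) from the corresponding lower bound; evaluating both on a grid shows the two coinciding above a critical rate and separating below it. The crossover probability and the sample rates below are arbitrary choices.

```python
# Illustrative: random-coding vs sphere-packing exponents for a BSC (base-2 logs).
import numpy as np

def E0(rho, Q, P):
    """Gallager's E_0 function."""
    s = 1.0 / (1.0 + rho)
    inner = (Q[:, None] * P ** s).sum(axis=0)          # sum_x Q(x) P(y|x)^s
    return -np.log2((inner ** (1.0 + rho)).sum())

p = 0.1                                                # BSC crossover probability
P = np.array([[1 - p, p], [p, 1 - p]])                 # P[x, y] = P(y | x)
Q = np.array([0.5, 0.5])                               # uniform input is optimal for the BSC

rhos_r = np.linspace(0.0, 1.0, 2001)                   # random coding: 0 <= rho <= 1
rhos_sp = np.linspace(0.0, 20.0, 8001)                 # sphere packing: rho >= 0

for R in (0.05, 0.20, 0.35, 0.50):
    Er = max(E0(r, Q, P) - r * R for r in rhos_r)
    Esp = max(E0(r, Q, P) - r * R for r in rhos_sp)
    print(f"R = {R:.2f}   E_r = {Er:.4f}   E_sp = {Esp:.4f}")
```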

247 citations

Journal ArticleDOI
TL;DR: In this paper, the stopping redundancy of a code C is defined as the minimum number of rows in a parity-check matrix H for C such that the corresponding stopping distance s(H) attains its largest possible value.
Abstract: It is now well known that the performance of a linear code C under iterative decoding on a binary erasure channel (and other channels) is determined by the size of the smallest stopping set in the Tanner graph for C. Several recent papers refer to this parameter as the stopping distance s of C. This is somewhat of a misnomer since the size of the smallest stopping set in the Tanner graph for C depends on the corresponding choice of a parity-check matrix. It is easy to see that s ≤ d, where d is the minimum Hamming distance of C, and we show that it is always possible to choose a parity-check matrix for C (with sufficiently many dependent rows) such that s = d. We thus introduce a new parameter, the stopping redundancy of C, defined as the minimum number of rows in a parity-check matrix H for C such that the corresponding stopping distance s(H) attains its largest possible value, namely, s(H) = d. We then derive general bounds on the stopping redundancy of linear codes. We also examine several simple ways of constructing codes from other codes, and study the effect of these constructions on the stopping redundancy. Specifically, for the family of binary Reed-Muller codes (of all orders), we prove that their stopping redundancy is at most a constant times their conventional redundancy. We show that the stopping redundancies of the binary and ternary extended Golay codes are at most 34 and 22, respectively. Finally, we provide upper and lower bounds on the stopping redundancy of MDS codes.
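To make the stopping-set notion concrete (an illustrative brute-force sketch, not taken from the paper), the snippet below computes the stopping distance s(H) of a small parity-check matrix by searching for the smallest nonempty set of columns such that no row of H has exactly one nonzero entry within the set; the example matrix is a standard parity-check matrix for the [7,4] Hamming code, chosen only for illustration.

```python
# Brute-force stopping distance of a small parity-check matrix (illustrative only).
from itertools import combinations
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def stopping_distance(H):
    n = H.shape[1]
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            row_weights = H[:, list(cols)].sum(axis=1)
            if not np.any(row_weights == 1):       # no check sees exactly one variable
                return size, cols                  # smallest stopping set found
    return None

print(stopping_distance(H))   # size of the smallest stopping set for this H, and the set
```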

207 citations