Author

Y. N. Srikant

Bio: Y. N. Srikant is an academic researcher at the Indian Institute of Science. The author has contributed to research on topics including cache and very long instruction word (VLIW). The author has an h-index of 15 and has co-authored 98 publications receiving 808 citations.

[Chart: papers published on a yearly basis, 1982 to 2020]

Papers

Book•DOI•

Compiler Design Handbook: Optimizations and Machine Code Generation

Y. N. Srikant, Priti Shankar¹•Institutions (1)

Indian Institute of Science¹

01 Aug 2002

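TL;DR: The handbook's chapters include Worst-Case Execution Time and Energy Analysis (T. Mitra, A. Roychoudhury), Static Program Analysis for Security (K. Gopinath), Energy-Aware Compiler Optimizations (Y. N. Srikant, K. A. Vardhan), Statistical and Machine Learning Techniques in Compiler Design (K. Vaswani), and Type Systems: Advances and Applications (J. Palsberg, T. Millstein).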

Abstract: Chapters and contributors:
Worst-Case Execution Time and Energy Analysis (T. Mitra, A. Roychoudhury)
Static Program Analysis for Security (K. Gopinath)
Compiler Aided Design of Embedded Computers (A. Shrivastava, N. Dutt)
Whole Execution Traces and Their Use in Debugging (X. Zhang, N. Gupta, R. Gupta)
Optimizations for Memory Hierarchies (E. Raman, D. I. August)
Garbage Collection Techniques (A. Sanyal, U. P. Khedker)
Energy-Aware Compiler Optimizations (Y. N. Srikant, K. A. Vardhan)
Statistical and Machine Learning Techniques in Compiler Design (K. Vaswani)
Type Systems: Advances and Applications (J. Palsberg, T. Millstein)
Dynamic Compilation (E. Duesterwald)
The Static Single Assignment Form: Construction and Application to Program Optimization (J. P. Prabhu, P. Shankar, Y. N. Srikant)
Shape Analysis and Applications (T. Reps, M. Sagiv, R. Wilhelm)
Optimizations for Object-Oriented Languages (A. Krall, N. Horspool)
Program Slicing (G. B. Mund, R. Mall)
Computations on Iteration Spaces (S. Rajopadhye, L. Renganarayana, G. Gupta, M. Strout)
Architecture Description Languages for Retargetable Compilation (W. Qin, S. Malik)
Instruction Selection Using Tree Parsing (P. Shankar)
A Retargetable Very Long Instruction Word Compiler Framework for Digital Signal Processors (S. Rajagopalan, S. Malik)
Instruction Scheduling (R. Govindarajan)
Advances in Software Pipelining (H. Rong, R. Govindarajan)
Advances in Register Allocation Techniques (V. K. Nandivada)

111 citations

Proceedings Article•DOI•

WCET estimation for executables in the presence of data caches

Rathijit Sen¹, Y. N. Srikant¹•Institutions (1)

Indian Institute of Science¹

30 Sep 2007

TL;DR: In this paper, the authors describe techniques to estimate the worst-case execution time of executable code on architectures with data caches. The underlying mechanism is Abstract Interpretation, which is used for the dual purposes of tracking address computations and cache behavior.

Abstract: This paper describes techniques to estimate the worst case execution time of executable code on architectures with data caches. The underlying mechanism is Abstract Interpretation, which is used for the dual purposes of tracking address computations and cache behavior. A simultaneous numeric and pointer analysis using an abstraction for discrete sets of values computes safe approximations of access addresses which are then used to predict cache behavior using Must Analysis. A heuristic is also proposed which generates likely worst case estimates. It can be used in soft real time systems and also for reasoning about the tightness of the safe estimate. The analysis methods can handle programs with non-affine access patterns, for which conventional Presburger Arithmetic formulations or Cache Miss Equations do not apply. The precision of the estimates is user-controlled and can be traded off against analysis time. Executables are analyzed directly, which, apart from enhancing precision, renders the method language independent.

55 citations

Journal Article•DOI•

IR2VEC: LLVM IR Based Scalable Program Embeddings

S. VenkataKeerthy¹, Rohit Aggarwal¹, Shalini Jain¹, Maunendra Sankar Desarkar¹, Ramakrishna Upadrasta¹, Y. N. Srikant²•Institutions (2)

Indian Institute of Technology, Hyderabad¹, Indian Institute of Science²

18 Dec 2020-ACM Transactions on Architecture and Code Optimization

TL;DR: IR2VEC is a concise and scalable encoding infrastructure that represents programs as distributed embeddings, combining representation learning methods with flow information to capture the syntax as well as the semantics of the input programs. It improves or matches state-of-the-art results on heterogeneous device mapping and thread coarsening.

Abstract: We propose IR2VEC, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the input programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, the obtained embeddings are both language and machine independent. The entities of the IR are modeled as relationships, and their representations are learned to form a seed embedding vocabulary. Using this infrastructure, we propose two incremental encodings: Symbolic and Flow-Aware. Symbolic encodings are obtained from the seed embedding vocabulary, and Flow-Aware encodings are obtained by augmenting the Symbolic encodings with the flow information. We show the effectiveness of our methodology on two optimization tasks (Heterogeneous device mapping and Thread coarsening). Our way of representing the programs enables us to use non-sequential models, resulting in orders-of-magnitude faster training. Both encodings generated by IR2VEC outperform the existing methods on both tasks, even while using simple machine learning models. In particular, our results improve or match the state-of-the-art speedup in 11/14 benchmark-suites in the device mapping task across two platforms and 53/68 benchmarks in the thread coarsening task across four different platforms. When compared to the other methods, our embeddings are more scalable, are non-data-hungry, and have better Out-Of-Vocabulary (OOV) characteristics.

43 citations

Journal Article•DOI•

Falcon: A Graph Manipulation Language for Heterogeneous Systems

Unnikrishnan Cheramangalath¹, Rupesh Nasre², Y. N. Srikant¹•Institutions (2)

Indian Institute of Science¹, Indian Institute of Technology Madras²

22 Dec 2015-ACM Transactions on Architecture and Code Optimization

TL;DR: A domain-specific language (DSL) is proposed, Falcon, for implementing graph algorithms that abstracts the hardware, provides constructs to write explicitly parallel programs at a higher level, and can work with general algorithms that may change the graph structure.

Abstract: Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy—even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.

43 citations

Proceedings Article•DOI•

A Programmable Hardware Path Profiler

Kapil Vaswani¹, Matthew J. Thazhuthaveetil¹, Y. N. Srikant¹•Institutions (1)

Indian Institute of Science¹

20 Mar 2005

TL;DR: A low-overhead, non-intrusive hardware path profiling scheme that can be programmed to detect several types of paths including acyclic, intra-procedural paths, paths for a whole program path and extended paths, enabling context-sensitive performance monitoring and bottleneck analysis.

Abstract: For aggressive path-based program optimizations to be profitable in cost-sensitive environments, accurate path profiles must be available at low overheads. In this paper, we propose a low-overhead, non-intrusive hardware path profiling scheme that can be programmed to detect several types of paths including acyclic, intra-procedural paths, paths for a whole program path and extended paths. The profiler consists of a path stack, which detects paths and generates a sequence of path descriptors using branch information from the processor pipeline, and a hot path table that collects a profile of hot paths for later use by a program optimizer. With assistance from the processor's event detection logic, our profiler can track a host of architectural metrics along paths, enabling context-sensitive performance monitoring and bottleneck analysis. We illustrate the utility of our scheme by associating paths with a power metric that estimates power consumption in the cache hierarchy caused by instructions along the path. Experiments using programs from the SPEC CPU2000 benchmark suite show that our path profiler, occupying 7KB of hardware real-estate, collects accurate path profiles (average overlap of 88% with a perfect profile) at negligible execution time overheads (0.6% on average).

41 citations

Cited by

Journal Article•

The Design and Analysis of Experiments

Margaret J. Robertson

01 Jun 1953-Yale Journal of Biology and Medicine

TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.

Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Proceedings Article•DOI•

Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow

Wilson W. L. Fung¹, Ivan Sham¹, George L. Yuan¹, Tor M. Aamodt¹•Institutions (1)

University of British Columbia¹

01 Dec 2007

TL;DR: It is shown that a realistic hardware implementation that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes improves performance by an average of 20.7% for an estimated area increase of 4.7%.

Abstract: Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead incurred by control hardware. Scalar threads are grouped together into SIMD batches, sometimes referred to as warps. While SIMD is ideally suited for simple programs, recent GPUs include control flow instructions in the GPU instruction set architecture and programs using these instructions may experience reduced performance due to the way branch execution is supported by hardware. One approach is to add a stack to allow different SIMD processing elements to execute distinct program paths after a branch instruction. The occurrence of diverging branch outcomes for different processing elements significantly degrades performance. In this paper, we explore mechanisms for more efficient SIMD branch execution on GPUs. We show that a realistic hardware implementation that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes improves performance by an average of 20.7% for an estimated area increase of 4.7%.

512 citations

Proceedings Article•DOI•

Program optimization space pruning for a multithreaded gpu

Shane Ryoo¹, Christopher I. Rodrigues¹, Sam S. Stone¹, Sara S. Baghsorkhi¹, Sain-Zee Ueng¹, John A. Stratton¹, Wen-mei W. Hwu¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

06 Apr 2008

TL;DR: The complexity involved in optimizing applications for one highly-parallel system and one relatively simple methodology for reducing the workload involved in the optimization process are shown.

Abstract: Program optimization for highly-parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. With the introduction of inexpensive, single-chip, massively parallel platforms, more developers will be creating highly-parallel applications for these platforms, many of whom lack the substantial experience and knowledge needed to maximize their performance. This creates a need for more structured optimization methods with means to estimate their performance effects. Furthermore, these methods need to be understandable by most programmers. This paper shows the complexity involved in optimizing applications for one such system and one relatively simple methodology for reducing the workload involved in the optimization process. This work is based on one such highly-parallel system, the GeForce 8800 GTX using CUDA. Its flexible allocation of resources to threads allows it to extract performance from a range of applications with varying resource requirements, but places new demands on developers who seek to maximize an application's performance. We show how optimizations interact with the architecture in complex ways, initially prompting an inspection of the entire configuration space to find the optimal configuration. Even for a seemingly simple application such as matrix multiplication, the optimal configuration can be unexpected. We then present metrics derived from static code that capture the first-order factors of performance. We demonstrate how these metrics can be used to prune many optimization configurations, down to those that lie on a Pareto-optimal curve. This reduces the optimization space by as much as 98% and still finds the optimal configuration for each of the studied applications.

312 citations

Journal Article•DOI•

Memory Hierarchies, Pipelines, and Buses for Future Architectures in Time-Critical Embedded Systems

Reinhard Wilhelm¹, Daniel Grund¹, Jan Reineke¹, Marc Schlickling¹, Markus Pister¹, Christian Ferdinand•Institutions (1)

Saarland University¹

01 Jul 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: The architectural influence on static timing analysis is described, recommendations as to profitable and unacceptable architectural features are given, and results show that measurement-based methods still used in industry are not useful for quite commonly used complex processors.

Abstract: Embedded hard real-time systems need reliable guarantees for the satisfaction of their timing constraints. Experience with the use of static timing-analysis methods and the tools based on them in the automotive and the aeronautics industries is positive. However, both the precision of the results and the efficiency of the analysis methods are highly dependent on the predictability of the execution platform. In fact, the architecture determines whether a static timing analysis is practically feasible at all and whether the most precise obtainable results are precise enough. Results contained in this paper also show that measurement-based methods still used in industry are not useful for quite commonly used complex processors. This dependence on the architectural development is of growing concern to the developers of timing-analysis tools and their customers, the developers in industry. The problem reaches a new level of severity with the advent of multicore architectures in the embedded domain. This paper describes the architectural influence on static timing analysis and gives recommendations as to profitable and unacceptable architectural features.

249 citations

Proceedings Article•DOI•

Mining advisor-advisee relationships from research publication networks

Chi Wang¹, Jiawei Han¹, Yuntao Jia¹, Jie Tang², Duo Zhang¹, Yintao Yu¹, Jingyi Guo²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Tsinghua University²

25 Jul 2010

TL;DR: A time-constrained probabilistic factor graph model (TPFG), which takes a research publication network as input and models the advisor-advisee relationship mining problem using a joint likelihood objective function, is proposed, and an efficient learning algorithm is designed to optimize the objective function.

Abstract: Information networks contain abundant knowledge about relationships among people or entities. Unfortunately, this kind of knowledge is often hidden in a network where different kinds of relationships are not explicitly categorized. For example, in a research publication network, the advisor-advisee relationships among researchers are hidden in the coauthor network. Discovery of those relationships can benefit many interesting applications such as expert finding and research community analysis. In this paper, we take a computer science bibliographic network as an example, to analyze the roles of authors and to discover the likely advisor-advisee relationships. In particular, we propose a time-constrained probabilistic factor graph model (TPFG), which takes a research publication network as input and models the advisor-advisee relationship mining problem using a joint likelihood objective function. We further design an efficient learning algorithm to optimize the objective function. Based on that, our model suggests and ranks probable advisors for every author. Experimental results show that the proposed approach infers advisor-advisee relationships efficiently and achieves state-of-the-art accuracy (80-90%). We also apply the discovered advisor-advisee relationships to bole search, a specific expert finding task, and an empirical study shows that the search performance can be effectively improved (+4.09% by NDCG@5).

212 citations
