Home
/
Authors
/
Amitabh Srivastava

Author

Amitabh Srivastava

Bio: Amitabh Srivastava is an academic researcher from Pennsylvania State University. The author has contributed to research in topics: Compiler & Program analysis. The author has an hindex of 7, co-authored 7 publications receiving 1433 citations.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

ATOM: a system for building customized program analysis tools

[...]

Amitabh Srivastava, Alan Eustace

01 Jun 1994

TL;DR: ATOM as mentioned in this paper is a single framework for building a wide range of customized program analysis tools, including block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling.

...read moreread less

Abstract: ATOM (Analysis Tools with OM) is a single framework for building a wide range of customized program analysis tools. It provides the common infrastructure present in all code-instrumenting tools; this is the difficult and time-consuming part. The user simply defines the tool-specific details in instrumentation and analysis routines. Building a basic block counting tool like Pixie with ATOM requires only a page of code.ATOM, using OM link-time technology, organizes the final executable such that the application program and user's analysis routines run in the same address space. Information is directly passed from the application program to the analysis routines through simple procedure calls instead of inter-process communication or files on disk. ATOM takes care that analysis routines do not interfere with the program's execution, and precise information about the program is presented to the analysis routines at all times. ATOM uses no simulation or interpretation.ATOM has been implemented on the Alpha AXP under OSF/1. It is efficient and has been used to build a diverse set of tools for basic block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling.

...read moreread less

982 citations

Proceedings Article•

ATOM: a flexible interface for building high performance program analysis tools

[...]

Alan Eustace, Amitabh Srivastava

16 Jan 1995

TL;DR: This paper illustrates this flexibility by building five complete tools that span the interests of application programmers, computer architects, and compiler writers and the performance of each tool is similar to or greatly exceeds the best known hand-crafted implementations.

...read moreread less

Abstract: Code instrumentation is a powerful mechanism for understanding program behavior Unfortunately, code instrumentation is extremely difficult, and therefore has been mostly relegated to building special purpose tools for use on standard industry benchmark suites ATOM (Analysis Tools with OM) provides a very flexible and efficient code instrumentation interface that allows powerful, high performance program analysis tools to be built with very little effort This paper illustrates this flexibility by building five complete tools that span the interests of application programmers, computer architects, and compiler writers This flexibility does not come at the expense of performance Because ATOM uses procedure calls as the interface between the application and the analysis routines, the performance of each tool is similar to or greatly exceeds the best known hand-crafted implementations

...read moreread less

175 citations

Proceedings Article•DOI•

Link-time optimization of address calculation on a 64-bit architecture

[...]

Amitabh Srivastava, David W. Wall

01 Jun 1994

TL;DR: This paper has used its link-time code modification system OM to perform program transformations related to global address use on the Alpha AXP, and describes the optimizations performed and shows their effects on program size and performance.

...read moreread less

Abstract: Compilers for new machines with 64-bit addresses must generate code that works when the memory used by the program is large. Procedures and global variables are accessed indirectly via global address tables, and calling conventions include code to establish the addressability of the appropriate tables. In the common case of a program that does not require a lot of memory, all of this can be simplified considerably, with a corresponding reduction in program size and execution time.We have used our link-time code modification system OM to perform program transformations related to global address use on the Alpha AXP. Though simple, many of these arewhole-program optimizations that can be done only when we can see the entire program at once, so link-time is an ideal occasion to perform them.This paper describes the optimizations performed and shows their effects on program size and performance. Relatively modest transformations, possible without moving code, improve the performance of SPEC benchmarks by an average of 1.5%. More ambitious transformations, requiring an understanding of program structure that is thorough but not difficult at link-time, can do even better, reducing program size by 10% or more, and improving performance by an average of 3.8%.Even a program compiled monolithically with interprocedural optimization can benefit nearly as much from this technique, if it contains statically-linked pre-compiled library code. When the benchmark sources were compiled in this way, we were still able to improve their performance by 1.35% with the modest transformations and 3.4% with the ambitious transformations.

...read moreread less

111 citations

A Practical System for Intermodule Code Optimization at Link-Time

[...]

Amitabh Srivastava, David W. Wall

01 Jan 1999

TL;DR: In this article, a symbolic register transfer language (RTL) is used to transform a collection of object modules constituting the entire program into a symbolic RTL form that can be easily manipulated.

...read moreread less

Abstract: We have developed a system called OM to explore the problem of code optimization at link-time. OM takes a collection of object modules constituting the entire program, and converts the object code into a symbolic Register Transfer Language (RTL) form that can be easily manipulated. This RTL is then transformed by intermodule optimization and finally converted back into object form. Although much high-level information about the program is gone at link-time, this approach enables us to perform optimizations that a compiler looking at a single module cannot see. Since object modules are more or less independent of the particular source language or compiler, this also gives us the chance to improve the code in ways that some compilers might simply have missed. To test the concept, we have used OM to build an optimizer that does interprocedural code motion. It moves simple loop-invariant code out of loops, even when the loop body extends across many procedures and the loop control is in a different procedure from the invariant code. Our technique also easily handles ‘‘loops’’ induced by recursion rather than iteration. Our code motion technique makes use of an interprocedural liveness analysis to discover dead registers that it can use to hold loop-invariant results. This liveness analysis also lets us perform interprocedural dead code elimination. We applied our code motion and dead code removal to SPEC benchmarks compiled with optimization using the standard compilers for the DECstation 5000. Our system improved the performance by 5% on average and by more than 14% in one case. More improvement should be possible soon; at present we move only simple load and load-address operations out of loops, and we scavenge registers to hold these values, rather than completely reallocating them. This paper will appear in the March issue of Journal of Programming Languages. It replaces Technical Note TN-31, an earlier version of the same material.

...read moreread less

106 citations

Proceedings Article•

Performance implications of multiple pointer sizes

[...]

Jeffrey C. Mogul, Joel F. Bartlett, Robert N. Mayo, Amitabh Srivastava

16 Jan 1995

TL;DR: This paper analyzes several programs and programming techniques to understand the performance implications of different pointer sizes and shows small but definite performance consequences, primarily due to cache and paging effects.

...read moreread less

Abstract: Many users need 64-bit architectures: 32-bit systems cannot support the largest applications, and 64-bit systems perform better for some applications. However, performance on some other applications can suffer from the use of large pointers; large pointers can also constrain feasible problem size. Such applications are best served by a 64-bit machine that supports the use of both 32-bit and 64-bit pointer variables. This paper analyzes several programs and programming techniques to understand the performance implications of different pointer sizes. Many (but not all) programs show small but definite performance consequences, primarily due to cache and paging effects.

...read moreread less

26 citations

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

LLVM: a compilation framework for lifelong program analysis & transformation

[...]

Chris Lattner¹, Vikram Adve¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

...read moreread less

Abstract: We describe LLVM (low level virtual machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in static single assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.

...read moreread less

4,841 citations

Journal Article•DOI•

Pin: building customized program analysis tools with dynamic instrumentation

[...]

Chi-Keung Luk¹, Robert Cohn¹, Robert Muth¹, Harish Patil¹, Artur Klauser¹, Geoff Lowney¹, Steven Wallace¹, Vijay Janapa Reddi², Kim Hazelwood¹ - Show less +5 more•Institutions (2)

Intel¹, University of Colorado Boulder²

12 Jun 2005

TL;DR: The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation, and to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.

...read moreread less

Abstract: Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application's original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin's versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.

...read moreread less

4,019 citations

Proceedings Article•DOI•

Automatically characterizing large scale program behavior

[...]

Timothy Sherwood¹, Erez Perelman¹, Greg Hamerly¹, Brad Calder¹•Institutions (1)

University of California, San Diego¹

01 Oct 2002

TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.

...read moreread less

Abstract: Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution.Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.

...read moreread less

1,702 citations

Journal Article•DOI•

Eraser: a dynamic data race detector for multithreaded programs

[...]

Stefan Savage¹, Michael Burrows, Greg Nelson, Patrick G. Sobalvarro, Thomas Anderson² - Show less +1 more•Institutions (2)

University of Washington¹, University of California, Berkeley²

01 Nov 1997-ACM Transactions on Computer Systems

TL;DR: A new tool, called Eraser, is described, for dynamically detecting data races in lock-based multithreaded programs, which uses binary rewriting techniques to monitor every shared-monory reference and verify that consistent locking behavior is observed.

...read moreread less

Abstract: Multithreaded programming is difficult and error prone. It is easy to make a mistake in synchronization that produces a data race, yet it can be extremely hard to locate this mistake during debugging. This article describes a new tool, called Eraser, for dynamically detecting data races in lock-based multithreaded programs. Eraser uses binary rewriting techniques to monitor every shared-monory reference and verify that consistent locking behavior is observed. We present several case studies, including undergraduate coursework and a multithreaded Web search engine, that demonstrate the effectiveness of this approach.

...read moreread less

1,553 citations

Proceedings Article•DOI•

Eraser: a dynamic data race detector for multi-threaded programs

[...]

Stefan Savage¹, Michael Burrows, Greg Nelson, Patrick G. Sobalvarro, Thomas Anderson² - Show less +1 more•Institutions (2)

University of Washington¹, University of California, Berkeley²

01 Oct 1997

TL;DR: Eraser as mentioned in this paper uses binary rewriting techniques to monitor every shared memory reference and verify that consistent locking behavior is observed in lock-based multi-threaded programs, which can be used to detect data races.

...read moreread less

Abstract: Multi-threaded programming is difficult and error prone. It is easy to make a mistake in synchronization that produces a data race, yet it can be extremely hard to locate this mistake during debugging. This paper describes a new tool, called Eraser, for dynamically detecting data races in lock-based multi-threaded programs. Eraser uses binary rewriting techniques to monitor every shared memory reference and verify that consistent locking behavior is observed. We present several case studies, including undergraduate coursework and a multi-threaded Web search engine, that demonstrate the effectiveness of this approach.

...read moreread less

1,424 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse