Author

Keshav Pingali

Other affiliations: Indian Institute of Technology Madras, University of Texas System, Cornell University

Bio: Keshav Pingali is an academic researcher at the University of Texas at Austin. The author has contributed to research on topics including compilers and data structures. The author has an h-index of 55 and has co-authored 252 publications that have received 9,288 citations. Previous affiliations of Keshav Pingali include Indian Institute of Technology Madras and University of Texas System.

Papers published on a yearly basis (chart; years shown: 1983, 1986, 1988 to 1997, 1999 to 2022)

Papers

Proceedings Article

A lightweight infrastructure for graph analytics

Donald Nguyen¹, Andrew Lenharth¹, Keshav Pingali¹

University of Texas at Austin¹

03 Nov 2013

TL;DR: This paper argues that existing DSLs can be implemented on top of a general-purpose infrastructure that supports very fine-grain tasks, implements autonomous, speculative execution of these tasks, and allows application-specific control of task scheduling policies.

Abstract: Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that end-to-end performance can be improved by orders of magnitude even on power-law graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for power-law graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.

541 citations

Proceedings Article

Optimistic parallelism requires abstractions

Milind Kulkarni¹, Keshav Pingali¹, Bruce Walter², Ganesh Ramanarayanan², Kavita Bala², L. Paul Chew²

University of Texas at Austin¹, Cornell University²

10 Jun 2007

TL;DR: It is shown that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.

Abstract: Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and run-time speculative execution have failed to uncover much parallelism in these applications, in spite of a lot of effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications. In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering. By studying the algorithms and data structures used in these applications, we show that there is substantial coarse-grain, data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncovered by compiler analysis. In principle, optimistic techniques such as thread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish this because they do not use the proper abstractions for the data structures in these programs. These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2) assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation. We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.

433 citations

Journal Article

The tao of parallelism in algorithms

Keshav Pingali¹, Donald Nguyen¹, Milind Kulkarni², Martin Burtscher³, M. Amber Hassaan¹, Rashid Kaleem¹, Tsung-Hsien Lee¹, Andrew Lenharth¹, Roman Manevich¹, Mario Méndez-Lojo¹, Dimitrios Prountzos¹, Xin Sui¹

University of Texas at Austin¹, Purdue University², Texas State University³

04 Jun 2011

TL;DR: It is suggested that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.

Abstract: For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets. To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context. These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.

380 citations

Proceedings Article

A quantitative study of irregular programs on GPUs

Martin Burtscher¹, Rupesh Nasre², Keshav Pingali²

Texas State University¹, University of Texas at Austin²

04 Nov 2012

TL;DR: This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with respect to these measures.

Abstract: GPUs have been used to accelerate many regular applications and, more recently, irregular applications in which the control flow and memory access patterns are data-dependent and statically unpredictable. This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with respect to these measures. For a suite of 13 benchmarks, we find that (i) irregularity at the warp level varies widely, (ii) control-flow irregularity and memory-access irregularity are largely independent of each other, and (iii) most kernels, including regular ones, exhibit some irregularity. A program's irregularity can change between different inputs, systems, and arithmetic precision but generally stays in a specific region of the irregularity space. Whereas some highly tuned implementations of irregular algorithms exhibit little irregularity, trading off extra irregularity for better locality or less work can improve overall performance.

371 citations

Proceedings Article

Data-centric multi-level blocking

Induprakas Kodukula¹, Nawaaz Ahmed¹, Keshav Pingali¹

Cornell University¹

01 May 1997

TL;DR: This work presents a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy based on reasoning directly about the flow of data through the memory hierarchy, which permits a more direct solution to the problem of enhancing data locality.

Abstract: We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.

282 citations

Cited by

Journal Article

Machine learning

Thomas G. Dietterich¹

Oregon State University¹

01 Dec 1996, ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article

PhD by thesis

Richard Lathe¹

French Institute of Health and Medical Research¹

01 Apr 1988, Nature

TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.

Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Very little is known, especially about mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, consisting mostly of high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Proceedings Article

Random graphs

Alan Frieze¹

Carnegie Mellon University¹

22 Jan 2006

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal Article

The interdisciplinary study of coordination

Thomas W. Malone¹, Kevin Crowston²

Massachusetts Institute of Technology¹, University of Michigan²

01 Mar 1994, ACM Computing Surveys

TL;DR: This survey characterizes an emerging research area, sometimes called coordination theory, that focuses on the interdisciplinary study of coordination and that uses and extends ideas about coordination from disciplines such as computer science, organization theory, operations research, economics, linguistics, and psychology.

Abstract: This survey characterizes an emerging research area, sometimes called coordination theory, that focuses on the interdisciplinary study of coordination. Research in this area uses and extends ideas about coordination from disciplines such as computer science, organization theory, operations research, economics, linguistics, and psychology. A key insight of the framework presented here is that coordination can be seen as the process of managing dependencies among activities. Further progress, therefore, should be possible by characterizing different kinds of dependencies and identifying the coordination processes that can be used to manage them. A variety of processes are analyzed from this perspective, and commonalities across disciplines are identified. Processes analyzed include those for managing shared resources, producer/consumer relationships, simultaneity constraints, and task/subtask dependencies. Section 3 summarizes ways of applying a coordination perspective in three different domains: (1) understanding the effects of information technology on human organizations and markets, (2) designing cooperative work tools, and (3) designing distributed and parallel computer systems. In the final section, elements of a research agenda in this new area are briefly outlined.

3,447 citations

A Survey of Program Slicing Techniques

Frank Tip¹

Centrum Wiskunde & Informatica¹

31 Jul 1994

TL;DR: An overview of the applications of program slicing, which include debugging, program integration, dataflow testing, and software maintenance is presented, including the various general approaches used to compute slices.

Abstract: A program slice consists of the parts of a program that (potentially) affect the values computed at some point of interest. Such a point of interest is referred to as a slicing criterion, and is typically specified by a location in the program in combination with a subset of the program's variables. The task of computing program slices is called program slicing. The original definition of a program slice was presented by Weiser in 1979. Since then, various slightly different notions of program slices have been proposed, as well as a number of methods to compute them. An important distinction is that between a static and a dynamic slice. Static slices are computed without making assumptions regarding a program's input, whereas the computation of dynamic slices relies on a specific test case. This survey presents an overview of program slicing, including the various general approaches used to compute slices, as well as the specific techniques used to address a variety of language features such as procedures, unstructured control flow, composite data types and pointers, and concurrency. Static and dynamic slicing methods for each of these features are compared and classified in terms of their accuracy and efficiency. Moreover, the possibilities for combining solutions for different features are investigated. Recent work on the use of compiler-optimization and symbolic execution techniques for obtaining more accurate slices is discussed. The paper concludes with an overview of the applications of program slicing, which include debugging, program integration, dataflow testing, and software maintenance.

1,610 citations
