Author

Charles E. Leiserson

Bio: Charles E. Leiserson is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research in topics including Cilk and Scheduling (computing). The author has an h-index of 65 and has co-authored 185 publications receiving 49,312 citations. Previous affiliations of Charles E. Leiserson include Vassar College and Carnegie Mellon University.


Papers
Journal ArticleDOI
17 Dec 2019
TL;DR: Tapir is a compiler intermediate representation that embeds recursive fork-join parallelism, as supported by task-parallel platforms such as Cilk and OpenMP, into a mainstream compiler's IR, enabling existing serial optimizations and new parallel optimizations to apply to task-parallel programs with only minor changes to the compiler.
Abstract: Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the “serial-projection property,” which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler. For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program’s control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel codes effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified less than 3,000 lines of LLVM’s half-million-line core middle-end functionality. These changes sufficed to enable LLVM’s existing compiler optimizations for serial code—including loop-invariant code motion, common-subexpression elimination, and tail-recursion elimination—to work with parallel control constructs such as parallel loops and Cilk’s cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling, which restructure the parallel control flow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way.
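As a rough, language-level illustration (Tapir itself is a compiler IR, not a library), the Python sketch below shows the serial-projection property, namely that running the logically parallel iterations serially yields the same answer, and the kind of serial optimization this licenses, here loop-invariant code motion on a parallel loop; the function names and data are hypothetical.

```python
# Hypothetical sketch, not Tapir: illustrates (1) the serial-projection property
# and (2) loop-invariant code motion applied to a parallel loop.
import math
from concurrent.futures import ThreadPoolExecutor


def body_unoptimized(i, data):
    scale = math.sqrt(sum(data))        # loop-invariant work redone every iteration
    return data[i] * scale


def parallel_loop(data):
    # "Parallel loop": the iterations are logically parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: body_unoptimized(i, data), range(len(data))))


def serial_projection(data):
    # Serial projection: execute the same iterations in order, with no parallelism.
    return [body_unoptimized(i, data) for i in range(len(data))]


def optimized_loop(data):
    scale = math.sqrt(sum(data))        # invariant hoisted once, as LICM would do
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: data[i] * scale, range(len(data))))


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    # All three variants compute the same result.
    assert parallel_loop(xs) == serial_projection(xs) == optimized_loop(xs)
```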

13 citations

Proceedings ArticleDOI
12 Jun 2018
TL;DR: A standard API for CSI is defined and LLVM is modified to insert CSI hooks into the compiler's internal representation of the program, allowing many compiler-based tools to be written as simple libraries without modifying the compiler and lowering the bar for developing dynamic-analysis tools.
Abstract: The CSI framework provides comprehensive static instrumentation that a compiler can insert into a program-under-test so that dynamic-analysis tools - memory checkers, race detectors, cache simulators, performance profilers, code-coverage analyzers, etc. - can observe and investigate runtime behavior. Heretofore, tools based on compiler instrumentation would each separately modify the compiler to insert their own instrumentation. In contrast, CSI inserts a standard collection of instrumentation hooks into the program-under-test. Each CSI-tool is implemented as a library that defines relevant hooks, and the remaining hooks are "nulled" out and elided during either compile-time or link-time optimization, resulting in instrumented runtimes on par with custom instrumentation. CSI allows many compiler-based tools to be written as simple libraries without modifying the compiler, lowering the bar for the development of dynamic-analysis tools. We have defined a standard API for CSI and modified LLVM to insert CSI hooks into the compiler's internal representation (IR) of the program. The API organizes IR objects - such as functions, basic blocks, and memory accesses - into flat and compact ID spaces, which not only simplifies the building of tools, but surprisingly enables faster maintenance of IR-object data than do traditional hash tables. CSI hooks contain a "property" parameter that allows tools to customize behavior based on static information without introducing overhead. CSI provides "forensic" tables that tools can use to associate IR objects with source-code locations and to relate IR objects to each other. To evaluate the efficacy of CSI, we implemented six demonstration CSI-tools. One of our studies shows that compiling with CSI and linking with the "null" CSI-tool produces a tool-instrumented executable that is as fast as the original uninstrumented code. Another study, using a CSI port of Google's ThreadSanitizer, shows that the CSI-tool rivals the performance of Google's custom compiler-based implementation. All other demonstration CSI tools slow down the execution of the program-under-test by less than 70%.
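One design point in the abstract, the flat and compact ID spaces, can be illustrated with a toy Python sketch: a coverage "tool" keeps its per-object state in an array indexed by dense integer IDs rather than in a hash table keyed by objects. The hook name and class below are hypothetical and are not the actual CSI API.

```python
# Toy illustration of dense ID spaces for tool state (not the real CSI API).


class BasicBlockCoverage:
    def __init__(self, num_basic_blocks):
        # Dense ID space: one slot per basic block, no hashing required.
        self.visited = [False] * num_basic_blocks

    def on_bb_entry(self, bb_id):
        """Hook the instrumented program would invoke on entering basic block bb_id."""
        self.visited[bb_id] = True

    def coverage(self):
        return sum(self.visited) / len(self.visited)


if __name__ == "__main__":
    tool = BasicBlockCoverage(num_basic_blocks=8)
    for bb_id in [0, 1, 3, 3, 7]:        # pretend the program executed these blocks
        tool.on_bb_entry(bb_id)
    print(f"basic-block coverage: {tool.coverage():.0%}")   # 50%
```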

13 citations

Proceedings Article
01 Jan 1986
TL;DR: In this article, the authors show that many graph problems can be solved in parallel not only with polylogarithmic performance but also with efficient communication at each step of the computation. Communication requirements are measured in the distributed random-access machine (DRAM) model, in which communication cost is the congestion of memory accesses across cuts of an underlying network.
Abstract: Communication bandwidth is a resource ignored by most parallel random-access machine (PRAM) models. This paper shows that many graph problems can be solved in parallel, not only with polylogarithmic performance, but with efficient communication at each step of the computation. We measure the communication requirements of an algorithm in a model called the distributed random-access machine (DRAM), in which communication cost is measured in terms of the congestion of memory accesses across cuts of an underlying network. The algorithms are based on a communication-efficient variant of the tree contraction technique due to Miller and Reif.

11 citations

01 Jan 2008
TL;DR: A-STEAL, a randomized work-stealing thread scheduler for fork-join multithreaded jobs, provides continual parallelism feedback to the job scheduler in the form of processor requests; experiments indicate that it achieves almost perfect linear speedup across a variety of processor availability profiles.
Abstract: We present a randomized work-stealing thread scheduler for fork-join multithreaded jobs that provides continual parallelism feedback to the job scheduler in the form of processor requests. Our A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job’s execution. Assuming that the job scheduler never allots the job more processors than requested by the job’s thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors. Our analysis models the job scheduler as the thread scheduler’s adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler’s administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive thread scheduler under this stringent adversarial assumption, we introduce a new technique called “trim analysis,” which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, exhibiting near-optimal behavior on the vast majority. To be more precise, suppose that a job has work T1 and critical-path length T∞. On a machine with P processors, A-STEAL completes the job in expected O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but the O(T∞ + L lg P) time steps having the highest processor availability. When the job’s parallelism dominates the trimmed availability, that is, P̃ ≪ T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed availability dominates the parallelism, the asymptotic running time of the job is nearly the length of its critical path. We measured the performance of A-STEAL on a simulated multiprocessor system using synthetic workloads. For jobs with sufficient parallelism, our experiments indicate that A-STEAL provides almost perfect linear speedup across a variety of processor availability profiles. In these experiments, A-STEAL typically wasted less than 20% of the processor cycles allotted to the job. We compared A-STEAL with the ABP algorithm, an adaptive work-stealing thread scheduler developed by Arora, Blumofe, and Plaxton that does not employ parallelism feedback. On moderately to heavily loaded large machines with predetermined availability profiles, A-STEAL typically completed jobs more than twice as quickly, despite being allotted the same or fewer processors on every step, while wasting only 10% of the processor cycles wasted by ABP.
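To make the trim-analysis quantity concrete, here is a small Python sketch that computes the R-trimmed availability and the three terms of the completion-time bound for an invented availability profile; all constants and the profile itself are hypothetical.

```python
# Worked sketch of the R-trimmed availability from the abstract: the mean
# processor availability over all but the R time steps with the highest
# availability, where R is on the order of T_inf + L*lg(P).
import math


def trimmed_availability(profile, r):
    """Mean availability ignoring the r steps with the highest availability."""
    kept = sorted(profile)[:-r] if r > 0 else list(profile)
    return sum(kept) / len(kept)


if __name__ == "__main__":
    t1, t_inf = 1_000_000, 200        # work T1 and critical-path length T_inf (made up)
    quantum, p = 10, 64               # scheduling-quantum length L and machine size P
    profile = [64] * 900 + [8] * 100  # processors available at each time step
    r = round(t_inf + quantum * math.log2(p))
    p_trim = trimmed_availability(profile, r)
    # The bound in the abstract is O(T1/P_trim + T_inf + L*lg P); its three terms:
    print(f"trimmed availability: {p_trim:.1f}")
    print(f"terms: {t1 / p_trim:.0f} + {t_inf} + {quantum * math.log2(p):.0f}")
```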

11 citations

Journal Article
05 Jun 2020-Science
TL;DR: As the miniaturization of semiconductor transistors approaches its limits and Moore's law ends, future gains in computer performance will have to come from the "Top" of the computing stack: software, algorithms, and hardware architecture.
Abstract: The miniaturization of semiconductor transistors has driven the growth in computer performance for more than 50 years. As miniaturization approaches its limits, bringing an end to Moore’s law, performance gains will need to come from software, algorithms, and hardware. We refer to these technologies as the “Top” of the computing stack to distinguish them from the traditional technologies at the “Bottom”: semiconductor physics and silicon-fabrication technology. In the post-Moore era, the Top will provide substantial performance gains, but these gains will be opportunistic, uneven, and sporadic, and they will suffer from the law of diminishing returns. Big system components offer a promising context for tackling the challenges of working at the Top.

11 citations


Cited by
Book
01 Jan 1996
TL;DR: A valuable reference for the novice as well as for the expert who needs a wider scope of coverage within the area of cryptography, this book provides easy and rapid access to information and includes more than 200 algorithms and protocols.
Abstract: From the Publisher: A valuable reference for the novice as well as for the expert who needs a wider scope of coverage within the area of cryptography, this book provides easy and rapid access of information and includes more than 200 algorithms and protocols; more than 200 tables and figures; more than 1,000 numbered definitions, facts, examples, notes, and remarks; and over 1,250 significant references, including brief comments on each paper.

13,597 citations

Proceedings Article
25 Jul 2004
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, included in the ROUGE summarization evaluation package, along with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.
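As a concrete illustration of the n-gram overlap counting described above, here is a minimal Python sketch of ROUGE-N recall for a single candidate/reference pair; the official ROUGE package additionally handles multiple references, stemming, stopword removal, and the L, W, and S variants.

```python
# Simplified ROUGE-N recall: overlapping n-grams divided by reference n-grams.
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=2):
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())        # clipped n-gram match counts
    return overlap / max(sum(ref.values()), 1)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat was on the mat"
    print(rouge_n_recall(candidate, reference, n=2))   # 3 of 5 reference bigrams -> 0.6
```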

9,293 citations

Proceedings ArticleDOI
26 Mar 2000
TL;DR: This paper presents RADAR, a radio-frequency (RF)-based system for locating and tracking users inside buildings that combines empirical measurements with signal propagation modeling to determine user location and thereby enable location-aware services and applications.
Abstract: The proliferation of mobile computing devices and local-area wireless networks has fostered a growing interest in location-aware systems and services. In this paper we present RADAR, a radio-frequency (RF)-based system for locating and tracking users inside buildings. RADAR operates by recording and processing signal strength information at multiple base stations positioned to provide overlapping coverage in the area of interest. It combines empirical measurements with signal propagation modeling to determine user location and thereby enable location-aware services and applications. We present experimental results that demonstrate the ability of RADAR to estimate user location with a high degree of accuracy.
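The empirical side of this approach can be sketched as a nearest-neighbor search in signal-strength space: record a fingerprint of signal strengths at known locations offline, then match the observed vector online. The radio map, location names, and RSSI values below are invented for illustration; the real RADAR system also incorporates a signal-propagation model.

```python
# Hedged sketch of fingerprint-based localization: nearest neighbor in signal space.
import math


def nearest_location(observed, fingerprint_map):
    """Return the mapped location whose RSSI vector is closest to the observation."""
    return min(fingerprint_map, key=lambda loc: math.dist(observed, fingerprint_map[loc]))


if __name__ == "__main__":
    # Offline radio map: location -> RSSI (dBm) seen from three base stations (made up).
    radio_map = {
        "office-101": (-40.0, -65.0, -70.0),
        "office-102": (-55.0, -50.0, -68.0),
        "hallway":    (-60.0, -60.0, -55.0),
    }
    print(nearest_location((-42.0, -63.0, -72.0), radio_map))   # -> office-101
```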

8,667 citations

Journal ArticleDOI
01 Apr 2012-Fly
TL;DR: It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus.
Abstract: We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in...
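To illustrate the location-based annotation step described above (this is not SnpEff's implementation), here is a minimal Python sketch that classifies a variant position against a single hypothetical gene model with made-up coordinates.

```python
# Toy classifier for variant location categories named in the abstract.


def classify(pos, gene):
    if gene["tx_start"] - gene["flank"] <= pos < gene["tx_start"]:
        return "upstream"
    if gene["tx_end"] < pos <= gene["tx_end"] + gene["flank"]:
        return "downstream"
    if not (gene["tx_start"] <= pos <= gene["tx_end"]):
        return "intergenic"
    if pos < gene["cds_start"]:
        return "5' UTR"
    if pos > gene["cds_end"]:
        return "3' UTR"
    return "exonic" if any(s <= pos <= e for s, e in gene["exons"]) else "intronic"


if __name__ == "__main__":
    gene = {  # hypothetical forward-strand gene model
        "tx_start": 1000, "tx_end": 5000, "cds_start": 1200, "cds_end": 4800,
        "exons": [(1000, 1500), (2500, 3000), (4500, 5000)], "flank": 500,
    }
    for snp in (700, 1100, 1300, 2000, 4900, 5300, 9000):
        print(snp, classify(snp, gene))
```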

8,017 citations