Showing papers by "Charles E. Leiserson published in 2016"

PDF

Open Access

Proceedings Article•DOI•

Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformations

[...]

Shachar Itzhaky¹, Rohit Singh¹, Armando Solar-Lezama¹, Kuat Yessenov¹, Yongquan Lu¹, Charles E. Leiserson¹, Rezaul Chowdhury² - Show less +3 more•Institutions (2)

Massachusetts Institute of Technology¹, Stony Brook University²

19 Oct 2016

TL;DR: A framework allowing domain experts to manipulate computational terms in the interest of deriving better, more efficient implementations of dynamic programming algorithms that have better locality and are significantly more efficient than traditional loop-based implementations is introduced.

...read moreread less

Abstract: We introduce a framework allowing domain experts to manipulate computational terms in the interest of deriving better, more efficient implementations.It employs deductive reasoning to generate provably correct efficient implementations from a very high-level specification of an algorithm, and inductive constraint-based synthesis to improve automation. Semantic information is encoded into program terms through the use of refinement types. In this paper, we develop the technique in the context of a system called Bellmania that uses solver-aided tactics to derive parallel divide-and-conquer implementations of dynamic programming algorithms that have better locality and are significantly more efficient than traditional loop-based implementations. Bellmania includes a high-level language for specifying dynamic programming algorithms and a calculus that facilitates gradual transformation of these specifications into efficient implementations. These transformations formalize the divide-and conquer technique; a visualization interface helps users to interactively guide the process, while an SMT-based back-end verifies each step and takes care of low-level reasoning required for parallelism. We have used the system to generate provably correct implementations of several algorithms, including some important algorithms from computational biology, and show that the performance is comparable to that of the best manually optimized code.

...read moreread less

33 citations

Proceedings Article•DOI•

AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs

[...]

Rezaul Chowdhury¹, Pramod Ganapathi¹, Jesmin Jahan Tithi¹, Charles Bachmeier², Bradley C. Kuszmaul², Charles E. Leiserson², Armando Solar-Lezama², Yuan Tang³ - Show less +4 more•Institutions (3)

Stony Brook University¹, Massachusetts Institute of Technology², Fudan University³

27 Feb 2016

TL;DR: The experimental results show that several autodiscovered algorithms significantly outperform parallel looping and tiled loop-based algorithms and are less sensitive to fluctuations of memory and bandwidth compared with their looping counterparts, and their running times and energy profiles remain relatively more stable.

...read moreread less

Abstract: We present AUTOGEN---an algorithm that for a wide class of dynamic programming (DP) problems automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP recurrences. AUTOGEN analyzes the set of DP table locations accessed by the iterative algorithm when run on a DP table of small size, and automatically identifies a recursive access pattern and a corresponding provably correct recursive algorithm for solving the DP recurrence. We use AUTOGEN to autodiscover efficient algorithms for several well-known problems. Our experimental results show that several autodiscovered algorithms significantly outperform parallel looping and tiled loop-based algorithms. Also these algorithms are less sensitive to fluctuations of memory and bandwidth compared with their looping counterparts, and their running times and energy profiles remain relatively more stable. To the best of our knowledge, AUTOGEN is the first algorithm that can automatically discover new nontrivial divide-and-conquer algorithms.

...read moreread less

23 citations

Journal Article•DOI•

Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling

[...]

Tim Kaler¹, William C. Hasenplaugh¹, Tao B. Schardl¹, Charles E. Leiserson¹•Institutions (1)

Massachusetts Institute of Technology¹

18 Jul 2016

TL;DR: Prism-R is presented, a variation of Prism that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations, and is only marginally slower than Prism.

...read moreread less

Abstract: A data-graph computation—popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi—is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex’s prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round.This article introduces Prism, a chromatic-scheduling algorithm for executing dynamic data-graph computations. Prism uses a vertex coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. A multibag data structure is used by Prism to maintain a dynamic set of active vertices as an unordered set partitioned by color. We analyze Prism using work-span analysis. Let G = (V, E) be a degree-Δ graph colored with χ colors, and suppose that Q⊆V is the set of active vertices in a round. Define size(Q)= vQv + ∑v∈ Q deg(v), which is proportional to the space required to store the vertices of Q using a sparse-graph layout. We show that a P-processor execution of Prism performs updates in Q using O(χ (lg ( Q/χ ) + lg Δ ) + lg P span and Θ(size(Q) + P) work.These theoretical guarantees are matched by good empirical performance. To isolate the effect of the scheduling algorithm on performance, we modified GraphLab to incorporate Prism and studied seven application benchmarks on a 12-core multicore machine. Prism executes the benchmarks 1.2 to 2.1 times faster than GraphLab’s nondeterministic lock-based scheduler while providing deterministic behavior.This article also presents Prism-R, a variation of Prism that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. Prism-R satisfies the same theoretical bounds as Prism, but its implementation is more involved, incorporating a multivector data structure to maintain a deterministically ordered set of vertices partitioned by color. Despite its additional complexity, Prism-R is only marginally slower than Prism. On the seven application benchmarks studied, Prism-R incurs a 7p geometric mean overhead relative to Prism.

...read moreread less

18 citations

Journal Article•DOI•

On the efficiency of localized work stealing

[...]

Warut Suksompong¹, Charles E. Leiserson², Tao B. Schardl²•Institutions (2)

Stanford University¹, Vassar College²

01 Feb 2016-Information Processing Letters

TL;DR: In this article, the authors investigated a variant of the work-stealing algorithm that they call the localized work stealing algorithm, and they showed that the expected running time of the algorithm is T 1 / P + O (T ∞ lg ⁡ P ), and obtained another running time bound based on ratios between the sizes of serial tasks in the computation.

...read moreread less

13 citations

Journal Article•DOI•

Upper Bounds on Number of Steals in Rooted Trees

[...]

Charles E. Leiserson¹, Tao B. Schardl¹, Warut Suksompong²•Institutions (2)

Massachusetts Institute of Technology¹, Stanford University²

01 Feb 2016-Theory of Computing Systems \/ Mathematical Systems Theory

TL;DR: It is shown that if the computation with n processors starts with one processor having a complete k-ary tree of height h (and the remaining n − 1 processors having nothing), the maximum possible number of steals is ∑i=1n (k−1)ihi, and tight upper bounds on the number of thefts are obtained.

...read moreread less

Abstract: Inspired by applications in parallel computing, we analyze the setting of work stealing in multithreaded computations. We obtain tight upper bounds on the number of steals when the computation can be modeled by rooted trees. In particular, we show that if the computation with n processors starts with one processor having a complete k-ary tree of height h (and the remaining n ? 1 processors having nothing), the maximum possible number of steals is ?i=1n(k?1)ihi${\sum }_{i=1}^{n}(k-1)^{i}\binom {h}{i}$.

...read moreread less

6 citations

Journal Article•DOI•

A simple deterministic algorithm for guaranteeing the forward progress of transactions

[...]

Charles E. Leiserson¹•Institutions (1)

Vassar College¹

01 Apr 2016-Information Systems

TL;DR: A remarkably simple deterministic (not probabilistic) contention-management algorithm for guaranteeing the forward progress of transactions - avoiding deadlocks, livelocks, and other anomalies.

...read moreread less

3 citations