Showing papers by "Charles E. Leiserson" published in 2014

Open Access

Proceedings Article

Standards for Graph Algorithm Primitives

Timothy G. Mattson¹, David A. Bader², Jon Berry³, Aydin Buluc⁴, Jack Dongarra⁵, Christos Faloutsos⁶, John Feo⁷, John R. Gilbert⁸, Joseph E. Gonzalez⁹, Bruce Hendrickson³, Jeremy Kepner¹⁰, Charles E. Leiserson¹⁰, Andrew Lumsdaine¹¹, David Padua¹², Stephen W. Poole¹³, Steve Reinhardt¹⁴, Michael Stonebraker¹⁰, Steve Wallach, Andrew Yoo¹⁵•Institutions (15)

Intel¹, Georgia Institute of Technology², Sandia National Laboratories³, Lawrence Berkeley National Laboratory⁴, University of Tennessee⁵, Carnegie Mellon University⁶, Pacific Northwest National Laboratory⁷, University of California, Santa Barbara⁸, University of California, Berkeley⁹, Massachusetts Institute of Technology¹⁰, Indiana University¹¹, University of Illinois at Urbana–Champaign¹², Oak Ridge National Laboratory¹³, Cray¹⁴, Lawrence Livermore National Laboratory¹⁵

02 Aug 2014-arXiv: Mathematical Software

TL;DR: This paper is a position paper defining the problem and announcing the intention to launch an open effort to define a standard set of primitive building blocks.

Abstract: It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position paper defining the problem and announcing our intention to launch an open effort to define this standard.
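As an illustration of what "graph algorithms in terms of linear algebraic operations" can mean, here is a minimal sketch of breadth-first search written as repeated vector-matrix products over the Boolean (OR, AND) semiring. It is not taken from the paper: the function name `bfs_levels` and the dict-of-sets adjacency representation are assumptions chosen for this example.

```python
def bfs_levels(adj, source):
    """Return {vertex: BFS level} for all vertices reachable from source.

    adj is a hypothetical sparse adjacency "matrix": adj[u] is the set of
    column indices v for which A[u][v] is nonzero (an edge u -> v).
    """
    levels = {source: 0}
    frontier = {source}  # sparse Boolean vector: the set of nonzero indices
    depth = 0
    while frontier:
        depth += 1
        # y = frontier^T * A over the (OR, AND) semiring:
        # OR together the rows of A selected by the frontier.
        reached = set()
        for u in frontier:
            reached |= adj[u]
        # Mask out vertices that already have a level (already visited).
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels
```

Each loop iteration corresponds to one masked vector-matrix product, which is the kind of primitive a standard set of building blocks might expose.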

78 citations

Proceedings Article

Ordering heuristics for parallel graph coloring

William C. Hasenplaugh¹, Tim Kaler¹, Tao B. Schardl¹, Charles E. Leiserson¹•Institutions (1)

Massachusetts Institute of Technology¹

23 Jun 2014

TL;DR: It is proved that JP-LLF and JP-SLL (JP using the LLF and SLL heuristics, respectively) execute with the same asymptotic work as JP-R and only logarithmically more span while producing higher-quality colorings than JP-R in practice.

Abstract: This paper introduces the largest-log-degree-first (LLF) and smallest-log-degree-last (SLL) ordering heuristics for parallel greedy graph-coloring algorithms, which are inspired by the largest-degree-first (LF) and smallest-degree-last (SL) serial heuristics, respectively. We show that although LF and SL, in practice, generate colorings with relatively small numbers of colors, they are vulnerable to adversarial inputs for which any parallelization yields a poor parallel speedup. In contrast, LLF and SLL allow for provably good speedups on arbitrary inputs while, in practice, producing colorings of competitive quality to their serial analogs. We applied LLF and SLL to the parallel greedy coloring algorithm introduced by Jones and Plassmann, referred to here as JP. Jones and Plassmann analyze the variant of JP that processes the vertices of a graph in a random order, and show that on an O(1)-degree graph G=(V,E), this JP-R variant has an expected parallel running time of O(lg V / lg lg V) in a PRAM model. We improve this bound to show, using work-span analysis, that JP-R, augmented to handle arbitrary-degree graphs, colors a graph G=(V,E) with degree Δ using Θ(V+E) work and O(lg V + lg Δ · min{√E, Δ + lg Δ lg V / lg lg V}) expected span. We prove that JP-LLF and JP-SLL (JP using the LLF and SLL heuristics, respectively) execute with the same asymptotic work as JP-R and only logarithmically more span while producing higher-quality colorings than JP-R in practice. We engineered an efficient implementation of JP for modern shared-memory multicore computers and evaluated its performance on a machine with 12 Intel Core-i7 (Nehalem) processor cores. 
Our implementation of JP-LLF achieves a geometric-mean speedup of 7.83 on eight real-world graphs and a geometric-mean speedup of 8.08 on ten synthetic graphs, while our implementation using SLL achieves a geometric-mean speedup of 5.36 on these real-world graphs and a geometric-mean speedup of 7.02 on these synthetic graphs. Furthermore, on one processor, JP-LLF is slightly faster than a well-engineered serial greedy algorithm using LF, and likewise, JP-SLL is slightly faster than the greedy algorithm using SL.
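The following is a serial, round-by-round simulation of JP-style greedy coloring, offered only as a rough sketch. It assumes an undirected graph without self-loops stored as a symmetric dict of neighbor sets, and it assumes that LLF ranks vertices by ⌈lg deg(v)⌉ with random tie-breaking, as the heuristic's name suggests. The names `llf_priorities` and `jp_color` are hypothetical, and this is not the authors' multicore implementation.

```python
import random


def llf_priorities(adj, seed=0):
    """Assumed LLF ordering: larger ceil(lg degree) first, ties broken randomly."""
    rng = random.Random(seed)
    prio = {}
    for v, nbrs in adj.items():
        d = max(len(nbrs), 1)
        log_deg = (d - 1).bit_length()  # equals ceil(lg d) for d >= 1
        # Vertex id is a last-resort tie-breaker so priorities are distinct.
        prio[v] = (log_deg, rng.random(), v)
    return prio


def jp_color(adj, prio):
    """Color each vertex once all of its higher-priority neighbors are colored."""
    color = {}
    uncolored = set(adj)
    rounds = 0
    while uncolored:
        # Ready vertices have no uncolored neighbor of higher priority.
        # No two ready vertices are adjacent, so a parallel version
        # could color all of them at the same time.
        ready = [
            v for v in uncolored
            if all(u in color or prio[u] < prio[v] for u in adj[v])
        ]
        for v in ready:
            used = {color[u] for u in adj[v] if u in color}
            c = 0
            while c in used:  # smallest color not used by a neighbor
                c += 1
            color[v] = c
        uncolored.difference_update(ready)
        rounds += 1
    return color, rounds
```

The number of rounds is a loose stand-in for the span discussed in the abstract: an ordering that keeps long chains of decreasing priority short lets more vertices become ready in each round.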

78 citations

Proceedings Article

Executing dynamic data-graph computations deterministically using chromatic scheduling

Tim Kaler¹, William C. Hasenplaugh¹, Tao B. Schardl¹, Charles E. Leiserson¹•Institutions (1)

Massachusetts Institute of Technology¹

23 Jun 2014

TL;DR: This paper presents PRISM-R, a variation of PRISM that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations; its implementation is more involved than that of PRISM, incorporating a multivector data structure to maintain an ordered set of vertices partitioned by color.

Abstract: A data-graph computation (popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi) is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round. This paper introduces PRISM, a chromatic-scheduling algorithm for executing dynamic data-graph computations. PRISM uses a vertex-coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. A multibag data structure is used by PRISM to maintain a dynamic set of active vertices as an unordered set partitioned by color. We analyze PRISM using work-span analysis. Let G=(V,E) be a degree-Δ graph colored with χ colors, and suppose that Q⊆V is the set of active vertices in a round. Define size(Q) = |Q| + Σ_{v∈Q} deg(v), which is proportional to the space required to store the vertices of Q using a sparse-graph layout. We show that a P-processor execution of PRISM performs updates in Q using O(χ(lg(Q/χ) + lg Δ) + lg P) span and Θ(size(Q) + χ + P) work. These theoretical guarantees are matched by good empirical performance. We modified GraphLab to incorporate PRISM and studied seven application benchmarks on a 12-core multicore machine. PRISM executes the benchmarks 1.2–2.1 times faster than GraphLab's nondeterministic lock-based scheduler while providing deterministic behavior. This paper also presents PRISM-R, a variation of PRISM that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. 
PRISM-R satisfies the same theoretical bounds as PRISM, but its implementation is more involved, incorporating a multivector data structure to maintain an ordered set of vertices partitioned by color.
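The idea of chromatic scheduling described in the abstract can be sketched serially as follows. This is only an illustration under stated assumptions, not PRISM itself: a plain dict of lists stands in for the multibag, the names `chromatic_rounds` and `make_min_label_update` are hypothetical, and the update function (minimum-label propagation) is an example workload chosen here, not one named in the source.

```python
from collections import defaultdict


def chromatic_rounds(color, update, active):
    """Run a dynamic data-graph computation one color class at a time.

    color maps each vertex to its color in a valid vertex-coloring.
    update(v) modifies v's data and returns the set of vertices to
    activate for the next round.
    """
    rounds = 0
    while active:
        # Stand-in for the multibag: active vertices partitioned by color.
        bags = defaultdict(list)
        for v in active:
            bags[color[v]].append(v)
        next_active = set()
        for c in sorted(bags):
            # Same-colored vertices share no edge, so these updates touch
            # disjoint neighborhoods and could run in parallel without locks.
            for v in bags[c]:
                next_active |= update(v)
        active = next_active
        rounds += 1
    return rounds


def make_min_label_update(adj, label):
    """Example update: take the minimum label among a vertex and its neighbors."""
    def update(v):
        new = min([label[v]] + [label[u] for u in adj[v]])
        if new == label[v]:
            return set()      # nothing changed, so nothing to activate
        label[v] = new
        return set(adj[v])    # neighbors must re-check in the next round
    return update
```

Because color classes are processed in a fixed order and updates within a class do not interfere, the final labels do not depend on how the vertices inside a class are ordered or scheduled.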

21 citations