Top 2 papers published by Shoaib Kamil from Adobe Systems in 2013

Proceedings Article

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication

James Demmel¹, David Eliahu¹, Armando Fox¹, Shoaib Kamil², Benjamin Lipshitz¹, Oded Schwartz¹, Omer Spillinger¹

University of California, Berkeley¹, Massachusetts Institute of Technology²

20 May 2013

TL;DR: This work obtains the first communication-optimal algorithm for all dimensions of rectangular matrices by combining the dimension-splitting technique with the recursive BFS/DFS approach, and shows significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.

Abstract: Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) allows for a communication-optimal as well as cache- and network-oblivious algorithm. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Since the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.
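
The dimension-splitting idea can be illustrated with a minimal sequential sketch, assuming plain Python lists of numbers: at each step the recursion halves whichever of the three dimensions (m, n, or the shared inner dimension k) is largest, until a small base block is reached. This is not the authors' code. It omits the parallel BFS/DFS scheduling that makes the paper's algorithm communication-optimal, and the names rect_matmul, _rec_matmul and the base parameter are hypothetical.

```python
def _rec_matmul(A, B, C, i0, j0, k0, m, n, k, base):
    """Accumulate C[i0:i0+m][j0:j0+n] += A[i0:i0+m][k0:k0+k] * B[k0:k0+k][j0:j0+n]."""
    if max(m, n, k) <= base:
        # Base case: the block is small, so use a plain triple loop.
        for i in range(i0, i0 + m):
            a_row, c_row = A[i], C[i]
            for p in range(k0, k0 + k):
                a, b_row = a_row[p], B[p]
                for j in range(j0, j0 + n):
                    c_row[j] += a * b_row[j]
        return
    if m >= n and m >= k:
        # m is largest: split the rows of A and C.
        h = m // 2
        _rec_matmul(A, B, C, i0, j0, k0, h, n, k, base)
        _rec_matmul(A, B, C, i0 + h, j0, k0, m - h, n, k, base)
    elif n >= k:
        # n is largest: split the columns of B and C.
        h = n // 2
        _rec_matmul(A, B, C, i0, j0, k0, m, h, k, base)
        _rec_matmul(A, B, C, i0, j0 + h, k0, m, n - h, k, base)
    else:
        # k is largest: split the inner dimension; both halves add into the same C block.
        h = k // 2
        _rec_matmul(A, B, C, i0, j0, k0, m, n, h, base)
        _rec_matmul(A, B, C, i0, j0, k0 + h, m, n, k - h, base)


def rect_matmul(A, B, base=16):
    """Return A*B for an m-by-k list-of-lists A and a k-by-n list-of-lists B."""
    if base < 1:
        raise ValueError("base must be >= 1")
    m, k = len(A), len(B)
    n = len(B[0]) if k else 0
    if any(len(row) != k for row in A) or any(len(row) != n for row in B):
        raise ValueError("incompatible shapes")
    C = [[0] * n for _ in range(m)]
    if m and n and k:
        _rec_matmul(A, B, C, 0, 0, 0, m, n, k, base)
    return C
```

For example, rect_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]]. Always splitting the largest dimension keeps subproblems close to square regardless of the input's aspect ratio, which is what lets the recursion adapt to cache sizes it never queries.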

132 citations

Proceedings Article

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs

Aydin Buluc¹, Erika Duriakova², Armando Fox³, John R. Gilbert⁴, Shoaib Kamil³, Adam Lugowski⁴, Leonid Oliker¹, Samuel Williams¹

Lawrence Berkeley National Laboratory¹, University College Dublin², University of California, Berkeley³, University of California, Santa Barbara⁴

20 May 2013

TL;DR: The Selective Embedded JIT Specialization (SEJITS) approach is used to automatically translate semiring operations and filters defined by programmers into a lower-level efficiency language, bypassing the upcall into Python, demonstrating the first known solution to the problem of obtaining high performance from a productivity language when applying graph algorithms selectively on semantic graphs.

Abstract: High performance is a crucial consideration when executing a complex analytic query on a massive semantic graph. In a semantic graph, vertices and edges carry attributes of various types. Analytic queries on semantic graphs typically depend on the values of these attributes; thus, the computation must view the graph through a filter that passes only those individual vertices and edges of interest. Knowledge Discovery Toolbox (KDT), a Python library for parallel graph computations, is customizable in two ways. First, the user can write custom graph algorithms by specifying operations between edges and vertices. These programmer-specified operations are called semiring operations due to KDT's underlying linear-algebraic abstractions. Second, the user can customize existing graph algorithms by writing filters that return true for those vertices and edges the user wants to retain during algorithm execution. For high productivity, both semiring operations and filters are written in a high-level language, resulting in relatively low performance due to the bottleneck of having to call into the Python virtual machine for each vertex and edge. In this work, we use the Selective Embedded JIT Specialization (SEJITS) approach to automatically translate semiring operations and filters defined by programmers into a lower-level efficiency language, bypassing the upcall into Python. We evaluate our approach by comparing it with the high-performance Combinatorial BLAS engine, and show our approach enables users to write in high-level languages and still obtain the high performance of low-level code. We also present a new roofline model for graph traversals, and show that our high-performance implementations do not significantly deviate from the roofline. Overall, we demonstrate the first known solution to the problem of obtaining high performance from a productivity language when applying graph algorithms selectively on semantic graphs.
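
The pattern the abstract describes, a graph operation expressed as sparse linear algebra over a user-defined semiring and restricted by a user-written edge filter, can be sketched in pure Python as below. This is a hypothetical illustration, not KDT's API: Edge, filtered_spmv and the add/mul/keep parameters are assumptions. Because it is plain Python, every call to mul and keep is exactly the kind of per-edge interpreter upcall that SEJITS is described as eliminating; the sketch performs no specialization itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable


@dataclass
class Edge:
    src: int
    dst: int
    attrs: Dict[str, Any] = field(default_factory=dict)  # semantic-graph attributes


def filtered_spmv(edges: Iterable[Edge], x: Dict[int, Any],
                  add: Callable[[Any, Any], Any],
                  mul: Callable[[Edge, Any], Any],
                  keep: Callable[[Edge], bool]) -> Dict[int, Any]:
    """Sparse y[dst] = add-reduction of mul(edge, x[src]) over edges passing `keep`."""
    y: Dict[int, Any] = {}
    for e in edges:
        # Skip edges whose source is absent from the sparse input vector,
        # then consult the user's filter (a Python upcall per edge).
        if e.src not in x or not keep(e):
            continue
        v = mul(e, x[e.src])
        y[e.dst] = add(y[e.dst], v) if e.dst in y else v
    return y


if __name__ == "__main__":
    edges = [Edge(0, 1, {"type": "follow"}), Edge(0, 2, {"type": "retweet"}),
             Edge(1, 2, {"type": "retweet"}), Edge(3, 2, {"type": "retweet"})]
    # One filtered BFS step from frontier {0, 1}: record the smallest parent id.
    parents = filtered_spmv(edges, {0: 0, 1: 1}, add=min,
                            mul=lambda e, xv: e.src,
                            keep=lambda e: e.attrs.get("type") == "retweet")
    print(parents)  # {2: 0}
```

Swapping in a different add/mul pair (for instance ordinary addition and multiplication) turns the same traversal into a weighted propagation step, which is why the semiring abstraction lets one kernel serve many graph algorithms.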

18 citations