scispace - formally typeset
Search or ask a question

Showing papers by "Shoaib Kamil published in 2013"


Proceedings ArticleDOI
20 May 2013
TL;DR: This work obtains the first communication-optimal algorithm for all dimensions of rectangular matrices by combining the dimension-splitting technique with the recursive BFS/DFS approach, and shows significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.
Abstract: Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) allows for a communication-optimal as well as cache and network-oblivious algorithm. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Since the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.

132 citations


Proceedings ArticleDOI
20 May 2013
TL;DR: The Selective Embedded JIT Specialization (SEJITS) approach is used to automatically translate semiring operations and filters defined by programmers into a lower-level efficiency language, bypassing the upcall into Python, demonstrating the first known solution to the problem of obtaining high performance from a productivity language when applying graph algorithms selectively on semantic graphs.
Abstract: High performance is a crucial consideration when executing a complex analytic query on a massive semantic graph. In a semantic graph, vertices and edges carry attributes of various types. Analytic queries on semantic graphs typically depend on the values of these attributes; thus, the computation must view the graph through a filter that passes only those individual vertices and edges of interest. Knowledge Discovery Toolbox (KDT), a Python library for parallel graph computations, is customizable in two ways. First, the user can write custom graph algorithms by specifying operations between edges and vertices. These programmer-specified operations are called semiring operations due to KDT's underlying linear-algebraic abstractions. Second, the user can customize existing graph algorithms by writing filters that return true for those vertices and edges the user wants to retain during algorithm execution. For high productivity, both semiring operations and filters are written in a high-level language, resulting in relatively low performance due to the bottleneck of having to call into the Python virtual machine for each vertex and edge. In this work, we use the Selective Embedded JIT Specialization (SEJITS) approach to automatically translate semiring operations and filters defined by programmers into a lower-level efficiency language, bypassing the upcall into Python. We evaluate our approach by comparing it with the high-performance Combinatorial BLAS engine, and show our approach enables users to write in high-level languages and still obtain the high performance of low-level code. We also present a new roofline model for graph traversals, and show that our high-performance implementations do not significantly deviate from the roofline. Overall, we demonstrate the first known solution to the problem of obtaining high performance from a productivity language when applying graph algorithms selectively on semantic graphs.

18 citations