Journal ArticleDOI

Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths

TL;DR: The cache-oblivious SSSP algorithm takes nearly full advantage of block transfers for dense graphs, and the number of I/Os for sparse graphs is reduced by a factor of nearly √B, where B is the cache-block size.
Abstract: We present improved cache-oblivious data structures and algorithms for breadth-first search (BFS) on undirected graphs and for the single-source shortest path (SSSP) problem on undirected graphs with non-negative edge weights. For the SSSP problem, our result closes the performance gap between the currently best cache-aware algorithm and its cache-oblivious counterpart. Our cache-oblivious SSSP algorithm takes nearly full advantage of block transfers for dense graphs. The algorithm relies on a new data structure, called the bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation. For the BFS problem, we reduce the number of I/Os for sparse graphs by a factor of nearly √B, where B is the cache-block size, nearly closing the performance gap between the currently best cache-aware and cache-oblivious algorithms.
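The bucket heap itself relies on a sequence of exponentially growing buckets to achieve cache-obliviousness; those details are beyond this abstract. As a rough internal-memory illustration of what a weak DecreaseKey buys (an update that never has to locate the old entry, with stale entries discarded lazily at DeleteMin), here is a sketch. The class name `LazyPQ` and its interface are illustrative, not the paper's:

```python
import heapq

class LazyPQ:
    """Illustrative sketch of a priority queue with a *weak* DecreaseKey:
    an update simply re-inserts the element under its smaller key, and
    stale heap entries are skipped during DeleteMin. This mimics the
    interface only, not the cache-oblivious bucket layout of the paper."""

    def __init__(self):
        self._heap = []
        self._best = {}  # smallest key known per element

    def update(self, elem, key):
        """Insert `elem`, or weakly decrease its key to `key`."""
        if key < self._best.get(elem, float("inf")):
            self._best[elem] = key
            heapq.heappush(self._heap, (key, elem))  # old entry stays behind

    def delete_min(self):
        """Pop the element with the smallest current key, skipping stale entries."""
        while self._heap:
            key, elem = heapq.heappop(self._heap)
            if self._best.get(elem) == key:  # entry is still current
                del self._best[elem]
                return elem, key
        raise IndexError("delete_min from an empty queue")
```

The point of the weakness is that `update` never searches the heap for the old entry; correctness is restored lazily, which is what makes the operation cheap to support in a cache-oblivious layout.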


Citations
DissertationDOI
Deepak Ajwani
01 Jan 2008
TL;DR: A simple algorithm that maintains the topological order of a directed acyclic graph with n nodes under an online edge-insertion sequence in O(n^2.75) time, independent of the number m of edges inserted; for dense DAGs this improves on the previous best bound of O(min{m^(3/2) log n, m^(3/2) + n^2 log n}).
Abstract: The notion of graph traversal is of fundamental importance to solving many computational problems. In many modern applications involving graph traversal, such as those arising in the domain of social networks, Internet-based services, fraud detection in telephone calls, etc., the underlying graph is very large and dynamically evolving. This thesis deals with the design and engineering of traversal algorithms for such graphs. We engineer various I/O-efficient Breadth First Search (BFS) algorithms for massive sparse undirected graphs. Our pipelined implementations with low constant factors, together with some heuristics preserving the worst-case guarantees, make BFS viable on massive graphs. We perform an extensive set of experiments to study the effect of various graph properties such as diameter, initial disk layouts, tuning parameters, disk parallelism, cache-obliviousness, etc. on the relative performance of these algorithms. We characterize the performance of NAND-flash-based storage devices, including many solid state disks. We show that despite the similarities between flash memory and RAM (fast random reads) and between flash disk and hard disk (both are block-based devices), the algorithms designed in the RAM model or the external memory model do not realize the full potential of flash memory devices. We also analyze the effect of misalignments, aging, past I/O patterns, etc. on the performance obtained on these devices. We also consider I/O-efficient BFS algorithms for the case when a hard disk and a solid state disk are used together. We present a simple algorithm which maintains the topological order of a directed acyclic graph with n nodes under an online edge insertion sequence in O(n^2.75) time, independent of the number m of edges inserted. For dense DAGs, this is an improvement over the previous best result of O(min{m^(3/2) log n, m^(3/2) + n^2 log n}). While our analysis holds only for the incremental setting, our algorithm itself is fully dynamic.
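The O(n^2.75) algorithm of the thesis is more involved; for orientation, the classical local-reordering approach to the same online problem (in the style of Pearce and Kelly) can be sketched as follows. On an order-violating insertion it searches only the affected region of the current order and permutes the nodes found there. All names here are illustrative:

```python
class IncrementalTopo:
    """Sketch of Pearce–Kelly-style online topological order maintenance.
    ord[v] is the current position of node v; edge u -> v must satisfy
    ord[u] < ord[v]. Illustrative only, not the thesis's algorithm."""

    def __init__(self, n):
        self.adj = [[] for _ in range(n)]   # forward adjacency
        self.radj = [[] for _ in range(n)]  # reverse adjacency
        self.ord = list(range(n))

    def insert_edge(self, u, v):
        self.adj[u].append(v)
        self.radj[v].append(u)
        lb, ub = self.ord[v], self.ord[u]
        if ub < lb:
            return  # edge already respects the current order
        # Forward search from v restricted to positions <= ub.
        delta_f, stack, seen_f = [], [v], {v}
        while stack:
            x = stack.pop()
            delta_f.append(x)
            for y in self.adj[x]:
                if y not in seen_f and self.ord[y] <= ub:
                    if y == u:  # u reachable from v: inserting u -> v closes a cycle
                        self.adj[u].pop()
                        self.radj[v].pop()
                        raise ValueError("edge would create a cycle")
                    seen_f.add(y)
                    stack.append(y)
        # Backward search from u restricted to positions >= lb.
        delta_b, stack, seen_b = [], [u], {u}
        while stack:
            x = stack.pop()
            delta_b.append(x)
            for y in self.radj[x]:
                if y not in seen_b and self.ord[y] >= lb:
                    seen_b.add(y)
                    stack.append(y)
        # Reassign the freed positions: u's ancestors first, then v's descendants.
        nodes = (sorted(delta_b, key=lambda x: self.ord[x]) +
                 sorted(delta_f, key=lambda x: self.ord[x]))
        slots = sorted(self.ord[x] for x in nodes)
        for x, p in zip(nodes, slots):
            self.ord[x] = p
```

The cost of an insertion is proportional to the size of the affected region rather than the whole graph, which is the basic idea that bounded-incremental algorithms such as the thesis's refine with stronger guarantees.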

10 citations

Posted Content
TL;DR: This paper compares two different Cache-Oblivious priority queues, based on the Funnel Heap and the Bucket Heap, and applies them to the single source shortest path problem on graphs with positive edge weights, showing that when RAM is limited and data is swapping to external storage, the Cache-Oblivious priority queues achieve orders-of-magnitude speedups over standard internal memory techniques.
Abstract: In recent years the Cache-Oblivious model of external memory computation has provided an attractive theoretical basis for the analysis of algorithms on massive datasets. Much progress has been made in discovering algorithms that are asymptotically optimal or near optimal. However, to date there are still relatively few successful experimental studies. In this paper we compare two different Cache-Oblivious priority queues based on the Funnel and Bucket Heap and apply them to the single source shortest path problem on graphs with positive edge weights. Our results show that when RAM is limited and data is swapping to external storage, the Cache-Oblivious priority queues achieve orders of magnitude speedups over standard internal memory techniques. However, for the single source shortest path problem both on simulated and real world graph data, these speedups are markedly lower due to the time required to access the graph adjacency list itself.
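The "standard internal memory techniques" that serve as the study's baseline are essentially Dijkstra's algorithm on an ordinary binary heap with lazy deletions; a minimal sketch, with the adjacency-list format assumed here for illustration:

```python
import heapq

def dijkstra(adj, src):
    """Textbook Dijkstra with an internal-memory binary heap.
    adj: {u: [(v, w), ...]} with non-negative edge weights w.
    Returns a dict of shortest distances from src to every reachable node."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry left behind by a lazy decrease-key
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

Note that each relaxation here touches the adjacency list of `u` directly; as the paper's results suggest, it is exactly these scattered accesses to the graph representation, not the priority queue, that dominate once the graph no longer fits in RAM.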

7 citations


Cites methods from "Cache-Oblivious Data Structures and..."

  • ...Four Cache-Oblivious priority queues have been developed which we name Arge Heap[4], Funnel Heap[6], Bucket Heap[7] and Buffer Heap[10]....


  • ...We show that algorithms not explicitly designed for external memory suffer a dramatic performance penalty compared to the Cache-Oblivious algorithms we implement when data is too large to hold in RAM....


Book ChapterDOI
TL;DR: In this paper, the authors focus on realistic computation models that capture the running time of algorithms involving large data sets on modern computers better than the traditional RAM (and its parallel counterpart PRAM) model.
Abstract: Many real-world applications involve storing and processing large amounts of data. These data sets need to be either stored over the memory hierarchy of one computer or distributed and processed over many parallel computing devices, or both. In fact, in many such applications, choosing a realistic computation model proves to be a critical factor in obtaining practically acceptable solutions. In this chapter, we focus on realistic computation models that capture the running time of algorithms involving large data sets on modern computers better than the traditional RAM (and its parallel counterpart PRAM) model.

5 citations

DOI
01 Jan 2014
TL;DR: This thesis tries to close the gap between theoretically worst-case-optimal classical algorithms and the real-world circumstances one faces under the assumptions imposed by data size, limited main memory, or available parallelism.
Abstract: Fundamental algorithms form a base of knowledge for every computer science undergraduate and professional programmer. They are a set of basic techniques one can find in any (good) coursebook on algorithms and data structures. In this thesis we try to close the gap between theoretically worst-case-optimal classical algorithms and the real-world circumstances one faces under the assumptions imposed by data size, limited main memory, or available parallelism.

5 citations

Posted Content
TL;DR: This work provides an empirical proof of concept that the overlapping approach for performing sums of products using one global Funnel Heap is better suited than the serialised approach, even when the latter uses the best merging structures available.
Abstract: This work is a comprehensive extension of Abu-Salem et al. (2015) that investigates the prowess of the Funnel Heap for implementing sums of products in the polytope method for factoring polynomials, when the polynomials are in sparse distributed representation. We exploit that the work and cache complexity of an Insert operation using Funnel Heap can be refined to depend on the rank of the inserted monomial product, where rank corresponds to its lifetime in Funnel Heap. By optimising the pattern by which insertions and extractions occur during the Hensel lifting phase of the polytope method, we are able to obtain an adaptive Funnel Heap that minimises all of the work, cache, and space complexity of this phase. Additionally, we conduct a detailed empirical study confirming the superiority of Funnel Heap over the generic Binary Heap once swaps to external memory begin to take place. We demonstrate that Funnel Heap is a more efficient merger than the cache-oblivious k-merger, which fails to achieve its optimal (and amortised) cache complexity when used for performing sums of products. This provides an empirical proof of concept that the overlapping approach for performing sums of products using one global Funnel Heap is better suited than the serialised approach, even when the latter uses the best merging structures available.

2 citations


Cites background from "Cache-Oblivious Data Structures and..."

  • ...However, all of those features can also be observed when adopting an alternate cache oblivious priority queue (see for example, (Brodal et al., 2004; Arge et al., 2002))....
