Proceedings ArticleDOI

Cache-oblivious shortest paths in graphs using buffer heap

27 Jun 2004, pp. 245-254
TL;DR: These results appear to give the first non-trivial cache-oblivious bounds for priority queues with a Decrease-Key operation and for shortest path problems on general graphs, including undirected and directed single-source shortest path (SSSP) problems on graphs with non-negative edge weights.
Abstract: We present the Buffer Heap (BH), a cache-oblivious priority queue that supports Delete-Min, Delete, and Decrease-Key operations in $O(\frac{1}{B}\log_2\frac{N}{B})$ amortized block transfers from external memory, where $B$ is the (unknown) block size and $N$ is the maximum number of elements in the queue. As is common in cache-oblivious algorithms, we assume a 'tall cache' (i.e., $M = \Omega(B^{1+\epsilon})$, where $M$ is the size of the main memory). We also assume the Decrease-Key operation only verifies that the element does not exist in the priority queue with a smaller key value, hence it also supports the Insert operation in the same amortized bound. The amortized time bound for each operation is $O(\log N)$. We also present a Cache-Oblivious Tournament Tree (COTT), which is simpler than the Buffer Heap, but has weaker bounds. Using the Buffer Heap we present cache-oblivious algorithms for undirected and directed single-source shortest path (SSSP) problems for graphs with non-negative edge weights. On a graph with $V$ vertices and $E$ edges, our algorithm for the undirected case performs $O(V + \frac{E}{B}\log_2\frac{V}{B})$ block transfers and for the directed case performs $O((V + \frac{E}{B}) \cdot \log_2\frac{V}{B})$ block transfers. The running time of both algorithms is $O((V + E) \cdot \log V)$. For both priority queues with Decrease-Key operation, and for shortest path problems on general graphs, our results appear to give the first non-trivial cache-oblivious bounds.
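
The weak Decrease-Key semantics described in the abstract (insert the element unless it is already present with a smaller key) are what a Dijkstra-style SSSP computation relies on. The following is a minimal in-memory sketch of that interface in Python, assuming an ordinary binary heap with lazy deletion as a stand-in. It is not the Buffer Heap and carries none of its block-transfer guarantees; the names `WeakDecreaseKeyPQ` and `sssp` are hypothetical and introduced only for illustration.

```python
import heapq
import itertools


class WeakDecreaseKeyPQ:
    """In-memory stand-in (NOT the Buffer Heap): binary heap with lazy deletion."""

    def __init__(self):
        self._heap = []                 # (key, seq, item); may hold stale entries
        self._best = {}                 # item -> smallest live key in the queue
        self._seq = itertools.count()   # tie-breaker so items are never compared

    def decrease_key(self, item, key):
        # Weak Decrease-Key: behaves as Insert unless the item is already
        # present with a key that is smaller than or equal to `key`.
        if item in self._best and self._best[item] <= key:
            return
        self._best[item] = key
        heapq.heappush(self._heap, (key, next(self._seq), item))

    def delete(self, item):
        # Lazy delete: the heap entry becomes stale and is skipped later.
        self._best.pop(item, None)

    def delete_min(self):
        # Pop until a live (non-stale) entry is found.
        while self._heap:
            key, _, item = heapq.heappop(self._heap)
            if self._best.get(item) == key:
                del self._best[item]
                return item, key
        return None


def sssp(graph, source):
    """Dijkstra-style SSSP for non-negative edge weights.

    graph: dict mapping vertex -> list of (neighbor, weight) pairs.
    Returns a dict of shortest distances for vertices reachable from source.
    """
    dist = {}
    pq = WeakDecreaseKeyPQ()
    pq.decrease_key(source, 0)
    while (entry := pq.delete_min()) is not None:
        u, d = entry
        if u in dist:                   # already settled
            continue
        dist[u] = d
        for v, w in graph.get(u, []):
            if v not in dist:
                pq.decrease_key(v, d + w)
    return dist
```

In this sketch each operation costs $O(\log N)$ time on a RAM, matching the time bound quoted above, but every heap access may touch a different block, which is exactly the behavior the Buffer Heap is designed to avoid.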


Citations
Book ChapterDOI
08 Jul 2004
TL;DR: An overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al. in 1999 is given.
Abstract: Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The results are algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.

113 citations


Cites background from "Cache-oblivious shortest paths in g..."

  • ...Undirected single source shortest path (SSSP) can be solved cache-obliviously in $O(V + \frac{E}{B}\log\frac{E}{B})$ I/Os [32, 37], matching the known bounds for the I/O model [51]....


Journal ArticleDOI
TL;DR: A carefully implemented cache-oblivious sorting algorithm, which can be faster than the best Quicksort implementation the authors are able to find for input sizes well within the limits of RAM and at least as fast as the recent cache-aware implementations included in the test.
Abstract: This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate by empirical methods a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which, our experiments show, can be faster than the best Quicksort implementation we are able to find for input sizes well within the limits of RAM. It is also at least as fast as the recent cache-aware implementations included in the test. On disk, the difference is even more pronounced regarding Quicksort and the cache-aware algorithms, whereas the algorithm is slower than a careful implementation of multiway Mergesort, such as TPIE.

59 citations

Book ChapterDOI
08 Jul 2004
TL;DR: In this paper, the authors present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights.
Abstract: We present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights. Our results remove the performance gap between the currently best cache-aware algorithms for these problems and their cache-oblivious counterparts. Our shortest-path algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation.

48 citations

01 Jan 2004
TL;DR: This work presents improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights and removes the performance gap between the currently best cache-aware algorithms for these problems and their cache-oblivious counterparts.

38 citations


Cites methods from "Cache-oblivious shortest paths in g..."

  • ...Independently of our work, the bucket heap as well as a cache-oblivious version of the tournament tree have simultaneously been developed by Chowdhury and Ramachandran [12]....


Journal ArticleDOI
TL;DR: An optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and delete-min operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers, is developed; it is as efficient as several previously developed external memory (cache-aware) priority queue data structures.
Abstract: We develop an optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and delete-min operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers, where $M$ and $B$ are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cache-oblivious data structure, $M$ and $B$ are not used in the description of the structure. Our structure is as efficient as several previously developed external memory (cache-aware) priority queue data structures, which all rely crucially on knowledge about $M$ and $B$. Priority queues are a critical component in many of the best known external memory graph algorithms, and using our cache-oblivious priority queue we develop several cache-oblivious graph algorithms.

38 citations


Cites background from "Cache-oblivious shortest paths in g..."

  • ...Note that recently, cache-oblivious algorithms for undirected shortest path computation have also been developed [29, 34]....


  • ...Brodal et al. [29], as well as Chowdhury and Ramachandran [34], have also developed cache-oblivious priority queues that support updates in the same bound as the I/O-efficient structure of Kumar and Schwabe [42]....


References
Journal ArticleDOI
TL;DR: Given n nodes joined by branches of known length, with at least one path between any two nodes, the paper considers how to construct the tree of minimum total length between the n nodes, where a tree is a graph with one and only one path between every two nodes.
Abstract: We consider n points (nodes), some or all pairs of which are connected by a branch; the length of each branch is given. We restrict ourselves to the case where at least one path exists between any two nodes. We now consider two problems. Problem 1. Construct the tree of minimum total length between the n nodes. (A tree is a graph with one and only one path between every two nodes.) In the course of the construction that we present here, the branches are subdivided into three sets: I. the branches definitely assigned to the tree under construction (they will form a subtree); II. the branches from which the next branch to be added to set I will be selected; III. the remaining branches (rejected or not yet considered). The nodes are subdivided into two sets: A. the nodes connected by the branches of set I, B. the remaining nodes (one and only one branch of set II will lead to each of these nodes). We start the construction by choosing an arbitrary node as the only member of set A, and by placing all branches that end in this node in set II. To start with, set I is empty. From then onwards we perform the following two steps repeatedly. Step 1. The shortest branch of set II is removed from this set and added to

22,704 citations


"Cache-oblivious shortest paths in g..." refers background in this paper

  • ...We also assume the Decrease-Key operation only verifies that the element does not exist in the priority queue with a smaller key value, hence it also supports the insert operation in the same amortized bound....


Journal ArticleDOI
TL;DR: Fibonacci heaps (F-heaps), a new data structure for implementing heaps that extends the binomial queues proposed by Vuillemin and studied further by Brown, yield improved running times for several network optimization algorithms; the improved bound for minimum spanning trees is the most striking.
Abstract: In this paper we develop a new data structure for implementing heaps (priority queues). Our structure, Fibonacci heaps (abbreviated F-heaps), extends the binomial queues proposed by Vuillemin and studied further by Brown. F-heaps support arbitrary deletion from an $n$-item heap in $O(\log n)$ amortized time and all other standard heap operations in $O(1)$ amortized time. Using F-heaps we are able to obtain improved running times for several network optimization algorithms. In particular, we obtain the following worst-case bounds, where $n$ is the number of vertices and $m$ the number of edges in the problem graph: $O(n \log n + m)$ for the single-source shortest path problem with nonnegative edge lengths, improved from $O(m \log_{(m/n+2)} n)$; $O(n^2 \log n + nm)$ for the all-pairs shortest path problem, improved from $O(nm \log_{(m/n+2)} n)$; $O(n^2 \log n + nm)$ for the assignment problem (weighted bipartite matching), improved from $O(nm \log_{(m/n+2)} n)$; $O(m\beta(m, n))$ for the minimum spanning tree problem, improved from $O(m \log\log_{(m/n+2)} n)$; where $\beta(m, n) = \min\{i \mid \log^{(i)} n \le m/n\}$. Note that $\beta(m, n) \le \log^* n$ if $m \ge n$. Of these results, the improved bound for minimum spanning trees is the most striking, although all the results give asymptotic improvements for graphs of appropriate densities.

2,484 citations

Proceedings ArticleDOI
24 Oct 1984
TL;DR: The structure, Fibonacci heaps (abbreviated F-heaps), extends the binomial queues proposed by Vuillemin and studied further by Brown to obtain improved running times for several network optimization algorithms.
Abstract: In this paper we develop a new data structure for implementing heaps (priority queues). Our structure, Fibonacci heaps (abbreviated F-heaps), extends the binomial queues proposed by Vuillemin and studied further by Brown. F-heaps support arbitrary deletion from an $n$-item heap in $O(\log n)$ amortized time and all other standard heap operations in $O(1)$ amortized time. Using F-heaps we are able to obtain improved running times for several network optimization algorithms.

1,757 citations


"Cache-oblivious shortest paths in g..." refers background in this paper

  • ...We also assume the Decrease-Key operation only verifies that the element does not exist in the priority queue with a smaller key value, hence it also supports the insert operation in the same amortized bound....


Journal ArticleDOI
TL;DR: Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Abstract: We provide tight upper and lower bounds, up to a constant factor, for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition. The bounds hold both in the worst case and in the average case, and in several situations the constant factors match. Secondary storage is modeled as a magnetic disk capable of transferring $P$ blocks each containing $B$ records in a single time unit; the records in each block must be input from or output to $B$ contiguous locations on the disk. We give two optimal algorithms for the problems, which are variants of merge sorting and distribution sorting. In particular we show for $P = 1$ that the standard merge sorting algorithm is an optimal external sorting method, up to a constant factor in the number of I/Os. Our sorting algorithms use the same number of I/Os as does the permutation phase of key sorting, except when the internal memory size is extremely small, thus affirming the popular adage that key sorting is not faster. We also give a simpler and more direct derivation of Hong and Kung's lower bound for the FFT for the special case $B = P = O(1)$.

1,344 citations

Proceedings ArticleDOI
17 Oct 1999
TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
Abstract: This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $Z$ and cache-line length $L$ where $Z=\Omega(L^2)$, the number of cache misses for an $m\times n$ matrix transpose is $\Theta(1+mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1+(n/L)(1+\log_Z n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m\times n$ matrix by an $n\times p$ matrix that incurs $\Theta(1+(mn+np+mp)/L+mnp/(L\sqrt{Z}))$ cache faults. We introduce an "ideal-cache" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.
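
The cache-oblivious matrix transpose mentioned in this abstract is usually explained as a divide-and-conquer recursion that halves the longer dimension until the subproblem is small. The sketch below illustrates only that recursion pattern in Python, under stated assumptions: the function names and the `base` cutoff are hypothetical, `base` is a fixed constant that limits recursion overhead and is not tuned to any cache parameter, and Python lists of lists do not have the contiguous memory layout the analysis assumes, so no cache-miss bound should be read into this code.

```python
def _transpose_rec(a, b, r0, r1, c0, c1, base):
    """Write the transpose of the block a[r0:r1][c0:c1] into b (b[j][i] = a[i][j])."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: transpose directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer dimension (rows) in half and recurse.
        mid = r0 + rows // 2
        _transpose_rec(a, b, r0, mid, c0, c1, base)
        _transpose_rec(a, b, mid, r1, c0, c1, base)
    else:
        # Split the longer dimension (columns) in half and recurse.
        mid = c0 + cols // 2
        _transpose_rec(a, b, r0, r1, c0, mid, base)
        _transpose_rec(a, b, r0, r1, mid, c1, base)


def recursive_transpose(a, base=16):
    """Return the transpose of a rectangular list-of-lists matrix `a`.

    `base` must be at least 1; it only bounds recursion depth overhead.
    """
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    _transpose_rec(a, b, 0, m, 0, n, base)
    return b
```

Because the recursion never refers to a cache size or line length, the same code is used unchanged at every level of a memory hierarchy, which is the property the abstract calls cache obliviousness.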

789 citations