Proceedings ArticleDOI

External-memory graph algorithms

TL;DR: A collection of new techniques for designing and analyzing external-memory algorithms for graph problems is presented, together with illustrations of how these techniques can be applied to a wide variety of specific problems.
Abstract: We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include:

  • Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the "proximate neighbors" problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components.

  • PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms.

  • Time-forward processing. We present a general technique for evaluating circuits (or "circuit-like" computations) in external memory. We also use this in a deterministic list ranking algorithm.
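
To make the time-forward idea concrete, here is a minimal in-memory sketch (a Python toy, not the paper's construction): each vertex's output is sent "forward in time" through a priority queue keyed by the topological rank of its destination. The function name and input format are illustrative assumptions; a genuine external-memory implementation would replace heapq with an I/O-efficient priority queue.

```python
import heapq
from itertools import count

def time_forward_eval(nodes, succ, op):
    # nodes: vertices in topological order; succ: vertex -> successor list;
    # op(vertex, input_values) -> output value for that vertex.
    rank = {v: i for i, v in enumerate(nodes)}
    tie = count()   # tiebreaker so heap entries never compare values
    pq = []         # entries: (rank of destination, tiebreak, value)
    results = {}
    for v in nodes:
        # Receive every value previously sent forward to v.
        inputs = []
        while pq and pq[0][0] == rank[v]:
            inputs.append(heapq.heappop(pq)[2])
        results[v] = op(v, inputs)
        # Send v's output forward in time to each successor.
        for w in succ.get(v, []):
            heapq.heappush(pq, (rank[w], next(tie), results[v]))
    return results

# Example: a tiny boolean circuit with two inputs feeding one OR gate.
wires = {"x": ["or"], "y": ["or"]}
value = {"x": True, "y": False}
print(time_forward_eval(["x", "y", "or"], wires,
                        lambda g, ins: value[g] if g in value else any(ins)))
```
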
Citations
ReportDOI
01 May 2014
TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which we use to design an online graph database, GraphChi-DB. We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems we show that with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
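
As a rough sketch of the invariant Parallel Sliding Windows maintains (a toy reconstruction, not GraphChi's code): edges are partitioned into shards by destination interval, and each shard is kept sorted by source, so the out-edges of any execution interval occupy one contiguous window of every shard. The function names and the in-memory lists standing in for disk files are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

def build_shards(edges, intervals):
    # Shard p holds every edge whose destination lies in interval p,
    # stored sorted by source -- the invariant PSW relies on.
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()
    return shards

def load_interval(p, shards, intervals):
    # One PSW step: the in-edges of interval p are all of shard p; its
    # out-edges form one contiguous "window" of every shard, because each
    # shard is sorted by source. On disk, each window is a sequential read.
    lo, hi = intervals[p]
    in_edges = list(shards[p])
    out_edges = []
    for shard in shards:
        srcs = [s for s, _ in shard]
        out_edges.extend(shard[bisect_left(srcs, lo):bisect_right(srcs, hi)])
    return in_edges, out_edges

# Example: six vertices split into two intervals, a handful of edges.
intervals = [(0, 2), (3, 5)]
shards = build_shards([(0, 3), (4, 1), (2, 5), (3, 0)], intervals)
print(load_interval(0, shards, intervals))  # edges touching vertices 0..2
```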

907 citations


Cites background from "External-memory graph algorithms"

  • ...[35], with an I/O bound of O(min(sort(V^2), log(V/M) sort(E)))....

  • ...[35]: O((1 + V/M) scan(E) + V) and O((1 + V/M) scan(E) + V); PSW (Gauss-Seidel): O(sort(E) + D_G(V+E)/B) and O(sort(E) + V(V+E)/B). To our knowledge, our implementation of SCC on PSW is the first practical implementation for external memory that works on natural graphs....

Proceedings ArticleDOI
08 Oct 2012
TL;DR: GraphChi as mentioned in this paper is a disk-based system for computing efficiently on graphs with billions of edges, using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

874 citations

Journal ArticleDOI
TL;DR: This article surveys the state of the art in the design and analysis of external-memory algorithms and data structures, where the goal is to exploit locality in order to reduce I/O costs.
Abstract: Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically. Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
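
For reference, the basic I/O bounds that recur throughout the survey (standard facts in this model; N items, internal memory M, block size B, and the shorthand n = N/B, m = M/B):

```latex
% Standard I/O-model bounds (Aggarwal--Vitter model):
%   N = input size, M = internal memory, B = block size,
%   n = N/B, m = M/B.
\begin{align*}
  \mathrm{Scan}(N) &= \Theta(n) \\
  \mathrm{Sort}(N) &= \Theta\bigl(n \log_m n\bigr)
     = \Theta\!\Bigl(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\Bigr) \\
  \mathrm{Perm}(N) &= \Theta\bigl(\min\{N,\ \mathrm{Sort}(N)\}\bigr)
\end{align*}
```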

751 citations


Cites background or methods from "External-memory graph algorithms"

  • ...Comparison of scalable sweeping-based spatial join (SSSJ) with the original PBSM (QPBSM) and a new variant (MPBSM): (a) data set 1 consists of tall and skinny (vertically aligned) rectangles; (b) data set 2 consists of short and wide (horizontally aligned) rectangles; (c) running times on data set 1; (d) running times on data set 2. improved form O((E/V) Sort(V))....

  • ...The resulting I/O cost is O(N_h + Sort(b N_h) + Scan(b N_h)), which can be amortized against the Θ(N_h) updates that occurred since the last time the level-h invariant was violated, yielding an amortized update cost of O(1 + (b/B) log_m n) I/Os per level....

  • ...Like Greed Sort, the Sharesort algorithm is theoretically optimal (i.e., within a constant factor of optimal), but the constant factor is larger than that of the distribution sort methods....

  • ...For the problem of bundle sorting, in which the N items have a total of K distinct key values (but the secondary information of each item is different), Matias et al. [2000] derive the matching lower bound BundleSort(N, K) = ....

  • ...Abello et al. [1998] and Matias et al. [2000] develop optimal distribution sort algorithms for bundle sorting using BundleSort(N, K) = O(n max{1, log_m min{K, n}}) I/Os, and Matias et al. [2000] prove the matching lower bound....

Journal ArticleDOI
Pavel Berkhin
TL;DR: This survey examines the theoretical foundations of the PageRank formulation, the acceleration of PageRank computing, the effects of particular aspects of web graph structure on the optimal organization of computations, and PageRank stability.
Abstract: This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underly PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank.
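
PageRank is the principal eigenvector of the damped link matrix, so the computation being accelerated is, at its core, power iteration. A minimal Python sketch follows; the function name and parameters are illustrative assumptions, and web-scale versions stream the link graph from disk rather than holding it in a dict.

```python
def pagerank(out_links, d=0.85, tol=1e-9, max_iter=100):
    # out_links: page -> list of linked pages (every page appears as a key).
    # d: damping factor. Dangling pages spread their rank uniformly.
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = out_links[p] or pages       # dangling -> all pages
            share = d * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:                           # converged
            break
    return rank

# Example: a three-page web; "c" is a dangling page.
print(pagerank({"a": ["b"], "b": ["a", "c"], "c": []}))
```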

479 citations

Journal ArticleDOI
TL;DR: A recursive technique for building suffix trees is presented that yields optimal algorithms, matching the sorting lower bound, in several computational models; for an alphabet consisting of integers in a polynomial range it gives the first known linear-time algorithm.
Abstract: The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial O(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the O(n log n)-time comparison-based algorithm, but no super-linear lower bound is known. Closing this gap is the main open question in stringology. We settle this open problem by giving a linear time reduction to sorting for building suffix trees. Since sorting is a lower bound for building suffix trees, this algorithm is time-optimal in every alphabet model. In particular, for an alphabet consisting of integers in a polynomial range we get the first known linear-time algorithm. (2) All previously known algorithms for building suffix trees exhibit a marked absence of locality of reference, and thus they tend to elicit many page faults (I/Os) when indexing very long strings. They are therefore unsuitable for building suffix trees in secondary storage devices, where I/Os dominate the overall computational cost. We give a linear-I/O reduction to sorting for suffix tree construction. Since sorting is a trivial I/O lower bound for building suffix trees, our algorithm is I/O-optimal.
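
To see why sorting is inherent, note that a suffix tree encodes the lexicographic order of all suffixes. The naive Python sketch below makes that ordering explicit; it is a quadratic-time toy for illustration, not the paper's linear reduction to sorting.

```python
def naive_suffix_array(s):
    # Sort the starting positions of all suffixes of s lexicographically.
    # This is the ordering a suffix tree encodes, which is why sorting is
    # the bottleneck. Cost here is O(n^2 log n); the paper reduces suffix
    # tree construction to sorting in linear (I/O) work.
    return sorted(range(len(s)), key=lambda i: s[i:])

# Example: suffixes of "banana" in lexicographic order.
print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```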

246 citations


Cites methods from "External-memory graph algorithms"

  • ...In the RAM model, ET(T') is computed via an explicit DFS of the tree T', whereas in the DAM model ET(T') is efficiently computed by simulating known PRAM algorithms [Chiang et al. 1995]....

  • ...In all computational models, we will use the fact that it is easy to find the least common ancestors of two nodes [Harel and Tarjan 1984; Bender and Farach-Colton 2000; Chiang et al. 1995]....


References
Journal ArticleDOI
TL;DR: Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Abstract: We provide tight upper and lower bounds, up to a constant factor, for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition. The bounds hold both in the worst case and in the average case, and in several situations the constant factors match. Secondary storage is modeled as a magnetic disk capable of transferring P blocks each containing B records in a single time unit; the records in each block must be input from or output to B contiguous locations on the disk. We give two optimal algorithms for the problems, which are variants of merge sorting and distribution sorting. In particular we show for P = 1 that the standard merge sorting algorithm is an optimal external sorting method, up to a constant factor in the number of I/Os. Our sorting algorithms use the same number of I/Os as does the permutation phase of key sorting, except when the internal memory size is extremely small, thus affirming the popular adage that key sorting is not faster. We also give a simpler and more direct derivation of Hong and Kung's lower bound for the FFT for the special case B = P = O(1).
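
A minimal Python sketch of the two-phase merge sort discussed above, with lists standing in for disk-resident files (an illustrative toy, not the paper's exact algorithm): runs of M records are formed in memory and then merged multiway; in the I/O model, merging Θ(M/B) runs at a time is what yields the optimal bound.

```python
import heapq
from itertools import islice

def external_merge_sort(records, M):
    # Phase 1: read M records at a time and emit sorted runs ("disk"
    # files are modeled by Python lists).
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, M))
        if not chunk:
            break
        runs.append(sorted(chunk))
    # Phase 2: multiway merge of the runs. heapq.merge stands in for an
    # (M/B)-way merge, which gives the optimal Theta(n log_m n) I/O bound.
    return list(heapq.merge(*runs))

print(external_merge_sort([5, 2, 9, 1, 7, 3, 8], M=3))  # [1, 2, 3, 5, 7, 8, 9]
```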

1,344 citations


"External-memory graph algorithms" refers background or methods in this paper

  • ...Indeed, just the problem of implementing various classes of permutation has been a central theme in external-memory I/O research [1, 6, 7, 8, 26]....

  • ...The proof is an adaptation and generalization of that given by Aggarwal and Vitter [1] for the special case = 1 and c = 0....

  • ...Early work in external-memory algorithms for parallel disk systems concentrated largely on fundamental problems such as sorting, matrix multiplication, and FFT [1, 19, 26]....

  • ...Instead, it is well-known that Ω(perm(N)) I/Os are required in the worst case [1, 26] where...

  • ...We obtain the given upper bounds by modifying the PRAM algorithms of Tamassia and Vitter [21], and applying the list ranking and the PRAM simulation techniques....

Journal ArticleDOI
TL;DR: A calibrated, high-quality disk drive model is demonstrated in which the overall error factor is 14 times smaller than that of a simple first-order model, which enables an informed trade-off between effort and accuracy.
Abstract: Although disk storage densities are improving impressively (60% to 130% compounded annually), performance improvements have been occurring at only about 7% to 10% compounded annually over the last decade. As a result, disk system performance is fast becoming a dominant factor in overall system behavior. Naturally, researchers want to improve overall I/O performance, of which a large component is the performance of the disk drive itself. This research often involves using analytical or simulation models to compare alternative approaches, and the quality of these models determines the quality of the conclusions: indeed, the wrong modeling assumptions can lead to erroneous conclusions. Nevertheless, little work has been done to develop or describe accurate disk drive models. This may explain the commonplace use of simple, relatively inaccurate models. We believe there is much room for improvement. This article demonstrates and describes a calibrated, high-quality disk drive model in which the overall error factor is 14 times smaller than that of a simple first-order model. We describe the various disk drive performance components separately, then show how their inclusion improves the simulation model. This enables an informed trade-off between effort and accuracy. In addition, we provide detailed characteristics for two disk drives, as well as a brief description of a simulation environment that uses the disk drive model.
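
For orientation, the kind of simple first-order model the article argues is inadequate is essentially a three-term sum. The Python sketch below is a hypothetical illustration with invented parameter values, not the article's model; the calibrated model adds seek-distance curves, head switches, on-drive caching, and bus contention.

```python
def first_order_service_time_ms(request_bytes, avg_seek_ms=8.0, rpm=7200,
                                transfer_mb_per_s=100.0):
    # Simple first-order disk model: constant average seek (ignoring the
    # actual seek distance) + half-rotation latency + transfer time.
    # All parameter values here are hypothetical.
    rotation_ms = 0.5 * 60000.0 / rpm               # half a revolution
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return avg_seek_ms + rotation_ms + transfer_ms

print(first_order_service_time_ms(64 * 1024))       # ~12.8 ms
```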

938 citations


"External-memory graph algorithms" refers background in this paper

  • ...In coming years we can expect the significance of the I/O bottleneck to increase to the point that we can ill afford to ignore it, since technological advances are increasing CPU speeds at an annual rate of 40-60% while disk transfer rates are only increasing by 7-10% annually [20]....

Book
22 Aug 2011
TL;DR: The algorithms apply a novel “random-like” deterministic technique that provides for a fast and efficient breaking of an apparently symmetric situation in parallel and distributed computation.
Abstract: The following problem is considered: given a linked list of length n, compute the distance from each element of the linked list to the end of the list. The problem has two standard deterministic algorithms: a linear time serial algorithm, and an O(log n) time parallel algorithm using n processors. We present new deterministic parallel algorithms for the problem. Our strongest results are (1) O(log n log* n) time using n/(log n log* n) processors (this algorithm achieves optimal speed-up); (2) O(log n) time using n log^(k) n / log n processors, for any fixed positive integer k. The algorithms apply a novel "random-like" deterministic technique. This technique provides for a fast and efficient breaking of an apparently symmetric situation in parallel and distributed computation.
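
A compact Python sketch of the standard pointer-jumping scheme for this problem (the classic O(n log n)-work version, not the paper's optimal algorithms): each round would be a single parallel step on n processors and is simulated here by a sequential loop. Names and input format are illustrative.

```python
def list_rank(next_ptr):
    # next_ptr[i] is the successor of node i, or None at the tail.
    # Returns the distance from each node to the end of the list.
    n = len(next_ptr)
    nxt = list(next_ptr)
    rank = [0 if nxt[i] is None else 1 for i in range(n)]
    changed = True
    while changed:              # O(log n) rounds; paths halve each round
        changed = False
        new_rank, new_nxt = rank[:], nxt[:]
        for i in range(n):      # conceptually: for all i in parallel
            if nxt[i] is not None:
                new_rank[i] = rank[i] + rank[nxt[i]]   # jump over successor
                new_nxt[i] = nxt[nxt[i]]
                changed = True
        rank, nxt = new_rank, new_nxt
    return rank

print(list_rank([1, 2, 3, None]))  # [3, 2, 1, 0]
```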

474 citations


"External-memory graph algorithms" refers methods in this paper

  • ...Vishkin [25] uses PRAM simulation to facilitate prefetching for various problems, but without taking blocking issues into account....

  • ...For biconnected components, we adapt the PRAM algorithm of Tarjan and Vishkin [22], which requires generating an arbitrary spanning tree, evaluating an expression tree, and computing connected components of a newly created graph....

  • ...The use of PRAM simulation for prefetching, without the important consideration of blocking, is explored by Vishkin [25]....

  • ...The method is based upon a non-trivial adaptation of the deterministic coin tossing technique of Cole and Vishkin [5]....

  • ...It has also been used by Cole and Vishkin [5], who developed a deterministic version of Anderson and Miller's randomized algorithm....

Journal ArticleDOI
TL;DR: In this article, the authors provided the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for sorting, FFT, matrix transposition, standard matrix multiplication, and related problems.
Abstract: We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix transposition, standard matrix multiplication, and related problems. Our two-level memory model is new and gives a realistic treatment of parallel block transfer, in which during a single I/O each of the P secondary storage devices can simultaneously transfer a contiguous block of B records. The model pertains to a large-scale uniprocessor system or parallel multiprocessor system with P disks. In addition, the sorting, FFT, permutation network, and standard matrix multiplication algorithms are typically optimal in terms of the amount of internal processing time. The difficulty in developing optimal algorithms is to cope with the partitioning of memory into P separate physical devices. Our algorithms' performance can be significantly better than that obtained by the well-known but nonoptimal technique of disk striping. Our optimal sorting algorithm is randomized, but practical; the probability of using more than ℓ times the optimal number of I/Os is exponentially small in ℓ(log ℓ) log(M/B), where M is the internal memory size.
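
The advantage over disk striping can be summarized by the standard sorting bounds in this model (N records, P disks, block size B, memory M); striping effectively sorts with logical block size PB, which shrinks the base of the logarithm and can cost extra passes:

```latex
% Sorting N records with P disks, block size B, internal memory M:
\begin{align*}
  \text{independent disks (optimal):}\quad
    &\Theta\!\Bigl(\tfrac{N}{PB}\,\log_{M/B}\tfrac{N}{B}\Bigr) \\
  \text{disk striping (one logical disk, block size } PB\text{):}\quad
    &\Theta\!\Bigl(\tfrac{N}{PB}\,\log_{M/PB}\tfrac{N}{PB}\Bigr)
\end{align*}
```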

353 citations

Journal ArticleDOI
TL;DR: This paper introduces a novel parallel data structure called the recursive star-tree, derived by using recursion in the spirit of the inverse Ackermann function, which allows for extremely fast parallel computations, specifically O(α(n)) time, using an optimal number of processors.
Abstract: This paper introduces a novel parallel data structure called the recursive star-tree (denoted "*-tree"). Its definition uses a generalization of the * functional: for a function f, f*(n) = min{ i : f^(i)(n) ≤ 1 }, where f^(i) is the i-th iterate of f. Recursive *-trees are derived by using recursion in the spirit of the inverse Ackermann function. The recursive *-tree data structure leads to a new design paradigm for parallel algorithms. This paradigm allows for extremely fast parallel computations, specifically O(α(n)) time (where α(n) is the inverse of the Ackermann function), using an optimal number of processors on the (weakest) concurrent-read, concurrent-write parallel random-access machine (CRCW PRAM). These computations need only constant time, and use an optimal number of processors, if the following nonstandard assumption about the model of parallel computation is added to the CRCW PRAM: an extremely small number of processor...
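
A small Python sketch of the * functional as defined above (assuming f maps numbers to numbers); star(log2) is the familiar log* function:

```python
import math

def star(f):
    # star(f)(n) = the least i such that applying f to n i times gives <= 1.
    def f_star(n):
        i = 0
        while n > 1:
            n = f(n)
            i += 1
        return i
    return f_star

log_star = star(math.log2)          # the familiar log* function
print(log_star(65536))              # 4: 65536 -> 16 -> 4 -> 2 -> 1
```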

221 citations


"External-memory graph algorithms" refers methods in this paper

  • ...The least common ancestor problem can be reduced to the range minima problem using Euler Tour and list ranking [3]....
