Proceedings ArticleDOI

External-memory graph algorithms

TL;DR: A collection of new techniques for designing and analyzing external-memory algorithms for graph problems is presented, together with illustrations of how these techniques can be applied to a wide variety of specific problems.
Abstract: We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include:

  • Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the "proximate neighbors" problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components.

  • PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms.

  • Time-forward processing. We present a general technique for evaluating circuits (or "circuit-like" computations) in external memory. We also use this in a deterministic list ranking algorithm.
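
To make the time-forward idea concrete, here is a minimal in-memory sketch (a Python toy, not the paper's construction): each vertex's output is sent "forward in time" through a priority queue keyed by the topological rank of its destination. The function name and input format are illustrative assumptions; a genuine external-memory implementation would replace heapq with an I/O-efficient priority queue.

```python
import heapq
from itertools import count

def time_forward_eval(nodes, succ, op):
    # nodes: vertices in topological order; succ: vertex -> successor list;
    # op(vertex, input_values) -> output value for that vertex.
    rank = {v: i for i, v in enumerate(nodes)}
    tie = count()   # tiebreaker so heap entries never compare values
    pq = []         # entries: (rank of destination, tiebreak, value)
    results = {}
    for v in nodes:
        # Receive every value previously sent forward to v.
        inputs = []
        while pq and pq[0][0] == rank[v]:
            inputs.append(heapq.heappop(pq)[2])
        results[v] = op(v, inputs)
        # Send v's output forward in time to each successor.
        for w in succ.get(v, []):
            heapq.heappush(pq, (rank[w], next(tie), results[v]))
    return results

# Example: a tiny boolean circuit with two inputs feeding one OR gate.
wires = {"x": ["or"], "y": ["or"]}
value = {"x": True, "y": False}
print(time_forward_eval(["x", "y", "or"], wires,
                        lambda g, ins: value[g] if g in value else any(ins)))
```
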
Citations
ReportDOI
01 May 2014
TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which we use to design an online graph database, GraphChi-DB. We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems we show that with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
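
As a rough sketch of the invariant Parallel Sliding Windows maintains (a toy reconstruction, not GraphChi's code): edges are partitioned into shards by destination interval, and each shard is kept sorted by source, so the out-edges of any execution interval occupy one contiguous window of every shard. The function names and the in-memory lists standing in for disk files are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

def build_shards(edges, intervals):
    # Shard p holds every edge whose destination lies in interval p,
    # stored sorted by source -- the invariant PSW relies on.
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()
    return shards

def load_interval(p, shards, intervals):
    # One PSW step: the in-edges of interval p are all of shard p; its
    # out-edges form one contiguous "window" of every shard, because each
    # shard is sorted by source. On disk, each window is a sequential read.
    lo, hi = intervals[p]
    in_edges = list(shards[p])
    out_edges = []
    for shard in shards:
        srcs = [s for s, _ in shard]
        out_edges.extend(shard[bisect_left(srcs, lo):bisect_right(srcs, hi)])
    return in_edges, out_edges

# Example: six vertices split into two intervals, a handful of edges.
intervals = [(0, 2), (3, 5)]
shards = build_shards([(0, 3), (4, 1), (2, 5), (3, 0)], intervals)
print(load_interval(0, shards, intervals))  # edges touching vertices 0..2
```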

907 citations


Cites background from "External-memory graph algorithms"

  • ...[35], with an I/O bound of O(min(sort(V^2), log(V/M) sort(E)))....

  • ...[35]: O((1 + V/M) scan(E) + V) and O((1 + V/M) scan(E) + V); PSW (Gauss-Seidel): O(sort(E) + D_G(V+E)/B) and O(sort(E) + V(V+E)/B). To our knowledge, our implementation of SCC on PSW is the first practical implementation for external memory that works on natural graphs....

Proceedings ArticleDOI
08 Oct 2012
TL;DR: GraphChi as mentioned in this paper is a disk-based system for computing efficiently on graphs with billions of edges, using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

874 citations

Journal ArticleDOI
TL;DR: This article surveys the state of the art in the design and analysis of external-memory algorithms and data structures, where the goal is to exploit locality in order to reduce I/O costs.
Abstract: Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically. Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
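
For reference, the basic I/O bounds that recur throughout the survey (standard facts in this model; N items, internal memory M, block size B, and the shorthand n = N/B, m = M/B):

```latex
% Standard I/O-model bounds (Aggarwal--Vitter model):
%   N = input size, M = internal memory, B = block size,
%   n = N/B, m = M/B.
\begin{align*}
  \mathrm{Scan}(N) &= \Theta(n) \\
  \mathrm{Sort}(N) &= \Theta\bigl(n \log_m n\bigr)
     = \Theta\!\Bigl(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\Bigr) \\
  \mathrm{Perm}(N) &= \Theta\bigl(\min\{N,\ \mathrm{Sort}(N)\}\bigr)
\end{align*}
```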

751 citations


Cites background or methods from "External-memory graph algorithms"

  • ...Comparison of scalable sweeping-based spatial join (SSSJ) with the original PBSM (QPBSM) and a new variant (MPBSM): (a) data set 1 consists of tall and skinny (vertically aligned) rectangles; (b) data set 2 consists of short and wide (horizontally aligned) rectangles; (c) running times on data set 1; (d) running times on data set 2. improved form O((E/V) Sort(V))....

  • ...The resulting I/O cost is O(N_h + Sort(b N_h) + Scan(b N_h)), which can be amortized against the Θ(N_h) updates that occurred since the last time the level-h invariant was violated, yielding an amortized update cost of O(1 + (b/B) log_m n) I/Os per level....

  • ...Like Greed Sort, the Sharesort algorithm is theoretically optimal (i.e., within a constant factor of optimal), but the constant factor is larger than that of the distribution sort methods....

  • ...For the problem of bundle sorting, in which the N items have a total of K distinct key values (but the secondary information of each item is different), Matias et al. [2000] derive the matching lower bound BundleSort(N, K) = ....

  • ...Abello et al. [1998] and Matias et al. [2000] develop optimal distribution sort algorithms for bundle sorting using BundleSort(N, K) = O(n max{1, log_m min{K, n}}) I/Os, and Matias et al. [2000] prove the matching lower bound....

Journal ArticleDOI
Pavel Berkhin
TL;DR: This survey examines the theoretical foundations of the PageRank formulation, the acceleration of PageRank computing, the effects of particular aspects of web graph structure on the optimal organization of computations, and PageRank stability.
Abstract: This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underly PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank.
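
PageRank is the principal eigenvector of the damped link matrix, so the computation being accelerated is, at its core, power iteration. A minimal Python sketch follows; the function name and parameters are illustrative assumptions, and web-scale versions stream the link graph from disk rather than holding it in a dict.

```python
def pagerank(out_links, d=0.85, tol=1e-9, max_iter=100):
    # out_links: page -> list of linked pages (every page appears as a key).
    # d: damping factor. Dangling pages spread their rank uniformly.
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = out_links[p] or pages       # dangling -> all pages
            share = d * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:                           # converged
            break
    return rank

# Example: a three-page web; "c" is a dangling page.
print(pagerank({"a": ["b"], "b": ["a", "c"], "c": []}))
```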

479 citations

Journal ArticleDOI
TL;DR: A recursive technique for building suffix trees is presented that yields optimal algorithms, matching the sorting lower bound, in several computational models; for an alphabet consisting of integers in a polynomial range it gives the first known linear-time algorithm.
Abstract: The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial O(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the O(n log n)-time comparison-based algorithm, but no super-linear lower bound is known. Closing this gap is the main open question in stringology. We settle this open problem by giving a linear time reduction to sorting for building suffix trees. Since sorting is a lower bound for building suffix trees, this algorithm is time-optimal in every alphabet model. In particular, for an alphabet consisting of integers in a polynomial range we get the first known linear-time algorithm. (2) All previously known algorithms for building suffix trees exhibit a marked absence of locality of reference, and thus they tend to elicit many page faults (I/Os) when indexing very long strings. They are therefore unsuitable for building suffix trees in secondary storage devices, where I/Os dominate the overall computational cost. We give a linear-I/O reduction to sorting for suffix tree construction. Since sorting is a trivial I/O lower bound for building suffix trees, our algorithm is I/O-optimal.
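
To see why sorting is inherent, note that a suffix tree encodes the lexicographic order of all suffixes. The naive Python sketch below makes that ordering explicit; it is a quadratic-time toy for illustration, not the paper's linear reduction to sorting.

```python
def naive_suffix_array(s):
    # Sort the starting positions of all suffixes of s lexicographically.
    # This is the ordering a suffix tree encodes, which is why sorting is
    # the bottleneck. Cost here is O(n^2 log n); the paper reduces suffix
    # tree construction to sorting in linear (I/O) work.
    return sorted(range(len(s)), key=lambda i: s[i:])

# Example: suffixes of "banana" in lexicographic order.
print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```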

246 citations


Cites methods from "External-memory graph algorithms"

  • ...In the RAM model, ET(T') is computed via an explicit DFS of the tree T', whereas in the DAM model ET(T') is efficiently computed by simulating known PRAM algorithms [Chiang et al. 1995]....

  • ...In all computational models, we will use the fact that it is easy to find the least common ancestors of two nodes [Harel and Tarjan 1984; Bender and Farach-Colton 2000; Chiang et al. 1995]....


References
Journal ArticleDOI
TL;DR: Tight upper and lower bounds are provided for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition.
Abstract: We provide tight upper and lower bounds, up to a constant factor, for the number of inputs and outputs (I/Os) between internal memory and secondary storage required for five sorting-related problems: sorting, the fast Fourier transform (FFT), permutation networks, permuting, and matrix transposition. The bounds hold both in the worst case and in the average case, and in several situations the constant factors match. Secondary storage is modeled as a magnetic disk capable of transferring P blocks each containing B records in a single time unit; the records in each block must be input from or output to B contiguous locations on the disk. We give two optimal algorithms for the problems, which are variants of merge sorting and distribution sorting. In particular we show for P = 1 that the standard merge sorting algorithm is an optimal external sorting method, up to a constant factor in the number of I/Os. Our sorting algorithms use the same number of I/Os as does the permutation phase of key sorting, except when the internal memory size is extremely small, thus affirming the popular adage that key sorting is not faster. We also give a simpler and more direct derivation of Hong and Kung's lower bound for the FFT for the special case B = P = O(1).
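
A minimal Python sketch of the two-phase merge sort discussed above, with lists standing in for disk-resident files (an illustrative toy, not the paper's exact algorithm): runs of M records are formed in memory and then merged multiway; in the I/O model, merging Θ(M/B) runs at a time is what yields the optimal bound.

```python
import heapq
from itertools import islice

def external_merge_sort(records, M):
    # Phase 1: read M records at a time and emit sorted runs ("disk"
    # files are modeled by Python lists).
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, M))
        if not chunk:
            break
        runs.append(sorted(chunk))
    # Phase 2: multiway merge of the runs. heapq.merge stands in for an
    # (M/B)-way merge, which gives the optimal Theta(n log_m n) I/O bound.
    return list(heapq.merge(*runs))

print(external_merge_sort([5, 2, 9, 1, 7, 3, 8], M=3))  # [1, 2, 3, 5, 7, 8, 9]
```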

1,344 citations


"External-memory graph algorithms" refers background or methods in this paper

  • ...Indeed, just the problem of implementing various classes of permutation has been a central theme in external-memory I/O research [1, 6, 7, 8, 26]....

  • ...The proof is an adaptation and generalization of that given by Aggarwal and Vitter [1] for the special case = 1 and c = 0....

  • ...Early work in external-memory algorithms for parallel disk systems concentrated largely on fundamental problems such as sorting, matrix multiplication, and FFT [1, 19, 26]....

  • ...Instead, it is well-known that Ω(perm(N)) I/Os are required in the worst case [1, 26] where...

  • ...We obtain the given upper bounds by modifying the PRAM algorithms of Tamassia and Vitter [21], and applying the list ranking and the PRAM simulation techniques....

Journal ArticleDOI
TL;DR: A calibrated, high-quality disk drive model is demonstrated in which the overall error factor is 14 times smaller than that of a simple first-order model, which enables an informed trade-off between effort and accuracy.
Abstract: Although disk storage densities are improving impressively (60% to 130% compounded annually), performance improvements have been occurring at only about 7% to 10% compounded annually over the last decade. As a result, disk system performance is fast becoming a dominant factor in overall system behavior. Naturally, researchers want to improve overall I/O performance, of which a large component is the performance of the disk drive itself. This research often involves using analytical or simulation models to compare alternative approaches, and the quality of these models determines the quality of the conclusions: indeed, the wrong modeling assumptions can lead to erroneous conclusions. Nevertheless, little work has been done to develop or describe accurate disk drive models. This may explain the commonplace use of simple, relatively inaccurate models. We believe there is much room for improvement. This article demonstrates and describes a calibrated, high-quality disk drive model in which the overall error factor is 14 times smaller than that of a simple first-order model. We describe the various disk drive performance components separately, then show how their inclusion improves the simulation model. This enables an informed trade-off between effort and accuracy. In addition, we provide detailed characteristics for two disk drives, as well as a brief description of a simulation environment that uses the disk drive model.
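
For orientation, the kind of simple first-order model the article argues is inadequate is essentially a three-term sum. The Python sketch below is a hypothetical illustration with invented parameter values, not the article's model; the calibrated model adds seek-distance curves, head switches, on-drive caching, and bus contention.

```python
def first_order_service_time_ms(request_bytes, avg_seek_ms=8.0, rpm=7200,
                                transfer_mb_per_s=100.0):
    # Simple first-order disk model: constant average seek (ignoring the
    # actual seek distance) + half-rotation latency + transfer time.
    # All parameter values here are hypothetical.
    rotation_ms = 0.5 * 60000.0 / rpm               # half a revolution
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return avg_seek_ms + rotation_ms + transfer_ms

print(first_order_service_time_ms(64 * 1024))       # ~12.8 ms
```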

938 citations


"External-memory graph algorithms" refers background in this paper

  • ...In coming years we can expect the significance of the I/O bottleneck to increase to the point that we can ill afford to ignore it, since technological advances are increasing CPU speeds at an annual rate of 40-60% while disk transfer rates are only increasing by 7-10% annually [20]....

Book
22 Aug 2011
TL;DR: The algorithms apply a novel “random-like” deterministic technique that provides for a fast and efficient breaking of an apparently symmetric situation in parallel and distributed computation.
Abstract: The following problem is considered: given a linked list of length n, compute the distance from each element of the linked list to the end of the list. The problem has two standard deterministic algorithms: a linear time serial algorithm, and an O(log n) time parallel algorithm using n processors. We present new deterministic parallel algorithms for the problem. Our strongest results are (1) O(log n log* n) time using n/(log n log* n) processors (this algorithm achieves optimal speed-up); (2) O(log n) time using n log^(k) n / log n processors, for any fixed positive integer k. The algorithms apply a novel "random-like" deterministic technique. This technique provides for a fast and efficient breaking of an apparently symmetric situation in parallel and distributed computation.
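
A compact Python sketch of the standard pointer-jumping scheme for this problem (the classic O(n log n)-work version, not the paper's optimal algorithms): each round would be a single parallel step on n processors and is simulated here by a sequential loop. Names and input format are illustrative.

```python
def list_rank(next_ptr):
    # next_ptr[i] is the successor of node i, or None at the tail.
    # Returns the distance from each node to the end of the list.
    n = len(next_ptr)
    nxt = list(next_ptr)
    rank = [0 if nxt[i] is None else 1 for i in range(n)]
    changed = True
    while changed:              # O(log n) rounds; paths halve each round
        changed = False
        new_rank, new_nxt = rank[:], nxt[:]
        for i in range(n):      # conceptually: for all i in parallel
            if nxt[i] is not None:
                new_rank[i] = rank[i] + rank[nxt[i]]   # jump over successor
                new_nxt[i] = nxt[nxt[i]]
                changed = True
        rank, nxt = new_rank, new_nxt
    return rank

print(list_rank([1, 2, 3, None]))  # [3, 2, 1, 0]
```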

474 citations


"External-memory graph algorithms" refers methods in this paper

  • ...Vishkin [25] uses PRAM simulation to facilitate prefetching for various problems, but without taking blocking issues into account....

  • ...For biconnected components, we adapt the PRAM algorithm of Tarjan and Vishkin [22], which requires generating an arbitrary spanning tree, evaluating an expression tree, and computing connected components of a newly created graph....

  • ...The use of PRAM simulation for prefetching, without the important consideration of blocking, is explored by Vishkin [25]....

  • ...The method is based upon a non-trivial adaptation of the deterministic coin tossing technique of Cole and Vishkin [5]....

  • ...It has also been used by Cole and Vishkin [5], who developed a deterministic version of Anderson and Miller's randomized algorithm....

Journal ArticleDOI
TL;DR: In this article, the authors provided the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for sorting, FFT, matrix transposition, standard matrix multiplication, and related problems.
Abstract: We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix transposition, standard matrix multiplication, and related problems. Our two-level memory model is new and gives a realistic treatment of parallel block transfer, in which during a single I/O each of the P secondary storage devices can simultaneously transfer a contiguous block of B records. The model pertains to a large-scale uniprocessor system or parallel multiprocessor system with P disks. In addition, the sorting, FFT, permutation network, and standard matrix multiplication algorithms are typically optimal in terms of the amount of internal processing time. The difficulty in developing optimal algorithms is to cope with the partitioning of memory into P separate physical devices. Our algorithms' performance can be significantly better than that obtained by the well-known but nonoptimal technique of disk striping. Our optimal sorting algorithm is randomized, but practical; the probability of using more than ℓ times the optimal number of I/Os is exponentially small in ℓ(log ℓ) log(M/B), where M is the internal memory size.
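
The advantage over disk striping can be summarized by the standard sorting bounds in this model (N records, P disks, block size B, memory M); striping effectively sorts with logical block size PB, which shrinks the base of the logarithm and can cost extra passes:

```latex
% Sorting N records with P disks, block size B, internal memory M:
\begin{align*}
  \text{independent disks (optimal):}\quad
    &\Theta\!\Bigl(\tfrac{N}{PB}\,\log_{M/B}\tfrac{N}{B}\Bigr) \\
  \text{disk striping (one logical disk, block size } PB\text{):}\quad
    &\Theta\!\Bigl(\tfrac{N}{PB}\,\log_{M/PB}\tfrac{N}{PB}\Bigr)
\end{align*}
```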

353 citations

Journal ArticleDOI
TL;DR: This paper introduces a novel parallel data structure called the recursive star-tree, derived by using recursion in the spirit of the inverse Ackermann function, which allows for extremely fast parallel computations, specifically O(α(n)) time, using an optimal number of processors.
Abstract: This paper introduces a novel parallel data structure called the recursive star-tree (denoted "*-tree"). Its definition uses a generalization of the * functional: for a function f, f*(n) = min{ i : f^(i)(n) ≤ 1 }, where f^(i) is the i-th iterate of f. Recursive *-trees are derived by using recursion in the spirit of the inverse Ackermann function. The recursive *-tree data structure leads to a new design paradigm for parallel algorithms. This paradigm allows for extremely fast parallel computations, specifically O(α(n)) time (where α(n) is the inverse of the Ackermann function), using an optimal number of processors on the (weakest) concurrent-read, concurrent-write parallel random-access machine (CRCW PRAM). These computations need only constant time, and use an optimal number of processors, if the following nonstandard assumption about the model of parallel computation is added to the CRCW PRAM: an extremely small number of processor...
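
A small Python sketch of the * functional as defined above (assuming f maps numbers to numbers); star(log2) is the familiar log* function:

```python
import math

def star(f):
    # star(f)(n) = the least i such that applying f to n i times gives <= 1.
    def f_star(n):
        i = 0
        while n > 1:
            n = f(n)
            i += 1
        return i
    return f_star

log_star = star(math.log2)          # the familiar log* function
print(log_star(65536))              # 4: 65536 -> 16 -> 4 -> 2 -> 1
```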

221 citations


"External-memory graph algorithms" refers methods in this paper

  • ...The least common ancestor problem can be reduced to the range minima problem using Euler Tour and list ranking [3]....
