Proceedings ArticleDOI

A new parallel algorithm for connected components in dynamic graphs

01 Dec 2013-pp 246-255
TL;DR: This work presents a novel parallel algorithm for tracking the connected components of a dynamic graph that is up to 128X faster than well-known static algorithms and achieves a 14X parallel speedup on an x86 64-core shared-memory system.
Abstract: Social networks, communication networks, business intelligence databases, and large scientific data sources now contain hundreds of millions of elements with billions of relationships. The relationships in these massive datasets are changing at ever-faster rates. By representing these datasets as dynamic and semantic graphs of vertices and edges, it is possible to characterize the structure of the relationships and to quickly respond to queries about how the elements in the set are connected. Statically computing analytics on snapshots of these dynamic graphs is frequently not fast enough to provide current and accurate information as the graph changes. This has led to the development of dynamic graph algorithms that can maintain analytic information without resorting to full static recomputation. In this work we present a novel parallel algorithm for tracking the connected components of a dynamic graph. Our approach has a low memory requirement of O(V) and is appropriate for all graph densities. On a graph with 512 million edges, we show that our new dynamic algorithm is up to 128X faster than well-known static algorithms and that our algorithm achieves a 14X parallel speedup on an x86 64-core shared-memory system. To the best of the authors' knowledge, this is the first parallel implementation of dynamic connected components that does not eventually require static recomputation.
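The paper's own algorithm is not reproduced on this page, but the general idea of maintaining component labels under edge insertions with O(V) state can be illustrated with a union-find structure. The sketch below is a minimal illustration under that assumption, not the authors' method; in particular it does not handle the edge deletions that their algorithm supports without full recomputation.

```python
# Minimal sketch: incremental connected components under edge insertions
# using a union-find (disjoint-set) structure with O(V) memory.
# Illustrative only; NOT the paper's algorithm (no deletion support here).

class IncrementalCC:
    def __init__(self, num_vertices):
        self.parent = list(range(num_vertices))  # O(V) state

    def find(self, v):
        # Path compression keeps component lookups near-constant.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def insert_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv  # merge the two components

    def same_component(self, u, v):
        return self.find(u) == self.find(v)


cc = IncrementalCC(5)
cc.insert_edge(0, 1)
cc.insert_edge(3, 4)
print(cc.same_component(0, 1))  # True
print(cc.same_component(1, 3))  # False
```

Edge deletions are what make the dynamic problem hard, since a removed edge may or may not split a component; that is the case the paper's parallel algorithm addresses without falling back to static recomputation.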


Citations
Journal ArticleDOI
TL;DR: The state-of-the-art CCL algorithms presented in the last decade are reviewed, the main strategies and algorithms are explained, their pseudocodes are presented, and experimental results are given in order to compare and rank the algorithms.

247 citations

Book ChapterDOI
24 Aug 2016
TL;DR: A dynamic graph analytics framework, GraphIn, that incrementally processes graphs on-the-fly using fixed-sized batches of updates is proposed, together with a novel programming model called I-GAS, based on the gather-apply-scatter paradigm, that allows a large set of incremental graph processing algorithms to be implemented seamlessly across multiple CPU cores.
Abstract: The massive explosion in social networks has led to a significant growth in graph analytics and specifically in dynamic, time-varying graphs. Most prior work processes dynamic graphs by first storing the updates and then repeatedly running static graph analytics on saved snapshots. To handle the extreme scale and fast evolution of real-world graphs, we propose a dynamic graph analytics framework, GraphIn, that incrementally processes graphs on-the-fly using fixed-sized batches of updates. As part of GraphIn, we propose a novel programming model called I-GAS, based on the gather-apply-scatter programming paradigm, that allows for implementing a large set of incremental graph processing algorithms seamlessly across multiple CPU cores. We further propose a property-based, dual-path execution model to choose between incremental or static computation. Our experiments show that for a variety of graph inputs and algorithms, GraphIn achieves up to 9.3 million updates/sec and over 400× speedup when compared to static graph recomputation.
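As a rough illustration of the gather-apply-scatter style of incremental processing that I-GAS builds on, the sketch below propagates connected-component labels only from the vertices touched by a batch of insertions. It is a simplified, assumption-laden rendering, not GraphIn's actual API; the function and variable names are invented for illustration.

```python
# Hypothetical sketch of gather-apply-scatter (GAS) style incremental
# processing over a batch of edge insertions. Illustrative only; this is
# not GraphIn's actual I-GAS interface.

from collections import defaultdict

def process_batch(adj, labels, batch):
    """Apply a batch of edge insertions, then propagate component labels
    starting only from the vertices touched by the batch."""
    frontier = set()
    for u, v in batch:                 # update the graph structure
        adj[u].add(v)
        adj[v].add(u)
        frontier.update((u, v))

    while frontier:                    # GAS-style iterations
        next_frontier = set()
        for v in frontier:
            gathered = min((labels[n] for n in adj[v]), default=labels[v])  # gather
            new_label = min(labels[v], gathered)                            # apply
            if new_label != labels[v]:
                labels[v] = new_label
                next_frontier.update(adj[v])                                # scatter
        frontier = next_frontier
    return labels


adj = defaultdict(set)
labels = {v: v for v in range(4)}
print(process_batch(adj, labels, [(0, 1), (2, 3), (1, 2)]))
# all four vertices converge to component label 0
```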

58 citations


Cites background from "A new parallel algorithm for connec..."

  • ...in connected components algorithm, insertions can be handled by creating a component graph [8,15] G’ in the Phase II (see Table 1), where each edge insertion (u, v) in G results in an edge in G’ if u and v belong to separate components or results in a self-edge, which is ignored in the component graph....

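A minimal sketch of the component-graph construction described in the excerpt above: each inserted edge (u, v) becomes an edge between the components of u and v in G', and edges whose endpoints already share a component become self-edges and are dropped. Names and data layout here are assumptions for illustration, not the cited papers' code.

```python
# Sketch: build the component graph G' from a batch of edge insertions.
# comp[v] gives the current component id of vertex v. Insertions whose
# endpoints are already in the same component would be self-edges in G'
# and are ignored. Illustrative only; names are assumptions.

def build_component_graph(batch, comp):
    g_prime = set()
    for u, v in batch:
        cu, cv = comp[u], comp[v]
        if cu != cv:                               # edge crosses two components
            g_prime.add((min(cu, cv), max(cu, cv)))
        # else: self-edge in G', ignored
    return g_prime


comp = {0: 0, 1: 0, 2: 1, 3: 2}
print(build_component_graph([(0, 1), (1, 2), (2, 3)], comp))
# {(0, 1), (1, 2)} -- the component pairs that must be merged
```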

Proceedings ArticleDOI
21 May 2018
TL;DR: Afforest is proposed: an extension of the Shiloach-Vishkin connected components algorithm that approaches optimal work efficiency by processing subgraphs in each iteration, and it is shown that the algorithm exhibits higher memory locality than existing methods.
Abstract: Connected component identification is a fundamental problem in graph analytics, serving as a basis for subsequent computations in a wide range of applications. To determine connectivity, several parallel algorithms, whose complexity is proportional to the number of edges or graph diameter, have been proposed. However, an optimal algorithm may extract graph components by working proportionally to the number of vertices, which can be orders of magnitude lower than the number of edges. We propose Afforest: an extension of the Shiloach-Vishkin connected components algorithm that approaches optimal work efficiency by processing subgraphs in each iteration. We prove the convergence of the algorithm, analyze its work efficiency characteristics, and provide further techniques to speed up processing graphs containing a huge component. Designed with modern parallel architectures in mind, we show that the algorithm exhibits higher memory locality than existing methods. Using both synthetic and real-world graphs, we demonstrate that Afforest achieves speedups of up to 67x over the state-of-the-art on multi-core CPUs (Broadwell, POWER8) and up to 23x on GPUs (Pascal).
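For context, the Shiloach-Vishkin scheme that Afforest extends alternates hooking (linking one endpoint's component root under the other's) with shortcutting (pointer jumping to flatten the trees). The sketch below is a plain sequential rendering of that pattern, not Afforest itself; the subgraph sampling and locality optimizations described in the abstract are omitted.

```python
# Minimal sequential sketch of the Shiloach-Vishkin link/compress pattern
# (hooking + shortcutting). Afforest extends this scheme with subgraph
# processing and further optimizations that are not shown here.

def shiloach_vishkin(num_vertices, edges):
    parent = list(range(num_vertices))

    changed = True
    while changed:
        changed = False
        # Hooking: link the larger-labeled root under the smaller one.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)
                changed = True
        # Shortcutting: pointer jumping to flatten the trees.
        for v in range(num_vertices):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent  # parent[v] is the component label of v


print(shiloach_vishkin(6, [(0, 1), (1, 2), (4, 5)]))
# [0, 0, 0, 3, 4, 4]
```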

34 citations


Cites background from "A new parallel algorithm for connec..."

  • ...Regardless, in the average case, the expected number of iterations in SV is ∼D/2 [16]....


Posted ContentDOI
29 Dec 2019
TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency and parallelism, and for different graph updates as well as analytics workloads.
Abstract: Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis, computational sciences, and others. A growing amount of the associated graph processing workloads are dynamic, with millions of edges added or removed per second. Graph streaming frameworks are specifically crafted to enable the processing of such highly dynamic workloads. Recent years have seen the development of many such frameworks. However, they differ in their general architectures (with key details such as the support for the parallel execution of graph updates, or the incorporated graph data organization), the types of updates and workloads allowed, and many others. To facilitate the understanding of this growing field, we provide the first analysis and taxonomy of dynamic and streaming graph processing. We focus on identifying the fundamental system designs and on understanding their support for concurrency and parallelism, and for different graph updates as well as analytics workloads. We also crystallize the meaning of different concepts associated with streaming graph processing, such as dynamic, temporal, online, and time-evolving graphs, edge-centric processing, models for the maintenance of updates, and graph databases. Moreover, we provide a bridge with the very rich landscape of graph streaming theory by giving a broad overview of recent theoretical related advances, and by analyzing which graph streaming models and settings could be helpful in developing more powerful streaming frameworks and designs. We also outline graph streaming workloads and research challenges.

33 citations


Additional excerpts

  • ...Betweenness Centrality [104], [199], [192], Triangle Counting [147], Katz Centrality [203], mincuts [133], [89] Connected Components [151], or PageRank [97], [55]....


  • ...Targeted problems include graph clustering [103], mining periodic cliques [174], search for persistent communities [140], [176], tracking conductance [84], event pattern [166] and subgraph [162] discovery, solving ego-centric queries [161], pattern detection [53], [85], [186], [131], [141], [194], [54], [86], densest subgraph identification [113], frequent subgraph mining [19], dense subgraph detection [145], construction and querying of knowledge graphs [52], stream summarization [92], graph sparsification [11], [25], k-core maintenance [13], shortest paths [193], Betweenness Centrality [104], [199], [192], Triangle Counting [147], Katz Centrality [203], mincuts [133], [89] Connected Components [151], or PageRank [97], [55]....


Proceedings ArticleDOI
17 Jun 2019
TL;DR: In this paper, a parallel batch-dynamic connectivity algorithm was proposed that is work-efficient for small batch sizes, achieving O(log n log(1 + n/Δ)) expected amortized work per edge insertion and deletion and O(log³ n) depth w.h.p.
Abstract: In this paper, we study batch parallel algorithms for the dynamic connectivity problem, a fundamental problem that has received considerable attention in the sequential setting. The best sequential algorithm for dynamic connectivity is the elegant level-set algorithm of Holm, de Lichtenberg and Thorup (HDT), which achieves O(log² n) amortized time per edge insertion or deletion, and O(log n) time per query. We design a parallel batch-dynamic connectivity algorithm that is work-efficient with respect to the HDT algorithm for small batch sizes, and is asymptotically faster when the average batch size is sufficiently large. Given a sequence of batched updates, where Δ is the average batch size of all deletions, our algorithm achieves O(log n log(1 + n/Δ)) expected amortized work per edge insertion and deletion and O(log³ n) depth w.h.p. Our algorithm answers a batch of k connectivity queries in O(k log(1 + n/k)) expected work and O(log n) depth w.h.p. To the best of our knowledge, our algorithm is the first parallel batch-dynamic algorithm for connectivity.
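To make the stated bounds concrete, a small worked comparison using only the quantities from the abstract (Δ is the average deletion batch size):

```latex
% Per-edge amortized work, batch-dynamic algorithm vs. sequential HDT:
W_{\mathrm{batch}} = O\!\left(\log n \,\log\!\left(1 + \tfrac{n}{\Delta}\right)\right),
\qquad
W_{\mathrm{HDT}} = O\!\left(\log^2 n\right).
% Small batches, \Delta = O(1):   \log(1 + n/\Delta) = O(\log n), so the bounds match.
% Large batches, \Delta = \Theta(n): \log(1 + n/\Delta) = O(1), so W_{\mathrm{batch}} = O(\log n).
```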

30 citations

References
Journal ArticleDOI
15 Oct 1999-Science
TL;DR: A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
Abstract: Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
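The two mechanisms described above, continual growth and preferential attachment, are straightforward to simulate. A minimal sketch of a Barabási-Albert-style generator (illustrative only, not the authors' code):

```python
# Minimal sketch of a Barabasi-Albert style generator: the graph grows by
# adding vertices, and each new vertex attaches to m existing vertices
# chosen with probability roughly proportional to their current degree.

import random

def barabasi_albert(n, m, seed=0):
    """Return the edge list of a scale-free graph with n vertices."""
    rng = random.Random(seed)
    edges = []
    degree_weighted = list(range(m))   # seed vertices, each listed once
    for new_v in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(degree_weighted))  # prob. ~ degree
        for t in targets:
            edges.append((new_v, t))
            degree_weighted.extend((new_v, t))        # update degree weights
    return edges


g = barabasi_albert(1000, 2)
print(len(g))  # 2 * (1000 - 2) = 1996 edges
```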

33,771 citations


"A new parallel algorithm for connec..." refers background in this paper

  • ...[3] show that the distribution of the edges over the vertices in the graph follows a power law....


23 Oct 2011

4,685 citations

Journal ArticleDOI
TL;DR: New algorithms for betweenness are introduced in this paper and require O(n + m) space and run in O(nm) and O(nm + n² log n) time on unweighted and weighted networks, respectively, where m is the number of links.
Abstract: Motivated by the fast-growing need to compute centrality indices on large, yet very sparse, networks, new algorithms for betweenness are introduced in this paper. They require O(n + m) space and run in O(nm) and O(nm + n² log n) time on unweighted and weighted networks, respectively, where m is the number of links. Experimental evidence is provided that this substantially increases the range of networks for which centrality analysis is feasible. The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require Θ(n³) time and Θ(n²) space, where n is the number of actors in the network.
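For reference, the unweighted-graph variant Brandes describes, one BFS per source followed by a dependency-accumulation pass, can be sketched compactly. This is a textbook-style illustration, not the paper's original code:

```python
# Sketch of Brandes' betweenness centrality for unweighted graphs:
# one BFS per source plus a dependency-accumulation pass, giving
# O(n + m) space and O(nm) time overall.

from collections import deque, defaultdict

def brandes(adj):
    """adj: dict mapping each vertex to an iterable of its neighbors."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path 0-1-2-3
print(brandes(adj))  # vertices 1 and 2 lie on the most shortest paths
```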

4,190 citations


"A new parallel algorithm for connec..." refers background in this paper

  • ...in Brandes’s betweenness centrality algorithm [5]....


  • ...The key differences between our parent-neighbor sub-graph and the parent lists of [5] are that we have placed a bound on the maximum number of adjacencies in the list and that our list also stores adjacent vertices that are on the same level....


Journal ArticleDOI
09 Sep 1999-Nature
TL;DR: The World-Wide Web is modeled as a large directed graph whose vertices are documents and whose edges are links that point from one document to another; the topology of this graph determines the web's connectivity and consequently how effectively information can be located on it.
Abstract: Despite its increasing role in communication, the World-Wide Web remains uncontrolled: any individual or institution can create a website with any number of documents and links. This unregulated growth leads to a huge and complex web, which becomes a large directed graph whose vertices are documents and whose edges are links (URLs) that point from one document to another. The topology of this graph determines the web's connectivity and consequently how effectively we can locate information on it. But its enormous size (estimated to be at least 8×10⁸ documents) and the continual changing of documents and links make it impossible to catalogue all the vertices and edges.

4,135 citations


"A new parallel algorithm for connec..." refers background in this paper

  • ...[1] the authors present the small-world phenomena, which states that in many networks the distance between two vertices is relatively small....


  • ...It has been shown that social networks have low diameters (maximal length of the shortest path connecting any two vertices) [1]....
