Proceedings ArticleDOI

A new parallel algorithm for connected components in dynamic graphs

01 Dec 2013-pp 246-255
TL;DR: This work presents a novel parallel algorithm for tracking the connected components of a dynamic graph that is up to 128X faster than well-known static algorithms and achieves a 14X parallel speedup on an x86 64-core shared-memory system.
Abstract: Social networks, communication networks, business intelligence databases, and large scientific data sources now contain hundreds of millions of elements with billions of relationships. The relationships in these massive datasets are changing at ever-faster rates. By representing these datasets as dynamic and semantic graphs of vertices and edges, it is possible to characterize the structure of the relationships and to quickly respond to queries about how the elements in the set are connected. Statically computing analytics on snapshots of these dynamic graphs is frequently not fast enough to provide current and accurate information as the graph changes. This has led to the development of dynamic graph algorithms that can maintain analytic information without resorting to full static recomputation. In this work we present a novel parallel algorithm for tracking the connected components of a dynamic graph. Our approach has a low memory requirement of O(V) and is appropriate for all graph densities. On a graph with 512 million edges, we show that our new dynamic algorithm is up to 128X faster than well-known static algorithms and that our algorithm achieves a 14X parallel speedup on an x86 64-core shared-memory system. To the best of the authors' knowledge, this is the first parallel implementation of dynamic connected components that does not eventually require static recomputation.
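The paper's own algorithm is not reproduced on this page, but the general idea of maintaining component labels under edge insertions with O(V) state can be illustrated with a union-find structure. The sketch below is a minimal illustration under that assumption, not the authors' method; in particular it does not handle the edge deletions that their algorithm supports without full recomputation.

```python
# Minimal sketch: incremental connected components under edge insertions
# using a union-find (disjoint-set) structure with O(V) memory.
# Illustrative only; NOT the paper's algorithm (no deletion support here).

class IncrementalCC:
    def __init__(self, num_vertices):
        self.parent = list(range(num_vertices))  # O(V) state

    def find(self, v):
        # Path compression keeps component lookups near-constant.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def insert_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv  # merge the two components

    def same_component(self, u, v):
        return self.find(u) == self.find(v)


cc = IncrementalCC(5)
cc.insert_edge(0, 1)
cc.insert_edge(3, 4)
print(cc.same_component(0, 1))  # True
print(cc.same_component(1, 3))  # False
```

Edge deletions are what make the dynamic problem hard, since a removed edge may or may not split a component; that is the case the paper's parallel algorithm addresses without falling back to static recomputation.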


Citations
Journal ArticleDOI
TL;DR: The state-of-the-art CCL algorithms presented in the last decade are reviewed, the main strategies and algorithms are explained, their pseudocodes are presented, and experimental results are given in order to compare and rank the algorithms.

247 citations

Book ChapterDOI
24 Aug 2016
TL;DR: A dynamic graph analytics framework, GraphIn, that incrementally processes graphs on-the-fly using fixed-sized batches of updates is proposed, together with a novel programming model called I-GAS, based on the gather-apply-scatter paradigm, that allows a large set of incremental graph processing algorithms to be implemented seamlessly across multiple CPU cores.
Abstract: The massive explosion in social networks has led to a significant growth in graph analytics and specifically in dynamic, time-varying graphs. Most prior work processes dynamic graphs by first storing the updates and then repeatedly running static graph analytics on saved snapshots. To handle the extreme scale and fast evolution of real-world graphs, we propose a dynamic graph analytics framework, GraphIn, that incrementally processes graphs on-the-fly using fixed-sized batches of updates. As part of GraphIn, we propose a novel programming model called I-GAS, based on the gather-apply-scatter programming paradigm, that allows for implementing a large set of incremental graph processing algorithms seamlessly across multiple CPU cores. We further propose a property-based, dual-path execution model to choose between incremental or static computation. Our experiments show that for a variety of graph inputs and algorithms, GraphIn achieves up to 9.3 million updates/sec and over 400× speedup when compared to static graph recomputation.
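As a rough illustration of the gather-apply-scatter style of incremental processing that I-GAS builds on, the sketch below propagates connected-component labels only from the vertices touched by a batch of insertions. It is a simplified, assumption-laden rendering, not GraphIn's actual API; the function and variable names are invented for illustration.

```python
# Hypothetical sketch of gather-apply-scatter (GAS) style incremental
# processing over a batch of edge insertions. Illustrative only; this is
# not GraphIn's actual I-GAS interface.

from collections import defaultdict

def process_batch(adj, labels, batch):
    """Apply a batch of edge insertions, then propagate component labels
    starting only from the vertices touched by the batch."""
    frontier = set()
    for u, v in batch:                 # update the graph structure
        adj[u].add(v)
        adj[v].add(u)
        frontier.update((u, v))

    while frontier:                    # GAS-style iterations
        next_frontier = set()
        for v in frontier:
            gathered = min((labels[n] for n in adj[v]), default=labels[v])  # gather
            new_label = min(labels[v], gathered)                            # apply
            if new_label != labels[v]:
                labels[v] = new_label
                next_frontier.update(adj[v])                                # scatter
        frontier = next_frontier
    return labels


adj = defaultdict(set)
labels = {v: v for v in range(4)}
print(process_batch(adj, labels, [(0, 1), (2, 3), (1, 2)]))
# all four vertices converge to component label 0
```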

58 citations


Cites background from "A new parallel algorithm for connec..."

  • ...in connected components algorithm, insertions can be handled by creating a component graph [8,15] G’ in the Phase II (see Table 1), where each edge insertion (u, v) in G results in an edge in G’ if u and v belong to separate components or results in a self-edge, which is ignored in the component graph....

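A minimal sketch of the component-graph construction described in the excerpt above: each inserted edge (u, v) becomes an edge between the components of u and v in G', and edges whose endpoints already share a component become self-edges and are dropped. Names and data layout here are assumptions for illustration, not the cited papers' code.

```python
# Sketch: build the component graph G' from a batch of edge insertions.
# comp[v] gives the current component id of vertex v. Insertions whose
# endpoints are already in the same component would be self-edges in G'
# and are ignored. Illustrative only; names are assumptions.

def build_component_graph(batch, comp):
    g_prime = set()
    for u, v in batch:
        cu, cv = comp[u], comp[v]
        if cu != cv:                               # edge crosses two components
            g_prime.add((min(cu, cv), max(cu, cv)))
        # else: self-edge in G', ignored
    return g_prime


comp = {0: 0, 1: 0, 2: 1, 3: 2}
print(build_component_graph([(0, 1), (1, 2), (2, 3)], comp))
# {(0, 1), (1, 2)} -- the component pairs that must be merged
```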

Proceedings ArticleDOI
21 May 2018
TL;DR: Afforest is proposed: an extension of the Shiloach-Vishkin connected components algorithm that approaches optimal work efficiency by processing subgraphs in each iteration, and it is shown that the algorithm exhibits higher memory locality than existing methods.
Abstract: Connected component identification is a fundamental problem in graph analytics, serving as a basis for subsequent computations in a wide range of applications. To determine connectivity, several parallel algorithms, whose complexity is proportional to the number of edges or graph diameter, have been proposed. However, an optimal algorithm may extract graph components by working proportionally to the number of vertices, which can be orders of magnitude lower than the number of edges. We propose Afforest: an extension of the Shiloach-Vishkin connected components algorithm that approaches optimal work efficiency by processing subgraphs in each iteration. We prove the convergence of the algorithm, analyze its work efficiency characteristics, and provide further techniques to speed up processing graphs containing a huge component. Designed with modern parallel architectures in mind, we show that the algorithm exhibits higher memory locality than existing methods. Using both synthetic and real-world graphs, we demonstrate that Afforest achieves speedups of up to 67x over the state-of-the-art on multi-core CPUs (Broadwell, POWER8) and up to 23x on GPUs (Pascal).
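For context, the Shiloach-Vishkin scheme that Afforest extends alternates hooking (linking one endpoint's component root under the other's) with shortcutting (pointer jumping to flatten the trees). The sketch below is a plain sequential rendering of that pattern, not Afforest itself; the subgraph sampling and locality optimizations described in the abstract are omitted.

```python
# Minimal sequential sketch of the Shiloach-Vishkin link/compress pattern
# (hooking + shortcutting). Afforest extends this scheme with subgraph
# processing and further optimizations that are not shown here.

def shiloach_vishkin(num_vertices, edges):
    parent = list(range(num_vertices))

    changed = True
    while changed:
        changed = False
        # Hooking: link the larger-labeled root under the smaller one.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)
                changed = True
        # Shortcutting: pointer jumping to flatten the trees.
        for v in range(num_vertices):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent  # parent[v] is the component label of v


print(shiloach_vishkin(6, [(0, 1), (1, 2), (4, 5)]))
# [0, 0, 0, 3, 4, 4]
```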

34 citations


Cites background from "A new parallel algorithm for connec..."

  • ...Regardless, in the average case, the expected number of iterations in SV is ∼D/2 [16]....


Posted ContentDOI
29 Dec 2019
TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency and parallelism, and for different graph updates as well as analytics workloads.
Abstract: Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis, computational sciences, and others. A growing amount of the associated graph processing workloads are dynamic, with millions of edges added or removed per second. Graph streaming frameworks are specifically crafted to enable the processing of such highly dynamic workloads. Recent years have seen the development of many such frameworks. However, they differ in their general architectures (with key details such as the support for the parallel execution of graph updates, or the incorporated graph data organization), the types of updates and workloads allowed, and many others. To facilitate the understanding of this growing field, we provide the first analysis and taxonomy of dynamic and streaming graph processing. We focus on identifying the fundamental system designs and on understanding their support for concurrency and parallelism, and for different graph updates as well as analytics workloads. We also crystallize the meaning of different concepts associated with streaming graph processing, such as dynamic, temporal, online, and time-evolving graphs, edge-centric processing, models for the maintenance of updates, and graph databases. Moreover, we provide a bridge with the very rich landscape of graph streaming theory by giving a broad overview of recent theoretical related advances, and by analyzing which graph streaming models and settings could be helpful in developing more powerful streaming frameworks and designs. We also outline graph streaming workloads and research challenges.

33 citations


Additional excerpts

  • ...Betweenness Centrality [104], [199], [192], Triangle Counting [147], Katz Centrality [203], mincuts [133], [89] Connected Components [151], or PageRank [97], [55]....


  • ...Targeted problems include graph clustering [103], mining periodic cliques [174], search for persistent communities [140], [176], tracking conductance [84], event pattern [166] and subgraph [162] discovery, solving ego-centric queries [161], pattern detection [53], [85], [186], [131], [141], [194], [54], [86], densest subgraph identification [113], frequent subgraph mining [19], dense subgraph detection [145], construction and querying of knowledge graphs [52], stream summarization [92], graph sparsification [11], [25], k-core maintenance [13], shortest paths [193], Betweenness Centrality [104], [199], [192], Triangle Counting [147], Katz Centrality [203], mincuts [133], [89] Connected Components [151], or PageRank [97], [55]....


Proceedings ArticleDOI
17 Jun 2019
TL;DR: In this paper, a parallel batch-dynamic connectivity algorithm was proposed that is work-efficient for small batch sizes, achieving O(log n log(1 + n/Δ)) expected amortized work per edge insertion and deletion and O(log³ n) depth w.h.p.
Abstract: In this paper, we study batch parallel algorithms for the dynamic connectivity problem, a fundamental problem that has received considerable attention in the sequential setting. The best sequential algorithm for dynamic connectivity is the elegant level-set algorithm of Holm, de Lichtenberg and Thorup (HDT), which achieves O(log² n) amortized time per edge insertion or deletion, and O(log n) time per query. We design a parallel batch-dynamic connectivity algorithm that is work-efficient with respect to the HDT algorithm for small batch sizes, and is asymptotically faster when the average batch size is sufficiently large. Given a sequence of batched updates, where Δ is the average batch size of all deletions, our algorithm achieves O(log n log(1 + n/Δ)) expected amortized work per edge insertion and deletion and O(log³ n) depth w.h.p. Our algorithm answers a batch of k connectivity queries in O(k log(1 + n/k)) expected work and O(log n) depth w.h.p. To the best of our knowledge, our algorithm is the first parallel batch-dynamic algorithm for connectivity.
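To make the stated bounds concrete, a small worked comparison using only the quantities from the abstract (Δ is the average deletion batch size):

```latex
% Per-edge amortized work, batch-dynamic algorithm vs. sequential HDT:
W_{\mathrm{batch}} = O\!\left(\log n \,\log\!\left(1 + \tfrac{n}{\Delta}\right)\right),
\qquad
W_{\mathrm{HDT}} = O\!\left(\log^2 n\right).
% Small batches, \Delta = O(1):   \log(1 + n/\Delta) = O(\log n), so the bounds match.
% Large batches, \Delta = \Theta(n): \log(1 + n/\Delta) = O(1), so W_{\mathrm{batch}} = O(\log n).
```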

30 citations

References
Journal ArticleDOI
15 Oct 1999-Science
TL;DR: A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
Abstract: Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
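The two mechanisms described above, continual growth and preferential attachment, are straightforward to simulate. A minimal sketch of a Barabási-Albert-style generator (illustrative only, not the authors' code):

```python
# Minimal sketch of a Barabasi-Albert style generator: the graph grows by
# adding vertices, and each new vertex attaches to m existing vertices
# chosen with probability roughly proportional to their current degree.

import random

def barabasi_albert(n, m, seed=0):
    """Return the edge list of a scale-free graph with n vertices."""
    rng = random.Random(seed)
    edges = []
    degree_weighted = list(range(m))   # seed vertices, each listed once
    for new_v in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(degree_weighted))  # prob. ~ degree
        for t in targets:
            edges.append((new_v, t))
            degree_weighted.extend((new_v, t))        # update degree weights
    return edges


g = barabasi_albert(1000, 2)
print(len(g))  # 2 * (1000 - 2) = 1996 edges
```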

33,771 citations


"A new parallel algorithm for connec..." refers background in this paper

  • ...[3] show that the distribution of the edges over the vertices in the graph follows a power law....


23 Oct 2011

4,685 citations

Journal ArticleDOI
TL;DR: New algorithms for betweenness are introduced in this paper and require O(n + m) space and run in O(nm) and O(nm + n² log n) time on unweighted and weighted networks, respectively, where m is the number of links.
Abstract: Motivated by the fast-growing need to compute centrality indices on large, yet very sparse, networks, new algorithms for betweenness are introduced in this paper. They require O(n + m) space and run in O(nm) and O(nm + n² log n) time on unweighted and weighted networks, respectively, where m is the number of links. Experimental evidence is provided that this substantially increases the range of networks for which centrality analysis is feasible. The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require Θ(n³) time and Θ(n²) space, where n is the number of actors in the network.
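For reference, the unweighted-graph variant Brandes describes, one BFS per source followed by a dependency-accumulation pass, can be sketched compactly. This is a textbook-style illustration, not the paper's original code:

```python
# Sketch of Brandes' betweenness centrality for unweighted graphs:
# one BFS per source plus a dependency-accumulation pass, giving
# O(n + m) space and O(nm) time overall.

from collections import deque, defaultdict

def brandes(adj):
    """adj: dict mapping each vertex to an iterable of its neighbors."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path 0-1-2-3
print(brandes(adj))  # vertices 1 and 2 lie on the most shortest paths
```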

4,190 citations


"A new parallel algorithm for connec..." refers background in this paper

  • ...in Brandes’s betweenness centrality algorithm [5]....


  • ...The key differences between our parent-neighbor sub-graph and the parent lists of [5] are that we have placed a bound on the maximum number of adjacencies in the list and that our list also stores adjacent vertices that are on the same level....


Journal ArticleDOI
09 Sep 1999-Nature
TL;DR: The World-Wide Web is modeled as a large directed graph whose vertices are documents and whose edges are links that point from one document to another; the topology of this graph determines the web's connectivity and consequently how effectively information can be located on it.
Abstract: Despite its increasing role in communication, the World-Wide Web remains uncontrolled: any individual or institution can create a website with any number of documents and links. This unregulated growth leads to a huge and complex web, which becomes a large directed graph whose vertices are documents and whose edges are links (URLs) that point from one document to another. The topology of this graph determines the web's connectivity and consequently how effectively we can locate information on it. But its enormous size (estimated to be at least 8×10⁸ documents) and the continual changing of documents and links make it impossible to catalogue all the vertices and edges.

4,135 citations


"A new parallel algorithm for connec..." refers background in this paper

  • ...[1] the authors present the small-world phenomena, which states that in many networks the distance between two vertices is relatively small....


  • ...It has been shown that social networks have low diameters (maximal length of the shortest path connecting any two vertices) [1]....
