MapReduce Triangle Enumeration With Guarantees

doi:10.1145/2661829.2662017

Proceedings Article

Ha-Myung Park and 3 co-authors

pp. 1739-1748

TL;DR

This work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. The approach is competitive with existing methods, improving performance by up to a factor of 2, and can significantly increase the size of datasets that can be processed.

Abstract:

We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration that requires $O(E^{3/2}/(M\sqrt{m}))$ rounds, where E denotes the number of edges of the input graph, m the expected memory size of a reducer, and M the total available space. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high-probability) bounds on the work needed in each reducer, addressing the "curse of the last reducer". Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. Our experimental evaluation shows the scalability of our approach, that it is competitive with existing methods, improving performance by up to a factor of 2, and that it can significantly increase the size of datasets that can be processed.
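The single-round vertex-partitioning idea that the abstract generalizes can be illustrated with an in-memory simulation of one map/shuffle/reduce round. This is a minimal sketch under stated assumptions, not the authors' algorithm: vertices are colored by a hypothetical hash `v % rho`, each edge is sent to every sorted color triple that contains both endpoint colors, and each simulated reducer reports only the triangles whose sorted color triple equals its own key, so every triangle is emitted exactly once. The function names are hypothetical, and the multi-round scheduling and load guarantees described above are not modeled.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def color(v: int, rho: int) -> int:
    # Hypothetical coloring: map each vertex to one of rho groups.
    return v % rho


def map_phase(edges, rho):
    """Send each edge to every sorted color triple containing both endpoint colors."""
    buckets = defaultdict(list)
    for u, v in edges:
        cu, cv = color(u, rho), color(v, rho)
        for key in combinations_with_replacement(range(rho), 3):
            rest = list(key)
            if cu in rest:
                rest.remove(cu)  # respect multiplicity when cu == cv
                if cv in rest:
                    buckets[key].append((u, v))
    return buckets


def reduce_phase(key, edges, rho):
    """Enumerate triangles in one bucket, keeping only those whose colors match the key."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = []
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        for w in adj[a] & adj[b]:
            if w > b:  # report each triangle once, as (a, b, w) with a < b < w
                colors = tuple(sorted(color(x, rho) for x in (a, b, w)))
                if colors == key:
                    found.append((a, b, w))
    return found


def enumerate_triangles(edges, rho=2):
    """Simulate one map/shuffle/reduce round on a single machine."""
    result = []
    for key, bucket in map_phase(edges, rho).items():
        result.extend(reduce_phase(key, bucket, rho))
    return sorted(result)


# Each undirected edge is listed once.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
print(enumerate_triangles(edges, rho=2))
# [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (2, 3, 4)]
```

Raising `rho` shrinks the expected number of edges each simulated reducer receives, at the cost of more reducers and more copies of each edge.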

Citations

Proceedings Article

Multicore triangle computations without tuning

Julian Shun and 1 co-author

TL;DR: This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations. The algorithms scale to billions of nodes and edges and are much faster than existing parallel approximate triangle counting implementations.

Proceedings Article

TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size

Lorenzo De Stefani and 3 co-authors

TL;DR: This work presents TRIÈST, a suite of one-pass streaming algorithms that compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions.

Proceedings Article

MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams

Yongsub Lim and 1 co-author

TL;DR: This work proposes MASCOT, a memory-efficient and accurate method for local triangle estimation in a graph stream based on edge sampling. MASCOT achieves both accuracy and memory efficiency by counting triangles for every new edge unconditionally, regardless of whether that edge is sampled.
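The unconditional-counting idea in this summary can be sketched as follows, assuming an insertion-only edge stream in which each edge is kept independently with probability `p`. This illustrates the idea only; it is not the authors' code, and the function name `stream_local_triangles` is hypothetical. A triangle is counted when its last edge arrives, before that edge's own sampling decision, and only if its two earlier edges were both kept. That happens with probability p², so each discovery is weighted by 1/p² to keep the estimates unbiased.

```python
import random
from collections import defaultdict


def stream_local_triangles(stream, p, seed=0):
    """Estimate global and per-vertex triangle counts from an edge stream (sketch)."""
    rng = random.Random(seed)
    sampled = defaultdict(set)  # adjacency of the sampled subgraph
    local = defaultdict(float)  # per-vertex triangle estimates
    global_est = 0.0
    weight = 1.0 / (p * p)      # both earlier edges must have been kept
    for u, v in stream:
        # Count triangles closed by (u, v) unconditionally, i.e. before sampling it.
        for w in sampled[u] & sampled[v]:
            local[u] += weight
            local[v] += weight
            local[w] += weight
            global_est += weight
        # Only now decide whether (u, v) itself is kept for future counting.
        if rng.random() < p:
            sampled[u].add(v)
            sampled[v].add(u)
    return global_est, dict(local)


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
total, per_vertex = stream_local_triangles(edges, p=1.0)
print(total)  # 5.0
```

With `p = 1.0` every edge is kept and the estimates are exact counts; a smaller `p` stores fewer edges at the cost of higher variance.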

Proceedings Article

Fast linear algebra-based triangle counting with KokkosKernels

Michael M. Wolf and 4 co-authors

TL;DR: This paper addresses the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. It uses a linear algebra-based approach that grew out of work related to the miniTri data analytics miniapplication and efforts to pose graph algorithms in the language of linear algebra.
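The linear-algebra view of triangle counting can be illustrated with one common identity: if L is the strictly lower-triangular part of the adjacency matrix (L[i][j] = 1 for each edge with i > j), the number of triangles equals the sum of the entries of the product L·L restricted to the positions where L itself is nonzero. The sketch below is a dense, pure-Python illustration of that identity, assuming vertices numbered 0..n-1; the function names are hypothetical and it does not reproduce KokkosKernels' sparse kernels or the paper's exact formulation.

```python
def lower_triangular(n, edges):
    """Strictly lower-triangular adjacency matrix: L[i][j] = 1 iff edge {i, j} and i > j."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[max(u, v)][min(u, v)] = 1
    return L


def count_triangles_masked(n, edges):
    """Sum of (L @ L) over the positions where L is nonzero."""
    L = lower_triangular(n, edges)
    total = 0
    for i in range(n):
        for j in range(n):
            if L[i][j]:  # the mask: only positions that are themselves edges
                # (L @ L)[i][j] counts vertices k with i > k > j adjacent to both i and j
                total += sum(L[i][k] * L[k][j] for k in range(n))
    return total


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
print(count_triangles_masked(5, edges))  # 5
```

Each triangle j < k < i is counted exactly once, at position (i, j); production codes obtain the same quantity with sparse matrix kernels instead of this O(n³) loop.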

Proceedings Article

PTE: Enumerating Trillion Triangles On Distributed Systems

Ha-Myung Park and 2 co-authors

TL;DR: Experimental results show that PTE is up to 47 times faster than recent distributed algorithms on real-world graphs and succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph, which no previous triangle computation algorithm could process.

References

Journal Article

Collective dynamics of small-world networks

Duncan J. Watts and 1 co-author

04 Jun 1998

Nature

TL;DR: The authors explore simple models of networks that can be tuned through the middle ground between completely regular and completely random networks: regular networks 'rewired' to introduce increasing amounts of disorder. They find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean and 1 co-author

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean and 1 co-author

01 Jan 2008

Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Proceedings Article

What is Twitter, a social network or a news media?

Haewoon Kwak and 3 co-authors

TL;DR: The authors crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, all of which mark a deviation from known characteristics of human social networks.

Proceedings Article

PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

U Kang and 2 co-authors

TL;DR: This paper describes PEGASUS, an open-source Peta Graph Mining library that performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, and finding the connected components. It also describes a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication).

Related Papers (5)

Counting triangles and the curse of the last reducer

Main-memory triangle computations for very large (sparse (power-law)) graphs

Finding, counting and listing all triangles in large graphs, an experimental study

Graph Twiddling in a MapReduce World

DOULION: counting triangles in massive graphs with a coin