Proceedings Article

MapReduce Triangle Enumeration With Guarantees

TLDR
This work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. The approach is competitive with existing methods, improving performance by a factor of up to 2X, and can significantly increase the size of datasets that can be processed.
Abstract
We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration that requires $O(E^{3/2}/(M\sqrt{m}))$ rounds, where $m$ denotes the expected memory size of a reducer and $M$ the total available space. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high-probability) bounds on the work needed in each reducer, addressing the "curse of the last reducer." Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. Our experimental evaluation shows that our approach scales, that it is competitive with existing methods, improving performance by a factor of up to 2X, and that it can significantly increase the size of datasets that can be processed.
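The vertex-partitioning idea that the abstract generalizes can be illustrated with a small, single-machine simulation. This is a hedged sketch of our own, not the paper's algorithm: it assumes a simple undirected graph with integer vertex ids and colors vertices uniformly at random with `rho` colors (a hypothetical parameter). Each edge is sent to every reducer keyed by a sorted color triple that contains both endpoint colors. A reducer reports a triangle only when the triangle's sorted vertex colors equal its key, so each triangle is reported exactly once. The function name `color_partition_triangles` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def color_partition_triangles(edges, rho, seed=0):
    """Hypothetical single-machine simulation of one round of color-based
    vertex partitioning for triangle enumeration.

    Assumes a simple undirected graph given as (u, v) pairs of integer ids.
    Returns (sorted list of triangles (a, b, c) with a < b < c,
             dict mapping reducer key -> number of edges it received).
    """
    rng = random.Random(seed)
    vertices = sorted({v for e in edges for v in e})
    # Assumption: each vertex gets a uniformly random color in [0, rho).
    color = {v: rng.randrange(rho) for v in vertices}

    # "Map" phase: send each edge to every reducer whose key (a sorted color
    # triple) contains both endpoint colors; that is exactly rho reducers.
    reducers = defaultdict(set)
    for u, v in edges:
        edge = (min(u, v), max(u, v))
        for x in range(rho):
            key = tuple(sorted((color[u], color[v], x)))
            reducers[key].add(edge)

    # "Reduce" phase: each reducer enumerates the triangles in its subgraph
    # but reports one only if the triangle's sorted colors equal its key,
    # so every triangle is reported by exactly one reducer.
    triangles = []
    loads = {}
    for key, local_edges in reducers.items():
        loads[key] = len(local_edges)
        adj = defaultdict(set)
        for a, b in local_edges:
            adj[a].add(b)
            adj[b].add(a)
        for a, b in local_edges:  # a < b by construction
            for c in adj[a] & adj[b]:
                if c > b and tuple(sorted((color[a], color[b], color[c]))) == key:
                    triangles.append((a, b, c))
    return sorted(triangles), loads


if __name__ == "__main__":
    # K4 on {0, 1, 2, 3} plus a pendant edge (3, 4): exactly four triangles.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    tris, loads = color_partition_triangles(edges, rho=3)
    print(tris)  # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    print("reducers used:", len(loads), "max edges on a reducer:", max(loads.values()))
```

Under these assumptions, each edge reaches exactly `rho` of the roughly $\rho^3/6$ reducers, so a reducer receives about $6E/\rho^2$ edges in expectation. Choosing $\rho \approx \sqrt{6E/m}$ keeps that near the reducer memory $m$, and the total data shuffled is then $\rho E = \Theta(E^{3/2}/\sqrt{m})$. If at most $M$ of it fits in one round, about $E^{3/2}/(M\sqrt{m})$ rounds are needed, which is consistent with the bound stated in the abstract. The returned `loads` dictionary gives per-reducer edge counts, a rough proxy for the reducer load that the abstract's high-probability bounds concern.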

Citations
Proceedings Article

Multicore triangle computations without tuning

TL;DR: This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges and are much faster than existing parallel approximate triangle counting implementations.
Proceedings Article

TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size

TL;DR: This work presents TRIEST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions.
Proceedings Article

MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams

Yongsub Lim and one coauthor
TL;DR: This work proposes MASCOT, a memory-efficient and accurate method for local triangle estimation in a graph stream based on edge sampling. It achieves both accuracy and memory efficiency by counting triangles unconditionally for each new edge, regardless of whether that edge is sampled.
Proceedings Article

Fast linear algebra-based triangle counting with KokkosKernels

TL;DR: This paper addresses the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node, using a linear algebra-based approach that grew out of work on the miniTri data analytics miniapplication and out of efforts to pose graph algorithms in the language of linear algebra.
Proceedings Article

PTE: Enumerating Trillion Triangles On Distributed Systems

TL;DR: Experimental results show that PTE is up to 47 times faster than recent distributed algorithms on real-world graphs and succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph, which all previous triangle computation algorithms fail to process.