Proceedings ArticleDOI
MapReduce Triangle Enumeration With Guarantees
Ha-Myung Park,Francesco Silvestri,U Kang,Rasmus Pagh +3 more
- pp 1739-1748
TLDR
This work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. It is competitive with existing methods, improving performance by a factor of up to 2x, and can significantly increase the size of datasets that can be processed.
Abstract:
We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration that requires $O(E^{3/2}/(M\sqrt{m}))$ rounds, where $E$ is the number of edges, $m$ denotes the expected memory size of a reducer, and $M$ the total available space. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high probability) bounds on the work needed in each reducer, addressing the "curse of the last reducer". Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. Our experimental evaluation shows that our approach scales, that it is competitive with existing methods, improving performance by a factor of up to 2x, and that it can significantly increase the size of datasets that can be processed.
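The vertex partitioning idea that the abstract builds on can be illustrated with a small single-machine simulation. The sketch below is not the paper's multi-round algorithm; it is a minimal one-round illustration under stated assumptions: vertices are comparable integers, each vertex gets a uniformly random color in `range(rho)`, each reducer is identified by a sorted triple of colors, and a triangle is emitted only by the reducer whose key equals the triangle's sorted color triple. The function name `enumerate_triangles_partitioned` and the parameters `rho` and `seed` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def enumerate_triangles_partitioned(edges, rho, seed=0):
    """Simulate one round of vertex-partitioned triangle enumeration.

    Returns (triangles, loads): the list of triangles (u, v, w) with
    u < v < w, and a dict mapping each reducer key to its edge count.
    """
    rng = random.Random(seed)
    color = {}

    def color_of(v):
        # Assumption: colors are assigned uniformly at random, once per vertex.
        if v not in color:
            color[v] = rng.randrange(rho)
        return color[v]

    # Map phase: send each edge to every reducer whose color triple
    # contains the colors of both endpoints (exactly rho reducers).
    reducers = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        edge = (u, v) if u < v else (v, u)
        cu, cv = color_of(u), color_of(v)
        for z in range(rho):
            reducers[tuple(sorted((cu, cv, z)))].add(edge)

    # Reduce phase: each reducer enumerates triangles in its local subgraph
    # and emits only those whose sorted colors match its own key, so every
    # triangle is reported exactly once overall.
    triangles = []
    for key, local_edges in reducers.items():
        higher = defaultdict(set)  # higher[u] = neighbors of u larger than u
        for u, v in local_edges:
            higher[u].add(v)
        for u in list(higher):
            for v, w in combinations(sorted(higher[u]), 2):
                if w in higher.get(v, ()):
                    tri_key = tuple(sorted((color[u], color[v], color[w])))
                    if tri_key == key:
                        triangles.append((u, v, w))

    loads = {key: len(es) for key, es in reducers.items()}
    return triangles, loads
```

Inspecting `loads` shows how unevenly edges can land on reducers for a given coloring, which is the kind of imbalance the abstract's per-reducer load guarantees are meant to bound. In this sketch every edge is replicated to exactly `rho` reducers, so the total shuffled data is `rho` times the number of distinct edges.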
Citations
Proceedings ArticleDOI
Multicore triangle computations without tuning
Julian Shun,Kanat Tangwongsan +1 more
TL;DR: This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges, and is much faster than existing parallel approximate triangle counting implementations.
Proceedings ArticleDOI
TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size
TL;DR: This work presents TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions.
Proceedings ArticleDOI
MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams
Yongsub Lim,U Kang +1 more
TL;DR: MASCOT is proposed, a memory-efficient and accurate method for local triangle estimation in a graph stream based on edge sampling which achieves both accuracy and memory-efficiency of the two algorithms by an unconditional triangle counting for a new edge, regardless of whether it is sampled or not.
Proceedings ArticleDOI
Fast linear algebra-based triangle counting with KokkosKernels
TL;DR: This paper addresses the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node, using a linear algebra-based approach that has grown out of work related to miniTri data analytics miniapplication and efforts to pose graph algorithms in the language of linear algebra.
Proceedings ArticleDOI
PTE: Enumerating Trillion Triangles On Distributed Systems
TL;DR: Experimental results show that PTE provides up to 47 times faster performance than recent distributed algorithms on real-world graphs, and succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph, which all previous triangle computation algorithms fail to process.
References
Journal ArticleDOI
Collective dynamics of small-world networks
TL;DR: This paper explores simple models of networks that can be tuned between completely regular and completely random topologies: regular networks ‘rewired’ to introduce increasing amounts of disorder. It finds that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Proceedings ArticleDOI
What is Twitter, a social network or a news media?
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Proceedings ArticleDOI
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations
TL;DR: This paper describes PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, and finding the connected components, and describes a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication).