Pregel: a system for large-scale graph processing

doi:10.1145/1807167.1807184

Proceedings Article

Grzegorz Malewicz, +6 more

- pp 135-146

TLDR

Pregel is a vertex-centric computational model for processing large graphs, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes programs easier to reason about.

Abstract:

Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
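
The iteration scheme described in the abstract can be illustrated with a small single-machine sketch. This is not the paper's API or implementation: the function name `run_supersteps`, its parameters, the choice of maximum-value propagation as the algorithm, and the "vote to halt after every iteration" rule are all assumptions made for illustration. Each iteration, a vertex reads the messages sent to it in the previous iteration, updates its own state, and sends messages along its outgoing edges.

```python
from collections import defaultdict


def run_supersteps(values, out_edges, max_supersteps=50):
    """Propagate the largest vertex value through a directed graph.

    values:    dict mapping vertex -> initial value
    out_edges: dict mapping vertex -> list of destination vertices
               (every destination is assumed to be a key of `values`)
    Returns (final values, number of iterations executed).
    """
    values = dict(values)            # do not mutate the caller's dict
    active = set(values)             # every vertex starts active
    inbox = defaultdict(list)        # messages readable in this iteration
    superstep = 0

    while (active or inbox) and superstep < max_supersteps:
        outbox = defaultdict(list)   # messages readable in the next iteration
        # A vertex computes if it is active or has received messages.
        for v in active | set(inbox):
            incoming = inbox.get(v, [])
            new_value = max([values[v]] + incoming)
            changed = new_value != values[v]
            values[v] = new_value    # vertex modifies its own state
            # Send in the first iteration, and afterwards only on change.
            if superstep == 0 or changed:
                for dst in out_edges.get(v, []):
                    outbox[dst].append(new_value)
        # Every vertex goes inactive; only a message wakes it up again.
        active = set()
        inbox = outbox               # synchronous hand-off between iterations
        superstep += 1

    return values, superstep
```

Calling `run_supersteps({"a": 3, "b": 6, "c": 2, "d": 1}, {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]})` runs until no messages remain in flight. Because messages sent in one iteration become visible only in the next, the order in which vertices are visited inside an iteration does not affect the result, which is one way to read the abstract's remark that synchronicity makes programs easier to reason about.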

Citations

Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

Stephen Boyd, +4 more

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

Proceedings Article

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

Matei Zaharia, +8 more

TL;DR: Presents Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner; RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.

Journal Article

Big Data: A Survey

Min Chen, +2 more

01 Apr 2014 - Mobile Networks and Applications

TL;DR: The background and state of the art of big data are reviewed, along with related technologies and representative applications including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.

Proceedings Article

Apache Hadoop YARN: yet another resource negotiator

Vinod Kumar Vavilapalli, +15 more

TL;DR: Summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.

Book

Mining of Massive Datasets

Anand Rajaraman, +1 more

TL;DR: This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets, and explains the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing.

References

Book

Fault-Tolerant Parallel Computation

Paris C. Kanellakis, +1 more

TL;DR: This book presents models for robust computation with shared memory, randomized algorithms, and distributed models and algorithms, and explains how these models can be modified for distributed systems.

Book

Data structures and algorithms in Java

Peter Drake

Book

Graph Theory and Its Applications, Second Edition (Discrete Mathematics and Its Applications)

Jonathan L. Gross, +1 more

Journal Article

Inductive graphs and functional graph algorithms

Martin Erwig

01 Sep 2001 - Journal of Functional Programming

TL;DR: A new style of writing graph algorithms in functional languages which is based on an alternative view of graphs as inductively defined data types is proposed, and it is demonstrated how graph algorithms can be succinctly given by recursive function definitions based on the inductive graph view.

Proceedings Article

Lifting sequential graph algorithms for distributed-memory parallel computation

Douglas Gregor, +1 more

TL;DR: This paper revisits the abstractions comprising the Boost Graph Library in the context of distributed-memory parallelism, lifting away the implicit requirements of sequential execution and a single shared address space and develops general principles and patterns for using (and reusing) generic, object-oriented parallel software libraries.

Related Papers (5)

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

01 Jan 2008 - Communications of The ACM

GraphX: graph processing in a distributed dataflow framework

Joseph E. Gonzalez, +5 more

PowerGraph: distributed graph-parallel computation on natural graphs

Distributed GraphLab: a framework for machine learning and data mining in the cloud

A bridging model for parallel computation
