Pregel: a system for large-scale graph processing

doi:10.1145/1807167.1807184

Proceedings Article

Grzegorz Malewicz, +6 more

- pp 135-146

TLDR

Pregel is a computational model for processing large graphs, designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.

Abstract:

Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
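The iteration model described in the abstract can be illustrated with a minimal single-process sketch. This is not the Pregel API: the names `run_supersteps` and `propagate_max`, the dictionary-based graph, and the stopping rule (halt when no messages are in flight) are assumptions made for illustration, and the sketch leaves out distribution, fault tolerance, edge state and topology mutation. It only shows the synchronous pattern in which a vertex reads messages sent in the previous iteration, updates its own state, and sends messages to other vertices.

```python
def run_supersteps(graph, values, compute, max_supersteps=30):
    """Run synchronous iterations until no messages are in flight.

    graph:   dict mapping vertex id -> list of out-neighbor ids
    values:  dict mapping vertex id -> vertex state (updated in place)
    compute: function(vertex, value, messages, out_edges, superstep)
             returning (new_value, list of (target, message) pairs)
    All of these names are hypothetical, chosen for this sketch.
    """
    inbox = {}
    for superstep in range(max_supersteps):
        outbox = {}
        for v in graph:
            messages = inbox.get(v, [])
            # After the first iteration, a vertex that received no
            # messages has nothing to do and is skipped.
            if superstep > 0 and not messages:
                continue
            values[v], sent = compute(v, values[v], messages, graph[v], superstep)
            for target, msg in sent:
                outbox.setdefault(target, []).append(msg)
        if not outbox:
            break
        # Messages sent now become visible only in the next iteration.
        inbox = outbox
    return values


def propagate_max(vertex, value, messages, out_edges, superstep):
    """Example vertex program: spread the largest value through the graph."""
    new_value = max([value] + messages)
    # Send only in the first iteration or when the local value changed.
    if superstep == 0 or new_value > value:
        return new_value, [(t, new_value) for t in out_edges]
    return new_value, []


# A three-vertex cycle: a -> b -> c -> a.
graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
result = run_supersteps(graph, {"a": 3, "b": 6, "c": 2}, propagate_max)
print(result)
```

Because every vertex sees only the messages from the previous iteration, the outcome does not depend on the order in which vertices are visited within an iteration, which is the property the abstract refers to as making programs easier to reason about.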

Citations

Proceedings Article

Serverless computation with openLambda

Scott Hendrickson, +5 more

TL;DR: This paper presents OpenLambda, an open-source platform for building next-generation web services and applications in the burgeoning model of serverless computation, and describes the key aspects and challenges that must be addressed in the design and implementation of such systems.

Proceedings Article

HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo, +5 more

TL;DR: The state-of-the-art framework providing high-level matrix computation primitives with MapReduce is explored through the case study approach, and these primitives are demonstrated with different computation engines to show the performance and scalability.

Journal Article

Biscuit: a framework for near-data processing of big data workloads

Bon-Cheol Gu, +11 more

TL;DR: This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives that allows programmers to write a data-intensive application to run on the host system and the storage system in a distributed, yet seamless manner.

Proceedings Article

GraphR: Accelerating Graph Processing Using ReRAM

Linghao Song, +4 more

TL;DR: GraphR is presented as the first ReRAM-based graph processing accelerator; it is based on the principle of near-data processing and explores the opportunity of performing massive parallel analog operations with low hardware and energy cost.

Journal Article

Clash of the titans: MapReduce vs. Spark for large scale data analytics

Juwei Shi, +6 more

TL;DR: This paper evaluates the major architectural components of the MapReduce and Spark frameworks, including shuffle, execution model, and caching, using a set of important analytic workloads, and shows that MapReduce's execution model is more efficient for shuffling data than Spark's, making Sort run faster on MapReduce.

References

Journal Article

A note on two problems in connexion with graphs

Edsger W. Dijkstra

- 01 Dec 1959 -

Numerische Mathematik

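TL;DR: Considers a graph of nodes connected by branches of given length, in which at least one path exists between any two nodes, and solves two problems: constructing the tree of minimum total length between the nodes (a tree being a graph with one and only one path between every two nodes), and finding the path of minimum total length between two given nodes.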

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008 -

Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Journal Article

The anatomy of a large-scale hypertextual Web search engine

Sergey Brin, +1 more

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

Journal Article

The Anatomy of a Large-Scale Hypertextual Web Search Engine.

Sergey Brin, +1 more

- 01 Jan 1998 -

Computer Networks

TL;DR: Google is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext; it is designed to crawl and index the Web efficiently and to produce much more satisfying search results than existing systems.

Related Papers (5)

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008 -

Communications of The ACM

GraphX: graph processing in a distributed dataflow framework

Joseph E. Gonzalez, +5 more

PowerGraph: distributed graph-parallel computation on natural graphs

Distributed GraphLab: a framework for machine learning and data mining in the cloud

A bridging model for parallel computation