Proceedings ArticleDOI
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski +6 more
pp. 135-146
TL;DR: A model for processing large graphs, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
Abstract:
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges, or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
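The vertex-centric superstep loop the abstract describes can be sketched as a single-process toy in Python (Pregel itself is a distributed C++ system; `run_max_value` and the graph encoding here are illustrative names, not the paper's API). Each active vertex reads messages from the previous superstep, updates its value, and sends messages along its outgoing edges; a vertex effectively votes to halt by not sending, and is reactivated only by incoming messages:

```python
def run_max_value(edges, values):
    """Toy superstep loop: every vertex ends with the global maximum value.

    edges:  dict vertex -> list of out-neighbors (all neighbors in `values`)
    values: dict vertex -> initial integer value (mutated in place)
    """
    inbox = {v: [] for v in values}
    active = set(values)              # every vertex starts active
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in active:
            new_val = max([values[v]] + inbox[v])
            # Recompute and re-send only when the value grew (or on the
            # first superstep, where every vertex announces its value).
            if new_val > values[v] or superstep == 0:
                values[v] = new_val
                for u in edges.get(v, []):
                    outbox[u].append(new_val)
        inbox = outbox
        # "Vote to halt": a vertex stays halted unless it received messages.
        active = {v for v in values if inbox[v]}
        superstep += 1
    return values
```

On a directed ring `a -> b -> c -> d -> a` with values 3, 6, 2, 1, the maximum (6) propagates around the ring and all vertices converge to 6, after which no messages are in flight and the loop terminates.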
Citations
Journal ArticleDOI
Adapting scientific computing problems to clouds using MapReduce
TL;DR: This work shows how to adapt algorithms from each class into the MapReduce model, examines what affects the efficiency and scalability of algorithms in each class, and allows one to judge which framework is more efficient for each of them by mapping out the advantages and disadvantages of the two frameworks.
Book ChapterDOI
Signal/collect: graph algorithms for the (semantic) web
TL;DR: This paper presents the Signal/Collect programming model for synchronous and asynchronous graph algorithms and demonstrates that this abstraction can capture the essence of many graph algorithms in a concise and elegant way by giving Signal/Collect adaptations of various relevant algorithms.
Proceedings ArticleDOI
GraM: scaling graph computation to the trillions
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, Lidong Zhou +8 more
TL;DR: GraM is an efficient and scalable graph engine for a large class of widely used graph algorithms that is designed to scale up to multicores on a single server, as well as scale out to multiple servers in a cluster, offering significant improvement over existing distributed graph engines on evaluated graph algorithms.
Journal ArticleDOI
Data mining in distributed environment: a survey
TL;DR: In this study, a survey of state-of-the-art DDM techniques is provided, including distributed frequent itemset mining, distributed frequent sequence mining, distributed frequent graph mining, distributed clustering, and privacy preservation in distributed data mining.
Proceedings ArticleDOI
Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, Keshav Pingali +7 more
TL;DR: This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models, and Gluon, a communication-optimizing substrate that enables these programs to run on heterogeneous clusters and optimizes communication in a novel way.
References
Journal ArticleDOI
A note on two problems in connexion with graphs
TL;DR: A tree is a graph with one and only one path between every two nodes; the note treats graphs in which at least one path exists between any two nodes and the length of each branch is given.
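The second of this note's two problems, finding the shortest path between two nodes, is solved by what is now known as Dijkstra's algorithm. A standard Python rendering (using a binary heap for the frontier rather than the paper's original linear scan):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source to all reachable nodes.

    graph: dict node -> list of (neighbor, weight) pairs; weights >= 0
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The algorithm relies on non-negative edge weights: once a node is popped with its final distance, no later path can improve it.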
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat +1 more
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
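The programming model this entry describes reduces to two user-supplied functions plus a shuffle between them. A minimal in-process sketch (assumed names `map_reduce`, `wc_map`, `wc_reduce`; the real system runs the phases across a cluster with fault tolerance):

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    # Map + shuffle: group intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce: fold each key's values into a single result.
    return {key: reduce_fn(key, vals) for key, vals in groups.items()}

# Word count, the canonical MapReduce example.
def wc_map(line):
    for word in line.split():
        yield word, 1

def wc_reduce(word, counts):
    return sum(counts)
```

Usage: `map_reduce(["the cat", "the dog"], wc_map, wc_reduce)` yields `{"the": 2, "cat": 1, "dog": 1}`.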
Journal ArticleDOI
The anatomy of a large-scale hypertextual Web search engine
Sergey Brin, Lawrence Page +1 more
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext, and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.