Towards Systematic Parallelization of Graph Transformations Over Pregel

doi:10.1007/S10766-016-0418-5

Journal Article•DOI•

Le-Duc Tung¹, Zhenjiang Hu²•Institutions (2)

Graduate University for Advanced Studies¹, National Institute of Informatics²

01 Apr 2017-International Journal of Parallel Programming (Springer US)-Vol. 45, Iss: 2, pp 320-339

TL;DR: This paper designs and implements a high-level parallel programming framework in which a domain-specific language (DSL) eases the programming task, and shows that the framework outperforms the original evaluation of structural recursion and achieves good scalability and speedup on real datasets.

Abstract: Graphs can be used to model many kinds of data, from traditional datasets to social networks or semi-structured datasets. To process large graphs, many systems have been proposed. The Pregel programming model is popular, thanks to its scalability. Although Pregel is simple to understand and use, it is a low-level programming model and requires developers to write programs that are hard to maintain and need to be carefully optimized. On the other hand, structural recursion is a powerful way to systematically construct efficient parallel programs on lists, arrays and trees, but it has not yet been applied to graphs. In this paper, we propose an efficient method for parallel evaluation of structural recursion on graphs, which is suitable for Pregel. We design and implement a high-level parallel programming framework where a domain-specific language (DSL) is provided to ease the programming task. Specifications written in the DSL are automatically compiled into Pregel programs that are scalable for large graphs. Experimental results show that our framework outperforms the original evaluation of structural recursion, and achieves good scalability and speedup for real datasets.

Citations

Proceedings Article•DOI•

Think like a vertex, behave like a function! a functional DSL for vertex-centric big graph processing

Kento Emoto¹, Kiminori Matsuzaki², Zhenjiang Hu³, Akimasa Morihata⁴, Hideya Iwasaki⁵•Institutions (5)

Kyushu Institute of Technology¹, Kochi University of Technology², National Institute of Informatics³, University of Tokyo⁴, University of Electro-Communications⁵

04 Sep 2016

TL;DR: This paper proposes a functional approach to vertex-centric graph processing in which the computation at every vertex is abstracted as a higher-order function, and presents Fregel, a new domain-specific language that has clear functional semantics, supports declarative description of vertex computation, and can be automatically translated into Pregel.

Abstract: The vertex-centric programming model, known as “think like a vertex”, is being used more and more to support various big graph processing methods through iterative supersteps that execute in parallel a user-defined vertex program over each vertex of a graph. However, the imperative and message-passing style of existing systems makes defining a vertex program unintuitive. In this paper, we show that one can benefit more from “Thinking like a vertex” by “Behaving like a function” rather than “Acting like a procedure” with full use of side effects and explicit control of message passing, state, and termination. We propose a functional approach to vertex-centric graph processing in which the computation at every vertex is abstracted as a higher-order function and present Fregel, a new domain-specific language. Fregel has clear functional semantics, supports declarative description of vertex computation, and can be automatically translated into Pregel, an emerging imperative-style distributed graph processing framework, and thereby achieve promising performance. Experimental results for several typical examples show the promise of this functional approach.

13 citations

Journal Article•DOI•

Distributing Relational Model Transformation on MapReduce

Amine Benelallam¹, Abel Gómez², Massimo Tisi³, Jordi Cabot²•Institutions (3)

University of Rennes¹, Open University of Catalonia², Centre national de la recherche scientifique³

01 Aug 2018-Journal of Systems and Software

TL;DR: This paper exploits the high level of abstraction of an existing relational MT language, ATL, and the semantics of a distributed programming model, MapReduce, to build an ATL engine with implicitly distributed execution.

11 citations

Proceedings Article•DOI•

Towards transparent combination of model management execution strategies for low-code development platforms

Jolan Philippe, Hélène Coullon, Massimo Tisi, Gerson Sunyé¹•Institutions (1)

University of Nantes¹

16 Oct 2020

TL;DR: This paper motivates the need for a transparent multi-strategy execution mode for model-management operations, presents an overview of the different computational strategies used in the model-driven engineering ecosystem, and uses a running example to introduce the benefits of mixing strategies for performing a single computation.

Abstract: Low-code development platforms are taking an important place in the model-driven engineering ecosystem, raising new challenges, among which transparent efficiency or scalability. Indeed, the increasing size of models leads to difficulties in interacting with them efficiently. To tackle this scalability issue, some tools are built upon specific computational strategies exploiting reactivity, or parallelism. However, their performances may vary depending on the specific nature of their usage. Choosing the most suitable computational strategy for a given usage is a difficult task which should be automated. Besides, the most efficient solutions may be obtained by the use of several strategies at the same time. This paper motivates the need for a transparent multi-strategy execution mode for model-management operations. We present an overview of the different computational strategies used in the model-driven engineering ecosystem, and use a running example to introduce the benefits of mixing strategies for performing a single computation. This example helps us present our design ideas for a multi-strategy model-management system. The code-related and DevOps challenges that emerged from this analysis are also presented.

8 citations

Cites methods from "Towards Systematic Parallelization ..."

...Another possibility to use Pregel in model transformation is by using a DSL, such as [42] for graph transformation....

DOI•

Graph Data Transformations in Gradoop

Matthias Kricke, Eric Peukert, Erhard Rahm

01 Jan 2019

TL;DR: This work investigates transformation operations for property graphs managed by the distributed platform Gradoop to support ETL processes for graph data and provides initial results of a runtime evaluation of the proposed graph data transformations.

Abstract: The analysis of graph data using graph database and distributed graph processing systems has gained significant interest. However, relatively little effort has been devoted to preparing the graph data for analysis, in particular to transform and integrate data from different sources. To support such ETL processes for graph data we investigate transformation operations for property graphs managed by the distributed platform Gradoop. We also provide initial results of a runtime evaluation of the proposed graph data transformations.

7 citations

Patent•

Value added tax special invoice falsely making-out detecting method based on parallel loop detection

Ding Jun, Zhang Yu, Niu Zhen, Liu Zhuorui, Xie Feng, Liu Haiming, Lu Hua

04 Jul 2017

TL;DR: This patent presents a method for detecting falsely made-out value added tax special invoices based on an improved loop detection method, with the calculation distributed in parallel across the computers of a cluster.

Abstract: The invention provides a value added tax special invoice falsely making-out detecting method based on parallel loop detection. Detection for falsely making-out of the value added tax special invoice is performed through a loop detection method, and furthermore loop detection is improved. Through a distributed parallel calculating method, a calculating task is distributed to a plurality of computers in a distributed cluster, thereby greatly improving calculating efficiency.

3 citations

References

Journal Article•DOI•

MapReduce: simplified data processing on large clusters

Jeffrey Dean¹, Sanjay Ghemawat¹•Institutions (1)

Google¹

06 Dec 2004

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.

Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
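The map and reduce roles described in this abstract can be illustrated with a minimal single-process sketch. It is not Google's implementation or API: the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, the input documents are made up, and the partitioning, scheduling and fault tolerance that the abstract attributes to the run-time system are omitted entirely.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (document id, text) pair and emit (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map, group by intermediate key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


# Made-up input: two tiny "documents" keyed by id.
docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Here `counts` maps each word to its number of occurrences across both documents. In a real deployment the grouping step would be a distributed shuffle rather than an in-memory dictionary.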

20,309 citations

"Towards Systematic Parallelization ..." refers methods in this paper

...As a result, we design and implement a graph transformation framework on top of Pregel, inspired by high-level frameworks on top of MapReduce such as Generate-Test-Aggregate [5]....
...The pregel2 is similar to MapReduce computation....
...Distributed graph processing models: MapReduce [4] is big data processing model, hence it can be used to process graphs....

Journal Article•DOI•

MapReduce: simplified data processing on large clusters

Jeffrey Dean¹, Sanjay Ghemawat¹•Institutions (1)

Google¹

01 Jan 2008-Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal Article•DOI•

A bridging model for parallel computation

Leslie G. Valiant¹•Institutions (1)

Harvard University¹

01 Aug 1990-Communications of The ACM

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.

Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled on to this model; yet it can be efficiently implemented in hardware. The author argues that an analogous bridge between software and hardware is required for parallel computation if that is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.

3,885 citations

"Towards Systematic Parallelization ..." refers methods in this paper

...It was inspired by the Bulk Synchronous Parallel (BSP) model [17] whose computation consists of a sequence of supersteps....

Proceedings Article•DOI•

Pregel: a system for large-scale graph processing

Grzegorz Malewicz, Matthew H. Austern¹, Aart J. C. Bik¹, James C. Dehnert¹, Ilan Horn¹, Naty Leiser¹, Grzegorz Czajkowski¹•Institutions (1)

Google¹

06 Jun 2010

TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.

Abstract: Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
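The iteration scheme described in this abstract can be sketched as a single-process simulation, using the reachability computation that the citing paper calls a basic Pregel skeleton. This is only an illustration under stated assumptions: `pregel_reachability` is a hypothetical name, the adjacency-dictionary graph is made up, and the code does not use or reproduce the actual Pregel API, its distribution, or its fault tolerance.

```python
def pregel_reachability(edges, source):
    """Simulate vertex-centric supersteps to find vertices reachable from source.

    edges: dict mapping a vertex to a list of its out-neighbours (assumed format).
    Returns (set of reachable vertices, number of supersteps executed).
    """
    # Collect every vertex that appears as a source or a target of an edge.
    vertices = {source}
    for u, targets in edges.items():
        vertices.add(u)
        vertices.update(targets)

    reached = {v: False for v in vertices}   # per-vertex state
    inbox = {source: [True]}                 # messages delivered this superstep
    supersteps = 0

    while inbox:                             # halt when no messages are in flight
        outbox = {}
        for v in inbox:                      # only vertices with messages are active
            if not reached[v]:
                reached[v] = True            # modify own state
                for w in edges.get(v, ()):   # send along outgoing edges
                    outbox.setdefault(w, []).append(True)
        inbox = outbox                       # barrier: messages arrive next superstep
        supersteps += 1

    return {v for v, r in reached.items() if r}, supersteps


# Made-up example graph; vertex "e" has an edge into the cycle but is not reachable.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": ["a"]}
reachable, steps = pregel_reachability(graph, "a")
```

Each pass of the `while` loop plays the role of one iteration: a vertex reads the messages sent in the previous iteration, updates its own state, and sends messages that become visible only in the next one.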

3,840 citations

"Towards Systematic Parallelization ..." refers methods in this paper

...Many graph processing models have been proposed, and Pregel [11] has emerged as an efficient and scalable one....
...The reachability computation is a basic computation “skeleton” in the Pregel model [11], so we consider it as an efficient and scalable Pregel algorithm....

Proceedings Article•DOI•

PowerGraph: distributed graph-parallel computation on natural graphs

Joseph E. Gonzalez¹, Yucheng Low¹, Haijie Gu¹, Danny Bickson¹, Carlos Guestrin²•Institutions (2)

Carnegie Mellon University¹, University of Washington²

08 Oct 2012

TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.

Abstract: Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real-world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability. In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on large-scale real-world problems demonstrating order of magnitude gains.

1,710 citations