Journal Article

Towards Systematic Parallelization of Graph Transformations Over Pregel

TL;DR: This paper designs and implements a high-level parallel programming framework in which a domain-specific language (DSL) eases the programming task, and shows that the framework outperforms the original evaluation of structural recursion and achieves good scalability and speedup on real datasets.
Abstract: Graphs can be used to model many kinds of data, from traditional datasets to social networks or semi-structured datasets. To process large graphs, many systems have been proposed. The Pregel programming model is popular, thanks to its scalability. Although Pregel is simple to understand and use, it is a low-level programming model that requires developers to write programs that are hard to maintain and must be carefully optimized. On the other hand, structural recursion is a powerful means of systematically constructing efficient parallel programs on lists, arrays and trees, but it has not yet been applied to graphs. In this paper, we propose an efficient method for parallel evaluation of structural recursion on graphs, which is suitable for Pregel. We design and implement a high-level parallel programming framework where a domain-specific language (DSL) is provided to ease the programming task. Specifications written in the DSL are automatically compiled into Pregel programs that are scalable for large graphs. Experimental results show that our framework outperforms the original evaluation of structural recursion, and achieves good scalability and speedup for real datasets.
Citations
Proceedings Article
04 Sep 2016
TL;DR: This paper proposes a functional approach to vertex-centric graph processing in which the computation at every vertex is abstracted as a higher-order function, and presents Fregel, a new domain-specific language that has clear functional semantics, supports declarative description of vertex computation, and can be automatically translated into Pregel.
Abstract: The vertex-centric programming model, known as “think like a vertex”, is being used more and more to support various big graph processing methods through iterative supersteps that execute in parallel a user-defined vertex program over each vertex of a graph. However, the imperative and message-passing style of existing systems makes defining a vertex program unintuitive. In this paper, we show that one can benefit more from “Thinking like a vertex” by “Behaving like a function” rather than “Acting like a procedure” with full use of side effects and explicit control of message passing, state, and termination. We propose a functional approach to vertex-centric graph processing in which the computation at every vertex is abstracted as a higher-order function and present Fregel, a new domain-specific language. Fregel has clear functional semantics, supports declarative description of vertex computation, and can be automatically translated into Pregel, an emerging imperative-style distributed graph processing framework, and thereby achieves promising performance. Experimental results for several typical examples show the promise of this functional approach.

13 citations

Journal Article
TL;DR: This paper exploits the high level of abstraction of an existing relational MT language, ATL, and the semantics of a distributed programming model, MapReduce, to build an ATL engine with implicitly distributed execution.

11 citations

Proceedings Article
16 Oct 2020
TL;DR: This paper motivates the need for a transparent multi-strategy execution mode for model-management operations, presents an overview of the different computational strategies used in the model-driven engineering ecosystem, and uses a running example to introduce the benefits of mixing strategies for performing a single computation.
Abstract: Low-code development platforms are taking an important place in the model-driven engineering ecosystem, raising new challenges, among them transparent efficiency and scalability. Indeed, the increasing size of models makes it difficult to interact with them efficiently. To tackle this scalability issue, some tools are built upon specific computational strategies that exploit reactivity or parallelism. However, their performance may vary depending on the specific nature of their usage. Choosing the most suitable computational strategy for a given usage is a difficult task that should be automated. Besides, the most efficient solutions may be obtained by using several strategies at the same time. This paper motivates the need for a transparent multi-strategy execution mode for model-management operations. We present an overview of the different computational strategies used in the model-driven engineering ecosystem, and use a running example to introduce the benefits of mixing strategies for performing a single computation. This example helps us present our design ideas for a multi-strategy model-management system. The code-related and DevOps challenges that emerged from this analysis are also presented.

8 citations


Cites methods from "Towards Systematic Parallelization ..."

  • ...Another possibility to use Pregel in model transformation is by using a DSL, such as [42] for graph transformation....

    [...]

DOI
01 Jan 2019
TL;DR: This work investigates transformation operations for property graphs managed by the distributed platform Gradoop to support ETL processes for graph data and provides initial results of a runtime evaluation of the proposed graph data transformations.
Abstract: The analysis of graph data using graph database and distributed graph processing systems has gained significant interest. However, relatively little effort has been devoted to preparing the graph data for analysis, in particular to transform and integrate data from different sources. To support such ETL processes for graph data we investigate transformation operations for property graphs managed by the distributed platform Gradoop. We also provide initial results of a runtime evaluation of the proposed graph data transformations.

7 citations

Patent
04 Jul 2017
TL;DR: This patent presents a method for detecting falsely issued value-added tax (VAT) special invoices based on parallel loop detection: false issuance is detected through a loop detection method, and the loop detection itself is improved.
Abstract: The invention provides a method for detecting falsely issued value-added tax special invoices based on parallel loop detection. False issuance of value-added tax special invoices is detected through a loop detection method, and the loop detection is further improved. Through a distributed parallel computing method, the computing task is distributed to a plurality of computers in a distributed cluster, thereby greatly improving computing efficiency.

3 citations

References
Journal Article
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
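The map and reduce roles described in this abstract can be illustrated with a minimal single-process sketch. It is not Google's implementation or any library's API: the names `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical, and the in-memory grouping step merely stands in for the distributed shuffle that the real runtime performs.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a toy, in-memory MapReduce over (key, value) records.

    Assumption: everything fits in one process; a real runtime would
    partition the input and shuffle intermediate pairs across machines.
    """
    # Map phase: each input pair yields zero or more intermediate pairs,
    # which are grouped by intermediate key.
    groups = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: merge all intermediate values that share a key.
    return {ikey: reducer(ikey, ivalues) for ikey, ivalues in groups.items()}


def word_count_map(doc_id, text):
    # Emit (word, 1) for every word in the document.
    for word in text.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    # Sum the partial counts collected for one word.
    return sum(counts)


documents = [("d1", "the quick brown fox"), ("d2", "the lazy dog and the fox")]
counts = map_reduce(documents, word_count_map, word_count_reduce)
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases can in principle be run on separate machines without changing the user code, which is the property the abstract emphasizes.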

20,309 citations


"Towards Systematic Parallelization ..." refers methods in this paper

  • ...As a result, we design and implement a graph transformation framework on top of Pregel, inspired by high-level frameworks on top of MapReduce such as Generate-Test-Aggregate [5]....

    [...]

  • ...The pregel2 is similar to MapReduce computation....

    [...]

  • ...Distributed graph processing models: MapReduce [4] is big data processing model, hence it can be used to process graphs....

    [...]

Journal Article
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal Article
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridge between software and hardware for parallel computation, with results quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled onto this model, yet it can be efficiently implemented in hardware. The author argues that an analogous bridge between software and hardware is required for parallel computation if that is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.

3,885 citations


"Towards Systematic Parallelization ..." refers methods in this paper

  • ...It was inspired by the Bulk Synchronous Parallel (BSP) model [17] whose computation consists of a sequence of supersteps....

    [...]

Proceedings Article
06 Jun 2010
TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
Abstract: Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
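The iteration scheme in this abstract (receive messages from the previous iteration, update state, send messages) can be sketched as a single-process simulation. The example computes reachability from a source vertex. It is an illustrative assumption, not the Pregel API: the function name `pregel_reachability`, the adjacency-dict graph encoding, and the `True` message payload are all hypothetical.

```python
from collections import defaultdict


def pregel_reachability(edges, source):
    """Simulate vertex-centric supersteps to find vertices reachable from source.

    edges: dict mapping each vertex to a list of its out-neighbors.
    Returns (set of reached vertices, number of supersteps executed).
    """
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(targets)
    reached = {v: False for v in vertices}

    # Messages delivered at the start of the first superstep.
    inbox = {source: [True]}
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        # Only vertices with incoming messages are active in this superstep.
        for v, _messages in inbox.items():
            if not reached[v]:
                # First time this vertex is reached: update its state and
                # notify its out-neighbors for the next superstep.
                reached[v] = True
                for w in edges.get(v, []):
                    outbox[w].append(True)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        inbox = outbox  # Synchronization barrier between supersteps.
        supersteps += 1
    return {v for v, r in reached.items() if r}, supersteps


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
reached, steps = pregel_reachability(graph, "a")
```

The computation stops when no messages are in flight, which mirrors the usual termination condition of a vertex-centric model: every vertex is inactive and no vertex has pending input.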

3,840 citations


"Towards Systematic Parallelization ..." refers methods in this paper

  • ...Many graph processing models have been proposed, and Pregel [11] has emerged as an efficient and scalable one....

    [...]

  • ...The reachability computation is a basic computation “skeleton” in the Pregel model [11], so we consider it as an efficient and scalable Pregel algorithm....

    [...]

Proceedings Article
08 Oct 2012
TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
Abstract: Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability. In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on large-scale real-world problems demonstrating order-of-magnitude gains.
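PowerGraph structures a vertex program into gather, apply, and scatter phases. The sketch below shows that decomposition on a small example, minimum-label propagation for connected components in an undirected graph. It is a sequential illustration under stated assumptions, not PowerGraph's API or its distributed placement scheme: the function name `gas_min_label` and the adjacency-set encoding are hypothetical.

```python
def gas_min_label(neighbors):
    """Label each vertex with the smallest vertex id in its component.

    neighbors: dict mapping each vertex to the set of adjacent vertices
    (undirected, so adjacency must be symmetric).
    """
    label = {v: v for v in neighbors}
    active = set(neighbors)
    while active:
        # Gather: each active vertex combines its neighbors' labels with min.
        # min is commutative and associative, so the partial results could
        # be computed per edge and merged in any order.
        gathered = {}
        for v in active:
            acc = None
            for u in neighbors[v]:
                acc = label[u] if acc is None else min(acc, label[u])
            gathered[v] = acc
        # Apply: update the vertex value from the gathered result.
        changed = set()
        for v, acc in gathered.items():
            if acc is not None and acc < label[v]:
                label[v] = acc
                changed.add(v)
        # Scatter: vertices whose value changed activate their neighbors.
        active = {u for v in changed for u in neighbors[v]}
    return label


graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
labels = gas_min_label(graph)
```

Splitting the gather step into an order-independent per-edge combination is what lets the work for a single high-degree vertex be spread over several machines, the situation that arises in the power-law graphs the abstract describes.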

1,710 citations


"Towards Systematic Parallelization ..." refers methods in this paper

  • ...[6] propose PowerGraph that uses the gather-apply-scatter model to exploit the parallelism in natural graphs....

    [...]