Journal ArticleDOI

A Dataflow Programming Language and its Compiler for Streaming Systems

01 Jan 2014-Vol. 29, pp 1289-1298
TL;DR: COStream, a programming language based on the synchronous dataflow execution model for data-driven applications, is proposed, along with a compiler framework for COStream on general-purpose multi-core architectures.
Abstract: The dataflow programming paradigm shows an important way to improve programming productivity for streaming systems. In this paper we propose COStream, a programming language based on the synchronous dataflow execution model for data-driven applications. We also propose a compiler framework for COStream on general-purpose multi-core architectures. It features an inter-thread software pipelining scheduler to exploit the parallelism among the cores. We implemented the COStream compiler framework on an x86 multi-core architecture and performed experiments to evaluate the system.
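As a rough illustration of the synchronous dataflow execution model the abstract refers to (actor names, rates, and the pipeline shape here are invented for the example, not taken from the paper), one steady-state iteration of a small stream pipeline can be simulated with plain FIFO channels:

```python
from collections import deque

# Hypothetical three-stage pipeline: source -> filt -> sink.
# Each actor has fixed production/consumption rates per firing,
# which is the defining property of synchronous dataflow (SDF).

def source(out):
    out.append(1)                              # produces 1 token per firing

def filt(inp, out):
    a, b = inp.popleft(), inp.popleft()        # consumes 2 tokens per firing
    out.append(a + b)                          # produces 1 token per firing

def sink(inp, results):
    results.append(inp.popleft())              # consumes 1 token per firing

def run_steady_state(iterations=1):
    c1, c2, results = deque(), deque(), []
    # The SDF balance equations give the repetition vector (2, 1, 1):
    # source must fire twice per firing of filt and sink so that
    # channel occupancy returns to its initial state each iteration.
    for _ in range(iterations):
        source(c1); source(c1)
        filt(c1, c2)
        sink(c2, results)
    return results

print(run_steady_state(3))  # → [2, 2, 2]
```

Because the rates are fixed and known at compile time, a compiler such as the one described can derive this repetition vector statically and pipeline the actor firings across threads.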
Citations
Proceedings ArticleDOI
05 Jun 2017
TL;DR: This paper makes a preliminary attempt to develop dataflow insights into a specialized graph accelerator, arguing that this work opens a wide range of opportunities to improve the performance of computation and memory access for large-scale graph processing.
Abstract: Existing graph processing frameworks greatly improve the performance of the memory subsystem, but they are still subject to the underlying modern processor, resulting in potential inefficiencies for graph processing in the sense of low instruction level parallelism and high branch misprediction. These inefficiencies, in accordance with our comprehensive micro-architectural study, mainly arise out of a wealth of dependencies, serial semantics of instruction streams, and complex conditional instructions in graph processing. In this paper, we propose that a fundamental shift of approach is necessary to break through the inefficiencies of the underlying processor via the dataflow paradigm. It is verified that the idea of applying the dataflow approach to graph processing is extremely appealing for the following two reasons. First, as the execution and retirement of instructions only depend on the availability of input data in the dataflow model, a high degree of parallelism can therefore be provided to relax the heavy dependencies and serial semantics. Second, dataflow makes it possible to reduce the costs of branch misprediction by simultaneously executing all branches of a conditional instruction. Consequently, we make a preliminary attempt to develop these dataflow insights into a specialized graph accelerator. We believe that our work would open a wide range of opportunities to improve the performance of computation and memory access for large-scale graph processing.

8 citations

Journal ArticleDOI
01 Jan 2015
TL;DR: This paper presents Fresh Breeze, a dataflow-based execution and programming model and computer architecture, and shows how it provides a sound basis to develop future computing systems that match the DDDAS challenges.
Abstract: The DDDAS paradigm, unifying applications, mathematical modeling, and sensors, is now more relevant than ever with the advent of Large-Scale/Big-Data and Big-Computing. Large-Scale-Dynamic-Data (advertised as the next wave of Big Data) includes the integrated range of data from high-end systems and instruments together with the dynamic data arising from ubiquitous sensing and control in engineered, natural, and societal systems. In this paper we present Fresh Breeze, a dataflow-based execution and programming model and computer architecture, and show how it provides a sound basis to develop future computing systems that match the DDDAS challenges. Development of simulation models and a compiler for Fresh Breeze computer systems is discussed and initial results are reported.

8 citations


Cites background from "A Dataflow Programming Language and..."

  • ...Our previous works on streams have mainly focused on software support [16, 13, 10, 14, 12]....


Dissertation
01 Jan 2015
TL;DR: This thesis presents an end-to-end framework that takes advantage of information gained by locality-aware multithreading, called "Group Locality", to seek a common ground where both processing and memory resources can collaborate to yield a better overall utilization.
Abstract: When powerful computational cores are matched against limited resources such as bandwidth and memory, competition across computational units can result in serious performance degradation due to contention at different levels of the memory hierarchy. In such a case, it becomes very important to maximize reuse at low latency memory space. The high performance community has been actively working on finding techniques that can improve data locality and reuse, as such endeavors have a direct and positive impact on both performance and power. In order to better utilize low latency memory such as caches and scratch-pad SRAMs, software techniques such as hierarchical tiling have proven very effective. However, these techniques still operate under the paradigm that memory agnostic coarse grain parallelism is the norm. Therefore, they function in a coarse-grain fashion where inner tiles run serially. Such behavior can result in an under-utilization of processing and network resources. Even when the inner tiles are assigned to fine grain tasks, their memory agnostic placement will still incur heavy penalties when resources are contended. Finding a balance between parallelism and locality in such cases is an arduous task and it is essential to seek a common ground where both processing and memory resources can collaborate to yield a better overall utilization. This thesis explores the concept of locality aware multithreading called "Group Locality". Multiple groups of threads work together in close proximity in time and space as a unit, taking advantage of each other's data movement and collaborating at a very fine-grain level. When an execution pattern is known a priori through static analysis and data is shared among different thread groups, the concept of group locality extends further to rearrange data to provide a better access pattern for other thread groups. This thesis presents an end-to-end framework that takes advantage of information gained by locality-aware multithreading to seek a common ground where both processing and memory resources can collaborate to yield a better overall utilization.

4 citations


Cites background from "A Dataflow Programming Language and..."

  • ...The dire need for fine-grain execution techniques to reach the next milestone in performance has also led to a dataflow language and compiler for multicore architectures [112]....


Journal ArticleDOI
20 Jun 2016
TL;DR: In this article, a Domain-Specific Language (DSL) is proposed to specify the elements related to the design phase of adaptive software, and a functional prototype based on the Sirius plugin for Eclipse is presented.
Abstract: Adaptive software has the ability to modify its own behavior at runtime due to changes in the users and their context in the system, the requirements, or the environment in which the system is deployed, and thus give users a better experience. However, the development of this kind of system is not a simple task. There are two main issues: (1) there is a lack of languages to specify, unambiguously, the elements related to the design phase. As a consequence, these systems are often developed in an ad-hoc manner, without the required formalism, increasing the complexity of deriving design models into the next phases of the development cycle. (2) Design decisions and the adaptation model tend to be directly implemented into the source code and not thoroughly specified at the design level. Since the adaptation models become tangled with the code, system evolution becomes more difficult. To address the above issues, this paper proposes DMLAS, a Domain-Specific Language (DSL) to design adaptive systems. As proof of concept, this paper also provides a functional prototype based on the Sirius plugin for Eclipse. This prototype is a tool to model, in several layers of abstraction, the main components of an adaptive system. The notation used in both the models and the tool was validated against the nine principles for designing cognitively effective visual notations presented by Moody.

3 citations

References
Proceedings ArticleDOI
07 Nov 1998
TL;DR: This work focuses on developing new types of heuristics for coarsening, initial partitioning, and refinement that are capable of successfully handling the multiple constraints that underlie many existing and emerging large-scale scientific simulations.
Abstract: Traditional graph partitioning algorithms compute a k-way partitioning of a graph such that the number of edges that are cut by the partitioning is minimized and each partition has an equal number of vertices. The task of minimizing the edge-cut can be considered as the objective and the requirement that the partitions will be of the same size can be considered as the constraint. In this paper we extend the partitioning problem by incorporating an arbitrary number of balancing constraints. In our formulation, a vector of weights is assigned to each vertex, and the goal is to produce a k-way partitioning such that the partitioning satisfies a balancing constraint associated with each weight, while attempting to minimize the edge-cut. Applications of this multi-constraint graph partitioning problem include parallel solution of multi-physics and multi-phase computations, which underlie many existing and emerging large-scale scientific simulations. We present new multi-constraint graph partitioning algorithms that are based on the multilevel graph partitioning paradigm. Our work focuses on developing new types of heuristics for coarsening, initial partitioning, and refinement that are capable of successfully handling multiple constraints. We experimentally evaluate the effectiveness of our multi-constraint partitioners on a variety of synthetically generated problems.
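The two quantities this formulation trades off — the edge-cut objective and the per-weight balance constraints — can be evaluated for a candidate partitioning in a few lines. This is only a sketch of the metrics, not of the multilevel algorithm itself, and the four-vertex graph and its weight vectors are invented for the example:

```python
# Each vertex carries a vector of weights; a k-way partitioning must
# balance every weight component while minimizing the edge-cut.

def edge_cut(edges, part):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(weights, part, k):
    """Per-constraint load balance: max partition load / average load.
    A value of 1.0 means that constraint is perfectly balanced."""
    dims = len(next(iter(weights.values())))
    loads = [[0.0] * dims for _ in range(k)]
    for v, w in weights.items():
        for d in range(dims):
            loads[part[v]][d] += w[d]
    totals = [sum(loads[p][d] for p in range(k)) for d in range(dims)]
    return [max(loads[p][d] for p in range(k)) / (totals[d] / k)
            for d in range(dims)]

weights = {0: (1, 2), 1: (1, 2), 2: (2, 1), 3: (2, 1)}   # two constraints
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
part = {0: 0, 1: 1, 2: 0, 3: 1}   # one candidate 2-way partitioning
print(edge_cut(edges, part), imbalance(weights, part, 2))  # → 4 [1.0, 1.0]
```

Note the tension the paper addresses: this assignment balances both weight components perfectly but cuts all four edges, while grouping {0,1} against {2,3} would cut only two edges at the cost of imbalance in every constraint.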

484 citations


"A Dataflow Programming Language and..." refers methods in this paper

  • ...The benchmarks tested in the experiments are from the StreamIt benchmarks [10]....


  • ...We used the multilevel k-way partitioning [10] algorithm to perform load balancing while keeping connectivity....


  • ...First, we use a multilevel k-way graph partitioning algorithm [10] to assign the operator to the core, and then we use a stage assignment algorithm to assign the stage to each operator....


Proceedings ArticleDOI
01 Oct 2002
TL;DR: This paper describes a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations, and demonstrates that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance.
Abstract: With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures. In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element. We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.

351 citations


"A Dataflow Programming Language and..." refers methods in this paper

  • ...Compared with previous dataflow languages such as StreamIt [6], COStream adopts some grammar structures from IBM SPL [7] to improve programmability and code reuse....


  • ...The benchmarks tested in the experiments are from the StreamIt benchmarks [10]....


Journal ArticleDOI
Davis, Keller
TL;DR: Data flow languages form a subclass of the languages which are based primarily upon function application; graphical representations of such languages and their applications are the subject of this article.
Abstract: The advantages of each are discussed here. Data flow languages form a subclass of the languages which are based primarily upon function application (i.e., applicative languages). By data flow language we mean any applicative language based entirely upon the notion of data flowing from one function entity to another or any language that directly supports such flowing. This flow concept gives data flow languages the advantage of allowing program definitions to be represented exclusively by graphs. Graphical representations and their applications are the subject of this article. Applicative languages provide the benefits of extreme modularity, in that the function of each of several sub-programs that execute concurrently can be understood in vacuo. Therefore, the programmer need not assimilate a great deal of information about the environment of the subprogram in order to understand it. In these languages, there is no way to express constructs that produce global side-effects. This decoupling of the meaning of individual subprograms also makes possible a similar decoupling of their execution. Thus, when represented graphically, sub-programs that look independent can be executed independently and, therefore, concurrently. By contrast, concurrent programs written in more conventional assignment-based languages cannot always be understood in vacuo, since it is often necessary to understand complex sequences of interactions between a sub-program and its environment in order to understand the meaning of the subprogram itself. This is not to say that data flow subprograms cannot interact with their environments in specialized ways, but that it is possible to define a subprogram's meaning without appealing to those interactions.
There are many reasons for describing data flow languages in graphical representations, including the following: (1) Data flow languages sequence program actions by a simple data availability firing rule: When a node's arguments are available, it is said to be firable. The function associated with a firable node can be fired, i.e., applied to its arguments, which are thereby absorbed. After firing, the node's results are sent to other functions, which need these results as their arguments. A mental image of this behavior is suggested by representing the program as a directed graph in which each node represents a function and each (directed) arc a conceptual medium over which data items flow. Phantom nodes, drawn with dashed lines, indicate points at which the program communicates with its environment by either receiving data from it or sending data to it. (2) …
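The data-availability firing rule described above can be sketched as a tiny simulation. The graph shape, node names, and functions here are hypothetical, chosen only to show a node firing once tokens appear on all of its input arcs:

```python
from collections import deque

# Arcs are FIFO channels carrying tokens; nodes fire when every
# input arc holds at least one token (the firing rule).
arcs = {a: deque() for a in ("in1", "in2", "sum", "out")}
nodes = {
    "add":    (("in1", "in2"), ("sum",), lambda x, y: (x + y,)),
    "double": (("sum",),       ("out",), lambda x: (2 * x,)),
}

def step():
    """Fire every currently firable node; return True if any node fired."""
    fired = False
    for name, (ins, outs, fn) in nodes.items():
        if all(arcs[a] for a in ins):                 # data availability
            args = [arcs[a].popleft() for a in ins]   # arguments are absorbed
            for arc, val in zip(outs, fn(*args)):
                arcs[arc].append(val)                 # results flow downstream
            fired = True
    return fired

arcs["in1"].append(3)    # the environment supplies input tokens
arcs["in2"].append(4)
while step():
    pass
print(list(arcs["out"]))  # → [14], i.e. (3 + 4) * 2
```

Sequencing falls out of the graph alone: no program counter decides the order, so independent nodes could just as well fire concurrently.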

317 citations


"A Dataflow Programming Language and..." refers background in this paper

  • ...Codelets are tagged with resource requirements and linked together by data dependencies to form a graph (analogous to a dataflow graph [2])....


Journal ArticleDOI
01 Jun 1999
TL;DR: This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors. The algorithms focus primarily on the minimization of code size and of the memory required for the buffers that implement the communication channels in the input dataflow graph.
Abstract: The implementation of software for embedded digital signal processing (DSP) applications is an extremely complex process. The complexity arises from escalating functionality in the applications; intense time-to-market pressures; and stringent cost, power and speed constraints. To help cope with such complexity, DSP system designers have increasingly been employing high-level, graphical design environments in which system specification is based on hierarchical dataflow graphs. Consequently, a significant industry has emerged for the development of data-flow-based DSP design environments. Leading products in this industry include SPW from Cadence, COSSAP from Synopsys, ADS from Hewlett Packard, and DSP Station from Mentor Graphics. This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors. The algorithms focus primarily on the minimization of code size, and the minimization of the memory required for the buffers that implement the communication channels in the input dataflow graph. These are critical problems because programmable digital signal processors have very limited amounts of on-chip memory, and the speed, power, and cost penalties for using off-chip memory are often prohibitively high for embedded applications. Furthermore, memory demands of applications are increasing at a significantly higher rate than the rate of increase in on-chip memory capacity offered by improved integrated circuit technology.

234 citations


"A Dataflow Programming Language and..." refers methods in this paper

  • ...In the SAS schedule, each node appears exactly once and is surrounded by a loop denoting the number of times that it executes in one steady state execution of the stream graph....


  • ...For the steady-state schedule, we use a Single Appearance Schedule (SAS) [1] for each operator....

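The Single Appearance Schedule quoted in these excerpts can be illustrated for a hypothetical two-actor SDF chain A -> B, where A produces 3 tokens per firing and B consumes 2 (rates invented for the example, not taken from either paper):

```python
from fractions import Fraction

def repetition_vector(produce, consume):
    """Solve the SDF balance equation reps_A * produce == reps_B * consume
    for the smallest positive integer repetition counts."""
    ratio = Fraction(consume, produce)   # reps_A / reps_B in lowest terms
    return ratio.numerator, ratio.denominator

reps_a, reps_b = repetition_vector(produce=3, consume=2)

# In an SAS each actor appears exactly once, wrapped in a loop whose
# count is the actor's entry in the repetition vector.
schedule = f"({reps_a} A) ({reps_b} B)"
print(schedule)                          # → (2 A) (3 B)
```

With these rates, A fires twice and B three times per steady state (2 * 3 == 3 * 2 tokens on the channel), and keeping each actor's code in a single loop body is what minimizes code size in the approach the abstract reviews.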

Journal ArticleDOI
17 May 1988
TL;DR: This paper examines the spectrum by proposing a new architecture which is a hybrid of dataflow and von Neumann organizations, and attempts to discover those features of the dataflow architecture, lacking in a vonNeumann machine, which are essential for tolerating latency and synchronization costs.
Abstract: Dataflow architectures offer the ability to trade program level parallelism in order to overcome machine level latency. Dataflow further offers a uniform synchronization paradigm, representing one end of a spectrum wherein the unit of scheduling is a single instruction. At the opposite extreme are the von Neumann architectures which schedule on a task, or process, basis. This paper examines the spectrum by proposing a new architecture which is a hybrid of dataflow and von Neumann organizations. The analysis attempts to discover those features of the dataflow architecture, lacking in a von Neumann machine, which are essential for tolerating latency and synchronization costs. These features are captured in the concept of a parallel machine language which can be grafted on top of an otherwise traditional von Neumann base. In such an architecture, the units of scheduling, called scheduling quanta, are bound at compile time rather than at instruction set design time. The parallel machine language supports this notion via a large synchronization name space. A prototypical architecture is described, and results of simulation studies are presented. A comparison is made between the MIT Tagged-Token Dataflow machine and the subject machine which presents a model for understanding the cost of synchronization in a parallel environment.

198 citations