
Showing papers on "Synchronous Data Flow" published in 2016


Proceedings ArticleDOI
19 Oct 2016
TL;DR: A response time analysis technique is introduced for Synchronous Data Flow programs mapped to multiple parallel dependent tasks running on a compute cluster of the Kalray MPPA-256 many-core processor; for an avionics case study, it yields response times a factor of 4.15 smaller than the default approach of assuming worst-case interference on each memory access.
Abstract: In this paper we introduce a response time analysis technique for Synchronous Data Flow programs mapped to multiple parallel dependent tasks running on a compute cluster of the Kalray MPPA-256 many-core processor. The analysis we derive computes a set of response times and release dates that respect the constraints in the task dependency graph. We extend the Multicore Response Time Analysis (MRTA) framework by deriving a mathematical model of the multi-level bus arbitration policy used by the MPPA. Further, we refine the analysis to account for the release dates and response times of co-runners, and the use of memory banks. Additional improvements to the precision of the analysis were achieved by splitting each task into two sequential phases, with the majority of the memory accesses in the first phase and a small number of writes in the second phase. Our experimental evaluation focused on an avionics case study. Using measurements from the Kalray MPPA-256 as a basis, we show that the new analysis leads to response times that are a factor of 4.15 smaller for this application than the default approach of assuming worst-case interference on each memory access.

54 citations
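
Background note: the fixed-point style of reasoning behind response time analysis can be illustrated with the textbook uniprocessor recurrence below. This is a generic sketch with hypothetical task parameters, not the paper's MRTA extension, which additionally models the MPPA-256 multi-level bus arbiter, memory banks, and co-runner release dates.

```python
import math

# Textbook response-time recurrence R_i = C_i + sum_j ceil(R_i / T_j) * C_j,
# iterated to a fixed point. Shown only to illustrate the style of analysis;
# the paper's analysis is far more detailed (bus arbitration, memory banks,
# co-runner release dates on the MPPA-256).

def response_time(c_i, higher_prio):
    """higher_prio: list of (C_j, T_j) pairs for tasks that can interfere with task i."""
    r = c_i
    while True:
        r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in higher_prio)
        if r_next == r:
            return r       # fixed point reached: the response-time bound R_i
        r = r_next         # note: the iteration diverges if the task set is overloaded

# Hypothetical task set: C_i = 2, interfered with by tasks (C, T) = (1, 5) and (2, 10).
print(response_time(2, [(1, 5), (2, 10)]))   # -> 5
```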


Proceedings ArticleDOI
01 Oct 2016
TL;DR: Experiments demonstrate that the proposed polynomial-time solution approach to exploiting parallelism in a hard real-time streaming application modeled as a Synchronous Data Flow graph reduces energy consumption by 66% on average while meeting the same throughput requirement when compared to related energy minimization approaches.
Abstract: In this paper, we study the problem of exploiting parallelism in a hard real-time streaming application modeled as a Synchronous Data Flow (SDF) graph and scheduled on a cluster-based heterogeneous Multi-Processor System-on-Chip (MPSoC) platform such that energy consumption is minimized and a throughput requirement is satisfied. We propose a polynomial-time solution approach which: 1) determines a processor type for each task in an SDF graph such that the throughput constraint is met and energy consumption is minimized; 2) determines a replication factor for each task in an SDF graph such that the workload is balanced across processors of the same type, which enables processors to run at a lower frequency, hence reducing the energy consumption. Experiments on a set of real-life streaming applications demonstrate that our approach reduces energy consumption by 66% on average while meeting the same throughput requirement when compared to related energy minimization approaches.

19 citations
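
The intuition behind point 2) can be made concrete with a small, purely illustrative calculation (the workload numbers and the toy power model below are hypothetical and much simpler than the paper's): replicating a task spreads its workload over several processors of the same type, so each replica can run at a lower frequency while the throughput constraint is still met.

```python
import math

# Illustrative only: replication lets each replica run at a lower frequency.
# With dynamic power roughly proportional to V^2 * f (and V scaling with f),
# total power drops even though more processors are active.

def replication_factor(workload_cycles, period_s, f_max_hz):
    """Smallest replica count so that each replica finishes within the period at f_max."""
    return math.ceil(workload_cycles / (period_s * f_max_hz))

def dynamic_power(f_hz, capacitance=1e-9, v_of_f=lambda f: f / 1e9):
    return capacitance * v_of_f(f_hz) ** 2 * f_hz   # toy P = C * V^2 * f model

workload = 3e9     # cycles per graph iteration (hypothetical)
period   = 2.0     # seconds per iteration, i.e. the throughput requirement
f_max    = 1e9     # 1 GHz

r = replication_factor(workload, period, f_max)   # -> 2 replicas needed
f_needed = workload / (r * period)                # -> 0.75 GHz per replica
print(r * dynamic_power(f_needed), r * dynamic_power(f_max))   # lower vs. higher total power
```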


Proceedings ArticleDOI
31 Aug 2016
TL;DR: The key result is that the size of the memory required in order to guarantee the liveness or a given throughput of an application may be evaluated in polynomial time.
Abstract: The search for a mapping of a Synchronous Data Flow Graph (SDFG) onto a distributed architecture that achieves a given throughput while satisfying memory constraints is a difficult challenge. Solving this problem calls for evaluating the throughput and buffer capacities associated with a mapping. Since the available mapping evaluation methods are not polynomial with respect to the SDFG description, mapping techniques using them are not scalable. This paper develops a polynomial method for the evaluation of any given SDFG mapping on a distributed architecture. The method is based on a simple transformation of the SDFG to model communications through a Network-on-Chip. The key result is that the size of the memory required in order to guarantee the liveness or a given throughput of an application may be evaluated in polynomial time. Experimentally, computing the memory size guaranteeing liveness of a mapping of a 670-node H.264 graph on a 4-cluster architecture takes 70 ms on an Intel Core i5-660 processor and grows linearly with graph size.

12 citations
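
As background (and unrelated to the paper's polynomial evaluation method), the classic per-edge result gives a flavor of what "memory guaranteeing liveness" means: for a single SDF edge with production rate p and consumption rate q, a buffer of p + q - gcd(p, q) tokens is the smallest capacity that keeps that edge deadlock-free.

```python
from math import gcd

# Classic single-edge SDF result (background only, not the paper's method):
# with production rate p and consumption rate q, a bounded buffer of
# p + q - gcd(p, q) tokens is the smallest capacity that keeps the edge live.

def min_live_buffer(p: int, q: int) -> int:
    return p + q - gcd(p, q)

print(min_live_buffer(3, 2))   # -> 4 tokens (a capacity of 3 would deadlock)
print(min_live_buffer(4, 6))   # -> 8 tokens
```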


Proceedings ArticleDOI
23 May 2016
TL;DR: This paper presents a transformation process that first produces a formal model of a Mixed-Criticality System and then generates a PRISM automaton from that model in order to compute availability.
Abstract: The safety-critical industry is compelled to continually increase the number of functionalities in embedded systems. These platforms tend to integrate software with various non-functional requirements, in particular different levels of criticality. As a consequence, Mixed-Criticality Systems emerged in order to assure robustness, safety and predictability for these embedded platforms. Although Mixed-Criticality Systems show promising results, formal methods to quantify availability are still missing for this type of system and will most likely be required for deployment. This paper presents a transformation process that first produces a formal model of a Mixed-Criticality System. From this formal model, it generates a PRISM automaton in order to compute availability.

4 citations
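
As a generic reminder of what an availability figure expresses (this is not the paper's transformation or its PRISM model), the steady-state availability of the simplest up/down failure-and-repair model is just MTTF / (MTTF + MTTR):

```python
# Generic two-state up/down model (illustrative only, not the paper's PRISM
# automaton): steady-state availability A = MTTF / (MTTF + MTTR),
# equivalently mu / (lambda + mu) for failure rate lambda and repair rate mu.

def availability(mttf_hours: float, mttr_hours: float) -> float:
    return mttf_hours / (mttf_hours + mttr_hours)

print(availability(mttf_hours=10_000, mttr_hours=2))   # -> ~0.9998
```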


Proceedings ArticleDOI
20 Apr 2016
TL;DR: This work shows how to use tools and techniques developed by the formal methods community to minimize the energy consumption of Finite Impulse Response (FIR) filters, which are extensively used in SDR front-ends, and finds that idle power becomes an important parameter when a large number of functional units is allocated.
Abstract: Software Defined Radio (SDR) devices are becoming increasingly popular due to their support for mode-, standard- and application-flexibility. At the same time however, the energy consumption of such devices typically suffers from the use of reconfigurable real-time platforms which are known to be severely power hungry. In this work we therefore show how to use tools and techniques developed by the formal methods community to minimize the energy consumption of Finite Impulse Response (FIR) filters which are extensively used in SDR front-ends. We conduct experiments with four different FIR filter structures where we initially derive data flow graphs and precedence graphs using the Synchronous Data Flow (SDF) notation. Based on actual measurements on the Altera Cyclone IV FPGA, we derive power and timing estimates for addition and multiplication, including idling power consumption. We next model the FIR structures in UPPAAL CORA and employ model checking to find energy-optimal solutions in linearly priced timed automata. In conclusion we state that there are significant energy-versus-time differences between the four structures when we experiment with varying numbers of adders and multipliers. Similarly, we find that idle power becomes an important parameter when a high number of functional units are allocated.

4 citations
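
The FIR filters being optimized are standard: y[n] = Σ_k b[k]·x[n−k]. The direct-form sketch below (with hypothetical coefficients) makes explicit the per-tap multiplications and additions that get mapped onto the FPGA adders and multipliers discussed in the paper.

```python
# Direct-form FIR filter y[n] = sum_k b[k] * x[n-k], spelled out so the
# multiplier and adder operations that get mapped to functional units are
# visible. Coefficients and input are hypothetical.

def fir_direct_form(x, b):
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]      # one multiplication + one addition per tap
        y.append(acc)
    return y

taps = [0.25, 0.5, 0.25]                  # simple 3-tap low-pass (hypothetical)
print(fir_direct_form([1.0, 0.0, 0.0, 0.0], taps))   # impulse response: [0.25, 0.5, 0.25, 0.0]
```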


Proceedings ArticleDOI
23 May 2016
TL;DR: The CSDFa model enables optimizing the balance between processing units and memory, resulting in a significant reduction of silicon area; it is also shown that reducing the maximum allowed latency increases the minimum required amount of data parallelism by up to a factor of 16.
Abstract: Real-time stream processing applications, such as Software Defined Radio applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism required to meet the temporal constraints of such applications. However, this unified model is only defined for Synchronous Data Flow (SDF) graphs. Defining a unified model for a more expressive model such as Cyclo-Static Data Flow (CSDF) is not possible, because auto-concurrency can cause a time-dependent order of tokens and dependencies. This paper introduces the Cyclo-Static Data Flow with Auto-concurrency (CSDFa) model. In CSDFa, tokens have indices and the consumption order of tokens is static and time-independent. This allows expressing and trading off pipeline and coarse-grained data parallelism in a single, unified model. Furthermore, we introduce a new type of circular buffer that implements the same static order as is used by the CSDFa model. The overhead of operations on this buffer is independent of the amount of auto-concurrency, which corresponds to the constant firing durations in the CSDFa model. Exploiting the trade-off between data and pipeline parallelism with the CSDFa model is demonstrated with part of an FMCW radar processing pipeline. We show that the CSDFa model enables optimizing the balance between processing units and memory, resulting in a significant reduction of silicon area. Additionally, it is shown that reducing the maximum allowed latency increases the minimum required amount of data parallelism by up to a factor of 16.

4 citations
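
The core idea of indexed tokens can be sketched as follows (an illustrative toy, not the paper's buffer implementation): a token's sequence number fixes the slot it occupies, so the consumption order is static no matter how many producer or consumer firings are in flight.

```python
# Toy sketch of an index-addressed circular buffer (not the paper's design):
# token i always occupies slot i mod capacity, so readers see a static,
# time-independent order even under auto-concurrency. A real implementation
# would additionally track slot occupancy to enforce the capacity bound.

class IndexedCircularBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity

    def write(self, token_index: int, value):
        self.slots[token_index % self.capacity] = value

    def read(self, token_index: int):
        return self.slots[token_index % self.capacity]

buf = IndexedCircularBuffer(capacity=4)
for i, value in enumerate(["a", "b", "c"]):   # concurrent firings could complete in any order
    buf.write(i, value)
print([buf.read(i) for i in range(3)])        # -> ['a', 'b', 'c'], order fixed by the token index
```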


Patent
16 Nov 2016
TL;DR: A real-time detection method for equipment exceptions based on synchronous data flow compression is presented: the characteristics of each piece of equipment are collected and grouped, and both a group dataset representing the normal operating state of the group of equipment and an own dataset representing the normal operating state of the individual piece of equipment are constructed, so that the records of the two datasets can be compared to obtain a combined exception detection result.
Abstract: The invention discloses a real-time detection method for equipment exceptions on the basis of synchronous data flow compression. The characteristics of each piece of equipment are collected and grouped, and a group dataset representing the normal operating state of the group of equipment and an own dataset representing the normal operating state of the individual piece of equipment are constructed, so that the records of the two datasets are compared to obtain a combined exception detection result, improving detection accuracy. Meanwhile, since the operating states of the equipment differ under different environments, a concept drift detection method based on principal component analysis is adopted to monitor the operating state data and judge whether it has evolved; if so, the two datasets are re-initialized, which further improves detection accuracy. In addition, synchronous data flow compression is adopted to reduce the computational cost of the comparison process, enabling real-time detection of equipment exceptions.

4 citations
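
The concept-drift step can be illustrated with a generic PCA-based check (the features, window sizes, and threshold below are hypothetical and not taken from the patent): fit principal components on a baseline of normal operation, then flag drift when the reconstruction error of newly observed data grows well beyond the baseline level.

```python
import numpy as np

# Generic PCA-based drift check (illustrative; features and threshold are
# hypothetical, not the patent's): a large reconstruction error on new data
# relative to the baseline suggests the operating state has evolved and the
# reference datasets should be rebuilt.

def fit_pca(baseline: np.ndarray, k: int):
    mean = baseline.mean(axis=0)
    _, _, vt = np.linalg.svd(baseline - mean, full_matrices=False)
    return mean, vt[:k]                              # mean and top-k principal directions

def reconstruction_error(window: np.ndarray, mean, components) -> float:
    centered = window - mean
    recon = centered @ components.T @ components     # projection onto the PCA subspace
    return float(np.mean(np.sum((centered - recon) ** 2, axis=1)))

rng = np.random.default_rng(0)
baseline = rng.normal(size=(500, 8))                 # normal operating data (synthetic)
mean, comps = fit_pca(baseline, k=3)
base_err = reconstruction_error(baseline, mean, comps)

new_window = rng.normal(loc=2.0, size=(100, 8))      # shifted behaviour (synthetic)
print(reconstruction_error(new_window, mean, comps) > 3 * base_err)   # drift expected -> True
```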


Proceedings ArticleDOI
01 Nov 2016
TL;DR: A paralleled Pareto optimal scheduling method (PPOS) is proposed for SDFGs on heterogeneous multiprocessors that deals with both the time arrangement and the processor allocation of computations.
Abstract: Streaming applications usually run on heterogeneous multiprocessor platforms and are required to have a high throughput, which in turn may increase the energy consumption. A trade-off between these two criteria is important for a system. Synchronous data flow graphs (SDFGs) are widely used to model streaming applications. In this paper, we propose a paralleled Pareto optimal scheduling method (PPOS) for SDFGs on heterogeneous multiprocessors. It deals with both time arrangement and processor allocation of computations. PPOS is an exact method to chart the Pareto space of energy consumption and throughput, and to find all Pareto optimal schedules of a system model. An approximation technique is presented to further increase the scalability of our methods. Our experiments are carried out on a practical multimedia application with different configurations and hundreds of synthetic graphs. The results show that the proposed methods are capable of dealing with large-scale models.

2 citations
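
Independent of how PPOS explores the design space, the Pareto-dominance criterion it relies on is simple to state: a schedule is kept only if no other schedule is at least as good in both energy and throughput and strictly better in one. A minimal sketch with made-up candidate points:

```python
# Pareto filtering over (energy, throughput) pairs: lower energy and higher
# throughput are better. The candidate values are made up for illustration;
# the paper's PPOS computes the candidate schedules themselves exactly.

def pareto_front(points):
    front = []
    for e, t in points:
        dominated = any((e2 <= e and t2 >= t) and (e2 < e or t2 > t)
                        for e2, t2 in points)
        if not dominated:
            front.append((e, t))
    return front

candidates = [(10, 5.0), (12, 5.0), (8, 3.0), (9, 4.5), (15, 6.0)]
print(pareto_front(candidates))   # -> [(10, 5.0), (8, 3.0), (9, 4.5), (15, 6.0)]
```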


Dissertation
01 Jan 2016
TL;DR: A state-based real-time analysis methodology for Synchronous Data Flow (SDF) oriented applications running on MPSoCs is proposed, which utilizes Timed Automata (TA) as a common semantic model to represent execution time boundaries of SDF actors and communication FIFOs and their mapping as well as their utilization of MPSoC resources.
Abstract: The timing analysis of hard real-time applications running on Multi-Processor System-on-Chip (MPSoC) platforms is much more challenging compared to traditional single-processor platforms. This comes from the large number of shared processing, communication and memory resources available in today’s MPSoCs. Yet, this is an indispensable challenge for enabling their usage with hard real-time systems in safety-critical application domains (e.g. avionics, automotive). In this thesis, a state-based real-time analysis methodology for Synchronous Data Flow (SDF) oriented applications running on MPSoCs is proposed. This approach utilizes Timed Automata (TA) as a common semantic model to represent execution time boundaries (best-case and worst-case execution times) of SDF actors and communication FIFOs and their mapping as well as their utilization of MPSoC resources, including the scheduling of SDFGs and shared communication resource access protocols for interconnects, local and shared memories. The resulting network of TA is analyzed using the UPPAAL model-checker for obtaining safe timing bounds of the chosen implementation.

1 citation
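
For intuition only (the thesis uses UPPAAL timed automata and a full MPSoC resource model, not this), bounding the latency of a tiny actor chain from per-actor [BCET, WCET] intervals can be pictured as exploring the extreme execution times:

```python
from itertools import product

# Toy bound computation (illustrative; the thesis model-checks timed automata
# instead): for a chain of actors firing back-to-back, combining the BCET/WCET
# extremes bounds the end-to-end latency. Actor names and times are hypothetical.

actors = {"src": (2, 3), "filter": (5, 9), "sink": (1, 2)}   # (BCET, WCET) per actor

latencies = [sum(durations)
             for durations in product(*[(bcet, wcet) for bcet, wcet in actors.values()])]

print(min(latencies), max(latencies))   # -> 8 14  (best-case / worst-case latency of the chain)
```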


Journal ArticleDOI
TL;DR: In this article, a multiprocessor scheduling technique for multi-mode dataflow models that allows task migration between modes is proposed to minimize the resource requirement, and a genetic algorithm is used to schedule the SDF graphs of all modes simultaneously.
Abstract: The Synchronous Data Flow (SDF) model is widely used for specifying signal processing or streaming applications. Since modern embedded applications become more complex with dynamic behavior changes at run-time, several extensions of the SDF model have been proposed to specify the dynamic behavior changes while preserving static analyzability of the SDF model. They assume that an application has a finite number of behaviors (or modes) and each behavior (mode) is represented by an SDF graph. They are classified as multi-mode dataflow models in this paper. While there exist several scheduling techniques for multi-mode dataflow models, none of them allows task migration between modes. By observing that the resource requirement can be additionally reduced if task migration is allowed, we propose a multiprocessor scheduling technique of a multi-mode dataflow graph considering task migration between modes. Based on a genetic algorithm, the proposed technique schedules all SDF graphs in all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, the proposed technique calculates the actual throughput requirement of each mode and the output buffer size for tolerating throughput jitter. On synthetic examples and three real applications (an H.264 decoder, a vocoder, and an LTE receiver), we compare the proposed technique with a method that analyzes the SDF graphs of each execution mode separately and with a method that does not allow task migration.

1 citation
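
A hypothetical sketch of the solution encoding (not the paper's actual GA or fitness function): with migration allowed, each (mode, task) pair gets its own processor assignment, so the same task may move between processors when the mode changes.

```python
import random

# Hypothetical encoding sketch (not the paper's GA): one gene per (mode, task)
# pair. A GA would evolve such assignments to minimise the processors used,
# while a separate analysis checks each mode's throughput constraint and sizes
# the output buffers against throughput jitter.

modes = {"mode_A": ["src", "fft", "sink"], "mode_B": ["src", "fir", "eq", "sink"]}
processors = ["p0", "p1", "p2", "p3"]

def random_candidate():
    return {(mode, task): random.choice(processors)
            for mode, tasks in modes.items() for task in tasks}

def processors_used(candidate):
    return len(set(candidate.values()))   # a simple resource-requirement measure

candidate = random_candidate()
print(processors_used(candidate), candidate)
```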


07 Jul 2016
TL;DR: This work identifies data channels as the dominating roadblock for achieving high performance of synchronous data-flow programs and discusses compiler optimizations and run-time techniques that aim at improving the performance of stream programs on multicore architectures.
Abstract: Because the demand for high performance in big data processing and distributed computing is increasing, the stream programming paradigm has been revisited for its abundance of parallelism, by virtue of independent actors that communicate via data channels. The synchronous data-flow (SDF) programming model is frequently adopted with stream programming languages for the convenience of expressing stream programs as a set of nodes connected by data channels. Unlike general data-flow graphs, SDF requires the specification of the number of data items produced and consumed by a node already at compile-time. Static data-rates enable program transformations that greatly improve the performance of SDF programs on multicore architectures. The major application domains of SDF programs are digital signal processing, audio, video, graphics kernels, networking, and security. The major optimization objective with stream programs is data throughput. Stream program orchestration is a term that denotes compiler optimizations and run-time techniques that aim at performance improvements of stream programs on multicore architectures. A large body of research has already been devoted to stream program orchestration. Nevertheless, current compilers and run-time systems for stream programming languages are not yet able to harvest the raw computing power of contemporary parallel architectures. We identify data channels as the dominating roadblock for achieving high performance of SDF programs. Data channels between communicating nodes, i.e., between a producer and a consumer, employ FIFO-queue semantics. Funneling a data item (token) from a producer to a consumer through a FIFO queue incurs non-negligible overhead. The producer is required to perform an enqueue operation, followed by a dequeue operation in the consumer. The enqueue and dequeue operations induce this overhead.
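
The FIFO-queue channel semantics described above can be written down in a few lines (a deliberately naive, single-threaded sketch; real multicore run-times add synchronisation and cache-aware layouts around these operations, which is where the overhead discussed here comes from):

```python
from collections import deque

# Naive single-threaded sketch of an SDF data channel with FIFO semantics:
# every token costs one enqueue in the producer and one dequeue in the
# consumer. Real multicore run-times add synchronisation around these
# operations, which is the per-token overhead discussed above.

class Channel:
    def __init__(self):
        self._queue = deque()

    def enqueue(self, token):       # executed by the producer, once per token
        self._queue.append(token)

    def dequeue(self):              # executed by the consumer, once per token
        return self._queue.popleft()

channel = Channel()
for token in range(3):
    channel.enqueue(token)          # a producer firing emits tokens
print([channel.dequeue() for _ in range(3)])   # a consumer firing absorbs them -> [0, 1, 2]
```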

Proceedings ArticleDOI
TL;DR: This work focuses on the modeling of other implementation concerns such as functional pipeline, timed cyclostatic behaviors, and inter-task communication schemes to get a better estimation of applications' throughput.
Abstract: One of the major challenges facing the design of embedded systems is to estimate their performance before their implementation. This operation is part of the design space exploration, where different implementation choices are investigated. Synchronous data flow graphs (SDF) are powerful analyzable computation models for regular multi-task streaming applications. SDFs have the capability to model and to estimate analytically, at an early design step, the maximal achievable throughput of a streaming application mapped to an implementation. The estimation accuracy depends on the implementation details integrated in the final SDF model. While previous works have focused on the SDF modeling of resource sharing between the application tasks, this work focuses on the modeling of other implementation concerns such as functional pipeline, timed cyclostatic behaviors, and inter-task communication schemes. The aim of the proposed patterns is to get a better estimation of applications' throughput. A case study is used to demonstrate how these patterns are used to estimate the throughput of various co-design alternatives of an MJPEG decoder.
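
As background for any SDF-based throughput estimate (this is standard SDF theory, not the modeling patterns proposed in the paper): the repetition vector, obtained from the per-edge balance equations, fixes how often each actor fires per graph iteration and is the starting point of the analysis. A small sketch with a hypothetical three-actor graph:

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Standard SDF background (not the paper's patterns): solve the balance
# equations q[src] * prod = q[dst] * cons for every edge to obtain the
# repetition vector q, assuming the graph is connected and consistent.

edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]   # (src, dst, prod_rate, cons_rate), hypothetical

def repetition_vector(edges):
    rates = {edges[0][0]: Fraction(1)}         # relative firing rates, seeded at 1
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * prod / cons
                changed = True
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * cons / prod
                changed = True
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (rate.denominator for rate in rates.values()))
    return {actor: int(rate * scale) for actor, rate in rates.items()}

print(repetition_vector(edges))   # -> {'A': 3, 'B': 2, 'C': 1}
```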