Journal ArticleDOI

Efficient communication support in predictable heterogeneous MPSoC designs for streaming applications

01 Nov 2013-Vol. 59, Iss: 10, pp 878-888
TL;DR: A predictable high-performance communication assist (CA) that helps to tackle design challenges in integrating IP cores into heterogeneous Multi-Processor System-on-Chips (MPSoCs), and a predictable heterogeneous multi-processor platform template for streaming applications is presented.
Abstract: Streaming applications are an important class of applications in emerging embedded systems such as smart camera networks, unmanned vehicles, and industrial printing. These applications are usually very computationally intensive and have real-time constraints. To meet the increasing demand for performance and efficiency in these applications, the use of application-specific IP cores in heterogeneous Multi-Processor System-on-Chips (MPSoCs) becomes inevitable. However, two of the key challenges in integrating these IP cores into MPSoCs are (i) how to properly handle inter-core communication; and (ii) how to map streaming applications in an efficient and predictable way. In this paper, we first present a predictable high-performance communication assist (CA) that helps to tackle these design challenges. The proposed CA has zero throughput overhead, negligible latency overhead, and significantly less resource usage compared to existing CA designs. The proposed CA also provides a unified abstract interface for both processors and accelerator IP cores with flexible data access support. Based on the proposed CA design, we present a predictable heterogeneous multi-processor platform template for streaming applications. The template is used in a predictable design flow that uses Synchronous Data Flow (SDF) graphs for design time analysis. An accurate SDF model of our CA is introduced, enabling the mapping of applications onto heterogeneous MPSoCs in an efficient and predictable way. As a case study, we map the complete high-speed vision processing pipeline of an industrial application, Organic Light Emitting Diode (OLED) screen printing, onto one instance of the proposed platform. The result demonstrates that system design and analysis effort is greatly reduced with the proposed CA-based design flow.
Citations
Proceedings ArticleDOI
16 Dec 2013
TL;DR: A design framework to generate and program HMPSoC designs in a rapid and predictable manner that meets the throughput constraints and can provide a conservative bound on the worst-case throughput of the FPGA implementation is proposed.
Abstract: Heterogeneous Multiprocessor System-on-Chips (HMPSoC) are becoming popular as a means of meeting energy efficiency requirements of modern embedded systems. However, as these HMPSoCs run multimedia applications as well, they also need to meet real-time requirements. Designing these predictable HMPSoCs is a key challenge, as the current design methods for these platforms are either semi-automated, non-predictable, or have limited heterogeneity. In this paper, we propose a design framework to generate and program HMPSoC designs in a rapid and predictable manner. It takes the application specifications and the architecture model as input and generates the entire HMPSoC, for FPGA prototyping, that meets the throughput constraints. The experimental results show that our framework can provide a conservative bound on the worst-case throughput of the FPGA implementation. We also present results of a case study that computes the area-power trade-offs of an industrial vision application. The entire design space exploration of all configurations was completed in 8 hours. A tool-chain targeting the Xilinx Zynq FPGA is also presented.

12 citations


Cites background or methods from "Efficient communication support in ..."

  • ...Previously, a Communication Assist (CA) for homogeneous general purpose processors [5] and for accelerators [6] had also been introduced, but it was without a complete framework....

    [...]

  • ...The ARM PE CA was implemented through AXI DMA IP [3] while the accelerator CA implementation is from our previous work [6]....

    [...]

  • ...Additionally, actors Proj, Eros and Bin also have hardware accelerator implementations [6]....

    [...]

  • ...Actors wr and rd are for modeling write and read latencies of a word and actor d1 is for the de-serialization of words that belong to the same token [6]....

    [...]

DOI
01 Jan 2013
TL;DR: The final author version and the galley proof are versions of the publication after peer review and the final published version features the final layout of the paper including the volume, issue and page numbers.
Abstract: • A submitted manuscript is the author's version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers.

2 citations

Dissertation
01 Jan 2013
TL;DR: A prototype MPSoC has been developed specifically for real-time applications, like Software-Defined Radios (SDRs), which place strict requirements: a guaranteed minimum bandwidth and a maximum bound on the latency of communication between tasks running on different cores.
Abstract: Increasing the computational power of future System-on-Chips (SoCs) is not possible by increasing the frequency of a processor, because power consumption will become a major issue. Energy efficient systems increase the number of cores to reach higher performance levels to form a multi-core embedded system: a Multi-Processor System-on-Chip (MPSoC). This flexible type of system can be customized for specific applications by changing the software running on it. At the University of Twente, a prototype MPSoC has been developed specifically for real-time applications, like Software-Defined Radios (SDRs). This class of applications places strict requirements: a guaranteed minimum bandwidth and a maximum bound on the latency of communication between tasks running on different cores. Communication within the platform is performed over a connectionless Network-on-Chip (NoC), which should allow a small implementation. However, strict guarantees are harder to accomplish in a connectionless network compared to a connection-oriented network.

2 citations


Cites background from "Efficient communication support in ..."

  • ...The graphs can be used to analyze, at design-time, the temporal behavior and resource requirements (buffer sizes) of applications [24]....

    [...]

01 Jan 2011
TL;DR: In this paper, the authors present a tool flow to map throughput constrained applications on a multi-processor system-on-chip (MPSoC) to provide a tight, conservative bound on the worst-case throughput of the FPGA.
Abstract: This paper describes a design flow to map throughput constrained applications on a Multi-processor System-on-Chip (MPSoC). It integrates several state-of-the-art mapping and synthesis tools into an automated tool flow. This flow takes as input a throughput constrained application, modeled with a synchronous dataflow graph, a C-based implementation for each actor in the graph, and a template based architecture description. Using these inputs, the tool flow generates an MPSoC platform tailored to the application requirements and it subsequently maps the application to this platform. The output of the flow is an FPGA programmable bit file. An easily extensible template based architecture is presented; this architecture allows fast and flexible generation of a predictable platform that can be synthesized using the presented tool flow. The effectiveness of the tool flow is demonstrated by mapping an MJPEG-decoder onto our MPSoC platform. This case study shows that our flow is able to provide a tight, conservative bound on the worst-case throughput of the FPGA implementation. The presented tool flow is freely available at http://www.es.ele.tue.nl/mamps.

2 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a comprehensive and well-structured snapshot of the existing research on TSN-5G integration, identifying the trends, technical characteristics, and potential gaps in the state of the art.

1 citation

References
Journal ArticleDOI

37,017 citations


"Efficient communication support in ..." refers methods in this paper

  • ...The front end of the pipeline applies OTSU optimal threshold [20] to segment the OLED structures from the background....

    [...]

Journal ArticleDOI
01 Sep 1987
TL;DR: A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described, and two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
Abstract: Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case of data flow (either atomic or large grain) in which the number of data samples produced or consumed by each node on each invocation is specified a priori. Nodes can be scheduled statically (at compile time) onto single or parallel programmable processors so the run-time overhead usually associated with data flow evaporates. Multiple sample rates within the same system are easily and naturally handled. Conditions for correctness of SDF graph are explained and scheduling algorithms are described for homogeneous parallel processors sharing memory. A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described. Two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
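Because SDF fixes the production and consumption rates a priori, a static schedule exists only if the graph's balance equations have a solution: for every edge, the firings of the producer times its production rate must equal the firings of the consumer times its consumption rate. The following is a minimal sketch of that consistency check, not code from the cited paper. The function name `repetition_vector` and the edge tuple layout are assumptions made for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts per actor,
    or None if the rates are inconsistent.

    actors: list of actor names (hypothetical layout; graph assumed connected)
    edges:  list of (src, dst, produced_per_firing, consumed_per_firing)
    """
    # Balance equation per edge: q[src] * prod == q[dst] * cons,
    # so q[dst] = q[src] * prod / cons and q[src] = q[dst] * cons / prod.
    neighbours = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        neighbours[src].append((dst, Fraction(prod, cons)))
        neighbours[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in neighbours[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                return None  # two paths demand different rates: inconsistent
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractions to the smallest all-integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# prints {'A': 3, 'B': 2, 'C': 1}
```

In the example, three firings of A produce six tokens, which two firings of B consume; those two firings produce two tokens, which one firing of C consumes. A graph for which the function returns None has no bounded-memory periodic schedule.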

1,985 citations


"Efficient communication support in ..." refers background or methods in this paper

  • ...Synchronous Data Flow (SDF) is a model of computation commonly used to model streaming applications [1]....

    [...]

  • ...The Synchronous Data Flow (SDF) model-of-computation is a very powerful model for analyzing streaming applications [1]....

    [...]

Proceedings ArticleDOI
28 Jun 2006
TL;DR: SDF^3 is a tool for generating random Synchronous DataFlow Graphs (SDFGs), if desirable with certain guaranteed properties like strongly connectedness.
Abstract: SDF^3 is a tool for generating random Synchronous DataFlow Graphs (SDFGs), if desirable with certain guaranteed properties like strongly connectedness. It includes an extensive library of SDFG analysis and transformation algorithms as well as functionality to visualize them. The tool can create SDFG benchmarks that mimic DSP or multimedia applications.

305 citations


"Efficient communication support in ..." refers background or methods in this paper

  • ...The analysis tool that MAMPS uses is called SDF3 [8]....

    [...]

  • ..., through SDF analysis tools such as SDF3 [8]....

    [...]

  • ...It calculates buffer assignments, and predicts the throughput of this mapping [8]....

    [...]

  • ...By integrating this SDF model onto our SDF analysis tool, SDF3 [8], worst-case system properties, such as throughput, latency, and buffer sizes can be conservatively analyzed at design time....

    [...]

Proceedings ArticleDOI
28 Jun 2006
TL;DR: A method for throughput analysis of SDFGs, based on explicit state-space exploration, is presented and it is shown that the method, despite its worst-case complexity, works well in practice, while existing methods often fail.
Abstract: Synchronous Data Flow Graphs (SDFGs) are a useful tool for modeling and analyzing embedded data flow applications, both in a single processor and a multiprocessing context or for application mapping on platforms. Throughput analysis of these SDFGs is an important step for verifying throughput requirements of concurrent real-time applications, for instance within design-space exploration activities. Analysis of SDFGs can be hard, since the worst-case complexity of analysis algorithms is often high. This is also true for throughput analysis. In particular, many algorithms involve a conversion to another kind of data flow graph, the size of which can be exponentially larger than the size of the original graph. In this paper, we present a method for throughput analysis of SDFGs, based on explicit state-space exploration and we show that the method, despite its worst-case complexity, works well in practice, while existing methods often fail. We demonstrate this by comparing the method with state-of-the-art cycle mean computation algorithms. Moreover, since the state-space exploration method is essentially the same as simulation of the graph, the results of this paper can be easily obtained as a byproduct in existing simulation tools.
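The idea that throughput falls out of simulating the graph until a state repeats can be illustrated with a small time-stepped simulation. This is a simplified sketch under stated assumptions, not the algorithm from the cited paper: execution times are integers of at least 1, each actor runs at most one firing at a time, tokens are consumed when a firing starts and produced when it ends, and the graph is assumed to keep token counts bounded (for example, because it is strongly connected). The function name `self_timed_throughput` and the edge tuple layout are hypothetical.

```python
from fractions import Fraction


def self_timed_throughput(exec_time, edges, ref, max_steps=100_000):
    """Firings of actor `ref` per time unit in the periodic phase.

    exec_time: dict actor -> integer execution time (>= 1)
    edges:     list of (src, dst, prod, cons, initial_tokens)
    """
    actors = list(exec_time)
    tokens = [e[4] for e in edges]
    remaining = {a: 0 for a in actors}  # 0 means the actor is idle
    seen = {}                           # state -> (time, firings of ref so far)
    fired = 0
    for t in range(max_steps):
        # Start every idle actor whose input edges hold enough tokens.
        for a in actors:
            ins = [i for i, e in enumerate(edges) if e[1] == a]
            if remaining[a] == 0 and all(tokens[i] >= edges[i][3] for i in ins):
                for i in ins:
                    tokens[i] -= edges[i][3]
                remaining[a] = exec_time[a]
        # A repeated state closes a cycle; its firing rate is the throughput.
        state = (tuple(tokens), tuple(remaining[a] for a in actors))
        if state in seen:
            t0, fired0 = seen[state]
            return Fraction(fired - fired0, t - t0)
        seen[state] = (t, fired)
        # Advance time by one unit and complete firings that reach zero.
        for a in actors:
            if remaining[a] > 0:
                remaining[a] -= 1
                if remaining[a] == 0:
                    for i, e in enumerate(edges):
                        if e[0] == a:
                            tokens[i] += e[2]
                    if a == ref:
                        fired += 1
    raise RuntimeError("no recurrent state found within max_steps")


# Two actors in a cycle with one initial token: A (2 time units), B (3 time units).
ring = [("A", "B", 1, 1, 0), ("B", "A", 1, 1, 1)]
print(self_timed_throughput({"A": 2, "B": 3}, ring, "A"))  # prints 1/5
```

With a single token circulating, A and B alternate, so one iteration takes 2 + 3 = 5 time units. Placing a second initial token on the B-to-A edge lets the two actors overlap, and the same function then reports 1/3, bounded by B's execution time. A deadlocked graph repeats its idle state immediately and yields a throughput of 0.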

300 citations

Journal ArticleDOI
TL;DR: The joint problems of identification, tracking and prediction in a multi-target, multi-sensor environment are considered and the previously developed Gaussian sum approach is used.

284 citations