Author

Ajm Arno Moonen

Bio: Ajm Arno Moonen is an academic researcher from Eindhoven University of Technology. The author has contributed to research on the topics of System on a chip and Network on a chip, has an h-index of 2, and has co-authored 5 publications receiving 79 citations.

Papers
Journal ArticleDOI
TL;DR: The authors show what is required from the NoC architecture and demonstrate how to construct an NoC model, with multiple levels of detail, and propose a dataflow model that enables the verification of end-to-end temporal behaviour.
Abstract: A growing number of applications, often with real-time requirements, are integrated on the same system on chip (SoC), in the form of hardware and software intellectual property (IP). To facilitate real-time applications, networks on chip (NoC) guarantee bounds on latency and throughput. These bounds, however, only extend to the network interfaces (NI), between the IP and the NoC. To give performance guarantees on the application level, the buffers in the NIs must be sufficiently large for the particular application. At the same time, it is imperative to minimise the size of the NI buffers, as they are major contributors to the area and power consumption of the NoC. Existing buffer-sizing methods use coarse-grained application models, based on linear traffic bounds or periodic producers and consumers, thus severely limiting their applicability. In this work, the authors propose to capture the behaviour of the NoC and the applications using a dataflow model. This enables one to verify the temporal behaviour and to compute buffer sizes using existing dataflow analysis techniques. The authors show what is required from the NoC architecture and demonstrate how to construct an NoC model, with multiple levels of detail. Using the proposed model, buffer sizes are determined for a range of SoC designs with a run time comparable to existing analytical methods, and results comparable to exhaustive simulation. For an application case study, where existing buffer-sizing methods are not applicable, the proposed model enables the verification of end-to-end temporal behaviour.
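
As a concrete illustration of the dataflow approach (a toy sketch, not the paper's actual tool flow), a producer/consumer pair connected by a FIFO can be modelled as a two-actor dataflow cycle: the FIFO's free space becomes a back edge carrying d initial tokens, and the guaranteed throughput follows from the maximum cycle mean. All actor times and the rate target below are made-up values.

```python
# Toy dataflow-based buffer sizing: throughput of the two-actor cycle
# is 1 / MCM, with MCM = max(t_prod, t_cons, (t_prod + t_cons) / d).

def guaranteed_throughput(t_prod: float, t_cons: float, d: int) -> float:
    """Lower bound on throughput (firings per time unit) for buffer size d."""
    return 1.0 / max(t_prod, t_cons, (t_prod + t_cons) / d)

def min_buffer_size(t_prod: float, t_cons: float, rate: float) -> int:
    """Smallest FIFO size (in tokens) that meets throughput `rate`."""
    assert rate <= 1.0 / max(t_prod, t_cons), "constraint is infeasible"
    d = 1
    while guaranteed_throughput(t_prod, t_cons, d) < rate:
        d += 1
    return d

# Hypothetical worst-case actor times (cycles) and throughput target.
print(min_buffer_size(t_prod=3.0, t_cons=5.0, rate=1.0 / 6.0))  # -> 2
```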

68 citations

01 Jan 2007
TL;DR: This paper analyzes three causes for the difference between the computed and the measured throughput, measuring the throughput with a cycle-accurate simulation of a channel equalizer application.
Abstract: Providing real-time guarantees in complex, heterogeneous, and embedded multiprocessor systems is an important issue because they affect the perceived quality. Digital signal processing algorithms are often modeled with dataflow models. A guaranteed minimum throughput can be computed from such a dataflow model. In this paper we analyze three causes for the difference between the computed and measured throughput. We measure the throughput with a cycle-accurate simulation. For our channel equalizer application the measured throughput is 10.1% higher than the computed minimum throughput.
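
The guaranteed minimum throughput mentioned above comes from the maximum cycle mean (MCM) of the dataflow graph; the sketch below computes it by brute force for a toy graph. The graph and all numbers are invented (not the channel equalizer's), and a real analysis would use an efficient algorithm such as Karp's.

```python
from itertools import permutations

def max_cycle_mean(times, edges):
    """MCM of a small HSDF graph: max over simple cycles of
    (sum of actor times on the cycle) / (tokens on the cycle).
    Brute force; suitable for toy graphs only."""
    actors = list(times)
    best = 0.0
    for k in range(1, len(actors) + 1):
        for cyc in permutations(actors, k):  # rotations revisit cycles; harmless
            hops = list(zip(cyc, cyc[1:] + cyc[:1]))
            if all(h in edges for h in hops):
                tokens = sum(edges[h] for h in hops)
                if tokens > 0:
                    best = max(best, sum(times[a] for a in cyc) / tokens)
    return best

# Invented three-stage pipeline; self-edges (1 token) serialize firings,
# back edges model FIFOs holding 2 tokens each.
times = {"src": 2.0, "eq": 6.0, "snk": 3.0}
edges = {("src", "src"): 1, ("eq", "eq"): 1, ("snk", "snk"): 1,
         ("src", "eq"): 0, ("eq", "src"): 2,
         ("eq", "snk"): 0, ("snk", "eq"): 2}
print(1.0 / max_cycle_mean(times, edges))  # guaranteed minimum throughput
# A cycle-accurate simulation typically measures a higher value, since
# the model conservatively uses worst-case execution times.
```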

6 citations

01 Jan 2007
TL;DR: This paper presents streaming consistency, a novel consistency model for the streaming domain in which tasks communicate through circular buffers; it allows more reordering than release consistency and enables an efficient software cache coherency solution and posted writes.
Abstract: Multiprocessor systems-on-chip (MPSoC) with distributed shared memory and caches are flexible when it comes to inter-processor communication but require an efficient memory consistency and cache coherency solution. In this paper we present a novel consistency model, streaming consistency, for the streaming domain in which tasks communicate through circular buffers. The model allows more reordering than release consistency and, among other optimizations, enables an efficient software cache coherency solution and posted writes. We also present a software cache coherency implementation and discuss a software circular buffer administration that does not need an atomic read-modify-write instruction. A small experiment demonstrates the potential performance increase of posted writes in MPSoCs with high communication latencies.
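
The circular-buffer administration without an atomic read-modify-write boils down to the standard single-producer/single-consumer ring: each index has exactly one writer. A minimal sketch of that idea (not the paper's actual C implementation, which additionally relies on the memory-ordering rules of streaming consistency):

```python
class SpscBuffer:
    """Single-producer/single-consumer circular buffer. `head` is only
    written by the producer and `tail` only by the consumer, so no
    atomic read-modify-write instruction is needed."""

    def __init__(self, capacity: int):
        self.buf = [None] * (capacity + 1)  # one slot stays empty
        self.head = 0  # next write position, owned by the producer
        self.tail = 0  # next read position, owned by the consumer

    def produce(self, item) -> bool:
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:        # full: only *reads* tail
            return False
        self.buf[self.head] = item  # write the data first...
        self.head = nxt             # ...then publish it by moving head
        return True

    def consume(self):
        if self.tail == self.head:  # empty: only *reads* head
            return None
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return item
```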

2 citations

Book ChapterDOI
01 Jan 2008
TL;DR: In this article, the authors describe two existing bus-based reference designs and compare the original interconnects with an AEthereal NoC. They show through these two case study implementations that the area cost of the NoC, which is dominated by the number of network connections, is competitive with traditional interconnect.
Abstract: The growing complexity of multiprocessor systems on chip makes the integration of Intellectual Property (IP) blocks into a working system a major challenge. Networks-on-Chip (NoCs) facilitate a modular design approach which addresses the hardware challenges in designing such a system. Guaranteed communication services, offered by the AEthereal NoC, address the software challenges by making the system more robust and easier to design. This paper describes two existing bus-based reference designs and compares the original interconnects with an AEthereal NoC. We show through these two case study implementations that the area cost of the NoC, which is dominated by the number of network connections, is competitive with traditional interconnects. Furthermore, we show that the latency in the NoC-based design is still acceptable for our application.

2 citations

01 Jan 2007
TL;DR: The worst-case execution time of tasks does not depend on communication bandwidth if a Communication Assist is applied, even though memory ports are shared; the paper also shows that adding a CA increases the processor utilization and reduces the required communication bandwidth.
Abstract: In an embedded multiprocessor system the minimum throughput and maximum latency of real-time applications are usually derived given the worst-case execution time of the software tasks. Derivation of the worst-case execution time becomes easier if it is independent of the available communication bandwidth. In this paper we show that the worst-case execution time of tasks does not depend on communication bandwidth if a Communication Assist (CA) is applied, even though memory ports are shared. Furthermore, we show that adding a CA increases the processor utilization and reduces the required communication bandwidth. Finally, we show that the difference between the measured and computed worst-case processor utilization is less than 6% for our MP3 playback application.
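
A toy model (invented numbers, not the paper's measurements) of why a CA decouples the worst-case execution time from the interconnect: without a CA every remote access stalls the processor for a bandwidth-dependent latency, whereas with a CA the data is staged into a local scratchpad in the background and only local latencies remain.

```python
def wcet_without_ca(compute, remote_accesses, remote_latency):
    # remote_latency grows as the available interconnect bandwidth shrinks
    return compute + remote_accesses * remote_latency

def wcet_with_ca(compute, local_accesses, local_latency=1):
    # The CA prefetches into a scratchpad; only local accesses remain,
    # so this bound no longer depends on the interconnect at all.
    return compute + local_accesses * local_latency

for lat in (4, 16, 64):  # remote latency under decreasing bandwidth
    print(lat, wcet_without_ca(1000, 100, lat), wcet_with_ca(1000, 100))
```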

1 citation


Cited by
Journal ArticleDOI
TL;DR: An exact technique is presented to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints while minimizing resource usage and energy consumption. The compute-intensive kernels of these applications are often specified as cyclo-static or synchronous dataflow graphs. Communication between nodes in these graphs requires storage space, which influences throughput. We present an exact technique to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint. The feasibility of the exact technique is demonstrated with experiments on a set of realistic DSP and multimedia applications. To increase the scalability of the approach, a fast approximation technique is developed that guarantees both the throughput and a tight bound on the maximal overestimation of buffer requirements. The approximation technique allows worst-case overestimation to be traded off against run time.
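
To make the throughput/storage trade-off concrete, the toy two-actor relation sketched earlier on this page already yields a small Pareto front: each extra token of buffering helps until the slowest actor becomes the bottleneck. A sketch with invented numbers (the paper's exact technique explores this space for full CSDF/SDF graphs):

```python
def pareto_points(t_prod, t_cons, max_d):
    """(buffer size, guaranteed throughput) points where more buffering
    actually improves the throughput bound."""
    points, best = [], 0.0
    for d in range(1, max_d + 1):
        thr = 1.0 / max(t_prod, t_cons, (t_prod + t_cons) / d)
        if thr > best:  # keep a larger buffer only if it helps
            points.append((d, thr))
            best = thr
    return points

print(pareto_points(t_prod=3.0, t_cons=5.0, max_d=6))
# [(1, 0.125), (2, 0.2)] -- beyond d = 2 the slowest actor dominates
```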

154 citations

Proceedings ArticleDOI
14 Mar 2011
TL;DR: Three general techniques to implement and model predictable and composable resources are presented, and their applicability in the context of a memory controller is demonstrated.
Abstract: Designing multi-processor systems-on-chips becomes increasingly complex, as more applications with real-time requirements execute in parallel. System resources, such as memories, are shared between applications to reduce cost, causing their timing behavior to become inter-dependent. Using conventional simulation-based verification, this requires all concurrently executing applications to be verified together, resulting in a rapidly increasing verification complexity. Predictable and composable systems have been proposed to address this problem. Predictable systems provide bounds on performance, enabling formal analysis to be used as an alternative to simulation. Composable systems isolate applications, enabling them to be verified independently. Predictable and composable systems are built from predictable and composable resources. This paper presents three general techniques to implement and model predictable and composable resources, and demonstrates their applicability in the context of a memory controller. The architecture of the memory controller is general and supports both SRAM and DDR2/DDR3 SDRAM and a wide range of arbiters, making it suitable for many predictable and composable systems. The modeling approach is based on a shared-resource abstraction that covers any combination of supported memory and arbiter and enables system-level performance analysis with a variety of well-known frameworks, such as network calculus or data-flow analysis.
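
The shared-resource abstraction mentioned above is in the spirit of a latency-rate server: an arbiter and memory pair is summarized by a guaranteed rate and a worst-case initial latency. A minimal sketch for a TDM arbiter with a contiguous slot allocation (assumed here for simplicity; all numbers are invented):

```python
def lr_params(slots: int, frame: int):
    """Latency-rate abstraction of a TDM arbiter granting `slots`
    contiguous slots per frame of `frame` slots."""
    rho = slots / frame      # guaranteed service rate
    theta = frame - slots    # worst-case wait before service starts
    return rho, theta

def completion_bound(request_size: int, slots: int, frame: int) -> float:
    """Worst-case completion time of one request on an otherwise idle
    latency-rate server: theta + size / rho."""
    rho, theta = lr_params(slots, frame)
    return theta + request_size / rho

print(completion_bound(request_size=4, slots=2, frame=8))  # 6 + 16 = 22
```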

77 citations

Book ChapterDOI
TL;DR: In this article, composability and predictability are used to reduce the complexity of verifying system-on-chip (SoC) designs against real-time requirements such as a minimum throughput or a maximum latency.
Abstract: System-on-chip (SoC) design gets increasingly complex, as a growing number of applications are integrated in modern systems. Some of these applications have real-time requirements, such as a minimum throughput or a maximum latency. To reduce cost, system resources are shared between applications, making their timing behavior inter-dependent. Real-time requirements must hence be verified for all possible combinations of concurrently executing applications, which is not feasible with commonly used simulation-based techniques. This chapter addresses this problem using two complexity-reducing concepts: composability and predictability. Applications in a composable system are completely isolated and cannot affect each other's behavior, enabling them to be verified independently. Predictable systems, on the other hand, provide lower bounds on performance, allowing applications to be verified using formal performance analysis. Five techniques to achieve composability and/or predictability in SoC resources are presented, and we explain their implementation for processors, interconnect, and memories in our platform.
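
Composability, as defined here, hinges on the arbiter never letting one application's load influence another's timing. The sketch below shows the key design choice for a TDM arbiter: an idle application's slots are left unused rather than handed to the other application (slot table and horizon are invented):

```python
SLOT_TABLE = ["A", "B", "A", "B"]  # repeats forever

def grants(app: str, n_slots: int, b_active: bool):
    """Slots in which `app` is served during the first n_slots slots."""
    out = []
    for t in range(n_slots):
        owner = SLOT_TABLE[t % len(SLOT_TABLE)]
        if owner == "B" and not b_active:
            continue  # B's slot stays empty; giving it to A would make
                      # A's timing depend on B's behavior (not composable)
        if owner == app:
            out.append(t)
    return out

# A's service moments are identical whether B is idle or busy.
assert grants("A", 12, b_active=True) == grants("A", 12, b_active=False)
```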

55 citations

Journal ArticleDOI
TL;DR: Four popular mathematical formalisms—queueing theory, network calculus, schedulability analysis, and dataflow analysis—and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip are reviewed.
Abstract: This article reviews four popular mathematical formalisms—queueing theory, network calculus, schedulability analysis, and dataflow analysis—and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip. The article discusses the basic concepts and results of each formalism and provides examples of how they have been used in Networks-on-Chip (NoCs) performance analysis. Also, the respective strengths and weaknesses of each technique and its suitability for a specific purpose are investigated. An open research issue is a unified analytical model for a comprehensive performance evaluation of NoCs. To this end, this article reviews the attempts that have been made to bridge these formalisms.
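
As a flavor of one of the reviewed formalisms, the textbook network-calculus result: a flow with token-bucket arrival curve α(t) = b + r·t crossing a rate-latency server β(t) = R·max(0, t − T) has worst-case delay T + b/R and backlog b + r·T, provided r ≤ R. The NoC-channel numbers below are hypothetical.

```python
def nc_bounds(b: float, r: float, R: float, T: float):
    """Network-calculus delay and backlog bounds for a token-bucket
    flow (burst b, rate r) on a rate-latency server (rate R, latency T)."""
    assert r <= R, "the flow would be unstable otherwise"
    return T + b / R, b + r * T  # (delay bound, backlog bound)

# Hypothetical NoC channel: burst of 8 flits at 0.2 flits/cycle, served
# at 0.5 flits/cycle after a 10-cycle scheduling latency.
print(nc_bounds(b=8, r=0.2, R=0.5, T=10))  # (26.0, 10.0)
```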

55 citations

Proceedings ArticleDOI
12 Apr 2011
TL;DR: In this article, the authors introduce a theory of timed actors whose notion of refinement is based on the principle of worst-case design that permeates the world of performance-critical systems.
Abstract: Programming embedded and cyber-physical systems requires attention not only to functional behavior and correctness, but also to non-functional aspects and specifically timing and performance. A structured, compositional, model-based approach based on stepwise refinement and abstraction techniques can support the development process, increase its quality and reduce development time through automation of synthesis, analysis or verification. Toward this, we introduce a theory of timed actors whose notion of refinement is based on the principle of worst-case design that permeates the world of performance-critical systems. This is in contrast with the classical behavioral and functional refinements based on restricting sets of behaviors. Our refinement allows time-deterministic abstractions to be made of time-non-deterministic systems, improving efficiency and reducing complexity of formal analysis. We show how our theory relates to, and can be used to reconcile existing time and performance models and their established theories.

51 citations