Author

Ajm Arno Moonen

Bio: Ajm Arno Moonen is an academic researcher from Eindhoven University of Technology. The author has contributed to research on the topics of System on a chip and Network on a chip, has an h-index of 2, and has co-authored 5 publications receiving 79 citations.

Papers
Journal ArticleDOI
TL;DR: The authors show what is required from the NoC architecture and demonstrate how to construct an NoC model, with multiple levels of detail, and propose a dataflow model that enables the verification of end-to-end temporal behaviour.
Abstract: A growing number of applications, often with real-time requirements, are integrated on the same system on chip (SoC), in the form of hardware and software intellectual property (IP). To facilitate real-time applications, networks on chip (NoC) guarantee bounds on latency and throughput. These bounds, however, only extend to the network interfaces (NI), between the IP and the NoC. To give performance guarantees on the application level, the buffers in the NIs must be sufficiently large for the particular application. At the same time, it is imperative to minimise the size of the NI buffers, as they are major contributors to the area and power consumption of the NoC. Existing buffer-sizing methods use coarse-grained application models, based on linear traffic bounds or periodic producers and consumers, thus severely limiting their applicability. In this work, the authors propose to capture the behaviour of the NoC and the applications using a dataflow model. This enables one to verify the temporal behaviour and to compute buffer sizes using existing dataflow analysis techniques. The authors show what is required from the NoC architecture and demonstrate how to construct an NoC model, with multiple levels of detail. Using the proposed model, buffer sizes are determined for a range of SoC designs with a run time comparable to existing analytical methods, and results comparable to exhaustive simulation. For an application case study, where existing buffer-sizing methods are not applicable, the proposed model enables the verification of end-to-end temporal behaviour.
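
As a concrete illustration of the dataflow approach (a toy sketch, not the paper's actual tool flow), a producer/consumer pair connected by a FIFO can be modelled as a two-actor dataflow cycle: the FIFO's free space becomes a back edge carrying d initial tokens, and the guaranteed throughput follows from the maximum cycle mean. All actor times and the rate target below are made-up values.

```python
# Toy dataflow-based buffer sizing: throughput of the two-actor cycle
# is 1 / MCM, with MCM = max(t_prod, t_cons, (t_prod + t_cons) / d).

def guaranteed_throughput(t_prod: float, t_cons: float, d: int) -> float:
    """Lower bound on throughput (firings per time unit) for buffer size d."""
    return 1.0 / max(t_prod, t_cons, (t_prod + t_cons) / d)

def min_buffer_size(t_prod: float, t_cons: float, rate: float) -> int:
    """Smallest FIFO size (in tokens) that meets throughput `rate`."""
    assert rate <= 1.0 / max(t_prod, t_cons), "constraint is infeasible"
    d = 1
    while guaranteed_throughput(t_prod, t_cons, d) < rate:
        d += 1
    return d

# Hypothetical worst-case actor times (cycles) and throughput target.
print(min_buffer_size(t_prod=3.0, t_cons=5.0, rate=1.0 / 6.0))  # -> 2
```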

68 citations

01 Jan 2007
TL;DR: This paper analyzes three causes for the difference between the computed and the measured throughput, measuring the throughput with a cycle-accurate simulation of a channel equalizer application.
Abstract: Providing real-time guarantees in complex, heterogeneous, and embedded multiprocessor systems is an important issue because they affect the perceived quality. Digital signal processing algorithms are often modeled with dataflow models. A guaranteed minimum throughput can be computed from such a dataflow model. In this paper we analyze three causes for the difference between the computed and measured throughput. We measure the throughput with a cycle-accurate simulation. For our channel equalizer application the measured throughput is 10.1% higher than the computed minimum throughput.
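
The guaranteed minimum throughput mentioned above comes from the maximum cycle mean (MCM) of the dataflow graph; the sketch below computes it by brute force for a toy graph. The graph and all numbers are invented (not the channel equalizer's), and a real analysis would use an efficient algorithm such as Karp's.

```python
from itertools import permutations

def max_cycle_mean(times, edges):
    """MCM of a small HSDF graph: max over simple cycles of
    (sum of actor times on the cycle) / (tokens on the cycle).
    Brute force; suitable for toy graphs only."""
    actors = list(times)
    best = 0.0
    for k in range(1, len(actors) + 1):
        for cyc in permutations(actors, k):  # rotations revisit cycles; harmless
            hops = list(zip(cyc, cyc[1:] + cyc[:1]))
            if all(h in edges for h in hops):
                tokens = sum(edges[h] for h in hops)
                if tokens > 0:
                    best = max(best, sum(times[a] for a in cyc) / tokens)
    return best

# Invented three-stage pipeline; self-edges (1 token) serialize firings,
# back edges model FIFOs holding 2 tokens each.
times = {"src": 2.0, "eq": 6.0, "snk": 3.0}
edges = {("src", "src"): 1, ("eq", "eq"): 1, ("snk", "snk"): 1,
         ("src", "eq"): 0, ("eq", "src"): 2,
         ("eq", "snk"): 0, ("snk", "eq"): 2}
print(1.0 / max_cycle_mean(times, edges))  # guaranteed minimum throughput
# A cycle-accurate simulation typically measures a higher value, since
# the model conservatively uses worst-case execution times.
```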

6 citations

01 Jan 2007
TL;DR: This paper presents streaming consistency, a novel consistency model for the streaming domain in which tasks communicate through circular buffers; it allows more reordering than release consistency and enables an efficient software cache coherency solution and posted writes.
Abstract: Multiprocessor systems-on-chip (MPSoC) with distributed shared memory and caches are flexible when it comes to inter-processor communication but require an efficient memory consistency and cache coherency solution. In this paper we present a novel consistency model, streaming consistency, for the streaming domain in which tasks communicate through circular buffers. The model allows more reordering than release consistency and, among other optimizations, enables an efficient software cache coherency solution and posted writes. We also present a software cache coherency implementation and discuss a software circular buffer administration that does not need an atomic read-modify-write instruction. A small experiment demonstrates the potential performance increase of posted writes in MPSoCs with high communication latencies.
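
The circular-buffer administration without an atomic read-modify-write boils down to the standard single-producer/single-consumer ring: each index has exactly one writer. A minimal sketch of that idea (not the paper's actual C implementation, which additionally relies on the memory-ordering rules of streaming consistency):

```python
class SpscBuffer:
    """Single-producer/single-consumer circular buffer. `head` is only
    written by the producer and `tail` only by the consumer, so no
    atomic read-modify-write instruction is needed."""

    def __init__(self, capacity: int):
        self.buf = [None] * (capacity + 1)  # one slot stays empty
        self.head = 0  # next write position, owned by the producer
        self.tail = 0  # next read position, owned by the consumer

    def produce(self, item) -> bool:
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:        # full: only *reads* tail
            return False
        self.buf[self.head] = item  # write the data first...
        self.head = nxt             # ...then publish it by moving head
        return True

    def consume(self):
        if self.tail == self.head:  # empty: only *reads* head
            return None
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return item
```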

2 citations

Book ChapterDOI
01 Jan 2008
TL;DR: In this article, the authors describe two existing bus-based reference designs and compare the original interconnects with an AEthereal NoC. They show through these two case study implementations that the area cost of the NoC, which is dominated by the number of network connections, is competitive with traditional interconnect.
Abstract: The growing complexity of multiprocessor systems on chip makes the integration of Intellectual Property (IP) blocks into a working system a major challenge. Networks-on-Chip (NoCs) facilitate a modular design approach which addresses the hardware challenges in designing such a system. Guaranteed communication services, offered by the AEthereal NoC, address the software challenges by making the system more robust and easier to design. This paper describes two existing bus-based reference designs and compares the original interconnects with an AEthereal NoC. We show through these two case study implementations that the area cost of the NoC, which is dominated by the number of network connections, is competitive with traditional interconnects. Furthermore, we show that the latency in the NoC-based design is still acceptable for our application.

2 citations

01 Jan 2007
TL;DR: The worst-case execution time of tasks does not depend on communication bandwidth if a Communication Assist is applied, even though memory ports are shared; the paper also shows that adding a CA increases the processor utilization and reduces the required communication bandwidth.
Abstract: In an embedded multiprocessor system the minimum throughput and maximum latency of real-time applications are usually derived given the worst-case execution time of the software tasks. Derivation of the worst-case execution time becomes easier if it is independent of the available communication bandwidth. In this paper we show that the worst-case execution time of tasks does not depend on communication bandwidth if a Communication Assist (CA) is applied, even though memory ports are shared. Furthermore, we show that adding a CA increases the processor utilization and reduces the required communication bandwidth. Finally, we show that the difference between the measured and computed worst-case processor utilization is less than 6% for our MP3 playback application.
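
A toy model (invented numbers, not the paper's measurements) of why a CA decouples the worst-case execution time from the interconnect: without a CA every remote access stalls the processor for a bandwidth-dependent latency, whereas with a CA the data is staged into a local scratchpad in the background and only local latencies remain.

```python
def wcet_without_ca(compute, remote_accesses, remote_latency):
    # remote_latency grows as the available interconnect bandwidth shrinks
    return compute + remote_accesses * remote_latency

def wcet_with_ca(compute, local_accesses, local_latency=1):
    # The CA prefetches into a scratchpad; only local accesses remain,
    # so this bound no longer depends on the interconnect at all.
    return compute + local_accesses * local_latency

for lat in (4, 16, 64):  # remote latency under decreasing bandwidth
    print(lat, wcet_without_ca(1000, 100, lat), wcet_with_ca(1000, 100))
```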

1 citation


Cited by
Journal ArticleDOI
TL;DR: An exact technique is presented to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints while minimizing resource usage and energy consumption. The compute-intensive kernels of these applications are often specified as cyclo-static or synchronous dataflow graphs. Communication between nodes in these graphs requires storage space, which influences throughput. We present an exact technique to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint. The feasibility of the exact technique is demonstrated with experiments on a set of realistic DSP and multimedia applications. To increase the scalability of the approach, a fast approximation technique is developed that guarantees both the throughput and a tight bound on the maximal overestimation of buffer requirements. The approximation technique allows worst-case overestimation to be traded off against run time.
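
To make the throughput/storage trade-off concrete, the toy two-actor relation sketched earlier on this page already yields a small Pareto front: each extra token of buffering helps until the slowest actor becomes the bottleneck. A sketch with invented numbers (the paper's exact technique explores this space for full CSDF/SDF graphs):

```python
def pareto_points(t_prod, t_cons, max_d):
    """(buffer size, guaranteed throughput) points where more buffering
    actually improves the throughput bound."""
    points, best = [], 0.0
    for d in range(1, max_d + 1):
        thr = 1.0 / max(t_prod, t_cons, (t_prod + t_cons) / d)
        if thr > best:  # keep a larger buffer only if it helps
            points.append((d, thr))
            best = thr
    return points

print(pareto_points(t_prod=3.0, t_cons=5.0, max_d=6))
# [(1, 0.125), (2, 0.2)] -- beyond d = 2 the slowest actor dominates
```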

154 citations

Proceedings ArticleDOI
14 Mar 2011
TL;DR: Three general techniques to implement and model predictable and composable resources are presented, and their applicability in the context of a memory controller is demonstrated.
Abstract: Designing multi-processor systems-on-chips becomes increasingly complex, as more applications with real-time requirements execute in parallel. System resources, such as memories, are shared between applications to reduce cost, causing their timing behavior to become inter-dependent. Using conventional simulation-based verification, this requires all concurrently executing applications to be verified together, resulting in a rapidly increasing verification complexity. Predictable and composable systems have been proposed to address this problem. Predictable systems provide bounds on performance, enabling formal analysis to be used as an alternative to simulation. Composable systems isolate applications, enabling them to be verified independently. Predictable and composable systems are built from predictable and composable resources. This paper presents three general techniques to implement and model predictable and composable resources, and demonstrates their applicability in the context of a memory controller. The architecture of the memory controller is general and supports both SRAM and DDR2/DDR3 SDRAM and a wide range of arbiters, making it suitable for many predictable and composable systems. The modeling approach is based on a shared-resource abstraction that covers any combination of supported memory and arbiter and enables system-level performance analysis with a variety of well-known frameworks, such as network calculus or data-flow analysis.
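
The shared-resource abstraction mentioned above is in the spirit of a latency-rate server: an arbiter and memory pair is summarized by a guaranteed rate and a worst-case initial latency. A minimal sketch for a TDM arbiter with a contiguous slot allocation (assumed here for simplicity; all numbers are invented):

```python
def lr_params(slots: int, frame: int):
    """Latency-rate abstraction of a TDM arbiter granting `slots`
    contiguous slots per frame of `frame` slots."""
    rho = slots / frame      # guaranteed service rate
    theta = frame - slots    # worst-case wait before service starts
    return rho, theta

def completion_bound(request_size: int, slots: int, frame: int) -> float:
    """Worst-case completion time of one request on an otherwise idle
    latency-rate server: theta + size / rho."""
    rho, theta = lr_params(slots, frame)
    return theta + request_size / rho

print(completion_bound(request_size=4, slots=2, frame=8))  # 6 + 16 = 22
```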

77 citations

Book ChapterDOI
TL;DR: In this article, composability and predictability are used to reduce the complexity of verifying system-on-chip (SoC) designs against real-time requirements such as a minimum throughput or a maximum latency.
Abstract: System-on-chip (SoC) design gets increasingly complex, as a growing number of applications are integrated in modern systems. Some of these applications have real-time requirements, such as a minimum throughput or a maximum latency. To reduce cost, system resources are shared between applications, making their timing behavior inter-dependent. Real-time requirements must hence be verified for all possible combinations of concurrently executing applications, which is not feasible with commonly used simulation-based techniques. This chapter addresses this problem using two complexity-reducing concepts: composability and predictability. Applications in a composable system are completely isolated and cannot affect each other's behavior, enabling them to be verified independently. Predictable systems, on the other hand, provide lower bounds on performance, allowing applications to be verified using formal performance analysis. Five techniques to achieve composability and/or predictability in SoC resources are presented, and we explain their implementation for processors, interconnect, and memories in our platform.
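
Composability, as defined here, hinges on the arbiter never letting one application's load influence another's timing. The sketch below shows the key design choice for a TDM arbiter: an idle application's slots are left unused rather than handed to the other application (slot table and horizon are invented):

```python
SLOT_TABLE = ["A", "B", "A", "B"]  # repeats forever

def grants(app: str, n_slots: int, b_active: bool):
    """Slots in which `app` is served during the first n_slots slots."""
    out = []
    for t in range(n_slots):
        owner = SLOT_TABLE[t % len(SLOT_TABLE)]
        if owner == "B" and not b_active:
            continue  # B's slot stays empty; giving it to A would make
                      # A's timing depend on B's behavior (not composable)
        if owner == app:
            out.append(t)
    return out

# A's service moments are identical whether B is idle or busy.
assert grants("A", 12, b_active=True) == grants("A", 12, b_active=False)
```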

55 citations

Journal ArticleDOI
TL;DR: Four popular mathematical formalisms—queueing theory, network calculus, schedulability analysis, and dataflow analysis—and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip are reviewed.
Abstract: This article reviews four popular mathematical formalisms—queueing theory, network calculus, schedulability analysis, and dataflow analysis—and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip. The article discusses the basic concepts and results of each formalism and provides examples of how they have been used in Networks-on-Chip (NoCs) performance analysis. Also, the respective strengths and weaknesses of each technique and its suitability for a specific purpose are investigated. An open research issue is a unified analytical model for a comprehensive performance evaluation of NoCs. To this end, this article reviews the attempts that have been made to bridge these formalisms.
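
As a flavor of one of the reviewed formalisms, the textbook network-calculus result: a flow with token-bucket arrival curve α(t) = b + r·t crossing a rate-latency server β(t) = R·max(0, t − T) has worst-case delay T + b/R and backlog b + r·T, provided r ≤ R. The NoC-channel numbers below are hypothetical.

```python
def nc_bounds(b: float, r: float, R: float, T: float):
    """Network-calculus delay and backlog bounds for a token-bucket
    flow (burst b, rate r) on a rate-latency server (rate R, latency T)."""
    assert r <= R, "the flow would be unstable otherwise"
    return T + b / R, b + r * T  # (delay bound, backlog bound)

# Hypothetical NoC channel: burst of 8 flits at 0.2 flits/cycle, served
# at 0.5 flits/cycle after a 10-cycle scheduling latency.
print(nc_bounds(b=8, r=0.2, R=0.5, T=10))  # (26.0, 10.0)
```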

55 citations

Proceedings ArticleDOI
12 Apr 2011
TL;DR: In this article, the authors introduce a theory of timed actors whose notion of refinement is based on the principle of worst-case design that permeates the world of performance-critical systems.
Abstract: Programming embedded and cyber-physical systems requires attention not only to functional behavior and correctness, but also to non-functional aspects and specifically timing and performance. A structured, compositional, model-based approach based on stepwise refinement and abstraction techniques can support the development process, increase its quality and reduce development time through automation of synthesis, analysis or verification. Toward this, we introduce a theory of timed actors whose notion of refinement is based on the principle of worst-case design that permeates the world of performance-critical systems. This is in contrast with the classical behavioral and functional refinements based on restricting sets of behaviors. Our refinement allows time-deterministic abstractions to be made of time-non-deterministic systems, improving efficiency and reducing complexity of formal analysis. We show how our theory relates to, and can be used to reconcile existing time and performance models and their established theories.

51 citations