
Showing papers by "Kees Goossens published in 2015"


Journal ArticleDOI

[...]

01 Oct 2015
TL;DR: Within the T-CREST project the authors propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time.
Abstract: Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors, the WCET performance is outstanding. The T-CREST platform is evaluated with two industrial use cases. An application from the avionic domain demonstrates that tasks executing on different cores do not interfere with respect to their WCET. A signal processing application from the railway domain shows that the WCET can be reduced for computation-intensive tasks when distributing the tasks on several cores and using the network-on-chip for communication. With three cores the WCET is improved by a factor of 1.8, and with 15 cores by a factor of 5.7. The T-CREST project is the result of a collaborative research and development project executed by eight partners from academia and industry. The European Commission funded T-CREST.

166 citations


Proceedings ArticleDOI
19 Oct 2015
TL;DR: Experimental results show that ETSCH improves reliability of network communication, compared to basic TSCH and a more advanced mechanism ATSCH, which provides higher packet reception ratios and reduces the maximum length of burst packet losses.
Abstract: Cross-technology interference on the license-free ISM bands has a major negative effect on the performance of Wireless Sensor Networks (WSNs). Channel hopping has been adopted in the Time-Slotted Channel Hopping (TSCH) mode of IEEE 802.15.4e to eliminate blocking of wireless links caused by external interference on some frequency channels. This paper proposes an enhanced version of the TSCH protocol (ETSCH) which restricts hopping to the subset of channels measured to be of good quality. Channel quality is extracted using a new Non-Intrusive Channel-quality Estimation (NICE) technique that performs energy detections in selected idle periods of every timeslot. NICE enables ETSCH to track dynamic interference well, without reducing network throughput, changing the protocol, or requiring non-standard hardware. ETSCH uses a small Enhanced Beacon hopping Sequence List (EBSL) to broadcast periodic Enhanced Beacons (EBs) that synchronize nodes at the start of timeslots. Experimental results show that ETSCH improves the reliability of network communication compared to basic TSCH and a more advanced mechanism, ATSCH: it provides higher packet reception ratios and reduces the maximum length of burst packet losses.

36 citations
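The core idea of restricting the hopping set to measured-good channels can be sketched as follows. This is a minimal illustration, not the paper's NICE implementation: the energy threshold, window size, and fallback behaviour are illustrative assumptions.

```python
# Sketch of ETSCH-style channel-quality filtering. Assumption: idle-period
# energy detections per channel are available; thresholds and window sizes
# below are illustrative, not the paper's values.
from collections import deque

IEEE802154_CHANNELS = range(11, 27)   # the 16 channels of the 2.4 GHz band

class ChannelQualityEstimator:
    def __init__(self, window=16, energy_threshold=-85.0):
        self.energy_threshold = energy_threshold  # dBm; illustrative
        self.samples = {ch: deque(maxlen=window) for ch in IEEE802154_CHANNELS}

    def record_energy(self, channel, dbm):
        """Store one idle-period energy detection for a channel."""
        self.samples[channel].append(dbm)

    def good_channels(self):
        """Channels whose average measured energy stays below the threshold."""
        return [ch for ch, s in self.samples.items()
                if s and sum(s) / len(s) < self.energy_threshold]

    def hopping_sequence(self, length=16):
        """Build a hopping list restricted to good channels (round-robin)."""
        good = self.good_channels() or list(IEEE802154_CHANNELS)  # fallback: all
        return [good[i % len(good)] for i in range(length)]
```

A node would periodically rebuild the hopping sequence from fresh measurements, which is how dynamic interference is tracked.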


Journal ArticleDOI
01 Oct 2015
TL;DR: A dataflow formalisation is described to independently model real-time applications executing on the CompSOC platform, including new models of the entire software stack, and correctly predicts trends, such as application speed-up when mapping an application to more processors.
Abstract: Embedded systems often contain multiple applications, some of which have real-time requirements and whose performance must be guaranteed. To efficiently execute applications, modern embedded systems contain Globally Asynchronous Locally Synchronous (GALS) processors, network on chip, DRAM and SRAM memories, and system software, e.g. microkernel and communication libraries. In this paper we describe a dataflow formalisation to independently model real-time applications executing on the CompSOC platform, including new models of the entire software stack. We compare the guaranteed application throughput as computed by our tool flow to the throughput measured on an FPGA implementation of the platform, for both synthetic and real H.263 applications. The dataflow formalisation is composable (i.e. independent for each real-time application), conservative, models the impact of GALS on performance, and correctly predicts trends, such as application speed-up when mapping an application to more processors.

28 citations
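For intuition on how a dataflow formalisation yields a guaranteed throughput: for a (homogeneous) dataflow graph, the throughput bound is the reciprocal of the maximum cycle mean (MCM) of the graph, which can be computed with Karp's algorithm. The sketch below is generic, not the CompSOC tool flow; edge weights stand in for actor and resource delays.

```python
# Maximum cycle mean (MCM) of a weighted digraph via Karp's algorithm.
# For a homogeneous dataflow (HSDF) model, guaranteed throughput = 1 / MCM.
import math

def max_cycle_mean(nodes, edges):
    """edges: list of (u, v, weight). Graph assumed strongly connected."""
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    NEG = -math.inf
    # D[k][v] = max weight of a k-edge walk from the source ending at v
    D = [[NEG] * n for _ in range(n + 1)]
    D[0][0] = 0.0                       # arbitrary source: nodes[0]
    for k in range(1, n + 1):
        for u, v, w in edges:
            cand = D[k - 1][idx[u]] + w
            if cand > D[k][idx[v]]:
                D[k][idx[v]] = cand
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        # Karp's formula: min over k of (D[n][v] - D[k][v]) / (n - k)
        mean = min((D[n][v] - D[k][v]) / (n - k)
                   for k in range(n) if D[k][v] > NEG)
        best = max(best, mean)
    return best
```

For example, two actors on a cycle with delays 2 and 4 have MCM 3 and throughput 1/3; adding a self-loop of delay 5 lowers the throughput bound to 1/5.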


Proceedings ArticleDOI
09 Mar 2015
TL;DR: A novel generic, scalable and globally arbitrated memory tree (GSMT) architecture for distributed implementation of several predictable arbitration policies and compares the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process.
Abstract: Predictable arbitration policies, such as Time Division Multiplexing (TDM) and Round-Robin (RR), are used to provide firm real-time guarantees to the multiple memory clients sharing a single memory resource (DRAM) in multi-core real-time systems. Traditional centralized implementations of predictable arbitration policies in a shared memory bus or interconnect are not scalable in terms of the number of clients. On the other hand, existing distributed memory interconnects are either globally arbitrated, which does not offer diverse service according to heterogeneous client requirements, or locally arbitrated, which suffers from larger area, power, and latency overheads. Moreover, selecting the right arbitration policy according to diverse and dynamic client requirements in reusable platforms requires a generic, re-configurable architecture supporting different arbitration policies. The main contributions of this paper are: (1) we propose a novel generic, scalable, and globally arbitrated memory tree (GSMT) architecture for distributed implementation of several predictable arbitration policies; (2) we present an RTL-level implementation of the Accounting and Priority Assignment (APA) logic of GSMT that can be configured with five different arbitration policies typically used for shared memory access in real-time systems; (3) we compare the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process. Our experiments show that with 64 clients GSMT can run up to four times faster than traditional architectures, with over 51% and 37% reductions in area and power consumption, respectively.

20 citations
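The two predictable policies named in the abstract can be sketched in a few lines: a TDM slot table gives each client its firm guarantee, and otherwise-wasted slots can be handed out round-robin. This is an illustrative behavioural model, not the GSMT hardware or its APA logic.

```python
# Behavioural sketch of predictable arbitration: TDM with a static slot
# table, with a round-robin fallback for slots whose owner is idle.
# Slot table and client names are illustrative.
class PredictableArbiter:
    def __init__(self, slot_table):
        self.slot_table = slot_table   # e.g. ["c0", "c1", "c0", "c2"]
        self.slot = 0
        self.rr_pos = 0

    def grant(self, requesting):
        """Grant one memory access; `requesting` is the set of active clients."""
        owner = self.slot_table[self.slot % len(self.slot_table)]
        self.slot += 1
        if owner in requesting:
            return owner               # TDM owner uses its own slot
        if not requesting:
            return None                # slot stays empty
        # Reassign the idle slot round-robin over the active clients
        clients = sorted(requesting)
        winner = clients[self.rr_pos % len(clients)]
        self.rr_pos += 1
        return winner
```

The TDM table bounds each client's worst-case latency (its slots recur with a fixed period), while the fallback keeps the arbiter work-conserving.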


Journal ArticleDOI
TL;DR: A configurable real-time multichannel memory controller architecture with a novel method for logical-to-physical address translation and two design-time methods to map memory clients to the memory channels, one an optimal algorithm based on an integer programming formulation of the mapping problem, and the other a fast heuristic algorithm.
Abstract: Ever-increasing demands for main memory bandwidth and the memory speed/power tradeoff led to the introduction of memories with multiple memory channels, such as Wide IO DRAM. Efficient utilization of a multichannel memory as a shared resource in multiprocessor real-time systems depends on mapping the memory clients to the memory channels according to their requirements on latency, bandwidth, communication, and memory capacity. However, there is currently no real-time memory controller for multichannel memories, and there is no methodology to optimally configure multichannel memories in real-time systems. As a first work in this direction, we present two main contributions in this article: (1) a configurable real-time multichannel memory controller architecture with a novel method for logical-to-physical address translation, and (2) two design-time methods to map memory clients to the memory channels, one an optimal algorithm based on an integer programming formulation of the mapping problem, and the other a fast heuristic algorithm. We demonstrate the real-time guarantees on bandwidth and latency provided by our multichannel memory controller architecture by experimental evaluation. Furthermore, we compare the performance of the mapping problem formulation in a solver and the heuristic algorithm against two existing mapping algorithms in terms of computation time and mapping success ratio. We show that an optimal solution can be found in 2 hours using the solver, and in less than 1 second with less than 7% mapping failure using the heuristic, for realistically sized problems. Finally, we demonstrate configuring a Wide IO DRAM in a high-definition (HD) video and graphics processing system to emphasize the practical applicability and effectiveness of this work.

19 citations
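The flavour of the client-to-channel mapping problem can be shown with a simple first-fit-decreasing heuristic under per-channel bandwidth and capacity budgets. The paper's actual heuristic and its integer-programming formulation are more elaborate; function names and figures here are illustrative.

```python
# First-fit-decreasing sketch of mapping memory clients to channels under
# bandwidth and capacity budgets. Illustrative only; the paper's heuristic
# and ILP formulation also consider latency and communication requirements.
def map_clients(clients, n_channels, bw_per_channel, cap_per_channel):
    """clients: dict name -> (bandwidth, capacity). Returns name -> channel,
    or None if the heuristic fails (an exact solver may still succeed)."""
    bw_left = [bw_per_channel] * n_channels
    cap_left = [cap_per_channel] * n_channels
    mapping = {}
    # Place the most bandwidth-hungry clients first
    for name, (bw, cap) in sorted(clients.items(), key=lambda kv: -kv[1][0]):
        for ch in range(n_channels):
            if bw <= bw_left[ch] and cap <= cap_left[ch]:
                bw_left[ch] -= bw
                cap_left[ch] -= cap
                mapping[name] = ch
                break
        else:
            return None   # no channel can host this client
    return mapping
```

This mirrors the mapping-success-ratio comparison in the abstract: a greedy heuristic is fast but can fail on instances that the exact integer-programming solver still solves.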


Proceedings ArticleDOI
26 Aug 2015
TL;DR: The proposed design flow implements the feedback loops in a data-driven fashion leading to time-varying sampling periods with short average sampling period, and outperforms traditional control design flows in terms of quality of control (QoC).
Abstract: In this work, we propose a design flow for efficient implementation of embedded feedback control systems targeted at multi-core platforms. We consider a composable tile-based architecture as an implementation platform and realise the proposed design flow on one instance of this architecture. The proposed design flow implements the feedback loops in a data-driven fashion, leading to time-varying sampling periods with a short average sampling period. Our design flow is composed of two phases: (i) representing the timing behaviour imposed by the platform by a finite and known set of sampling periods, which is achieved by exploiting the composability of the platform, and (ii) a linear matrix inequality (LMI) based platform-aware control algorithm that explicitly takes the derived platform timing characteristics and the shorter average sampling period into account. Our results show that the platform-aware implementation outperforms traditional control design flows in terms of quality of control (QoC) by almost a factor of 2.

15 citations
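The run-time side of phase (ii) can be sketched as gain scheduling over the finite set of sampling periods that phase (i) derives: a controller gain is precomputed offline per period (in the paper, via LMIs; the synthesis itself is not shown here), and the gain matching the observed period is applied each sample. The periods and gain values below are illustrative placeholders.

```python
# Sketch of platform-aware control with a finite, known set of sampling
# periods. Gains would be synthesized offline (e.g. via LMIs); the values
# below are illustrative placeholders, not synthesized results.
GAINS = {                  # sampling period (s) -> state-feedback gain K
    0.010: [2.1, 0.8],
    0.015: [1.7, 0.6],
    0.020: [1.4, 0.5],
}

def control_output(state, measured_period):
    """u = -K(h) x, with K chosen for the nearest known period h."""
    h = min(GAINS, key=lambda p: abs(p - measured_period))
    K = GAINS[h]
    return -sum(k * x for k, x in zip(K, state))
```

Because the platform is composable, the set of possible periods is known at design time, so the table is exhaustive and the lookup is constant-time.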


Journal ArticleDOI
TL;DR: This paper proposes a software architecture to dynamically create and manage partitions, together with a method for composable dynamic loading, and implements the software architecture for a SoC prototype on an FPGA board, demonstrating its composability and predictability properties.

13 citations


Proceedings ArticleDOI
19 Jul 2015
TL;DR: The FlexTiles Platform aims to create a self-adaptive heterogeneous many-core architecture which is able to dynamically manage load balancing, power consumption and faulty modules to make the architecture efficient and to keep programming effort low.
Abstract: The FlexTiles Platform has been developed within a Seventh Framework Programme project co-funded by the European Union, with ten participants from five countries. It aims to create a self-adaptive heterogeneous many-core architecture which is able to dynamically manage load balancing, power consumption, and faulty modules. Its focus is to make the architecture efficient and to keep programming effort low. Therefore, the concept contains a dedicated automated tool-flow for creating both the hardware and the software, a simulation platform that can execute the same binaries as the FPGA prototype, and a virtualization layer to manage the final heterogeneous many-core architecture for run-time adaptability. With this approach, software development productivity can be increased, and thus time-to-market and development costs can be decreased. In this paper we present the FlexTiles Development Platform with a many-core architecture demonstration. The steps to implement, validate, and integrate two use-cases are discussed.

9 citations


Proceedings ArticleDOI
26 Aug 2015
TL;DR: An FSM-SADF programming model is explored, and three different alternatives for scenario switching are proposed, showing that design choices offer interesting trade-offs between run-time cost and resource budgets.
Abstract: The FSM-SADF model of computation makes it possible to find a tight bound on the throughput of firm real-time applications by capturing dynamic variations in scenarios. We explore an FSM-SADF programming model and propose three different alternatives for scenario switching. The best candidate for our CompSOC platform was implemented, and experiments confirm that the tight throughput bound results in a reduced resource budget. This comes at the cost of a predictable overhead at run-time, as well as increased communication and memory budgets. We show that the design choices offer interesting trade-offs between run-time cost and resource budgets.

5 citations
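The scenario-switching mechanism at the heart of FSM-SADF can be sketched as a finite-state machine whose transitions select the next active scenario from detected input events. The states, events, and transition table below are illustrative, not taken from the paper's application.

```python
# Minimal sketch of FSM-based scenario switching as in FSM-SADF: a finite-
# state machine determines which scenario is active next. State and event
# names are illustrative placeholders.
class ScenarioFSM:
    def __init__(self, transitions, initial):
        self.transitions = transitions   # (state, event) -> next state
        self.state = initial

    def switch(self, event):
        """Take one transition and return the newly active scenario."""
        self.state = self.transitions[(self.state, event)]
        return self.state

# Example: a decoder that alternates between an idle and a decode scenario
fsm = ScenarioFSM(
    transitions={
        ("idle", "frame"): "decode",
        ("decode", "frame"): "decode",
        ("decode", "eos"): "idle",
    },
    initial="idle",
)
```

Each state would, in a full implementation, carry its own dataflow graph and resource budget; the three switching alternatives in the paper differ in how and when that per-scenario configuration is applied.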


Proceedings ArticleDOI
08 Oct 2015
TL;DR: A new mode-controlled data-flow (MCDF) model is proposed to capture the command scheduling dependencies of memory transactions with variable sizes and outperforms state-of-the-art analysis approaches and improves the WCBW by 22% without known transaction sequences.
Abstract: SDRAM is a shared resource in modern multi-core platforms executing multiple real-time (RT) streaming applications. It is crucial to analyze the minimum guaranteed SDRAM bandwidth to ensure that the requirements of the RT streaming applications are always satisfied. However, deriving the worst-case bandwidth (WCBW) is challenging because of the diverse memory traffic with variable transaction sizes. In fact, existing RT memory controllers either do not efficiently support variable transaction sizes or do not provide an analysis to tightly bound WCBW in their presence. We propose a new mode-controlled data-flow (MCDF) model to capture the command scheduling dependencies of memory transactions with variable sizes. The WCBW can be obtained by employing an existing tool to automatically analyze our MCDF model, rather than using existing static analysis techniques, which, in contrast to our model, are hard to extend to cover different RT memory controllers. Moreover, the MCDF analysis can exploit static information about known transaction sequences provided by the applications or by the memory arbiter. Experimental results show that a 77% improvement in WCBW can be achieved compared to the case without known transaction sequences. In addition, the results demonstrate that the proposed MCDF model outperforms state-of-the-art analysis approaches and improves the WCBW by 22% even without known transaction sequences.

4 citations


Proceedings ArticleDOI
01 Aug 2015
TL;DR: This paper describes how the CompSOC platform offers system integrators and application writers the capability to implement multiple scenarios, both as combinations of independent applications and as different states of a single application.
Abstract: Systems on Chip (SOC) are powerful multiprocessor systems capable of running multiple independent applications, often with both real-time and non-real-time requirements. Scenarios exist at two levels: first, combinations of independent applications, and second, different states of a single application. Scenarios are dynamic since applications can be started and stopped independently, and a single application's behaviour can depend on its inputs, on different stages in processing, and so on. In this paper we describe how the CompSOC platform offers system integrators and application writers the capability to implement multiple scenarios.

Journal ArticleDOI
TL;DR: It is shown on both synthetic and real applications that the proposed better-than-worst-case design approach can increase the number of good dies by up to 9.6% and 18.8% for designs with and without fixed SRAM and IO blocks, respectively.
Abstract: Scaling CMOS technology into nanometer feature-size nodes has made it practically impossible to precisely control the manufacturing process. This results in variation in the speed and power consumption of a circuit. As a solution to process-induced variations, circuits are conventionally implemented with conservative design margins to guarantee the target frequency of each hardware component in manufactured multiprocessor chips. This approach, referred to as worst-case design, results in a considerable circuit upsizing, in turn reducing the number of dies on a wafer. This work deals with the design of real-time systems for streaming applications (e.g., video decoders) constrained by a throughput requirement (e.g., frames per second) with reduced design margins, referred to as better-than-worst-case design. To this end, the first contribution of this work is a complete modeling framework that captures a streaming application mapped to an NoC-based multiprocessor system with voltage-frequency islands under process-induced die-to-die and within-die frequency variations. The framework is used to analyze the impact of variations in the frequency of hardware components on application throughput at the system level. The second contribution of this work is a methodology to use the proposed framework and estimate the impact of reducing circuit design margins on the number of good dies that satisfy the throughput requirement of a real-time streaming application. We show on both synthetic and real applications that the proposed better-than-worst-case design approach can increase the number of good dies by up to 9.6% and 18.8% for designs with and without fixed SRAM and IO blocks, respectively.
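The yield question the paper answers analytically can be illustrated with a small Monte-Carlo experiment: sample die-to-die and within-die frequency variation, and count dies whose application throughput meets the requirement. The distribution parameters and the bottleneck-core throughput model below are illustrative assumptions, not the paper's framework.

```python
# Monte-Carlo sketch of better-than-worst-case yield estimation. Assumption:
# application throughput is limited by the slowest core (pessimistic), and
# frequency variation is Gaussian; all parameters are illustrative.
import random

def good_die_fraction(n_dies=10000, n_cores=4, f_nominal=1.0,
                      sigma_d2d=0.05, sigma_wid=0.03, f_required=0.9,
                      seed=42):
    random.seed(seed)
    good = 0
    for _ in range(n_dies):
        f_die = random.gauss(f_nominal, sigma_d2d)       # die-to-die shift
        cores = [random.gauss(f_die, sigma_wid) for _ in range(n_cores)]
        if min(cores) >= f_required:   # slowest core limits throughput
            good += 1
    return good / n_dies
```

Sweeping `f_required` (i.e., the design margin) in such a model shows the trade-off the paper quantifies: smaller margins shrink circuits and raise dies-per-wafer, but cut the fraction of dies that still meet the throughput requirement.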

Proceedings ArticleDOI
04 Oct 2015
TL;DR: This work contributes its distributed multi-core run-time power-management technique for real-time dataflow applications that uses per-core lookup-tables to select low-power DVFS operating points that meet the application's timing requirement.
Abstract: It is generally desirable to reduce the power consumption of embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) is a commonly applied technique to achieve power reduction at the cost of computational performance. Multiprocessor System on Chips (MPSoCs) can have multiple voltage and frequency domains, e.g. per-core. When DVFS is applied to real-time applications, the effects must be accounted for in the associated formal timing model. In this work, we contribute our distributed multi-core run-time power-management technique for real-time dataflow applications that uses per-core lookup-tables to select low-power DVFS operating points that meet the application's timing requirement. We describe in detail how timing slack is observed locally at run-time on each core and is used to select a local DVFS operating point that meets the application's timing requirement. We further describe our static off-line formal analysis technique to generate these per-core lookup-tables that link timing slack to low-power DVFS operating points. We provide an experimental analysis of our proposed technique using an H.263 decoder application that is mapped onto an FPGA prototyped hardware platform.
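The per-core lookup-table mechanism described above can be sketched as a table that maps observed timing slack to a DVFS operating point: the more slack an actor finishes with, the lower the voltage/frequency point chosen for the next firing. The slack bins and voltage/frequency pairs below are illustrative, not the tables the paper's off-line analysis generates.

```python
# Sketch of run-time DVFS point selection from observed timing slack.
# Table entries are illustrative placeholders; in the paper they are
# generated per core by an off-line formal analysis.
DVFS_TABLE = [            # (min_slack_ms, frequency_MHz, voltage_V)
    (8.0, 100, 0.80),     # lots of slack -> slowest, lowest-power point
    (4.0, 200, 0.90),
    (1.0, 300, 1.00),
    (0.0, 400, 1.10),     # no slack -> full speed
]

def select_operating_point(slack_ms):
    """Pick the lowest-power point whose slack requirement is met."""
    for min_slack, freq, volt in DVFS_TABLE:
        if slack_ms >= min_slack:
            return freq, volt
    return DVFS_TABLE[-1][1:]   # negative slack: run at full speed
```

Because the table is derived from a conservative timing model, any point it selects still meets the application's throughput requirement; the run-time cost is a single table scan per firing.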

01 Jan 2015
TL;DR: A poster abstract presenting dynamic command scheduling for real-time memory controllers, based on the approach of Li, Akesson, and Goossens (ECRTS 2014).
Abstract: Reference: [1] Yonghui Li, Benny Akesson, and Kees Goossens. Dynamic Command Scheduling for Real-Time Memory Controllers. In Proc. ECRTS 2014. Real-time applications and multicore systems: various applications run on modern multicore systems. Some of them have real-time requirements, caused by interaction with the physical world; others are non-real-time but must be responsive. (Czech Technical University in Prague)