
Showing papers by "Kees Goossens published in 2015"


Journal ArticleDOI

[...]

01 Oct 2015
TL;DR: Within the T-CREST project the authors propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time.
Abstract: Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors, the WCET performance is outstanding. The T-CREST platform is evaluated with two industrial use cases. An application from the avionic domain demonstrates that tasks executing on different cores do not interfere with respect to their WCET. A signal processing application from the railway domain shows that the WCET can be reduced for computation-intensive tasks when distributing the tasks on several cores and using the network-on-chip for communication. With three cores the WCET is improved by a factor of 1.8, and with 15 cores by a factor of 5.7. The T-CREST project is the result of a collaborative research and development project executed by eight partners from academia and industry. The European Commission funded T-CREST.

166 citations


Proceedings ArticleDOI
19 Oct 2015
TL;DR: Experimental results show that ETSCH improves reliability of network communication, compared to basic TSCH and a more advanced mechanism ATSCH, which provides higher packet reception ratios and reduces the maximum length of burst packet losses.
Abstract: Cross-technology interference on the license-free ISM bands has a major negative effect on the performance of Wireless Sensor Networks (WSNs). Channel hopping has been adopted in the Time-Slotted Channel Hopping (TSCH) mode of IEEE 802.15.4e to eliminate blocking of wireless links caused by external interference on some frequency channels. This paper proposes an enhanced version of the TSCH protocol (ETSCH) which restricts hopping to the subset of channels measured to be of good quality. Channel quality is extracted using a new Non-Intrusive Channel-quality Estimation (NICE) technique that performs energy detections in selected idle periods of every timeslot. NICE enables ETSCH to track dynamic interference well, without reducing network throughput, changing the protocol, or requiring non-standard hardware. ETSCH uses a small Enhanced Beacon hopping Sequence List (EBSL) to broadcast periodic Enhanced Beacons (EBs) that synchronize nodes at the start of timeslots. Experimental results show that ETSCH improves the reliability of network communication compared to basic TSCH and a more advanced mechanism, ATSCH: it provides higher packet reception ratios and reduces the maximum length of burst packet losses.

36 citations
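The core idea of restricting the hopping set to measured-good channels can be sketched as follows. This is a minimal illustration, not the paper's NICE implementation: the energy threshold, window size, and fallback behaviour are illustrative assumptions.

```python
# Sketch of ETSCH-style channel-quality filtering. Assumption: idle-period
# energy detections per channel are available; thresholds and window sizes
# below are illustrative, not the paper's values.
from collections import deque

IEEE802154_CHANNELS = range(11, 27)   # the 16 channels of the 2.4 GHz band

class ChannelQualityEstimator:
    def __init__(self, window=16, energy_threshold=-85.0):
        self.energy_threshold = energy_threshold  # dBm; illustrative
        self.samples = {ch: deque(maxlen=window) for ch in IEEE802154_CHANNELS}

    def record_energy(self, channel, dbm):
        """Store one idle-period energy detection for a channel."""
        self.samples[channel].append(dbm)

    def good_channels(self):
        """Channels whose average measured energy stays below the threshold."""
        return [ch for ch, s in self.samples.items()
                if s and sum(s) / len(s) < self.energy_threshold]

    def hopping_sequence(self, length=16):
        """Build a hopping list restricted to good channels (round-robin)."""
        good = self.good_channels() or list(IEEE802154_CHANNELS)  # fallback: all
        return [good[i % len(good)] for i in range(length)]
```

A node would periodically rebuild the hopping sequence from fresh measurements, which is how dynamic interference is tracked.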


Journal ArticleDOI
01 Oct 2015
TL;DR: A dataflow formalisation is described to independently model real-time applications executing on the CompSOC platform, including new models of the entire software stack, and correctly predicts trends, such as application speed-up when mapping an application to more processors.
Abstract: Embedded systems often contain multiple applications, some of which have real-time requirements and whose performance must be guaranteed. To efficiently execute applications, modern embedded systems contain Globally Asynchronous Locally Synchronous (GALS) processors, network on chip, DRAM and SRAM memories, and system software, e.g. microkernel and communication libraries. In this paper we describe a dataflow formalisation to independently model real-time applications executing on the CompSOC platform, including new models of the entire software stack. We compare the guaranteed application throughput as computed by our tool flow to the throughput measured on an FPGA implementation of the platform, for both synthetic and real H.263 applications. The dataflow formalisation is composable (i.e. independent for each real-time application), conservative, models the impact of GALS on performance, and correctly predicts trends, such as application speed-up when mapping an application to more processors.

28 citations
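For intuition on how a dataflow formalisation yields a guaranteed throughput: for a (homogeneous) dataflow graph, the throughput bound is the reciprocal of the maximum cycle mean (MCM) of the graph, which can be computed with Karp's algorithm. The sketch below is generic, not the CompSOC tool flow; edge weights stand in for actor and resource delays.

```python
# Maximum cycle mean (MCM) of a weighted digraph via Karp's algorithm.
# For a homogeneous dataflow (HSDF) model, guaranteed throughput = 1 / MCM.
import math

def max_cycle_mean(nodes, edges):
    """edges: list of (u, v, weight). Graph assumed strongly connected."""
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    NEG = -math.inf
    # D[k][v] = max weight of a k-edge walk from the source ending at v
    D = [[NEG] * n for _ in range(n + 1)]
    D[0][0] = 0.0                       # arbitrary source: nodes[0]
    for k in range(1, n + 1):
        for u, v, w in edges:
            cand = D[k - 1][idx[u]] + w
            if cand > D[k][idx[v]]:
                D[k][idx[v]] = cand
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        # Karp's formula: min over k of (D[n][v] - D[k][v]) / (n - k)
        mean = min((D[n][v] - D[k][v]) / (n - k)
                   for k in range(n) if D[k][v] > NEG)
        best = max(best, mean)
    return best
```

For example, two actors on a cycle with delays 2 and 4 have MCM 3 and throughput 1/3; adding a self-loop of delay 5 lowers the throughput bound to 1/5.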


Proceedings ArticleDOI
09 Mar 2015
TL;DR: A novel generic, scalable and globally arbitrated memory tree (GSMT) architecture for distributed implementation of several predictable arbitration policies and compares the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process.
Abstract: Predictable arbitration policies, such as Time Division Multiplexing (TDM) and Round-Robin (RR), are used to provide firm real-time guarantees to the multiple memory clients sharing a single memory resource (DRAM) in multi-core real-time systems. Traditional centralized implementations of predictable arbitration policies in a shared memory bus or interconnect are not scalable in terms of the number of clients. On the other hand, existing distributed memory interconnects are either globally arbitrated, which does not offer diverse service according to heterogeneous client requirements, or locally arbitrated, which suffers from larger area, power, and latency overheads. Moreover, selecting the right arbitration policy according to diverse and dynamic client requirements in reusable platforms requires a generic, re-configurable architecture supporting different arbitration policies. The main contributions of this paper are: (1) we propose a novel generic, scalable, and globally arbitrated memory tree (GSMT) architecture for distributed implementation of several predictable arbitration policies; (2) we present an RTL-level implementation of the Accounting and Priority Assignment (APA) logic of GSMT that can be configured with five different arbitration policies typically used for shared memory access in real-time systems; (3) we compare the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process. Our experiments show that with 64 clients GSMT can run up to four times faster than traditional architectures, with over 51% and 37% reductions in area and power consumption, respectively.

20 citations
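The two predictable policies named in the abstract can be sketched in a few lines: a TDM slot table gives each client its firm guarantee, and otherwise-wasted slots can be handed out round-robin. This is an illustrative behavioural model, not the GSMT hardware or its APA logic.

```python
# Behavioural sketch of predictable arbitration: TDM with a static slot
# table, with a round-robin fallback for slots whose owner is idle.
# Slot table and client names are illustrative.
class PredictableArbiter:
    def __init__(self, slot_table):
        self.slot_table = slot_table   # e.g. ["c0", "c1", "c0", "c2"]
        self.slot = 0
        self.rr_pos = 0

    def grant(self, requesting):
        """Grant one memory access; `requesting` is the set of active clients."""
        owner = self.slot_table[self.slot % len(self.slot_table)]
        self.slot += 1
        if owner in requesting:
            return owner               # TDM owner uses its own slot
        if not requesting:
            return None                # slot stays empty
        # Reassign the idle slot round-robin over the active clients
        clients = sorted(requesting)
        winner = clients[self.rr_pos % len(clients)]
        self.rr_pos += 1
        return winner
```

The TDM table bounds each client's worst-case latency (its slots recur with a fixed period), while the fallback keeps the arbiter work-conserving.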


Journal ArticleDOI
TL;DR: A configurable real-time multichannel memory controller architecture with a novel method for logical-to-physical address translation and two design-time methods to map memory clients to the memory channels, one an optimal algorithm based on an integer programming formulation of the mapping problem, and the other a fast heuristic algorithm.
Abstract: Ever-increasing demands for main memory bandwidth and the memory speed/power tradeoff led to the introduction of memories with multiple memory channels, such as Wide IO DRAM. Efficient utilization of a multichannel memory as a shared resource in multiprocessor real-time systems depends on mapping the memory clients to the memory channels according to their requirements on latency, bandwidth, communication, and memory capacity. However, there is currently no real-time memory controller for multichannel memories, and there is no methodology to optimally configure multichannel memories in real-time systems. As a first work in this direction, we present two main contributions in this article: (1) a configurable real-time multichannel memory controller architecture with a novel method for logical-to-physical address translation, and (2) two design-time methods to map memory clients to the memory channels, one an optimal algorithm based on an integer programming formulation of the mapping problem, and the other a fast heuristic algorithm. We demonstrate the real-time guarantees on bandwidth and latency provided by our multichannel memory controller architecture by experimental evaluation. Furthermore, we compare the performance of the mapping problem formulation in a solver and the heuristic algorithm against two existing mapping algorithms in terms of computation time and mapping success ratio. We show that an optimal solution can be found in 2 hours using the solver, and in less than 1 second with less than 7% mapping failure using the heuristic, for realistically sized problems. Finally, we demonstrate configuring a Wide IO DRAM in a high-definition (HD) video and graphics processing system to emphasize the practical applicability and effectiveness of this work.

19 citations
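The flavour of the client-to-channel mapping problem can be shown with a simple first-fit-decreasing heuristic under per-channel bandwidth and capacity budgets. The paper's actual heuristic and its integer-programming formulation are more elaborate; function names and figures here are illustrative.

```python
# First-fit-decreasing sketch of mapping memory clients to channels under
# bandwidth and capacity budgets. Illustrative only; the paper's heuristic
# and ILP formulation also consider latency and communication requirements.
def map_clients(clients, n_channels, bw_per_channel, cap_per_channel):
    """clients: dict name -> (bandwidth, capacity). Returns name -> channel,
    or None if the heuristic fails (an exact solver may still succeed)."""
    bw_left = [bw_per_channel] * n_channels
    cap_left = [cap_per_channel] * n_channels
    mapping = {}
    # Place the most bandwidth-hungry clients first
    for name, (bw, cap) in sorted(clients.items(), key=lambda kv: -kv[1][0]):
        for ch in range(n_channels):
            if bw <= bw_left[ch] and cap <= cap_left[ch]:
                bw_left[ch] -= bw
                cap_left[ch] -= cap
                mapping[name] = ch
                break
        else:
            return None   # no channel can host this client
    return mapping
```

This mirrors the mapping-success-ratio comparison in the abstract: a greedy heuristic is fast but can fail on instances that the exact integer-programming solver still solves.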


Proceedings ArticleDOI
26 Aug 2015
TL;DR: The proposed design flow implements the feedback loops in a data-driven fashion leading to time-varying sampling periods with short average sampling period, and outperforms traditional control design flows in terms of quality of control (QoC).
Abstract: In this work, we propose a design flow for efficient implementation of embedded feedback control systems targeted at multi-core platforms. We consider a composable tile-based architecture as an implementation platform and realise the proposed design flow on one instance of this architecture. The proposed design flow implements the feedback loops in a data-driven fashion, leading to time-varying sampling periods with a short average sampling period. Our design flow is composed of two phases: (i) representing the timing behaviour imposed by the platform by a finite and known set of sampling periods, which is achieved by exploiting the composability of the platform, and (ii) a linear matrix inequality (LMI) based platform-aware control algorithm that explicitly takes the derived platform timing characteristics and the shorter average sampling period into account. Our results show that the platform-aware implementation outperforms traditional control design flows in terms of quality of control (QoC) by almost a factor of 2.

15 citations
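The run-time side of phase (ii) can be sketched as gain scheduling over the finite set of sampling periods that phase (i) derives: a controller gain is precomputed offline per period (in the paper, via LMIs; the synthesis itself is not shown here), and the gain matching the observed period is applied each sample. The periods and gain values below are illustrative placeholders.

```python
# Sketch of platform-aware control with a finite, known set of sampling
# periods. Gains would be synthesized offline (e.g. via LMIs); the values
# below are illustrative placeholders, not synthesized results.
GAINS = {                  # sampling period (s) -> state-feedback gain K
    0.010: [2.1, 0.8],
    0.015: [1.7, 0.6],
    0.020: [1.4, 0.5],
}

def control_output(state, measured_period):
    """u = -K(h) x, with K chosen for the nearest known period h."""
    h = min(GAINS, key=lambda p: abs(p - measured_period))
    K = GAINS[h]
    return -sum(k * x for k, x in zip(K, state))
```

Because the platform is composable, the set of possible periods is known at design time, so the table is exhaustive and the lookup is constant-time.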


Journal ArticleDOI
TL;DR: This paper proposes a software architecture to dynamically create and manage partitions, together with a method for composable dynamic loading, and implements the software architecture for a SoC prototype on an FPGA board, demonstrating its composability and predictability properties.

13 citations


Proceedings ArticleDOI
19 Jul 2015
TL;DR: The FlexTiles Platform aims to create a self-adaptive heterogeneous many-core architecture which is able to dynamically manage load balancing, power consumption and faulty modules to make the architecture efficient and to keep programming effort low.
Abstract: The FlexTiles Platform has been developed within a Seventh Framework Programme project co-funded by the European Union, with ten participants from five countries. It aims to create a self-adaptive heterogeneous many-core architecture which is able to dynamically manage load balancing, power consumption, and faulty modules. Its focus is to make the architecture efficient and to keep programming effort low. Therefore, the concept contains a dedicated automated tool-flow for creating both the hardware and the software, a simulation platform that can execute the same binaries as the FPGA prototype, and a virtualization layer to manage the final heterogeneous many-core architecture for run-time adaptability. With this approach, software development productivity can be increased, and thus time-to-market and development costs can be decreased. In this paper we present the FlexTiles Development Platform with a many-core architecture demonstration. The steps to implement, validate, and integrate two use-cases are discussed.

9 citations


Proceedings ArticleDOI
26 Aug 2015
TL;DR: An FSM-SADF programming model is explored, and three different alternatives for scenario switching are proposed, showing that design choices offer interesting trade-offs between run-time cost and resource budgets.
Abstract: The FSM-SADF model of computation makes it possible to find a tight bound on the throughput of firm real-time applications by capturing dynamic variations in scenarios. We explore an FSM-SADF programming model and propose three different alternatives for scenario switching. The best candidate for our CompSOC platform was implemented, and experiments confirm that the tight throughput bound results in a reduced resource budget. This comes at the cost of a predictable overhead at run-time, as well as increased communication and memory budgets. We show that the design choices offer interesting trade-offs between run-time cost and resource budgets.

5 citations
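The scenario-switching mechanism at the heart of FSM-SADF can be sketched as a finite-state machine whose transitions select the next active scenario from detected input events. The states, events, and transition table below are illustrative, not taken from the paper's application.

```python
# Minimal sketch of FSM-based scenario switching as in FSM-SADF: a finite-
# state machine determines which scenario is active next. State and event
# names are illustrative placeholders.
class ScenarioFSM:
    def __init__(self, transitions, initial):
        self.transitions = transitions   # (state, event) -> next state
        self.state = initial

    def switch(self, event):
        """Take one transition and return the newly active scenario."""
        self.state = self.transitions[(self.state, event)]
        return self.state

# Example: a decoder that alternates between an idle and a decode scenario
fsm = ScenarioFSM(
    transitions={
        ("idle", "frame"): "decode",
        ("decode", "frame"): "decode",
        ("decode", "eos"): "idle",
    },
    initial="idle",
)
```

Each state would, in a full implementation, carry its own dataflow graph and resource budget; the three switching alternatives in the paper differ in how and when that per-scenario configuration is applied.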


Proceedings ArticleDOI
08 Oct 2015
TL;DR: A new mode-controlled data-flow (MCDF) model is proposed to capture the command scheduling dependencies of memory transactions with variable sizes and outperforms state-of-the-art analysis approaches and improves the WCBW by 22% without known transaction sequences.
Abstract: SDRAM is a shared resource in modern multi-core platforms executing multiple real-time (RT) streaming applications. It is crucial to analyze the minimum guaranteed SDRAM bandwidth to ensure that the requirements of the RT streaming applications are always satisfied. However, deriving the worst-case bandwidth (WCBW) is challenging because of the diverse memory traffic with variable transaction sizes. In fact, existing RT memory controllers either do not efficiently support variable transaction sizes or do not provide an analysis to tightly bound WCBW in their presence. We propose a new mode-controlled data-flow (MCDF) model to capture the command scheduling dependencies of memory transactions with variable sizes. The WCBW can be obtained by employing an existing tool to automatically analyze our MCDF model, rather than using existing static analysis techniques, which, in contrast to our model, are hard to extend to cover different RT memory controllers. Moreover, the MCDF analysis can exploit static information about known transaction sequences provided by the applications or by the memory arbiter. Experimental results show that a 77% improvement in WCBW can be achieved compared to the case without known transaction sequences. In addition, the results demonstrate that the proposed MCDF model outperforms state-of-the-art analysis approaches and improves the WCBW by 22% even without known transaction sequences.

4 citations


Proceedings ArticleDOI
01 Aug 2015
TL;DR: This paper describes how the CompSOC platform offers system integrators and application writers the capability to implement multiple scenarios, both as combinations of independent applications and as different states of a single application.
Abstract: Systems on Chip (SOC) are powerful multiprocessor systems capable of running multiple independent applications, often with both real-time and non-real-time requirements. Scenarios exist at two levels: first, combinations of independent applications, and second, different states of a single application. Scenarios are dynamic since applications can be started and stopped independently, and a single application's behaviour can depend on its inputs, on different stages in processing, and so on. In this paper we describe how the CompSOC platform offers system integrators and application writers the capability to implement multiple scenarios.

Journal ArticleDOI
TL;DR: It is shown on both synthetic and real applications that the proposed better-than-worst-case design approach can increase the number of good dies by up to 9.6% and 18.8% for designs with and without fixed SRAM and IO blocks, respectively.
Abstract: Scaling CMOS technology into nanometer feature-size nodes has made it practically impossible to precisely control the manufacturing process. This results in variation in the speed and power consumption of a circuit. As a solution to process-induced variations, circuits are conventionally implemented with conservative design margins to guarantee the target frequency of each hardware component in manufactured multiprocessor chips. This approach, referred to as worst-case design, results in a considerable circuit upsizing, in turn reducing the number of dies on a wafer. This work deals with the design of real-time systems for streaming applications (e.g., video decoders) constrained by a throughput requirement (e.g., frames per second) with reduced design margins, referred to as better-than-worst-case design. To this end, the first contribution of this work is a complete modeling framework that captures a streaming application mapped to an NoC-based multiprocessor system with voltage-frequency islands under process-induced die-to-die and within-die frequency variations. The framework is used to analyze the impact of variations in the frequency of hardware components on application throughput at the system level. The second contribution of this work is a methodology to use the proposed framework and estimate the impact of reducing circuit design margins on the number of good dies that satisfy the throughput requirement of a real-time streaming application. We show on both synthetic and real applications that the proposed better-than-worst-case design approach can increase the number of good dies by up to 9.6% and 18.8% for designs with and without fixed SRAM and IO blocks, respectively.
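The yield question the paper answers analytically can be illustrated with a small Monte-Carlo experiment: sample die-to-die and within-die frequency variation, and count dies whose application throughput meets the requirement. The distribution parameters and the bottleneck-core throughput model below are illustrative assumptions, not the paper's framework.

```python
# Monte-Carlo sketch of better-than-worst-case yield estimation. Assumption:
# application throughput is limited by the slowest core (pessimistic), and
# frequency variation is Gaussian; all parameters are illustrative.
import random

def good_die_fraction(n_dies=10000, n_cores=4, f_nominal=1.0,
                      sigma_d2d=0.05, sigma_wid=0.03, f_required=0.9,
                      seed=42):
    random.seed(seed)
    good = 0
    for _ in range(n_dies):
        f_die = random.gauss(f_nominal, sigma_d2d)       # die-to-die shift
        cores = [random.gauss(f_die, sigma_wid) for _ in range(n_cores)]
        if min(cores) >= f_required:   # slowest core limits throughput
            good += 1
    return good / n_dies
```

Sweeping `f_required` (i.e., the design margin) in such a model shows the trade-off the paper quantifies: smaller margins shrink circuits and raise dies-per-wafer, but cut the fraction of dies that still meet the throughput requirement.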

Proceedings ArticleDOI
04 Oct 2015
TL;DR: This work contributes its distributed multi-core run-time power-management technique for real-time dataflow applications that uses per-core lookup-tables to select low-power DVFS operating points that meet the application's timing requirement.
Abstract: It is generally desirable to reduce the power consumption of embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) is a commonly applied technique to achieve power reduction at the cost of computational performance. Multiprocessor System on Chips (MPSoCs) can have multiple voltage and frequency domains, e.g. per-core. When DVFS is applied to real-time applications, the effects must be accounted for in the associated formal timing model. In this work, we contribute our distributed multi-core run-time power-management technique for real-time dataflow applications that uses per-core lookup-tables to select low-power DVFS operating points that meet the application's timing requirement. We describe in detail how timing slack is observed locally at run-time on each core and is used to select a local DVFS operating point that meets the application's timing requirement. We further describe our static off-line formal analysis technique to generate these per-core lookup-tables that link timing slack to low-power DVFS operating points. We provide an experimental analysis of our proposed technique using an H.263 decoder application that is mapped onto an FPGA prototyped hardware platform.
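The per-core lookup-table mechanism described above can be sketched as a table that maps observed timing slack to a DVFS operating point: the more slack an actor finishes with, the lower the voltage/frequency point chosen for the next firing. The slack bins and voltage/frequency pairs below are illustrative, not the tables the paper's off-line analysis generates.

```python
# Sketch of run-time DVFS point selection from observed timing slack.
# Table entries are illustrative placeholders; in the paper they are
# generated per core by an off-line formal analysis.
DVFS_TABLE = [            # (min_slack_ms, frequency_MHz, voltage_V)
    (8.0, 100, 0.80),     # lots of slack -> slowest, lowest-power point
    (4.0, 200, 0.90),
    (1.0, 300, 1.00),
    (0.0, 400, 1.10),     # no slack -> full speed
]

def select_operating_point(slack_ms):
    """Pick the lowest-power point whose slack requirement is met."""
    for min_slack, freq, volt in DVFS_TABLE:
        if slack_ms >= min_slack:
            return freq, volt
    return DVFS_TABLE[-1][1:]   # negative slack: run at full speed
```

Because the table is derived from a conservative timing model, any point it selects still meets the application's throughput requirement; the run-time cost is a single table scan per firing.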

01 Jan 2015
TL;DR: A poster abstract presenting dynamic command scheduling for real-time memory controllers, based on the approach of Li, Akesson, and Goossens (ECRTS 2014).
Abstract: Reference: [1] Yonghui Li, Benny Akesson, and Kees Goossens. Dynamic Command Scheduling for Real-Time Memory Controllers. In Proc. ECRTS 2014. Real-time applications and multicore systems: various applications run on modern multicore systems. Some of them have real-time requirements, caused by interaction with the physical world; others are non-real-time but must be responsive. (Czech Technical University in Prague)