scispace - formally typeset
Search or ask a question
Author

Kgw Kees Goossens

Bio: Kgw Kees Goossens is an academic researcher from Delft University of Technology. The author has contributed to research in topics: Network on a chip & Deadlock prevention algorithms. The author has an hindex of 4, co-authored 5 publications receiving 283 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: This paper analyzes message-dependent deadlock, arising due to protocol interactions between the NoC and the IP modules, and compares the possible solutions and shows that deadlock avoidance, in the presence of higher-level protocols, poses a serious challenge for many current NoC architectures.
Abstract: Networks on chip (NoCs) are an essential component of systems on chip (SoCs) and much research is devoted to deadlock avoidance in NoCs. Prior work focuses on the router network while protocol interactions between NoC and intellectual property (IP) modules are not considered. These interactions introduce message dependencies that affect deadlock properties of the SoC as a whole. Even when NoC and IP dependency graphs are cycle-free in isolation, put together they may still create cycles. Traditionally, SoCs rely solely on request-response protocols. However, emerging SoCs adopt higher-level protocols for cache coherency, slave locking, and peer-to-peer streaming, thereby increasing the complexity in the interaction between the NoC and the IPs. In this paper, we analyze message-dependent deadlock, arising due to protocol interactions between the NoC and the IP modules. We compare the possible solutions and show that deadlock avoidance, in the presence of higher-level protocols, poses a serious challenge for many current NoC architectures. We evaluate the solutions qualitatively, and for a number of designs we quantify the area cost for the two most economical solutions, strict ordering and end-to-end flow control. We show that the latter, which avoids deadlock for all protocols, adds an area and power cost of 4% and 6%, respectively, of a typical hereal NoC instance.

101 citations

Journal ArticleDOI
TL;DR: This paper presents a unified single-objective algorithm, called Unified MApping, Routing, and Slot allocation (UMARS+), which shows how to couple path selection, mapping of cores, and channel time-slot allocation to minimize the network required to meet the constraints of the application.
Abstract: One of the key steps in Network-on-Chip-based design is spatial mapping of cores and routing of the communication between those cores. Known solutions to the mapping and routing problems first map cores onto a topology and then route communication, using separate and possibly conflicting objective functions. In this paper, we present a unified single-objective algorithm, called Unified MApping, Routing, and Slot allocation (UMARS+). As the main contribution, we show how to couple path selection, mapping of cores, and channel time-slot allocation to minimize the network required to meet the constraints of the application. The time-complexity of UMARS+ is low and experimental results indicate a run-time only 20% higher than that of path selection alone. We apply the algorithm to an MPEG decoder System-on-Chip, reducing area by 33%, power dissipation by 35%, and worst-case latency by a factor four over a traditional waterfall approach.

88 citations

Journal ArticleDOI
TL;DR: The authors show what is required from the NoC architecture and demonstrate how to construct an NoC model, with multiple levels of detail, and propose a dataflow model that enables the verification of end-to-end temporal behaviour.
Abstract: A growing number of applications, often with real-time requirements, are integrated on the same system on chip (SoC), in the form of hardware and software intellectual property (IP). To facilitate real-time applications, networks on chip (NoC) guarantee bounds on latency and throughput. These bounds, however, only extend to the network interfaces (NI), between the IP and the NoC. To give performance guarantees on the application level, the buffers in the NIs must be sufficiently large for the particular application. At the same time, it is imperative to minimise the size of the NI buffers, as they are major contributors to the area and power consumption of the NoC. Existing buffer-sizing methods use coarse-grained application models, based on linear traffic bounds or periodic producers and consumers, thus severely limiting their applicability. In this work, the authors propose to capture the behaviour of the NoC and the applications using a dataflow model. This enables one to verify the temporal behaviour and to compute buffer sizes using existing dataflow analysis techniques. The authors show what is required from the NoC architecture and demonstrate how to construct an NoC model, with multiple levels of detail. Using the proposed model, buffer sizes are determined for a range of SoC designs with a run time comparable to existing analytical methods, and results comparable to exhaustive simulation. For an application case study, where existing buffer-sizing methods are not applicable, the proposed model enables the verification of end-to-end temporal behaviour.

68 citations

01 Feb 2001
TL;DR: In this paper, a case study of a rake receiver in combination with a turbo decoder has been done to find out the requirements of a model that makes a trade-off between Quality of Service (QoS) and costs at run time.
Abstract: Mobile handheld devices need to have low energy consumption and should be able to adapt to their environment. To achieve this, a model is needed that makes a trade-off between Quality of Service (QoS) and costs at run-time. A case study of a rake receiver in combination with a turbo decoder has been done to find out the requirements of such a model. Our conclusion is that it is not feasible to express the whole behavior of the system in an analytical way due to a complex external unpredictable time-variant environment. Instead a control system is proposed.

3 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.
Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

733 citations

Proceedings ArticleDOI
30 Sep 2007
TL;DR: In this article, the authors present a memory controller design that provides a guaranteed minimum bandwidth and a maximum latency bound to the IPs, which is accomplished using a novel two-step approach to predictable SDRAM sharing.
Abstract: Memory requirements of intellectual property components (IP) in contemporary multi-processor systems-on-chip are increasing. Large high-speed external memories, such as DDR2 SDRAMs, are shared between a multitude of IPs to satisfy these requirements at a low cost per bit. However, SDRAMs have highly variable access times that depend on previous requests. This makes it difficult to accurately and analytically determine latencies and the useful bandwidth at design time, and hence to guarantee that hard real-time requirements are met. The main contribution of this paper is a memory controller design that provides a guaranteed minimum bandwidth and a maximum latency bound to the IPs. This is accomplished using a novel two-step approach to predictable SDRAM sharing. First, we define memory access groups, corresponding to precomputed sequences of SDRAM commands, with known efficiency and latency. Second, a predictable arbiter is used to schedule these groups dynamically at run-time, such that an allocated bandwidth and a maximum latency bound is guaranteed to the IPs. The approach is general and covers all generations of SDRAM. We present a modular implementation of our memory controller that is efficiently integrated into the network interface of a network-on-chip. The area of the implementation is cheap, and scales linearly with the number of IPs. An instance with six ports runs at 200 MHz and requires 0.042 mm2 in 0.13μm CMOS technology.

239 citations

Proceedings ArticleDOI
12 Feb 2011
TL;DR: CHIPPER (Cheap-Interconnect Partially Permuting Router), a simplified router microarchitecture that eliminates in-router buffers and the crossbar is proposed, introducing three key insights: that deflection routing port allocation maps naturally to a permutation network within the router; that livelock freedom requires only an implicit token-passing scheme, eliminating expensive age-based priorities.
Abstract: As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect becomes a significant factor in cost, energy consumption and performance. Recent work has explored many design tradeoffs for networks-on-chip (NoCs) with novel router architectures to reduce hardware cost. In particular, recent work proposes bufferless deflection routing to eliminate router buffers. The high cost of buffers makes this choice potentially appealing, especially for low-to-medium network loads. However, current bufferless designs usually add complexity to control logic. Deflection routing introduces a sequential dependence in port allocation, yielding a slow critical path. Explicit mechanisms are required for livelock freedom due to the non-minimal nature of deflection. Finally, deflection routing can fragment packets, and the reassembly buffers require large worst-case sizing to avoid deadlock, due to the lack of network backpressure. The complexity that arises out of these three problems has discouraged practical adoption of bufferless routing. To counter this, we propose CHIPPER (Cheap-Interconnect Partially Permuting Router), a simplified router microarchitecture that eliminates in-router buffers and the crossbar. We introduce three key insights: first, that deflection routing port allocation maps naturally to a permutation network within the router; second, that livelock freedom requires only an implicit token-passing scheme, eliminating expensive age-based priorities; and finally, that flow control can provide correctness in the absence of network backpressure, avoiding deadlock and allowing cache miss buffers (MSHRs) to be used as reassembly buffers. Using multiprogrammed SPEC CPU2006, server, and desktop application workloads and SPLASH-2 multithreaded workloads, we achieve an average 54.9% network power reduction for 13.6% average performance degradation (multipro-grammed) and 73.4% power reduction for 1.9% slowdown (multithreaded), with minimal degradation and large power savings at low-to-medium load. Finally, we show 36.2% router area reduction relative to buffered routing, with comparable timing.

217 citations

Proceedings ArticleDOI
04 Apr 2005
TL;DR: This paper proposes a new energy-efficient reconfigurable circuit-switched network-on-chip architecture that consumes 3.5 times less energy compared to its packet- Switched equivalent.
Abstract: Network-on-chip (NoC) is an energy-efficient on-chip communication architecture for multi-tile system-on-chip (SoC) architectures. The SoC architecture, including its run-time software, can replace inflexible ASICs for future ambient systems. These ambient systems have to be flexible as well as energy-efficient. To find an energy-efficient solution for the communication network we analyze three wireless applications. Based on their communication requirements we observe that revisiting of the circuit switching techniques is beneficial. In this paper we propose a new energy-efficient reconfigurable circuit-switched network-on-chip. By physically separating the concurrent data streams we reduce the overall energy consumption. The circuit-switched router has been synthesized and analyzed for its power consumption in 0.13 /spl mu/m technology. A 5-port circuit-switched router has an area of 0.05 mm/sup 2/ and runs at 1075 MHz. The proposed architecture consumes 3.5 times less energy compared to its packet-switched equivalent.

204 citations

Proceedings ArticleDOI
06 Mar 2006
TL;DR: This work presents dynamic re-configuration mechanisms that match the NoC configuration to the communication characteristics of each use-case, also accounting for use-cases that can run in parallel.
Abstract: A communication-centric design approach, networks on chips (NoCs), has emerged as the design paradigm for designing a scalable communication infrastructure for future systems on chips (SoCs). As technology advances, the number of applications or use-cases integrated on a single chip increases rapidly. The different use-cases of the SoC have different communication requirements (such as different bandwidth, latency constraints) and traffic patterns. The underlying NoC architecture has to satisfy the constraints of all the use-cases. In this work, we present a methodology to map multiple use-cases onto the NoC architecture, satisfying the constraints of each use-case. We present dynamic re-configuration mechanisms that match the NoC configuration to the communication characteristics of each use-case, also accounting for use-cases that can run in parallel. The methodology is applied to several real and synthetic SoC benchmarks, which result in a large reduction in NoC area (an average of 80%) and power consumption (an average of 54%) compared to traditional design approaches

178 citations