scispace - formally typeset
Search or ask a question
Author

Martti Forsell

Bio: Martti Forsell is an academic researcher from University of Eastern Finland. The author has contributed to research in topics: Shared memory & Thread (computing). The author has an hindex of 8, co-authored 49 publications receiving 1523 citations. Previous affiliations of Martti Forsell include VTT Technical Research Centre of Finland.

Papers
More filters
Proceedings ArticleDOI
07 Aug 2002
TL;DR: A packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources which is the onchip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack.
Abstract: We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is a m/spl times/n mesh of switches and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources providing physical- and architectural-level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block, which fits into the available slot and complies with the interface of the NOC. The NOC architecture essentially is the onchip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special purpose computation resources such as high performance multi-processors. The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.

1,304 citations

Proceedings ArticleDOI
04 Jan 2003
TL;DR: A novel layered backbone-platform-system (BPS) design methodology for development of network-on-chip based products that combines and extends the distributed, parallel, embedded and platform-based design concepts in order to manage the diversity and complexity of NOC-based systems.
Abstract: Exploitation of silicon capacity will require improvements in design productivity and more scalable system paradigms. Asynchronous message passing networks on chip (NOC) have been proposed as backbones for billion-transistor ASICs. We present a novel layered backbone-platform-system (BPS) design methodology for development of network-on-chip based products. It combines and extends the distributed, parallel, embedded and platform-based design concepts in order to manage the diversity and complexity of NOC-based systems. The reuse of communication principles in various platforms, the reuse of platforms in product differentiation, and system-level decision-support methods are the cornerstones of our methodology. The presented mappability estimation and workload simulations demonstrate the feasibility of such methods.

22 citations

Journal ArticleDOI
TL;DR: This paper considers the idea of true multiport memory that can be used as building block of efficient PRAM-style shared main memory, and shows that at least small size multiport memories look physically feasible and the power of PRAM model can be fully exploited by computer systems with multiport Memories.
Abstract: Parallel Random Access Machine (PRAM) is a popular model for parallel computation that promises easy programmability and great parallel performance, but only if efficient shared main memories can be built. This won't be easy, because the complexity of shared memories leads to difficult technical problems. In this paper we consider the idea of true multiport memory that can be used as building block of efficient PRAM-style shared main memory. Two possible structures of multiport memory chips are presented. We will also give preliminary cost-effectivity and performance analysis of memory systems using proposed multiport RAMs. Results are encouraging: At least small size multiport memories look physically feasible. Also the power of PRAM model can be fully exploited by computer systems with multiport memories.

21 citations

Journal ArticleDOI
TL;DR: The performance of eight general purpose processor architectures representing widely both commercial and scientific processor designs in both single processor and multiprocessor setups is evaluated and it is concluded that there exists no single optimal architecture for general purpose computation.

17 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The research shows that NoC constitutes a unification of current trends of intrachip communication rather than an explicit new alternative.
Abstract: The scaling of microchip technologies has enabled large scale systems-on-chip (SoC). Network-on-chip (NoC) research addresses global communication in SoC, involving (i) a move from computation-centric to communication-centric design and (ii) the implementation of scalable communication structures. This survey presents a perspective on existing NoC research. We define the following abstractions: system, network adapter, network, and link to explain and structure the fundamental concepts. First, research relating to the actual network design is reviewed. Then system level design and modeling are discussed. We also evaluate performance analysis techniques. The research shows that NoC constitutes a unification of current trends of intrachip communication rather than an explicit new alternative.

1,720 citations

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources which is the onchip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack.
Abstract: We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is a m/spl times/n mesh of switches and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources providing physical- and architectural-level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block, which fits into the available slot and complies with the interface of the NOC. The NOC architecture essentially is the onchip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special purpose computation resources such as high performance multi-processors. The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.

1,304 citations

Journal ArticleDOI
TL;DR: This paper develops a consistent and meaningful evaluation methodology to compare the performance and characteristics of a variety of NoC architectures and explores design trade-offs that characterize the NoC approach and obtains comparative results for a number of common NoC topologies.
Abstract: Multiprocessor system-on-chip (MP-SoC) platforms are emerging as an important trend for SoC design. Power and wire design constraints are forcing the adoption of new design methodologies for system-on-chip (SoC), namely, those that incorporate modularity and explicit parallelism. To enable these MP-SoC platforms, researchers have recently pursued scaleable communication-centric interconnect fabrics, such as networks-on-chip (NoC), which possess many features that are particularly attractive for these. These communication-centric interconnect fabrics are characterized by different trade-offs with regard to latency, throughput, energy dissipation, and silicon area requirements. In this paper, we develop a consistent and meaningful evaluation methodology to compare the performance and characteristics of a variety of NoC architectures. We also explore design trade-offs that characterize the NoC approach and obtain comparative results for a number of common NoC topologies. To the best of our knowledge, this is the first effort in characterizing different NoC architectures with respect to their performance and design trade-offs. To further illustrate our evaluation methodology, we map a typical multiprocessing platform to different NoC interconnect architectures and show how the system performance is affected by these design trade-offs.

921 citations

Proceedings ArticleDOI
16 Feb 2004
TL;DR: NMAP is presented, a fast algorithm that maps the cores onto a mesh NoC architecture under bandwidth constraints, minimizing the average communication delay, and the NMAP algorithm is presented for both single minimum-path routing and split-traffic routing.
Abstract: We address the design of complex monolithic systems, where processing cores generate and consume a varying and large amount of data, thus bringing the communication links to the edge of congestion. Typical applications are in the area of multi-media processing. We consider a mesh-based networks on chip (NoC) architecture, and we explore the assignment of cores to mesh cross-points so that the traffic on links satisfies bandwidth constraints. A single-path deterministic routing between the cores places high bandwidth demands on the links. The bandwidth requirements can be significantly reduced by splitting the traffic between the cores across multiple paths. In this paper, we present NMAP, a fast algorithm that maps the cores onto a mesh NoC architecture under bandwidth constraints, minimizing the average communication delay. The NMAP algorithm is presented for both single minimum-path routing and split-traffic routing. The algorithm is applied to a benchmark DSP design and the resulting NoC is built and simulated at cycle accurate level in SystemC using macros from the /spl times/pipes library. Also, experiments with six video processing applications show significant savings in bandwidth and communication cost for NMAP algorithm when compared to existing algorithms.

714 citations

Journal ArticleDOI
TL;DR: An algorithm which automatically maps a given set of intellectual property onto a generic regular network-on-chip (NoC) architecture and constructs a deadlock-free deterministic routing function such that the total communication energy is minimized.
Abstract: In this paper, we present an algorithm which automatically maps a given set of intellectual property onto a generic regular network-on-chip (NoC) architecture and constructs a deadlock-free deterministic routing function such that the total communication energy is minimized. At the same time, the performance of the resulting communication system is guaranteed to satisfy the specified design constraints through bandwidth reservation. As the main theoretical contribution, we first formulate the problem of energy- and performance-aware mapping in a topological sense, and show how the routing flexibility can be exploited to expand the solution space and improve the solution quality. An efficient branch-and-bound algorithm is then proposed to solve this problem. Experimental results show that the proposed algorithm is very fast, and significant communication energy savings can be achieved. For instance, for a complex video/audio application, 51.7% communication energy savings have been observed, on average, compared to an ad hoc implementation.

662 citations