
NoC synthesis flow for customized domain specific multiprocessor systems-on-chip

Davide Bertozzi, Antoine Jalabert, Srinivasan Murali, Student Member, IEEE,
Rutuparna Tamhankar, Student Member, IEEE, Stergios Stergiou, Student Member, IEEE,
Luca Benini, Member, IEEE, and Giovanni De Micheli, Fellow, IEEE
Abstract—The growing complexity of customizable single-chip multiprocessors is requiring communication resources that can only be
provided by a highly-scalable communication infrastructure. This trend is exemplified by the growing number of Network-on-Chip
(NoC) architectures that have been proposed recently for System-on-Chip (SoC) integration. Developing NoC-based systems tailored
to a particular application domain is crucial for achieving high-performance, energy-efficient customized solutions. The effectiveness of
this approach largely depends on the availability of an ad hoc design methodology that, starting from a high-level application
specification, derives an optimized NoC configuration with respect to different design objectives and instantiates the selected
application specific on-chip micronetwork. Automatic execution of these design steps is highly desirable to increase SoC design
productivity. This paper illustrates a complete synthesis flow, called NetChip, for customized NoC architectures, that partitions the
development work into major steps (topology mapping, selection, and generation) and provides proper tools for their automatic
execution (SUNMAP, xpipesCompiler). The entire flow leverages the flexibility of a fully reusable and scalable network components
library called xpipes, consisting of highly-parameterizable network building blocks (network interface, switches, switch-to-switch
links) that are design-time tunable and composable to achieve arbitrary topologies and customized domain-specific NoC architectures.
Several experimental case studies are presented in the paper, showing the powerful design space exploration capabilities of the
proposed methodology and tools.
Index Terms—Systems-on-chip, networks on chip, synthesis, mapping, architecture.
1 INTRODUCTION
In contrast to past projections, today the introduction of
new technology solutions is increasingly application
driven. As an example, let us consider ambient intelligence,
which is regarded as the new paradigm for consumer
electronics. Systems designed for ambient intelligence will
be based on high-speed digital signal processing, with
computational loads ranging from 10 MOPS for lightweight
audio processing, 3 GOPS for video processing, 20 GOPS for
multilingual conversation interfaces, and up to 1 TOPS for
synthetic video generation [4]. This computational chal-
lenge will have to be addressed at manageable power levels
and affordable costs, and a single processor will not suffice,
thus driving the development of increasingly more complex
Multi-Processor Systems-on-Chip (MPSoCs).
SoCs represent high-complexity, high-value semicon-
ductor products that incorporate building blocks from
multiple sources (either in-house made or externally
supplied), such as general-purpose fully programmable
processors, coprocessors, DSPs, dedicated hardware accel-
erators, memory blocks, I/O blocks, etc. Even though
commercial products currently exhibit only a few integrated
cores (e.g., NEC’s new TCP/IP offload engine is powered
by 10 Tensilica Xtensa Processor Cores [42]), in the next few
years technology will support the integration of thousands
of cores, making a large computational power available.
Full exploitation of the increased level of SoC integration
requires new paradigms and significant improvements of
design productivity, as current system architectures and
design styles do not scale up to such dimensions and
complexities. A relevant example concerns the system
architecture, whose paradigm is progressively shifting from
computation-centric to communication-centric. In fact,
MPSoC performance will be increasingly determined by
the ability of the communication infrastructure to efficiently
accommodate the communication needs of the integrated
computation resources. Traditional state-of-the-art shared
busses cannot meet the scalability requirements of complex
MPSoCs due to the serialization of bus access requests, and
turn out to be also energy-inefficient due to the broadcast
communication paradigm.
A scalable communication architecture that supports the
trend of SoC integration consists of an on-chip packet-
switched micronetwork of interconnects, generally known as
Network-on-Chip (NoC) [2], [21], [34]. The scalable and
modular nature of NoCs and their support for efficient on-
chip communication potentially leads to NoC-based multi-
processor systems characterized by high structural complex-
ity and functional diversity. It is observed in [14] that NoC-
based systems are economically feasible if they can be used in
several product variants, and if the design can be reused in
different application areas. On the other hand, successful
products must provide good performance characteristics,
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 16, NO. 2, FEBRUARY 2005 113
. D. Bertozzi and L. Benini are with DEIS, University of Bologna, Viale
Risorgimento 2, 40136, Bologna, Italy.
E-mail: {dbertozzi, lbenini}@deis.unibo.it.
. A. Jalabert is with CEA-LETI, France. E-mail: antoine.jalabert@cea.fr.
. S. Murali, R. Tamhankar, S. Stergiou, and G. De Micheli are with the
Department of Electrical Engineering, Gates Computer Science Building,
Room 330, 353 Serra Mall, Stanford University, Stanford, CA 94305.
E-mail: {smurali, rutu, utopcell, nanni}@stanford.edu.
Manuscript received 30 Jan. 2004; revised 29 June 2004; accepted 21 July
2004; published online 20 Dec. 2004.
For information on obtaining reprints of this article, please send e-mail to:
tpds@computer.org, and reference IEEECS Log Number TPDSSI-0035-0104.
1045-9219/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society

thus requiring dedicated solutions that are tailored to specific
needs. As a consequence, the challenge lies in the capability to
design hardware-optimized, customizable computation plat-
forms for each application domain [9].
Hardware optimization can be achieved by facilitating
the integration of domain-specific computation resources in
a plug-and-play design style. Standard interface sockets
such as Virtual Component Interface (VCI) [43] and Open Core
Protocol (OCP) [44] have been developed for this purpose
and support the use of a common NoC as the basis for
system integration. A relevant task of these interfaces is to
make the NoC adaptive to the different features of the
integrated cores (e.g., data and address bus width).
NoC architectures are pushing the evolution of traditional
IC design methodologies in order to more effectively deal
with functional diversity and complexity. At the application
level, the key design challenge is to expose task-level
parallelism and to formally capture concurrent communica-
tion in models of computation [14]. Then, high-level con-
current tasks have to be mapped to the underlying
communication and computation resources. At this level,
an abstract model of the hardware architecture is usually
exposed to the mapping tool, so that area and power
estimates can be given in the early design stage, and different
objective functions (e.g., minimization of communication
energy) can be considered to evaluate the feasibility of
alternative mappings. For NoC-based MPSoCs, a critical step
in communication mapping is the network topology selection
for its significant impact on overall system performance,
which is increasingly communication-dominated.
Although considerable research effort is being devoted to
improving individual design activities, there are very few
complete NoC design methodologies and CAD tools.
Setting up a fully automated synthesis framework for NoCs
is a nontrivial task, particularly for the case of application
specific MPSoCs, where a set of heterogeneous computing
and storage resources have to be interconnected to each
other by means of a custom-tailored communication net-
work. This translates into the need to provide design time
instantiation of different network schemes and topologies,
tailored to the specific application domain.
A library-based approach to NoC design could be an
effective solution [12], [24], wherein predesigned soft
macros are composed at instantiation time to build arbitrary
topologies. However, the full exploitation of a customizable
network topology requires an ad hoc design methodology
spanning different levels of abstractions (from application
specification to physical implementation) and deriving the
most efficient NoC configuration for a given application
domain.
The design methodology has to partition the design
problem into manageable tasks and to define the tools and
practices for those tasks. In this paper, we propose a NoC
synthesis flow, called NetChip, for designing domain-
specific NoCs and automating most of the complex and
time-intensive design steps. Significantly, NetChip pro-
vides design support also for regular network topologies
and, therefore, lends itself to the implementation of both
homogeneous and heterogeneous system interconnects.
NetChip assumes that the application has already been
mapped onto cores by using preexisting tools (such as [15])
and the resulting cores together with their communication
requirements represent the inputs to our NoC synthesis flow.
The tool-assisted design and generation of a customized
NoC-based communication architecture is the ultimate goal
of NetChip, and is achieved by means of three major design
activities: topology mapping, topology selection, and topology
generation. NetChip leverages two tools: SUNMAP, which
performs the network topology mapping and selection functions,
and xpipesCompiler, which performs the topology
generation function.
SUNMAP produces a mapping of cores onto various NoC
topologies that are defined in a topology library. The
mappings are optimized for the chosen design objective
(such as minimizing area, power or hop delay) and satisfy
the design constraints (such as area or bandwidth con-
straints). SUNMAP uses floorplanning information early in
the mapping process to determine the area-power estimates
of a mapping and to produce feasible mappings (satisfying
the design constraints). The tool supports various routing
functions (dimension ordered, minimum-path, traffic split-
ting across minimum-paths, traffic splitting across all paths)
and chooses the mapping onto the best topology from the
library of available ones.
A design file describing the chosen topology is input to
the xpipesCompiler, which automatically generates the
SystemC description of the network components (switches,
links, and network interfaces) and their interconnection
with the cores. A custom hand-mapped topology specifica-
tion can also be accepted by the NoC synthesizer, and the
network components with the selected configuration can be
generated accordingly. The resulting SystemC code for the
whole design can be simulated at the cycle-accurate and
signal-accurate level. The xpipesCompiler uses the
xpipes library, which consists of highly parameterizable
network building blocks that can be tuned and composed at
design time to generate the chosen topology. Thus,
NetChip automates NoC mapping, selection, and genera-
tion functions of a design, thereby bridging an important
design gap in building NoCs.
The rest of the paper is organized as follows: In the next
section, we review previous work in this area. In
Section 3, we present the design methodology of NetChip.
In Sections 4 and 5, we present the SUNMAP tool and the
area-power models used in the tool. In Section 6, we present
the architecture of the network components defined in the
xpipes library. We present the xpipesCompiler in
Section 7. The NetChip design flow is used to model
several video and network applications. The communication
patterns in these applications differ, thereby requiring
various topologies for different applications. These are
presented in Sections 8.1 and 8.2. The rich design space
exploration capabilities of NetChip are shown in Section 8.3.
The design flow can also be used to model custom hand-
mapped topologies and is explained in Section 8.4. In
Section 8.5, we model a DSP Filter application and generate
the SystemC files of the chosen topology. The resulting
design is simulated at the cycle-accurate level and the
simulations are checked for functional and timing correct-
ness, validating the output of our tools.
2 PREVIOUS WORK
The most advanced state-of-the-art SoC communication
architectures represent evolutionary solutions with respect
to shared busses. Sonics MicroNetwork [36] is a TDMA-based
bus which can easily adapt to the data-word width, burst
attributes, interrupt schemes, and other critical parameters of
the integrated cores, while providing very high bandwidth
utilization. The STBUS interconnect is a high-performance

communication infrastructure that supports the instantiation of
shared busses as well as more advanced topologies such as
partial or full crossbars. Although evolutionary from a
topology viewpoint, these solutions can rely on advanced
and highly automated design methodologies for the imple-
mentation of generic communication subsystems, allowing
designers to rapidly assemble, synthesize, and verify their
SoCs using the MicroNetwork or the STBUS interconnect as
integration platforms.
However, the early works in [2], [34] pointed out the
need for more scalable architectures for on-chip commu-
nication and, therefore, to progressively replace shared
busses with on-chip networks. Many NoC architectures
have therefore been proposed in the open literature so far,
but in most cases, the design methodologies and tools are
still in the early stage.
One of the earliest contributions in this area is the Maia
heterogeneous signal processing architecture, proposed by
Zhang et al., based on a hierarchical mesh network [10].
Unfortunately, Maia’s interconnect is fully instance-specific.
Furthermore, routing is static at configuration time and
communication is based on circuit switching, as opposed to
packet switching. In this direction, Dally and Lacy sketch
the architecture of a VLSI multicomputer using 2009
technology [35]. A chip with 64 processor-memory tiles is
envisioned. Communication is based on packet switching.
This seminal work draws upon past experiences in
designing parallel computers and reconfigurable architec-
tures (FPGAs and their evolutions) [30], [31], [32].
Most proposed NoC platforms are packet switched and
exhibit regular structure. An example is a mesh interconnection,
which benefits from a simple layout and from switches whose design
is independent of the network size. The NOSTRUM network
described in [5] takes this approach: The platform
includes both a mesh topology and the relative design
methodology, wherein a concrete architecture is derived
from a general NoC template, then application mapping
follows.
The Scalable Programmable Integrated Network (SPIN)
described in [3] is another regular, fat-tree-based network
architecture. It adopts cut-through switching to minimize
message latency and storage requirements in the design of
network switches. The Linkoeping SoCBUS [39] is a two-
dimensional mesh network which uses a packet connected
circuit (PCC) to set up routes through the network: A packet is
switched through the network locking the circuit as it goes.
This notion of virtual circuit leads to deterministic commu-
nication behavior but restricts routing flexibility for the rest of
the communication traffic.
In [8], the use of an octagon communication topology for
network processors is presented. Instead, the implementa-
tion of a star-connected on-chip network supporting plesio-
chronous communication among system components is
described in [13].
The Aethereal NoC design framework presented in [7] aims
at providing a complete infrastructure for developing
heterogeneous NoCs with end-to-end quality of service
guarantees. The network supports guaranteed throughput
(GT) for real-time applications and best effort (BE) traffic for
timing unconstrained applications. Support for heteroge-
neous architectures requires highly configurable network
building blocks, customizable at instantiation time for a
specific application domain. For instance, the Proteo NoC [12]
consists of a small library of predefined, parameterized
components that allow the implementation of a large range of
different topologies, protocols, and configurations. The xpipes
interconnect [24] and its synthesizer xpipesCompiler [25]
push this approach to the limit, by instantiating an applica-
tion specific NoC from a library of composable soft macros
(network interface, link, and switch). The components are
highly parameterizable and provide reliable and latency
insensitive operation. They represent the core of the NoC
synthesis flow illustrated in this paper.
In [11], a hierarchical approach for designing on-chip
networks was presented to help designers compare different
design options. Design methodologies for building irregular
networks have been proposed in [18], [20]. Pinto et al. [18]
present a heuristic for the constraint-driven communication
synthesis of on-chip communication networks, while [20]
describes a design methodology for finding minimal topol-
ogies that support low contention or contention-free com-
munication for known communication patterns. In [19],
memory optimization in single-chip network fabrics is
explored.
The problem of mapping cores onto NoC architectures is
addressed in [22], [23], [26], [27]. In [22], a branch-and-
bound algorithm is used to map cores onto a mesh-based
architecture with the objective of minimizing energy and
satisfying the bandwidth constraints of the NoC. A simple
dimension-ordered routing is assumed in the work. In [23],
the authors extend the above work for other deadlock free
minimal path routing algorithms. In [26], fast algorithms for
mesh NoC architectures under different routing functions
(minimum path, split-traffic) and delay/bandwidth con-
straints are presented.
The design methodology and tools presented in this
paper aim at providing MPSoC designers with a framework
for the rapid selection and synthesis of application-specific
NoC architectures. While still allowing the comparison and
generation of regular network topologies, our NoC design
framework supports the synthesis of customized irregular
topologies, and bridges a gap in a largely unexplored
research area.
3 DESIGN FLOW OF NETCHIP
The design flow of NetChip is presented in Fig. 1a. The
application is mapped onto cores during the hardware/
software codesign phase using existing tools such as [15].
By means of static analysis or simulation, it is possible to
determine the average rate of data transfer between the
cores. The resulting cores and the communication demands
between them are represented by a graph, called the core graph,
which is the input to our tool. NetChip has three phases of
operation: topology mapping phase, topology selection phase,
and topology generation phase. NetChip in turn has two tools
built into it: SUNMAP, which performs the topology mapping
and selection phases, and the xpipesCompiler, which
generates the selected topology.
In the topology mapping phase, NetChip takes as inputs:
• the core graph with communication among cores
  annotated as edge weights,
• the design objective function that needs to be
  optimized, and
• the design constraints that are to be satisfied by the
  mapping.
NetChip has a Graphical User Interface (GUI) designed in
TCL/TK for entering the inputs. A snapshot of the GUI is
BERTOZZI ET AL.: NOC SYNTHESIS FLOW FOR CUSTOMIZED DOMAIN SPECIFIC MULTIPROCESSOR SYSTEMS-ON-CHIP 115

presented in Fig. 1b. The input core graph is then mapped
onto various standard topologies (mesh, torus, hypercube,
Clos, and butterfly) defined in the topology library. The
approach presented here is general and other topologies
(such as the star network or the octagon network [13], [8]) can
be easily added to the library. NetChip explores various
design objectives such as minimizing average hop delay,
area, and power dissipation. The tool also supports
different routing functions: dimension-ordered, minimum path,
traffic splitting across minimum paths, and traffic splitting
across all paths. For each mapping, the bandwidth and area
constraints are evaluated, so that only feasible mappings are
chosen. The area-power models and floorplanner are built
into NetChip, so that area-power estimates can be
incorporated early in the mapping process. For a chosen
design objective and routing function, the best feasible
mappings onto various topologies are obtained.
In the topology selection phase, the various topologies (with
mappings produced from the mapping phase) are evaluated
for several design objectives and the best topology for the
application is chosen. The design file describing the selected
topology and routing files describing the routes (or paths) to
be taken (which depends on the chosen routing function) are
automatically generated. The SUNMAP tool which incorpo-
rates these two phases is explained in Section 4.
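The selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: each topology's mapping phase is assumed to return a feasibility flag and a single scalar cost for one chosen objective, whereas the text says several objectives may be evaluated. The function name `select_topology` and all numbers are hypothetical and are not taken from SUNMAP.

```python
def select_topology(results):
    """Pick the feasible topology with the lowest cost.

    `results` maps a topology name to (feasible, cost), where `cost` is the
    value of the chosen design objective for the best mapping found.
    Returns None when no topology yields a feasible mapping.
    """
    feasible = {name: cost for name, (ok, cost) in results.items() if ok}
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Made-up numbers, for illustration only.
results = {
    "mesh": (True, 12.5),
    "torus": (True, 11.0),
    "butterfly": (False, 9.0),   # cheapest, but violates a constraint
}
best = select_topology(results)
```

In this sketch an infeasible topology is never selected, even when its cost is the lowest, which mirrors the statement that only feasible mappings are chosen.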
In the topology generation phase, NetChip reads the
design and routing files and generates a SystemC description
of network components for the selected topology using
xpipesCompiler. The xpipesCompiler instantiates a
network of building blocks from the xpipes library, which
consists of composable soft macros (switches, network
interfaces, and links) described in SystemC at the cycle-
accurate level. The network components generated are
optimized for that particular network and support reliable,
latency-insensitive operation. The architecture of the
xpipes network components is presented in Section 6. In
Section 7, the xpipesCompiler is presented.
NetChip can also accept a custom hand-mapped
topology and generate the network components for the
topology. In such a case, the first two phases are skipped, as
shown in Fig. 1a. The resulting network generated by the
xpipesCompiler is highly optimized for that particular
topology. The area, power, and latency savings of the
custom mappings can also be compared with mappings
onto standard topologies using the NetChip tool.
4 TOPOLOGY MAPPING AND SELECTION
We formulate the mapping problem mathematically as
follows. The communication between the cores of the SoC is
represented by the core graph:
Definition 1. The core graph is a directed graph $G(V, E)$, where $V = \{v_i, i = 1, 2, \ldots, N1\}$, $N1 = |V|$, with each $v_i$ representing a core and the directed edge $(v_i, v_j)$, denoted as $e_{i,j} \in E$, representing the communication between the cores $v_i$ and $v_j$. The weight of the edge $e_{i,j}$, denoted by $comm_{i,j}$, represents the bandwidth of the communication from $v_i$ to $v_j$.
The connectivity and link bandwidth of the NoC is
represented by the NoC topology graph:
Definition 2. The NoC topology graph is a directed graph $P(U, F)$, where $U = \{u_i, i = 1, 2, \ldots, N2\}$, $N2 = |U|$, with each vertex $u_i \in U$ representing a node in the topology and the directed edge $(u_i, u_j)$, denoted as $f_{i,j} \in F$, representing a direct communication between the vertices $u_i$ and $u_j$. The weight of the edge $f_{i,j}$, denoted by $bw_{i,j}$, represents the bandwidth available across the edge $f_{i,j}$.
The mapping of the core graph $G(V, E)$ onto the NoC topology graph $P(U, F)$ is defined by the one-to-one mapping function $map$:
$$map : V \rightarrow U, \quad \text{s.t. } map(v_i) = u_j, \ \forall v_i \in V, \ \exists u_j \in U. \qquad (1)$$
The mapping is defined when $|V| \leq |U|$.
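To make the two definitions and the mapping function concrete, the following minimal sketch stores each graph as a dictionary from directed edge to weight and checks that a candidate mapping is one-to-one. The representation, the helper names (`vertices`, `is_valid_mapping`), the core `core_x`, and all weights other than the 70 MB/s edge from vld to rld are assumptions made for illustration; they are not part of NetChip.

```python
def vertices(edges):
    """Collect the vertex set of a graph given as {(src, dst): weight}."""
    return {v for edge in edges for v in edge}


def is_valid_mapping(core_graph, topology_graph, mapping):
    """Return True if `mapping` places every core on a distinct topology node."""
    cores = vertices(core_graph)
    nodes = vertices(topology_graph)
    if len(cores) > len(nodes):        # the mapping needs |V| <= |U|
        return False
    if set(mapping) != cores:          # every core must be mapped
        return False
    images = list(mapping.values())
    # One-to-one: no node hosts two cores, and every image is a real node.
    return len(set(images)) == len(images) and set(images) <= nodes


# Hypothetical 2x2 mesh with nodes 0..3 and bidirectional links.
topology_graph = {}
for a, b in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    topology_graph[(a, b)] = 500.0     # bw_{i,j}, made-up capacity
    topology_graph[(b, a)] = 500.0

# comm_{i,j} in MB/s; only the vld -> rld value comes from the text.
core_graph = {("vld", "rld"): 70.0, ("rld", "core_x"): 40.0}

good = {"vld": 0, "rld": 1, "core_x": 3}
bad = {"vld": 0, "rld": 0, "core_x": 3}   # two cores on node 0
```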
As an example, the core graph of a Video Object Plane
Decoder (VOPD) (Fig. 2a) is shown in Fig. 2b. Example topology
graphs of mesh, torus, hypercube, Clos, and butterfly are
Fig. 1. Design flow of NetChip and the input GUI. (a) Design flow of NetChip and (b) snapshot of the input GUI.

shown in Fig. 4. An example mapping of the VOPD core
graph onto mesh and torus topology graphs is shown in
Figs. 2c and 2d.
The communication between each pair of cores (i.e., each edge $e_{i,j} \in E$) is treated as a flow of a single commodity, represented as $d_k$, $k = 1, 2, \ldots, |E|$. The value of $d_k$ represents the bandwidth of communication across the edge and is denoted by $vl(d_k)$. The set of all commodities is represented by $D$ and is defined as:
$$D = \left\{ d_k : vl(d_k) = comm_{i,j}, \ k = 1, 2, \ldots, |E|, \ \forall i, j : e_{i,j} \in E, \ \text{with } source(d_k) = map(v_i), \ dest(d_k) = map(v_j) \right\}. \qquad (2)$$
As an example, in Fig. 2b, the communication between vld and rld is represented by a single commodity, $d_1$, with value $vl(d_1)$ equal to 70, $source(d_1)$ representing vld and $dest(d_1)$ representing rld.
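The construction of the commodity set can be sketched directly from a core graph and a mapping. This is a minimal illustration, assuming the core graph is stored as a dictionary from directed edge to bandwidth; the `Commodity` class, the function `build_commodities`, the core `core_x`, and the node numbers are hypothetical. Only the 70 MB/s value for the vld to rld edge is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commodity:
    k: int          # index of the commodity, 1..|E|
    value: float    # vl(d_k) = comm_{i,j}
    source: int     # map(v_i)
    dest: int       # map(v_j)


def build_commodities(core_graph, mapping):
    """Create one commodity per core-graph edge, following Eq. (2)."""
    commodities = []
    for k, ((vi, vj), comm) in enumerate(core_graph.items(), start=1):
        commodities.append(Commodity(k, comm, mapping[vi], mapping[vj]))
    return commodities


# Hypothetical mapping onto node identifiers 0..3.
core_graph = {("vld", "rld"): 70.0, ("rld", "core_x"): 40.0}
mapping = {"vld": 0, "rld": 1, "core_x": 3}
D = build_commodities(core_graph, mapping)
d1 = D[0]
```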
4.1 General Minimum-Path Mapping Algorithm
In this section, we present the general mapping algorithm,
and in the next sections, we show how the algorithm is
adapted for each topology. NetChip supports different
routing functions: dimension-ordered, minimum-path, traffic
splitting across minimum-paths, and traffic splitting across all
paths. In dimension-ordered and minimum-path routing,
the communication between every pair of cores takes place
through a single path. Splitting the traffic across multiple
paths reduces the bandwidth requirements of network
links. In traffic splitting across minimum paths, the
communication between cores is spread only across the
minimum paths between them. Clearly, this is a special case
of all-path traffic splitting. The advantage of this scheme
is that its bandwidth requirements lie between those of
minimum-path routing and all-path
traffic splitting. Also, traffic streams across different paths
have the same hop delay, thereby reducing the jitter
associated with traffic splitting.
As the graph mapping problem is a special case of the
quadratic assignment problem, which is intractable [22],
[29], we use a heuristic approach with three phases:
1. An initial mapping is obtained using a greedy
algorithm.
2. For minimum-path routing, the minimum-paths and
mapping costs are computed. When the routing
function is traffic splitting, the paths are obtained by
solving a system of Multi-Commodity Flow (MCF)
equations [26].
3. The solution is iteratively improved by invoking the
second phase for every mapping produced by pair-
wise swapping of vertices.
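The third phase can be illustrated with a small local search. This is a sketch under assumptions, not the procedure of Fig. 3: the cost of a mapping is supplied as a callable that stands in for the second phase (and may return infinity for an infeasible mapping), a swap is accepted as soon as it lowers the cost, and swapping a core with an empty node is allowed. The names `improve_by_swapping` and `line_cost` are hypothetical.

```python
from itertools import combinations


def improve_by_swapping(mapping, nodes, cost):
    """Repeat pairwise swaps of node contents until no swap lowers the cost.

    mapping: dict core -> node, nodes: iterable of all topology nodes,
    cost: callable taking a mapping and returning a number.
    """
    nodes = list(nodes)
    best = dict(mapping)
    best_cost = cost(best)
    improved = True
    while improved:
        improved = False
        node_to_core = {n: c for c, n in best.items()}
        for a, b in combinations(nodes, 2):
            core_a, core_b = node_to_core.get(a), node_to_core.get(b)
            if core_a is None and core_b is None:
                continue                    # both nodes empty, nothing to swap
            candidate = dict(best)
            if core_a is not None:
                candidate[core_a] = b
            if core_b is not None:
                candidate[core_b] = a
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:  # accept the first improving swap
                best, best_cost, improved = candidate, candidate_cost, True
                break
    return best, best_cost


# Toy example: four nodes on a line, two cores exchanging 10 units of traffic.
def line_cost(m):
    return 10 * abs(m["a"] - m["b"])


final, final_cost = improve_by_swapping({"a": 0, "b": 3}, range(4), line_cost)
```

Because the cost strictly decreases with every accepted swap and the number of mappings is finite, the loop terminates, although it may stop in a local minimum.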
The minimum-path mapping algorithm is presented in
Figs. 3a and 3b. In this paper, we present only mapping
algorithms for minimum-path routing and we refer the
interested reader to [26] for description of the mapping
algorithms for other routing functions. In the initial
mapping procedure, first the core that has maximum
communication demand is placed onto one of the mesh
nodes with maximum number of neighbors. Then, the core
that communicates the most with placed cores is chosen.
This core is placed onto the NoC node that minimizes the
cost function and this procedure is repeated until all the
cores are placed.
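The initial mapping procedure can be sketched for a mesh as follows. The text does not specify the cost function at this point, so the sketch assumes a bandwidth-weighted hop count to the cores already placed; the mesh coordinates, helper names, and traffic values are likewise hypothetical.

```python
from itertools import product


def num_neighbors(node, rows, cols):
    """Number of mesh neighbors of `node` given as (row, col)."""
    r, c = node
    return sum(0 <= r + dr < rows and 0 <= c + dc < cols
               for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))


def hops(a, b):
    """Minimum hop count between two mesh nodes (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def traffic(core_graph, a, b):
    """Total bandwidth exchanged between cores a and b in both directions."""
    return core_graph.get((a, b), 0.0) + core_graph.get((b, a), 0.0)


def greedy_initial_mapping(core_graph, rows, cols):
    cores = sorted({v for edge in core_graph for v in edge})
    free = list(product(range(rows), range(cols)))
    if len(cores) > len(free):
        raise ValueError("more cores than mesh nodes")
    # Step 1: the core with the largest total demand goes to a node
    # with the maximum number of neighbors.
    first = max(cores, key=lambda c: sum(traffic(core_graph, c, o) for o in cores))
    seed = max(free, key=lambda n: num_neighbors(n, rows, cols))
    mapping = {first: seed}
    free.remove(seed)
    unplaced = [c for c in cores if c != first]
    while unplaced:
        # Step 2: pick the core that communicates most with placed cores.
        nxt = max(unplaced,
                  key=lambda c: sum(traffic(core_graph, c, p) for p in mapping))
        # Step 3: place it on the free node with the lowest assumed cost.
        best = min(free,
                   key=lambda n: sum(traffic(core_graph, nxt, p) * hops(n, mapping[p])
                                     for p in mapping))
        mapping[nxt] = best
        free.remove(best)
        unplaced.remove(nxt)
    return mapping


# Made-up core graph on a 3x3 mesh.
core_graph = {("a", "b"): 100.0, ("a", "c"): 50.0, ("c", "d"): 10.0}
mapping = greedy_initial_mapping(core_graph, 3, 3)
```

With these numbers the most demanding core `a` lands on the center node, and its two communication partners end up one hop away from it.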
Once an initial mapping is obtained, in the second phase
(steps 2 to 8 in Fig. 3), the commodities are sorted in
decreasing order of their values. Then, for each commodity
in order, a quadrant graph between the source and destina-
tion of the commodity is formed, as the shortest path
between the source and destination lies within the quadrant
between them. The shaded regions in Figs. 2c and 2d are
examples of quadrant graphs for the communication
between the cores smem and iquant. The procedure for
forming quadrant graphs is presented in Section 4.3. Then,
Dijkstra’s shortest path algorithm is applied (step 5) to the
quadrant graph and the minimum path is obtained. The
edge weights are incremented suitably and the procedure is
repeated for each commodity in order. After routing all
commodities, if the bandwidth and area constraints are
satisfied, the cost of communication is calculated. Band-
width constraints are satisfied, if in the resulting mapping,
the traffic across any link is smaller than or equal to the
capacity of the link.¹ The area constraints are satisfied when
the mapped design area is lower than the maximum
allowed area and aspect ratios of the design and soft core
blocks (blocks that have flexible sizes) are within permis-
sible ranges. For the area-power estimates, area-power
models of the switches and floorplanner are incorporated
into NetChip as explained in Section 5. The mapping
algorithm can target different objectives, such as
minimizing average hop delay, area, or power dissipation;
the objective is an input parameter to NetChip. Depending on the
objective function, the cost function calculation (done as
part of step 8) varies.
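The routing part of the second phase can be sketched for a mesh as shown here. Several details are assumptions rather than statements from the paper: the quadrant graph is modeled as the bounding box of source and destination with links directed toward the destination (so every path in it has minimum hop count), the Dijkstra edge weight is the traffic already routed over a link, and a single capacity value applies to all links. Area constraints and the cost calculation are omitted.

```python
import heapq


def quadrant_edges(src, dst):
    """Directed links inside the src/dst bounding box that move toward dst."""
    (r0, c0), (r1, c1) = src, dst
    dr = 1 if r1 >= r0 else -1
    dc = 1 if c1 >= c0 else -1
    adj = {}
    for r in range(r0, r1 + dr, dr):
        for c in range(c0, c1 + dc, dc):
            out = []
            if r != r1:
                out.append((r + dr, c))
            if c != c1:
                out.append((r, c + dc))
            adj[(r, c)] = out
    return adj


def least_loaded_min_path(src, dst, load):
    """Dijkstra on the quadrant graph, using current link load as weight."""
    adj = quadrant_edges(src, dst)
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue                      # stale heap entry
        for v in adj[u]:
            nd = d + load.get((u, v), 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_commodities(commodities, capacity):
    """Route (value, src, dst) commodities in decreasing order of value."""
    load = {}
    for value, src, dst in sorted(commodities, key=lambda c: c[0], reverse=True):
        path = least_loaded_min_path(src, dst, load)
        for u, v in zip(path, path[1:]):
            load[(u, v)] = load.get((u, v), 0.0) + value   # update edge weight
    feasible = all(l <= capacity for l in load.values())
    return load, feasible


# Two made-up commodities between the same corner nodes of a 2x2 mesh.
demo = [(70.0, (0, 0), (1, 1)), (60.0, (0, 0), (1, 1))]
load, feasible = route_commodities(demo, capacity=100.0)
```

In this example the second commodity avoids the links already carrying 70 units, so no link exceeds the assumed capacity of 100; routing both over the same path would have required 130.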
1. The capacity of a link in an NoC is technology and implementation
dependent and is assumed as an input to NetChip.
Fig. 2. VOPD block diagram and core graph, with communication BW annotated (in MB/s) and its mapping onto mesh and torus topologies. (a) VOPD
block diagram, (b) VOPD graph, (c) mesh mapping, and (d) torus mapping.

Citations
More filters
Journal ArticleDOI

Design-time application mapping and platform exploration for MP-SoC customised run-time management

TL;DR: A Pareto-based approach is proposed combining a design-time application and platform exploration with a low-complexity run-time manager to avoid conservative worst-case assumptions and eliminate large run- time overheads on the state-of-the-art RTOS kernels.
Journal ArticleDOI

Reliable network-on-chip design for multi-core system-on-chip

TL;DR: This paper presents a simple coding scheme for reducing power dissipation, crosstalk noise, and crosStalk delay on the bus while simultaneously detecting errors at runtime, using a simple bus-invert encoding technique.
Proceedings ArticleDOI

Case Study : NoC based Next-generation WLAN receiver design in Transaction Level

TL;DR: Next-generation WLAN receiver based on network-on-chip platform is designed and performance enhancement from exchanging protocol is 30% in 200 MHz switching frequency and 128 bit link width.
Proceedings ArticleDOI

Populating and exploring the design space of wavelength-routed optical network-on-chip topologies by leveraging the add-drop filtering primitive

TL;DR: For the first time, the design space of wavelength-routed topologies is populated in a potentially exhaustive way through a systematic methodology; it becomes evident that, for a specified quality metric, there exist better solutions than the topologies found so far by designers' intuition.
Proceedings ArticleDOI

Modeling and simulation of mobile gateways interacting with wireless sensor networks

TL;DR: This work presents the modeling and simulation of a network scenario, core of a telecom provider's future portfolio, in which an ARM-based mobile handset is used as the gateway between a wireless sensor network (WSN) and remote users through a wide area network (WAN).
References
Book

Computers and Intractability: A Guide to the Theory of NP-Completeness

TL;DR: The second edition of a quarterly column as discussed by the authors provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., San Francisco, 1979.
Journal ArticleDOI

Networks on chips: a new SoC paradigm

TL;DR: Focusing on using probabilistic metrics such as average values or variance to quantify design objectives such as performance and power will lead to a major change in SoC design methodologies.
Proceedings ArticleDOI

Route packets, not wires: on-chip interconnection networks

TL;DR: This paper introduces the concept of on-chip networks, sketches a simple network, and discusses some challenges in the architecture and design of these networks.
Journal ArticleDOI

Reconfigurable computing: a survey of systems and software

TL;DR: The hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling are explored, and the software that targets these machines is focused on.
Frequently Asked Questions (17)
Q1. What are the contributions in "NoC synthesis flow for customized domain specific multiprocessor systems-on-chip"?

The growing complexity of customizable single-chip multiprocessors is requiring communication resources that can only be provided by a highly-scalable communication infrastructure. This trend is exemplified by the growing number of Network-on-Chip (NoC) architectures that have been proposed recently for System-on-Chip (SoC) integration. The effectiveness of this approach largely depends on the availability of an ad hoc design methodology that, starting from a high-level application specification, derives an optimized NoC configuration with respect to different design objectives and instantiates the selected application specific on-chip micronetwork. This paper illustrates a complete synthesis flow, called NetChip, for customized NoC architectures, that partitions the development work into major steps (topology mapping, selection, and generation) and provides proper tools for their automatic execution (SUNMAP, xpipesCompiler). Several experimental case studies are presented in the paper, showing the powerful design space exploration capabilities of the proposed methodology and tools.

A library of highly parameterized, design-time composable network building blocks (xpipes) is at the core of the proposed design methodology.

NetChip in turn has two tools built into it: SUNMAP, which performs the topology mapping and selection phases, and the xpipesCompiler, which generates the selected topology.

The smaller number of switches and smaller switch sizes also account for the large area savings achieved by the butterfly network. 

The authors validated the need for Clos networks by producing mappings onto various topologies by relaxing the bandwidth constraints and simulating the resulting SystemC design. 

In order to automate tracing of signals, a debugging mode has been implemented, that enables monitoring of any signal in the design. 

The early works in [2], [34] pointed out the need for more scalable architectures for on-chip communication and, therefore, to progressively replace shared busses with on-chip networks.

A regular topology, for example, such as a 16 × 16 mesh, can be generated faster than an irregular, application-specific topology with only few cores and switches.

The most advanced state-of-the-art SoC communication architectures represent evolutionary solutions with respect to shared busses.

The unused part of the datastream is stored in a regpark register, so that a new datastream can be read from the HEADER_BUILDER block. 

Module OUT_BUFFER stores flits to be sent across the network, and allows the NIS to keep preparing successive flits also when the network is congested.

The high-level description consists of the definition of the cores, network interfaces, switches, links, and their interconnections. 

The average link length in the butterfly network (obtained from the floorplanner) was observed to be longer than the link lengths (around 1.5×) of direct networks.

For a torus network, the wraparound channels need to be considered for computing the smallest bounding box between the source and destination nodes (Fig. 2d). 
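The effect of the wraparound channels on the bounding box can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than the paper's routine: nodes are taken to be (column, row) coordinates on a `cols` × `rows` grid, and the function names are hypothetical. Along each axis, a torus can reach the destination either directly or through the wraparound channel, so the shorter of the two spans is used.

```python
def axis_span(a, b, size, wraparound):
    """Hops between coordinates a and b along one axis of length `size`."""
    direct = abs(a - b)
    # On a torus the wraparound channel offers a second path of
    # length size - direct; take whichever is shorter.
    return min(direct, size - direct) if wraparound else direct


def bounding_box(src, dst, cols, rows, torus):
    """Return (width, height), in hops, of the smallest box spanning src and dst."""
    (sx, sy), (dx, dy) = src, dst
    return (axis_span(sx, dx, cols, torus),
            axis_span(sy, dy, rows, torus))


# Hypothetical 4 x 4 example: opposite edges of the same row pair.
print(bounding_box((0, 0), (3, 1), cols=4, rows=4, torus=False))
print(bounding_box((0, 0), (3, 1), cols=4, rows=4, torus=True))
```

On the mesh the box is three hops wide, while on the torus the wraparound channel shrinks it to one hop.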

Setting up a fully automated synthesis framework for NoCs is a nontrivial task, particularly for the case of application specific MPSoCs, where a set of heterogeneous computing and storage resources have to be interconnected to each other by means of a custom-tailored communication network. 

The maximum distance between adjacent switches halves with each stage (e.g., switch 0 of stage 1 is connected to switches 0 and 2 of stage 2, resulting in a maximum distance of 2).

Using the built-in power models, power dissipation for the switches and links are calculated based on the average traffic (shown as edge annotations in Fig. 2b) through them.