What is the core of the proposed design methodology?

A library of highly parameterized, design-time composable network building blocks (pipes) is at the core of the proposed design methodology.

What is the reason for the large area savings achieved by the butterfly network?

The smaller number of switches and smaller switch sizes also account for the large area savings achieved by the butterfly network.

How did the authors validate the need for Clos networks?

The authors validated the need for Clos networks by producing mappings onto various topologies by relaxing the bandwidth constraints and simulating the resulting SystemC design.

What is the purpose of the debugging mode?

To automate the tracing of signals, a debugging mode has been implemented that enables monitoring of any signal in the design.

What is the fastest topology for a regular topology?

A regular topology, such as a 16 × 16 mesh, can be generated faster than an irregular, application-specific topology with only a few cores and switches.

What is the unused part of the datastream?

The unused part of the datastream is stored in a regpark register, so that a new datastream can be read from the HEADER_BUILDER block.

What is the function that stores flits?

The OUT_BUFFER module stores flits to be sent across the network and allows the NIS to keep preparing successive flits even when the network is congested.

What is the high-level description of the network?

The high-level description consists of the definition of the cores, network interfaces, switches, links, and their interconnections.

What is the average link length in the butterfly network?

The average link length in the butterfly network (obtained from floorplanner) was observed to be longer than the link lengths (around 1.5×) of direct networks.

What is the smallest bounding box between the source and destination nodes?

For a torus network, the wraparound channels need to be considered for computing the smallest bounding box between the source and destination nodes (Fig. 2d).

What is the maximum distance between adjacent switches halves?

The maximum distance between adjacent switches halves with each stage (e.g., switch 0 of stage 1 is connected to switches 0 and 2 of stage 2, resulting in a maximum distance of 2).

What is the power dissipation for the switches and links?

Using the built-in power models, power dissipation for the switches and links is calculated based on the average traffic (shown as edge annotations in Fig. 2b) through them.

(Open Access) NoC synthesis flow for customized domain specific multiprocessor systems-on-chip (2005) | Davide Bertozzi

NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip

Davide Bertozzi, Antoine Jalabert, Srinivasan Murali, Student Member, IEEE, Rutuparna Tamhankar, Student Member, IEEE, Stergios Stergiou, Student Member, IEEE, Luca Benini, Member, IEEE, and Giovanni De Micheli, Fellow, IEEE

Abstract—The growing complexity of customizable single-chip multiprocessors is requiring communication resources that can only be provided by a highly-scalable communication infrastructure. This trend is exemplified by the growing number of Network-on-Chip (NoC) architectures that have been proposed recently for System-on-Chip (SoC) integration. Developing NoC-based systems tailored to a particular application domain is crucial for achieving high-performance, energy-efficient customized solutions. The effectiveness of this approach largely depends on the availability of an ad hoc design methodology that, starting from a high-level application specification, derives an optimized NoC configuration with respect to different design objectives and instantiates the selected application specific on-chip micronetwork. Automatic execution of these design steps is highly desirable to increase SoC design productivity. This paper illustrates a complete synthesis flow, called NetChip, for customized NoC architectures, that partitions the development work into major steps (topology mapping, selection, and generation) and provides proper tools for their automatic execution (SUNMAP, pipesCompiler). The entire flow leverages the flexibility of a fully reusable and scalable network components library called pipes, consisting of highly-parameterizable network building blocks (network interface, switches, switch-to-switch links) that are design-time tunable and composable to achieve arbitrary topologies and customized domain-specific NoC architectures. Several experimental case studies are presented in the paper, showing the powerful design space exploration capabilities of the proposed methodology and tools.

Index Terms—Systems-on-chip, networks on chip, synthesis, mapping, architecture.

1 INTRODUCTION

In contrast to past projections, today the introduction of new technology solutions is increasingly application driven. As an example, let us consider ambient intelligence, which is regarded as the new paradigm for consumer electronics. Systems designed for ambient intelligence will be based on high-speed digital signal processing, with computational loads ranging from 10 MOPS for lightweight audio processing, 3 GOPS for video processing, 20 GOPS for multilingual conversation interfaces, and up to 1 TOPS for synthetic video generation [4]. This computational challenge will have to be addressed at manageable power levels and affordable costs, and a single processor will not suffice, thus driving the development of increasingly more complex Multi-Processor Systems-on-Chip (MPSoCs).

SoCs represent high-complexity, high-value semiconductor products that incorporate building blocks from multiple sources (either in-house made or externally supplied), such as general-purpose fully programmable processors, coprocessors, DSPs, dedicated hardware accelerators, memory blocks, I/O blocks, etc. Even though commercial products currently exhibit only a few integrated cores (e.g., NEC’s new TCP/IP offload engine is powered by 10 Tensilica Xtensa Processor Cores [42]), in the next few years technology will support the integration of thousands of cores, making a large computational power available. Full exploitation of the increased level of SoC integration requires new paradigms and significant improvements of design productivity, as current system architectures and design styles do not scale up to such dimensions and complexities. A relevant example regards the system architecture, whose paradigm is progressively shifting from computation-centric to communication-centric. In fact, MPSoC performance will be increasingly determined by the ability of the communication infrastructure to efficiently accommodate the communication needs of the integrated computation resources. Traditional state-of-the-art shared busses cannot meet the scalability requirements of complex MPSoCs due to the serialization of bus access requests, and turn out to be also energy-inefficient due to the broadcast communication paradigm.

A scalable communication architecture that supports the trend of SoC integration consists of an on-chip packet-switched micronetwork of interconnects, generally known as Network-on-Chip (NoC) [2], [21], [34]. The scalable and modular nature of NoCs and their support for efficient on-chip communication potentially leads to NoC-based multiprocessor systems characterized by high structural complexity and functional diversity. It is observed in [14] that NoC-based systems are economically feasible if they can be used in several product variants, and if the design can be reused in different application areas. On the other hand, successful products must provide good performance characteristics, thus requiring dedicated solutions that are tailored to specific needs. As a consequence, the challenge lies in the capability to design hardware-optimized, customizable computation platforms for each application domain [9].

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 16, NO. 2, FEBRUARY 2005

D. Bertozzi and L. Benini are with DEIS, University of Bologna, Viale Risorgimento 2, 40136, Bologna, Italy. E-mail: {dbertozzi, lbenini}@deis.unibo.it.

A. Jalabert is with CEA-LETI, France. E-mail: antoine.jalabert@cea.fr.

S. Murali, R. Tamhankar, S. Stergiou, and G. De Micheli are with the Department of Electrical Engineering, Gates Computer Science Building, Room 330, 353 Serra Mall, Stanford University, Stanford, CA 94305. E-mail: {smurali, rutu, utopcell, nanni}@stanford.edu.

Manuscript received 30 Jan. 2004; revised 29 June 2004; accepted 21 July 2004; published online 20 Dec. 2004.

For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDSSI-0035-0104.

1045-9219/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society.

Hardware optimization can be achieved by facilitating the integration of domain-specific computation resources in a plug-and-play design style. Standard interface sockets such as Virtual Component Interface (VCI) [43] and Open Core Protocol (OCP) [44] have been developed for this purpose and support the use of a common NoC as the basis for system integration. A relevant task of these interfaces is to make the NoC adaptive to the different features of the integrated cores (e.g., data and address bus width).

NoC architectures are pushing the evolution of traditional IC design methodologies in order to more effectively deal with functional diversity and complexity. At the application level, the key design challenge is to expose task-level parallelism and to formally capture concurrent communication in models of computation [14]. Then, high-level concurrent tasks have to be mapped to the underlying communication and computation resources. At this level, an abstract model of the hardware architecture is usually exposed to the mapping tool, so that area and power estimates can be given in the early design stage, and different objective functions (e.g., minimization of communication energy) can be considered to evaluate the feasibility of alternative mappings. For NoC-based MPSoCs, a critical step in communication mapping is the network topology selection for its significant impact on overall system performance, which is increasingly communication-dominated.

Although much research effort is being devoted to improving individual design activities, there are very few complete NoC design methodologies and CAD tools. Setting up a fully automated synthesis framework for NoCs is a nontrivial task, particularly for the case of application specific MPSoCs, where a set of heterogeneous computing and storage resources has to be interconnected by means of a custom-tailored communication network. This translates into the need to provide design time instantiation of different network schemes and topologies, tailored to the specific application domain.

A library-based approach to NoC design could be an effective solution [12], [24], wherein predesigned soft macros are composed at instantiation time to build arbitrary topologies. However, the full exploitation of a customizable network topology requires an ad hoc design methodology spanning different levels of abstractions (from application specification to physical implementation) and deriving the most efficient NoC configuration for a given application domain.

The design methodology has to partition the design problem into manageable tasks and to define the tools and practices for those tasks. In this paper, we propose a NoC synthesis flow, called NetChip, for designing domain-specific NoCs and automating most of the complex and time-intensive design steps. Significantly, NetChip provides design support also for regular network topologies and, therefore, lends itself to the implementation of both homogeneous and heterogeneous system interconnects.

NetChip assumes that the application has already been mapped onto cores by using preexisting tools (such as [15]) and the resulting cores together with their communication requirements represent the inputs to our NoC synthesis flow.

The tool-assisted design and generation of a customized NoC-based communication architecture is the ultimate goal of NetChip, and is achieved by means of three major design activities: topology mapping, topology selection, and topology generation. NetChip leverages two tools: SUNMAP, which performs the network topology mapping and selection functions, and pipesCompiler, which performs the topology generation function.

SUNMAP produces a mapping of cores onto various NoC topologies that are defined in a topology library. The mappings are optimized for the chosen design objective (such as minimizing area, power or hop delay) and satisfy the design constraints (such as area or bandwidth constraints). SUNMAP uses floorplanning information early in the mapping process to determine the area-power estimates of a mapping and to produce feasible mappings (satisfying the design constraints). The tool supports various routing functions (dimension ordered, minimum-path, traffic splitting across minimum-paths, traffic splitting across all paths) and chooses the mapping onto the best topology from the library of available ones.

A design file describing the chosen topology is input to the pipesCompiler, which automatically generates the SystemC description of the network components (switches, links, and network interfaces) and their interconnection with the cores. A custom hand-mapped topology specification can also be accepted by the NoC synthesizer, and the network components with the selected configuration can be generated accordingly. The resulting SystemC code for the whole design can be simulated at the cycle-accurate and signal-accurate level. The pipesCompiler uses the pipes library, which consists of highly parameterizable network building blocks that can be tuned and composed at design time to generate the chosen topology. Thus, NetChip automates NoC mapping, selection, and generation functions of a design, thereby bridging an important design gap in building NoCs.

The rest of the paper is organized as follows: In the next section, we present the previous works in this area. In Section 3, we present the design methodology of NetChip. In Sections 4 and 5, we present the SUNMAP tool and the area-power models used in the tool. In Section 6, we present the architecture of the network components defined in the pipes library. We present the pipesCompiler in Section 7. The NetChip design flow is used to model several video and network applications. The communication patterns in these applications differ, thereby requiring various topologies for different applications. These are presented in Sections 8.1 and 8.2. The rich design space exploration capabilities of NetChip are shown in Section 8.3. The design flow can also be used to model custom hand-mapped topologies, as explained in Section 8.4. In Section 8.5, we model a DSP Filter application and generate the SystemC files of the chosen topology. The resulting design is simulated at the cycle-accurate level and the simulations are checked for functional and timing correctness, validating the output of our tools.

2 PREVIOUS WORK

The most advanced state-of-the-art SoC communication architectures represent evolutionary solutions with respect to shared busses. Sonics MicroNetwork [36] is a TDMA-based bus which can easily adapt to the data-word width, burst attributes, interrupt schemes, and other critical parameters of the integrated cores, while providing very high bandwidth utilization. STBUS interconnect is a high performance communication infrastructure that allows the instantiation of shared busses as well as more advanced topologies such as partial or full crossbars. Although evolutionary from a topology viewpoint, these solutions can rely on advanced and highly automated design methodologies for the implementation of generic communication subsystems, allowing designers to rapidly assemble, synthesize, and verify their SoCs using the MicroNetwork or the STBUS interconnect as integration platforms.

However, the early works in [2], [34] pointed out the need for more scalable architectures for on-chip communication and, therefore, to progressively replace shared busses with on-chip networks. Many NoC architectures have therefore been proposed in the open literature so far, but in most cases, the design methodologies and tools are still in the early stage.

One of the earliest contributions in this area is the Maia heterogeneous signal processing architecture, proposed by Zhang et al., based on a hierarchical mesh network [10]. Unfortunately, Maia’s interconnect is fully instance-specific. Furthermore, routing is static at configuration time and communication is based on circuit switching, as opposed to packet switching. In this direction, Dally and Lacy sketch the architecture of a VLSI multicomputer using 2009 technology [35]. A chip with 64 processor-memory tiles is envisioned. Communication is based on packet switching. This seminal work draws upon past experiences in designing parallel computers and reconfigurable architectures (FPGAs and their evolutions) [30], [31], [32].

Most proposed NoC platforms are packet switched and exhibit regular structure. An example is a mesh interconnection, which can rely on a simple layout and on the independence of the switch from the network size. The NOSTRUM network described in [5] takes this approach: The platform includes both a mesh topology and the relative design methodology, wherein a concrete architecture is derived from a general NoC template, then application mapping follows.

The Scalable Programmable Integrated Network (SPIN) described in [3] is another regular, fat-tree-based network architecture. It adopts cut-through switching to minimize message latency and storage requirements in the design of network switches. The Linkoeping SoCBUS [39] is a two-dimensional mesh network which uses a packet connected circuit (PCC) to set up routes through the network: A packet is switched through the network locking the circuit as it goes. This notion of virtual circuit leads to deterministic communication behavior but restricts routing flexibility for the rest of the communication traffic.

In [8], the use of an octagon communication topology for network processors is presented. By contrast, the implementation of a star-connected on-chip network supporting plesiochronous communication among system components is described in [13].

The Aethereal NoC design framework presented in [7] aims at providing a complete infrastructure for developing heterogeneous NoC with end-to-end quality of service guarantees. The network supports guaranteed throughput (GT) for real-time applications and best effort (BE) traffic for timing unconstrained applications. Support for heterogeneous architectures requires highly configurable network building blocks, customizable at instantiation time for a specific application domain. For instance, the Proteo NoC [12] consists of a small library of predefined, parameterized components that allow the implementation of a large range of different topologies, protocols and configurations. pipes interconnect [24] and its synthesizer pipesCompiler [25] push this approach to the limit, by instantiating an application specific NoC from a library of composable soft macros (network interface, link, and switch). The components are highly parameterizable and provide reliable and latency insensitive operation. They represent the core of the NoC synthesis flow illustrated in this paper.

In [11], a hierarchical approach for designing on-chip networks was presented to help designers compare different design options. Design methodologies for building irregular networks have been proposed in [18], [20]. Pinto et al. [18] present a heuristic for the constraint-driven communication synthesis of on-chip communication networks, while [20] describes a design methodology for finding minimal topologies that support low contention or contention-free communication for known communication patterns. In [19], memory optimization in single chip network fabrics is explored.

The problem of mapping cores onto NoC architectures is addressed in [22], [23], [26], [27]. In [22], a branch-and-bound algorithm is used to map cores onto a mesh-based architecture with the objective of minimizing energy and satisfying the bandwidth constraints of the NoC. A simple dimension-ordered routing is assumed in the work. In [23], the authors extend the above work for other deadlock free minimal path routing algorithms. In [26], fast algorithms for mesh NoC architectures under different routing functions (minimum path, split-traffic) and delay/bandwidth constraints are presented.

The design methodology and tools presented in this paper aim at providing MPSoC designers with a framework for the rapid selection and synthesis of application-specific NoC architectures. While still allowing the comparison and generation of regular network topologies, our NoC design framework supports the synthesis of customized irregular topologies, and bridges a gap in a largely unexplored research area.

3 DESIGN FLOW OF NETCHIP

The design flow of NetChip is presented in Fig. 1a. The application is mapped onto cores during the hardware/software codesign phase using existing tools such as [15]. By means of static analysis or simulation, it is possible to determine the average rate of data transfer between the cores. The resulting cores and the communication demands between them are represented by a graph, called the core graph, which is the input to our tool. NetChip has three phases of operation: topology mapping phase, topology selection phase, and topology generation phase. NetChip in turn has two tools built into it: SUNMAP, which performs the topology mapping and selection phases, and the pipesCompiler, which generates the selected topology.

In the topology mapping phase, NetChip takes as inputs:

- the core graph with communication among cores annotated as edge weights,

- the design objective function that needs to be optimized, and

- the design constraints that are to be satisfied by the mapping.

NetChip has a Graphical User Interface (GUI) designed in TCL/TK for entering the inputs. A snapshot of the GUI is presented in Fig. 1b. The input core graph is then mapped onto various standard topologies (mesh, torus, hypercube, Clos, and butterfly) defined in the topology library. The approach presented here is general and other topologies (such as the star network or the octagon network [13], [8]) can be easily added to the library. NetChip explores various design objectives such as minimizing average hop delay, area, and power dissipation. The tool also supports different routing functions: dimension-ordered, minimum path, traffic splitting across minimum paths, and traffic splitting across all paths. For each mapping, the bandwidth and area constraints are evaluated, so that only feasible mappings are chosen. The area-power models and floorplanner are built into NetChip, so that area-power estimates can be incorporated early in the mapping process. For a chosen design objective and routing function, the best feasible mappings onto various topologies are obtained.

In the topology selection phase, the various topologies (with mappings produced from the mapping phase) are evaluated for several design objectives and the best topology for the application is chosen. The design file describing the selected topology and the routing files describing the routes (or paths) to be taken (which depend on the chosen routing function) are automatically generated. The SUNMAP tool, which incorporates these two phases, is explained in Section 4.
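The selection step can be pictured with a minimal Python sketch. It assumes that each topology's best mapping has already been reduced to a small record of estimates; the `MappingResult` record, the `select_topology` helper, the topology labels, and every number below are hypothetical placeholders, not SUNMAP's data structures or results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MappingResult:
    """Best mapping found for one topology (hypothetical record)."""
    topology: str
    feasible: bool      # bandwidth and area constraints both satisfied
    avg_hops: float     # average hop delay
    area_mm2: float     # estimated design area
    power_mw: float     # estimated power dissipation


def select_topology(results: Iterable[MappingResult],
                    objective: str) -> Optional[MappingResult]:
    """Return the feasible result with the smallest value of `objective`."""
    feasible = [r for r in results if r.feasible]
    return min(feasible, key=lambda r: getattr(r, objective), default=None)


# Placeholder numbers chosen only to exercise the code.
results = [
    MappingResult("topo_a", True, 2.0, 50.0, 300.0),
    MappingResult("topo_b", True, 1.8, 55.0, 320.0),
    MappingResult("topo_c", False, 1.5, 40.0, 250.0),  # violates a constraint
]

best_delay = select_topology(results, "avg_hops")
best_power = select_topology(results, "power_mw")
print(best_delay.topology, best_power.topology)
```

Infeasible mappings are filtered out before the comparison, which mirrors the statement that only feasible mappings are considered; the winner changes with the objective.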

In the topology generation phase, NetChip reads the design and routing files and generates a SystemC description of the network components for the selected topology using pipesCompiler. The pipesCompiler instantiates a network of building blocks from the pipes library, which consists of composable soft macros (switches, network interfaces, and links) described in SystemC at the cycle-accurate level. The network components generated are optimized for that particular network and support reliable, latency-insensitive operation. The architecture of the pipes network components is presented in Section 6. In Section 7, the pipesCompiler is presented.

NetChip can also accept a custom hand-mapped topology and generate the network components for the topology. In such a case, the first two phases are skipped, as shown in Fig. 1a. The resulting network generated by the pipesCompiler is highly optimized for that particular topology. The area, power, and latency savings of the custom mappings can also be compared with mappings onto standard topologies using the NetChip tool.

4 TOPOLOGY MAPPING AND SELECTION

We formulate the mapping problem mathematically as follows. The communication between the cores of the SoC is represented by the core graph:

Definition 1. The core graph is a directed graph $G(V, E)$, where $V = \{v_i,\ i = 1, 2, \ldots, N_1\}$, $N_1 = |V|$, with each $v_i$ representing a core and the directed edge $(v_i, v_j)$, denoted as $e_{i,j} \in E$, representing the communication between the cores $v_i$ and $v_j$. The weight of the edge $e_{i,j}$, denoted by $comm_{i,j}$, represents the bandwidth of the communication from $v_i$ to $v_j$.

The connectivity and link bandwidth of the NoC are represented by the NoC topology graph:

Definition 2. The NoC topology graph is a directed graph $P(U, F)$, where $U = \{u_i,\ i = 1, 2, \ldots, N_2\}$, $N_2 = |U|$, with each vertex $u_i \in U$ representing a node in the topology and the directed edge $(u_i, u_j)$, denoted as $f_{i,j} \in F$, representing a direct communication between the vertices $u_i$ and $u_j$. The weight of the edge $f_{i,j}$, denoted by $bw_{i,j}$, represents the bandwidth available across the edge $f_{i,j}$.

The mapping of the core graph $G(V, E)$ onto the NoC topology graph $P(U, F)$ is defined by the one-to-one mapping function $map$:

$$map : V \to U, \quad \text{s.t.} \quad map(v_i) = u_j, \quad \forall v_i \in V,\ \exists u_j \in U. \tag{1}$$

The mapping is defined when $|V| \le |U|$.
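The two graph definitions and the mapping condition translate directly into dictionaries. The following Python sketch is an illustration under stated assumptions: the dictionary layout, the helper names, the 2 × 2 mesh, the 500 MB/s link capacity, and the second core-graph edge are all hypothetical; only the vld to rld bandwidth of 70 MB/s comes from the text.

```python
from typing import Dict, Hashable, Tuple

# Core graph G(V, E): edge (v_i, v_j) -> comm_{i,j}, required bandwidth in MB/s.
# Topology graph P(U, F): edge (u_i, u_j) -> bw_{i,j}, available bandwidth in MB/s.
Graph = Dict[Tuple[Hashable, Hashable], float]


def vertices(graph: Graph) -> set:
    """Collect every vertex that appears as an endpoint of some edge."""
    return {v for edge in graph for v in edge}


def is_valid_mapping(core_graph: Graph, topology: Graph, mapping: dict) -> bool:
    """Check Eq. (1): every core is mapped to a topology node, one-to-one."""
    cores, nodes = vertices(core_graph), vertices(topology)
    if len(cores) > len(nodes):      # mapping is defined only when |V| <= |U|
        return False
    if set(mapping) != cores:        # every v_i in V must be mapped
        return False
    targets = list(mapping.values())
    return set(targets) <= nodes and len(set(targets)) == len(targets)


# vld -> rld = 70 is taken from the text; the second edge is an assumed value.
core_graph = {("vld", "rld"): 70.0, ("rld", "iquant"): 100.0}

# A 2x2 mesh with bidirectional links of assumed capacity 500 MB/s.
mesh_links = [(0, 1), (0, 2), (1, 3), (2, 3)]
topology = {e: 500.0 for a, b in mesh_links for e in ((a, b), (b, a))}

good = {"vld": 0, "rld": 1, "iquant": 3}
bad = {"vld": 0, "rld": 0, "iquant": 3}   # two cores share node 0

print(is_valid_mapping(core_graph, topology, good))
print(is_valid_mapping(core_graph, topology, bad))
```

The second mapping fails because two cores share a node, which is the one-to-one requirement of Eq. (1).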

As an example, the core graph of Video Object Plane Decoder (Fig. 2a) is shown in Fig. 2b. Example topology graphs of mesh, torus, hypercube, Clos, and butterfly are shown in Fig. 4. An example mapping of the VOPD core graph onto mesh and torus topology graphs is shown in Figs. 2c and 2d.

Fig. 1. Design flow of NetChip and the input GUI. (a) Design flow of NetChip and (b) snapshot of the input GUI.

The communication between each pair of cores (i.e., each edge $e_{i,j} \in E$) is treated as a flow of a single commodity, represented as $\{d_k,\ k = 1, 2, \ldots, |E|\}$. The value of $d_k$ represents the bandwidth of communication across the edge and is denoted by $vl(d_k)$. The set of all commodities is represented by $D$ and is defined as:

$$D = \left\{ d_k : vl(d_k) = comm_{i,j},\ k = 1, 2, \ldots, |E|,\ \forall i, j : e_{i,j} \in E,\ \text{with } source(d_k) = map(v_i),\ dest(d_k) = map(v_j) \right\}. \tag{2}$$

As an example, in Fig. 2b, the communication between vld and rld is represented by a single commodity, $d_k$, with value $vl(d_k)$ equal to 70, $source(d_k)$ representing vld and $dest(d_k)$ representing rld.
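As a rough illustration of Eq. (2), the commodity set can be derived from the core graph and a mapping in a few lines of Python. The `Commodity` class and `build_commodities` helper are hypothetical names; the node numbers and the second edge weight are assumed values, while the vld to rld value of 70 is the one quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commodity:
    source: int    # map(v_i): topology node hosting the sending core
    dest: int      # map(v_j): topology node hosting the receiving core
    value: float   # vl(d_k) = comm_{i,j}


def build_commodities(core_graph: Dict[Tuple[str, str], float],
                      mapping: Dict[str, int]) -> List[Commodity]:
    """Create one commodity d_k per core-graph edge e_{i,j}, as in Eq. (2)."""
    return [Commodity(mapping[vi], mapping[vj], comm)
            for (vi, vj), comm in core_graph.items()]


# vld -> rld = 70 comes from the text; the other edge and the mapping are assumed.
core_graph = {("vld", "rld"): 70.0, ("rld", "iquant"): 100.0}
mapping = {"vld": 0, "rld": 1, "iquant": 3}

commodities = build_commodities(core_graph, mapping)
for d in commodities:
    print(d)
```

Each commodity inherits its endpoints from the mapping, so changing the mapping changes the sources and destinations but not the values.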

4.1 General Minimum-Path Mapping Algorithm

In this section, we present the general mapping algorithm, and in the next sections, we show how the algorithm is adapted for each topology. NetChip supports different routing functions: dimension-ordered, minimum-path, traffic splitting across minimum-paths, and traffic splitting across all paths. In dimension-ordered and minimum-path routing, the communication between every pair of cores takes place through a single path. Splitting the traffic across multiple paths reduces the bandwidth requirements of network links. In traffic splitting across minimum paths, the communication between cores is spread only across the minimum paths between them. Clearly, this is a special case of all-path traffic splitting. The advantage of this scheme is that its bandwidth requirements are intermediate between those of minimum-path routing and all-path traffic splitting. Also, traffic streams across different paths have the same hop delay, thereby reducing the jitter associated with traffic splitting.
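The effect of splitting on link bandwidth can be seen with a small Python sketch on a mesh. It assumes nodes are addressed by (x, y) coordinates and that traffic is divided equally among all minimum paths; the equal split is a simplification made for illustration, since the paper obtains the actual split by solving multi-commodity flow equations. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def minimum_paths(src, dst):
    """All shortest paths between two (x, y) mesh nodes, as tuples of nodes.

    Enumerating permutations is exponential, so this only suits tiny examples.
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    steps = ([(1 if dx > 0 else -1, 0)] * abs(dx)
             + [(0, 1 if dy > 0 else -1)] * abs(dy))
    paths = set()
    for order in set(permutations(steps)):
        node, path = src, [src]
        for sx, sy in order:
            node = (node[0] + sx, node[1] + sy)
            path.append(node)
        paths.add(tuple(path))
    return sorted(paths)


def link_loads(src, dst, value, split):
    """Bandwidth placed on each directed link by one commodity."""
    paths = minimum_paths(src, dst)
    used = paths if split else paths[:1]
    loads = defaultdict(float)
    for path in used:
        for link in zip(path, path[1:]):
            loads[link] += value / len(used)   # equal split: an assumption
    return dict(loads)


single = link_loads((0, 0), (1, 1), 100.0, split=False)
shared = link_loads((0, 0), (1, 1), 100.0, split=True)
print(max(single.values()), max(shared.values()))  # 100.0 50.0
```

Both minimum paths between the two corners have two hops, so the split streams see the same hop delay while the peak link load is halved.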

As the graph mapping problem is a special case of the quadratic assignment problem, which is intractable [22], [29], we use a heuristic approach with three phases:

1. An initial mapping is obtained using a greedy algorithm.

2. For minimum-path routing, the minimum-paths and mapping costs are computed. When the routing function is traffic splitting, the paths are obtained by solving a system of Multi-Commodity Flow (MCF) equations [26].

3. The solution is iteratively improved by invoking the second phase for every mapping produced by pairwise swapping of vertices.
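The third phase can be sketched in Python as a pairwise-swap loop around a cost evaluation. This is a simplified stand-in rather than the algorithm of Fig. 3: the cost function below is a bandwidth-weighted Manhattan hop count on a mesh (an assumption that replaces the full phase-2 routing and constraint checks), the loop repeats until no swap helps (also an assumption), and all names and numbers are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Hop distance between two (x, y) mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(core_graph, mapping):
    """Stand-in for phase 2: bandwidth-weighted hop count of a mapping."""
    return sum(bw * manhattan(mapping[s], mapping[d])
               for (s, d), bw in core_graph.items())


def swap_nodes(mapping, a, b):
    """Copy of `mapping` with the contents of nodes a and b exchanged."""
    return {core: (b if node == a else a if node == b else node)
            for core, node in mapping.items()}


def improve_by_swapping(core_graph, mapping, nodes):
    """Phase 3: keep any pairwise swap that lowers the cost."""
    best, best_cost = dict(mapping), mapping_cost(core_graph, mapping)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(nodes, 2):
            candidate = swap_nodes(best, a, b)
            cost = mapping_cost(core_graph, candidate)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
    return best, best_cost


# Assumed example: three cores on a 2x2 mesh, bandwidths in MB/s.
core_graph = {("a", "b"): 100.0, ("b", "c"): 10.0}
nodes = [(0, 0), (0, 1), (1, 0), (1, 1)]
start = {"a": (0, 0), "b": (1, 1), "c": (0, 1)}

final, final_cost = improve_by_swapping(core_graph, start, nodes)
print(mapping_cost(core_graph, start), final_cost)  # 210.0 110.0
```

Swapping a node with an empty node simply moves a core, so the loop also explores unused positions. In the starting mapping the heaviest flow spans two hops; after swapping, both flows use a single hop.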

The minimum-path mapping algorithm is presented in Figs. 3a and 3b. In this paper, we present only mapping algorithms for minimum-path routing and we refer the interested reader to [26] for a description of the mapping algorithms for other routing functions. In the initial mapping procedure, first the core that has the maximum communication demand is placed onto one of the mesh nodes with the maximum number of neighbors. Then, the core that communicates the most with the placed cores is chosen. This core is placed onto the NoC node that minimizes the cost function and this procedure is repeated until all the cores are placed.
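A minimal Python sketch of this greedy initial placement on a mesh follows. The cost function used to pick a node (bandwidth-weighted Manhattan distance to the already placed partners) is an assumption made for illustration, as are the helper names, the 3 × 3 mesh, and the core graph; ties are broken by list order.

```python
def manhattan(a, b):
    """Hop distance between two (x, y) mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbor_count(node, nodes):
    """Number of mesh nodes adjacent to `node`."""
    return sum(1 for n in nodes if manhattan(node, n) == 1)


def greedy_initial_mapping(core_graph, nodes):
    cores = sorted({c for edge in core_graph for c in edge})

    def traffic(a, b):
        # Communication in both directions between cores a and b.
        return core_graph.get((a, b), 0.0) + core_graph.get((b, a), 0.0)

    # Step 1: the core with the largest total demand goes to a node
    # with the maximum number of neighbors.
    first = max(cores, key=lambda c: sum(traffic(c, o) for o in cores))
    seed = max(nodes, key=lambda n: neighbor_count(n, nodes))
    mapping = {first: seed}

    # Step 2: repeatedly place the core that talks most to the placed cores
    # on the free node with the lowest (assumed) cost.
    while len(mapping) < len(cores):
        unplaced = [c for c in cores if c not in mapping]
        nxt = max(unplaced, key=lambda c: sum(traffic(c, p) for p in mapping))
        free = [n for n in nodes if n not in mapping.values()]
        mapping[nxt] = min(
            free,
            key=lambda n: sum(traffic(nxt, p) * manhattan(n, mapping[p])
                              for p in mapping))
    return mapping


# Assumed example: four cores on a 3x3 mesh, bandwidths in MB/s.
core_graph = {("a", "b"): 100.0, ("b", "c"): 50.0, ("c", "d"): 20.0}
nodes = [(x, y) for x in range(3) for y in range(3)]
mapping = greedy_initial_mapping(core_graph, nodes)
print(mapping)
```

Core b has the largest total demand and lands on the center node, the only one with four neighbors; its heaviest partners end up adjacent to it.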

Once an initial mapping is obtained, in the second phase (steps 2 to 8 in Fig. 3), the commodities are sorted in decreasing order of their values. Then, for each commodity in order, a quadrant graph between the source and destination of the commodity is formed, as the shortest path between the source and destination lies within the quadrant between them. The shaded regions in Figs. 2c and 2d are examples of quadrant graphs for the communication between the cores smem and iquant. The procedure for forming quadrant graphs is presented in Section 4.3. Then, Dijkstra’s shortest path algorithm is applied (step 5) to the quadrant graph and the minimum path is obtained. The edge weights are incremented suitably and the procedure is repeated for each commodity in order. After routing all commodities, if the bandwidth and area constraints are satisfied, the cost of communication is calculated. Bandwidth constraints are satisfied if, in the resulting mapping, the traffic across any link is smaller than or equal to the capacity of the link (see footnote 1). The area constraints are satisfied when the mapped design area is lower than the maximum allowed area and the aspect ratios of the design and soft core blocks (blocks that have flexible sizes) are within permissible ranges. For the area-power estimates, area-power models of the switches and floorplanner are incorporated into NetChip as explained in Section 5. The mapping algorithms can have many different objectives, such as minimizing average hop delay, area, or power dissipation; the objective is an input parameter to NetChip. Depending on the objective function, the cost function calculation (done as part of step 8) varies.

1. Capacity of a link in an NoC is technology and implementation dependent and is assumed as an input to NetChip.

Fig. 2. VOPD block diagram and core graph, with communication BW annotated (in MB/s) and its mapping onto mesh and torus topologies. (a) VOPD block diagram, (b) VOPD graph, (c) mesh mapping, and (d) torus mapping.
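The routing phase described above can be approximated by the following Python sketch for a mesh. Several choices are assumptions made for illustration and are not taken from the paper: the quadrant graph is the bounding box of source and destination with links directed toward the destination (so every path in it has the minimum hop count), the Dijkstra edge weight is the traffic already routed over a link, and the function names, commodities, and capacities are hypothetical. The area check is omitted.

```python
import heapq
from collections import defaultdict


def quadrant_nodes(src, dst):
    """Mesh nodes inside the bounding box spanned by src and dst."""
    xs = range(min(src[0], dst[0]), max(src[0], dst[0]) + 1)
    ys = range(min(src[1], dst[1]), max(src[1], dst[1]) + 1)
    return {(x, y) for x in xs for y in ys}


def least_loaded_min_path(src, dst, load):
    """Dijkstra on the quadrant graph; edge weight = traffic already routed."""
    box = quadrant_nodes(src, dst)
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue                      # stale heap entry
        # Only moves toward the destination keep the path minimal.
        for nxt in ((node[0] + sx, node[1]), (node[0], node[1] + sy)):
            if nxt not in box:
                continue
            nd = d + load.get((node, nxt), 0.0)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_commodities(commodities, capacity):
    """Route (src, dst, value) triples in decreasing order of value."""
    load, paths = defaultdict(float), {}
    for src, dst, value in sorted(commodities, key=lambda c: -c[2]):
        path = least_loaded_min_path(src, dst, load)
        for link in zip(path, path[1:]):
            load[link] += value           # increment the edge weights
        paths[(src, dst)] = path
    feasible = all(l <= capacity for l in load.values())  # bandwidth check
    return paths, dict(load), feasible


# Assumed commodities on a 2x2 mesh, values in MB/s.
commodities = [((0, 0), (1, 1), 80.0), ((0, 0), (0, 1), 100.0)]
paths, load, feasible = route_commodities(commodities, capacity=150.0)
_, _, feasible_tight = route_commodities(commodities, capacity=90.0)
print(paths[((0, 0), (1, 1))], feasible, feasible_tight)
```

Because the 100 MB/s commodity is routed first, the 80 MB/s commodity avoids the link it occupies and takes the other minimum path; with a 90 MB/s capacity the same routing violates the bandwidth constraint.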

