What are the two operations that allow the design of complex on-chip communications?

To allow the incremental design of complex on-chip communications, the authors introduce two operations: renaming and parallel composition.

What is the restriction of a configuration to a subset of components?

Given a communication structure $N(C, q, L)$, the restriction of a configuration $l \in L$ to a subset of components $C' \subseteq C$, denoted by $l|_{C'}$, is a function $f : C' \to D_q$ such that $f(c) = l(c)$ for all $c \in C'$.

How does the power of the NoC found by the heuristic compare with the CPLEX solution?

The power of the NoC found by the heuristic is within 2x of the power found by CPLEX. The CPLEX result is very optimistic because it uses a modified cost function and relaxes the integer constraints.

What flit width (in bits) was used to achieve a maximum link capacity of 3.2 GBps?

Since the total memory bandwidth is 3 GBps, the authors set the flit width to 128 bits. This gives a link capacity of 3.2 GBps at a maximum flit rate of $200 \cdot 10^6$ flits per second per input port of the routers.
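As a quick check of this arithmetic, the sketch below multiplies the flit width (in bytes) by the flit rate. It assumes 1 GBps = $10^9$ bytes per second and ignores any header or protocol overhead. The function name `link_capacity_gbps` is illustrative only.

```python
def link_capacity_gbps(flit_width_bits: int, flits_per_second: float) -> float:
    """Raw link capacity in GBps (10^9 bytes/s), ignoring protocol overhead."""
    return (flit_width_bits / 8) * flits_per_second / 1e9


capacity = link_capacity_gbps(128, 200e6)  # 16 bytes per flit at 200 * 10^6 flits/s
memory_bandwidth_gbps = 3.0                # total memory bandwidth quoted above
print(f"{capacity:.1f} GBps")              # prints "3.2 GBps"
assert capacity >= memory_bandwidth_gbps   # the link can carry the memory traffic
```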

Why is the maximum number of nodes limited to $|D_{(x,y,\tau)}|$?

Because the authors want $l_I[(x, y, \tau)]$ to be injective (i.e., only one component of a specific type can be installed in a particular location), the maximum number of nodes in any platform instance is limited to $|D_{(x,y,\tau)}|$.

In which work does the optimization technique explore the isomorphic-free set of all regular topologies?

In [24], the optimization technique explores the isomorphic-free set of all regular topologies and in [25] the authors assume that one NP is given as input to their algorithm.

What is the best communication structure among all possible platform instances?

According to Lemma 1, if the authors can find the greatest element $\bar{N}_P$ of $\langle \mathcal{L} \rangle$ with respect to the ordering relation $\leq_{q_P}$, then the solution of problem PR1 with $N_P = \bar{N}_P$ is the best communication structure among all possible platform instances.

How many libraries of communication components were used in this experiment?

The authors used six libraries of communication components. The libraries differ in the flit width of the data path (32 and 128 bits, corresponding to $280 \cdot 10^6$ and $70 \cdot 10^6$ flits per second, respectively) and in the size of the largest switch available in the library (2 × 2, 5 × 5, and 8 × 8).
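The six libraries are the 2 × 3 combinations of flit width and largest switch size. The sketch below enumerates them and computes the raw capacity of each, assuming 1 GBps = $10^9$ bytes per second and no overhead. Both flit widths yield the same raw link capacity: 4 B × 280 · 10^6 = 16 B × 70 · 10^6 = 1.12 GBps. The names `FLIT_RATES`, `MAX_SWITCH_SIZES`, and `enumerate_libraries` are hypothetical.

```python
from itertools import product

# Flit width (bits) -> flit rate (flits per second), as listed above.
FLIT_RATES = {32: 280e6, 128: 70e6}
# Size of the largest switch available in each library.
MAX_SWITCH_SIZES = [(2, 2), (5, 5), (8, 8)]


def enumerate_libraries():
    """Return one (flit_width_bits, largest_switch, raw_capacity_GBps) entry per library."""
    return [
        (width, switch, (width / 8) * FLIT_RATES[width] / 1e9)
        for width, switch in product(sorted(FLIT_RATES), MAX_SWITCH_SIZES)
    ]


for library in enumerate_libraries():
    print(library)
```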

What can be used to optimize the bus circuitry?

The transfer table information can be used at a lower abstraction level to optimize the bus circuitry (e.g. decoders and multiplexers) or even to segment the bus and insert bus bridges.

Which procedure finds the best path when delay constraints must be checked?

If a delay model must be taken into account to check delay constraints, the best path is found by a labeling algorithm (SpLabeling) that computes the minimum-cost constrained shortest path between two nodes. Otherwise, a modified version of Dijkstra's shortest path algorithm is used.
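SpLabeling itself is not reproduced here. To illustrate the labeling idea, the sketch below is a generic label-setting search for the delay-constrained shortest path:

- Each label carries an accumulated cost and delay.
- Labels whose delay exceeds the bound are pruned.
- Labels dominated in both cost and delay are discarded.
- Labels are expanded in order of cost, so the first label to reach the destination is a minimum-cost feasible path.

The graph encoding and the function name are assumptions, and costs and delays are assumed non-negative.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: node -> list of (neighbor, cost, delay), all non-negative.
Graph = Dict[str, List[Tuple[str, float, float]]]


def constrained_shortest_path(graph: Graph, src: str, dst: str, max_delay: float
                              ) -> Optional[Tuple[float, float, List[str]]]:
    """Minimum-cost path from src to dst whose total delay is at most max_delay."""
    # A label is (cost, delay, node, path); the heap expands labels in order of cost.
    heap: List[Tuple[float, float, str, List[str]]] = [(0.0, 0.0, src, [src])]
    # Non-dominated (cost, delay) pairs already expanded at each node.
    settled: Dict[str, List[Tuple[float, float]]] = {}
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        # Skip a label dominated by one already expanded here (no cheaper and no faster).
        if any(c <= cost and d <= delay for c, d in settled.get(node, [])):
            continue
        settled.setdefault(node, []).append((cost, delay))
        if node == dst:
            return cost, delay, path  # first feasible label at dst has minimum cost
        for nxt, edge_cost, edge_delay in graph.get(node, []):
            new_delay = delay + edge_delay
            if new_delay <= max_delay:  # prune labels that violate the delay bound
                heapq.heappush(heap, (cost + edge_cost, new_delay, nxt, path + [nxt]))
    return None  # no path satisfies the delay bound
```

For example, with a cheap but slow route s→a→t and an expensive but fast route s→b→t, a tight delay bound forces the search onto the second route.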

Which links are considered for rip-up and re-route when node degree constraints are violated?

The links connected to the output of nodes with output degree violations and links connected to the input of nodes with input degree violations are the ones that are considered for rip-up and re-route.
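The selection rule can be stated compactly in code. The sketch below assumes links are (source, destination) pairs and that per-node bounds on input and output degree (for instance, router port counts) are given as dictionaries. Nodes without a bound are never flagged. These names are illustrative, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (source node, destination node)


def rip_up_candidates(links: List[Link], max_in: Dict[str, int],
                      max_out: Dict[str, int]) -> Set[Link]:
    """Links leaving nodes with output-degree violations or entering nodes
    with input-degree violations."""
    out_deg: Dict[str, int] = {}
    in_deg: Dict[str, int] = {}
    for u, v in links:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    # A node without an explicit bound uses its own degree as the bound (no violation).
    out_violations = {n for n, d in out_deg.items() if d > max_out.get(n, d)}
    in_violations = {n for n, d in in_deg.items() if d > max_in.get(n, d)}
    return {(u, v) for u, v in links if u in out_violations or v in in_violations}
```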

(Open Access) A Methodology for Constraint-Driven Synthesis of On-Chip Communications (2009) | Alessandro Pinto

Q: What have the authors contributed in "A methodology for constraint-driven synthesis of on-chip communications" ?

The authors present a methodology and an optimization framework for the synthesis of on-chip communication through the assembly of components such as interfaces, routers, buses and links, from a target library.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 28, NO. 3, MARCH 2009 1

A Methodology for Constraint-Driven Synthesis of On-Chip Communications

Alessandro Pinto, Member, IEEE, Luca P. Carloni, Member, IEEE, and Alberto L. Sangiovanni-Vincentelli, Fellow, IEEE

Abstract—We present a methodology and an optimization framework for the synthesis of on-chip communication through the assembly of components such as interfaces, routers, buses, and links, from a target library. Models for functionality, cost, and performance of each element are captured in the library together with their composition rules. We develop a mathematical framework to model communication at different levels of abstraction, from the point-to-point input specification to the library elements and the final implementation.

Index Terms—Communication synthesis, System-on-chip, Interconnect synthesis, Performance optimization.

I. INTRODUCTION

With the advances of IC technology, global interconnects have become the dominant factor in determining chip performance. They are responsible for a growing fraction of the overall delay and power dissipation. They also exacerbate design problems such as noise coupling, routing congestion, and timing closure, which severely limits design productivity [1], [2]. Because of these characteristics, most VLSI circuits can be considered distributed systems, a fact that challenges traditional design methodologies and the electronic design automation tools that are based on them [3].

Systems-on-Chip (SoCs) are typically assembled from components that come from different vendors and/or different divisions of the same company. The aim is to reduce time-to-market by reusing pre-designed and pre-verified elements. However, since these components are designed independently, the assembly step is often a challenging problem. It requires designing communication interfaces to match different protocols and data parallelism, and routing global interconnect wires to meet the constraints imposed by the target clock period.

The Open Core Protocol (OCP) [4] tackles this problem by defining a standard open-domain interface. IP cores that comply with it can be integrated quickly using appropriate interconnect architectures. While there is no intrinsic limitation on the interconnect architecture for OCP, most designers rely on traditional bus architectures so that pre-designed components can be used. In this domain, proprietary protocols such as the ARM AMBA BUS and the IBM CORECONNECT are popular among SoC designers, making the adoption of a universal standard difficult at best.

This work was partially supported by the GSRC Focus Center, one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program, and by the National Science Foundation (Award #: 0644202).

A. Pinto is with United Technologies Research Center, East Hartford, CT; most of this work was carried out while at the Dept. of EECS, U.C. Berkeley, CA 94720 (apinto@eecs.berkeley.edu).

L. P. Carloni is with the Department of Computer Science, Columbia University, New York, NY 10027 (luca@cs.columbia.edu).

A. Sangiovanni-Vincentelli is with the Dept. of EECS, U.C. Berkeley, CA 94720 (alberto@eecs.berkeley.edu).

Manuscript received November 15, 2007; revised April 28, 2008. Copyright 2008 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

We have argued that SoCs are distributed systems. For this reason, bus architectures may not always be ideal. In fact, a set of seminal papers has proposed scalable, multi-hop, packet-switched Networks-on-Chip (NoCs) as an interesting alternative for the integration of IP components [5]–[7]. Borrowing from the communication networks literature, an NoC can be built by combining heterogeneous elements such as interfaces, routers, and links.

NoC design is a challenging problem for two reasons. First, there are many degrees of freedom, e.g., network topologies, routing protocols, flow-control mechanisms, positions of the communication components, and core interfaces. Second, there are multiple optimization goals, e.g., performance, power, area occupation, and reliability. Hence, the problem has been simplified in several ways: by limiting the number and types of components considered, by focusing on a subset of the relevant objectives, by constraining NoC topology and component positions, and by dividing the optimization process into successive stages. Limiting the degrees of freedom also has the important side effect of reducing implementation and layout complexity.

In [8], Bertozzi et al. propose NETCHIP, a synthesis flow that derives an application-specific NoC by optimally mapping the application cores onto standard topologies (e.g., torus, mesh, hypercube). In [9], Hu and Marculescu perform mapping and routing on the NoC with optimal energy and performance. Lahiri et al. use standard topologies consisting of sets of channels (point-to-point links or shared busses) connected by bridges [10]. Ogras et al. propose a perturbation method that starts from the mapping of an application on a standard topology and then optimizes performance and cost by inserting custom long links between routers [11]. In [12], Murali et al. synthesize NoCs that are more general than the approaches that start from a regular topology. These NoCs are still constrained to be "two-level structures", where star topologies are connected by links to satisfy inter-cluster communication requirements.

In [13], Srinivasan et al. synthesize an application-specific NoC without assuming any pre-existing interconnection fabric. The synthesis problem is linearized and solved via integer linear programming (ILP). Due to its complexity, this approach yields running times of the order of several hours even for relatively small instances. In [14], the same authors propose an efficient approximation algorithm. This algorithm is strongly tied to the cost model and does not consider constraints on the router size (i.e., the number of inputs and outputs).

While a rich set of interesting results exists in the literature, there are few examples of practical applications of NoCs. In fact, the debate is vibrant between those who favor standard bus architectures or variations thereof and those who advocate NoC approaches, ranging from constrained architectures to custom ones. We do not take sides, even though the NoC approach has indisputable fundamental merits that may make it successful in the long run. Instead, we propose a general methodology for the design of on-chip communication that can explore a large number of alternatives, including NoCs, bus architectures, and hybrid ones as special cases. Thanks to its generality, our approach can be used to build a framework where different constrained solutions are compared using a number of evaluation factors.

We address the synthesis of optimal heterogeneous networks by assembling components from a fine-grained library. We enforce no constraint on their topology other than the ones formally captured in the library. In particular, the network that we obtain need not be direct, or even connected, if these constraints are not captured in the composition rules of the communication components.

Our approach is detailed in the rest of the paper as follows.

In Section II, we formally introduce the SoC design specification (i.e., the function), the target technology process with the library of communication components, and the final communication implementation. At first glance, the formalism used in this section may seem overly complex. However, in our opinion, its generality outweighs its complexity: the same formalism applies regardless of the communication synthesis problem being investigated.

In Section III, we show how to use this formal framework to formulate a general optimization problem for a general class of libraries.

In Section IV, we use our framework to formulate the communication synthesis problem in the specific case of NoCs, and we provide a heuristic algorithm to solve the resulting complex integer optimization problem. The algorithm is independent of the specific input constraints and the target platform. We report a customization of the algorithm that synthesizes a minimal-power NoC under bandwidth and latency constraints, with latency expressed as hop count. The general algorithmic framework can be customized in several other ways by changing the cost function and constraints.

The material presented in this paper is the theoretical foundation of COSI-OCC, a design flow for on-chip communication synthesis that is part of the COmmunication Synthesis Infrastructure (COSI). COSI is a public-domain design framework for the analysis and synthesis of interconnection networks [15]. Our goal has been to provide an infrastructure that researchers and designers can use as a basis for developing new design flows, by integrating additional models, library elements, analysis tools, and synthesis tools.

In Section V, we briefly describe COSI-OCC together with the results we obtain by applying it to a number of test cases for NoC design. We present more details on COSI and COSI-OCC in [17], and we provide a detailed comparison of our approach with other on-chip communication design tools in [18].

This approach is similar to the one our group followed in developing MIS, which has been used for years as a platform to invent and test new logic synthesis algorithms [16].

[Figure 1: the set-top box specification, with cores dem, aud, vid, mem, and HDTV (OCP interfaces) and CPU (AMBA interface), pads PAD1 to PAD4, the dem core fixed at position (0.2, 2.44), and mutually exclusive CPU constraints.]

Fig. 1. The system-level specification of a simplified Set-Top Box. Each core in the specification is annotated with an area in mm², and each arrow is annotated with a bandwidth constraint in MB/s.

II. THE METHODOLOGY AND ITS MATHEMATICAL REPRESENTATION

A. The Methodology

The general approach is based on Platform-Based Design (PBD) [19], where the design specification and the implementation alternatives are kept separate. The methodology is recursive: the functional specification is implemented on a particular architecture through a series of refinement steps. Each step corresponds to a specific level of abstraction. At that level, the implementation alternatives are characterized by a set of components, called a library, that can be instantiated, configured, and assembled according to specific rules to derive a more complex structure. The set of components together with their composition rules defines a platform, which is a family of admissible solutions.

The synthesis process must then select one member of this family (a platform instance) and a mapping of the specification onto its components. The selection must satisfy the requirements and possibly optimize the objectives of the design. The implementation refines both requirements and platform instance and is defined at a lower level of abstraction.

In this process, it is essential to formalize three things: how requirements are specified, how the library is described, and how the composition rules are defined and applied to generate the space of admissible solutions. The composition rules can encode constraints on the topologies that the designer wishes to consider. The components in the library determine which kinds of "nodes" can be selected. To select a platform instance using an optimization algorithm, we must associate a "characterization" with each library component (and with the hierarchical composition of two or more of them). The characterization is expressed in terms of cost, performance, power, and "type" (e.g., number of ports and interface type of a router). It allows us to evaluate metrics associated with the objectives and constraints of the design.

To illustrate our approach, consider the simplified Set-Top Box system shown in Fig. 1. This design will serve as an example throughout the paper. The SoC specification contains six IP cores. They exchange messages through a dozen point-to-point channels and interact with the external environment through four major I/O connections (pads).

- The data input stream is processed by the demux core (dem), which sends an audio stream to the audio decoder and a video stream to the video decoder.
- The video decoder accesses the external memory through a memory controller.
- The memory is used both as intermediate storage and to send the decoded stream to the display controller and HDTV encoder.
- Finally, a master CPU controls the operation of all the blocks and handles the interaction with the environment.

Additional non-functional constraints are often part of the specification. For example, the dem core must occupy position (0.2, 2.44) (in millimeters), and the CPU communicates with the other cores one at a time.

[Figure 2: library components, including interfaces IF1 and IF2 (OCP) and IF3 and IF4 (OCP/AMBA) with placement annotations ("can be placed only on chip boundaries", "can be placed anywhere"); a link with distance ≤ l_max and bandwidth ≤ b_max; and components annotated with energy per flit, leakage at 1 GHz, and area (8.2 pJ, 0.85 mW, 5888 µm²; 35.2 pJ, 5.1 mW, 31488 µm²).]

Fig. 2. A library of predefined on-chip communication components.

Fig. 2 shows a library of on-chip communication components that contains a set of communication templates. These include interfaces IF1 and IF2, which connect pads with OCP cores, and interfaces IF3 and IF4, which connect AMBA cores with OCP cores. The library also contains various OCP routers that differ in the number of I/O ports. Each component is characterized by performance metrics, cost functions, and composition rules. Possible characterizations include the following:

- a link in a given metal layer can sustain up to a certain bandwidth $b_{max}$ and span a distance no greater than $l_{max}$;
- a parameterized synthesizable router may not have more than a maximum number of I/O ports;
- an IP core may feature only a specific protocol interface.

A communication structure that serves as the communication backbone for an SoC is constructed by instantiating communication templates (i.e., components from the library) and composing them. For example, PAD4 in Fig. 3 is connected to the memory controller by instantiating two library templates. Fig. 3 shows two alternative NoC implementations of the same specification. One network is obtained by instantiating the necessary interfaces plus one 8 × 8 router; the other is obtained by instantiating only 2 × 2 routers. The performance and cost of the communication structure depend on the performance metrics and the cost functions of each component.

B. Basic Definitions

The basic element of our formal framework is the communication structure. A communication structure is a set of interconnected components with associated quantities such as latency, bandwidth, and position.

A quantity $q$ takes on values from a domain $D_q$ that is partially ordered by a relation $\preceq_q$. The ordering relation captures the notion of a value being "better" than another value. We assume that $\bot$, which denotes no value, always belongs to the domain $D_q$ of a quantity. Also, $\bot \preceq_q \nu$ for all $\nu \in D_q$. A quantity $q$ is finite if $D_q$ is a finite set. It is bounded if there exists an element $\bar{\nu} \in D_q$ such that $\nu \preceq_q \bar{\nu}$ for all $\nu \in D_q$.

Bandwidth, for instance, is modeled by a quantity $b$. Its domain $D_b$ can be either the set of natural numbers or a discrete set of values like $D_b = \{10, 100\}$ (in MB/s). The ordering relation $\preceq_b$ is the same as the ordering relation $\leq$ defined on natural numbers. The domain $D_h$ of the quantity $h$ representing latency can be defined as a finite set of integers, but the ordering relation $\preceq_h$ is reversed, i.e., $100\,\text{ns} \preceq_h 10\,\text{ns}$.

[Figure 3: two NoC platform instances for the set-top box cores (Demux, Audio, Video, HDTV, Mem Ctrl, CPU), each obtained by instantiating library components.]

Fig. 3. Two NoC instances obtained by instantiation and composition of communication components.
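The sketch below shows the role of $\bot$ and of reversed orders concretely. It assumes $\bot$ is encoded as `None` and that each quantity is described by an order on its non-$\bot$ values. The `Quantity` class is a hypothetical helper, not part of COSI.

```python
from dataclasses import dataclass
from typing import Any, Callable

BOTTOM = None  # encodes ⊥ ("no value"), assumed to belong to every domain


@dataclass(frozen=True)
class Quantity:
    """A quantity q whose domain D_q is partially ordered by ⪯_q (a sketch)."""
    name: str
    leq_values: Callable[[Any, Any], bool]  # ⪯_q restricted to non-⊥ values

    def leq(self, u: Any, v: Any) -> bool:
        """u ⪯_q v, with ⊥ below every value."""
        if u is BOTTOM:
            return True
        if v is BOTTOM:
            return False
        return self.leq_values(u, v)


# Bandwidth b: the usual ≤, since more bandwidth is better.
bandwidth = Quantity("b", lambda u, v: u <= v)
# Latency h: the reversed order, since lower latency is better (100 ns ⪯_h 10 ns).
latency = Quantity("h", lambda u, v: u >= v)
# Position with no preferred location: a flat order, distinct points are incomparable.
position = Quantity("(x,y)", lambda u, v: u == v)
```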

Given a vector of quantities $q = (q_1, \ldots, q_n)$, the domain of $q$ is the cross product $D_q = D_{q_1} \times \cdots \times D_{q_n}$. It is partially ordered by the relation $\preceq_q$ induced point-wise by the relations $\preceq_{q_i}$. We use the notation $\bot_n$ to denote an $n$-tuple of $\bot$ values. $[X \to Y]$ denotes the set of all functions from set $X$ to set $Y$.

Definition 1. A communication structure is a tuple $N(C, q, L)$ where $C = \{c_1, \ldots, c_m\}$ is a set of components, $q = (q_1, \ldots, q_n)$ is a vector of quantities, and $L \subseteq [C \to D_q]$ is a set of communication configurations. Set $C$ is partitioned into the set of nodes $V \subseteq U_V$ and the set of links $E \subseteq V \times V$.

The set $L$ of communication configurations captures the different ways in which quantities can be associated with components. The set $U_V$ is called the node universe. Similarly, the component universe is $U_C = U_V \cup (U_V \times U_V)$. The configuration universe is $U_L = \bigcup_{C \subseteq U_C} [C \to D_q]$, the union of all possible configurations for any subset of components. Let $G_q$ be the set of all communication structures with quantities $q$.

For a given subscript $\sigma$ and vector of quantities $q$, let $N_\sigma \in G_q$ be a communication structure. We use $C_\sigma$, $V_\sigma$, $E_\sigma$, and $L_\sigma$ to denote the sets of components, nodes, links, and configurations of $N_\sigma$, respectively.

Example 1. (Communication structure): Consider the vector of quantities $q = (x, y)$ representing the horizontal and vertical coordinates of a component. The domain $D_{(x,y)}$ is the set of points where nodes can be placed. It can be described, for instance, by a discrete set of points or by a union of rectangles. If there are no preferred positions, the elements of $D_{(x,y)}$ are not comparable. The order $\preceq_{(x,y)}$ is then a flat one, with $\bot$ as the minimum element.

Given a communication structure $N(C, q, L)$, the set of configurations $L$ captures all the admissible placements of the nodes in $V$. Since we do not assign any position to the links, $l(e) = \bot_2$ for all $l \in L$ and for all links $e \in E$. The additional constraint that no two nodes occupy the same position requires $l(u) \neq l(v)$ for all $l \in L$ and for all pairs of distinct nodes $u, v \in V$.
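The admissibility conditions of Example 1 can be checked mechanically. The sketch below makes three assumptions: a configuration is a dictionary from components to $(x, y)$ pairs, `None` stands for $\bot_2$, and the domain $D_{(x,y)}$ is a finite set of allowed points. It also assumes every node must receive a position. The function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Point = Tuple[float, float]


def is_admissible(config: Dict[Hashable, Optional[Point]], nodes: Iterable[Hashable],
                  links: Iterable[Hashable], domain: Set[Point]) -> bool:
    """Check one configuration l against the conditions of Example 1."""
    # Links carry no position: l(e) = ⊥_2, encoded here as None.
    if any(config.get(e) is not None for e in links):
        return False
    positions = [config.get(v) for v in nodes]
    # Every node must be placed at a point of the domain D_(x,y).
    if any(p is None or p not in domain for p in positions):
        return False
    # No two distinct nodes may share a position.
    return len(set(positions)) == len(positions)
```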


We introduce two scoping operators on configurations.

- Restriction. Given a communication structure $N(C, q, L)$, the restriction of a configuration $l \in L$ to a subset of components $C' \subseteq C$, denoted by $l|_{C'}$, is a function $f : C' \to D_q$ such that $f(c) = l(c)$ for all $c \in C'$. In particular, $l|_V$ and $l|_E$ are the restrictions of a configuration $l$ to the set of nodes and links, respectively.
- Projection. Given a vector $q'$ obtained from $q$ by projecting away some of the quantities, the projection of a configuration onto $q'$ is denoted by $l[q']$ and corresponds to ignoring the quantities not in $q'$.

We naturally extend these operators to sets of configurations; e.g., $L[(x)]|_V$ denotes the possible assignments of horizontal positions to nodes in Example 1.
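Both scoping operators amount to dictionary filtering. In this sketch, a configuration is assumed to map each component to a dictionary of quantity values:

- `restrict` implements $l|_{C'}$;
- `project` implements $l[q']$;
- `horizontal_positions` lifts both to a set (here a list) of configurations and yields $L[(x)]|_V$.

The helper names are illustrative.

```python
from typing import Dict, Hashable, Iterable, List

# Hypothetical encoding: component -> {quantity name: value}; None stands for ⊥.
Config = Dict[Hashable, Dict[str, object]]


def restrict(l: Config, subset: Iterable[Hashable]) -> Config:
    """l|_{C'}: keep only the components in C' (assumed to be in the domain of l)."""
    return {c: l[c] for c in subset}


def project(l: Config, quantities: Iterable[str]) -> Config:
    """l[q']: keep only the quantities in q' for every component."""
    keep = list(quantities)
    return {c: {q: values[q] for q in keep} for c, values in l.items()}


def horizontal_positions(configs: List[Config], nodes: Iterable[Hashable]) -> List[Config]:
    """L[(x)]|_V: project every configuration onto x, then restrict it to the nodes."""
    node_list = list(nodes)
    return [restrict(project(l, ["x"]), node_list) for l in configs]
```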

We use communication structures to capture three important and related concepts in our framework:

- the specification of an on-chip communication synthesis problem;
- the collection of alternatives to implement the communication (the platform instances);
- the final communication implementation.

These three structures correspond to different abstraction levels. In Section II-F, we establish precise relations among them to define when an implementation refines a platform instance and supports a specification. It is often necessary to compare specifications, platform instances, and implementations. For example, it is important to be able to order different specifications by how stringent their constraints are. Similarly, it is important to compare platform instances by their performance. Therefore, we define an ordering relation $\leq_q$ on the set of communication structures $G_q$ as follows:

Definition 2. Given two communication structures $N_1, N_2 \in G_q$, $N_1 \leq_q N_2$ if and only if $C_1 \subseteq C_2$ and, for all $l_1 \in L_1$, there exists $l_2 \in L_2$ such that $l_1(c) \preceq_q l_2(c)$ for all $c \in C_1$.
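Definition 2 can be read as an executable check. The sketch below makes the following assumptions:

- each configuration maps a component to a tuple of quantity values;
- one order function per quantity builds the point-wise order $\preceq_q$;
- $\bot$ is encoded as `None`.

The bandwidth and latency orders follow the conventions of Section II-B. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Order = Callable[[object, object], bool]
Config = Dict[Hashable, Tuple]  # component -> (q1, ..., qn)


def bandwidth_leq(a, b) -> bool:
    """⪯_b with ⊥ (None) below everything; larger bandwidth is better."""
    return a is None or (b is not None and a <= b)


def latency_leq(a, b) -> bool:
    """⪯_h with ⊥ (None) below everything; smaller latency is better."""
    return a is None or (b is not None and a >= b)


def pointwise_leq(orders: Sequence[Order], u: Tuple, v: Tuple) -> bool:
    """⪯_q on D_q1 x ... x D_qn, induced point-wise by the orders ⪯_qi."""
    return all(leq(x, y) for leq, x, y in zip(orders, u, v))


def structure_leq(orders: Sequence[Order], c1: Set[Hashable], L1: List[Config],
                  c2: Set[Hashable], L2: List[Config]) -> bool:
    """N1 ≤_q N2: C1 ⊆ C2 and every l1 in L1 is matched by some l2 in L2 on all of C1."""
    if not set(c1) <= set(c2):
        return False
    return all(
        any(all(pointwise_leq(orders, l1[c], l2[c]) for c in c1) for l2 in L2)
        for l1 in L1
    )


ORDERS = (bandwidth_leq, latency_leq)  # q = (b, h)
spec = ({"e"}, [{"e": (100, 10)}])
platform = ({"e", "f"}, [{"e": (200, 5), "f": (None, None)}])
print(structure_leq(ORDERS, *spec, *platform))  # prints True
```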

C. Communication Specification

We express the specification of an on-chip communication synthesis problem as a communication structure $N \in G_q$ where $q = (x, y, a, \tau, b, h)$.

- Nodes represent IP cores, which can be sources and/or destinations of a communication. Each node has an associated position $(x, y)$ in the Euclidean plane, an area $a$, and a type $\tau$ denoting the supported interface protocol.
- Links represent distinct inter-core communications. Each link is associated with two quantities: a minimum average bandwidth $b$ and a maximum latency $h$.

Each configuration $l \in L$ represents a possible combination of the positions and interfaces of the cores, together with the bandwidth and latency requirements for the communication among them. Multiple configurations can capture, e.g., different communication scenarios or different chip floor-plans.

Example 2. (Communication specification): In the set-top box example of Fig. 1, the position of the dem core is fixed at coordinates (0.2, 2.44). Hence, each configuration $l \in L_{stb}$ must be such that $l(dem) = (0.2, 2.44, 0.55, \text{OCP}, \bot, \bot)$. Since there are no other floor-planning constraints, the positions of the other IP cores can be determined during the synthesis process.

The double arrows indicate that the constraints between the CPU and the IP cores are mutually exclusive, i.e., the CPU can only communicate with one core at a time. Formally, for all $l \in L_{stb}[(b)]$, only one among $l((CPU, dem))$, $l((CPU, aud))$, $l((CPU, vid))$, and $l((CPU, mem))$ can be different from zero.
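The mutually exclusive CPU constraints can be pictured as a set of bandwidth scenarios, one per active CPU channel. The sketch reads "only one can be different from zero" as "at most one", and it assumes a bandwidth configuration $l[(b)]$ is a dictionary keyed by link. The demand values in the usage example are placeholders, not the values of Fig. 1.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]
CPU_LINKS: List[Link] = [("CPU", "dem"), ("CPU", "aud"), ("CPU", "vid"), ("CPU", "mem")]


def cpu_exclusive(bw: Dict[Link, float]) -> bool:
    """True if at most one CPU link has a non-zero bandwidth requirement in l[(b)]."""
    active = [e for e in CPU_LINKS if bw.get(e, 0) not in (0, None)]
    return len(active) <= 1


def cpu_scenarios(demand: Dict[Link, float]) -> List[Dict[Link, float]]:
    """One configuration per CPU link: that link gets its demand, the others get zero."""
    return [{e: (demand[e] if e == active else 0) for e in CPU_LINKS} for active in CPU_LINKS]


demand = {e: 10.0 for e in CPU_LINKS}  # placeholder demands in MB/s
scenarios = cpu_scenarios(demand)
print(len(scenarios), all(cpu_exclusive(s) for s in scenarios))  # prints "4 True"
```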

Since the performance and cost of the network depend on the core positions, an important step in our design flow is to restrict the possible configurations of a specification by fixing the position of the ports of each core. In COSI-OCC, we rely on the PARQUET floor-planner [20] to obtain these positions.

D. Communication Structures Instantiation and Composition

To allow the incremental design of complex on-chip communications, we introduce two operations: renaming and parallel composition. The identifiers of two nodes in different sub-nets can be renamed to be the same. This indicates either that one IP implements both, or that an implicit connection is present between the two sub-nets at these nodes. A renaming function $r : U_V \to U_V$ is a bijection on the node universe. $R$ denotes the set of all renaming functions. Given a communication structure $N$ and a renaming function $r$, with abuse of notation we use $r(N)$ to denote a new communication structure where the components have been renamed according to $r$.
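Renaming only touches identifiers. The sketch below assumes a structure is given by its node set, its link set (pairs of nodes), and a list of configurations keyed by component. The renaming is supplied as a partial dictionary and treated as the identity elsewhere, with a check that it does not merge two nodes of the same structure. The node names mirror the renaming of Example 3. The links of the video channel and the bandwidth value in the usage example are assumed for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


def rename(nodes: Set[str], links: Set[Tuple[str, str]],
           configs: List[Dict[Hashable, object]], r: Dict[str, str]):
    """r(N): rename nodes, links, and configuration keys; r is the identity outside its keys."""
    def f(v: str) -> str:
        return r.get(v, v)

    new_nodes = {f(v) for v in nodes}
    if len(new_nodes) != len(nodes):
        raise ValueError("r must not map two nodes of N to the same identifier")
    new_links = {(f(u), f(v)) for u, v in links}

    def rename_component(c: Hashable) -> Hashable:
        # Links are pairs of nodes, so both endpoints are renamed.
        return (f(c[0]), f(c[1])) if c in links else f(c)

    new_configs = [{rename_component(c): val for c, val in l.items()} for l in configs]
    return new_nodes, new_links, new_configs


r = {"d": "dem", "m": "mem", "v": "vid", "dec": "HDTV", "P2": "PAD3"}
vch_nodes = {"d", "v", "m", "dec", "P2"}
vch_links = {("d", "v"), ("v", "m"), ("m", "dec"), ("dec", "P2")}  # assumed topology
nodes, links, configs = rename(vch_nodes, vch_links, [{("d", "v"): 124.0}], r)
```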

The composition of two communication structures $N_1$ and $N_2$, denoted by $N_1 \parallel N_2$, results in a new communication structure $N$ that contains the set of components $C_1 \cup C_2$. We define the operator $\parallel$ by two rules.

The first rule establishes how the configurations of the components being merged contribute to the configurations of the combined entity. The rule is expressed by the binary operator $\oplus_q$. This operator is commutative and associative, so the composition of communication structures also satisfies these properties. This matters because we want the result of the composition to be independent of the order in which communication structures are instantiated and composed. Further, if $l_1 : C_1 \to D_q$ and $l_2 : C_2 \to D_q$, then $l = l_1 \oplus_q l_2$ must be such that $l : C_1 \cup C_2 \to D_q$. This operator is defined on sets of configurations as follows: let $L_1 \subseteq [C_1 \to D_q]$ and $L_2 \subseteq [C_2 \to D_q]$; then $L_1 \oplus_q L_2 = \{ l_1 \oplus_q l_2 \mid l_1 \in L_1 \wedge l_2 \in L_2 \}$.

The second rule restricts the legal compositions by forcing the composed structure to satisfy certain properties. This rule defines a class of communication structures to which the result of the composition must belong. It is given by a relation between the components and the configurations, denoted by $R \subseteq 2^{U_C} \times U_L$.

Definition 3. Given a binary operator $\oplus_q$, a composition rule $R$, and two communication structures $N_1$ and $N_2$ belonging to $G_q$, their composition is $N_1 \parallel N_2 = N \in G_q$, where $C = C_1 \cup C_2$ and $L = \{ l \in L_1 \oplus_q L_2 \mid (C, l) \in R \} \neq \emptyset$. The composition is not defined if $L = \emptyset$.
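Definition 3 translates almost literally into code once $\oplus_q$ and $R$ are supplied as functions. The sketch is generic: configurations are dictionaries, and $L_1 \oplus_q L_2$ is kept as a list rather than a set. The toy operator `add_shared` (add the values of shared components) and the toy rule `at_most_10` are assumptions chosen only to exercise the definition.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Config = Dict[Hashable, object]
Oplus = Callable[[Config, Config], Config]
Rule = Callable[[Set[Hashable], Config], bool]


def compose(c1: Set[Hashable], L1: List[Config], c2: Set[Hashable], L2: List[Config],
            oplus: Oplus, rule: Rule) -> Optional[Tuple[Set[Hashable], List[Config]]]:
    """N1 || N2 (Definition 3): returns (C, L), or None when L is empty (undefined)."""
    comps = set(c1) | set(c2)                               # C = C1 ∪ C2
    merged = [oplus(l1, l2) for l1, l2 in product(L1, L2)]  # L1 ⊕ L2
    legal = [l for l in merged if rule(comps, l)]           # keep (C, l) ∈ R
    return (comps, legal) if legal else None


def add_shared(l1: Config, l2: Config) -> Config:
    """Toy ⊕: copy unshared values and add the values of shared components."""
    out = dict(l1)
    for c, v in l2.items():
        out[c] = out[c] + v if c in out else v
    return out


def at_most_10(components: Set[Hashable], l: Config) -> bool:
    """Toy rule R: every value in the composed configuration is at most 10."""
    return all(l[c] <= 10 for c in components)


N = compose({"a", "e"}, [{"a": 3, "e": 4}],
            {"e", "b"}, [{"e": 5, "b": 1}, {"e": 8, "b": 1}],
            add_shared, at_most_10)
```

Here only the first pairing satisfies the rule, because the second gives component `e` the value 12.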

Example 3. (Composition of communication specifications): We want to add an extra video channel to our set-top box chip by reusing the already instantiated IP cores. In Fig. 4, $N_{vch}$ is a communication structure capturing the communication requirements of a set-top-box video channel.

To reuse the same IP cores, we rename the nodes according to a renaming function $r$ such that $r(d) = dem$, $r(m) = mem$, $r(v) = vid$, and $r(dec) = HDTV$. Since the new video channel must be displayed on the same device, $r(P2) = PAD3$ forces the same output pad to be reused. For the demodulator input, though, we need an additional pad. We also add a new pad to connect a second memory bank to the memory controller.

Fig. 4 shows the result of the composition $N_{stb} \parallel r(N_{vch})$. Intuitively, we have added the bandwidths of common requirements and we have restricted the position of the dem core. More precisely, we need to define the operator $\oplus_q$. Given two communication structures $N_1, N_2 \in G_q$, let $l_1 \in L_1$ and $l_2 \in L_2$ be two configurations. The configuration $l = l_1 \oplus_q l_2$ is defined as follows:


[Figure 4: the set-top box $N_{stb}$ (cores dem, aud, vid, HDTV, mem, CPU; pads PAD1 to PAD4; dem at (0.2, 2.44)), the video-channel structure $N_{vch}$, its renamed version $r(N_{vch})$ (dem, vid, HDTV, mem, PAD3, PAD5, PAD6), and the composition $N_{stb} \parallel r(N_{vch})$. Bandwidth labels include 124, 538, 207, and 297 MB/s, and 414 and 594 MB/s in the composition.]

Fig. 4. Example of parallel composition of networks: the set-top box is expanded by adding a video channel and an extra off-chip memory bank.

• there is no "interference" between components not shared by $N_1$ and $N_2$, i.e., $l(c) = l_1(c)$ for all $c \in C_1 \setminus C_2$, and $l(c) = l_2(c)$ for all $c \in C_2 \setminus C_1$;

• common nodes must be "compatible", meaning that they must agree on the positions and interfaces:
$$\forall c \in V_1 \cap V_2, \quad l(c) = \begin{cases} l_1(c) & \text{if } l_1(c) = l_2(c), \\ \bot_6 & \text{otherwise} \end{cases}$$
(notice that it is sufficient to have some compatible configurations for the composition to be defined);

• for all $c \in E_1 \cap E_2$, $l[(b)](c) = l_1[(b)](c) + l_2[(b)](c)$ and $l[(h)](c) = \min\{l_1[(h)](c), l_2[(h)](c)\}$.

We now define the composition rules. First, we specify that each node has an assigned position, area, and interface protocol:
$$R_1 = \{ (C, l) \in 2^{U_C} \times U_L \mid \forall v \in V, \forall q \in \{x, y, a, \tau\}, \; l[(q)](v) \neq \bot \},$$
where $V$ is the set of nodes in $C$. A second rule may depend on the area budget $\nu$ for the IP cores on the chip:
$$R_2 = \Big\{ (C, l) \in 2^{U_C} \times U_L \;\Big|\; \sum_{c \in V} l[(a)](c) \le \nu \Big\}.$$
The two rules are combined as $R = R_1 \cap R_2$. We give examples of other rules in Section II-E.
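The operator $\oplus_q$ and the rules $R_1$ and $R_2$ of Example 3 can be prototyped as follows. The sketch makes these assumptions:

- node values are tuples $(x, y, a, \tau)$ and link values are pairs $(b, h)$;
- `None` stands for $\bot$, which is the value of an incompatible node;
- the set of shared links is passed explicitly.

Apart from the dem entry taken from Example 2, the positions, areas, latencies, bandwidths, and area budget are made-up placeholders.

```python
from typing import Dict, Hashable, Optional, Set

# Node value: (x, y, a, tau); link value: (b, h); None encodes ⊥.
Config = Dict[Hashable, Optional[tuple]]


def oplus(l1: Config, l2: Config, links: Set[Hashable]) -> Config:
    """l1 ⊕_q l2 following the three bullets of Example 3 (a sketch)."""
    l: Config = {}
    for c in set(l1) | set(l2):
        if c not in l2:
            l[c] = l1[c]                      # only in N1: no interference
        elif c not in l1:
            l[c] = l2[c]                      # only in N2: no interference
        elif c in links:
            (b1, h1), (b2, h2) = l1[c], l2[c]
            l[c] = (b1 + b2, min(h1, h2))     # shared link: add bandwidth, keep tighter latency
        else:
            l[c] = l1[c] if l1[c] == l2[c] else None  # shared node: compatible value, else ⊥
    return l


def rule_r1(nodes: Set[Hashable], l: Config) -> bool:
    """R1: every node has x, y, a, and tau assigned (no ⊥)."""
    return all(l.get(v) is not None and None not in l[v] for v in nodes)


def rule_r2(nodes: Set[Hashable], l: Config, area_budget: float) -> bool:
    """R2: the total core area does not exceed the budget."""
    return sum(l[v][2] for v in nodes) <= area_budget


def rule_r(nodes: Set[Hashable], l: Config, area_budget: float) -> bool:
    """R = R1 ∩ R2 (R1 is checked first so that R2 only sees assigned areas)."""
    return rule_r1(nodes, l) and rule_r2(nodes, l, area_budget)


dem = (0.2, 2.44, 0.55, "OCP")
vid = (1.0, 1.0, 0.46, "OCP")            # placeholder position and area
link = ("dem", "vid")
l_stb = {"dem": dem, "vid": vid, link: (207.0, 10.0)}
l_vch = {"dem": dem, "vid": vid, link: (207.0, 8.0)}
merged = oplus(l_stb, l_vch, {link})
print(merged[link], rule_r({"dem", "vid"}, merged, 2.0))  # prints "(414.0, 8.0) True"
```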

E. Libraries and Platforms

A platform is the set of all valid compositions that can be obtained by assembling the components from a given communication library. These components either have a corresponding implementation that is ready to be used or can be synthesized by tools operating at a lower level of abstraction.

A communication library $\mathcal{L}$ is a collection of communication structures, i.e., $\mathcal{L} \subset G_{q_P}$. The elements of a communication library are templates that can be instantiated and composed to obtain more complex communication structures.

The vector of quantities that characterizes our platform is $q_P = (x, y, \tau, in, out, \gamma)$. Each node has an associated position $(x, y)$, a type $\tau$, and two multisets $in$ and $out$ of input and output port interfaces, respectively. Each link is associated with a capacity $\gamma$, i.e., the maximum bandwidth that it can sustain (Section II-E).

Unlike the specification quantities, vector $q_P$ represents the capabilities of a component. For example, quantities $x$ and $y$ in the specification denote the coordinates where a component must be located, whereas the same quantities in $q_P$ denote the coordinates where a component can be located.

[Figure 5: library $\mathcal{L}$ with a bus node, a mesh node indexed by $(i, j)$, a bus segment with capacity in $[0, \gamma_{max}]$, East-West and North-South mesh links with capacity $\gamma_{max}$, and interfaces. Below it, two implementations of the set-top box (dem, aud, vid, HDTV, mem, CPU), one of which uses mesh nodes indexed from (−1, −1) to (0, 1).]

Fig. 5. Example of a library $\mathcal{L}$ and two alternative implementations for the set-top box based on composing elements instantiated from $\mathcal{L}$.

The definition of composition ∥ captures the set of valid communication architectures (i.e. communication platform instances) that can be obtained from the communication library. The definition of the rules is more involved than in the case of Example 3 and depends on the design space of interest. The following example shows the flexibility that our framework provides in defining the set of communication structures that can be obtained by composition of library elements.

Example 4. Composition rules: Consider a communication library whose elements are nodes and links. Fig. 5 shows a communication library L and two possible platform instances, a bus-based one and a mesh-based one. Library L contains the following components: a bus node and a bidirectional bus segment connecting two bus nodes; a mesh node and two mesh links for East-West and North-South connections, respectively. It also contains a set of interface communication structures to connect IP cores to bus nodes and mesh nodes. Each node has an associated multiset of input interfaces in and output interfaces out (depicted as filled and non-filled shapes attached to nodes in Fig. 5). A link connects an output interface of a node to an input interface of another node. Mesh links have an associated maximum capacity γmax, while bus segments (including the link between an IP core and a bus node) have an associated interval of capacities [0, γmax] corresponding to different configurations. For mesh structures, we introduce two more quantities: the row index and the column index of a node. We now state a set of composition rules such that the only valid platform instances in this platform are either buses or meshes:

1) The number of bus nodes can be at most the number of bus segments minus one. This ensures that the topology of a bus is a collection of trees. Also, since a bus node has only two bidirectional ports to connect to other bus nodes, each bus is a chain of IP cores (as shown by the bus-based platform instance in Fig. 5).

2) An East-West mesh link can connect two mesh nodes (u, v) only if the row and column indices of u and v are (i, j) and (i, j + 1), respectively; a North-South mesh link can connect two mesh nodes (u, v) only if the row and column indices of u and v are (i, j) and (i + 1, j), respectively (as shown by the mesh-based platform instance in Fig. 5).

3) A bus configuration l forces the sum of the capacities of the links connecting the cores to the bus to be less than γmax. This restricts the possible bus organizations and models the sharing of the bus capacity among all connected IP cores.

These three rules define the composition rules R for this specific platform.
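The three rules of Example 4 translate into simple predicates. This is a hedged sketch: the function names and the (row, column) tuple encoding are hypothetical, rule 1 is checked only as the stated count bound (the two-port degree limit is not modeled), and "less than γmax" in rule 3 is read here as "not exceeding γmax", which is an interpretation.

```python
from typing import Iterable, Tuple

Index = Tuple[int, int]  # hypothetical (row, column) index of a mesh node


def rule1_bus_count(num_bus_nodes: int, num_bus_segments: int) -> bool:
    """Rule 1 as stated: bus nodes are at most bus segments minus one."""
    return num_bus_nodes <= num_bus_segments - 1


def rule2_mesh_link(kind: str, u: Index, v: Index) -> bool:
    """Rule 2: EW links join (i, j)->(i, j+1); NS links join (i, j)->(i+1, j)."""
    i, j = u
    if kind == "EW":
        return v == (i, j + 1)
    if kind == "NS":
        return v == (i + 1, j)
    raise ValueError(f"unknown mesh link kind: {kind!r}")


def rule3_bus_capacity(core_link_capacities: Iterable[float], gamma_max: float) -> bool:
    """Rule 3: core-to-bus links share gamma_max ('less than' read as 'at most')."""
    caps = list(core_link_capacities)
    # Each bus segment may take any capacity in the interval [0, gamma_max].
    return all(0 <= c <= gamma_max for c in caps) and sum(caps) <= gamma_max


print(rule2_mesh_link("EW", (2, 3), (2, 4)), rule3_bus_capacity([1.0, 1.5], 3.0))
```

Because each rule only inspects local information (counts, index pairs, capacities on one bus), the checks can be applied incrementally as library elements are composed.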

A Methodology for Constraint-Driven Synthesis of On-Chip Communications
