A Methodology for Constraint-Driven Synthesis of On-Chip Communications

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 28, NO. 3, MARCH 2009 1
Alessandro Pinto, Member, IEEE, Luca P. Carloni, Member, IEEE, and
Alberto L. Sangiovanni-Vincentelli, Fellow, IEEE
Abstract—We present a methodology and an optimization
framework for the synthesis of on-chip communication through
the assembly of components such as interfaces, routers, buses and
links, from a target library. Models for functionality, cost, and
performance of each element are captured in the library together
with their composition rules. We develop a mathematical frame-
work to model communication at different levels of abstraction
from the point-to-point input specification to the library elements
and the final implementation.
Index Terms—Communication synthesis, System-on-chip, In-
terconnect synthesis, Performance optimization.
I. INTRODUCTION
WITH the advances of IC technology, global interconnects
have become the dominant factor in determining chip
performance: they not only account for a growing fraction of
the overall delay and power dissipation but also exacerbate
design problems such as noise coupling, routing congestion,
and timing closure, thereby imposing severe limitations on
design productivity [1], [2]. Because of
these characteristics, most VLSI circuits can be considered
distributed systems, a fact that challenges traditional design
methodologies and the electronic design automation tools that
are based on them [3]. Systems-on-Chip (SoCs) are typically
designed by assembling intellectual property (IP) components
from different vendors and/or different divisions of the same
company in an attempt to reduce time-to-market by reusing
pre-designed and pre-verified elements. However, since these
components are designed independently, the assembly step
is often a challenging problem that requires the design of
communication interfaces to match different protocols and data
parallelism, and the routing of global interconnect wires to
meet the constraints imposed by the target clock period.
The Open Core Protocol (OCP) [4] tackles this problem by
defining a standard open-domain interface with which IP cores
should comply to allow fast integration using appropriate
interconnect architectures. While there is no intrinsic
limitation on the interconnect architecture for OCP, most
designers rely on traditional bus architectures so that
pre-designed components can be used. In this domain,
proprietary protocols such as the ARM AMBA BUS and the IBM
CORECONNECT are popular among SoC designers, making the
adoption of a universal standard difficult at best.

This work was partially supported by the GSRC Focus Center, one of
five research centers funded under the Focus Center Research Program, a
Semiconductor Research Corporation program, and by the National Science
Foundation (Award #: 0644202).
A. Pinto is with United Technologies Research Center, East Hartford, CT;
most of this work was carried out while at the Dept. of EECS, U.C. Berkeley,
CA 94720 (apinto@eecs.berkeley.edu).
L.P. Carloni is with the Department of Computer Science, Columbia University,
New York, NY 10027 (luca@cs.columbia.edu).
A. Sangiovanni-Vincentelli is with the Dept. of EECS, U.C. Berkeley, CA
94720 (alberto@eecs.berkeley.edu). Manuscript received November 15,
2007; revised April 28, 2008. Copyright © 2008 IEEE. Personal use of this
material is permitted. However, permission to use this material for any other
purposes must be obtained from the IEEE by sending an email to
pubs-permissions@ieee.org.

We have argued that SoCs are distributed systems. For this
reason, bus architectures may not always be ideal; in fact,
a set of seminal papers has proposed scalable, multi-hop,
packet-switched Networks-on-Chip (NoCs) as an interesting
alternative for the integration of IP components [5]–[7].
Borrowing from the communication networks
literature, an NoC can be built through the combination of
heterogeneous elements such as interfaces, routers, and links.
NoC design is a challenging problem because there are
many degrees of freedom (e.g., network topologies, routing
protocols, flow-control mechanisms, positions of the
communication components and core interfaces) as well as
multiple optimization goals (e.g., performance, power, area
occupation and reliability). Hence, the problem has been
simplified by limiting the number and types of components
considered, by focusing on a subset of the relevant objectives,
by constraining NoC topology and component positions, and by
dividing the optimization process into successive stages.
Limiting the degrees of freedom also has the important side
effect of reducing implementation and layout complexity.
In [8] Bertozzi et al. propose NETCHIP, a synthesis flow to
derive an application-specific NoC by mapping the application
cores on standard topologies (e.g., torus, mesh, hypercube) in
an optimal way. In [9], Hu and Marculescu perform mapping
and routing on the NoC with optimal energy and performance.
Lahiri et al. use standard topologies consisting of sets of
channels (point-to-point links or shared busses) connected by
bridges [10]. Ogras et al. propose a perturbation method that
starting from the mapping of an application on a standard
topology optimizes performance and cost by inserting custom
long links between routers [11]. In [12] Murali et al. synthesize
NoCs that, albeit being more general than the approaches
that start from a regular topology, are still constrained to be
“two-level structures”, where star topologies are connected
by links to satisfy inter-cluster communication requirements.
In [13] Srinivasan et al. synthesize an application-specific
NoC without assuming any pre-existing interconnection fabric.
The synthesis problem is linearized and solved via integer
linear programming (ILP) that, due to its complexity, yields
running times on the order of several hours even for relatively
small instances. In [14] the same authors propose an efficient
approximation algorithm that is strongly tied to the cost model
and that does not consider constraints on the router size (i.e.
number of inputs and outputs).
While a rich set of interesting results exists in the literature,
there are few examples of practical applications of NoCs. In fact,
the debate between those who favor standard bus architectures
or variations thereof and those who advocate the adoption
of NoC approaches ranging from constrained architectures to
custom ones is vibrant. We do not take sides even though
the NoC approach has indisputable fundamental merits that
may make it successful in the long run. Instead, we propose a
general methodology for the design of on-chip communication
that can explore a large number of alternatives including as
special cases NoCs, bus architectures and hybrid ones. Thanks
to its generality our approach can be used to build a framework
where different constrained solutions are compared using a
number of evaluation factors.
We address the synthesis of optimal heterogeneous networks
by assembling components from a fine-grained library without
enforcing any constraint on their topology other than the ones
formally captured in the library. In particular, the network that
we obtain need not be direct, or even connected, if these
constraints are not captured in the composition rules of the
communication components.
Our approach is detailed in the rest of the paper as follows:
In Section II, we formally introduce the SoC design
specification (i.e., the function), the target technology process
with the library of communication components and the final
communication implementation. At first glance, the formalism
used in this section may seem overly complex. However, in
our opinion, the benefits it offers in terms of generality (the
same formalism applies independent of the communication
synthesis problem being investigated) outweigh its complexity.
In Section III, we show how to use this formal framework
to formulate a general optimization problem for a general
class of libraries. In Section IV, we use our framework to
formulate the communication synthesis problem in the specific
case of NoCs and provide a heuristic algorithm to solve the
resulting complex integer optimization problem. The algorithm
is independent of the specific input constraints and the
target platform. We report a customization of the algorithm
that takes into account bandwidth and latency constraints,
expressed as hop count, to synthesize a minimal-power NoC. The
general algorithmic framework can be customized in several
other ways by changing the cost function and constraints.
The material presented in this paper is the theoretical
foundation of COSI-OCC, a design flow for on-chip communication
synthesis design that is part of the COmmunication Synthesis
Infrastructure (COSI). COSI is a public-domain design
framework for the analysis and synthesis of interconnection
networks [15]. Our goal has been to provide an infrastructure
that can be used by researchers and designers as a basis
for developing new design flows by integrating additional
models, library elements, analysis tools and synthesis tools.¹
In Section V, we briefly describe COSI-OCC together with
the results we obtain by applying it to a number of test
cases for NoC design. We present more details on COSI and
COSI-OCC in [17] and we provide a detailed comparison of
our approach with other on-chip communication design tools
in [18].

¹ This approach is similar to the one our group followed in developing
MIS that has been used for years as a platform to invent and test new logic
synthesis algorithms [16].

[Figure 1: specification graph $N^{stb}_C$ with pads PAD1 to PAD4, the OCP
cores dem, aud, vid, mem and HDTV, and the AMBA core CPU; cores are
annotated with area and position, arrows with bandwidth, and the CPU
links are marked as mutually exclusive constraints.]
Fig. 1. The system-level specification of a simplified Set-Top Box. Each
core in the specification is annotated with an area in mm² and each arrow
is annotated with a bandwidth constraint in MB/s.
II. THE METHODOLOGY AND ITS MATHEMATICAL
REPRESENTATION
A. The Methodology
The general approach is based on Platform-Based Design
(PBD) [19] where the design specification and the imple-
mentation alternatives are kept separate. The methodology is
recursive: the functional specification is implemented on a
particular architecture through a series of refinement steps. At
each step, which corresponds to a specific level of abstraction,
the implementation alternatives are characterized by a set of
components, called library, that can be instantiated, config-
ured, and assembled according to specific rules, to derive a
more complex structure. The set of components together with
their compositional rules define a platform which is a family
of admissible solutions. The task of the synthesis process is
then to select one out of this family (a platform instance) and a
mapping of the specification onto the components that satisfy
the requirements and possibly optimize the objectives of the
design. The implementation refines both requirements and
platform instance and is defined at a lower level of abstraction.
In this process, it is essential to formalize how requirements
are specified, how the library is described, and how the
composition rules are defined and applied to generate the space
of admissible solutions. The composition rules can be used to
encode constraints related to the topology that the designer
wishes to consider, while the components in the library
determine which kinds of “nodes” can be selected. To select a
platform instance using an optimization algorithm, we must
associate with each library component (and with the hierarchical
composition of two or more of them) a “characterization” in
terms of cost, performance, power, and “type” (e.g., number of
ports and interface type of a router) that allows us to evaluate
metrics associated with the objectives and constraints of the
design.
To illustrate our approach, consider, for instance, the
simplified Set-Top Box System shown in Fig. 1. This design will
serve as an example throughout the paper. The SoC specification
contains six IP cores that exchange messages through a
dozen point-to-point channels and interact with the external
environment through four major I/O connections (pads). The
data input stream is processed by the demux core (dem) that
sends an audio stream to the audio decoder and a video stream
to the video decoder. The video decoder accesses the external
memory through a memory controller. The memory is used
both as an intermediate storage and to send the decoded stream
to the display controller and HDTV encoder. Finally, a master
CPU controls the operation of all the blocks and handles the
interaction with the environment. Additional non-functional
constraints are often part of the specification: e.g., the dem
core must occupy position (0.2, 2.44) (in millimeters); the CPU
communicates with the other cores one at a time.

[Figure 2: library templates $G_1$ and $G_2$ with interfaces IF1 and IF2
(can be placed only on chip boundaries), interfaces IF3 and IF4
(OCP/AMBA), OCP routers R (can be placed anywhere), and a link
annotated with distance $l_{st}$ and bandwidth $b_{max}$. Two router
characterizations are given: energy per flit 8.2 pJ, leakage at 1 GHz
0.85 mW, area 5888 µm²; and energy per flit 35.2 pJ, leakage at 1 GHz
5.1 mW, area 31488 µm².]
Fig. 2. A library of predefined on-chip communication components.
Fig. 2 shows a library of on-chip communication components
that contains a set of communication templates including
interfaces IF1 and IF2 to connect pads with OCP cores, and
interfaces IF3 and IF4 to connect AMBA cores with OCP
cores. The library also contains various OCP routers that differ
in the number of I/O ports. Each component is characterized
by performance metrics, cost functions, and composition rules.
Possible characterizations include: a link in a given metal layer
can sustain up to a certain bandwidth $b_{max}$ and span a
distance no greater than $l_{st}$; a parameterized synthesizable
router may not have more than a maximum number of I/O ports; and
an IP core may feature only a specific protocol interface.
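The characterizations listed above can be read as simple feasibility predicates attached to library components. The following Python sketch is only an illustration under stated assumptions: the class names `LinkModel` and `RouterModel`, their fields, and all numeric values are hypothetical and are not taken from the paper or from COSI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical characterization of a link in a given metal layer."""
    b_max: float  # maximum sustainable bandwidth (assumed unit: MB/s)
    l_st: float   # maximum span (assumed unit: mm)

    def supports(self, bandwidth: float, length: float) -> bool:
        # A link is usable only if both limits are respected.
        return bandwidth <= self.b_max and length <= self.l_st


@dataclass(frozen=True)
class RouterModel:
    """Hypothetical characterization of a parameterized router."""
    max_ports: int  # maximum number of input (and of output) ports

    def supports(self, n_inputs: int, n_outputs: int) -> bool:
        return n_inputs <= self.max_ports and n_outputs <= self.max_ports


# Illustrative numbers only.
link = LinkModel(b_max=800.0, l_st=1.5)
router = RouterModel(max_ports=8)
print(link.supports(bandwidth=538.0, length=1.2))   # within both limits
print(link.supports(bandwidth=538.0, length=2.0))   # too long for one link
print(router.supports(n_inputs=5, n_outputs=9))     # too many outputs
```

Keeping each limit as a predicate on the component mirrors the idea that the library, rather than the synthesis algorithm, owns the constraints.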
A communication structure that serves as the communication
backbone for an SoC is constructed by instantiating
communication templates (i.e., components from the library) and
composing them. For example, PAD4 in Fig. 3 is connected to
the memory controller by instantiating templates $G_1$ and $G_2$.
Fig. 3 shows two alternative NoC implementations of the same
specification. Network $G^1_P$ is obtained by instantiating the
necessary interfaces plus one 8 × 8 router, while $G^2_P$ is
obtained by instantiating only 2 × 2 routers. The performance
and cost of the communication structure depend on the performance
metrics and the cost functions of each component.
B. Basic Definitions
[Figure 3: two platform instances, $G^1_P$ and $G^2_P$, each connecting
pads P1 to P4 with the cores Demux, Audio, Video, HDTV, Mem Ctrl and
CPU; callouts mark the instantiations of templates $G_1$ and $G_2$.]
Fig. 3. Two NoC instances obtained by instantiation and composition of
communication components.

The basic element of our formal framework is the
communication structure. A communication structure is a set of
interconnected components with associated quantities such as
latency, bandwidth and position. A quantity $q$ takes on values
from a domain $D_q$ that is partially ordered by a relation
$\preceq_q$. The ordering relation captures the notion of a value
being “better” than another value. We assume that $\bot$, which
denotes no value, always belongs to the domain $D_q$ of a quantity.
Also, $\bot \preceq_q \nu$ for all $\nu \in D_q$. A quantity $q$ is
finite if $D_q$ is a finite set, and it is bounded if there exists
an element $\bar{\nu} \in D_q$ such that $\nu \preceq_q \bar{\nu}$
for all $\nu \in D_q$. Bandwidth, for instance, is modeled by a
quantity $b$. Its domain $D_b$ can either be the set of natural
numbers, or it can be a discrete set of values like
$D_b = \{10, 100\}$ (in MB/s). The ordering relation $\preceq_b$ is
the same as the ordering relation defined on natural numbers.
The domain $D_h$ of the quantity $h$ representing latency can
be defined as a finite set of integer numbers, but the ordering
relation $\preceq_h$ is now reversed, i.e., $100\,(\mathrm{ns}) \preceq_h 10\,(\mathrm{ns})$.
Given a vector of quantities $q = (q_1, \ldots, q_k)$, the domain
of $q$ is the cross product $D_{q_1} \times \ldots \times D_{q_k}$. It is
partially ordered by a relation $\preceq_q$ point-wise induced by the
relations $\preceq_{q_i}$. We use the notation $\bot^n$ to denote an
$n$-tuple of $\bot$ values. $[X \to Y]$ denotes the set of all
functions from set $X$ to set $Y$.
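As an informal illustration of quantities and their orderings, the sketch below encodes the "no value" element as Python's `None`, the natural order for bandwidth, the reversed order for latency, and the point-wise order on vectors. The function names (`make_order`, `vector_leq`) are hypothetical helpers introduced here, not notation from the paper.

```python
from typing import Callable, Optional, Sequence

BOTTOM = None  # stands for the "no value" element, below every other value

Order = Callable[[Optional[int], Optional[int]], bool]


def make_order(leq: Callable[[int, int], bool]) -> Order:
    """Extend an order on plain values to a domain that also contains BOTTOM."""
    def order(a: Optional[int], b: Optional[int]) -> bool:
        if a is BOTTOM:
            return True    # BOTTOM is below everything, including itself
        if b is BOTTOM:
            return False   # no proper value is below BOTTOM
        return leq(a, b)
    return order


# Bandwidth: a larger value is better, so the usual order on naturals applies.
bandwidth_leq = make_order(lambda a, b: a <= b)
# Latency: a smaller value is better, so the order is reversed.
latency_leq = make_order(lambda a, b: a >= b)


def vector_leq(orders: Sequence[Order], u: Sequence, v: Sequence) -> bool:
    """Point-wise order on tuples of values, one order per quantity."""
    return all(o(a, b) for o, a, b in zip(orders, u, v))


orders = (bandwidth_leq, latency_leq)
print(bandwidth_leq(10, 100))                      # True
print(latency_leq(100, 10))                        # True
print(vector_leq(orders, (10, 100), (100, 10)))    # True
print(vector_leq(orders, (10, 10), (100, 100)))    # False
```

The last call shows why the induced order is only partial: (10, 10) and (100, 100) are incomparable, since one has the better latency and the other the better bandwidth.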
Definition 1. A communication structure is a tuple $N(C, q, L)$
where $C = \{c_1, \ldots, c_n\}$ is a set of components,
$q = (q_1, \ldots, q_k)$ is a vector of quantities, and
$L \subseteq [C \to D_q]$ is a set of communication configurations.
Set $C$ is partitioned into the set of nodes $V \subseteq U_V$ and
the set of links $E \subseteq V \times V$.
The set $L$ of communication configurations captures the
different ways in which quantities can be associated with
components. The set $U_V$ is called the node universe. Similarly, the
component universe is $U_C = U_V \cup U_V^2$, and the configuration
universe is $U_q = \bigcup_{C \subseteq U_C} [C \to D_q]$, the union of
all possible configurations for any subset of components. Let $G_q$
be the set of all communication structures with quantities $q$.
For a given subscript $\sigma$ and vector of quantities $q$, let
$N_\sigma \in G_q$ be a communication structure. Then, we use
$C_\sigma$, $V_\sigma$, $E_\sigma$ and $L_\sigma$ to denote the sets of
components, nodes, links, and configurations of $N_\sigma$,
respectively.
Example 1. (Communication structure): Consider the vector of
quantities $q = (x, y)$ representing the horizontal and vertical
coordinates of a component. The domain $D_q$ is the set of points where
nodes can be placed. This domain can be described, for instance,
by a discrete set of points or by a union of rectangles. If there are no
preferred positions, the elements of $D_q$ are not comparable; therefore,
the order $\preceq_q$ is a flat one, with $\bot$ being the minimum element.
Given a communication structure $N(C, q, L)$, the set of configurations $L$
captures all the admissible placements of the nodes in $V$. Since we
do not assign any position to the links, for all $l \in L$ and for all
links $e \in E$, $l(e) = \bot^2$. The additional constraint that no two nodes
occupy the same position requires that for all $l \in L$, and for all pairs
of distinct nodes $u, v \in V$, $l(u) \neq l(v)$.
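A communication structure in the sense of Definition 1, together with the placement constraints of Example 1, could be sketched in Python as follows. The class `CommStructure`, the function `is_admissible`, and the node names are hypothetical; `None` again stands for the "no value" element.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple, Union

Node = str
Link = Tuple[str, str]
Value = Tuple[Optional[float], ...]           # None plays the role of "no value"
Configuration = Dict[Union[Node, Link], Value]


@dataclass
class CommStructure:
    nodes: FrozenSet[Node]                    # V
    links: FrozenSet[Link]                    # E, a subset of V x V
    quantities: Tuple[str, ...]               # q
    configurations: List[Configuration]       # L


def is_admissible(n: CommStructure, l: Configuration) -> bool:
    """Check the two constraints of Example 1 on one configuration."""
    no_value = (None,) * len(n.quantities)
    # Links carry no position.
    links_unplaced = all(l[e] == no_value for e in n.links)
    # No two distinct nodes share a position.
    placed = [l[v] for v in n.nodes]
    distinct = all(p != q for p, q in combinations(placed, 2))
    return links_unplaced and distinct


good = {"u": (0.0, 0.0), "v": (1.0, 0.0), ("u", "v"): (None, None)}
clash = {"u": (0.0, 0.0), "v": (0.0, 0.0), ("u", "v"): (None, None)}
n = CommStructure(frozenset({"u", "v"}), frozenset({("u", "v")}), ("x", "y"), [good])
print(is_admissible(n, good))    # True
print(is_admissible(n, clash))   # False
```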

We introduce two scoping operators on configurations.
Given a communication structure $N(C, q, L)$, the restriction
of a configuration $l \in L$ to a subset of components $C' \subseteq C$,
denoted by $l|_{C'}$, is a function $f : C' \to D_q$ such that
$f(c) = l(c)$ for all $c \in C'$. In particular, $l|_V$ and $l|_E$ are the
restrictions of a configuration $l$ to the set of nodes and links,
respectively. Given a vector $q'$ obtained from $q$ by projecting
away some of the quantities, the projection of a configuration
onto $q'$ is denoted by $l[q']$, and corresponds to ignoring
the quantities not in $q'$. We naturally extend these operators
to sets of configurations, e.g., $L[(x)]|_V$ denotes the possible
assignments of horizontal positions to nodes in Example 1.
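The two scoping operators can be sketched on configurations stored as dictionaries from components to value tuples. The helper names `restrict` and `project` and the sample coordinates are assumptions made for this illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Configuration = Dict[Hashable, Tuple]


def restrict(l: Configuration, subset: Iterable[Hashable]) -> Configuration:
    """Restriction of l to a subset of components: same values, smaller domain."""
    return {c: l[c] for c in subset}


def project(l: Configuration, quantities: Sequence[str],
            keep: Sequence[str]) -> Configuration:
    """Projection of l onto the quantities in `keep`, ignoring the others."""
    idx = [quantities.index(name) for name in keep]
    return {c: tuple(value[i] for i in idx) for c, value in l.items()}


quantities = ("x", "y")
nodes = ["u", "v"]
l = {"u": (0.2, 2.44), "v": (1.0, 0.5), ("u", "v"): (None, None)}

# Horizontal positions of the nodes only: project onto (x), restrict to V.
print(restrict(project(l, quantities, ("x",)), nodes))
# {'u': (0.2,), 'v': (1.0,)}
```

Applying the same two functions to every element of a set of configurations gives the extension to sets mentioned in the text.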
We use communication structures to capture three important
and related concepts in our framework: the specification of an
on-chip communication synthesis problem, the collection of
alternatives to implement the communication (the platform in-
stances), and the final communication implementation. These
three structures correspond to different abstraction levels. In
Section II-F we establish precise relations among them to
define when an implementation refines a platform instance
and supports a specification. It is often necessary to compare
specifications, platform instances and implementations; e.g.
it is important to be able to order different specifications
depending on how stringent the constraints are. Similarly, it is
important to compare platform instances depending on their
performance. Therefore, we define an ordering relation $\preceq_q$ on
the set of communication structures $G_q$ as follows:
Definition 2. Given two communication structures $N_1, N_2 \in G_q$,
$N_1 \preceq_q N_2$ if and only if $C_1 \subseteq C_2$, and for all
$l_1 \in L_1$ there exists $l_2 \in L_2$ such that for all $c \in C_1$,
$l_1(c) \preceq_q l_2(c)$.
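Definition 2 translates almost literally into nested quantifiers. The sketch below assumes a single quantity (bandwidth with the natural order) and uses hypothetical names and values; `structure_leq` is an illustrative helper, not part of the paper's tool flow.

```python
from typing import Callable, Dict, Hashable, List, Set

Configuration = Dict[Hashable, int]


def structure_leq(c1: Set[Hashable], l1_set: List[Configuration],
                  c2: Set[Hashable], l2_set: List[Configuration],
                  value_leq: Callable[[int, int], bool]) -> bool:
    """N1 <= N2 iff C1 is a subset of C2 and every configuration of N1
    is dominated, component by component, by some configuration of N2."""
    if not c1 <= c2:
        return False
    return all(
        any(all(value_leq(l1[c], l2[c]) for c in c1) for l2 in l2_set)
        for l1 in l1_set
    )


def bandwidth_leq(a: int, b: int) -> bool:
    return a <= b


# Hypothetical structures: N1 has one link, N2 has two links and two configurations.
n1_components = {("a", "b")}
n1_configs = [{("a", "b"): 100}]
n2_components = {("a", "b"), ("b", "c")}
n2_configs = [{("a", "b"): 50, ("b", "c"): 10},
              {("a", "b"): 400, ("b", "c"): 10}]

print(structure_leq(n1_components, n1_configs,
                    n2_components, n2_configs, bandwidth_leq))   # True
print(structure_leq(n2_components, n2_configs,
                    n1_components, n1_configs, bandwidth_leq))   # False
```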
C. Communication Specification
We express the specification of an on-chip communication
synthesis problem as a communication structure $N_C \in G_{q_C}$,
where $q_C = (x, y, a, \tau, b, h)$. Nodes represent IP cores (that
can be sources and/or destinations of a communication) and
have an associated position $(x, y)$ in the Euclidean plane, an
area $a$, and a type $\tau$ denoting the supported interface protocol.
Links represent distinct inter-core communications. Each link
is associated with two quantities: a minimum average bandwidth
$b$ and a maximum latency $h$. Each configuration $l \in L_C$
represents a possible combination of the positions and interfaces
of the cores, and bandwidth and latency requirements
for the communication among them (e.g., to capture different
communication scenarios or different chip floor-plans).
Example 2. (Communication specification): In the set-top box
example of Fig. 1, the position of the dem core is fixed at coordinates
(0.2, 1.44). Hence, each configuration $l \in L^{stb}_C$ must be such that
$l(dem) = (0.2, 1.44, 0.55, \mathrm{OCP}, \bot, \bot)$. Since there are no other
floor-planning constraints, the positions of the other IP cores can be
determined during the synthesis process. The double arrows indicate
that the constraints between the CPU and the IP cores are mutually
exclusive, i.e., the CPU can only communicate with one core at a
time: for all $l \in L^{stb}_C[(b)]$, only one among $l((CPU, dem))$,
$l((CPU, aud))$, $l((CPU, vid))$, $l((CPU, mem))$ can be different
from zero.
Since the performance and cost of the network depend on
the core positions, an important step in our design flow is to
restrict the possible configurations of a specification by fixing
the position of the ports of each core. In COSI-OCC we rely
on the PARQUET floor-planner [20] to obtain these positions.
D. Communication Structures Instantiation and Composition
To allow the incremental design of complex on-chip
communications, we introduce two operations: renaming and
parallel composition. The identifiers of two nodes in different
sub-nets can be renamed to be the same to indicate that either
one IP implements both or an implicit connection is present
between the two sub-nets at these nodes. A renaming function
$r : U_V \to U_V$ is a bijection on the vertex universe. $\mathcal{R}$
denotes the set of all renaming functions. Given a communication
structure $N$ and a renaming function $r$, with abuse of notation
we use $r(N)$ to denote a new communication structure where
the components have been renamed according to $r$.
The composition of two communication structures $N_1$ and
$N_2$, denoted by $N_1 \parallel N_2$, results in a new communication
structure $N$ that contains the set of components $C_1 \cup C_2$. We
define the operator $\parallel$ by two rules. The first rule establishes
how the configurations of the components being merged contribute to
the formation of the ones of the combined entity. The rule
is expressed by the binary operator $\oplus_q$ that is commutative
and associative so that the composition of communication
structures also satisfies these properties. This is important since
we want the result of the composition to be independent of
the order in which communication structures are instantiated
and composed. Further, if $l_1 : C_1 \to D_q$ and $l_2 : C_2 \to D_q$,
then $l = l_1 \oplus_q l_2$ must be such that $l : C_1 \cup C_2 \to D_q$.
This operator is defined on sets of configurations as follows:
let $L_1 \subseteq [C_1 \to D_q]$ and $L_2 \subseteq [C_2 \to D_q]$, then
$L_1 \oplus_q L_2 = \{ l_1 \oplus_q l_2 \mid l_1 \in L_1 \wedge l_2 \in L_2 \}$.
A second rule restricts the legal compositions by forcing the composed
structure to satisfy certain properties. This rule, which defines a
class of communication structures to which the result of the
composition must belong, is given by a relation between the components
and the configurations and is denoted by $R \subseteq 2^{U_C} \times U_q$.
Definition 3. Given a binary operator $\oplus_q$ and a composition
rule $R$, and two communication structures $N_1$ and $N_2$
belonging to $G_q$, their composition is
$N_1 \parallel^{R}_{\oplus_q} N_2 = N \in G_q$, where $C = C_1 \cup C_2$ and
$L = \{ l \in L_1 \oplus_q L_2 \mid (C, l) \in R \} \neq \emptyset$; the
composition is not defined if $L = \emptyset$.
Example 3. (Composition of communication specifications): We
want to add an extra video channel to our set-top box chip by reusing
the already instantiated IP cores. In Fig. 4, $N_{vch}$ is a communication
structure capturing the communication requirements of a set-top-box
video channel. To reuse the same IP cores, we rename the nodes
according to a renaming function $r$ such that $r(d) = dem$,
$r(m) = mem$, $r(v) = vid$ and $r(dec) = HDTV$. Since the new video
channel must be displayed on the same device, $r(P2) = PAD3$
forces the same output pad to be reused. For the demodulator input,
though, we need an additional pad. We also add a new pad to
connect a second memory bank to the memory controller. Fig. 4
shows the result of the composition
$N^{stb}_C \parallel^{R_C}_{\oplus_{q_C}} r(N_{vch})$. Intuitively,
we have added the bandwidths of common requirements and we have
restricted the position of the dem core. More precisely, we need
to define the operator $\oplus_{q_C}$. Given two communication
structures $N_1, N_2 \in G_{q_C}$, let $l_1 \in L_1$ and $l_2 \in L_2$ be two
configurations. The configuration $l = l_1 \oplus_{q_C} l_2$ is defined as follows:

[Figure 4: the specification $N^{stb}_C$ (cores dem, aud, vid, HDTV, mem
and CPU; pads PAD1 to PAD4), the video-channel structure $N_{vch}$
(nodes d, v, dec, m; pads P1 to P3), its renamed version $r(N_{vch})$
(with the new pads PAD5 and PAD6), and the composition
$N^{stb}_C \parallel r(N_{vch})$.]
Fig. 4. Example of parallel composition of networks: the set-top box is
expanded by adding a video channel and an extra off-chip memory bank.
• there is no “interference” between components not shared by $N_1$
and $N_2$, i.e., $l(c) = l_1(c)$ for all $c \in C_1 \setminus C_2$, and
$l(c) = l_2(c)$ for all $c \in C_2 \setminus C_1$;
• common nodes must be “compatible”, meaning that they must
agree on the positions and interfaces:
$$\forall c \in V_1 \cap V_2, \quad l(c) = \begin{cases} l_1(c) & \text{if } l_1(c) = l_2(c) \\ \bot^6 & \text{if } l_1(c) \neq l_2(c) \end{cases}$$
(notice that it is sufficient to have some compatible configurations
for the composition to be defined);
• for all $c \in E_1 \cap E_2$, $l[(b)](c) = l_1[(b)](c) + l_2[(b)](c)$ and
$l[(h)](c) = \min\{ l_1[(h)](c), l_2[(h)](c) \}$.
We now define the composition rules. First, we specify that each node
has an assigned position and interface protocol:
$$R^v_C = \{ (C, l) \in 2^{U_C} \times U_{q_C} \mid \forall v \in C,\ q \in \{x, y, a, \tau\},\ l[(q)](v) \neq \bot \}.$$
A second rule may depend on the area budget $\nu_a$ for the IP cores on the chip:
$$R^a_C = \Big\{ (C, l) \in 2^{U_C} \times U_{q_C} \;\Big|\; \sum_{c \in V} l[(a)](c) \le \nu_a \Big\}.$$
The two rules are combined as $R_C = R^v_C \cap R^a_C$. We give examples
of other rules in Section II-E.
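The operator of Example 3 and its two composition rules can be sketched as follows. This is an illustration under simplifying assumptions: each component stores only the quantities that are relevant to it (position, area and type for nodes; bandwidth and latency for links), `None` stands for the "no value" element, and all names and numeric values (`compose`, `satisfies_rules`, the coordinates, bandwidths and budget) are hypothetical.

```python
BOTTOM = None  # the "no value" element
NODE_QUANTITIES = ("x", "y", "a", "tau")


def compose(l1: dict, l2: dict) -> dict:
    """Merge two configurations following the three cases of Example 3."""
    l = {}
    for c in set(l1) | set(l2):
        if c not in l2:                  # component only in the first structure
            l[c] = dict(l1[c])
        elif c not in l1:                # component only in the second structure
            l[c] = dict(l2[c])
        elif isinstance(c, tuple):       # shared link: add bandwidths, min latency
            l[c] = {"b": l1[c]["b"] + l2[c]["b"],
                    "h": min(l1[c]["h"], l2[c]["h"])}
        elif l1[c] == l2[c]:             # shared node, compatible
            l[c] = dict(l1[c])
        else:                            # shared node, incompatible
            l[c] = {q: BOTTOM for q in NODE_QUANTITIES}
    return l


def satisfies_rules(l: dict, area_budget: float) -> bool:
    """Every node fully specified, and total node area within the budget."""
    nodes = [c for c in l if not isinstance(c, tuple)]
    specified = all(l[v][q] is not BOTTOM for v in nodes for q in NODE_QUANTITIES)
    return specified and sum(l[v]["a"] for v in nodes) <= area_budget


dem = {"x": 0.2, "y": 1.44, "a": 0.55, "tau": "OCP"}
mem = {"x": 1.0, "y": 1.0, "a": 0.65, "tau": "OCP"}
l1 = {"dem": dem, "mem": mem, ("dem", "mem"): {"b": 100, "h": 4}}
l2 = {"dem": dem, "mem": mem, ("dem", "mem"): {"b": 50, "h": 2}}

l = compose(l1, l2)
print(l[("dem", "mem")])                       # {'b': 150, 'h': 2}
print(satisfies_rules(l, area_budget=2.0))     # True
print(satisfies_rules(l, area_budget=1.0))     # False
```

A configuration whose shared nodes disagree ends up with no value on those nodes, fails the first rule, and is therefore dropped from the composed structure; the composition stays defined as long as at least one merged configuration survives.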
E. Libraries and Platforms
A platform is the set of all valid compositions that can be
obtained by assembling the components from a given commu-
nication library. These components either have a corresponding
implementation that is ready to be used or can be synthesized
by tools operating at a lower level of abstraction.
A communication library $\mathcal{L}$ is a collection of
communication structures, i.e., $\mathcal{L} \subseteq G_q$. The elements of a
communication library are templates that can be instantiated and
composed to obtain more complex communication structures.
The vector of quantities that characterize our platform is
$q_P = (x, y, \tau, in, out, \gamma)$, where each node has an associated
position $(x, y)$, a type $\tau$, and two multisets $in$ and $out$ of input
and output port interfaces, respectively. Each link is associated
with a capacity $\gamma$, i.e., the maximum bandwidth that it
can sustain (Section II-E). Unlike $q_C$, vector $q_P$
represents the capabilities of a component; e.g., quantities
$x$ and $y$ in $q_C$ denote the coordinates where a component
must be located, whereas the same variables in $q_P$ denote the
coordinates where a component can be located.
[Figure 5: the library $\mathcal{L}$ (bus node, mesh node, bus segment,
EW mesh link, NS mesh link, and interfaces; mesh links are annotated
with capacity $\gamma^M_{max}$, bus segments with $[0, \gamma^B_{max}]$)
and two platform instances, $N^1_P$ (bus) and $N^2_P$ (mesh),
connecting dem, aud, vid, HDTV, mem and CPU.]
Fig. 5. Example of a library $\mathcal{L}$ and two alternative implementations for the
set-top box based on composing elements instantiated from $\mathcal{L}$.
The definition of composition $\parallel^{R_P}_{\oplus_{q_P}}$ captures the set of
valid communication architectures (i.e., communication platform
instances) that can be obtained out of the communication library.
The definition of the rules is more involved than in the case of
Example 3 and depends on the design space of interest. The
following example shows the flexibility that our framework
provides in defining the set of communication structures that
can be obtained by composition of library elements.
Example 4. Composition rules: Consider a communication library
whose elements are nodes and links. Fig. 5 shows a communication
library L and two possible platform instances N
1
P
and N
2
P
. Library
L contains the following set of components: a bus node and a
bidirectional bus-segment connecting two bus nodes; a mesh node
and two mesh links for East-West connection and North-South con-
nection, respectively. It contains also a set of interface communication
structures to connect IP cores to bus nodes and mesh nodes. Each
node has an associated multi-set of input interfaces in and output
interfaces out (depicted as filled and non-filled shapes attached to
nodes in Fig. 5). A link connects an output interface of a node to
an input interface of another node. Mesh links have an associated
maximum capacity γ
M
max
while bus-segments (including the link
between an IP core and a bus node) have an associated interval
of capacities [0, γ
B
max
] corresponding to different configurations. We
introduce two more quantities i
x
and i
y
for mesh structures that
are the row and column index of a node. Now, we state a set of
composition rules such that the only platform instances that are valid
in this platform are either busses or meshes:
1) The number of bus nodes can be at most the number of bus
segments minus one. This ensures that the topology of a bus
is a collection of trees. Also, since a bus node has only two
bidirectional ports to connect to other bus nodes,each bus is a
chain of IP cores (as shown by the platform instance N
1
P
).
2) An East-West mesh link can connect two mesh nodes $(u, v)$
only if $l[(i_x, i_y)](u) = (i, j)$ and $l[(i_x, i_y)](v) = (i, j + 1)$;
a North-South mesh link can connect two mesh nodes $(u, v)$
only if $l[(i_x, i_y)](u) = (i, j)$ and $l[(i_x, i_y)](v) = (i + 1, j)$ (as
shown by the platform instance $N_P^2$).
3) A bus configuration $l$ forces the sum of the capacities of the
links connecting the cores to the bus to be less than $\gamma^B_{\max}$. This
restricts the possible bus organizations and models the sharing
of the bus capacity among all connected IP cores.
These three rules define $R_P$ for this specific platform.
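
As an illustration only, rules 2 and 3 of Example 4 can be read as simple predicates over a candidate platform instance. The sketch below is not the authors' implementation: the names `MeshLink`, `mesh_link_ok`, and `bus_capacity_ok`, the string tags "EW" and "NS", and the dictionary `pos` (standing in for the configuration $l[(i_x, i_y)]$) are all hypothetical choices made for this example, and capacities are plain numbers in arbitrary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshLink:
    """Hypothetical record for a directed mesh link from src to dst."""
    kind: str  # "EW" (East-West) or "NS" (North-South); assumed tags
    src: str
    dst: str


def mesh_link_ok(link, pos):
    """Rule 2: pos maps a node name to its (i_x, i_y) = (row, column) index.

    An East-West link must go from (i, j) to (i, j + 1);
    a North-South link must go from (i, j) to (i + 1, j).
    """
    i, j = pos[link.src]
    target = pos[link.dst]
    if link.kind == "EW":
        return target == (i, j + 1)
    if link.kind == "NS":
        return target == (i + 1, j)
    return False  # unknown link kind is rejected


def bus_capacity_ok(core_link_caps, gamma_b_max):
    """Rule 3: every core-to-bus link capacity lies in [0, gamma_b_max]
    and their sum is strictly less than gamma_b_max."""
    in_range = all(0 <= c <= gamma_b_max for c in core_link_caps)
    return in_range and sum(core_link_caps) < gamma_b_max


# Usage: a 2x2 corner of a mesh and a bus shared by three IP cores.
pos = {"a": (0, 0), "b": (0, 1), "c": (1, 0)}
ew_valid = mesh_link_ok(MeshLink("EW", "a", "b"), pos)    # same row, next column
ns_valid = mesh_link_ok(MeshLink("NS", "a", "c"), pos)    # next row, same column
ew_invalid = mesh_link_ok(MeshLink("EW", "a", "c"), pos)  # wrong direction for EW
bus_valid = bus_capacity_ok([100, 200, 300], 800)
bus_full = bus_capacity_ok([400, 400], 800)               # sum equals the bound
```

Under these assumptions, a platform instance would be accepted only if every mesh link passes `mesh_link_ok` and every bus passes `bus_capacity_ok`; rule 1 is a counting constraint on bus nodes and bus segments and is left out of the sketch.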
