The Routability of Multiprocessor Network Topologies in FPGAs
Manuel Saldaña, Lesley Shannon and Paul Chow
Dept. of Electrical and Computer Engineering
University of Toronto
Toronto, Ontario, Canada, M5S 3G4
{msaldana,lesley,pc}@eecg.toronto.edu
ABSTRACT
A fundamental difference between ASICs and FPGAs is that wires in ASICs are designed such that they match the requirements of a particular design. Wire parameters such as length, width, layout and the number of wires can be varied to implement a desired circuit. Conversely, in an FPGA, area is fixed and routing resources exist whether or not they are used, so the goal becomes implementing a circuit within the limits of available resources. The architecture for existing routing structures in FPGAs has evolved over time to suit the requirements of large, localized digital circuits. However, FPGAs now have the capacity to implement networks of such circuits, and system-level interconnection becomes a key element of the design process.
Following a standard design flow and using commercial tools, we investigate how this fundamental difference in resource usage affects the mapping of various network topologies to a modern FPGA routing structure. By exploring the routability of different multiprocessor network topologies with 8, 16 and 32 nodes on a single FPGA, we show that the difference in resource utilization between the ring, star, hypercube and mesh topologies is not significant up to 32 nodes. We also show that a fully-connected network can be implemented with at least 16 nodes, but with 32 nodes it exceeds the routing resources available on the FPGA. We also derive a cost metric that helps to estimate the impact of the topology selection based on the number of nodes.
Categories and Subject Descriptors
C.1.2 [Processor Architectures]: Multiple Data Stream
Architectures (Multiprocessors); D.0 [Computer Systems
Organization]: General
General Terms
Design
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SLIP’06, March 4–5, 2006, Munich, Germany.
Copyright 2006 ACM 1-59593-255-0/06/0003 ...$5.00.
Keywords
Multiprocessor, FPGA, Network-on-Chip, Topology, Inter-
connect
1. INTRODUCTION
With the growing complexity of System-on-Chip (SoC)
circuits, more sophisticated communication schemes are re-
quired to connect the increasing number and variety of in-
tellectual property (IP) blocks. Approaches like AMBA [1],
CoreConnect [2], WISHBONE [3] and SiliconBackplane [4]
follow a shared bus scheme that works well for Master-
Slave communication patterns, where there are peripherals
(slaves) that wait for data to be received or requested from
a more complex processing IP (master). When there are
several masters (e.g., processors) in the system, synchro-
nization, data interchange and I/O may saturate the bus,
and contention will slow down data transfers.
The Network-on-Chip (NoC) [5, 6] provides a possible so-
lution for this problem by creating a scalable interconnec-
tion scheme. The concept uses a set of buses connected to
routers or switches that interchange packets, much in the
same way as traditional computer networks or multiproces-
sor machines do. Consequently, NoC approaches have design
parameters and properties similar to traditional networks.
One of these parameters is the topology, which defines the
interconnection pattern between the routers and switches.
Multiple topologies have been studied for NoCs on ASICs
[7, 8]. A popular choice is the mesh [6, 9] because it provides structure, better control over electrical characteristics, and a simple packet routing algorithm. These advantages are clear for ASICs, but not necessarily for FPGAs [10]. The electrical characteristics of the FPGA are handled by the chip vendor, not by the user. As for structure, it is perhaps in-
tuitive to use a mesh topology in FPGAs since the recon-
figurable fabric layout is in the form of a mesh. However,
the placement and routing of components on an FPGA will
not typically result in a symmetric, well-organized struc-
tured layout that resembles a mesh. Furthermore, manually
restricting the placement of components or routing of nets
may lead to inefficient resource utilization for the logic that
is not part of the network. Finally, there are other topolo-
gies like hypercube or torus networks, or even tree topologies
that also have simple routing algorithms.
In this paper, we compare the routability of point-to-point
network topologies on FPGAs by measuring the impact of
each topology on a soft multiprocessor system implemented
on modern commercial FPGAs. We do this by measur-

ing the logic utilization, logic distribution (area), maximum
clock frequency, number of nets, and the place and route
time for five different network topologies. We also derive a
cost metric to try to extract trends for larger systems.
The rest of this paper is organized as follows. Section
2 provides some background about research on NoCs. Sec-
tion 3 describes the topologies implemented and gives a brief
description of the block used as the network nodes, which
we call the computing node. Section 4 describes the im-
plementation platform, and how the systems are generated.
Section 5 presents the results obtained for the baseline sys-
tem and Section 6 explores the chip area required for each
topology. Section 7 shows the highest frequency that each
system could achieve. Section 8 presents a metric we pro-
pose to evaluate the topologies and Section 9 provides some
conclusions.
2. RELATED WORK
In this section we present examples of typical research
on NoCs, and how it relates to this study. Brebner
and Levi [10] discuss NoC implementations on FPGAs, but
their focus is on the issues of using packet switching on a
mesh topology in the FPGA and on implementing crossbar
switches in the routing structure of the FPGA. Most NoC
work assumes ASIC implementations and there are numer-
ous studies including work on mesh topologies [6][9] and fat
trees [7]. Other studies on NoCs are done using register-
transfer-level simulations [7] and simulation models [11], but
they do not show the implementation side of the NoC. In-
stead, we focus on the interaction between the network to-
pologies and how well they can be mapped to a fixed FPGA
routing fabric. We create actual implementations by per-
forming synthesis, mapping, placement and routing for real
FPGAs using commercial tools.
Research has been done on synthesizing application-specific
network topologies [12]. A more general study on the routabil-
ity of different topologies would require the ability to gen-
erate arbitrary interconnection patterns. In our work, we
created a design flow and tools to automatically generate
multiprocessor systems using a set of well known topologies.
Based on the philosophy of routing packets, not wires [9,
13], NoC architectures have been proposed as packet-
switching networks, with the network interface itself being
the focus of much of the research. In this paper, we use a
simple network interface, more similar to a network hub than
a switch as it does not provide packet forwarding. Pack-
ets can only be sent to, and received from nearest-neighbor
nodes. This makes the network interface extremely simple,
but it is sufficient for our purposes as the focus of this work is
on the routability of various topologies, not on the switching
element architecture.
3. EXPERIMENTAL ENVIRONMENT
The actual processor and network interface used are not
the critical elements in this study. What is required is to
create circuits that force particular routing patterns between
the computing nodes to see how the implementation re-
sources of these circuits on the FPGAs vary as the patterns,
i.e., topologies, are changed. We try five different topologies
and three different system sizes (8, 16 and 32 nodes) on five
different FPGAs with enough resources to implement such
systems.
Table 1: Characteristics of the topologies studied

Topology          Diameter         Link Complexity   Degree     Regular   Bisection Width
ring              n/2              n                 2          yes       2
star              2                n-1               1, n-1     no        1
square mesh       2(n^(1/2) - 1)   2(n - n^(1/2))    2, 3, 4    no        n^(1/2)
hypercube         log2(n)          n*log2(n)/2       log2(n)    yes       n/2
fully connected   1                n(n-1)/2          n-1        yes       n^2/4
In this section we describe the Network-on-Chip we used
to perform the experiments, which are explained later in this
paper.
3.1 Network Topologies
Networks can be classified into two categories: static networks, which consist of fixed point-to-point connections between processors, and dynamic networks, which have active elements, such as switches, that can change the connectivity pattern in the system according to a protocol. In an
FPGA, the network can be dynamically reconfigured to
adapt to communication patterns by utilizing the reconfig-
urability [14] of the FPGA.
In this paper, we focus on static message passing networks.
The ring, star, mesh, hypercube and fully-connected topol-
ogies are selected as a representative sample, ranging from
the simplest ring topology to the routing-intensive fully-
connected system.
Network topologies can be characterized by a number of
properties: node degree, diameter, link complexity, bisection
width and regularity [15]. Node degree is the number of
links from a node to its nearest neighbors. Diameter is the
maximum distance between two nodes. Link complexity is
the number of links the topology requires. A network is
deemed to be regular when all the nodes have the same
degree. Bisection width is the number of links that must be
cut when the network is divided into two equal sets of nodes.
Table 1 shows a summary of these characteristics for each
of the topologies used in this paper.
The characteristics of the network topology define the net-
work interface of a node. For example, the four-dimensional
hypercube is a regular topology, with all nodes having a
degree of four. This means that this topology requires a
single network interface type, each with four ports, i.e., four
communication links. The network interface is used to com-
municate with other nodes in the network. The maximum
distance (diameter) is four, which means that data going
through the network may require redirection or routing at
intermediate nodes and travel on up to four links. The link
complexity is 32, which is the total number of point-to-point
links that the overall system will have. In contrast, a 16-
node mesh has a total of 24 links in the system, but it is
not a regular topology, requiring three different versions of
the network interface. Inner nodes require an interface with
four ports, perimeter nodes require one with three ports and
corner nodes use a two-port interface.
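The entries in Table 1 can be checked mechanically. The short Python sketch below is our own illustration (not part of the authors' flow); it evaluates the link count and node degrees for the two 16-node examples above, giving 32 links with a uniform degree of four for the hypercube, and 24 links spread over degrees two, three and four for the 4 x 4 mesh.

    import math

    def hypercube_stats(n):
        # n must be a power of two; every node has log2(n) links and each
        # link is shared by two nodes.
        d = int(math.log2(n))
        return {"links": n * d // 2, "degrees": {d: n}}

    def square_mesh_stats(n):
        # n must be a perfect square; count nearest-neighbour links of a
        # sqrt(n) x sqrt(n) grid and classify corner/perimeter/inner nodes.
        s = math.isqrt(n)
        return {"links": 2 * (n - s),
                "degrees": {2: 4, 3: 4 * (s - 2), 4: (s - 2) ** 2}}

    print(hypercube_stats(16))    # {'links': 32, 'degrees': {4: 16}}
    print(square_mesh_stats(16))  # {'links': 24, 'degrees': {2: 4, 3: 8, 4: 4}}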
Figure 1 shows examples of systems with different num-
bers of nodes and topologies that are implemented to carry
out our experiments. Every topology can be seen as a
graph that is made of edges (links) and vertices (computing
nodes). In our implementations, the links are 64 bits wide

(i.e., a channel width of 64 bits), with 32 bits used for transmis-
sion and 32 bits used for reception, making it a full-duplex
communication system. The links also include control lines
used by the network interface.
Figure 1: A) 8-node ring, B) 8-node star, C) 32-
node mesh, D) 16-node hypercube, and E) 8-node
fully-connected topology
3.2 Computing Node
The computing nodes in Figure 1 consist of a computing
element and a network interface module. Figure 2 shows
the structure of a computing node. The master computing
node of the system is configured to communicate with the
external world using a UART attached to the peripheral bus
shown inside the dashed box of Figure 2. The rest of the
nodes have no peripheral bus.
Figure 2: The computing node
We use a Harvard architecture soft core processor as the
computing element so that data memory and program mem-
ory are accessed by independent memory buses. The com-
munication between the computing element and the network
interface is achieved by using two 32-bit wide FIFOs: one
for transmission and one for reception.
The network interface module is an extremely simple
block that has two sides. It interfaces to the network with
several links (channels) according to the degree of the node.
On the other side, two FIFOs are used as message buffers
to the processor.
The network interface is basically a hub that broadcasts
the data to the neighbors on transmission, and it filters out
the data from the neighbors on reception. It is effectively
a FIFO multiplexer that is controlled by the destination
field in the packet header. If the destination value matches
the processor’s ID number, then the packet passes through
the hub to the processor attached to the hub. Again, this
interface is simple, but is enough for the purpose of this
research, since we are interested in the connectivity pattern.
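As a behavioural sketch only (the real interface is a VHDL block; the field and function names below are our own), the following Python fragment captures the hub's filtering rule: broadcast on transmit, and keep only packets whose destination field matches the local processor ID on receive.

    from collections import deque

    def hub_transmit(packet, outgoing_links):
        # Broadcast the packet taken from the processor's TX FIFO onto every
        # attached link; the number of links equals the node's degree.
        for link in outgoing_links:
            link.append(packet)

    def hub_receive(node_id, incoming_links, rx_fifo):
        # Keep only packets addressed to this node; no forwarding is done,
        # so traffic for other destinations is simply discarded.
        for link in incoming_links:
            while link:
                packet = link.popleft()
                if packet["dest"] == node_id:
                    rx_fifo.append(packet["payload"])

    # Example: node 3 sees two packets but keeps only the one addressed to it.
    rx_fifo = deque()
    links = [deque([{"dest": 3, "payload": "hello"}]),
             deque([{"dest": 5, "payload": "other"}])]
    hub_receive(3, links, rx_fifo)
    print(list(rx_fifo))   # ['hello']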
Implementing a single version of the network interface
would not provide a good measure of the difference in logic
utilization between the various topologies because the topol-
ogies requiring nodes of lesser degree should use less logic.
It is likely that the optimizer in the synthesis tool would
remove the unused ports, still allowing the study to be per-
formed, but we chose to actually implement the different
node degrees required to be certain that only the necessary
logic was included.
The size of the remaining logic in the computing node is
independent of the node degree, i.e., the logic in the proces-
sors, the FIFOs, the memory controllers and the UART are
independent of the topology selection.
4. IMPLEMENTATION PLATFORM
To build the netlist, map the design, place it and route it,
we use the Xilinx [16] EDK tools version 7.1i in combination
with the Xilinx XST synthesis tool. To visualize the place-
ment of the systems, we use the Xilinx FPGA Floorplanning
tool. The network interface is developed in VHDL and sim-
ulated using ModelSim version 6.0b [17]. The routed nets
are counted with the help of the Xilinx FPGA Editor. For
Section 7, we use the Xilinx Xplorer utility to try to meet
the timing constraints. All the experiments are executed on
an IBM workstation with a Pentium 4 processor running at
2.8 GHz with Hyperthreading enabled and 2 GB of memory.
Our multiprocessor systems use the Xilinx MicroBlaze
soft-processor core [16] as the computing element. The com-
puting element connects to the network interface module
through two Fast Simplex Links (FSL), a Xilinx core that is
a unidirectional, point-to-point communication bus imple-
mented as a FIFO.
We use a variety of Xilinx chips to implement the de-
signs: the Virtex2 XC2V2000, and the Virtex4 XC4VLX25,
XC4VLX40, XC4VLX60 and XC4VLX200. The LX devices of the Virtex4 family have only Block RAM (BRAM) and DSP hard cores in addition to the FPGA fabric; they do not have PowerPC processors or Multi-Gigabit Transceivers (MGTs). This provides a more homogeneous architecture
that facilitates area comparisons.
The hard multiplier option for the MicroBlaze is disabled
to minimize the impact of hard core blocks that may influ-
ence or limit the placement and routing. The BRAMs are hard core blocks that also affect placement and routing, but they are essential for the MicroBlaze system to synthesize, so they have not been eliminated.
A 32-node, fully-connected system requires 1056 links to
be specified, and doing this manually is time consuming and
error prone. Instead, we developed a set of tools that take a
high-level description of the system that specifies the topol-
ogy type, the number of nodes, the number of total links
and the number of links per node, and they generate the

files required by EDK.
The number of nodes, for all the topologies, is chosen based on the limitation of the hypercube to 2^d nodes, where d is the dimension. For d = 3, 4 and 5 we have 8, 16 and 32 nodes, respectively.
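To give a flavour of what such a generator has to produce, the Python sketch below (our own illustration, not the authors' EDK tool) builds point-to-point edge lists for three of the topologies; the resulting counts match the link-complexity column of Table 1, and each undirected pair then expands into the separate transmit, receive and control connections that the generated EDK files must enumerate.

    import itertools, math

    def ring_edges(n):
        return [(i, (i + 1) % n) for i in range(n)]

    def hypercube_edges(n):
        # Connect nodes whose binary IDs differ in exactly one bit.
        d = int(math.log2(n))
        return [(i, i ^ (1 << b)) for i in range(n) for b in range(d)
                if i < i ^ (1 << b)]

    def fully_connected_edges(n):
        return list(itertools.combinations(range(n), 2))

    print(len(ring_edges(32)))              # 32   = n
    print(len(hypercube_edges(32)))         # 80   = n*log2(n)/2
    print(len(fully_connected_edges(32)))   # 496  = n*(n-1)/2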
5. BASELINE SYSTEM
The main objective of this experiment is to measure the
logic and routing resources required for each of the topolo-
gies. The timing constraints are chosen to be realistic, but
not aggressive, so that the place and route times are not
excessive. The 8 and 16-node systems are specified to run
at 150 MHz and the 32-node systems are specified to run
at 133 MHz to account for the slower speed grade of the
XC4VLX200 chip that is used for those systems.
The logic resource usage is measured in terms of the to-
tal number of LUTs required for a design and the number of
LUTs related to only the interconnection network, i.e., those
used to implement the network interface modules. The logic
resources needed to implement the network are estimated by
first synthesizing the network interface modules as stand-
alone blocks to determine the number of LUTs required.
These numbers are then used to estimate the usage of the
entire network. For example, the 8-node star topology re-
quires one 7-port network interface, which uses 345 LUTs,
and seven 2-port network interfaces, which need 111 LUTs
each. The total number of LUTs required by the network is
345 + (7 ×111) = 1122 LUTs. Note that this is only an esti-
mate, as the values reported for the stand-alone blocks may change at the system level due to optimizations that may occur. The register (flip flop) utilization is
found by using the same method as used for finding the logic
resource utilization.
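As a worked example of this estimate (using only the figures quoted above), the overhead of the 8-node star follows directly:

    # 8-node star: one 7-port hub at the centre plus seven 2-port hubs.
    network_luts = 345 + 7 * 111      # 1122 LUTs for all network interfaces
    total_luts = 10393                # complete 8-node star system (Table 2)
    overhead = 100.0 * network_luts / total_luts
    print(network_luts, round(overhead, 1))   # 1122 10.8  -> the Logic Ovrhd. entry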
The routing resource utilization is measured in terms of
the total number of nets in the design and the number of
nets used to implement only the network links and network
interfaces. The counting of nets is done by using the Xilinx
FPGA Editor, which allows the user to filter out net names.
The number of nets attributed to the network is found by
counting the number of nets related to all the network inter-
face modules in the design. This includes all nets that are
used in the network interface module as well as the nets in
the network topology itself. Including the nets in the net-
work interface module is reasonable because more complex
topologies use more complex network interfaces that also
consume FPGA routing resources.
5.1 Results
Figure 3 shows a histogram of the number of LUTs needed
to implement the complete systems, including the MicroB-
laze, FSLs, memory interface controllers, switches, UART,
and OPB bus. As expected, the system with the fully-
connected network has the highest logic utilization, and as
the system size increases, the difference with respect to the
other topologies gets more pronounced because of the O(n^2) growth in size. The difference is most significant with the 32-node system, which requires over twice the logic of the other systems. For the other topologies, the maximum difference in LUT usage amongst the topologies at the same node size ranges from about 5% in the 8-node systems to about 11% in the 32-node systems.
A more detailed view of the logic resources can be seen
in Table 2. The Logic Utiliz. column is the total number of
[Figure 3 plots LUTs (0 to 90,000) against the number of nodes (8, 16, 32) for the ring, star, mesh, hypercube and fully-connected topologies.]
Figure 3: Logic utilization of systems
Table 2: Logic and register resources used by each system

Topology    Nodes  Logic Utiliz.  Logic Incr.  Logic Ovrhd.  Total Reg.  Reg. Incr.  Reg. Ovrhd.
                   (LUTs)         (%)          (%)                       (%)         (%)
ring        8      10197          0.0          8.7           2637        0.0         11.2
star        8      10393          1.9          10.8          2642        0.2         11.6
mesh        8      10470          2.7          10.7          2641        0.2         11.4
hypercube   8      10701          4.9          12.6          2645        0.3         11.5
fully con.  8      12376          21.4         22.3          2762        4.7         13.9
ring        16     20448          0.0          8.7           5186        0.0         11.4
star        16     20936          2.4          9.6           5190        0.1         11.8
mesh        16     21360          4.5          12.6          5202        0.3         11.7
hypercube   16     22272          8.9          16.2          5218        0.6         12.0
fully con.  16     30176          47.6         38.1          5490        5.9         16.3
ring        32     40648          0.0          8.7           10209       0.0         11.6
star        32     41880          3.0          9.0           10214       0.1         11.9
mesh        32     42936          5.6          13.6          10250       0.4         11.9
hypercube   32     45104          11.0         17.9          10306       0.9         12.4
fully con.  32     87760          115.9        57.8          11330       11.0        20.3
LUTs used for each design and these are the values shown in
Figure 3. Since the ring has the simplest routing topology,
it is used as the baseline for comparisons with the rest of
the topologies.
Column Logic Incr. shows the increase in the number of
LUTs for each topology relative to the ring topology. For
example, the fully-connected topology requires 21.4% more
LUTs than the ring for the 8-node system. In contrast, the
Logic Ovrhd column shows the number of LUTs used for the
network interfaces as a fraction of the total LUTs required
for the complete system. It is calculated as (total number of
LUTs for network interfaces)/(Total LUTs in the system).
As expected, the ring topology has the lowest overhead for
all node sizes and the fully-connected system overhead in-
creases very quickly as the number of nodes increases.
Table 2 also shows the corresponding results for the reg-
ister (flip flop) utilization of the various topologies. The
trends mimic the logic utilization data, but the variation is
smaller because the number of registers in the network in-
terface module is small and because it is the only component
that is changing in size.
The routing resource usage of each system is presented in
Table 3. The Routing Utiliz. column is the total number

of nets used in the design. In general, the routing resource
utilization follows a similar pattern to the logic resource uti-
lization across the systems. The fully-connected system re-
quires the most nets, as expected. It should also be noted
that the 32-node, fully-connected topology design could be
placed but not completely routed, leaving 56 unrouted nets.
Column Routing Increase presents the difference in rout-
ing resources relative to the ring topology. It can be seen
that the greatest increase in routing for the ring, star, mesh
and hypercube topologies occurs for the 32-node hypercube
system with only a 10.6% increase relative to the 32-node
ring system. This reflects the O(n log n) link complexity of
the hypercube as compared to the O(n) link complexity for
the ring, star, and mesh topologies.
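For instance, taking the net counts from Table 3, the 32-node hypercube's increase over the 32-node ring works out as follows:

    ring_nets = 42618        # 32-node ring (Table 3)
    hypercube_nets = 47136   # 32-node hypercube (Table 3)
    increase = 100.0 * (hypercube_nets - ring_nets) / ring_nets
    print(round(increase, 1))   # 10.6 (%)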
The Routing Ovrhd column is calculated as the total number of nets for all the network interfaces divided by the total number of nets in the entire system. A visual representation of how each network topology contributes to the global number of nets can be seen in Figure 4. From this figure it can be seen that the ring topology overhead is practically independent of the system size at about 6% of the total nets for the 8, 16 and 32-node systems. The star and mesh topologies increase slowly to a maximum of about 11% of the total nets for the 32-node system. The hypercube adds about 15% overhead to the global routing in the 32-node system. The fully-connected topology starts at 20% overhead for an 8-node system, and grows to around 55% of the total routing for the 32-node topology, which actually fails to completely route. The other topologies have much lower routing overhead and will likely be able to expand to 64-node or 128-node systems, assuming large enough FPGAs
exist.
[Figure 4 plots the percentage of nets in the system attributable to the network (10% to 100%) against the number of nodes (8, 16, 32) for the ring, star, mesh, hypercube and fully-connected topologies.]
Figure 4: Topology impact on global routing
The place and route time data varies considerably because
of how the place and route algorithms work and factors that
impact the workstation performance. For the ring, star,
mesh and hypercube topologies, the average times to place
and route are approximately the same for a fixed number of nodes.
For the 8, 16 and 32-node systems the average times are 12 min., 30 min. and 4 hours 48 min., respectively. The fully-connected topology exhibits an exponentially growing time of 15 min. for the 8-node system, 12 hours for the 16-node system, and remained unroutable after 3 days for the 32-node system.
Table 3 also shows the clock frequency (freq.) achieved
Table 3: Routing resources used by each system

Topology    Nodes  Routing Utiliz.  Routing Increase  Routing Ovrhd.  freq.   Target Clock
                   (nets)           (%)               (%)             (MHz)   (MHz)
ring        8      10744            0.0               5.7             150     150
star        8      10956            2.0               7.6             151     150
mesh        8      11021            2.6               7.7             151     150
hyp.cube    8      11256            4.8               9.5             152     150
fully con.  8      13045            21.4              20.4            150     150
ring        16     21501            0.0               5.7             152     150
star        16     22013            2.4               8.0             150     150
mesh        16     22429            4.3               9.6             151     150
hyp.cube    16     23357            8.6               13.1            151     150
fully con.  16     31373            45.9              34.7            128     150
ring        32     42618            0.0               5.8             133     133
star        32     43888            3.0               8.6             100     133
mesh        32     44945            5.5               10.6            132     133
hyp.cube    32     47136            10.6              14.7            133     133
fully con.  32     90016            111.2             54.4            Fail    133
for each of the systems. Of the 8 and 16-node systems,
only the fully-connected, 16-node system is not able to meet
the 150 MHz requirement, achieving only 128 MHz. With
the 32-node systems, the target is 133 MHz, but this is not
achieved by the star or the fully-connected network. The
star incurs congestion at the central node, which affects the
timing, and the fully-connected system requires too many
wires. The placement and routing efforts were set to high,
but no time was spent to try and push the tools to improve
the results that did not meet the targets.
6. AREA REQUIREMENTS
For the previous experiments, LUT and flip flop counts
are used as the reference metrics for logic resource utiliza-
tion. However, for this experiment the number of slices is
used to measure area usage. In the Xilinx architecture, each
slice contains two LUTs. A design requires a certain num-
ber of LUTs and flip flops, and depending on how well the
packing algorithm performs, the design will require more or fewer slices. Moreover, the place and route tools may not be
able to utilize the two LUTs in every slice because of routing
constraints and timing requirements. The number of slices
better reflects the actual chip area required to implement
the design. Also, the area constraints used by the Xilinx
tools are specified in terms of slices.
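Since each slice here contains two LUTs, a quick optimistic lower bound on slice count is ceil(LUTs/2); the sketch below applies it to the 32-node fully-connected design from Table 2 (arithmetic only, since packing and routing constraints usually leave some slices half used).

    import math

    def min_slices(luts, luts_per_slice=2):
        # Optimistic bound: assumes every slice can be fully packed, which
        # the place and route tools often cannot achieve in practice.
        return math.ceil(luts / luts_per_slice)

    print(min_slices(87760))   # 43880 slices at minimum for the 32-node fully-connected system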
For this experiment, the Minimum Area Required is de-
fined as the smallest number of slices needed for the design
to place and route successfully. It is determined by reduc-
ing, or compressing, the area used by the design until just
before it fails to place and route and counting the number
of slices in the compressed region at that point. This gives
a measure of how efficiently the design can use the resources
when the resources are close to being fully utilized, which
models the effect of trying to implement a design on a chip
that is close to full capacity.
The area compression is done using area constraints in
the User Constraints File, i.e., the .ucf file. The constrained
area is described by giving the coordinates of the bottom-
left and top-right slice positions that define a rectangular
area in the FPGA. The origin is fixed to X0Y0 and the

References
Route packets, not wires: on-chip interconnection networks (conference paper).
Interconnection Networks: An Engineering Approach (book).
A network on chip architecture and design methodology (conference paper).
Performance evaluation and design trade-offs for network-on-chip interconnect architectures (journal article).