Journal Article

ROUTABILITY-DRIVEN PACKING: METRICS AND ALGORITHMS FOR CLUSTER-BASED FPGAs

TL;DR: A routability-driven clustering method for cluster-based FPGAs that packs LUTs into logic clusters while incorporating routability metrics into a cost function and integrates the routability model into a timing-driven packing algorithm.
Abstract: Most of an FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would yield both better result quality and a faster design process. In this paper, we discuss the metrics that affect routability when packing logic into clusters. We present a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, we analyze routability in a timing-driven packing algorithm, and we integrate our routability model into that algorithm. Our method yields up to 50% improvement in the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.

Summary (1 min read)

Introduction

  • The paper is organized as follows: previous work on routability-driven technology mapping and algorithms for cluster packing is discussed in Section 2. Section 3 describes the FPGA architecture the authors are targeting, utilization and routability issues, and the problem formulation for the packing problem.
  • The authors introduce new metrics that are used to form a new objective function to evaluate routability.
  • Clustering the logic blocks reduces the number of connections between clusters.

$\mathrm{RoutabilityGain}(B) = |\mathrm{Nets}(B) \cap \mathrm{Nets}(C)|$

  • If these nets are observed closely, their contributions to the routability gain of the block are slightly different. Block B has three common nets with cluster C.
  • In fact, the gain function originates from the third routability factor discussed in the paper, i.e., reducing pins per net.
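
Reading the gain expression in this summary as the number of nets that a candidate block B shares with cluster C, the idea can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names `shared_net_gain` and `best_candidate`, the representation of nets as sets of string labels, and the example net and block names are all assumptions made for this example.

```python
def shared_net_gain(block_nets, cluster_nets):
    """Gain of a candidate block: the number of nets it shares with the cluster."""
    return len(set(block_nets) & set(cluster_nets))


def best_candidate(candidates, cluster_nets):
    """Return the name of the candidate block with the highest shared-net gain.

    `candidates` maps a (hypothetical) block name to the set of nets it touches.
    Iterating in sorted order makes ties resolve to the alphabetically first name.
    """
    return max(sorted(candidates),
               key=lambda name: shared_net_gain(candidates[name], cluster_nets))


# Hypothetical example: cluster C touches four nets; three blocks compete.
cluster_nets = {"n1", "n2", "n3", "n4"}
candidates = {
    "B": {"n1", "n2", "n3", "n9"},  # three nets in common with C
    "D": {"n4", "n7"},              # one net in common
    "E": {"n8"},                    # nothing in common
}
gains = {name: shared_net_gain(nets, cluster_nets)
         for name, nets in candidates.items()}
```

Because every shared net counts the same here, this gain cannot distinguish nets that affect routability differently, which is the limitation the bullets above point out.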

Table Routability Gain of a Candidate Block According to a Single Net

  • If the output terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net located inside the cluster.
  • When the net connected to a block is a multi-terminal net, the gain associated with that net is computed for each block containing a terminal of the net.
  • This implies a prediction of high routability gain in later stages of cluster construction: after a highly critical connection is added to a cluster, more input/output connections would be added to the cluster. The authors' analysis in this section shows that timing and routability correlate very strongly; satisfying timing improves routability in some aspects.
  • The routing results, critical path delay, and number of exposed nets for both packing methods, t-RPack and t-VPack, are reported in a table of the paper. t-RPack uses the authors' routability gain function described earlier in the paper.
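
The first bullet above, that a net driven from inside the cluster needs no cluster input pin, can be made concrete with a small sketch. It is an assumption-laden illustration rather than the paper's code: the `Net` record, the helper names `cluster_inputs_used` and `input_pin_change`, and the tiny netlist are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    driver: str        # block holding the net's output terminal
    sinks: frozenset   # blocks holding the net's input terminals


def cluster_inputs_used(cluster, nets):
    """Count nets that are read inside the cluster but driven from outside it."""
    return sum(
        1 for net in nets
        if net.sinks & cluster and net.driver not in cluster
    )


def input_pin_change(cluster, block, nets):
    """Change in used cluster input pins if `block` joins (negative = pins freed)."""
    return (cluster_inputs_used(cluster | {block}, nets)
            - cluster_inputs_used(cluster, nets))


# Hypothetical netlist: block "B" drives a net that is read by both cluster members.
nets = [
    Net(driver="B", sinks=frozenset({"A1", "A2"})),  # becomes internal once B joins
    Net(driver="A1", sinks=frozenset({"B"})),        # already driven from inside
    Net(driver="Y", sinks=frozenset({"A1"})),        # stays an external input
]
cluster = frozenset({"A1", "A2"})
before = cluster_inputs_used(cluster, nets)
after = cluster_inputs_used(cluster | {"B"}, nets)
```

In this example, adding B moves the driver of the first net into the cluster, so the input pin that net occupied is released for another connection.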

Table Logic Size, Number of Exposed Nets, Number of Routing Tracks, and Critical Path t

  • In this paper, the authors addressed routability issues and their impact on performance and routing area.
  • A routability-driven packing method for cluster-based FPGAs is proposed.
  • The authors' method improves routability by decreasing the number of tracks required in the FPGA routing channels.
  • This improvement was achieved by incorporating several routability factors into the packing algorithm. Based on their routability model, the authors analyzed timing-driven packing. The timing criticality of a connection reflects the role of that connection in routability as well.


UCLA
UCLA Previously Published Works
Title
Routability-driven packing: Metrics and algorithms for cluster-based FPGAs
Permalink
https://escholarship.org/uc/item/4v53n326
Journal
Journal of Circuits Systems and Computers, 13(1)
ISSN
0218-1266
Authors
Bozorgzadeh, E
Memik, S O
Yang, X
et al.
Publication Date
2004-02-01
Peer reviewed

Routability-driven Packing: Metrics and Algorithms for Cluster-based FPGAs
E. Bozorgzadeh†, S. Ogrenci Memik†, X. Yang‡, M. Sarrafzadeh†
† Computer Science Department, University of California, Los Angeles (UCLA), 3531C Boelter Hall, Los Angeles, CA 90095, USA
e-mail: {elib,seda,majid}@cs.ucla.edu
‡ Synplicity Inc., 600 W California Ave., Sunnyvale, CA 94086
email: xjyang@synplicity.com
ABSTRACT
Most of an FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would both yield better quality and faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We are presenting a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, the routability in timing-driven packing algorithm is analyzed. We integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in terms of the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.
Keywords: VLSI CAD, Field Programmable Gate Arrays (FPGAs), Technology mapping, Clustering Techniques, Optimization, Algorithm.

[Figure 1: Island-style FPGA. Diagram labels: Routing Segments, I/O Pad, Routing Switch Box (S), Configurable Logic Block (CLB).]
1 INTRODUCTION
Today's technology allows FPGAs to be designed as multi-million system gate devices at the heart of electronic systems. Since the FPGA is an integral part of many digital systems, the significance of optimization problems in mapping circuits onto FPGAs has increased. There are two important issues related to the FPGA mapping process: the quality of the resulting mapping and the run-time of the tools serving in the process. Although the former is more dominant for FPGAs, both aspects are important. As in ASIC design, minimizing the delay is an important objective, as is minimizing the silicon area. The area of an FPGA consists of routing area and logic area. Optimizing the utilization of both routing and logic resources is crucial to obtaining a good-quality result.
FPGAs consist of smaller configurable building blocks called logic blocks or Configurable Logic Blocks (CLBs), which are placed on the FPGA chip either in a two-dimensional array (see Figure 1) or in a set of rows. The CAD flow of mapping a circuit onto an FPGA consists of four major stages. In the first stage, the circuit is logically optimized. In the second stage, called technology mapping, the optimized circuit is divided into CLBs of the FPGA. The placement and routing stages assign the subcircuits to CLBs and program the routing switches of the FPGA.
Due to the highly constrained and discrete interconnect structure of current FPGAs, routing is a challenging problem. Most of the time, current FPGA routers cannot use the available routing resources efficiently, so a large portion of the routing area is wasted. Also, depending on the complexity of the particular design, routing might require a fairly large amount of time, often several hours, to complete. Hence, considering routability at earlier steps of the CAD flow would yield both a better quality of result and less design time in later stages.
FPGA vendors have different logic block configurations. There are two kinds of CLBs: LUT-based blocks and multiplexor-based blocks. LUT-based logic blocks are more popular. There have been several contributions in the development and design of FPGAs towards reducing the gap in density and performance between ASIC and FPGA implementations. Hierarchical features have been added to the logic and routing architecture of FPGAs. Many commercial FPGAs, such as Xilinx, Altera, and Actel FPGAs, include logic blocks that contain several LUTs [1]. A collection of basic logic elements that are grouped together to be placed in one complex logic block is called a cluster (see Figure 2(a)). FPGAs with logic blocks containing multiple basic blocks are called cluster-based FPGAs. In cluster-based FPGAs, each CLB is a cluster of basic logic elements. The structure and granularity of the logic block have a significant impact on the area-efficiency and performance of the FPGA. If the logic block is fine-grained, the circuit to be implemented will be distributed over a larger number of logic blocks. This has a negative impact on routability, since more blocks need to be interconnected. Since the interconnect inside the logic blocks is hardwired, local interconnect can be made very fast and efficient. This improves routability and decreases the load on the router significantly by reducing the size of the problem. The two main benefits of clustering basic blocks into CLBs are faster compilation and improved circuit delay. On the other hand, it is not feasible to increase the complexity of the logic blocks beyond a certain limit. If the logic blocks become too complex, it becomes difficult to utilize them fully, hence several logic blocks will be wasted. Due to constraints on the number of input pins and the number of blocks within each cluster, not all the resources in a cluster can be used in circuit implementation. The task of assigning basic logic blocks to clusters is called packing. Because there is no accurate means of estimating the interconnect at the logic synthesis level, it is not easy to deal with the routability of a circuit at the logic level. However, if special properties of the interconnect available at the logic level, such as sharing among the pins, can be exploited while packing logic into basic blocks, significant gains can be obtained in terms of routability. In the past, routability at the packing stage has not been considered as extensively as it has been at the technology mapping stage. Packing can bring improvements in routability, since after technology mapping a more accurate estimate of the interconnect is available.
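
The packing task defined above, with its two capacity constraints (blocks per cluster and input pins per cluster), can be illustrated with a short sketch. This is a minimal greedy packer written for illustration only, not the algorithm proposed in this paper: the dictionary layout of `blocks`, the names `used_inputs` and `greedy_pack`, the seed rule, and the shared-net attraction used to rank candidates are all assumptions, and each block is assumed to fit within the input limit on its own.

```python
def used_inputs(cluster, blocks):
    """External input nets of a cluster: nets read by a member but driven by none."""
    read = set().union(*(blocks[b]["inputs"] for b in cluster))
    driven = {blocks[b]["output"] for b in cluster}
    return read - driven


def greedy_pack(blocks, n_max, i_max):
    """Pack blocks into clusters of at most n_max blocks and i_max external inputs."""
    unclustered = set(blocks)
    clusters = []
    while unclustered:
        # Seed: the unclustered block with the most inputs (sorted order breaks ties).
        seed = max(sorted(unclustered), key=lambda b: len(blocks[b]["inputs"]))
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < n_max:
            cluster_nets = set().union(
                *(blocks[b]["inputs"] | {blocks[b]["output"]} for b in cluster))
            # A candidate is legal only if the input-pin limit still holds.
            legal = [b for b in sorted(unclustered)
                     if len(used_inputs(cluster + [b], blocks)) <= i_max]
            if not legal:
                break
            # Rank legal candidates by the number of nets shared with the cluster.
            best = max(legal, key=lambda b: len(
                (blocks[b]["inputs"] | {blocks[b]["output"]}) & cluster_nets))
            cluster.append(best)
            unclustered.remove(best)
        clusters.append(cluster)
    return clusters


# Hypothetical four-block netlist: each block has input nets and one output net.
blocks = {
    "a": {"inputs": {"i1", "i2"}, "output": "x"},
    "b": {"inputs": {"x", "i3"}, "output": "y"},
    "c": {"inputs": {"i4", "i5"}, "output": "z"},
    "d": {"inputs": {"y", "z"}, "output": "w"},
}
packed = greedy_pack(blocks, n_max=2, i_max=4)
```

With a tighter input limit (for example `i_max=2`), no pair of these blocks is legal and every block ends up in its own cluster, which shows how the pin constraint can leave cluster capacity unused.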
In this paper, we propose a routability-driven packing algorithm. We show improvements in routing area over the state-of-the-art logic block packing algorithms VPack and t-VPack [4, 6]. We introduce a new method to consider routability at the packing stage. Our method of selecting a block for clustering can easily be integrated with other clustering algorithms. We demonstrate the effect of our method on routability by synthesizing the benchmark circuits through the complete CAD flow: we technology-mapped each circuit, applied our routability-driven packing method for clustering, and finally placed and routed the circuit. We present the results of the final routing and show that our method improves routability significantly. Our new algorithm, RPack, indeed improves routability compared to VPack. As our results on the 20 largest MCNC benchmarks show in Section 5, we are able to improve the minimum required number of routing tracks by 16.5% on average. A preliminary version of this work appeared in [8]. We also integrated our routability function into a timing-driven packing algorithm. Based on our routability model, we analyze routability in timing-driven packing. Compared to t-VPack, the routing area is improved by 12% on average.
The paper is organized as follows. Previous work on routability-driven technology mapping and algorithms for cluster packing is discussed in Section 2. Section 3 describes the FPGA architecture we are targeting, utilization and routability issues, and the problem formulation for the packing problem. In Section 4, RPack, our routability-driven packing method, is described. Experimental results are presented in Section 5. Section 6 includes conclusions and future work.
2 Previous Work
Most commercial FPGAs use configurable blocks containing several LUTs. Packing LUTs into clusters is an important design step introduced for cluster-based FPGAs. It can be viewed as a sub-task within the technology mapping stage, in which logic gates are assigned to LUTs and registers. We first mention contributions made in the technology mapping area. The majority of research devoted to technology mapping has been done with the objective of improving either timing [7, 12, 18, 22, 13] or area-efficiency [19, 24, 20], or the trade-off between depth and area [19]. Compared to the amount of effort made in this area, little work has been done in the routability-driven technology mapping domain [15], [17]. The routability-driven technology mapper for LUT-based arrays, Rmap [17], employs a mapping strategy that considers routability.
The packing problem is a clustering problem. Clustering has been studied extensively for various applications, such as placement [25], technology mapping [4, 17], etc. Packing is a clustering problem with constraints on the number of input pins and the number of LUTs in each CLB. The objective is to minimize the number of CLBs required to cover all the LUTs while satisfying the constraints. Betz and Rose proposed VPack and t-VPack, logic block packing algorithms [4] for cluster-based FPGAs. VPack and t-VPack are among the best-known packing tools for FPGAs. VPack first packs a flip-flop and a LUT together into a basic logic element (BLE) using a matching-based method. These BLEs are then packed greedily into logic clusters, with the local optimization objectives of filling each cluster to its capacity and minimizing the number of used inputs to each cluster. This approach is inspired by [21]. In [6], a timing-driven packing tool for FPGAs, t-VPack, is proposed. The blocks on the critical path are preferably packed together in a CLB so that the delay can be improved by exploiting local wiring in the CLB to route the critical nets. t-VPack delivers better routability than VPack. Later, we will describe the routability potential of timing-driven packing algorithms. Also, in [23], a packing approach is proposed based on maximum weight matching on the circuit graph. Recently, researchers in [9] have proposed a new technique for packing logic into clusters. Based on Rent's rule for each application, the connectivity of each cluster is defined. In this approach, routability is weighted according to the connectivity of the application. It is a good idea to consider routability based on the connectivity of the circuit. On the other hand, the weight of routability in the overall

Citations
Book
25 Oct 2006
TL;DR: All major steps in FPGA design flow which includes: routing and placement, circuit clustering, technology mapping and architecture-specific optimization, physical synthesis, RT-level and behavior-level synthesis, and power optimization are covered.
Abstract: Design automation or computer-aided design (CAD) for field programmable gate arrays (FPGAs) has played a critical role in the rapid advancement and adoption of FPGA technology over the past two decades. The purpose of this paper is to meet the demand for an up-to-date comprehensive survey/tutorial for FPGA design automation, with an emphasis on the recent developments within the past 5-10 years. The paper focuses on the theory and techniques that have been, or most likely will be, reduced to practice. It covers all major steps in FPGA design flow which includes: routing and placement, circuit clustering, technology mapping and architecture-specific optimization, physical synthesis, RT-level and behavior-level synthesis, and power optimization. We hope that this paper can be used both as a guide for beginners who are embarking on research in this relatively young yet exciting area, and a useful reference for established researchers in this field.

147 citations

Proceedings ArticleDOI
27 Feb 2011
TL;DR: This paper presents an area-driven generic packing tool that can pack the logical atoms into any heterogeneous FPGA described in the new language, including many different kinds of soft and hard logic blocks.
Abstract: The development of future FPGA fabrics with more sophisticated and complex logic blocks requires a new CAD flow that permits the expression of that complexity and the ability to synthesize to it. In this paper, we present a new logic block description language that can depict complex intra-block interconnect, hierarchy and modes of operation. These features are necessary to support modern and future FPGA complex soft logic blocks, memory and hard blocks. The key part of the CAD flow associated with this complexity is the packer, which takes the logical atomic pieces of the complex blocks and groups them into whole physical entities. We present an area-driven generic packing tool that can pack the logical atoms into any heterogeneous FPGA described in the new language, including many different kinds of soft and hard logic blocks. We gauge its area quality by comparing the results achieved with a lower bound on the number of blocks required, and then illustrate its explorative capability in two ways: on fracturable LUT soft logic architectures, and on hard block memory architectures. The new infrastructure attaches to a flow that begins with a Verilog front-end, permitting the use of benchmarks that are significantly larger than the usual ones, and can target heterogenous FPGAs.

86 citations


Cites methods from "ROUTABILITY-DRIVEN PACKING: METRICS..."

  • ...These algorithms include T-VPack [21], T-RPack [7], IRAC [30], HDPack [9], and others [17] [18]....

    [...]


Proceedings ArticleDOI
13 Jun 2005
TL;DR: A system level technique for mapping large, multiple-IP-block designs to channel-width constrained FPGAs and shows a graceful trade-off between channel width and CLB count, which makes it possible to target specific channel- width constraints during clustering with minimal CLB inflation.
Abstract: In this paper we present a system level technique for mapping large, multiple-IP-block designs to channel-width constrained FPGAs. Most FPGA clustering tools (Betz, 1999, Bozorgzadeh, 2004 and Singh, 2002) aim to reduce the amount of intercluster connections, hence reducing channel width needs. However, if this exceeds the FPGA's channel width (a hard constraint), then the circuit still cannot be routed. Previous work by Singh (2002) and Tessier (2000) depopulates logic clusters (CLBs) to reduce channel width. By depopulating non-uniformly, i.e. depopulate more in hard-to-route regions, we show a graceful trade-off between channel width and CLB count. This makes it possible to target specific channel-width constraints during clustering with minimal CLB inflation. Results show channel width decreases of up to 20% with a 5% increase in area. Further decreases of nearly 50% are possible at 3.3 times the original area. Despite the area increase, this technique creates routable solutions from otherwise unroutable circuits.

85 citations

Proceedings ArticleDOI
24 Feb 2008
TL;DR: This paper presents an interconnect model for island-style FPGAs, whose single output is the estimated routing demand (often referred to as W, the number of routing tracks per channel) for an FPGA as a function of several logic block, circuit and routing architecture parameters.
Abstract: Architecture development for FPGAs has typically been a very empirical discipline, requiring the synthesis of benchmark circuits into candidate architectures. This is difficult to do in the early stages of architecture development, however, because there is no complete architecture to synthesize circuits into. The effort required to create prototype tools for nascent architectures is far too great for every new logic block or routing architecture idea, and so it would be extremely helpful to have a simple and intuitive FPGA interconnect model to guide the architect. In this paper we present such an interconnect model for island-style FPGAs, whose single output is the estimated routing demand (often referred to as W, the number of routing tracks per channel) for an FPGA as a function of several logic block, circuit and routing architecture parameters. The goal of this model is to be as simple as possible, while still accurate enough to be useful, to provide understanding and intuition on FPGA routing. Our methodology is empirical -- we propose model forms based on empirical observations, intuition and some derivation, and then fit models to experimentally generated data. We show the development of the model in stages, beginning with a fully flexible FPGA, and gradually proceeding to one which includes the key parameters that control the flexibility of FPGA routing, and one key parameter describing the logic block and another relating to the typical circuit. We then show how to use these models in early-stage architecture development to provide feedback on several aspects of logic block architecture. We also show how the model can be used to explore the routing architecture space itself and to provide an overall intuition for architecture development.

47 citations


Cites background from "ROUTABILITY-DRIVEN PACKING: METRICS..."

  • ...Packing algorithms can be categorized into three general approaches, namely top-down ([30, 32]), depth-optimal ([22, 46]) and bottom-up ([25, 43])....

    [...]

Proceedings ArticleDOI
05 Nov 2006
TL;DR: This work presents a fully automated CAD flow (Un/DoPack) that finds local regions of high interconnect demand and reduces it by spreading out the logic in that region by introducing whitespace in the form of empty logic elements within the configurable logic blocks of the congested region.
Abstract: FPGA device area is dominated by interconnect, so low-cost FPGA architectures often have reduced interconnect capacity. This limited routing capacity creates a hard channel width constraint that can make it difficult for CAD tools to successfully map a circuit into these devices. Instead of migrating a design to a high-cost, resource-rich architecture that is easier to route, we present a cheaper alternative: a fully automated CAD flow (Un/DoPack) that finds local regions of high interconnect demand and reduces it by spreading out the logic in that region. This is done by introducing whitespace in the form of empty logic elements (LEs) within the configurable logic blocks (CLBs) of the congested region. After spreading, the congested region occupies more routing channels and so obtains access to greater aggregate interconnect capacity. Although this has the side effect of using more CLBs, it has the advantage of lowering peak interconnect demands and making a previously-unroutable circuit routable. We also design a new set of synthetic benchmark circuits that model interconnect variation within a large design. Using these benchmarks, we show that circuits with high interconnect variation require FPGA devices to have large channel widths. However, since congestion of such circuits is localized, Un/DoPack is very good at reducing the peak demands of circuits with high interconnect variation. Our results suggest that even for an average Rent exponent of 0.62 (a modest value), a large variation of this exponent within a design will also require FPGAs to have large channel widths. Thus, it is crucial to study interconnect variation of benchmark circuits when designing low-cost FPGAs. Previous research studying interconnect properties focuses on average Rent exponent values of each design, but we believe new work should study variation as well.
For circuits with high interconnect variation, we demonstrate that channel widths can be reduced by up to ~40% with only ~10% increase in area.

39 citations

References
Book
01 Jan 1996
TL;DR: This book reviews the design techniques for approximation algorithms and the developments in this area since its inception about three decades ago and the "closeness" to optimum that is achievable in polynomial time.
Abstract: Approximation algorithms have developed in response to the impossibility of solving a great variety of important optimization problems. Too frequently, when attempting to get a solution for a problem, one is confronted with the fact that the problem is NP-hard. This, in the words of Garey and Johnson, means "I can't find an efficient algorithm, but neither can all of these famous people." While this is a significant theoretical step, it hardly qualifies as a cheering piece of news. If the optimal solution is unattainable then it is reasonable to sacrifice optimality and settle for a "good" feasible solution that can be computed efficiently. Of course, we would like to sacrifice as little optimality as possible, while gaining as much as possible in efficiency. Trading-off optimality in favor of tractability is the paradigm of approximation algorithms. The main themes of this book revolve around the design of such algorithms and the "closeness" to optimum that is achievable in polynomial time. To evaluate the limits of approximability, it is important to derive lower bounds or inapproximability results. In some cases, approximation algorithms must satisfy additional structural requirements such as being on-line, or working within limited space. This book reviews the design techniques for such algorithms and the developments in this area since its inception about three decades ago.

2,488 citations


"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background in this paper

  • ...Many different heuristics have been proposed in the clustering area [3]....

    [...]

Book
31 Mar 1999
TL;DR: From the Publisher: Architecture and CAD for Deep-Submicron FPGAs addresses several key issues in the design of high-performance FPGA architectures and CAD tools, with particular emphasis on issues that are important for FPGAs implemented in deep-submicron processes.
Abstract: From the Publisher: Architecture and CAD for Deep-Submicron FPGAs addresses several key issues in the design of high-performance FPGA architectures and CAD tools, with particular emphasis on issues that are important for FPGAs implemented in deep-submicron processes. Three factors combine to determine the performance of an FPGA: the quality of the CAD tools used to map circuits into the FPGA, the quality of the FPGA architecture, and the electrical (i.e. transistor-level) design of the FPGA. Architecture and CAD for Deep-Submicron FPGAs examines all three of these issues in concert.

1,335 citations


"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background or methods in this paper

  • ...In RPack, similar to VPack [4, 15], in the first stage, a LUT and a register are packed into a basic logic block when possible....

    [...]

  • ...cluster C If the output terminal of a net is inside the cluster internal connections can be used to connect the input pins of the net located inside the cluster In such a case there is no need to use an input pin of the cluster to connect the net N to other terminals of the net outside the cluster since an output pin of the cluster can be used for external interconnection Therefore the contribution of net N to block gain is more than just covering an edge of a multi terminal net Actually by adding block B to cluster C an input pin of the cluster gets free and can be used for another net connection In Table this is de ned as in pin gain This increases the probability of acceptance of adding the block to the cluster In other words the probability of violating the input constraints of the cluster decreases Note that each block output pin is accessible from outside and there is no sharing among the output pins Therefore there would be no output pin constraints for the clusters hence saving on output pins does not bring any gain except in one case Suppose all the input pins of a net are already inside the cluster and the logic block being added to a cluster contains the output pin of the net Net N in Figure is an example of such a case The output pin of the cluster corresponding to the block driving the net N cannot be used by other blocks This means that there would be no connection from outside to this pin since all the terminals of the corresponding net are located inside the cluster Therefore the number of external connections of the cluster de ned as output congestion gain in Table decreases This yields less congestion among the clusters In other words it reduces the number of used pins of a cluster which is the fourth routability factor Net N has no pin in the cluster The gain from moving logic block B to the cluster would be zero according to the gain function above However not only no edge from N would be covered but also one input pin of the cluster 
would be used for N So the gain of moving logic block B to the cluster due to N in terms of used pins per cluster is This means N has a degrading e ect on the routability according to the fourth routability factor As explained above by considering just the number of shared inputs and outputs as in Equation the packing algorithm cannot di erentiate among the candidate blocks which have di erent impacts on routability All possible cases yielding di erent total gains are presented in Table for one net connected to a candidate block By incorporating the other routability factors the gain for each logic block B going into cluster C can be computed as the weighted combination of di erent routability factors as follows Gain B C f Nets B Nets C X i Nets B g i Nets C B where g i C B a fin P i B P i C b fo P i B P i C i Nets C c T i B otherwise fin P i B P i C is de ned as the gain obtained in input pins of cluster C as de ned in Table Similarly fo P i B P i C is the gain obtained in output congestion The additional gain of value to the sum of these two gain terms corresponds to the edge gain T i B returns the type of the pin of Net i connected to basic block B It returns if the pin is an output pin and otherwise P i B is the set of all pins of Net i that are on block B P i C is the set of pins of Net i connected to cluster C Nets C is the set of nets connected to cluster C a b and c are the weights for di erent components of the function Inserting a whole multi terminal net in one cluster is practically impossible In most of the cases the best we can do is to eliminate two terminal nets Therefore reducing an edge from a multi terminal net should not be considered equivalent to reducing an edge by inserting a two terminal net inside a cluster The average gain that a block can take from an n terminal net i connected to one of its pins depending on type of the net can be estimated from Table as follows Gainavg i n n n n n n According to Equation the average gain obtained from 
a two terminal net is the highest This implies that the algorithm gives priority to pushing a two terminal net entirely inside a cluster as compared to reducing a pin of a multi terminal net This leads to a decrease in number of exposed nets satisfying the last routability factor When the net connected to a block is a multi terminal net the gain associated with the multi terminal net is computed for each block containing a terminal of the net Therefore each edge of a net can have di erent impact on their corresponding blocks when a cluster is being constructed How many and what type of a terminal of the net do already exist in the cluster What type of terminal of the net does the candidate block have Answers to these questions for each net connected to the candidate block determine the gain of the block Therefore we conclude that in the bottom up clustering gain weight of each edge should be assigned dynamically according to individual situations In the next sub sections we explain our method of packing the basic blocks inside the clusters based on the routability gain function mentioned We also analyze timing driven clustering algorithm in terms of routability based on routability gain function Equations and According to this analysis we integrate routability factors into timing driven clustering RPack Algorithm The input to our packing algorithm is a list of LUTs registers and connections among the resources In RPack similar to VPack in the rst stage a LUT and a register are packed into a basic logic block when possible After that the blocks are packed into clusters using a greedy heuristic Clusters are constructed sequentially First the seed is chosen from the unclustered basic blocks The criteria is to choose the block with the most used inputs as mentioned in After choosing a seed for a cluster the logic block that gives the highest gain is selected to be added to the current cluster provided that it is a legal choice This means that the number of external 
inputs does not exceed the number of input pins of the cluster. The algorithm continues adding blocks to a cluster until the cluster is full or no more legal choices can be found. Similarly, new clusters are constructed until all blocks are packed into clusters.

We propose RPack, a routability-driven packing algorithm based on the routability factors described in the previous subsection. RPack is developed on top of VPack. The difference between the two approaches is in the definition of the gain function: VPack uses the function defined in Equation [...], while RPack uses the gain function in Equation [...]. The pseudo-code of our approach is shown in Figure [...].

    Input:  Netlist of LUTs and Registers
            N: Cluster Size, K: LUT Size, I: Inputs per Cluster
    Output: List of Logic Clusters

    Pack LUTs and Registers together into Basic Blocks
    while Unclustered Basic Blocks available
        Find Seed for new Cluster
        while Cluster is not full
            Update gains of unclustered Basic Blocks (Candidate blocks)
            Choose Basic Block with highest gain (Pick a candidate block)
            If Candidate is NOT feasible then
                Go to Step [...]
            Else
                Remove block from unclustered blocks list
                Add block to current Cluster
        end while
    end while

Figure [...]: RPack: Pseudo-code for Packing Algorithm

The complexity of the RPack algorithm is $O(I \cdot M)$, where M is the number of clusters. Finding the seed for each cluster takes O([...]) time using a priority queue to store the candidate nodes, where [...] is the number of nodes (basic blocks). When a node v is inserted into a cluster, only the gains of its neighboring candidate nodes need to be updated (candidate nodes are those that have not yet been assigned to any cluster). The number of neighbors is equal to the edge degree of the current node, i.e., $\deg(v)$. When a neighbor is visited, the type and status of the edges connected to the neighbor are checked, which takes $O(\deg(v))$. Note that when each neighbor node is visited, the edges that belong to the same hyper-edge (multi-terminal net) are counted once. However, when a block is being added to a cluster, the neighbors are all the nodes connected to the node by any edge, i.e., $\deg(v)$. By amortized analysis, the gain of a node is updated at most once for any connection between the node and its neighbors. Therefore, the total clustering process takes

$$O\Big(\sum_{v_i \in V(G)} \deg(v_i)\Big) = O(|E|)$$

where E is the edge set of the connectivity graph G. Also,

$$|E| = \sum_{i \in Net} (n_i - 1) \le \sum_{i \in Net} n_i = P$$

where $n_i$ is the number of terminals of net i and P is the total number of pins for all clusters, which is $I \times M$. Based on this analysis, the complexity of the algorithm can be expressed as $O(I \cdot M)$.

t-RPack: Timing-Driven RPack

By clustering the LUTs into coarser CLBs, the complexity of the interconnection between the CLBs is reduced. Hence, fewer routing resources are required. Another benefit of clusters is the fast interconnection inside them: connections packed inside a cluster use the hard-wired interconnect resources of the CLB, which leads to better performance. In packing, both objectives should be pursued. In this paper, our focus is mostly on routability. In this section, we discuss how routability is realized when timing is added to the packing algorithm, and based on our routability function we propose timing-driven RPack.

After packing, a subset of the netlist is routed inside the clusters without passing through switched routing resources. By placing the interconnections along the critical path of the circuit inside clusters, delay can be improved. As a result, timing-driven clustering gives priority to inserting timing-critical connections inside the clusters. In a sequential bottom-up clustering approach, the seed for a cluster is the most critical block, and blocks are added to the cluster based on criticality. In addition to timing, routability has to be considered in clustering to avoid routing congestion, which is a bottleneck in current FPGAs. First, however, we should study the impact of timing-based clustering on routability when choosing the seed and defining the gain function
based on criticality. In Section [...], the routability factors are described. Based on them, our routability gain function is defined in Equation [...]. Using this model, we can explain the routability issues in timing-driven clustering. After analyzing the approach, we are able to improve routability more accurately. This is where analysis and theory guide the heuristics.

The criticality of a block is defined by its slack. Connections along the critical paths have a high criticality value. Therefore, clustering based on timing is similar to path-based clustering. Each path is a chain of output-to-input, pin-to-pin connections between a set of blocks (see Figure [...]). According to the routability gain function (Table [...]), an output-to-input connection has a high routability gain. When a connection is marked as critical, it means that there is a long chain of input-output connectivity from this point to the rest of the design. This predicts a high routability gain in later stages of constructing the cluster: after a highly critical connection is added to a cluster, more input-output connections will be added to the cluster. In other words, the criticality of a connection shows the depth of input-output connections from the current connection to the rest of the circuit. By inserting an edge of a net on the critical path, timing-driven packing exploits the routability obtained by placing the output and input pins of a net inside a cluster, hence releasing an input pin of the cluster. As explained above, our routability model can express the routability impact of timing-based clustering.

Figure [...]: Routability and Slack Computation (pin labels: in, out)

The two terms of the routability factors are inherently satisfied in criticality-based analysis and slack computation for critical connections. Other factors should also be considered. In addition, routability for non-critical nets should be taken into account during clustering. Therefore, we define the gain function as a linear combination of criticality and routability of a connection. We use the same criticality function used in t-VPack. The routability component is the routability gain function defined in Equation [...]. Equation [...] shows the gain function used in timing-driven RPack:

$$\mathrm{TotalGain}(B) = [\ldots]\ \mathrm{Criticality}(B)\ [\ldots]\ \mathrm{RoutabilityGain}(B)\ [\ldots]\ \mathrm{DepopulationFactor}$$

Another important issue is scaling the routability component in Equation [...]. When a cluster is just being constructed, many of its pins are still unused. At this stage, the cluster should absorb as many connections as possible. In later stages, when most of the pins are used, routability is more restricted, and the used pins create congestion around the block. In this case, depopulation can help improve routability. In addition, as more blocks are added to the cluster, the probability of getting a higher gain in later stages increases, because shared nets among the blocks become more likely. This does not imply higher routability, because of the higher connection traffic around the cluster. Therefore, scaling is required. To achieve this, the routability function value is scaled each time a block is added to the cluster. The depopulation factor increases during the construction of a cluster and is defined as follows:

$$\mathrm{DepopulationFactor} = \mathrm{UsedPin}(B)\ [\ldots]\ \mathrm{UsedPin}(C)$$

$\mathrm{UsedPin}(B)$ and $\mathrm{UsedPin}(C)$ return the number of used pins of block B and cluster C, respectively.

Note that in t-VPack the total gain function is a function of routability and criticality as well. However, its routability factor is the same as the gain function used in VPack, which is not a comprehensive routability function. Also, the routability is scaled by the number of pins of a LUT (i.e., [...] for a [...]-input LUT). This normalization remains constant during clustering. According to our discussion above, this cannot reflect the routability gain correctly. With analysis and more accurate modeling of routability, we are able to study the behavior of different methods of clustering
in terms of routability and improve the approaches by adding components that consider other routability factors. Our analysis in this section shows that timing and routability correlate very strongly. Satisfying timing improves routability in some aspects. That is why timing-driven clustering outperforms a routability-driven packing. Our experimental results in the next section support this claim as well.

EXPERIMENTAL RESULTS

In previous sections, we claimed that considering routability factors while packing logic into CLBs has a significant impact on routing results and netlist complexity. In this section, we show a set of experimental results supporting our claim. We have used the greedy clustering approach proposed in VPack and t-VPack. RPack is implemented on top of the clustering algorithm in VPack, and t-RPack is implemented on top of t-VPack.

The first set of our experiments compares RPack and VPack. We ran the 20 largest MCNC benchmarks on VPack and RPack. The blif input of each benchmark is obtained by SIS logic minimization and the FlowMap technology mapper. The results presented in Table [...] show that our method successfully decreased the number of exposed nets. RPack and VPack use a similar number of clusters, so that the array size resulting from both approaches is the same for almost all benchmarks. In one case (benchmark alu4), RPack even yielded a smaller array size. The array size of each benchmark is reported in Table [...]. In accordance with the average gain estimated in Section [...], the results show that the major portion of the decrease in the number of exposed nets is due to the decrease in the number of two-terminal nets. In conclusion, reducing the number of output pins is strongly related to reducing the number of exposed nets.

We also observed the congestion around each cluster. We counted the number of exposed nets each cluster is connected to. Figure [...] shows the connectivity of the clusters resulting from VPack and RPack for benchmark bigkey. The cluster size is [...] and the number of input
pins per cluster is [...]. The vertical axis shows the number of clusters for each number of pins used per cluster, shown on the horizontal axis. The plot shows that the clusters obtained from RPack have less traffic around them. In Figure [...], the result for benchmark elliptic is shown as well. As shown in the plot, the connectivity obtained from RPack is more smoothly distributed than the one resulting from VPack. In these two plots, the type of interconnection is not reflected. The number of terminals of the nets also affects routability.

Figure [...]: Comparison of RPack and VPack in cluster characteristics in bigkey (horizontal axis: Number of Used Pins; vertical axis: Number of Clusters; series: VPack, RPack)

Figure [...]: Comparison of VPack and RPack in cluster characteristics in elliptic (horizontal axis: Number of Used Pins Per Cluster; vertical axis: Number of Clusters; series: VPack, RPack)

In order to verify that our method meets the objective of improving routability, we synthesized the benchmark circuits through the complete CAD flow to obtain the routing area. We used VPR to place and route the benchmarks. The routing architecture that we used employs only single-segment wires, leading to better routability. The subset switch type, in which each track in a channel can be connected to the same track number of the neighboring channels, is used. In addition, we have set the fraction of the tracks of each channel to which each logic block input and output pin connects to [...]. Table [...] summarizes our results after placement and routing of the benchmarks. As shown in Table [...], RPack is able to improve the routing area by decreasing the number of tracks significantly. The average improvement we obtained is 16.5%. The number of routing tracks required in each channel is a reliable metric for routing area, since a smaller number of routing tracks not only saves wiring area but also drastically decreases the size of the routing switches. Routing area is related
to the square of the number of tracks per channel. The improvement in routing area is 27% over VPack on average. Such an improvement in routing area decreases the total chip area significantly, since routing area is typically a large percentage of the total area. The significant difference between the two routability-driven methods, VPack and RPack, implies that each routability factor can affect the routing results significantly. Given the constraint on the number of pins per CLB, the fixed routing resources, and the fixed number of LUTs in each CLB, we considered different routability components in the gain function used in RPack. The results support our claim that routability is an important objective in clustering and that it results in a better distribution of interconnection among CLBs.

In the next set of experiments, timing-driven RPack (t-RPack) is compared with t-VPack in order to observe the impact of routability in timing-driven packing. Comparing RPack with t-VPack would not be a fair comparison: as mentioned in the previous section, timing-based clustering inherently has a beneficial effect on routability. Routability and speed, both benefits of cluster-based FPGAs, are realized by a timing-based packing algorithm. Previous work shows that t-VPack performs better in terms of routability than VPack. The routing results, critical path delay, and number of exposed nets for both packing methods, t-RPack and t-VPack, are reported in Table [...]. t-RPack uses our routability gain function as described in Section [...]. In order to observe the effect of the routability gain function, the depopulation factor is ignored. The results show that considering other routability factors and a more accurate routability gain for non-critical nets can improve the routing area by [...]. The delay is improved by [...]. The reason is that the weight for input-output connections is high in both the timing and the routability components for critical nets. We can observe that delay has been improved in most cases.

In another set of experiments, we added the depopulation
factor to control routability versus timing. The results are shown in Table [...]. The routing area is improved by [...], while the critical path delay is the same on average. This implies that depopulation helps to obtain a more distributed connectivity between the clusters.

The experimental results show that different routability factors have a significant impact on routing results. Timing-driven packing has a strong correlation with some of the routability factors for FPGAs. Integrating [...]

Table [...]: Array Size, Number of Exposed Nets, and Number of Tracks per circuit for VPack and RPack. Circuits: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng; final row: Average...
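The greedy cluster construction described in this excerpt (seed selection by most used inputs, then repeatedly adding the highest-gain legal block) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Block`, `shared_net_gain`, `external_inputs`, and `pack` are hypothetical, each block is assumed to drive exactly one net, and the default gain is the simple shared-net count |Nets(B) ∩ Nets(C)| that the paper attributes to VPack. The weighted RPack gain is not reproduced here because its constants are missing from the excerpt; a function implementing it could be passed through the `gain` parameter. The sketch rescans all candidates at every step, so it does not achieve the O(I · M) bound discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical basic block: one LUT (plus optional register)."""
    name: str
    inputs: frozenset  # names of nets read by the block
    output: str        # name of the net driven by the block


def nets_of(blocks):
    """All nets touched by a collection of blocks."""
    nets = set()
    for b in blocks:
        nets |= b.inputs | {b.output}
    return nets


def shared_net_gain(block, cluster):
    """Assumed VPack-style gain: number of nets shared with the cluster."""
    return len(nets_of([block]) & nets_of(cluster))


def external_inputs(cluster):
    """Nets read inside the cluster but driven outside of it."""
    driven = {b.output for b in cluster}
    read = set().union(*(b.inputs for b in cluster))
    return read - driven


def pack(blocks, cluster_size, inputs_per_cluster, gain=shared_net_gain):
    """Greedy, sequential cluster construction."""
    unclustered = list(blocks)
    clusters = []
    while unclustered:
        # Seed: the unclustered block with the most used inputs.
        seed = max(unclustered, key=lambda b: len(b.inputs))
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < cluster_size:
            # A candidate is legal if the cluster's external inputs
            # still fit into the available cluster input pins.
            legal = [b for b in unclustered
                     if len(external_inputs(cluster + [b])) <= inputs_per_cluster]
            if not legal:
                break
            best = max(legal, key=lambda b: gain(b, cluster))
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters
```

Calling `pack(blocks, cluster_size=2, inputs_per_cluster=4)` on a small list of `Block` objects returns a list of clusters, each a list of blocks. Keeping the gain as a parameter makes it easy to compare a shared-net gain with a routability-aware one on the same netlist.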

    [...]

  • ...Note that RPack has the same complexity as VPack [15, 5]....

    [...]

  • ...[15] includes a good survey of packing methods for cluster-based FPGAs....

    [...]

  • ...N, K, I; Routability-driven Router. Table columns: circuit, Array Size, Number of Exposed Nets, Number of Tracks, Delay (s), each reported for t-VPack and t-RPack. Circuits: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng; final row: Average...

    [...]

Book ChapterDOI
01 Sep 1997
TL;DR: In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which the authors can compare and presents placement and routing results on a new set of circuits more typical of today's industrial designs.
Abstract: We describe the capabilities of and algorithms used in a new FPGA CAD tool, Versatile Place and Route (VPR). In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which we can compare. Although the algorithms used are based on previously known approaches, we present several enhancements that improve run-time and quality. We present placement and routing results on a new set of large circuits to allow future benchmark comparisons of FPGA place and route tools on circuit sizes more typical of today's industrial designs.

1,133 citations


"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background or methods in this paper

  • ...In RPack, similar to VPack [4, 15], in the first stage, a LUT and a register are packed into a basic logic block when possible....

    [...]

  • ...cluster C. If the output terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net located inside the cluster. In such a case, there is no need to use an input pin of the cluster to connect net N to the other terminals of the net outside the cluster, since an output pin of the cluster can be used for the external interconnection. Therefore, the contribution of net N to the block gain is more than just covering an edge of a multi-terminal net. In fact, by adding block B to cluster C, an input pin of the cluster is freed and can be used for another net connection. In Table [...], this is defined as the input pin gain. This increases the probability that adding the block to the cluster is accepted; in other words, the probability of violating the input constraints of the cluster decreases. Note that each block output pin is accessible from outside, and there is no sharing among the output pins. Therefore, there are no output pin constraints for the clusters, and saving output pins does not bring any gain except in one case. Suppose all the input pins of a net are already inside the cluster and the logic block being added to the cluster contains the output pin of the net. Net N in Figure [...] is an example of such a case. The output pin of the cluster corresponding to the block driving net N cannot be used by other blocks. This means that there is no connection from outside to this pin, since all the terminals of the corresponding net are located inside the cluster. Therefore, the number of external connections of the cluster, defined as the output congestion gain in Table [...], decreases. This yields less congestion among the clusters. In other words, it reduces the number of used pins of a cluster, which is the fourth routability factor. Net N has no pin in the cluster. The gain from moving logic block B to the cluster would be zero according to the gain function above. However, not only would no edge from N be covered, but also one input pin of the cluster
would be used for N So the gain of moving logic block B to the cluster due to N in terms of used pins per cluster is This means N has a degrading e ect on the routability according to the fourth routability factor As explained above by considering just the number of shared inputs and outputs as in Equation the packing algorithm cannot di erentiate among the candidate blocks which have di erent impacts on routability All possible cases yielding di erent total gains are presented in Table for one net connected to a candidate block By incorporating the other routability factors the gain for each logic block B going into cluster C can be computed as the weighted combination of di erent routability factors as follows Gain B C f Nets B Nets C X i Nets B g i Nets C B where g i C B a fin P i B P i C b fo P i B P i C i Nets C c T i B otherwise fin P i B P i C is de ned as the gain obtained in input pins of cluster C as de ned in Table Similarly fo P i B P i C is the gain obtained in output congestion The additional gain of value to the sum of these two gain terms corresponds to the edge gain T i B returns the type of the pin of Net i connected to basic block B It returns if the pin is an output pin and otherwise P i B is the set of all pins of Net i that are on block B P i C is the set of pins of Net i connected to cluster C Nets C is the set of nets connected to cluster C a b and c are the weights for di erent components of the function Inserting a whole multi terminal net in one cluster is practically impossible In most of the cases the best we can do is to eliminate two terminal nets Therefore reducing an edge from a multi terminal net should not be considered equivalent to reducing an edge by inserting a two terminal net inside a cluster The average gain that a block can take from an n terminal net i connected to one of its pins depending on type of the net can be estimated from Table as follows Gainavg i n n n n n n According to Equation the average gain obtained from 
a two terminal net is the highest This implies that the algorithm gives priority to pushing a two terminal net entirely inside a cluster as compared to reducing a pin of a multi terminal net This leads to a decrease in number of exposed nets satisfying the last routability factor When the net connected to a block is a multi terminal net the gain associated with the multi terminal net is computed for each block containing a terminal of the net Therefore each edge of a net can have di erent impact on their corresponding blocks when a cluster is being constructed How many and what type of a terminal of the net do already exist in the cluster What type of terminal of the net does the candidate block have Answers to these questions for each net connected to the candidate block determine the gain of the block Therefore we conclude that in the bottom up clustering gain weight of each edge should be assigned dynamically according to individual situations In the next sub sections we explain our method of packing the basic blocks inside the clusters based on the routability gain function mentioned We also analyze timing driven clustering algorithm in terms of routability based on routability gain function Equations and According to this analysis we integrate routability factors into timing driven clustering RPack Algorithm The input to our packing algorithm is a list of LUTs registers and connections among the resources In RPack similar to VPack in the rst stage a LUT and a register are packed into a basic logic block when possible After that the blocks are packed into clusters using a greedy heuristic Clusters are constructed sequentially First the seed is chosen from the unclustered basic blocks The criteria is to choose the block with the most used inputs as mentioned in After choosing a seed for a cluster the logic block that gives the highest gain is selected to be added to the current cluster provided that it is a legal choice This means that the number of external 
inputs do not exceed the number of input pins of the cluster The algorithm continues adding blocks into one cluster until the cluster is full or no more legal choices can be found Similarly new clusters are constructed until all the blocks are packed into clusters We propose RPack a routability driven packing algorithm based on routability factors described in previous sub section RPack is developed on top of VPack The di erence between the two approaches is in the de nition of gain function VPack uses the function de ned in Equation while RPack uses the gain function in Equation The pseudo code of our approach is shown in Figure The complexity of RPack Algorithm is O I M where M is the number of clusters Finding the seed for each cluster takes O M time using a priority queue to store the candidate nodes where M is the number of nodes basic blocks When a node v is inserted to a cluster only the gain of the neighbors of candidate nodes Candidate nodes are those who have not been assigned to any node so far need to be updated The number of neighbors is equal to the edge degree of the current node i e deg v When a neighbor is visited the type and status of the edges connected to the neighbor are checked which takes O deg v Note that when each neighbor node is visited the edges that belong to the same hyper edge multi terminal net is counted once However when a block is being added to a cluster the number of neighbors are all the nodes connected to the node by any edge i e deg v By amortized analysis it is observed that the gain of a node is updated at most once associated with any connection between the node and the neighbors Therefore the Input Netlist of LUTs and Registers N Cluster Size K LUT Size I Inputs per Cluster Output List of Logic Clusters Pack LUTs and Registers together into Basic Blocks while Unclustered Basic Blocks available Find Seed for new Cluster while Cluster is not full Update gains of unclustered Basic Blocks Candidate blocks Choose Basic Block 
with highest gain Pick a candidate block If Candidate is NOT feasible then Go to Step Else Remove block from unclustered blocks list Add block to current Cluster end while end while Figure RPack Pseudo code for Packing Algorithm total clustering process takes O P vi V G deg vi O jEj where E is the edge set of connectivity graph G Also E P i Net ni P i Net ni P where ni is the number of the terminals of the net i and P is the total number of pins for all clusters which is I M Based on this analysis the complexity of the algorithm can be expressed as O I M t RPack Timing Driven RPack By clustering the LUTs in coarser CLBs the complexity of interconnection between the CLBs is reduced Hence fewer number of routing resources is required Another bene t of clusters is the fast interconnection inside the clusters Those connections being packed inside the clusters use the hard wired interconnect resources of CLBs This leads to better performance In packing both objectives should be pursued In this paper our focus is mostly on routability In this section we discuss how routability is realized when timing is added into packing algorithm and based on our routability function we propose timing driven RPack After packing a subset of the netlist is routed inside the clusters without passing through switched routing resources By inserting the interconnection along the critical path of the circuit inside clusters delay can be improved As a result in timing driven clustering the priority is given to timing critical connections to be inserted inside the clusters In sequential bottom up clustering approach a seed for a cluster would be the most critical block The blocks are added to the cluster based on criticality In addition to timing routability has to be considered in clustering to avoid the routing congestion which is a bottleneck in current FPGAs However rst we should study the impact of timing based clustering on routability when choosing the seed and de ning the gain function 
based on criticality In Section the routability factors are described Based on that our routability gain function is de ned in Equation Using this model we can explain the routability issues in timing driven clustering After analyzing the approach we would be able to improve the routability more accurately This is where analysis and theory guide the heuristics The criticality of the blocks are de ned by their slack Connections along the critical paths have high criticality value Therefore clustering based on timing is similar to path based clustering Each path is a chain of output to input pin to pin connections between a set of blocks See Figure According to routability gain function Table the output to input connection has a high routability gain When a connection is marked to be critical it means that there is a long chain of input output connectivity from this point to the rest of the design This implies a prediction of high routability gain in later stages while constructing the cluster After a highly critical connection is added to a cluster more input output connections would be added to the cluster In other words criticality of a connection shows the depth of input output connections from the current connection to the rest of the circuit By inserting an edge of the net on the critical path timing driven packing exploits the routability obtained by inserting output and input pin of a net inside a cluster hence releasing an input pin of the cluster As explained above our routability model can express the routability impact of timing based clustering out in out out in in in in in out out out Figure Routability and Slack Computation The two terms of routability factors are inherently satis ed in criticality based analysis and slack compu tation for critical connections Other factors should be considered In addition routability for non critical nets should be taken into account during clustering Therefore we de ne the gain function as a linear combi nation of 
criticality and routability of a connection We use the same criticality function used in t VPack The routability component is routability gain function de ned in Equation shows the gain function used in timing driven RPack TotalGain B Criticality B RoutabilityGain B DepopulationFactor Another important issue is scaling the routability component in Equation When a cluster is just being constructed there are many available un used pins of the cluster In this stage the cluster desires to absorb as many connection as possible In later stages when most of the pins are used routability is more restricted and the used pins around the clusters create congestion around the block In this case depopulation can help improve the routability In addition when more blocks are added to the cluster the probability of getting higher gain in the later stages is increased due to the higher probability of existence of shared nets among the blocks This does not imply the higher routability due to higher connection tra c around the cluster Therefore scaling is required In order to achieve this the routability function value is scaled each time a block is added to the cluster The depopulation factor increases during the construction of a cluster The depopulation factor is de ned as follows DepopulationFactor UsedP in B UsedP in C UsedP in B and UsedP in C return the number of used pins of block B and cluster C respectively We need to mention that in t VPack the total gain function is a function of routability and criticality as well However the routability factor is same as the gain function used in VPack which is not a comprehensive routability function Also the routability is scaled by the number of pins of a LUT i e for input LUT This normalization remains constant during the clustering According to our discussion above this cannot re ect routability gain correctly With analysis and more accurate modeling of routability we are able to study the behavior of di erent methods of clustering 
in terms of routability and improve the approaches by having additional components considering other routability factors Our analysis in this section shows that timing and routability correlate very strongly Satisfying timing improves routability in some aspects That is why timing driven clustering outperforms a routability driven packing Our experimental results in the next section supports our claim as well EXPERIMENTAL RESULTS In previous sections we claimed that considering routability factors while packing logic into CLBs has signif icant impact in routing results and netlist complexity In this section we show a set of experimental results supporting our claim We have used the greedy clustering approach proposed in VPack and t VPack RPack is implemented on top of the clustering algorithms in V Pack and t RPack is implemented on top of t VPack The rst set of our experiments compares RPack and VPack We ran the largest MCNC benchmarks on VPack and RPack The blif input format of each benchmark is obtained by SIS logic minimization and FlowMap technology mapper The results presented in Table show that our method successfully decreased the number of exposed nets RPack and VPack use similar number of clusters such that the array size resulting from both approaches for almost all benchmarks is same Even in one case benchmark alu RPack yielded smaller array size The array size of each benchmark is reported in Table In accordance with average gain estimated in Section the results show that the major portion of the decrease in the number of the exposed nets is due to decrease in the number of two terminal nets In conclusion reducing the number of output pins is strongly related to reducing the number of exposed nets We also observed the congestion around each cluster We counted the number of exposed nets each cluster is connected to Figure shows the connectivity of the clusters resulted from VPack and RPack for benchmark bigkey The size of cluster is and number of input 
pins per cluster is [...]. The vertical axis shows the number of clusters for each number of pins used per cluster, which is shown on the horizontal axis. The plot shows that the clusters obtained from RPack have less traffic around them. The result for benchmark elliptic is shown in a second figure. As that plot shows, the connectivity obtained from RPack is more smoothly distributed than the one resulting from VPack. These two plots do not reflect the type of interconnection; the number of terminals of the nets also affects routability.

[Figure: Comparison of RPack and VPack in cluster characteristics in bigkey. Horizontal axis: number of used pins per cluster; vertical axis: number of clusters; series: VPack, RPack.]

[Figure: Comparison of VPack and RPack in cluster characteristics in elliptic. Horizontal axis: number of used pins per cluster; vertical axis: number of clusters; series: VPack, RPack.]

In order to verify that our method meets the objective of improving routability, we synthesized the benchmark circuits through the complete CAD flow to obtain the routing area. We used VPR to place and route the benchmarks. The routing architecture that we used employs only single-segment wires, leading to better routability. The subset switch type, in which each track in a channel can be connected to the same track number of the neighboring channels, is used. In addition, we have set the fraction of the tracks of each channel to which each logic block input and output pin connects to [...]. The results table summarizes our results after placement and routing of the benchmarks. As shown there, RPack is able to improve the routing area by decreasing the number of tracks significantly. The average improvement we obtained in the number of tracks is 16.5%. The number of routing tracks required in each channel is a reliable metric for routing area, since a smaller number of routing tracks not only saves wiring area but also drastically decreases the size of the routing switches. Routing area is related
to the square of the number of tracks per channel. The improvement in routing area is 27% over VPack on average. Such an improvement in routing area decreases the total chip area significantly, since routing area is typically a large percentage of the total area. The significant difference between the two routability-driven methods, VPack and RPack, implies that each routability factor can affect the routing results significantly. Given the constraint on the number of pins per CLB, fixed routing resources and a fixed number of LUTs in each CLB, we considered different routability components in the gain function used in RPack. The results support our claim that routability is an important objective in clustering and that it results in a better distribution of interconnection among CLBs.

In the next set of experiments, timing-driven RPack (t-RPack) is compared with t-VPack in order to observe the impact of routability in timing-driven packing. Comparing RPack itself with t-VPack would not be a fair comparison: as mentioned in the previous section, timing-based clustering inherently has an effective impact on routability. Routability and speed, both benefits of cluster-based FPGAs, are realized by a timing-based packing algorithm. Previous work shows that t-VPack performs better in terms of routability than VPack. The routing results, critical path delay and number of exposed nets for both packing methods, t-RPack and t-VPack, are reported in a table. t-RPack uses our routability gain function as described earlier. In order to observe the effect of the routability gain function, the depopulation factor is ignored. The results show that considering other routability factors and a more accurate routability gain for non-critical nets can improve the routing area by [...]. The delay is improved by [...]. The reason is that the weight for input-output connections is high in both the timing and the routability component for critical nets. We can observe that delay has been improved in most cases. In another set of experiments we added the depopulation
factor to control routability versus timing. The results are shown in a table. The routing area is improved by [...] while the critical path delay is the same on average. This implies that depopulation helps to obtain a more distributed connectivity between the clusters. The experimental results show that different routability factors have a significant impact on routing results. Timing-driven packing has a strong correlation with some of the routability factors for FPGAs. Integrating

[Table: circuit; array size; number of exposed nets (VPack, RPack); number of tracks (VPack, RPack). Rows: MCNC benchmarks, including bigkey, clma, des, diffeq, dsip, elliptic, frisc, pdc, seq, spla and tseng, and their average]...
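The excerpt above states that routing area is related to the square of the number of tracks per channel. The following is a minimal sketch of what that relation implies, assuming an idealized pure quadratic model (area proportional to the square of the track count). The function name `area_reduction` is hypothetical and the model is an assumption for illustration, not the paper's area estimator.

```python
def area_reduction(track_reduction: float) -> float:
    """Fractional routing-area reduction for a given fractional track reduction.

    Assumes routing area is exactly proportional to W**2, where W is the
    number of tracks per channel (an idealization of "related to the square").
    """
    if not 0.0 <= track_reduction < 1.0:
        raise ValueError("track_reduction must be in [0, 1)")
    remaining_tracks = 1.0 - track_reduction
    return 1.0 - remaining_tracks ** 2


if __name__ == "__main__":
    # 16.5% (average) and 50% (best case) are the track reductions quoted
    # in the abstract for RPack versus VPack.
    for r in (0.165, 0.50):
        print(f"{r:.1%} fewer tracks -> {area_reduction(r):.1%} less routing area")
```

Under this idealized model a 16.5% reduction in tracks corresponds to roughly a 30% reduction in routing area, which is of the same order as the 27% average routing area improvement over VPack quoted in the abstract. The two figures need not coincide, since the reported average is taken over per-circuit results.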

    [...]

  • ...N, K, I; routability-driven router. [Table: circuit; array size; number of exposed nets (t-VPack, t-RPack); number of tracks (t-VPack, t-RPack); delay in s (t-VPack, t-RPack). Rows: MCNC benchmarks, including bigkey, clma, des, diffeq, dsip, elliptic, frisc, pdc, seq, spla and tseng, and their average]...

    [...]

  • ...Most of an FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would yield both better quality and a faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We present a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, the routability of a timing-driven packing algorithm is analyzed, and we integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.

Keywords: VLSI CAD, Field-Programmable Gate Arrays (FPGAs), Technology mapping, Clustering Techniques, Optimization Algorithm

INTRODUCTION

Today's technology allows FPGAs to be designed as multi-million system gate devices at the heart of electronic systems. Since the FPGA is an integral part of many digital systems, the significance of optimization problems in mapping circuits onto FPGAs has increased. There are two important issues related to the FPGA mapping process: the quality of the resulting mapping and the run time of the tools serving in the process. Although the former is more dominant for FPGAs, both aspects are important. Similar to ASIC design, minimizing the delay is an important objective, as is minimizing the silicon area. The area of an FPGA consists of routing area and logic area. Optimizing the utilization of both routing and logic resources is crucial to obtaining a good-quality result. FPGAs consist of smaller configurable building blocks called logic blocks or Configurable Logic Blocks (CLBs), which are placed on the FPGA chip either in a two-dimensional array (see the architecture figure) or in a set of rows. The CAD flow of mapping a circuit onto an FPGA consists of four major stages. In the first stage, the circuit is
basically logically optimized. In the second stage, the optimized circuit is divided into CLBs of the FPGA, which is called technology mapping. The placement and routing stages accomplish the assignment of subcircuits to CLBs and the programming of the routing switches of the FPGA. Due to the highly constrained and discrete interconnect structure of current FPGAs, routing is a challenging problem. Most of the time, current FPGA routers cannot use the available routing resources efficiently, so a large portion of the routing area is wasted. Also, depending on the complexity of the particular design, routing might require a fairly large amount of time, often several hours, to complete. Hence, considering routability at earlier steps of the CAD flow would yield both a better-quality result and less design time in later stages.

FPGA vendors have different logic block configurations. There are two kinds of CLBs: LUT-based blocks and multiplexor-based blocks. LUT-based logic blocks are more popular. There have been several contributions in the development and design of FPGAs towards reducing the gap in density and performance between ASIC and FPGA implementations. Hierarchical features have been added to the logic and routing architecture of FPGAs. Many commercial FPGAs, such as Xilinx, Altera and Actel FPGAs, include logic blocks that contain several LUTs. A collection of basic logic elements that are grouped together to be placed in one complex logic block is called a cluster (see panel (a) of the cluster figure). FPGAs with logic blocks containing multiple basic blocks are called cluster-based FPGAs. In cluster-based FPGAs, each CLB (configurable logic block) is a cluster of basic logic elements. The structure and granularity of the logic block have a significant impact on the area efficiency and performance of the FPGA. If the logic block is fine-grained, the circuit to be implemented will be distributed over a larger number of logic blocks. This has a negative impact on routability, since more blocks need to be interconnected. Since the interconnect inside
the logic blocks is hardwired, local interconnect can be made very fast and efficient. This improves routability and decreases the load on the router significantly by reducing the size of the problem. The two main benefits of clustering basic blocks into CLBs are faster compilation and improved circuit delay. On the other hand, it is not feasible to increase the complexity of the logic blocks beyond a certain limit. If the logic blocks become too complex, it becomes difficult to utilize them fully, hence several logic blocks will be wasted. Due to constraints on the number of input pins and the number of blocks within each cluster, not all the resources in a cluster can be used in the circuit implementation. The task of assigning basic logic blocks to clusters is called packing.

Because there is no accurate means of estimating the interconnect at the logic synthesis level, it is not easy to deal with the routability of a circuit at the logic level. However, if special properties of the interconnect available at the logic level, such as sharing among the pins, can be exploited while packing logic into basic blocks, significant gains can be obtained in terms of routability. In the past, routability at the packing stage has not been considered as extensively as it has been at the technology mapping stage. Packing can bring improvements in routability, since a more accurate estimate of the interconnect is available after technology mapping. In this paper, we propose a routability-driven packing algorithm. We show improvements in routing area over the state-of-the-art logic block packing algorithms VPack and t-VPack. We introduce a new method to consider routability at the packing stage. Our method of selecting a block for clustering can easily be integrated with other clustering algorithms. We demonstrate the effect of our method on routability by synthesizing the benchmark circuits through the complete CAD flow. We have technology mapped a given circuit, then applied our routability
driven packing method for clustering, and finally placed and routed the circuit. We present the results of the final routing and show that our method improves the routability significantly. Our new algorithm, RPack, indeed improves routability compared to VPack. As our results on the largest MCNC benchmarks show in the experimental section, we are able to improve the minimum required number of routing tracks by 16.5% on average. A preliminary version of this work appeared earlier. We also integrated our routability function into a timing-driven packing algorithm. Based on our routability model, routability in the timing-driven packing algorithm is analyzed. Compared to t-VPack, the routing area is improved by 12% on average.

The paper is organized as follows. Previous work on routability-driven technology mapping and algorithms for cluster packing is discussed first. The next section describes the FPGA architecture we are targeting, utilization and routability issues, and the problem formulation for the packing problem. RPack, our routability-driven packing method, is then described. Experimental results are presented next, and the final section includes conclusions and future work.

Previous Work

Most commercial FPGAs use configurable blocks containing several LUTs. Packing LUTs into clusters is an important design step introduced for cluster-based FPGAs. It can be viewed as a sub-task within the technology mapping stage, in which logic gates are assigned to LUTs and registers. We will first mention contributions made in the technology mapping area. The majority of research devoted to technology mapping has been done with the objective of improving either timing or area efficiency, or the trade-off between depth and area. Compared to the amount of effort made in this area, little work has been done in the routability-driven technology mapping domain. The routability-driven technology mapper for LUT-based arrays, Rmap, employs a mapping strategy that considers routability. The packing problem is a clustering problem. Clustering has been studied extensively
for various applications such as placement, technology mapping, etc. Packing is a clustering problem with constraints on the number of input pins and the number of LUTs in each CLB. The objective is to minimize the number of CLBs required to cover all the LUTs while satisfying the constraints. Betz and Rose proposed the VPack and t-VPack logic block packing algorithms for cluster-based FPGAs. VPack and t-VPack are among the best-known packing tools for FPGAs. VPack first packs a flip-flop and a LUT together into a basic logic element (BLE) using a matching-based method. These BLEs are then packed in a greedy manner into logic clusters, with the local optimization objectives being to fill each cluster to its capacity and to minimize the number of used inputs to each cluster. This approach is inspired by earlier work. A timing-driven packing tool for FPGAs, t-VPack, has also been proposed. The blocks on the critical path are preferentially packed together in a CLB, so that the delay can be improved by exploiting local wiring in the CLB to route the critical nets. t-VPack delivers better routability than VPack. Later we will describe the routability potential of timing-driven packing algorithms. A packing approach based on maximum-weight matching on the circuit graph has also been proposed.

Recently, researchers have proposed a new technique for packing logic into clusters. Based on Rent's rule, the connectivity of each cluster is defined for each application. In this approach, routability is weighted according to the connectivity of the application. It is a good idea to consider routability based on the connectivity of the circuit. On the other hand, the weight of routability in the overall optimization objective is fixed during clustering for each application, so routability cannot be considered accurately. In this work, we scale the weight of the routability factor dynamically. A good survey of packing methods for cluster-based FPGAs is also available. In all these approaches, when a logic block is packed into an existing
cluster, the type of nets being shared is not considered. An important issue in cluster-based FPGAs is the limited number of inputs. Therefore, considering input-output pin sharing in addition to edge covering can improve the performance. In this paper, we analyze the issues during the packing process extensively. We introduce new metrics that are used to form a new objective function to evaluate routability. We took the VPack algorithm as a basis, as we will describe in later sections, and have built our own approach upon it.

PACKING IN CLUSTER-BASED FPGAS

In this section, we study the issues in the packing stage of technology mapping for cluster-based FPGAs. The routability-driven packing problem is also formulated.

Cluster-based FPGA Architecture

The FPGA we are targeting has an SRAM-based island-style structure. It contains a square matrix of logic blocks, with routing tracks located between each row and column. The structure of the basic logic block is illustrated in panel (b) of the architecture figure. It contains a K-input LUT and one flip-flop. A K-input LUT is able to implement any function of its K inputs ($2^{2^K}$ functions). However, the size of the look-up table grows exponentially with the number of inputs. It has been shown that a LUT with input size [...] is the most area-efficient configuration. The logic cluster is shown in panel (a) of the figure. The cluster size N is defined as the number of basic blocks contained in the cluster. The cluster takes I inputs that are connected to the LUTs inside the basic blocks. Not all $K \cdot N$ basic block inputs are accessible externally: only I of them are connected to the input multiplexors of the cluster. These input multiplexors allow any of the I inputs to be connected to any of the $K \cdot N$ basic block inputs. Also, any output of the N basic blocks can be connected to any basic block input through these multiplexors. The cluster contains N output pins, connecting each basic block output to one cluster output. A similar structure is used in prior work. In the packing stage of the CAD flow for cluster-based FPGAs, the input circuit is represented
in terms of LUTs and registers. As shown in panel (c) of the figure, if LUT l is followed by register r and the net connecting LUT l and register r has no connection to any other element, they can both be implemented by a single basic logic block. Otherwise, each register or LUT should be assigned to its own basic logic block. An optimal pattern-matching-based method to pack the register-LUT pairs into basic blocks has been proposed in prior work. Hence, the problem is simplified to packing a set of basic blocks into clusters. We focus on clustering the basic blocks into logic clusters after each register and LUT has been assigned to a basic logic block.

[Figure: (a) logic cluster made of basic blocks; (b) basic block with a K-input LUT and a flip-flop; (c) K-input LUT l followed by flip-flop (register) r]...
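The packing problem described in this excerpt (at most N basic blocks and at most I external inputs per cluster, with blocks added greedily) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `Block`, `external_inputs`, `is_feasible`, `shared_nets` and `greedy_pack` are hypothetical, clock nets are ignored, the seed is the block with the most inputs, and the gain is the plain count of shared nets (the VPack-style gain), not RPack's weighted routability gain.

```python
from typing import List, Set


class Block:
    """A basic block (one LUT, optionally with a flip-flop). Hypothetical model."""

    def __init__(self, name: str, inputs: Set[str], output: str):
        self.name = name
        self.inputs = set(inputs)  # nets read by the block
        self.output = output       # net driven by the block


def external_inputs(cluster: List[Block]) -> Set[str]:
    """Nets that must enter the cluster through one of its I input pins.

    A net driven by a block inside the cluster is routed locally through
    the cluster multiplexors, so it does not consume a cluster input pin.
    """
    driven_inside = {b.output for b in cluster}
    used = set().union(*(b.inputs for b in cluster))
    return used - driven_inside


def is_feasible(cluster: List[Block], candidate: Block, n: int, i: int) -> bool:
    """Check the cluster-size (N) and input-pin (I) constraints."""
    trial = cluster + [candidate]
    return len(trial) <= n and len(external_inputs(trial)) <= i


def shared_nets(cluster: List[Block], candidate: Block) -> int:
    """Assumed gain: number of nets the candidate shares with the cluster."""
    cluster_nets: Set[str] = set()
    for b in cluster:
        cluster_nets |= b.inputs | {b.output}
    return len(cluster_nets & (candidate.inputs | {candidate.output}))


def greedy_pack(blocks: List[Block], n: int, i: int) -> List[List[Block]]:
    """Build clusters one at a time: pick a seed, then add the best legal block."""
    unclustered = list(blocks)
    clusters: List[List[Block]] = []
    while unclustered:
        # Seed rule assumed here: the unclustered block with the most inputs.
        seed = max(unclustered, key=lambda b: len(b.inputs))
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < n:
            legal = [b for b in unclustered if is_feasible(cluster, b, n, i)]
            if not legal:
                break  # no legal candidate left for this cluster
            best = max(legal, key=lambda b: shared_nets(cluster, b))
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters
```

For example, with blocks A (inputs a, b; output n1) and B (inputs n1, c; output n2), the net n1 is absorbed inside the cluster {A, B}, so the cluster needs only the three external inputs a, b and c. Swapping `shared_nets` for a different gain function is the only change needed to try another selection policy, which matches the remark that the block-selection method can be integrated with other clustering algorithms.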

    [...]

  • ...Betz and Rose proposed VPack, a logic block packing algorithm [4] for cluster-based FPGAs....

    [...]

Journal ArticleDOI
TL;DR: A theoretical breakthrough is presented which shows that the LUT-based FPGA technology mapping problem for depth minimization can be solved optimally in polynomial time.
Abstract: The field programmable gate-array (FPGA) has become an important technology in VLSI ASIC designs. In the past few years, a number of heuristic algorithms have been proposed for technology mapping in lookup-table (LUT) based FPGA designs, but none of them guarantees optimal solutions for general Boolean networks and little is known about how far their solutions are away from the optimal ones. This paper presents a theoretical breakthrough which shows that the LUT-based FPGA technology mapping problem for depth minimization can be solved optimally in polynomial time. A key step in our algorithm is to compute a minimum height K-feasible cut in a network, which is solved optimally in polynomial time based on network flow computation. Our algorithm also effectively minimizes the number of LUT's by maximizing the volume of each cut and by several post-processing operations. Based on these results, we have implemented an LUT-based FPGA mapping package called FlowMap. We have tested FlowMap on a large set of benchmark examples and compared it with other LUT-based FPGA mapping algorithms for delay optimization, including Chortle-d, MIS-pga-delay, and DAG-Map. FlowMap reduces the LUT network depth by up to 7% and reduces the number of LUT's by up to 50% compared to the three previous methods.

719 citations


"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background or methods in this paper

  • ...cluster C. If the output terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net located inside the cluster. In such a case, there is no need to use an input pin of the cluster to connect the net N to its other terminals outside the cluster, since an output pin of the cluster can be used for the external interconnection. Therefore, the contribution of net N to the block gain is more than just covering an edge of a multi-terminal net. In fact, by adding block B to cluster C, an input pin of the cluster is freed and can be used for another net connection. In the gain table this is defined as the in-pin gain. It increases the probability that adding the block to the cluster is accepted; in other words, the probability of violating the input constraints of the cluster decreases.

Note that each block output pin is accessible from outside and there is no sharing among the output pins. Therefore, there are no output pin constraints for the clusters, and saving on output pins does not bring any gain except in one case. Suppose all the input pins of a net are already inside the cluster and the logic block being added to the cluster contains the output pin of the net. Net N in the figure is an example of such a case. The output pin of the cluster corresponding to the block driving the net N cannot be used by other blocks. This means that there would be no connection from outside to this pin, since all the terminals of the corresponding net are located inside the cluster. Therefore, the number of external connections of the cluster, defined as the output congestion gain in the gain table, decreases. This yields less congestion among the clusters. In other words, it reduces the number of used pins of a cluster, which is the fourth routability factor.

Another net N has no pin in the cluster. The gain from moving logic block B to the cluster would be zero according to the gain function above. However, not only would no edge of N be covered, but also one input pin of the cluster
would be used for N. So the gain of moving logic block B to the cluster due to N, in terms of used pins per cluster, is [...]. This means N has a degrading effect on routability according to the fourth routability factor. As explained above, by considering just the number of shared inputs and outputs, as in the shared-net gain equation, the packing algorithm cannot differentiate among candidate blocks that have different impacts on routability. All possible cases yielding different total gains are presented in the gain table for one net connected to a candidate block. By incorporating the other routability factors, the gain for each logic block B going into cluster C can be computed as a weighted combination of the different routability factors:

$$\mathrm{Gain}(B, C) = f(\mathrm{Nets}(B), \mathrm{Nets}(C)) = \sum_{i \in \mathrm{Nets}(B)} g(i, C, B)$$

where

$$g(i, C, B) = \begin{cases} \text{edge gain} + a \, f_{in}(P(i,B), P(i,C)) + b \, f_{o}(P(i,B), P(i,C)) & \text{if } i \in \mathrm{Nets}(C) \\ c \, T(i, B) & \text{otherwise} \end{cases}$$

Here $f_{in}(P(i,B), P(i,C))$ is the gain obtained in input pins of cluster C, as defined in the gain table. Similarly, $f_{o}(P(i,B), P(i,C))$ is the gain obtained in output congestion. The additional gain of value [...] added to the sum of these two gain terms corresponds to the edge gain. $T(i, B)$ returns the type of the pin of net i connected to basic block B: it returns [...] if the pin is an output pin and [...] otherwise. $P(i, B)$ is the set of all pins of net i that are on block B, and $P(i, C)$ is the set of pins of net i connected to cluster C. $\mathrm{Nets}(C)$ is the set of nets connected to cluster C. a, b and c are the weights of the different components of the function.

Inserting a whole multi-terminal net into one cluster is practically impossible. In most cases, the best we can do is to eliminate two-terminal nets. Therefore, removing an edge from a multi-terminal net should not be considered equivalent to removing an edge by inserting a two-terminal net inside a cluster. The average gain $\mathrm{Gain}_{avg}(i)$ that a block can take from an n-terminal net i connected to one of its pins, depending on the type of the net, can be estimated from the gain table as a function of n: [...]. According to this equation, the average gain obtained from
a two-terminal net is the highest. This implies that the algorithm gives priority to pushing a two-terminal net entirely inside a cluster over removing one pin of a multi-terminal net. This leads to a decrease in the number of exposed nets, satisfying the last routability factor. When the net connected to a block is a multi-terminal net, the gain associated with the net is computed for each block containing a terminal of the net. Therefore, each edge of a net can have a different impact on its corresponding blocks when a cluster is being constructed. How many terminals of the net already exist in the cluster, and of what type? What type of terminal of the net does the candidate block have? The answers to these questions for each net connected to the candidate block determine the gain of the block. We therefore conclude that in bottom-up clustering the gain weight of each edge should be assigned dynamically, according to the individual situation. In the next subsections, we explain our method of packing the basic blocks inside the clusters based on the routability gain function described above. We also analyze the timing-driven clustering algorithm in terms of routability, based on the routability gain function. According to this analysis, we integrate routability factors into timing-driven clustering.

RPack Algorithm

The input to our packing algorithm is a list of LUTs, registers and the connections among these resources. In RPack, as in VPack, a LUT and a register are packed into a basic logic block in the first stage whenever possible. After that, the blocks are packed into clusters using a greedy heuristic. Clusters are constructed sequentially. First, the seed is chosen from the unclustered basic blocks. The criterion is to choose the block with the most used inputs, as in prior work. After a seed has been chosen for a cluster, the logic block that gives the highest gain is selected to be added to the current cluster, provided that it is a legal choice. This means that the number of external
inputs does not exceed the number of input pins of the cluster. The algorithm continues adding blocks to one cluster until the cluster is full or no more legal choices can be found. Similarly, new clusters are constructed until all the blocks are packed into clusters.

We propose RPack, a routability-driven packing algorithm based on the routability factors described in the previous subsection. RPack is developed on top of VPack. The difference between the two approaches is in the definition of the gain function: VPack uses the gain function based only on the number of shared nets, while RPack uses the weighted routability gain function. The pseudo-code of our approach is shown in the figure. The complexity of the RPack algorithm is $O(I \cdot M)$, where M is the number of clusters. Finding the seed for each cluster takes $O(M)$ time using a priority queue to store the candidate nodes, where M here is the number of nodes (basic blocks). When a node v is inserted into a cluster, only the gains of its neighbors among the candidate nodes need to be updated (candidate nodes are those that have not been assigned to any cluster so far). The number of neighbors is equal to the edge degree of the current node, i.e., $\deg(v)$. When a neighbor is visited, the type and status of the edges connected to the neighbor are checked, which takes $O(\deg(v))$. Note that when each neighbor node is visited, the edges that belong to the same hyper-edge (multi-terminal net) are counted once. However, when a block is being added to a cluster, its neighbors are all the nodes connected to it by any edge, i.e., $\deg(v)$. By amortized analysis, the gain of a node is updated at most once for any connection between the node and its neighbors. Therefore, the total clustering process takes $O\left(\sum_{v_i \in V(G)} \deg(v_i)\right) = O(|E|)$, where E is the edge set of the connectivity graph G. Also, $|E| = \sum_{i \in Net} (n_i - 1) \le \sum_{i \in Net} n_i = P$, where $n_i$ is the number of terminals of net i and P is the total number of pins of all clusters, which is $I \cdot M$. Based on this analysis, the complexity of the algorithm can be expressed as $O(I \cdot M)$.

    Input:  Netlist of LUTs and Registers
            N: Cluster Size, K: LUT Size, I: Inputs per Cluster
    Output: List of Logic Clusters

    Pack LUTs and Registers together into Basic Blocks
    while (Unclustered Basic Blocks available)
        Find Seed for new Cluster
        while (Cluster is not full)
            Update gains of unclustered Basic Blocks (Candidate blocks)
            Choose Basic Block with highest gain (Pick a candidate block)
            If (Candidate is NOT feasible) then
                Go to Step [...]
            Else
                Remove block from unclustered blocks list
                Add block to current Cluster
        end while
    end while

    Figure: RPack: Pseudo-code for Packing Algorithm

t-RPack: Timing-Driven RPack

By clustering the LUTs into coarser CLBs, the complexity of the interconnection between the CLBs is reduced, so fewer routing resources are required. Another benefit of clusters is the fast interconnection inside them: connections packed inside a cluster use the hard-wired interconnect resources of the CLB, which leads to better performance. In packing, both objectives should be pursued. In this paper, our focus is mostly on routability. In this section, we discuss how routability is realized when timing is added to the packing algorithm and, based on our routability function, we propose timing-driven RPack.

After packing, a subset of the netlist is routed inside the clusters without passing through switched routing resources. By placing the interconnections along the critical path of the circuit inside clusters, delay can be improved. As a result, timing-driven clustering gives priority to inserting timing-critical connections inside the clusters. In a sequential bottom-up clustering approach, the seed for a cluster would be the most critical block, and blocks are added to the cluster based on criticality. In addition to timing, routability has to be considered in clustering in order to avoid routing congestion, which is a bottleneck in current FPGAs. However, we should first study the impact of timing-based clustering on routability when choosing the seed and defining the gain function
based on criticality. The routability factors were described earlier, and our routability gain function is defined based on them. Using this model, we can explain the routability issues in timing-driven clustering. After analyzing the approach, we are able to improve the routability more accurately. This is where analysis and theory guide the heuristics.

The criticality of a block is defined by its slack. Connections along the critical paths have a high criticality value. Therefore, clustering based on timing is similar to path-based clustering. Each path is a chain of output-to-input pin-to-pin connections between a set of blocks (see the figure). According to the routability gain function (see the gain table), an output-to-input connection has a high routability gain. When a connection is marked as critical, there is a long chain of input-output connectivity from this point to the rest of the design. This predicts a high routability gain in later stages of constructing the cluster: after a highly critical connection is added to a cluster, more input-output connections will be added to the cluster. In other words, the criticality of a connection indicates the depth of the input-output connections from the current connection to the rest of the circuit. By inserting an edge of a net on the critical path, timing-driven packing exploits the routability obtained by placing the output and input pin of a net inside a cluster, hence releasing an input pin of the cluster. As explained above, our routability model can express the routability impact of timing-based clustering.

[Figure: Routability and Slack Computation]

Two of the routability factors are inherently satisfied by criticality-based analysis and slack computation for critical connections. The other factors should still be considered. In addition, routability for non-critical nets should be taken into account during clustering. Therefore, we define the gain function as a linear combination of
criticality and routability of a connection. We use the same criticality function as t-VPack. The routability component is the routability gain function defined earlier. The gain function used in timing-driven RPack, TotalGain(B), is a linear combination of Criticality(B) and RoutabilityGain(B), with the routability term scaled by DepopulationFactor.

Another important issue is the scaling of the routability component in this equation. When a cluster is just being constructed, many of its pins are still unused. At this stage, the cluster should absorb as many connections as possible. In later stages, when most of the pins are used, routability is more restricted and the used pins around the cluster create congestion around the block. In this case, depopulation can help improve the routability. In addition, as more blocks are added to the cluster, the probability of obtaining a higher gain in the later stages increases, because shared nets among the blocks become more likely. This does not imply higher routability, because of the higher connection traffic around the cluster. Therefore, scaling is required. To achieve this, the routability function value is scaled each time a block is added to the cluster. The depopulation factor increases during the construction of a cluster. DepopulationFactor is defined in terms of UsedPin(B) and UsedPin(C), which return the number of used pins of block B and cluster C, respectively.

We need to mention that in t-VPack the total gain function is also a function of routability and criticality. However, its routability factor is the same as the gain function used in VPack, which is not a comprehensive routability function. Also, the routability is scaled by the number of pins of a LUT (i.e., [...] for a [...]-input LUT). This normalization remains constant during clustering. According to our discussion above, this cannot reflect the routability gain correctly. With analysis and more accurate modeling of routability, we are able to study the behavior of different methods of clustering
in terms of routability and improve the approaches by adding components that consider other routability factors. Our analysis in this section shows that timing and routability correlate very strongly: satisfying timing improves routability in some aspects. That is why timing-driven clustering outperforms a routability-driven packing. Our experimental results in the next section support this claim as well.

EXPERIMENTAL RESULTS

In previous sections we claimed that considering routability factors while packing logic into CLBs has a significant impact on routing results and netlist complexity. In this section we show a set of experimental results supporting our claim. We have used the greedy clustering approach proposed in VPack and t-VPack: RPack is implemented on top of the clustering algorithm in VPack, and t-RPack is implemented on top of t-VPack.

The first set of experiments compares RPack and VPack. We ran the largest MCNC benchmarks on VPack and RPack. The blif input format of each benchmark is obtained by SIS logic minimization and the FlowMap technology mapper. The results presented in the table show that our method successfully decreased the number of exposed nets. RPack and VPack use a similar number of clusters, so the array size resulting from both approaches is the same for almost all benchmarks. In one case (benchmark alu4), RPack even yielded a smaller array size. The array size of each benchmark is reported in the table. In accordance with the average gain estimated earlier, the results show that the major portion of the decrease in the number of exposed nets is due to the decrease in the number of two-terminal nets. In conclusion, reducing the number of output pins is strongly related to reducing the number of exposed nets.

We also observed the congestion around each cluster by counting the number of exposed nets each cluster is connected to. The figure for benchmark bigkey shows the connectivity of the clusters resulting from VPack and RPack. The size of the cluster is [...] and the number of input
pins per cluster is [...]. The vertical axis shows the number of clusters for each number of pins used per cluster, shown on the horizontal axis. The plot shows that the clusters obtained from RPack have less traffic around them. The result for benchmark elliptic is shown in a second figure. As shown in that plot, the connectivity obtained from RPack is more smoothly distributed than that resulting from VPack. In these two plots the type of interconnection is not reflected; the number of terminals of the nets also affects routability.

Figure: Comparison of RPack and VPack in cluster characteristics in bigkey (horizontal axis: number of used pins; vertical axis: number of clusters).

Figure: Comparison of VPack and RPack in cluster characteristics in elliptic (horizontal axis: number of used pins per cluster; vertical axis: number of clusters).

In order to verify that our method meets the objective of improving routability, we synthesized the benchmark circuits through the complete CAD flow to obtain the routing area. We used VPR to place and route the benchmarks. The routing architecture that we used employs only single-segment wires, leading to better routability. The subset switch type, in which each track in a channel can be connected to the same track number of the neighboring channels, is used. In addition, we have set the fraction of the tracks in each channel to which each logic block input and output pin connects to [...]. The table of results after placement and routing shows that RPack is able to improve the routing area by decreasing the number of tracks significantly; the average reduction in the number of tracks is 16.5%. The number of routing tracks required in each channel is a reliable metric for routing area, since a smaller number of routing tracks means not only saving wiring area but also drastically decreasing the size of the routing switches. Routing area is related
to the square of the number of tracks per channel. The improvement in routing area is 27% over VPack on average. Such an improvement decreases the total chip area significantly, since routing area is typically a large percentage of the total area. The significant difference between the two routability-driven methods, VPack and RPack, implies that each routability factor can affect the routing results significantly. Given the constraint on the number of pins per CLB, fixed routing resources, and a fixed number of LUTs in each CLB, we considered different routability components in the gain function used in RPack. The results support our claim that routability is an important objective in clustering and that it results in a better distribution of interconnection among CLBs.

In the next set of experiments, timing-driven RPack (t-RPack) is compared with t-VPack in order to observe the impact of routability in timing-driven packing. Comparing RPack itself with t-VPack would not be a fair comparison: as mentioned in the previous section, timing-based clustering inherently has an effective impact on routability. Routability and speed, both benefits of cluster-based FPGAs, are realized by a timing-based packing algorithm. Previous work shows that t-VPack performs better in terms of routability than VPack. The routing results, critical path delay, and number of exposed nets using both packing methods, t-RPack and t-VPack, are reported in the table. t-RPack uses our routability gain function as described earlier. In order to observe the effect of the routability gain function alone, the depopulation factor is ignored. The results show that considering other routability factors and a more accurate routability gain for non-critical nets can improve the routing area by [...]. The delay is improved by [...]. The reason is that the weight for an input-output connection is high in both the timing and routability components for critical nets. We can observe that delay has been improved in most cases.

In another set of experiments, we added the depopulation
factor to control routability versus timing. The results are shown in the table. The routing area is improved by [...] while the critical path delay is the same on average. This implies that depopulation helps to obtain a more distributed connectivity between the clusters.

The experimental results show that different routability factors have a significant impact on routing results. Timing-driven packing has a strong correlation with some of the routability factors for FPGAs. Integrating...

Table: Array size, number of exposed nets (VPack, RPack), and number of tracks (VPack, RPack) for each circuit. Circuits: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng, followed by an Average row...
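The way timing-driven RPack combines criticality with a depopulation-scaled routability gain can be illustrated with a minimal sketch. The operators of the original equations are not recoverable from this text, so everything structural here is an assumption: the weighted sum with a hypothetical trade-off parameter `alpha`, the division of the routability gain by the depopulation factor, and the depopulation factor taken as the sum of the used pins of the block and the cluster. The names `Candidate`, `total_gain`, and `pick_best` are illustrative only and do not come from the RPack or t-VPack code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Candidate:
    """A block B considered for the cluster C under construction."""
    criticality: float       # Criticality(B), assumed to lie in [0, 1]
    routability_gain: float  # RoutabilityGain(B), assumed precomputed
    used_pins: int           # UsedPin(B)


def depopulation_factor(block_used_pins: int, cluster_used_pins: int) -> int:
    # ASSUMPTION: the two pin counts are summed, so the factor grows as the
    # cluster fills up. The exact form in the paper is not recoverable here.
    return max(1, block_used_pins + cluster_used_pins)


def total_gain(cand: Candidate, cluster_used_pins: int, alpha: float = 0.5) -> float:
    # ASSUMPTION: weighted sum of criticality and routability gain, with the
    # routability term divided by the depopulation factor.
    factor = depopulation_factor(cand.used_pins, cluster_used_pins)
    scaled_routability = cand.routability_gain / factor
    return alpha * cand.criticality + (1.0 - alpha) * scaled_routability


def pick_best(cands: Iterable[Candidate], cluster_used_pins: int,
              alpha: float = 0.5) -> Candidate:
    """Greedy step: choose the candidate with the highest total gain."""
    return max(cands, key=lambda c: total_gain(c, cluster_used_pins, alpha))


if __name__ == "__main__":
    shares_many_nets = Candidate(criticality=0.2, routability_gain=6.0, used_pins=3)
    on_critical_path = Candidate(criticality=0.5, routability_gain=2.0, used_pins=3)
    # Nearly empty cluster: the routability term dominates the choice.
    print(pick_best([shares_many_nets, on_critical_path], cluster_used_pins=1))
    # Nearly full cluster: the routability term is damped, criticality wins.
    print(pick_best([shares_many_nets, on_critical_path], cluster_used_pins=21))
```

Under these assumptions the same two candidates are ranked differently early and late in cluster construction, which is the behavior the scaling is meant to produce.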

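The statement that routing area is related to the square of the number of tracks per channel can be turned into a small back-of-the-envelope calculation. The sketch below assumes a purely quadratic relation with no constant or linear terms, which is a simplification; the function names are hypothetical. Applied to the 16.5% average track reduction quoted in the abstract, it gives roughly a 30% area saving. This is close to, but not the same as, the 27% average reported in the abstract, which is averaged per benchmark and comes from the actual area model.

```python
def routing_area_ratio(tracks_before: float, tracks_after: float) -> float:
    """Ratio of routing areas, ASSUMING area is proportional to tracks squared."""
    if tracks_before <= 0 or tracks_after < 0:
        raise ValueError("track counts must be positive")
    return (tracks_after / tracks_before) ** 2


def routing_area_improvement(track_reduction: float) -> float:
    """Fractional area saving for a fractional reduction in track count."""
    if not 0.0 <= track_reduction <= 1.0:
        raise ValueError("track_reduction must be a fraction in [0, 1]")
    return 1.0 - (1.0 - track_reduction) ** 2


if __name__ == "__main__":
    # 16.5% fewer tracks on average (figure taken from the abstract).
    print(f"{routing_area_improvement(0.165):.1%}")
    # Halving the channel width would quarter the area under this model.
    print(routing_area_ratio(20, 10))
```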
    [...]

  • ...The majority of research devoted to technology mapping has been done with the objective of improving either timing [7, 12, 18, 22, 13] or area-efficiency [19, 24, 20] or trade-off between depth and area [19]....

    [...]

  • ...The blif input format of each benchmark is obtained by SIS [27] logic minimization and FlowMap [18] technology mapper....

    [...]

Frequently Asked Questions (1)
Q1. What have the authors contributed in "Routability-driven packing: metrics and algorithms for cluster-based FPGAs"?

In this paper, the authors discuss the metrics that affect routability in packing logic into clusters. The authors present a routability-driven clustering method for cluster-based FPGAs.