Journal Article•DOI•

ROUTABILITY-DRIVEN PACKING: METRICS AND ALGORITHMS FOR CLUSTER-BASED FPGAs

Eli Bozorgzadeh¹, S. Ogrenci Memik², X. Yang, Majid Sarrafzadeh³•Institutions (3)

University of California, Irvine¹, Northwestern University², University of California, Los Angeles³

01 Feb 2004-Journal of Circuits, Systems, and Computers (World Scientific Publishing Company)-Vol. 13, Iss: 1, pp 77-100

TL;DR: A routability-driven clustering method for cluster-based FPGAs that packs LUTs into logic clusters while incorporating routability metrics into a cost function and integrates the routability model into a timing-driven packing algorithm.


Abstract: Most of the FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would yield both better quality and a faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We present a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, routability in a timing-driven packing algorithm is analyzed. We integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in terms of the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.


Summary

Introduction

The organization of the paper is as follows: previous work on routability-driven technology mapping and algorithms for cluster packing is discussed in Section 2. Section 3 describes the FPGA architecture the authors are targeting, utilization and routability issues, and the problem formulation for the packing problem.
The authors introduce new metrics that are used to form a new objective function to evaluate routability.
By clustering the logic blocks, the number of connections between clusters is reduced.
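
The effect of clustering on inter-cluster connections can be illustrated with a small worked example. The sketch below is only an illustration: the netlist, the block names, and the helper `exposed_nets` are hypothetical and not taken from the paper. It counts the nets that still have to be routed between clusters ("exposed nets" in the paper's terminology) before and after blocks are grouped.

```python
from typing import Dict, List, Set


def exposed_nets(nets: Dict[str, Set[str]], cluster_of: Dict[str, int]) -> List[str]:
    """Return the nets whose terminals fall in more than one cluster."""
    exposed = []
    for name, blocks in nets.items():
        clusters = {cluster_of[b] for b in blocks}
        if len(clusters) > 1:
            exposed.append(name)
    return sorted(exposed)


# Hypothetical netlist: net name -> blocks it touches.
nets = {"n1": {"A", "B"}, "n2": {"B", "C"}, "n3": {"C", "D"}, "n4": {"A", "D"}}

# One block per cluster: every net must be routed between clusters.
unclustered = {"A": 0, "B": 1, "C": 2, "D": 3}
# Two blocks per cluster: n1 and n3 become internal to a cluster.
clustered = {"A": 0, "B": 0, "C": 1, "D": 1}

print(exposed_nets(nets, unclustered))  # ['n1', 'n2', 'n3', 'n4']
print(exposed_nets(nets, clustered))    # ['n2', 'n4']
```

In this toy case, grouping the blocks in pairs halves the number of nets that need inter-cluster routing.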

RoutabilityGain(B) = |Nets(B) ∩ Nets(C)|

If these nets are observed closely, their contributions to the routability gain of the block are slightly different. Block B has three common nets with cluster C: N, N, and N.
In fact, the gain function originates from the third routability factor discussed in the paper, i.e., reducing pins per net.
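
As a minimal sketch, assuming the section heading reads as the number of nets shared by candidate block B and cluster C, the baseline gain reduces to the size of a set intersection. The net names and the helper `shared_net_gain` below are hypothetical and only serve the illustration.

```python
def shared_net_gain(block_nets: set, cluster_nets: set) -> int:
    """Baseline gain: number of nets the block has in common with the cluster."""
    return len(block_nets & cluster_nets)


# Hypothetical data: nets attached to the current cluster and to two candidates.
cluster_nets = {"n1", "n2", "n3", "n7"}
candidates = {
    "B": {"n1", "n2", "n3", "n9"},  # three common nets with the cluster
    "D": {"n7", "n8"},              # one common net with the cluster
}

gains = {name: shared_net_gain(nets, cluster_nets) for name, nets in candidates.items()}
best = max(gains, key=gains.get)
print(gains, best)  # {'B': 3, 'D': 1} B
```

A count like this treats every shared net the same, which is the limitation the paper's refined gain function is meant to address.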

Table: Routability Gain of a Candidate Block According to a Single Net

If the output terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net located inside the cluster.
When the net connected to a block is a multi-terminal net, the gain associated with the multi-terminal net is computed for each block containing a terminal of the net.
This implies a prediction of high routability gain in later stages while constructing the cluster. After a highly critical connection is added to a cluster, more input-output connections would be added to the cluster. The authors' analysis in this section shows that timing and routability correlate very strongly. Satisfying timing improves routability in some aspects.
The routing results, critical path delay, and number of exposed nets using both packing methods (t-RPack and t-VPack) are reported in a table; t-RPack uses the authors' routability gain function described earlier in the paper.
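
The observation about output terminals inside a cluster can be made concrete by counting cluster input pins. The sketch below rests on a simplifying assumption that is not spelled out in this summary: a net occupies one cluster input pin exactly when it has a sink inside the cluster and its driver outside. The netlist, block names, and the helper `used_input_pins` are hypothetical.

```python
from typing import Dict, Set, Tuple

Net = Tuple[str, Set[str]]  # (driver block, sink blocks)


def used_input_pins(nets: Dict[str, Net], cluster: Set[str]) -> int:
    """Count nets that must enter the cluster through an input pin."""
    count = 0
    for driver, sinks in nets.values():
        if sinks & cluster and driver not in cluster:
            count += 1
    return count


# Hypothetical netlist.
nets = {
    "n1": ("B", {"X", "Y"}),  # driven by candidate B, sinks already in the cluster
    "n2": ("Z", {"X"}),       # driven from outside the cluster
    "n3": ("P", {"Q"}),       # no terminal in the cluster yet
}
cluster = {"X", "Y"}

before = used_input_pins(nets, cluster)           # n1 and n2 enter from outside
with_b = used_input_pins(nets, cluster | {"B"})   # n1 becomes internal
with_q = used_input_pins(nets, cluster | {"Q"})   # n3 now needs an input pin
print(before, with_b, with_q)  # 2 1 3
```

Adding the driver of a net whose sinks are already inside frees an input pin, while adding a block whose only net has no terminal in the cluster consumes one more.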

Table: Logic Size, Number of Exposed Nets, Number of Routing Tracks, and Critical Path

In this paper, the authors addressed routability issues and their impact on performance and routing area.
A routability-driven packing method for cluster-based FPGAs is proposed.
The authors' method is able to improve routability by decreasing the number of required tracks in the FPGA routing channels.
This improvement was achieved by incorporating several routability factors in their packing algorithm. Based on their routability model, the authors analyzed timing-driven packing. The criticality of a connection in terms of timing reflects the role of this connection in routability as well.


UCLA

UCLA Previously Published Works

Title

Routability-driven packing: Metrics and algorithms for cluster-based FPGAs

Permalink

https://escholarship.org/uc/item/4v53n326

Journal

Journal of Circuits Systems and Computers, 13(1)

ISSN

0218-1266

Authors

Bozorgzadeh, E

Memik, S O

Yang, X

et al.

Publication Date

2004-02-01

Peer reviewed


Routability-driven Packing: Metrics and Algorithms for Cluster-based FPGAs

E. Bozorgzadeh

S. Ogrenci Memik

X. Yang

M. Sarrafzadeh

Computer Science Department

University of California, Los Angeles (UCLA)

3531C Boelter Hall

Los Angeles, CA 90095, USA

e-mail: {elib,seda,majid}@cs.ucla.edu

Synplicity Inc.

600 W California Ave.

Sunnyvale, CA 94086

email: xjyang@synplicity.com

ABSTRACT

Most of an FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would both yield better quality and faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We are presenting a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, the routability in timing-driven packing algorithm is analyzed. We integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in terms of the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.

Keywords: VLSI CAD, Field Programmable Gate Arrays (FPGAs), Technology mapping, Clustering Techniques, Optimization, Algorithm.

Figure 1: Island-style FPGA, showing configurable logic blocks (CLBs), routing switch boxes (S), routing segments, and I/O pads.

1 INTRODUCTION

Today's technology allows FPGAs to be designed as multi-million system gate devices at the heart of electronic systems. Since the FPGA is an integral part of many digital systems, the significance of optimization problems in mapping circuits on FPGAs has increased. There are two important issues related to the FPGA mapping process: the quality of the resulting mapping and the run-time of the tools serving in the process. Although the former is more dominant for FPGAs, both aspects are important. Similar to ASIC design, minimizing the delay is an important objective, as is minimizing the silicon area. The area of an FPGA consists of routing area and logic area. Optimizing the utilization of both routing and logic resources is crucial to obtain a good quality result.

FPGAs consist of smaller configurable building blocks called logic blocks or Configurable Logic Blocks (CLBs), which are placed on the FPGA chip either on a two-dimensional array (see Figure 1) or in a set of rows. The CAD flow of mapping a circuit on FPGA consists of four major stages. In the first stage the circuit is basically logically optimized. In stage 2, the optimized circuit is divided into CLBs of the FPGA, which is called technology mapping. Placement and routing stages accomplish the assignment of subcircuits on CLBs and programming the routing switches of FPGA.

Due to the highly constrained and discrete interconnect structure of current FPGAs, routing is a challenging problem. Most of the time, current FPGA routers cannot use available routing resources efficiently. This leads to a large portion of the routing area being wasted. Also, depending on the complexity of the particular design, routing might require a fairly large amount of time, often several hours, to be completed. Hence, considering routability at earlier steps of the CAD flow would yield both a better quality of result and less design time in later stages.

FPGA vendors have dierent logic block congurations. There are two kinds of CLBs: LUT-based blo cks

and multiplexor-based blocks. LUT-based logic blo cks are more popular. There have b een several contribu-

tionsindevelopment and design of FPGAs towards reducing the gap in density and performance b etween

ASIC and FPGA implementation. Hierarchical features have been added into logic and routing architecture

of FPGAs. Many commercial FPGAs, such as Xilinx, Altera, and Actel FPGAs include logic blocks that

contain several LUTs 1]. A collection of basic logic elements that are group ed together to be placed in

one complex logic blo ck is called a

cluster

(See Figure 2(a)). FPGAs with logic blocks containing multiple

basic blo cks are called

cluster-based

FPGAs. Each CLB (congurable logic block) is a cluster of basic logic

elements in cluster-based FPGAs. The structure and granularity of the logic blo ckhave a signicant impact

on the area-eciency and p erformance of the FPGA. If the logic block is ne-grained, the circuit to be im-

plemented will be distributed over more number of logic blocks. This has a negative impact on routability,

since more blo cks need to b e interconnected. Since the interconnect inside the logic blocks is hardwired, lo cal

interconnect can b e made very fast and eciently. This improves routability and decreases the load on the

router signicantly by reducing the size of problem. Two main b enets of clustering a basic blo ckinto CLBs

are sp eed in compilation and circuit delay improvement. On the other hand, it is not feasible to increase

the complexityofthelogicblocks b eyond a certain limit. If the logic blo cks b ecome to o complex it b ecomes

dicult to utilize them fully, hence several logic blo cks will b e wasted. Due to constraints on the number

of input pins and the number of blo cks within each cluster, all the resources in a cluster cannot be used

in circuit implementation. The task of assigning basic logic blo cks to clusters is called

packing

. Due to no

accurate means to estimate the interconnect at logic synthesis level, it is not easy to deal with routabilityof

circuit at logic level. However, if special properties of the interconnect available at logic level, such as sharing

among the pins, can be exploited during packing logics into basic blocks, signicant gains can be obtained

in terms of routability. In the past routability at the packing stage has not b een considered as extensively

as it has been at the technology mapping stage. Packing can bring improvements on the routability,since

after technology mapping a more accurate estimation on the interconnect is available.
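
The packing task described above (assigning basic blocks to clusters under a limit on blocks per cluster and a limit on cluster input pins) can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the RPack algorithm: the data model, the function names, and the tie-breaking by block name are hypothetical, the gain is a plain shared-net count, and each block is assumed to fit in an empty cluster on its own.

```python
from typing import Dict, List, Set, Tuple

Block = Tuple[Set[str], str]  # (input nets, output net); hypothetical data model


def cluster_inputs(blocks: Dict[str, Block], members: Set[str]) -> Set[str]:
    """Nets that must enter the cluster through an input pin."""
    used = set().union(*(blocks[m][0] for m in members))
    driven_inside = {blocks[m][1] for m in members}
    return used - driven_inside


def block_nets(blocks: Dict[str, Block], name: str) -> Set[str]:
    ins, out = blocks[name]
    return ins | {out}


def greedy_pack(blocks: Dict[str, Block], max_blocks: int, max_inputs: int) -> List[Set[str]]:
    unclustered = set(blocks)
    clusters = []
    while unclustered:
        # Seed: the unclustered block with the most used inputs (ties by name).
        seed = max(sorted(unclustered), key=lambda b: len(blocks[b][0]))
        members = {seed}
        unclustered.remove(seed)
        while len(members) < max_blocks:
            nets_c = set().union(*(block_nets(blocks, m) for m in members))
            # Legal candidates keep the cluster within its input pin limit.
            legal = [b for b in sorted(unclustered)
                     if len(cluster_inputs(blocks, members | {b})) <= max_inputs]
            if not legal:
                break
            # Pick the legal candidate sharing the most nets with the cluster.
            best = max(legal, key=lambda b: len(block_nets(blocks, b) & nets_c))
            members.add(best)
            unclustered.remove(best)
        clusters.append(members)
    return clusters


blocks = {
    "A": ({"i1", "i2"}, "a"),
    "B": ({"a", "i3"}, "b"),
    "C": ({"b", "i4"}, "c"),
    "D": ({"i5", "i6"}, "d"),
}
print(greedy_pack(blocks, max_blocks=2, max_inputs=4))  # clusters {A, B} and {C, D}
print(greedy_pack(blocks, max_blocks=2, max_inputs=3))  # clusters {A, B}, {C}, {D}
```

With the tighter input limit, blocks C and D can no longer share a cluster, which shows how the pin constraint leaves cluster capacity unused.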

In this paper we propose a routability-driven packing algorithm. We show improvements in routing area upon the state-of-the-art logic packing algorithms called VPack and t-VPack: Logic Block Packing Algorithm [4, 6]. We introduce a new method to consider routability at the packing stage. Our method of selecting a block for clustering can easily be integrated with other clustering algorithms. We demonstrate the effect of our method on routability by synthesizing the benchmark circuits through the complete CAD flow. We technology mapped a given circuit, then applied our routability-driven packing method for clustering, and finally placed and routed the circuit. We present the results of the final routing and show that our method improves the routability significantly. Our new algorithm, RPack, indeed improves routability compared to VPack. As our results on the 20 largest MCNC benchmarks show in Section 5, we are able to improve the minimum required number of routing tracks by 16.5% on average. A preliminary version of this work appeared in [8]. We also integrated our routability function into a timing-driven packing algorithm. Based on our routability model, routability in the timing-driven packing algorithm is analyzed. Compared to t-VPack, the routing area is improved by 12% on average.

The organization of the paper is as follows: Previous work on routability-driven technology mapping and algorithms for cluster packing are discussed in Section 2. Section 3 describes the FPGA architecture we are targeting, utilization and routability issues and problem formulation for the packing problem. In Section 4 RPack, our routability-driven packing method is described. Experimental results are presented in Section 5. Section 6 includes conclusions and future work.

2 PREVIOUS WORK

Most commercial FPGAs use configurable blocks containing several LUTs. Packing LUTs into clusters is an important design step introduced for cluster-based FPGAs. It can be viewed as a sub-task within the technology mapping stage, in which logic gates are assigned to LUTs and registers. We will first mention contributions made in the technology mapping area. The majority of research devoted to technology mapping has been done with the objective of improving either timing [7, 12, 18, 22, 13] or area-efficiency [19, 24, 20] or trade-off between depth and area [19]. Compared to the amount of effort made in this area, there is little work done in the routability-driven technology mapping domain [15], [17]. The routability-driven technology mapper for LUT-based arrays, Rmap [17], employs a mapping strategy that considers routability.

The packing problem is a clustering problem. Clustering has been studied extensively for various applications, such as placement [25], technology mapping [4, 17], etc. Packing is a clustering problem with constraints on the number of input pins and the number of LUTs in each CLB. The objective is to minimize the number of required CLBs to cover all the LUTs while satisfying the constraints. Betz and Rose proposed VPack and t-VPack, logic block packing algorithms [4] for cluster-based FPGAs. VPack and t-VPack are among the best known packing tools for FPGAs. VPack first packs a flip-flop and a LUT together into a basic logic element (BLE) using a matching-based method. Then these BLEs are packed in a greedy manner into logic clusters, with the local optimization objectives being to fill each cluster to its capacity and minimize the number of used inputs to each cluster. This approach is inspired by [21]. In [6], a timing-driven packing tool for FPGAs, t-VPack, is proposed. The blocks on the critical path are preferred to be packed together in a CLB so that the delay can be improved by exploiting local wiring in the CLB to route the critical nets. t-VPack delivers better routability compared to VPack. Later, we will describe the routability potential in timing-driven packing algorithms. Also, in [23], a packing approach is proposed based on maximum weight matching on the circuit graph. Recently, researchers in [9] have proposed a new technique for packing logic into clusters. Based on Rent's rule for each application, the connectivity of each cluster is defined. In this approach, routability is weighted according to the connectivity of the application. It is a good idea to consider routability based on connectivity of the circuit. On the other hand, the weight of routability in the overall

References

Garey and Johnson: Computers and Intractability - A Guide to the Theory of NP-Completeness


Michael Randolph Garey

01 Jan 1979

42,654 citations

Book•

Approximation Algorithms for NP-Hard Problems


Dorit S. Hochbaum¹•Institutions (1)

University of California, Berkeley¹

01 Jan 1996

TL;DR: This book reviews the design techniques for approximation algorithms and the developments in this area since its inception about three decades ago and the "closeness" to optimum that is achievable in polynomial time.


Abstract: Approximation algorithms have developed in response to the impossibility of solving a great variety of important optimization problems. Too frequently, when attempting to get a solution for a problem, one is confronted with the fact that the problem is NP-hard. This, in the words of Garey and Johnson, means "I can't find an efficient algorithm, but neither can all of these famous people." While this is a significant theoretical step, it hardly qualifies as a cheering piece of news. If the optimal solution is unattainable then it is reasonable to sacrifice optimality and settle for a "good" feasible solution that can be computed efficiently. Of course, we would like to sacrifice as little optimality as possible, while gaining as much as possible in efficiency. Trading-off optimality in favor of tractability is the paradigm of approximation algorithms. The main themes of this book revolve around the design of such algorithms and the "closeness" to optimum that is achievable in polynomial time. To evaluate the limits of approximability, it is important to derive lower bounds or inapproximability results. In some cases, approximation algorithms must satisfy additional structural requirements such as being on-line, or working within limited space. This book reviews the design techniques for such algorithms and the developments in this area since its inception about three decades ago.


2,488 citations

"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background in this paper

...Many different heuristics have been proposed in clustering area [3]....

Book•

Architecture and CAD for Deep-Submicron FPGAs


Vaughn Betz, Jonathan Rose, Alexander Marquardt

31 Mar 1999

TL;DR: From the Publisher: Architecture and CAD for Deep-Submicron FPGAs addresses several key issues in the design of high-performance FPGA architectures and CAD tools, with particular emphasis on issues that are important for FPGAs implemented in deep-submicron processes.


Abstract: From the Publisher: Architecture and CAD for Deep-Submicron FPGAs addresses several key issues in the design of high-performance FPGA architectures and CAD tools, with particular emphasis on issues that are important for FPGAs implemented in deep-submicron processes. Three factors combine to determine the performance of an FPGA: the quality of the CAD tools used to map circuits into the FPGA, the quality of the FPGA architecture, and the electrical (i.e. transistor-level) design of the FPGA. Architecture and CAD for Deep-Submicron FPGAs examines all three of these issues in concert.


1,335 citations

"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background or methods in this paper

...In RPack, similar to VPack [4, 15], in the first stage, a LUT and a register are packed into a basic logic block when possible....
...cluster C If the output terminal of a net is inside the cluster internal connections can be used to connect the input pins of the net located inside the cluster In such a case there is no need to use an input pin of the cluster to connect the net N to other terminals of the net outside the cluster since an output pin of the cluster can be used for external interconnection Therefore the contribution of net N to block gain is more than just covering an edge of a multi terminal net Actually by adding block B to cluster C an input pin of the cluster gets free and can be used for another net connection In Table this is de ned as in pin gain This increases the probability of acceptance of adding the block to the cluster In other words the probability of violating the input constraints of the cluster decreases Note that each block output pin is accessible from outside and there is no sharing among the output pins Therefore there would be no output pin constraints for the clusters hence saving on output pins does not bring any gain except in one case Suppose all the input pins of a net are already inside the cluster and the logic block being added to a cluster contains the output pin of the net Net N in Figure is an example of such a case The output pin of the cluster corresponding to the block driving the net N cannot be used by other blocks This means that there would be no connection from outside to this pin since all the terminals of the corresponding net are located inside the cluster Therefore the number of external connections of the cluster de ned as output congestion gain in Table decreases This yields less congestion among the clusters In other words it reduces the number of used pins of a cluster which is the fourth routability factor Net N has no pin in the cluster The gain from moving logic block B to the cluster would be zero according to the gain function above However not only no edge from N would be covered but also one input pin of the cluster would be 
used for N So the gain of moving logic block B to the cluster due to N in terms of used pins per cluster is This means N has a degrading e ect on the routability according to the fourth routability factor As explained above by considering just the number of shared inputs and outputs as in Equation the packing algorithm cannot di erentiate among the candidate blocks which have di erent impacts on routability All possible cases yielding di erent total gains are presented in Table for one net connected to a candidate block By incorporating the other routability factors the gain for each logic block B going into cluster C can be computed as the weighted combination of di erent routability factors as follows Gain B C f Nets B Nets C X i Nets B g i Nets C B where g i C B a fin P i B P i C b fo P i B P i C i Nets C c T i B otherwise fin P i B P i C is de ned as the gain obtained in input pins of cluster C as de ned in Table Similarly fo P i B P i C is the gain obtained in output congestion The additional gain of value to the sum of these two gain terms corresponds to the edge gain T i B returns the type of the pin of Net i connected to basic block B It returns if the pin is an output pin and otherwise P i B is the set of all pins of Net i that are on block B P i C is the set of pins of Net i connected to cluster C Nets C is the set of nets connected to cluster C a b and c are the weights for di erent components of the function Inserting a whole multi terminal net in one cluster is practically impossible In most of the cases the best we can do is to eliminate two terminal nets Therefore reducing an edge from a multi terminal net should not be considered equivalent to reducing an edge by inserting a two terminal net inside a cluster The average gain that a block can take from an n terminal net i connected to one of its pins depending on type of the net can be estimated from Table as follows Gainavg i n n n n n n According to Equation the average gain obtained from a two 
terminal net is the highest This implies that the algorithm gives priority to pushing a two terminal net entirely inside a cluster as compared to reducing a pin of a multi terminal net This leads to a decrease in number of exposed nets satisfying the last routability factor When the net connected to a block is a multi terminal net the gain associated with the multi terminal net is computed for each block containing a terminal of the net Therefore each edge of a net can have di erent impact on their corresponding blocks when a cluster is being constructed How many and what type of a terminal of the net do already exist in the cluster What type of terminal of the net does the candidate block have Answers to these questions for each net connected to the candidate block determine the gain of the block Therefore we conclude that in the bottom up clustering gain weight of each edge should be assigned dynamically according to individual situations In the next sub sections we explain our method of packing the basic blocks inside the clusters based on the routability gain function mentioned We also analyze timing driven clustering algorithm in terms of routability based on routability gain function Equations and According to this analysis we integrate routability factors into timing driven clustering RPack Algorithm The input to our packing algorithm is a list of LUTs registers and connections among the resources In RPack similar to VPack in the rst stage a LUT and a register are packed into a basic logic block when possible After that the blocks are packed into clusters using a greedy heuristic Clusters are constructed sequentially First the seed is chosen from the unclustered basic blocks The criteria is to choose the block with the most used inputs as mentioned in After choosing a seed for a cluster the logic block that gives the highest gain is selected to be added to the current cluster provided that it is a legal choice This means that the number of external inputs do 
not exceed the number of input pins of the cluster The algorithm continues adding blocks into one cluster until the cluster is full or no more legal choices can be found Similarly new clusters are constructed until all the blocks are packed into clusters We propose RPack a routability driven packing algorithm based on routability factors described in previous sub section RPack is developed on top of VPack The di erence between the two approaches is in the de nition of gain function VPack uses the function de ned in Equation while RPack uses the gain function in Equation The pseudo code of our approach is shown in Figure The complexity of RPack Algorithm is O I M where M is the number of clusters Finding the seed for each cluster takes O M time using a priority queue to store the candidate nodes where M is the number of nodes basic blocks When a node v is inserted to a cluster only the gain of the neighbors of candidate nodes Candidate nodes are those who have not been assigned to any node so far need to be updated The number of neighbors is equal to the edge degree of the current node i e deg v When a neighbor is visited the type and status of the edges connected to the neighbor are checked which takes O deg v Note that when each neighbor node is visited the edges that belong to the same hyper edge multi terminal net is counted once However when a block is being added to a cluster the number of neighbors are all the nodes connected to the node by any edge i e deg v By amortized analysis it is observed that the gain of a node is updated at most once associated with any connection between the node and the neighbors Therefore the Input Netlist of LUTs and Registers N Cluster Size K LUT Size I Inputs per Cluster Output List of Logic Clusters Pack LUTs and Registers together into Basic Blocks while Unclustered Basic Blocks available Find Seed for new Cluster while Cluster is not full Update gains of unclustered Basic Blocks Candidate blocks Choose Basic Block with 
highest gain Pick a candidate block If Candidate is NOT feasible then Go to Step Else Remove block from unclustered blocks list Add block to current Cluster end while end while Figure RPack Pseudo code for Packing Algorithm total clustering process takes O P vi V G deg vi O jEj where E is the edge set of connectivity graph G Also E P i Net ni P i Net ni P where ni is the number of the terminals of the net i and P is the total number of pins for all clusters which is I M Based on this analysis the complexity of the algorithm can be expressed as O I M t RPack Timing Driven RPack By clustering the LUTs in coarser CLBs the complexity of interconnection between the CLBs is reduced Hence fewer number of routing resources is required Another bene t of clusters is the fast interconnection inside the clusters Those connections being packed inside the clusters use the hard wired interconnect resources of CLBs This leads to better performance In packing both objectives should be pursued In this paper our focus is mostly on routability In this section we discuss how routability is realized when timing is added into packing algorithm and based on our routability function we propose timing driven RPack After packing a subset of the netlist is routed inside the clusters without passing through switched routing resources By inserting the interconnection along the critical path of the circuit inside clusters delay can be improved As a result in timing driven clustering the priority is given to timing critical connections to be inserted inside the clusters In sequential bottom up clustering approach a seed for a cluster would be the most critical block The blocks are added to the cluster based on criticality In addition to timing routability has to be considered in clustering to avoid the routing congestion which is a bottleneck in current FPGAs However rst we should study the impact of timing based clustering on routability when choosing the seed and de ning the gain function based 
on criticality In Section the routability factors are described Based on that our routability gain function is de ned in Equation Using this model we can explain the routability issues in timing driven clustering After analyzing the approach we would be able to improve the routability more accurately This is where analysis and theory guide the heuristics The criticality of the blocks are de ned by their slack Connections along the critical paths have high criticality value Therefore clustering based on timing is similar to path based clustering Each path is a chain of output to input pin to pin connections between a set of blocks See Figure According to routability gain function Table the output to input connection has a high routability gain When a connection is marked to be critical it means that there is a long chain of input output connectivity from this point to the rest of the design This implies a prediction of high routability gain in later stages while constructing the cluster After a highly critical connection is added to a cluster more input output connections would be added to the cluster In other words criticality of a connection shows the depth of input output connections from the current connection to the rest of the circuit By inserting an edge of the net on the critical path timing driven packing exploits the routability obtained by inserting output and input pin of a net inside a cluster hence releasing an input pin of the cluster As explained above our routability model can express the routability impact of timing based clustering out in out out in in in in in out out out Figure Routability and Slack Computation The two terms of routability factors are inherently satis ed in criticality based analysis and slack compu tation for critical connections Other factors should be considered In addition routability for non critical nets should be taken into account during clustering Therefore we de ne the gain function as a linear combi nation of 
criticality and routability of a connection We use the same criticality function used in t VPack The routability component is routability gain function de ned in Equation shows the gain function used in timing driven RPack TotalGain B Criticality B RoutabilityGain B DepopulationFactor Another important issue is scaling the routability component in Equation When a cluster is just being constructed there are many available un used pins of the cluster In this stage the cluster desires to absorb as many connection as possible In later stages when most of the pins are used routability is more restricted and the used pins around the clusters create congestion around the block In this case depopulation can help improve the routability In addition when more blocks are added to the cluster the probability of getting higher gain in the later stages is increased due to the higher probability of existence of shared nets among the blocks This does not imply the higher routability due to higher connection tra c around the cluster Therefore scaling is required In order to achieve this the routability function value is scaled each time a block is added to the cluster The depopulation factor increases during the construction of a cluster The depopulation factor is de ned as follows DepopulationFactor UsedP in B UsedP in C UsedP in B and UsedP in C return the number of used pins of block B and cluster C respectively We need to mention that in t VPack the total gain function is a function of routability and criticality as well However the routability factor is same as the gain function used in VPack which is not a comprehensive routability function Also the routability is scaled by the number of pins of a LUT i e for input LUT This normalization remains constant during the clustering According to our discussion above this cannot re ect routability gain correctly With analysis and more accurate modeling of routability we are able to study the behavior of di erent methods of clustering 
in terms of routability and improve the approaches by adding components that consider other routability factors. Our analysis in this section shows that timing and routability correlate very strongly: satisfying timing improves routability in some aspects. That is why timing-driven clustering outperforms a routability-driven packing. Our experimental results in the next section support this claim as well.

EXPERIMENTAL RESULTS

In previous sections we claimed that considering routability factors while packing logic into CLBs has a significant impact on routing results and netlist complexity. In this section we present a set of experimental results supporting that claim. We have used the greedy clustering approach proposed in VPack and t-VPack: RPack is implemented on top of the clustering algorithm in VPack, and t-RPack is implemented on top of t-VPack.

The first set of experiments compares RPack and VPack. We ran the largest MCNC benchmarks on VPack and RPack. The blif input of each benchmark is obtained by SIS logic minimization and the FlowMap technology mapper. The tabulated results show that our method successfully decreased the number of exposed nets. RPack and VPack use a similar number of clusters, so the array size resulting from both approaches is the same for almost all benchmarks. In one case (benchmark alu4), RPack even yielded a smaller array size. The array size of each benchmark is also tabulated. In accordance with the average gain estimated earlier, the results show that the major portion of the decrease in the number of exposed nets is due to a decrease in the number of two-terminal nets. In conclusion, reducing the number of output pins is strongly related to reducing the number of exposed nets.

We also observed the congestion around each cluster by counting the number of exposed nets to which each cluster is connected. The first figure shows the connectivity of the clusters produced by VPack and RPack for benchmark bigkey, for a fixed cluster size and number of input pins per cluster. The vertical axis shows the number of clusters for each number of pins used per cluster, shown on the horizontal axis. The plot shows that the clusters obtained from RPack have less traffic around them. The result for benchmark elliptic is shown in a second figure. As that plot shows, the connectivity obtained from RPack is more smoothly distributed than that produced by VPack. These two plots do not reflect the type of interconnection; the number of terminals of the nets also affects routability.

[Figure: Comparison of RPack and VPack in cluster characteristics for benchmark bigkey. Horizontal axis: number of used pins per cluster; vertical axis: number of clusters; series: VPack, RPack.]

[Figure: Comparison of VPack and RPack in cluster characteristics for benchmark elliptic. Horizontal axis: number of used pins per cluster; vertical axis: number of clusters; series: VPack, RPack.]

To verify that our method meets the objective of improving routability, we synthesized the benchmark circuits through the complete CAD flow to obtain the routing area. We used VPR to place and route the benchmarks. The routing architecture employs only single-segment wires, leading to better routability. The subset switch type is used, in which each track in a channel can be connected to the same track number of the neighboring channels. In addition, we have set the fraction of the tracks of each channel to which each logic block input and output pin connects. The results after placement and routing of the benchmarks are summarized in a table. As shown there, RPack is able to improve the routing area by decreasing the number of tracks significantly. The average improvement in the number of tracks is 16.5%. The number of routing tracks required in each channel is a reliable metric for routing area, since a smaller number of routing tracks not only saves wiring area but also drastically decreases the size of the routing switches. Routing area is related
to the square of the number of tracks per channel. The improvement in routing area is 27% over VPack on average. Such an improvement decreases the total chip area significantly, since routing area is typically a large percentage of the total area. The significant difference between the two routability-driven methods, VPack and RPack, implies that each routability factor can affect the routing results significantly. Given the constraint on the number of pins per CLB, fixed routing resources, and a fixed number of LUTs in each CLB, we considered different routability components in the gain function used in RPack. The results support our claim that routability is an important objective in clustering and that it results in a better distribution of interconnection among CLBs.

In the next set of experiments, timing-driven RPack is compared with t-VPack in order to observe the impact of routability in timing-driven packing. Comparing RPack directly with t-VPack would not be a fair comparison: as mentioned in the previous section, timing-based clustering inherently has an effective impact on routability. Routability and speed, both benefits of cluster-based FPGAs, are realized by a timing-based packing algorithm. Previous work shows that t-VPack performs better than VPack in terms of routability. The routing results, critical path delay, and number of exposed nets for both packing methods, t-RPack and t-VPack, are reported in a table. t-RPack uses our routability gain function as described earlier. To observe the effect of the routability gain function alone, the depopulation factor is ignored. The results show that considering other routability factors and a more accurate routability gain for non-critical nets improves the routing area, and the delay is improved as well. The reason is that the weight of an input-output connection is high in both the timing and the routability component for critical nets. Delay is improved in most cases. In another set of experiments, we added the depopulation
factor to control routability versus timing. The results are shown in a table. The routing area is improved while the critical path delay stays the same on average. This implies that depopulation helps to obtain a more distributed connectivity between the clusters. The experimental results show that different routability factors have a significant impact on routing results. Timing-driven packing has a strong correlation with some of the routability factors for FPGAs. Integrating ...

[Table columns: circuit; array size; number of exposed nets (VPack, RPack); number of tracks (VPack, RPack). Rows: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng, Average.]...
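A minimal sketch of how the timing-driven gain discussed in this excerpt could be evaluated. The excerpt does not preserve the exact equation, so the weight `alpha`, the division by the depopulation factor, and the definition of that factor as UsedPin(B) + UsedPin(C) are all assumptions made for illustration; only the three ingredients (criticality, routability gain, depopulation factor) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    criticality: float       # timing criticality of block B, assumed to lie in [0, 1]
    routability_gain: float  # routability gain of adding B to the current cluster
    used_pins: int           # UsedPin(B)


def depopulation_factor(used_pins_block: int, used_pins_cluster: int) -> int:
    # ASSUMPTION: the factor is the sum of the used pins of block and cluster,
    # so it grows as the cluster fills up (the text only says it increases).
    return max(1, used_pins_block + used_pins_cluster)


def total_gain(cand: Candidate, used_pins_cluster: int, alpha: float = 0.5) -> float:
    # ASSUMPTION: linear combination with weight alpha; the routability term
    # is divided by the depopulation factor to damp it in later stages.
    scaled = cand.routability_gain / depopulation_factor(cand.used_pins, used_pins_cluster)
    return alpha * cand.criticality + (1.0 - alpha) * scaled


block = Candidate(criticality=1.0, routability_gain=4.0, used_pins=2)
early = total_gain(block, used_pins_cluster=2)  # cluster with few used pins
late = total_gain(block, used_pins_cluster=6)   # cluster with more used pins
print(early, late)
```

With these assumed definitions the same candidate scores lower once the cluster has more used pins, which is the depopulation behavior the text describes.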
[...]
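As a rough consistency check on the statement that routing area is related to the square of the number of tracks per channel, the average track reduction quoted in the abstract (16.5%) can be pushed through a purely quadratic model. This is only a back-of-the-envelope sketch: it assumes area scales exactly with the square of the track count and that averages can be combined this way, neither of which the paper claims. The symbols $A_{RPack}$ and $A_{VPack}$ are introduced here for illustration.

$$\frac{A_{RPack}}{A_{VPack}} \approx (1 - 0.165)^2 = 0.835^2 \approx 0.697$$

That corresponds to a reduction of roughly 30%, the same order as the 27% average routing area improvement over VPack reported in the abstract.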
...Note that RPack has the same complexity as VPack [15, 5]....
[...]
...[15] includes a good survey of packing methods for cluster-based FPGAs....
[...]
...[Table, routability-driven router; parameters N, K, I. Columns: circuit; array size; number of exposed nets (t-VPack, t-RPack); number of tracks (t-VPack, t-RPack); delay in s (t-VPack, t-RPack). Rows: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng, Average.]...
[...]

Book Chapter•DOI•

VPR: A new packing, placement and routing tool for FPGA research

[...]

Vaughn Betz¹, Jonathan Rose¹•Institutions (1)

University of Toronto¹

01 Sep 1997

TL;DR: In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which the authors can compare and presents placement and routing results on a new set of circuits more typical of today's industrial designs.


Abstract: We describe the capabilities of and algorithms used in a new FPGA CAD tool, Versatile Place and Route (VPR). In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which we can compare. Although the algorithms used are based on previously known approaches, we present several enhancements that improve run-time and quality. We present placement and routing results on a new set of large circuits to allow future benchmark comparisons of FPGA place and route tools on circuit sizes more typical of today's industrial designs.


1,133 citations

"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background or methods in this paper

...In RPack, similar to VPack [4, 15], in the first stage, a LUT and a register are packed into a basic logic block when possible....
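The first stage mentioned in this excerpt can be sketched as follows. The pairing rule used here (a register joins a LUT only when it is the sole sink of that LUT's output net) is an assumption for illustration, as are the function name and the dictionary-based netlist representation; the excerpt itself only says the pairing happens "when possible".

```python
def pack_basic_blocks(luts, registers, fanout):
    """Pair LUTs with registers into basic logic blocks.

    luts:      dict mapping LUT name -> name of the net it drives
    registers: dict mapping register name -> name of its data-input net
    fanout:    dict mapping net name -> number of sinks on that net
    Returns a list of (lut_or_None, register_or_None) tuples.
    """
    # Index registers by the net feeding their data input.
    regs_by_input = {}
    for reg, d_net in registers.items():
        regs_by_input.setdefault(d_net, []).append(reg)

    blocks, paired = [], set()
    for lut, out_net in luts.items():
        sinks = regs_by_input.get(out_net, [])
        # ASSUMED rule: pair only if the register is the net's single sink.
        if len(sinks) == 1 and fanout.get(out_net, 0) == 1:
            blocks.append((lut, sinks[0]))
            paired.add(sinks[0])
        else:
            blocks.append((lut, None))

    # Registers that could not be paired occupy a basic block on their own.
    for reg in registers:
        if reg not in paired:
            blocks.append((None, reg))
    return blocks


example = pack_basic_blocks(
    luts={"l1": "n1", "l2": "n2"},
    registers={"r1": "n1", "r2": "n2"},
    fanout={"n1": 1, "n2": 3},
)
print(example)
```

In the example, `l1` and `r1` share a block because net `n1` has a single sink, while `l2` and `r2` stay separate because `n2` also feeds other logic.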
[...]
...cluster C. If the output terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net located inside the cluster. In such a case there is no need to use an input pin of the cluster to connect the net N to its other terminals outside the cluster, since an output pin of the cluster can be used for the external interconnection. Therefore, the contribution of net N to the block gain is more than just covering an edge of a multi-terminal net: by adding block B to cluster C, an input pin of the cluster is freed and can be used for another net connection. In the gain table this is defined as in-pin gain. It increases the probability that adding the block to the cluster is accepted; in other words, the probability of violating the input constraints of the cluster decreases.

Note that each block output pin is accessible from outside and there is no sharing among the output pins. Therefore there are no output pin constraints for the clusters, and saving output pins does not bring any gain except in one case. Suppose all the input pins of a net are already inside the cluster and the logic block being added to the cluster contains the output pin of the net (the figure shows a net N of this kind). The output pin of the cluster corresponding to the block driving the net N cannot be used by other blocks. This means that there is no connection from outside to this pin, since all the terminals of the corresponding net are located inside the cluster. Therefore the number of external connections of the cluster, defined as output congestion gain in the table, decreases. This yields less congestion among the clusters. In other words, it reduces the number of used pins of a cluster, which is the fourth routability factor.

Now consider a net N that has no pin in the cluster. The gain from moving logic block B to the cluster would be zero according to the gain function above. However, not only would no edge of N be covered, but one input pin of the cluster would also be
used for N. So the gain of moving logic block B to the cluster due to N, in terms of used pins per cluster, is negative. This means N has a degrading effect on routability according to the fourth routability factor. As explained above, by considering just the number of shared inputs and outputs, as in the earlier gain equation, the packing algorithm cannot differentiate among candidate blocks that have different impacts on routability. All possible cases yielding different total gains are presented in the table for one net connected to a candidate block. By incorporating the other routability factors, the gain for each logic block B going into cluster C can be computed as the weighted combination of different routability factors as follows:

$$\mathrm{Gain}(B, C) = f(\mathrm{Nets}(B), \mathrm{Nets}(C)) = \sum_{i \in \mathrm{Nets}(B)} g(i, C, B)$$

where

$$g(i, C, B) = \begin{cases} a \cdot f_{in}(P_i(B), P_i(C)) + b \cdot f_{o}(P_i(B), P_i(C)) + (\text{edge gain}), & i \in \mathrm{Nets}(C) \\ c \cdot T(i, B), & \text{otherwise} \end{cases}$$

$f_{in}(P_i(B), P_i(C))$ is the gain obtained in input pins of cluster C, as defined in the table. Similarly, $f_{o}(P_i(B), P_i(C))$ is the gain obtained in output congestion. The additional constant gain added to the sum of these two terms corresponds to the edge gain. $T(i, B)$ returns the type of the pin of net i connected to basic block B, taking one value for an output pin and another otherwise. $P_i(B)$ is the set of all pins of net i that are on block B, and $P_i(C)$ is the set of pins of net i connected to cluster C. Nets(C) is the set of nets connected to cluster C. a, b, and c are the weights of the different components of the function.

Inserting a whole multi-terminal net in one cluster is practically impossible. In most cases the best we can do is to eliminate two-terminal nets. Therefore, removing an edge from a multi-terminal net should not be considered equivalent to removing an edge by inserting a two-terminal net inside a cluster. The average gain that a block can take from an n-terminal net i connected to one of its pins, depending on the type of the net, can be estimated from the table as a function $\mathrm{Gain}_{avg}(i)$ of n. According to this estimate, the average gain obtained from a two-terminal net is the highest. This implies that the algorithm gives priority to pushing a two-terminal net entirely inside a cluster over removing one pin of a multi-terminal net. This leads to a decrease in the number of exposed nets, satisfying the last routability factor.

When the net connected to a block is a multi-terminal net, the gain associated with it is computed for each block containing a terminal of the net. Therefore, each edge of a net can have a different impact on its corresponding blocks while a cluster is being constructed. How many terminals of the net already exist in the cluster, and of what type? What type of terminal of the net does the candidate block have? The answers to these questions, for each net connected to the candidate block, determine the gain of the block. We therefore conclude that in bottom-up clustering the gain weight of each edge should be assigned dynamically according to the individual situation.

In the next subsections we explain our method of packing the basic blocks inside the clusters based on the routability gain function above. We also analyze the timing-driven clustering algorithm in terms of routability, based on the routability gain equations. According to this analysis, we integrate routability factors into timing-driven clustering.

RPack Algorithm

The input to our packing algorithm is a list of LUTs, registers, and connections among the resources. In RPack, similar to VPack, in the first stage a LUT and a register are packed into a basic logic block when possible. After that, the blocks are packed into clusters using a greedy heuristic. Clusters are constructed sequentially. First, the seed is chosen from the unclustered basic blocks; the criterion is to choose the block with the most used inputs, as in previous work. After choosing a seed for a cluster, the logic block that gives the highest gain is selected to be added to the current cluster, provided that it is a legal choice. This means that the number of external inputs does
not exceed the number of input pins of the cluster. The algorithm continues adding blocks to a cluster until the cluster is full or no more legal choices can be found. Similarly, new clusters are constructed until all the blocks are packed into clusters.

We propose RPack, a routability-driven packing algorithm based on the routability factors described in the previous subsection. RPack is developed on top of VPack. The difference between the two approaches is the definition of the gain function: VPack uses the earlier gain equation, while RPack uses the routability gain equation. The pseudo-code of our approach is shown in the figure below.

The complexity of the RPack algorithm is $O(I \cdot M)$, where M is the number of clusters. Finding the seed for each cluster takes O(M) time using a priority queue to store the candidate nodes, where M here is the number of nodes (basic blocks). When a node v is inserted into a cluster, only the gains of its neighbors among the candidate nodes need to be updated (candidate nodes are those that have not been assigned to any cluster so far). The number of neighbors is equal to the edge degree of the current node, i.e., deg(v). When a neighbor is visited, the type and status of the edges connected to the neighbor are checked, which takes O(deg(v)). Note that when each neighbor node is visited, the edges that belong to the same hyper-edge (multi-terminal net) are counted once. However, when a block is being added to a cluster, the neighbors are all the nodes connected to the node by any edge, i.e., deg(v). By amortized analysis, the gain of a node is updated at most once for any connection between the node and its neighbors. Therefore, the total clustering process takes $O\left(\sum_{v_i \in V(G)} \deg(v_i)\right) = O(|E|)$, where E is the edge set of the connectivity graph G. Also, $|E|$ is bounded through $\sum_{i \in Net} n_i$ and P, where $n_i$ is the number of terminals of net i and P is the total number of pins of all clusters, which is $I \cdot M$. Based on this analysis, the complexity of the algorithm can be expressed as $O(I \cdot M)$.

Figure: RPack, pseudo-code for the packing algorithm

    Input: netlist of LUTs and registers; N: cluster size; K: LUT size; I: inputs per cluster
    Output: list of logic clusters

    Pack LUTs and registers together into basic blocks
    while unclustered basic blocks are available
        Find seed for new cluster
        while cluster is not full
            Update gains of unclustered basic blocks (candidate blocks)
            Choose basic block with highest gain (pick a candidate block)
            if candidate is NOT feasible then
                go back to the previous step and pick another candidate
            else
                Remove block from unclustered blocks list
                Add block to current cluster
        end while
    end while

t-RPack: Timing-Driven RPack

By clustering the LUTs into coarser CLBs, the complexity of the interconnection between the CLBs is reduced; hence fewer routing resources are required. Another benefit of clusters is the fast interconnection inside them: connections packed inside a cluster use the hard-wired interconnect resources of the CLB, which leads to better performance. In packing, both objectives should be pursued. In this paper our focus is mostly on routability. In this section we discuss how routability is realized when timing is added to the packing algorithm and, based on our routability function, we propose timing-driven RPack.

After packing, a subset of the netlist is routed inside the clusters without passing through switched routing resources. By placing the interconnections along the critical path of the circuit inside clusters, delay can be improved. As a result, timing-driven clustering gives priority to inserting timing-critical connections inside the clusters. In the sequential bottom-up clustering approach, the seed for a cluster is the most critical block, and blocks are added to the cluster based on criticality. In addition to timing, routability has to be considered in clustering to avoid routing congestion, which is a bottleneck in current FPGAs. First, however, we should study the impact of timing-based clustering on routability when choosing the seed and defining the gain function based
on criticality. The routability factors and the routability gain function built on them were described earlier. Using this model, we can explain the routability issues in timing-driven clustering. After analyzing the approach, we are able to improve routability more accurately. This is where analysis and theory guide the heuristics.

The criticality of a block is defined by its slack. Connections along the critical paths have a high criticality value. Therefore, clustering based on timing is similar to path-based clustering. Each path is a chain of output-to-input, pin-to-pin connections between a set of blocks (see the figure). According to the routability gain table, an output-to-input connection has a high routability gain. When a connection is marked critical, there is a long chain of input-output connectivity from this point to the rest of the design. This predicts a high routability gain in later stages of constructing the cluster: after a highly critical connection is added to a cluster, more input-output connections will be added to it. In other words, the criticality of a connection shows the depth of input-output connections from the current connection to the rest of the circuit. By inserting an edge of a net on the critical path, timing-driven packing exploits the routability obtained by placing the output and input pin of a net inside a cluster, hence releasing an input pin of the cluster. As explained above, our routability model can express the routability impact of timing-based clustering.

[Figure: Routability and slack computation.]

Two of the routability factors are inherently satisfied by criticality-based analysis and slack computation for critical connections. The other factors should also be considered. In addition, routability for non-critical nets should be taken into account during clustering. Therefore, we define the gain function as a linear combination of criticality and routability of a connection....
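The greedy packing loop and the per-net gain cases described in this excerpt can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the gain table values are not preserved in the text, so the code assumes an edge gain of 1, an in-pin gain of 1 when the candidate drives a net the cluster currently reads, an output congestion gain of 1 when a net becomes fully absorbed, and a penalty of `c` for each new input net. The names `Block`, `build_terminals`, `gain`, and `rpack` are hypothetical, and primary outputs are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    inputs: frozenset  # nets read by the block
    output: str        # net driven by the block


def nets_of(blocks):
    return {n for blk in blocks for n in (*blk.inputs, blk.output)}


def external_inputs(cluster):
    # Nets read inside the cluster whose driver is not inside the cluster.
    driven = {blk.output for blk in cluster}
    return {n for blk in cluster for n in blk.inputs} - driven


def build_terminals(blocks):
    terms = {}
    for blk in blocks:
        for n in (*blk.inputs, blk.output):
            terms.setdefault(n, set()).add(blk.name)
    drivers = {blk.output for blk in blocks}
    for n in terms:
        if n not in drivers:  # primary input: one terminal is always external
            terms[n].add("<primary-input>")
    return terms


def gain(block, cluster, terminals, a=1.0, b=1.0, c=1.0):
    cluster_nets = nets_of(cluster)
    inside = {blk.name for blk in cluster} | {block.name}
    total = 0.0
    for net in (*block.inputs, block.output):
        if net in cluster_nets:
            in_pin = 1 if net == block.output else 0        # frees a cluster input pin
            absorbed = 1 if terminals[net] <= inside else 0  # net no longer exposed
            total += 1 + a * in_pin + b * absorbed           # assumed edge gain of 1
        elif net in block.inputs:
            total -= c                                       # consumes a new input pin
    return total


def rpack(blocks, cluster_size, max_inputs):
    terminals = build_terminals(blocks)
    unclustered, clusters = list(blocks), []
    while unclustered:
        seed = max(unclustered, key=lambda blk: len(blk.inputs))  # most used inputs
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < cluster_size:
            ranked = sorted(unclustered, reverse=True,
                            key=lambda blk: gain(blk, cluster, terminals))
            pick = next((blk for blk in ranked
                         if len(external_inputs(cluster + [blk])) <= max_inputs), None)
            if pick is None:  # no legal candidate left for this cluster
                break
            unclustered.remove(pick)
            cluster.append(pick)
        clusters.append(cluster)
    return clusters


A = Block("A", frozenset({"p", "q"}), "n1")
B = Block("B", frozenset({"n1"}), "n2")
C = Block("C", frozenset({"r", "s"}), "n3")
D = Block("D", frozenset({"n2", "n3"}), "o")
result = [[blk.name for blk in cl] for cl in rpack([A, B, C, D], cluster_size=2, max_inputs=4)]
print(result)
```

In this toy netlist the two-terminal net `n1` pulls `B` into the cluster seeded by `A`, which mirrors the priority the text gives to absorbing two-terminal nets. Re-sorting all candidates on every step keeps the sketch short; it does not reproduce the incremental gain updates assumed in the complexity analysis.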
[...]
...Most of an FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would yield both better quality and a faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We present a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, the routability in timing-driven packing algorithms is analyzed. We integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in terms of the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.

Keywords: VLSI CAD, Field Programmable Gate Arrays (FPGAs), Technology mapping, Clustering Techniques, Optimization Algorithm.

INTRODUCTION

Today's technology allows FPGAs to be designed as multi-million system gate devices at the heart of electronic systems. Since the FPGA is an integral part of many digital systems, the significance of optimization problems in mapping circuits onto FPGAs has increased. There are two important issues related to the FPGA mapping process: the quality of the resulting mapping and the run time of the tools serving in the process. Although the former is more dominant for FPGAs, both aspects are important. Similar to ASIC design, minimizing the delay is an important objective, as is minimizing the silicon area. The area of an FPGA consists of routing area and logic area. Optimizing the utilization of both routing and logic resources is crucial to obtaining a good-quality result.

FPGAs consist of smaller configurable building blocks called logic blocks or Configurable Logic Blocks (CLBs), which are placed on the FPGA chip either in a two-dimensional array (see figure) or in a set of rows. The CAD flow of mapping a circuit onto an FPGA consists of four major stages. In the first stage, the circuit is
basically logically optimized. In the second stage, the optimized circuit is divided into CLBs of the FPGA, which is called technology mapping. The placement and routing stages accomplish the assignment of subcircuits to CLBs and the programming of the routing switches of the FPGA. Due to the highly constrained and discrete interconnect structure of current FPGAs, routing is a challenging problem. Most of the time, current FPGA routers cannot use the available routing resources efficiently, which wastes a large portion of the routing area. Also, depending on the complexity of the particular design, routing might require a fairly large amount of time, often several hours, to complete. Hence, considering routability at earlier steps of the CAD flow would yield both a better quality of result and less design time in later stages.

FPGA vendors have different logic block configurations. There are two kinds of CLBs: LUT-based blocks and multiplexor-based blocks. LUT-based logic blocks are more popular. There have been several contributions in the development and design of FPGAs towards reducing the gap in density and performance between ASIC and FPGA implementations. Hierarchical features have been added to the logic and routing architecture of FPGAs. Many commercial FPGAs, such as Xilinx, Altera, and Actel FPGAs, include logic blocks that contain several LUTs. A collection of basic logic elements that are grouped together to be placed in one complex logic block is called a cluster (see figure). FPGAs with logic blocks containing multiple basic blocks are called cluster-based FPGAs. Each CLB (configurable logic block) is a cluster of basic logic elements in cluster-based FPGAs.

The structure and granularity of the logic block have a significant impact on the area efficiency and performance of the FPGA. If the logic block is fine-grained, the circuit to be implemented will be distributed over a larger number of logic blocks. This has a negative impact on routability, since more blocks need to be interconnected. Since the interconnect inside
the logic blocks is hardwired, local interconnect can be made very fast and efficient. This improves routability and decreases the load on the router significantly by reducing the size of the problem. The two main benefits of clustering basic blocks into CLBs are speed in compilation and circuit delay improvement. On the other hand, it is not feasible to increase the complexity of the logic blocks beyond a certain limit. If the logic blocks become too complex, it becomes difficult to utilize them fully; hence several logic blocks will be wasted. Due to constraints on the number of input pins and the number of blocks within each cluster, not all the resources in a cluster can be used in circuit implementation.

The task of assigning basic logic blocks to clusters is called packing. Because there is no accurate means of estimating the interconnect at the logic synthesis level, it is not easy to deal with the routability of a circuit at the logic level. However, if special properties of the interconnect available at the logic level, such as sharing among the pins, can be exploited while packing logic into basic blocks, significant gains can be obtained in terms of routability. In the past, routability at the packing stage has not been considered as extensively as it has been at the technology mapping stage. Packing can bring improvements in routability, since after technology mapping a more accurate estimate of the interconnect is available.

In this paper, we propose a routability-driven packing algorithm. We show improvements in routing area over the state-of-the-art logic packing algorithms VPack and t-VPack. We introduce a new method to consider routability at the packing stage. Our method of selecting a block for clustering can easily be integrated with other clustering algorithms. We demonstrate the effect of our method on routability by synthesizing the benchmark circuits through the complete CAD flow. We technology mapped a given circuit, then applied our routability-
driven packing method for clustering, and finally placed and routed the circuit. We present the results of the final routing and show that our method improves routability significantly. Our new algorithm, RPack, indeed improves routability compared to VPack. As our results on the largest MCNC benchmarks show in the experimental results section, we are able to improve the minimum required number of routing tracks by 16.5% on average. A preliminary version of this work has appeared previously. We also integrated our routability function into a timing-driven packing algorithm. Based on our routability model, routability in the timing-driven packing algorithm is analyzed. Compared to t-VPack, the routing area is improved by 12% on average.

The organization of the paper is as follows. Previous work on routability-driven technology mapping and algorithms for cluster packing is discussed first. We then describe the FPGA architecture we are targeting, utilization and routability issues, and the problem formulation for the packing problem. Next, RPack, our routability-driven packing method, is described. Experimental results are then presented, followed by conclusions and future work.

Previous Work

Most commercial FPGAs use configurable blocks containing several LUTs. Packing LUTs into clusters is an important design step introduced for cluster-based FPGAs. It can be viewed as a sub-task within the technology mapping stage, in which logic gates are assigned to LUTs and registers. We will first mention contributions made in the technology mapping area. The majority of research devoted to technology mapping has been done with the objective of improving either timing [7, 12, 18, 22, 13] or area efficiency [19, 24, 20] or the trade-off between depth and area [19]. Compared to the amount of effort made in this area, there is little work done in the routability-driven technology mapping domain. The routability-driven technology mapper for LUT-based arrays, Rmap, employs a mapping strategy that considers routability.

The packing problem is a clustering problem. Clustering has been studied extensively
for various applications such as placement, technology mapping, etc. Packing is a clustering problem with constraints on the number of input pins and the number of LUTs in each CLB. The objective is to minimize the number of CLBs required to cover all the LUTs while satisfying the constraints. Betz and Rose proposed the VPack and t-VPack logic block packing algorithms for cluster-based FPGAs. VPack and t-VPack are among the best-known packing tools for FPGAs. VPack first packs a flip-flop and a LUT together into a basic logic element (BLE) using a matching-based method. Then these BLEs are packed in a greedy manner into logic clusters, with the local optimization objectives being to fill each cluster to its capacity and to minimize the number of used inputs to each cluster. This approach is inspired by earlier work.

A timing-driven packing tool for FPGAs, t-VPack, has also been proposed. The blocks on the critical path are preferably packed together in a CLB so that the delay can be improved by exploiting local wiring in the CLB to route the critical nets. t-VPack delivers better routability than VPack. Later, we will describe the routability potential in timing-driven packing algorithms. A packing approach based on maximum-weight matching on the circuit graph has also been proposed.

Recently, researchers have proposed a new technique for packing logic into clusters. Based on Rent's rule, the connectivity of each cluster is defined for each application. In this approach, routability is weighted according to the connectivity of the application. It is a good idea to consider routability based on the connectivity of the circuit. On the other hand, the weight of routability in the overall optimization objective is fixed during clustering for each application. In this way, routability cannot be considered accurately. In this work, we scale the weight of the routability factor dynamically. A good survey of packing methods for cluster-based FPGAs is also available in the literature.

In all these approaches, when a logic block is packed into an existing
cluster, the type of nets being shared is not considered. An important issue in cluster-based FPGAs is the limited number of inputs. Therefore, considering input/output pin sharing besides edge covering can improve the performance. In this paper, we analyze the issues during the packing process extensively. We introduce new metrics that are used to form a new objective function to evaluate routability. We took the VPack algorithm as a basis, as we will describe in later sections, and built our own approach upon it.

PACKING IN CLUSTER-BASED FPGAS

In this section, we study the issues in the packing stage of technology mapping for cluster-based FPGAs. The routability-driven packing problem is also formulated.

Cluster-based FPGA Architecture

The FPGA we are targeting has the SRAM-based island-style structure. It contains a square matrix of logic blocks. Routing tracks are located between each row and column. The structure of the basic logic block is illustrated in Figure (b). It contains a K-input LUT and one flip-flop. A K-input LUT is able to implement any function of its K inputs ($2^{2^K}$ functions). However, the size of the look-up table grows exponentially with the number of inputs. It has been shown that a LUT with input size 4 is the most area-efficient configuration.

The logic cluster is shown in Figure (a). The cluster size N is defined as the number of basic blocks contained in the cluster. The cluster takes I inputs that are connected to the LUTs inside the basic blocks. Not all $K \cdot N$ basic block inputs are accessible externally. Only I of these are connected to the input multiplexors of the cluster. These input multiplexors allow any of the I inputs to be connected to any of the $K \cdot N$ basic block inputs. Also, any output of the N basic blocks can be connected to any basic block input through these multiplexors. The cluster contains N output pins, connecting each basic block output to one cluster output. A similar structure has been used in earlier work.

In the packing stage of the CAD flow for cluster-based FPGAs, the input circuit is represented
in terms of LUTs and registers. As shown in Figure (c), if LUT l is followed by register r and the net connecting LUT l and register r has no connection to any other element, they can both be implemented by one basic logic block. Otherwise, each register or LUT should be assigned to its own basic logic block. An optimal pattern-matching-based method to pack register-LUT pairs into basic blocks has been proposed. Hence, the problem is simplified to packing a set of basic blocks into clusters. We focus on clustering the basic blocks into logic clusters after each register and LUT is assigned to a basic logic block.

[Figure: (a) logic cluster built from basic blocks; (b) basic block with a K-input LUT and a flip-flop; (c) K-input LUT l and flip-flop (register) r packed into one basic block.]
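The pairing rule described above (a LUT and a register share one basic block only when the LUT's output net feeds that register and nothing else) can be sketched as follows. This is a minimal illustration under assumed data structures, not the optimal pattern-matching method the text refers to: the function name `pack_basic_blocks` and the three dictionaries used to describe the netlist are hypothetical.

```python
from collections import defaultdict


def pack_basic_blocks(lut_output_net, reg_input_net, net_sinks):
    """Pair a LUT with a register when the LUT's output net feeds only that register.

    All three arguments are hypothetical netlist views chosen for this sketch:
      lut_output_net: dict, LUT name -> net driven by the LUT
      reg_input_net:  dict, register name -> net feeding the register
      net_sinks:      dict, net -> set of element names that read the net
    Returns a list of basic blocks as (lut_or_None, register_or_None) tuples.
    """
    # Index registers by the net that feeds them.
    regs_by_net = defaultdict(list)
    for reg, net in reg_input_net.items():
        regs_by_net[net].append(reg)

    blocks = []
    paired_regs = set()
    for lut, net in lut_output_net.items():
        sinks = net_sinks.get(net, set())
        regs = regs_by_net.get(net, [])
        # Legal pairing: the register is the only sink of the LUT's output net.
        if len(sinks) == 1 and len(regs) == 1 and regs[0] in sinks:
            blocks.append((lut, regs[0]))
            paired_regs.add(regs[0])
        else:
            blocks.append((lut, None))

    # Registers that could not be paired get a basic block of their own.
    for reg in reg_input_net:
        if reg not in paired_regs:
            blocks.append((None, reg))
    return blocks


example_blocks = pack_basic_blocks(
    lut_output_net={"l1": "n1", "l2": "n2"},
    reg_input_net={"r1": "n1", "r2": "n2"},
    net_sinks={"n1": {"r1"}, "n2": {"r2", "l1"}},
)
```

In the example, `l1` and `r1` share a block because net `n1` has no other sink, while `l2` and `r2` stay separate because net `n2` also feeds `l1`.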
[...]
...Betz and Rose proposed VPack, a logic block packing algorithm [4] for cluster-based FPGAs....
[...]

Journal Article•DOI•

FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs


Jason Cong¹, Yuzheng Ding¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 1994-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A theoretical breakthrough is presented which shows that the LUT-based FPGA technology mapping problem for depth minimization can be solved optimally in polynomial time.


Abstract: The field programmable gate-array (FPGA) has become an important technology in VLSI ASIC designs. In the past few years, a number of heuristic algorithms have been proposed for technology mapping in lookup-table (LUT) based FPGA designs, but none of them guarantees optimal solutions for general Boolean networks and little is known about how far their solutions are away from the optimal ones. This paper presents a theoretical breakthrough which shows that the LUT-based FPGA technology mapping problem for depth minimization can be solved optimally in polynomial time. A key step in our algorithm is to compute a minimum height K-feasible cut in a network, which is solved optimally in polynomial time based on network flow computation. Our algorithm also effectively minimizes the number of LUT's by maximizing the volume of each cut and by several post-processing operations. Based on these results, we have implemented an LUT-based FPGA mapping package called FlowMap. We have tested FlowMap on a large set of benchmark examples and compared it with other LUT-based FPGA mapping algorithms for delay optimization, including Chortle-d, MIS-pga-delay, and DAG-Map. FlowMap reduces the LUT network depth by up to 7% and reduces the number of LUT's by up to 50% compared to the three previous methods.


719 citations

"ROUTABILITY-DRIVEN PACKING: METRICS..." refers background or methods in this paper

...cluster C. If the output terminal of a net is inside the cluster, internal connections can be used to connect the input pins of the net located inside the cluster. In such a case, there is no need to use an input pin of the cluster to connect the net N to the other terminals of the net outside the cluster, since an output pin of the cluster can be used for external interconnection. Therefore, the contribution of net N to the block gain is more than just covering an edge of a multi-terminal net. In fact, by adding block B to cluster C, an input pin of the cluster is freed and can be used for another net connection. In the table, this is defined as in-pin gain. This increases the probability that adding the block to the cluster is accepted. In other words, the probability of violating the input constraints of the cluster decreases.

Note that each block output pin is accessible from outside and there is no sharing among the output pins. Therefore, there are no output pin constraints for the clusters; hence saving on output pins does not bring any gain except in one case. Suppose all the input pins of a net are already inside the cluster and the logic block being added to the cluster contains the output pin of the net. Net N in the figure is an example of such a case. The output pin of the cluster corresponding to the block driving net N cannot be used by other blocks. This means that there is no connection from outside to this pin, since all the terminals of the corresponding net are located inside the cluster. Therefore, the number of external connections of the cluster, defined as output congestion gain in the table, decreases. This yields less congestion among the clusters. In other words, it reduces the number of used pins of a cluster, which is the fourth routability factor.

Net N has no pin in the cluster. The gain from moving logic block B to the cluster would be zero according to the gain function above. However, not only would no edge from N be covered, but also one input pin of the cluster would be
used for N. So the gain of moving logic block B to the cluster due to N, in terms of used pins per cluster, is $-1$. This means N has a degrading effect on routability according to the fourth routability factor.

As explained above, by considering just the number of shared inputs and outputs, as in the earlier gain equation, the packing algorithm cannot differentiate among candidate blocks that have different impacts on routability. All possible cases yielding different total gains are presented in the table for one net connected to a candidate block. By incorporating the other routability factors, the gain for each logic block B going into cluster C can be computed as the weighted combination of different routability factors as follows:

$$\mathrm{Gain}(B, C) = f(\mathrm{Nets}(B), \mathrm{Nets}(C)) = \sum_{i \in \mathrm{Nets}(B)} g_i(\mathrm{Nets}(C), B)$$

where

$$g_i(C, B) = \begin{cases} a \cdot f_{in}(P_i(B), P_i(C)) + b \cdot f_o(P_i(B), P_i(C)) + 1, & i \in \mathrm{Nets}(C) \\ c \cdot T_i(B), & \text{otherwise} \end{cases}$$

$f_{in}(P_i(B), P_i(C))$ is the gain obtained in input pins of cluster C, as defined in the table. Similarly, $f_o(P_i(B), P_i(C))$ is the gain obtained in output congestion. The additional gain of value 1 added to the sum of these two gain terms corresponds to the edge gain. $T_i(B)$ returns the type of the pin of net i connected to basic block B: it returns 0 if the pin is an output pin and $-1$ otherwise. $P_i(B)$ is the set of all pins of net i that are on block B. $P_i(C)$ is the set of pins of net i connected to cluster C. $\mathrm{Nets}(C)$ is the set of nets connected to cluster C. a, b, and c are the weights for the different components of the function.

Inserting a whole multi-terminal net in one cluster is practically impossible. In most cases, the best we can do is to eliminate two-terminal nets. Therefore, removing an edge from a multi-terminal net should not be considered equivalent to removing an edge by inserting a two-terminal net inside a cluster. The average gain that a block can take from an n-terminal net i connected to one of its pins, depending on the type of the net, can be estimated from the table as a function $\mathrm{Gain}_{avg}(i)$ of n. According to this estimate, the average gain obtained from a two-
terminal net is the highest. This implies that the algorithm gives priority to pushing a two-terminal net entirely inside a cluster over removing a pin of a multi-terminal net. This leads to a decrease in the number of exposed nets, satisfying the last routability factor.

When the net connected to a block is a multi-terminal net, the gain associated with the multi-terminal net is computed for each block containing a terminal of the net. Therefore, each edge of a net can have a different impact on its corresponding blocks when a cluster is being constructed. How many terminals of the net already exist in the cluster, and of what type? What type of terminal of the net does the candidate block have? The answers to these questions for each net connected to the candidate block determine the gain of the block. Therefore, we conclude that in bottom-up clustering, the gain weight of each edge should be assigned dynamically according to the individual situation.

In the next sub-sections, we explain our method of packing the basic blocks inside the clusters based on the routability gain function described above. We also analyze the timing-driven clustering algorithm in terms of routability, based on the routability gain function. According to this analysis, we integrate routability factors into timing-driven clustering.

RPack Algorithm

The input to our packing algorithm is a list of LUTs, registers, and connections among the resources. In RPack, similar to VPack, in the first stage a LUT and a register are packed into a basic logic block when possible. After that, the blocks are packed into clusters using a greedy heuristic. Clusters are constructed sequentially. First, the seed is chosen from the unclustered basic blocks. The criterion is to choose the block with the most used inputs, as in earlier work. After choosing a seed for a cluster, the logic block that gives the highest gain is selected to be added to the current cluster, provided that it is a legal choice. This means that the number of external inputs does
not exceed the number of input pins of the cluster. The algorithm continues adding blocks to one cluster until the cluster is full or no more legal choices can be found. Similarly, new clusters are constructed until all the blocks are packed into clusters.

We propose RPack, a routability-driven packing algorithm based on the routability factors described in the previous sub-section. RPack is developed on top of VPack. The difference between the two approaches is in the definition of the gain function: VPack uses the gain function based only on the number of shared nets, while RPack uses the weighted routability gain function defined above. The pseudo-code of our approach is shown in the figure below.

    Input:  Netlist of LUTs and Registers
            N: Cluster Size, K: LUT Size, I: Inputs per Cluster
    Output: List of Logic Clusters

    Pack LUTs and Registers together into Basic Blocks
    while Unclustered Basic Blocks available
        Find Seed for new Cluster
        while Cluster is not full
            Update gains of unclustered Basic Blocks (Candidate blocks)
            Choose Basic Block with highest gain (Pick a candidate block)
            If Candidate is NOT feasible then
                Go to the candidate selection step
            Else
                Remove block from unclustered blocks list
                Add block to current Cluster
        end while
    end while

    Figure: RPack, Pseudo-code for Packing Algorithm

The complexity of the RPack algorithm is $O(I \cdot M)$, where M is the number of clusters. Finding the seed for each cluster takes $O(M)$ time using a priority queue to store the candidate nodes, where M is the number of nodes (basic blocks). When a node v is inserted into a cluster, only the gains of its neighbors among the candidate nodes need to be updated (candidate nodes are those that have not been assigned to any cluster so far). The number of neighbors is equal to the edge degree of the current node, i.e., $\deg(v)$. When a neighbor is visited, the type and status of the edges connected to the neighbor are checked, which takes $O(\deg(v))$. Note that when each neighbor node is visited, the edges that belong to the same hyper-edge (multi-terminal net) are counted once. However, when a block is being added to a cluster, its neighbors are all the nodes connected to it by any edge, i.e., $\deg(v)$. By amortized analysis, the gain of a node is updated at most once for any connection between the node and its neighbors. Therefore, the total clustering process takes $O(\sum_{v_i \in V(G)} \deg(v_i)) = O(|E|)$, where E is the edge set of the connectivity graph G. Also, $|E| \le \sum_{i \in Net} n_i = P$, where $n_i$ is the number of terminals of net i and P is the total number of pins for all clusters, which is $I \cdot M$. Based on this analysis, the complexity of the algorithm can be expressed as $O(I \cdot M)$.

t-RPack: Timing-Driven RPack

By clustering the LUTs into coarser CLBs, the complexity of the interconnection between the CLBs is reduced. Hence, fewer routing resources are required. Another benefit of clusters is the fast interconnection inside the clusters. Connections packed inside the clusters use the hard-wired interconnect resources of the CLBs. This leads to better performance. In packing, both objectives should be pursued. In this paper, our focus is mostly on routability. In this section, we discuss how routability is realized when timing is added to the packing algorithm, and based on our routability function we propose timing-driven RPack.

After packing, a subset of the netlist is routed inside the clusters without passing through switched routing resources. By inserting the interconnections along the critical path of the circuit inside clusters, delay can be improved. As a result, in timing-driven clustering, priority is given to inserting timing-critical connections inside the clusters. In the sequential bottom-up clustering approach, the seed for a cluster is the most critical block. Blocks are added to the cluster based on criticality. In addition to timing, routability has to be considered in clustering to avoid routing congestion, which is a bottleneck in current FPGAs. However, we should first study the impact of timing-based clustering on routability when choosing the seed and defining the gain function based
on criticality. The routability factors were described earlier, and our routability gain function was defined based on them. Using this model, we can explain the routability issues in timing-driven clustering. After analyzing the approach, we are able to improve routability more accurately. This is where analysis and theory guide the heuristics.

The criticality of a block is defined by its slack. Connections along the critical paths have a high criticality value. Therefore, clustering based on timing is similar to path-based clustering. Each path is a chain of output-to-input pin-to-pin connections between a set of blocks (see the figure on routability and slack computation). According to the routability gain table, an output-to-input connection has a high routability gain. When a connection is marked as critical, it means that there is a long chain of input-output connectivity from this point to the rest of the design. This implies a prediction of high routability gain in later stages of constructing the cluster. After a highly critical connection is added to a cluster, more input-output connections will be added to the cluster. In other words, the criticality of a connection shows the depth of input-output connections from the current connection to the rest of the circuit. By inserting an edge of a net on the critical path, timing-driven packing exploits the routability obtained by inserting the output and input pins of a net inside a cluster, hence releasing an input pin of the cluster. As explained above, our routability model can express the routability impact of timing-based clustering.

[Figure: Routability and Slack Computation]

Two of the routability factors are inherently satisfied in criticality-based analysis and slack computation for critical connections. The other factors should still be considered. In addition, routability for non-critical nets should be taken into account during clustering. Therefore, we define the gain function as a linear combination of
criticality and routability of a connection. We use the same criticality function used in t-VPack. The routability component is the routability gain function defined earlier. The gain function used in timing-driven RPack, $\mathrm{TotalGain}(B)$, combines $\mathrm{Criticality}(B)$ with $\mathrm{RoutabilityGain}(B)$ scaled by $\mathrm{DepopulationFactor}$.

Another important issue is scaling the routability component in this gain function. When a cluster is just being constructed, there are many unused pins available in the cluster. At this stage, the cluster should absorb as many connections as possible. In later stages, when most of the pins are used, routability is more restricted, and the used pins around the cluster create congestion around the block. In this case, depopulation can help improve routability. In addition, when more blocks are added to the cluster, the probability of getting a higher gain in the later stages increases, due to the higher probability of shared nets existing among the blocks. This does not imply higher routability, because of the higher connection traffic around the cluster. Therefore, scaling is required. To achieve this, the routability function value is scaled each time a block is added to the cluster. The depopulation factor increases during the construction of a cluster. It is defined in terms of $\mathrm{UsedPin}(B)$ and $\mathrm{UsedPin}(C)$, which return the number of used pins of block B and cluster C, respectively.

We need to mention that in t-VPack the total gain function is a function of routability and criticality as well. However, the routability factor there is the same as the gain function used in VPack, which is not a comprehensive routability function. Also, the routability is scaled by the number of pins of a LUT, which is a fixed number for a given LUT input size. This normalization remains constant during the clustering. According to our discussion above, this cannot reflect routability gain correctly. With analysis and more accurate modeling of routability, we are able to study the behavior of different methods of clustering
in terms of routability and improve the approaches by adding components that consider other routability factors. Our analysis in this section shows that timing and routability correlate very strongly. Satisfying timing improves routability in some aspects. That is why timing-driven clustering outperforms routability-driven packing. Our experimental results in the next section support our claim as well.

EXPERIMENTAL RESULTS

In previous sections, we claimed that considering routability factors while packing logic into CLBs has a significant impact on routing results and netlist complexity. In this section, we show a set of experimental results supporting our claim. We have used the greedy clustering approach proposed in VPack and t-VPack. RPack is implemented on top of the clustering algorithm in VPack, and t-RPack is implemented on top of t-VPack.

The first set of our experiments compares RPack and VPack. We ran the largest MCNC benchmarks on VPack and RPack. The blif input format of each benchmark is obtained by SIS [27] logic minimization and the FlowMap [18] technology mapper. The results presented in the table show that our method successfully decreased the number of exposed nets. RPack and VPack use a similar number of clusters, such that the array size resulting from both approaches is the same for almost all benchmarks. In one case (benchmark alu), RPack even yielded a smaller array size. The array size of each benchmark is reported in the table. In accordance with the average gain estimated earlier, the results show that the major portion of the decrease in the number of exposed nets is due to a decrease in the number of two-terminal nets. In conclusion, reducing the number of output pins is strongly related to reducing the number of exposed nets.

We also observed the congestion around each cluster. We counted the number of exposed nets each cluster is connected to. The first plot shows the connectivity of the clusters resulting from VPack and RPack for benchmark bigkey, for a fixed cluster size and number of input
pins per cluster. The vertical axis shows the number of clusters for each number of pins used per cluster, shown on the horizontal axis. The plot shows that the clusters obtained from RPack have less traffic around them. The result for benchmark elliptic is shown as well. As shown in that plot, the connectivity obtained from RPack is more smoothly distributed than that resulting from VPack. In these two plots, the type of interconnection is not reflected. The number of terminals of the nets also affects routability.

[Figure: Comparison of RPack and VPack in cluster characteristics in bigkey. Horizontal axis: Number of Used Pins; vertical axis: Number of Clusters; series: VPack, RPack.]

[Figure: Comparison of VPack and RPack in cluster characteristics in elliptic. Horizontal axis: Number of Used Pins Per Cluster; vertical axis: Number of Clusters; series: VPack, RPack.]

In order to verify that our method meets the objective of improving routability, we synthesized the benchmark circuits through the complete CAD flow to obtain the routing area. We used VPR to place and route the benchmarks. The routing architecture that we used employs only single-segment wires, leading to better routability. The subset switch type, in which each track in a channel can be connected to the same track number in the neighboring channels, is used. In addition, we have set the fraction of the tracks of each channel to which each logic block input and output pin connects. The table summarizes our results after placement and routing of the benchmarks. As shown in the table, RPack is able to improve the routing area by decreasing the number of tracks significantly. The average improvement we obtained is 16.5%. The number of routing tracks required in each channel is a reliable metric for routing area, since a smaller number of routing tracks means not only saving wiring area but also decreasing the size of the routing switches drastically. Routing area is related
to the square of the number of tracks per channel. The improvement in routing area is 27% over VPack on average. Such an improvement in routing area decreases the total chip area significantly, since routing area is typically a large percentage of the total area. The significant difference between the two routability-driven methods, VPack and RPack, implies that each routability factor can affect the routing results significantly. Given the constraint on the number of pins per CLB, the fixed routing resources, and the fixed number of LUTs in each CLB, we considered different routability components in the gain function used in RPack. The results support our claim that routability is an important objective in clustering and that it results in a better distribution of interconnection among CLBs.

In the next set of experiments, timing-driven RPack is compared with t-VPack in order to observe the impact of routability in timing-driven packing. Comparing RPack directly with t-VPack would not be a fair comparison. As mentioned in the previous section, timing-based clustering inherently has an effective impact on routability itself. Routability and speed, both benefits of cluster-based FPGAs, are realized by a timing-based packing algorithm. Previous work shows that t-VPack performs better in terms of routability than VPack. The routing results, critical path delay, and number of exposed nets using both packing methods, t-RPack and t-VPack, are reported in the table. t-RPack uses our routability gain function as described earlier. In order to observe the effect of the routability gain function, the depopulation factor is ignored. The results show that considering other routability factors and a more accurate routability gain for non-critical nets can improve the routing area, and the delay is improved as well. The reason is that the weight for an input-output connection is high in both the timing and the routability component for critical nets. We can observe that delay has been improved in most cases.

In another set of experiments, we added the depopulation
factor to control routability versus timing. The results are shown in the table. The routing area is improved while the critical path delay is the same on average. This implies that depopulation helps to obtain a more distributed connectivity between the clusters.

The experimental results show that different routability factors have a significant impact on routing results. Timing-driven packing has a strong correlation with some of the routability factors for FPGAs. Integrating...

[Table: columns are circuit, Array Size, Number of Exposed Nets (VPack, RPack), Number of Tracks (VPack, RPack); rows are alu, apex, apex, bigkey, clma, des, diffeq, dsip, elliptic, ex, ex p, frisc, misex, pdc, s, s, s, seq, spla, tseng, Average.]
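The greedy cluster construction described in the RPack algorithm discussion above (seed selection by most used inputs, then repeatedly adding the highest-gain legal block) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the block representation (a dictionary with `ins` and `out`), the function names, and the example netlist are all hypothetical. The default gain is the simple shared-net count $|\mathrm{Nets}(B) \cap \mathrm{Nets}(C)|$; the weighted RPack gain would be supplied through the `gain` parameter. For brevity the sketch filters legal candidates first and then takes the best one, instead of picking a candidate and retrying when it is infeasible.

```python
def block_nets(name, blocks):
    """All nets touching a block: its input nets plus its output net."""
    return set(blocks[name]["ins"]) | {blocks[name]["out"]}


def external_inputs(cluster, blocks):
    """Nets that must enter the cluster through a cluster input pin."""
    driven_inside = {blocks[b]["out"] for b in cluster}
    used = set()
    for b in cluster:
        used |= set(blocks[b]["ins"])
    return used - driven_inside


def shared_net_gain(candidate, cluster, blocks):
    """Shared-net gain: number of nets common to the candidate and the cluster."""
    cluster_nets = set()
    for b in cluster:
        cluster_nets |= block_nets(b, blocks)
    return len(block_nets(candidate, blocks) & cluster_nets)


def greedy_pack(blocks, cluster_size, cluster_inputs, gain=shared_net_gain):
    """Build clusters one at a time; assumes every single block fits in a cluster."""
    unclustered = set(blocks)
    clusters = []
    while unclustered:
        # Seed: unclustered block with the most used inputs (ties broken by name).
        seed = max(sorted(unclustered), key=lambda b: len(blocks[b]["ins"]))
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < cluster_size:
            # A candidate is legal if the cluster still fits its input pin budget.
            legal = [b for b in sorted(unclustered)
                     if len(external_inputs(cluster + [b], blocks)) <= cluster_inputs]
            if not legal:
                break
            best = max(legal, key=lambda b: gain(b, cluster, blocks))
            cluster.append(best)
            unclustered.remove(best)
        clusters.append(cluster)
    return clusters


# Hypothetical four-block netlist: B reads A's output, D reads C's output.
example = {
    "A": {"ins": ["x", "y"], "out": "a"},
    "B": {"ins": ["a", "z"], "out": "b"},
    "C": {"ins": ["p", "q"], "out": "c"},
    "D": {"ins": ["c"], "out": "d"},
}
example_clusters = greedy_pack(example, cluster_size=2, cluster_inputs=3)
```

With two blocks per cluster and three cluster inputs, the sketch groups A with B and C with D, so nets `a` and `c` are absorbed inside clusters and no longer need a cluster input pin.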
[...]
...The majority of research devoted to technology mapping has been done with the objective of improving either timing [7, 12, 18, 22, 13] or area-efficiency [19, 24, 20] or trade-off between depth and area [19]....
[...]
...The blif input format of each benchmark is obtained by SIS [27] logic minimization and FlowMap [18] technology mapper....
[...]

Frequently Asked Questions (1)

Q1. What have the authors contributed in "Routability-driven packing: metrics and algorithms for cluster-based fpgas" ?

In this paper, the authors discuss the metrics that affect routability in packing logic into clusters. The authors present a routability-driven clustering method for cluster-based FPGAs.
