
A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS∗

GEORGE KARYPIS† AND VIPIN KUMAR†

SIAM J. SCI. COMPUT.
© 1998 Society for Industrial and Applied Mathematics
Vol. 20, No. 1, pp. 359–392
Abstract. Recently, a number of researchers have investigated a class of graph partitioning
algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller
graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc.
of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445–452; Hen-
drickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301,
Sandia National Laboratories, Albuquerque, NM, 1993]. From the early work it was clear that
multilevel techniques held great promise; however, it was not known if they can be made to con-
sistently produce high quality partitions for graphs arising in a wide range of application domains.
We investigate the effectiveness of many different choices for all three phases: coarsening, partition
of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called
heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor
of the size of the final partition obtained after multilevel refinement. We also present a much faster
variation of the Kernighan–Lin (KL) algorithm for refining during uncoarsening. We test our scheme
on a large number of graphs arising in various domains including finite element methods, linear pro-
gramming, VLSI, and transportation. Our experiments show that our scheme produces partitions
that are consistently better than those produced by spectral partitioning schemes in substantially
smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices,
it produces orderings that have substantially smaller fill than the widely used multiple minimum
degree algorithm.
Key words. graph partitioning, parallel computations, fill-reducing orderings, finite element
computations
AMS subject classifications. 68B10, 05C85
PII. S1064827595287997
1. Introduction. Graph partitioning is an important problem that has exten-
sive applications in many areas, including scientific computing, VLSI design, and task
scheduling. The problem is to partition the vertices of a graph into p roughly equal
parts, such that the number of edges connecting vertices in different parts is mini-
mized. For example, the solution of a sparse system of linear equations Ax = b via
iterative methods on a parallel computer gives rise to a graph partitioning problem.
A key step in each iteration of these methods is the multiplication of a sparse matrix
and a (dense) vector. A good partition of the graph corresponding to matrix A can
significantly reduce the amount of communication in parallel sparse matrix-vector
multiplication [32]. If parallel direct methods are used to solve a sparse system of
equations, then a graph partitioning algorithm can be used to compute a fill-reducing
ordering that leads to a high degree of concurrency in the factorization phase [32, 12].
The multiple minimum degree ordering used almost exclusively in serial direct methods is not suitable for parallel direct methods, as it provides very little concurrency in the parallel factorization phase.

∗ Received by the editors June 19, 1995; accepted for publication (in revised form) January 28, 1997; published electronically August 4, 1998. This work was supported by Army Research Office contract DA/DAAH04-95-1-0538, NSF grant CCR-9423082, IBM Partnership Award, and by Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement DAAH04-95-2-0003/contract DAAH04-95-C-0008. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute, Cray Research Inc., and by the Pittsburgh Supercomputing Center. http://www.siam.org/journals/sisc/20-1/28799.html
† Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 (karypis@cs.umn.edu, kumar@cs.umn.edu).
The graph partitioning problem is NP-complete. However, many algorithms have
been developed that find a reasonably good partition. Spectral partitioning meth-
ods are known to produce good partitions for a wide class of problems, and they are
used quite extensively [45, 47, 24]. However, these methods are very expensive since
they require the computation of the eigenvector corresponding to the second smallest
eigenvalue (Fiedler vector). Execution time of the spectral methods can be reduced
if computation of the Fiedler vector is done by using a multilevel algorithm [2]. This
multilevel spectral bisection (MSB) algorithm usually manages to speed up the spec-
tral partitioning methods by an order of magnitude without any loss in the quality of
the edge-cut. However, even MSB can take a large amount of time. In particular, in
parallel direct solvers, the time for computing ordering using MSB can be several or-
ders of magnitude higher than the time taken by the parallel factorization algorithm,
and thus ordering time can dominate the overall time to solve the problem [18].
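For readers unfamiliar with spectral bisection, the following sketch illustrates the basic (non-multilevel) step referred to above: form the graph Laplacian, compute the eigenvector associated with the second smallest eigenvalue (the Fiedler vector), and split the vertices at its median value. The dense NumPy eigensolver and the function name are illustrative choices, not part of the authors' MSB implementation.

# Minimal sketch of one spectral bisection step (illustrative; not the
# authors' MSB code): build the Laplacian L = D - A, take the eigenvector of
# the second smallest eigenvalue (the Fiedler vector), split at its median.
import numpy as np

def spectral_bisection(n, edges):
    """n: number of vertices; edges: iterable of (u, v) pairs, 0-indexed."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
    _, eigvecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                     # Fiedler vector
    return np.where(fiedler <= np.median(fiedler), 0, 1)

# Example: a 6-vertex path graph is split into its two halves.
print(spectral_bisection(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]))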
Another class of graph partitioning techniques uses the geometric information of
the graph to find a good partition. Geometric partitioning algorithms [23, 48, 37,
36, 38] tend to be fast but often yield partitions that are worse than those obtained
by spectral methods. Among the most prominent of these schemes is the algorithm
described in [37, 36]. This algorithm produces partitions that are provably within the
bounds that exist for some special classes of graphs (which include graphs arising
in finite element applications). However, due to the randomized nature of these
algorithms, multiple trials are often required (5 to 50) to obtain solutions that are
comparable in quality with spectral methods. Multiple trials do increase the time
[15], but the overall runtime is still substantially lower than the time required by
the spectral methods. Geometric graph partitioning algorithms are applicable only
if coordinates are available for the vertices of the graph. In many problem areas
(e.g., linear programming, VLSI), there is no geometry associated with the graph.
Recently, an algorithm has been proposed to compute coordinates for graph vertices
[6] by using spectral methods. But these methods are much more expensive and
dominate the overall time taken by the graph partitioning algorithm.
Another class of graph partitioning algorithms reduces the size of the graph (i.e.,
coarsen the graph) by collapsing vertices and edges, partitions the smaller graph, and
then uncoarsens it to construct a partition for the original graph. These are called
multilevel graph partitioning schemes [4, 7, 19, 20, 26, 10, 43]. Some researchers
investigated multilevel schemes primarily to decrease the partitioning time, at the cost
of somewhat worse partition quality [43]. Recently, a number of multilevel algorithms
have been proposed [4, 26, 7, 20, 10] that further refine the partition during the
uncoarsening phase. These schemes tend to give good partitions at a reasonable
cost. Bui and Jones [4] use random maximal matching to successively coarsen the
graph down to a few hundred vertices; they partition the smallest graph and then
uncoarsen the graph level by level, applying the KL algorithm to refine the partition.
Hendrickson and Leland [26] enhance this approach by using edge and vertex weights
to capture the collapsing of vertices and edges. In particular, this latter work
showed that multilevel schemes can provide better partitions than spectral methods
at lower cost for a variety of finite element problems.
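As an aside on the bookkeeping this weighting implies, the sketch below (an illustration, not code from [26] or from this paper) collapses one matched pair of vertices: the resulting multinode receives the sum of the two vertex weights, and parallel edges created by the contraction are merged by summing their edge weights. The dictionary-based graph representation is an assumption made for brevity.

# Illustrative sketch: collapse matched pair (u, v) into a single multinode u.
# Assumed representation: vwgt[x] is the weight of vertex x; adj[x] is a dict
# mapping each neighbor y of x to the weight of edge (x, y).
def collapse_pair(vwgt, adj, u, v):
    vwgt[u] += vwgt.pop(v)                      # multinode weight = sum of weights
    for y, w in adj.pop(v).items():
        if y == u:
            continue                            # the matched edge (u, v) disappears
        adj[y].pop(v)
        adj[u][y] = adj[u].get(y, 0) + w        # merge parallel edges by
        adj[y][u] = adj[y].get(u, 0) + w        # summing their edge weights
    adj[u].pop(v, None)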
In this paper we build on the work of Hendrickson and Leland. We experiment
with various parameters of multilevel algorithms and their effect on the quality of
partition and ordering. We investigate the effectiveness of many different choices

MULTILEVEL GRAPH PARTITIONING 361
for all three phases: coarsening, partition of the coarsest graph, and refinement. In
particular, we present a new coarsening heuristic (called heavy-edge heuristic) for
which the size of the partition of the coarse graph is within a small factor of the
size of the final partition obtained after multilevel refinement. We also present a new
variation of the KL algorithm for refining the partition during the uncoarsening phase
that is much faster than the KL refinement used in [26].
We test our scheme on a large number of graphs arising in various domains includ-
ing finite element methods, linear programming, VLSI, and transportation. Our ex-
periments show that our scheme consistently produces partitions that are better than
those produced by spectral partitioning schemes in substantially smaller times (10 to
35 times faster than multilevel spectral bisection).¹ Compared with the multilevel
scheme of [26], our scheme is about two to seven times faster, and it is consistently
better in terms of cut size. Much of the improvement in runtime comes from our
faster refinement heuristic, and the improvement in quality is due to the heavy-edge
heuristic used during coarsening.

¹ We used the MSB algorithm in the Chaco [25] graph partitioning package to obtain the timings for MSB.
We also used our graph partitioning scheme to compute fill-reducing orderings for
sparse matrices. Surprisingly, our scheme substantially outperforms the multiple min-
imum degree algorithm [35], which is the most commonly used method for computing
fill-reducing orderings of a sparse matrix.
Even though multilevel algorithms are quite fast compared with spectral methods,
they can still be the bottleneck if the sparse system of equations is being solved in
parallel [32, 18]. The coarsening phase of these methods is relatively easy to parallelize
[30], but the KL heuristic used in the refinement phase is very difficult to parallelize
[16]. Since both the coarsening phase and the refinement phase with the KL heuristic
take roughly the same amount of time, the overall runtime of the multilevel scheme
of [26] cannot be reduced significantly. Our new faster methods for refinement reduce
this bottleneck substantially. In fact our parallel implementation [30] of this multilevel
partitioning is able to get a speedup of as much as 56 on a 128-processor Cray T3D
for moderate size problems.
The remainder of the paper is organized as follows. Section 2 defines the graph
partitioning problem and describes the basic ideas of multilevel graph partitioning.
Sections 3, 4, and 5 describe different algorithms for the coarsening, initial partition-
ing, and the uncoarsening phase, respectively. Section 6 presents an experimental
evaluation of the various parameters of multilevel graph partitioning algorithms and
compares their performance with that of the multilevel spectral bisection algorithm. Sec-
tion 7 compares the quality of the orderings produced by multilevel nested dissection
to those produced by multiple minimum degree and spectral nested dissection. Sec-
tion 8 provides a summary of the various results. A short version of this paper appears
in [29].
2. Graph partitioning. The k-way graph partitioning problem is defined as follows: given a graph G = (V, E) with |V| = n, partition V into k subsets, V_1, V_2, ..., V_k, such that V_i ∩ V_j = ∅ for i ≠ j, |V_i| = n/k, and ⋃_i V_i = V, and the number of edges of E whose incident vertices belong to different subsets is minimized. The k-way graph
partitioning problem can be naturally extended to graphs that have weights associ-
ated with the vertices and the edges of the graph. In this case, the goal is to partition
the vertices into k disjoint subsets such that the sum of the vertex-weights in each subset is the same, and the sum of the edge-weights whose incident vertices belong to different subsets is minimized. A k-way partition of V is commonly represented by a partition vector P of length n, such that for every vertex v ∈ V, P[v] is an integer between 1 and k, indicating the partition at which vertex v belongs. Given a partition P, the number of edges whose incident vertices belong to different subsets is called the edge-cut of the partition.
The efficient implementation of many parallel algorithms usually requires the so-
lution to a graph partitioning problem, where vertices represent computational tasks,
and edges represent data exchanges. Depending on the amount of the computation
performed by each task, the vertices are assigned a proportional weight. Similarly,
the edges are assigned weights that reflect the amount of data that need to be ex-
changed. A k-way partitioning of this computation graph can be used to assign tasks
to k processors. Since the partitioning assigns to each processor tasks whose total
weight is the same, the work is balanced among k processors. Furthermore, since the
algorithm minimizes the edge-cut (subject to the balanced load requirements), the
communication overhead is also minimized.
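To make the preceding definitions concrete, the short sketch below computes the edge-cut and the per-part vertex weight of a partition vector; for convenience parts are numbered 0 to k−1 rather than 1 to k, and all names are illustrative.

# Edge-cut and part weights of a k-way partition, following the definitions above.
def edge_cut(edges, P):
    """edges: iterable of (u, v, weight); P[v]: part of vertex v (0..k-1)."""
    return sum(w for u, v, w in edges if P[u] != P[v])

def part_weights(vwgt, P, k):
    """Sum of the vertex weights assigned to each of the k parts."""
    totals = [0] * k
    for v, w in enumerate(vwgt):
        totals[P[v]] += w
    return totals

# Example: a 4-cycle with unit weights split into two pairs has edge-cut 2.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
P = [0, 0, 1, 1]
print(edge_cut(edges, P), part_weights([1, 1, 1, 1], P, 2))   # -> 2 [2, 2]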
One such example is the sparse matrix-vector multiplication y = Ax. The n × n matrix A and vector x are usually partitioned along rows, with each of the p processors receiving n/p rows of A and the corresponding n/p elements of x [32]. For matrix A an n-vertex graph G_A can be constructed such that each row of the matrix corresponds to a vertex, and if row i has a nonzero entry in column j (i ≠ j), then there is an edge between vertex i and vertex j. As discussed in [32], any edges connecting vertices from two different partitions lead to communication for retrieving the value of vector x that is not local but is needed to perform the dot-product. Thus, in order to minimize the communication overhead, we need to obtain a p-way partition of G_A and then to distribute the rows of A according to this partition.
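A minimal sketch of this construction, assuming the structurally symmetric matrix is given simply as a list of its nonzero coordinates (the representation and function name are illustrative):

# Build the graph G_A of an n x n structurally symmetric sparse matrix: every
# off-diagonal nonzero (i, j) contributes an edge between vertices i and j.
def matrix_graph(n, nonzeros):
    adj = [set() for _ in range(n)]
    for i, j in nonzeros:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return adj

# Example: a tridiagonal 4 x 4 matrix yields the path graph 0-1-2-3.
nz = [(i, i) for i in range(4)] + [(i, i + 1) for i in range(3)] + [(i + 1, i) for i in range(3)]
print(matrix_graph(4, nz))                      # [{1}, {0, 2}, {1, 3}, {2}]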
Another important application of recursive bisection is to find a fill-reducing or-
dering for sparse matrix factorization [12, 32, 22]. These algorithms are generally
referred to as nested dissection ordering algorithms. Nested dissection recursively
splits a graph into almost equal halves by selecting a vertex separator until the de-
sired number of partitions is obtained. One way of obtaining a vertex separator is
to first obtain a bisection of the graph and then compute a vertex separator from
the edge separator. The vertices of the graph are numbered such that at each level
of recursion the separator vertices are numbered after the vertices in the partitions.
The effectiveness and the complexity of a nested dissection scheme depend on the
separator computing algorithm. In general, small separators result in low fill-in.
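The numbering discipline described above can be summarized by the schematic recursion below (an illustration rather than the paper's algorithm); find_separator stands in for "bisect the graph, then derive a vertex separator from the edge separator" and is left abstract.

# Schematic nested dissection: order each half recursively, separator last.
def nested_dissection(vertices, adj, find_separator, cutoff=8):
    if len(vertices) <= cutoff:
        return list(vertices)                   # small pieces: any order will do
    left, sep, right = find_separator(vertices, adj)
    order = nested_dissection(left, adj, find_separator, cutoff)
    order += nested_dissection(right, adj, find_separator, cutoff)
    order += list(sep)                          # separator vertices numbered last
    return order                                # fill-reducing elimination order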
The k-way partition problem is frequently solved by recursive bisection. That is,
we first obtain a 2-way partition of V , and then we further subdivide each part using
2-way partitions. After log k phases, graph G is partitioned into k parts. Thus, the
problem of performing a k-way partition can be solved by performing a sequence of
2-way partitions or bisections. Even though this scheme does not necessarily lead to
optimal partition, it is used extensively due to its simplicity [12, 22].
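The recursion just described is sketched below; bisect stands for any 2-way partitioner (for instance, the multilevel bisection of section 2.1), and k is assumed to be a power of two. The function is an illustration, not the authors' implementation.

# k-way partitioning by recursive bisection (k assumed to be a power of two).
def recursive_bisection(vertices, k, bisect, first_part=0):
    """Returns {vertex: part}; `bisect` splits a vertex set into two halves."""
    if k == 1:
        return {v: first_part for v in vertices}
    left, right = bisect(vertices)
    labels = recursive_bisection(left, k // 2, bisect, first_part)
    labels.update(recursive_bisection(right, k // 2, bisect, first_part + k // 2))
    return labels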
2.1. Multilevel graph bisection. The graph G can be bisected using a mul-
tilevel algorithm. The basic structure of a multilevel algorithm is very simple. The
graph G is first coarsened down to a few hundred vertices, a bisection of this much
smaller graph is computed, and then this partition is projected back toward the orig-
inal graph (finer graph). At each step of the graph uncoarsening, the partition is
further refined. Since the finer graph has more degrees of freedom, such refinements
usually decrease the edge-cut. This process is graphically illustrated in Figure 1.

Fig. 1. The various phases of the multilevel graph bisection. During the coarsening phase, the
size of the graph is successively decreased; during the initial partitioning phase, a bisection of the
smaller graph is computed; and during the uncoarsening phase, the bisection is successively refined as
it is projected to the larger graphs. During the uncoarsening phase the light lines indicate projected
partitions, and dark lines indicate partitions that were produced after refinement.
Formally, a multilevel graph bisection algorithm works as follows: consider a weighted graph G_0 = (V_0, E_0), with weights both on vertices and edges. A multilevel graph bisection algorithm consists of the following three phases.

Coarsening phase. The graph G_0 is transformed into a sequence of smaller graphs G_1, G_2, ..., G_m such that |V_0| > |V_1| > |V_2| > ··· > |V_m|.

Partitioning phase. A 2-way partition P_m of the graph G_m = (V_m, E_m) is computed that partitions V_m into two parts, each containing half the vertices of G_0.

Uncoarsening phase. The partition P_m of G_m is projected back to G_0 by going through intermediate partitions P_{m-1}, P_{m-2}, ..., P_1, P_0.
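The three phases can be combined into a short driver along the following lines (a sketch only; coarsen, initial_bisection, project, and refine are placeholders for the schemes discussed in sections 3, 4, and 5, and the graph interface is an assumption):

# Driver sketch of multilevel bisection; the four helpers are placeholders for
# the coarsening, initial-partitioning, projection, and refinement schemes.
def multilevel_bisection(G0, coarsen, initial_bisection, project, refine,
                         coarse_size=100):
    graphs = [G0]
    while graphs[-1].num_vertices() > coarse_size:     # coarsening phase
        graphs.append(coarsen(graphs[-1]))
    P = initial_bisection(graphs[-1])                  # initial partitioning phase
    for finer, coarser in zip(reversed(graphs[:-1]), reversed(graphs[1:])):
        P = project(P, coarser, finer)                 # uncoarsening phase:
        P = refine(P, finer)                           #   project, then refine
    return P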
3. Coarsening phase. During the coarsening phase, a sequence of smaller
graphs, each with fewer vertices, is constructed. Graph coarsening can be achieved in
various ways. Some possibilities are shown in Figure 2.
In most coarsening schemes, a set of vertices of G_i is combined to form a single vertex of the next level coarser graph G_{i+1}. Let V_i^v be the set of vertices of G_i combined to form vertex v of G_{i+1}. We will refer to vertex v as a multinode. In order for a bisection of a coarser graph to be good with respect to the original graph, the
