scispace - formally typeset
Search or ask a question
Topic

Graph partition

About: Graph partition is a research topic. Over the lifetime, 4324 publications have been published within this topic receiving 159108 citations.


Papers
More filters
Posted Content
TL;DR: In this article, the authors employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities.
Abstract: A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. We employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the "best" possible community--according to the conductance measure--over a wide range of size scales. We study over 100 large real-world social and information networks. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales; and communities of larger size scales gradually "blend into" the expander-like core of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as testbeds of community detection algorithms. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network datasets.

1,555 citations

Journal ArticleDOI
TL;DR: This paper discusses annealing and its parameterized generic implementation, describes how this generic algorithm was adapted to the graph partitioning problem, and reports how well it compared to standard algorithms like the Kernighan-Lin algorithm.
Abstract: In this and two companion papers, we report on an extended empirical study of the simulated annealing approach to combinatorial optimization proposed by S. Kirkpatrick et al. That study investigated how best to adapt simulated annealing to particular problems and compared its performance to that of more traditional algorithms. This paper (Part I) discusses annealing and our parameterized generic implementation of it, describes how we adapted this generic algorithm to the graph partitioning problem, and reports how well it compared to standard algorithms like the Kernighan-Lin algorithm. (For sparse random graphs, it tended to outperform Kernighan-Lin as the number of vertices become large, even when its much greater running time was taken into account. It did not perform nearly so well, however, on graphs generated with a built-in geometric structure.) We also discuss how we went about optimizing our implementation, and describe the effects of changing the various annealing parameters or varying the basic...

1,355 citations

Journal ArticleDOI
TL;DR: The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and could be applied to machine-learning and pattern-recognition tasks involving a relational database.
Abstract: This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database

1,276 citations

Journal ArticleDOI
TL;DR: The conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts is proved and an Integer Linear Programming formulation is given.
Abstract: Modularity is a recently introduced quality measure for graph clusterings. It has immediately received considerable attention in several disciplines, particularly in the complex systems literature, although its properties are not well understood. We study the problem of finding clusterings with maximum modularity, thus providing theoretical foundations for past and present work based on this measure. More precisely, we prove the conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts and give an Integer Linear Programming formulation. This is complemented by first insights into the behavior and performance of the commonly applied greedy agglomerative approach.

1,201 citations

Proceedings ArticleDOI
08 Dec 1995
TL;DR: A multilevel algorithm for graph partitioning in which the graph is approximated by a sequence of increasingly smaller graphs, and the smallest graph is then partitioned using a spectral method, and this partition is propagated back through the hierarchy of graphs.
Abstract: The graph partitioning problem is that of dividing the vertices of a graph into sets of specified sizes such that few edges cross between sets. This NP-complete problem arises in many important scientific and engineering problems. Prominent examples include the decomposition of data structures for parallel computation, the placement of circuit elements and the ordering of sparse matrix computations. We present a multilevel algorithm for graph partitioning in which the graph is approximated by a sequence of increasingly smaller graphs. The smallest graph is then partitioned using a spectral method, and this partition is propagated back through the hierarchy of graphs. A variant of the Kernighan-Lin algorithm is applied periodically to refine the partition. The entire algorithm can be implemented to execute in time proportional to the size of the original graph. Experiments indicate that, relative to other advanced methods, the multilevel algorithm produces high quality partitions at low cost.

1,162 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Server
79.5K papers, 1.4M citations
82% related
Optimization problem
96.4K papers, 2.1M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202342
202290
2021184
2020198
2019228
2018202