Book ChapterDOI

Greedy approximation algorithms for finding dense components in a graph

05 Sep 2000-Lecture Notes in Computer Science (Springer, Berlin, Heidelberg)-pp 84-95
TL;DR: This paper gives simple greedy approximation algorithms for the problems of finding subgraphs that maximize natural notions of density in undirected and directed graphs, and answers an open question about the complexity of the optimization problem for directed graphs.
Abstract: We study the problem of finding highly connected subgraphs of undirected and directed graphs. For undirected graphs, the notion of density of a subgraph we use is the average degree of the subgraph. For directed graphs, a corresponding notion of density was introduced recently by Kannan and Vinay. It is designed to quantify how highly connected the substructures of a sparse directed graph, such as the web graph, are. We study the optimization problems of finding subgraphs maximizing these notions of density for undirected and directed graphs. This paper gives simple greedy approximation algorithms for these optimization problems. We also answer an open question about the complexity of the optimization problem for directed graphs.
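The greedy algorithm analyzed here for the undirected case is, as the citing papers below describe it, a peeling procedure: repeatedly delete a minimum-degree vertex and return the best intermediate subgraph. A minimal sketch (the input format and variable names are our own illustration; a bucket-by-degree structure would give the linear-time implementation the citing papers mention):

```python
def densest_subgraph_greedy(adj):
    """Greedy peeling for the densest-subgraph problem.

    adj: dict mapping each vertex to the set of its neighbours
    (undirected). Returns a vertex set whose density |E(S)|/|S| is
    within a factor 2 of optimal, per Charikar's analysis.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    num_edges = sum(degree.values()) // 2

    remaining = set(adj)
    removal_order = []
    best_density, best_size = 0.0, len(remaining)

    while remaining:
        density = num_edges / len(remaining)  # |E(S)| / |S|
        if density > best_density:
            best_density, best_size = density, len(remaining)
        # peel a vertex of minimum degree (a bucket queue makes this O(1))
        v = min(remaining, key=degree.__getitem__)
        remaining.discard(v)
        removal_order.append(v)
        num_edges -= degree[v]
        for u in adj[v]:
            adj[u].discard(v)
            degree[u] -= 1

    # the best subgraph is whatever remained at the best iteration
    best = set(adj) - set(removal_order[: len(adj) - best_size])
    return best, best_density
```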
Citations
Journal ArticleDOI
TL;DR: Babelfy is presented, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations.
Abstract: Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.org

811 citations


Cites methods from "Greedy approximation algorithms for..."

  • ...Therefore, we define a heuristic for k-partite graphs inspired by a 2-approximation greedy algorithm for arbitrary graphs (Charikar, 2000; Khuller and Saha, 2009)....


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper studies a query-dependent variant of the community-detection problem, called the community-search problem: given a graph G and a set of query nodes in the graph, find a subgraph of G that contains the query nodes and is densely connected; an optimum greedy algorithm is developed for this measure.
Abstract: A lot of research in graph mining has been devoted to the discovery of communities. Most of the work has focused on the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph G, and a set of query nodes in the graph, we seek to find a subgraph of G that contains the query nodes and is densely connected. We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally we modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.
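A condensed sketch of the minimum-degree peeling idea behind this community-search algorithm (connectivity handling and the exact stopping rule of the paper are simplified here; the function name and the use of networkx are our own choices):

```python
import networkx as nx

def community_search(G, query_nodes):
    """Peel minimum-degree vertices, never removing a query node, and
    return the intermediate subgraph containing all query nodes whose
    minimum degree is largest."""
    H = G.copy()
    best, best_min_deg = None, -1
    while H and nx.is_connected(H) and set(query_nodes) <= set(H):
        min_deg = min(d for _, d in H.degree())
        if min_deg > best_min_deg:
            best, best_min_deg = set(H), min_deg
        # candidates: minimum-degree vertices that are not query nodes
        candidates = [v for v, d in H.degree()
                      if d == min_deg and v not in query_nodes]
        if not candidates:
            break  # only query nodes attain the minimum degree; stop
        H.remove_node(candidates[0])
    return best, best_min_deg
```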

462 citations


Cites background or methods from "Greedy approximation algorithms for..."

  • ...Charikar [7] showed that the greedy algorithm that we consider in this paper can be used to find a factor-2 approximation....

  • ...As observed by Charikar [7], Greedy can be implemented in linear time....

  • ...[4] and later analyzed by Charikar [7], who showed that it achieves a factor 2 approximation guarantee for the densest-subgraph problem....


Journal ArticleDOI
TL;DR: A novel algorithmic framework is described that can accurately identify functional modules and promises to be highly useful in the analysis of high-throughput data.
Abstract: With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in the analysis of high-throughput data.

321 citations


Cites methods from "Greedy approximation algorithms for..."

  • ...Heaviest-subnet: This method is inspired by Charikar's 2-approximation algorithm for the densest subgraph problem [46]....
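The heaviest-subnet idea adapts the peeling algorithm to edge-weighted similarity graphs. A minimal sketch of such a weighted variant (our illustration of the general idea, not the paper's exact procedure; the names are hypothetical):

```python
def heaviest_subgraph_greedy(weights):
    """weights: dict mapping frozenset({u, v}) -> edge weight.
    Peels the vertex of minimum weighted degree and returns the
    intermediate vertex set maximizing total weight per vertex."""
    nbrs = {}
    for e, w in weights.items():
        u, v = tuple(e)
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    wdeg = {v: sum(ws.values()) for v, ws in nbrs.items()}
    total = sum(weights.values())

    remaining, removed = set(nbrs), []
    best_density, best_size = 0.0, len(remaining)
    while remaining:
        density = total / len(remaining)
        if density > best_density:
            best_density, best_size = density, len(remaining)
        v = min(remaining, key=wdeg.__getitem__)  # lightest vertex
        remaining.discard(v)
        removed.append(v)
        total -= wdeg[v]
        for u, w in nbrs[v].items():
            if u in remaining:
                wdeg[u] -= w
    return set(nbrs) - set(removed[: len(nbrs) - best_size]), best_density
```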

Journal ArticleDOI
TL;DR: This paper introduces the technique of rounding mathematical programs to the problem of modularity maximization, presenting two novel algorithms that perform comparably or better than past algorithms, while being more efficient than exhaustive techniques.
Abstract: In many networks, it is of great interest to identify communities, unusually densely knit groups of individuals. Such communities often shed light on the function of the networks or underlying properties of the individuals. Recently, Newman suggested modularity as a natural measure of the quality of a network partitioning into communities. Since then, various algorithms have been proposed for (approximately) maximizing the modularity of the partitioning determined. In this paper, we introduce the technique of rounding mathematical programs to the problem of modularity maximization, presenting two novel algorithms. More specifically, the algorithms round solutions to linear and vector programs. Importantly, the linear programming algorithm comes with an a posteriori approximation guarantee: by comparing the solution quality to the fractional solution of the linear program, a bound on the available “room for improvement” can be obtained. The vector programming algorithm provides a similar bound for the best partition into two communities. We evaluate both algorithms using experiments on several standard test cases for network partitioning algorithms, and find that they perform comparably or better than past algorithms, while being more efficient than exhaustive techniques.
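For reference, the modularity being maximized is Newman's standard measure; for a graph with m edges, adjacency matrix A, degrees k_i, and community assignments c_i (this formulation is standard background, not restated in the abstract):

```latex
Q \;=\; \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```

Modularity rewards partitions whose intra-community edge count exceeds what a random graph with the same degree sequence would give.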

303 citations


Cites background from "Greedy approximation algorithms for..."

  • ...Often, the communities identified will correspond to some notion of “dense subgraphs” [4, 13, 14, 23]....


Proceedings ArticleDOI
11 Aug 2013
TL;DR: This paper defines a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by the method are compact, dense, and with smaller diameter.
Abstract: Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
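The density function proposed in this citing paper is, as we understand its framework, an "edge surplus" that penalizes size: for a parameter α,

```latex
f_\alpha(S) \;=\; e[S] \;-\; \alpha \binom{|S|}{2}
```

where e[S] is the number of edges induced by the vertex set S. Maximizing f_α favors subgraphs that are close to cliques ("optimal quasi-cliques") rather than merely large, which is how it avoids the large, sparse solutions of the average-degree objective.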

290 citations


Cites background or methods from "Greedy approximation algorithms for..."

  • ...Charikar [10] shows that the greedy algorithm proposed by Asahiro et al....

  • ...For finding densest subgraphs, we use Goldberg's exact algorithm [19] for small graphs, while for graphs whose size does not allow Goldberg's algorithm to terminate in reasonable time we use Charikar's 1/2-approximation algorithm [10]....

  • ...[6], which has been shown to provide a 1/2-approximation for the densest subgraph problem [10]....

References

Proceedings ArticleDOI
Jon Kleinberg
01 Jan 1998
TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.
Abstract: The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
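A minimal sketch of the hub/authority iteration this abstract describes (the fixed point corresponds to principal eigenvectors of A^T A and A A^T for the link matrix A; the function name, input format, and iteration count are illustrative choices):

```python
def hits(links, num_iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {q for out in links.values() for q in out}
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)
    for _ in range(num_iters):
        # authority score: total hub score of the pages linking in
        auth = dict.fromkeys(pages, 0.0)
        for p, out in links.items():
            for q in out:
                auth[q] += hub[p]
        # hub score: total authority score of the pages linked to
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # normalize both vectors so scores stay bounded
        for vec in (auth, hub):
            norm = sum(x * x for x in vec.values()) ** 0.5 or 1.0
            for p in vec:
                vec[p] /= norm
    return hub, auth
```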

1,440 citations

Journal ArticleDOI
17 May 1999
TL;DR: The subject of this paper is the systematic enumeration of over 100,000 emerging communities from a Web crawl, motivating a graph-theoretic approach to locating such communities, and describing the algorithms and algorithmic engineering necessary to find structures that subscribe to this notion.
Abstract: The Web harbors a large number of communities — groups of content-creators sharing a common interest — each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain of the order of 20,000 such communities; our particular interest here is on emerging communities — those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
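The graph-theoretic signature this trawling work looks for is, to our understanding, the complete bipartite (i, j)-core: i "fan" pages that all link to the same j "center" pages. A brute-force sketch of that notion (the paper's actual process relies on aggressive pruning to scale to a full crawl; this illustration does not):

```python
from itertools import combinations

def bipartite_cores(links, i, j):
    """Yield (fans, centers) pairs forming a complete bipartite K_{i,j}:
    each of the i fan pages links to all of the j center pages.
    links: dict mapping page -> list of pages it links to."""
    for fans in combinations(sorted(links), i):
        common = set.intersection(*(set(links[f]) for f in fans))
        if len(common) >= j:
            yield fans, common
```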

1,126 citations

Book ChapterDOI
26 Jul 1999
TL;DR: This paper describes two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery, and proposes a new family of random graph models that point to a rich new sub-field of the study of random graphs, and raises questions about the analysis of graph algorithms on the Internet.
Abstract: The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons -- mathematical, sociological, and commercial -- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.

1,116 citations


"Greedy approximation algorithms for..." refers background in this paper

  • ...Recently, the problem of finding relatively highly connected sub-structures in the web graph has received a lot of attention [8,10,11,12]....


Proceedings ArticleDOI
01 May 1998
TL;DR: This investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a “local” level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
Abstract: The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a notion of hyperlinked communities on the WWW through an analysis of the link topology. By invoking a simple, mathematically clean method for defining and exposing the structure of these communities, we are able to derive a number of themes: the communities can be viewed as containing a core of central, "authoritative" pages linked together by "hub" pages, and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a "local" level, it results in a much greater degree of orderly high-level structure than has typically been assumed.

905 citations


"Greedy approximation algorithms for..." refers background in this paper

  • ...Recently, the problem of finding relatively highly connected sub-structures in the web graph has received a lot of attention [8,10,11,12]....


Trending Questions (1)
How is the density of a directed or undirected graph calculated?

For an undirected graph, the paper measures the density of a vertex subset S as |E(S)|/|S|, i.e. half the average degree of the subgraph induced by S, and defines the density of the graph as the maximum of this quantity over all subsets S. For a directed graph, it uses the Kannan-Vinay measure: for vertex subsets S and T, the density is the number of edges from S to T divided by the geometric mean of |S| and |T|, again maximized over all choices of S and T.
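In symbols (reconstructed from the paper's definitions):

```latex
f(S) = \frac{|E(S)|}{|S|}, \qquad
d(S,T) = \frac{|E(S,T)|}{\sqrt{|S|\,|T|}}
```

where E(S) is the edge set induced by S in the undirected case, E(S,T) is the set of directed edges from S to T, and the density of the graph is the maximum over all subsets S (respectively, over all pairs S, T).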