Book ChapterDOI

Greedy approximation algorithms for finding dense components in a graph

05 Sep 2000-Lecture Notes in Computer Science (Springer, Berlin, Heidelberg)-pp 84-95
TL;DR: This paper gives simple greedy approximation algorithms for the problems of finding subgraphs that maximize natural notions of density in undirected and directed graphs, and answers an open question about the complexity of the optimization problem for directed graphs.
Abstract: We study the problem of finding highly connected subgraphs of undirected and directed graphs. For undirected graphs, the notion of density of a subgraph we use is the average degree of the subgraph. For directed graphs, a corresponding notion of density was introduced recently by Kannan and Vinay. It is designed to quantify how highly connected the substructures of a sparse directed graph, such as the web graph, are. We study the optimization problems of finding subgraphs maximizing these notions of density for undirected and directed graphs. This paper gives simple greedy approximation algorithms for these optimization problems. We also answer an open question about the complexity of the optimization problem for directed graphs.
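The greedy algorithm analyzed here for the undirected case is, as the citing papers below describe it, a peeling procedure: repeatedly delete a minimum-degree vertex and return the best intermediate subgraph. A minimal sketch (the input format and variable names are our own illustration; a bucket-by-degree structure would give the linear-time implementation the citing papers mention):

```python
def densest_subgraph_greedy(adj):
    """Greedy peeling for the densest-subgraph problem.

    adj: dict mapping each vertex to the set of its neighbours
    (undirected). Returns a vertex set whose density |E(S)|/|S| is
    within a factor 2 of optimal, per Charikar's analysis.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    num_edges = sum(degree.values()) // 2

    remaining = set(adj)
    removal_order = []
    best_density, best_size = 0.0, len(remaining)

    while remaining:
        density = num_edges / len(remaining)  # |E(S)| / |S|
        if density > best_density:
            best_density, best_size = density, len(remaining)
        # peel a vertex of minimum degree (a bucket queue makes this O(1))
        v = min(remaining, key=degree.__getitem__)
        remaining.discard(v)
        removal_order.append(v)
        num_edges -= degree[v]
        for u in adj[v]:
            adj[u].discard(v)
            degree[u] -= 1

    # the best subgraph is whatever remained at the best iteration
    best = set(adj) - set(removal_order[: len(adj) - best_size])
    return best, best_density
```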
Citations
Journal ArticleDOI
TL;DR: Babelfy is presented, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations.
Abstract: Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.org

811 citations


Cites methods from "Greedy approximation algorithms for..."

  • ...Therefore, we define a heuristic for k-partite graphs inspired by a 2-approximation greedy algorithm for arbitrary graphs (Charikar, 2000; Khuller and Saha, 2009)....


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper studies a query-dependent variant of the community-detection problem, called the community-search problem: given a graph G and a set of query nodes in the graph, find a subgraph of G that contains the query nodes and is densely connected; an optimum greedy algorithm is developed for this measure.
Abstract: A lot of research in graph mining has been devoted to the discovery of communities. Most of the work has focused on the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph G, and a set of query nodes in the graph, we seek to find a subgraph of G that contains the query nodes and is densely connected. We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally we modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.
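A condensed sketch of the minimum-degree peeling idea behind this community-search algorithm (connectivity handling and the exact stopping rule of the paper are simplified here; the function name and the use of networkx are our own choices):

```python
import networkx as nx

def community_search(G, query_nodes):
    """Peel minimum-degree vertices, never removing a query node, and
    return the intermediate subgraph containing all query nodes whose
    minimum degree is largest."""
    H = G.copy()
    best, best_min_deg = None, -1
    while H and nx.is_connected(H) and set(query_nodes) <= set(H):
        min_deg = min(d for _, d in H.degree())
        if min_deg > best_min_deg:
            best, best_min_deg = set(H), min_deg
        # candidates: minimum-degree vertices that are not query nodes
        candidates = [v for v, d in H.degree()
                      if d == min_deg and v not in query_nodes]
        if not candidates:
            break  # only query nodes attain the minimum degree; stop
        H.remove_node(candidates[0])
    return best, best_min_deg
```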

462 citations


Cites background or methods from "Greedy approximation algorithms for..."

  • ...Charikar [7] showed that the greedy algorithm that we consider in this paper can be used to find a factor-2 approximation....

  • ...As observed by Charikar [7], Greedy can be implemented in linear time....

  • ...[4] and later analyzed by Charikar [7], who showed that it achieves a factor 2 approximation guarantee for the densest-subgraph problem....


Journal ArticleDOI
TL;DR: A novel algorithmic framework is described that can accurately identify functional modules and promises to be highly useful in the analysis of high-throughput data.
Abstract: With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in the analysis of high-throughput data.

321 citations


Cites methods from "Greedy approximation algorithms for..."

  • ...Heaviest-subnet: This method is inspired by Charikar's 2-approximation algorithm for the densest subgraph problem [46]....
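The heaviest-subnet idea adapts the peeling algorithm to edge-weighted similarity graphs. A minimal sketch of such a weighted variant (our illustration of the general idea, not the paper's exact procedure; the names are hypothetical):

```python
def heaviest_subgraph_greedy(weights):
    """weights: dict mapping frozenset({u, v}) -> edge weight.
    Peels the vertex of minimum weighted degree and returns the
    intermediate vertex set maximizing total weight per vertex."""
    nbrs = {}
    for e, w in weights.items():
        u, v = tuple(e)
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    wdeg = {v: sum(ws.values()) for v, ws in nbrs.items()}
    total = sum(weights.values())

    remaining, removed = set(nbrs), []
    best_density, best_size = 0.0, len(remaining)
    while remaining:
        density = total / len(remaining)
        if density > best_density:
            best_density, best_size = density, len(remaining)
        v = min(remaining, key=wdeg.__getitem__)  # lightest vertex
        remaining.discard(v)
        removed.append(v)
        total -= wdeg[v]
        for u, w in nbrs[v].items():
            if u in remaining:
                wdeg[u] -= w
    return set(nbrs) - set(removed[: len(nbrs) - best_size]), best_density
```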

Journal ArticleDOI
TL;DR: This paper introduces the technique of rounding mathematical programs to the problem of modularity maximization, presenting two novel algorithms that perform comparably or better than past algorithms, while being more efficient than exhaustive techniques.
Abstract: In many networks, it is of great interest to identify communities, unusually densely knit groups of individuals. Such communities often shed light on the function of the networks or underlying properties of the individuals. Recently, Newman suggested modularity as a natural measure of the quality of a network partitioning into communities. Since then, various algorithms have been proposed for (approximately) maximizing the modularity of the partitioning determined. In this paper, we introduce the technique of rounding mathematical programs to the problem of modularity maximization, presenting two novel algorithms. More specifically, the algorithms round solutions to linear and vector programs. Importantly, the linear programming algorithm comes with an a posteriori approximation guarantee: by comparing the solution quality to the fractional solution of the linear program, a bound on the available “room for improvement” can be obtained. The vector programming algorithm provides a similar bound for the best partition into two communities. We evaluate both algorithms using experiments on several standard test cases for network partitioning algorithms, and find that they perform comparably or better than past algorithms, while being more efficient than exhaustive techniques.
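For reference, the modularity being maximized is Newman's standard measure; for a graph with m edges, adjacency matrix A, degrees k_i, and community assignments c_i (this formulation is standard background, not restated in the abstract):

```latex
Q \;=\; \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```

Modularity rewards partitions whose intra-community edge count exceeds what a random graph with the same degree sequence would give.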

303 citations


Cites background from "Greedy approximation algorithms for..."

  • ...Often, the communities identified will correspond to some notion of “dense subgraphs” [4, 13, 14, 23]....


Proceedings ArticleDOI
11 Aug 2013
TL;DR: This paper defines a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by the method are compact, dense, and with smaller diameter.
Abstract: Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
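The density function proposed in this citing paper is, as we understand its framework, an "edge surplus" that penalizes size: for a parameter α,

```latex
f_\alpha(S) \;=\; e[S] \;-\; \alpha \binom{|S|}{2}
```

where e[S] is the number of edges induced by the vertex set S. Maximizing f_α favors subgraphs that are close to cliques ("optimal quasi-cliques") rather than merely large, which is how it avoids the large, sparse solutions of the average-degree objective.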

290 citations


Cites background or methods from "Greedy approximation algorithms for..."

  • ...Charikar [10] shows that the greedy algorithm proposed by Asahiro et al....

  • ...For finding densest subgraphs, we use Goldberg's exact algorithm [19] for small graphs, while for graphs whose size does not allow Goldberg's algorithm to terminate in reasonable time we use Charikar's 1/2-approximation algorithm [10]....

  • ...[6], which has been shown to provide a 1/2-approximation for the densest subgraph problem [10]....

References

Proceedings ArticleDOI
Jon Kleinberg
01 Jan 1998
TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.
Abstract: The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
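A minimal sketch of the hub/authority iteration this abstract describes (the fixed point corresponds to principal eigenvectors of A^T A and A A^T for the link matrix A; the function name, input format, and iteration count are illustrative choices):

```python
def hits(links, num_iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {q for out in links.values() for q in out}
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)
    for _ in range(num_iters):
        # authority score: total hub score of the pages linking in
        auth = dict.fromkeys(pages, 0.0)
        for p, out in links.items():
            for q in out:
                auth[q] += hub[p]
        # hub score: total authority score of the pages linked to
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # normalize both vectors so scores stay bounded
        for vec in (auth, hub):
            norm = sum(x * x for x in vec.values()) ** 0.5 or 1.0
            for p in vec:
                vec[p] /= norm
    return hub, auth
```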

1,440 citations

Journal ArticleDOI
17 May 1999
TL;DR: The subject of this paper is the systematic enumeration of over 100,000 emerging communities from a Web crawl, motivating a graph-theoretic approach to locating such communities, and describing the algorithms and algorithmic engineering necessary to find structures that subscribe to this notion.
Abstract: The Web harbors a large number of communities — groups of content-creators sharing a common interest — each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain of the order of 20,000 such communities; our particular interest here is on emerging communities — those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
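The graph-theoretic signature this trawling work looks for is, to our understanding, the complete bipartite (i, j)-core: i "fan" pages that all link to the same j "center" pages. A brute-force sketch of that notion (the paper's actual process relies on aggressive pruning to scale to a full crawl; this illustration does not):

```python
from itertools import combinations

def bipartite_cores(links, i, j):
    """Yield (fans, centers) pairs forming a complete bipartite K_{i,j}:
    each of the i fan pages links to all of the j center pages.
    links: dict mapping page -> list of pages it links to."""
    for fans in combinations(sorted(links), i):
        common = set.intersection(*(set(links[f]) for f in fans))
        if len(common) >= j:
            yield fans, common
```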

1,126 citations

Book ChapterDOI
26 Jul 1999
TL;DR: This paper describes two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery, and proposes a new family of random graph models that point to a rich new sub-field of the study of random graphs, and raises questions about the analysis of graph algorithms on the Internet.
Abstract: The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons -- mathematical, sociological, and commercial -- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.

1,116 citations


"Greedy approximation algorithms for..." refers background in this paper

  • ...Recently, the problem of finding relatively highly connected sub-structures in the web graph has received a lot of attention [8,10,11,12]....


Proceedings ArticleDOI
01 May 1998
TL;DR: This investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a “local” level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
Abstract: The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a notion of hyperlinked communities on the WWW through an analysis of the link topology. By invoking a simple, mathematically clean method for defining and exposing the structure of these communities, we are able to derive a number of themes: the communities can be viewed as containing a core of central, "authoritative" pages linked together by "hub" pages, and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a "local" level, it results in a much greater degree of orderly high-level structure than has typically been assumed.

905 citations


"Greedy approximation algorithms for..." refers background in this paper

  • ...Recently, the problem of finding relatively highly connected sub-structures in the web graph has received a lot of attention [8,10,11,12]....


Trending Questions (1)
How is the density of a directed or undirected graph calculated?

For an undirected graph, the paper measures the density of a vertex subset S as |E(S)|/|S|, i.e. half the average degree of the subgraph induced by S, and defines the density of the graph as the maximum of this quantity over all subsets S. For a directed graph, it uses the Kannan-Vinay measure: for vertex subsets S and T, the density is the number of edges from S to T divided by the geometric mean of |S| and |T|, again maximized over all choices of S and T.
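In symbols (reconstructed from the paper's definitions):

```latex
f(S) = \frac{|E(S)|}{|S|}, \qquad
d(S,T) = \frac{|E(S,T)|}{\sqrt{|S|\,|T|}}
```

where E(S) is the edge set induced by S in the undirected case, E(S,T) is the set of directed edges from S to T, and the density of the graph is the maximum over all subsets S (respectively, over all pairs S, T).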