Showing papers in "arXiv: Data Structures and Algorithms in 2008"

Posted Content•

Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters

Jure Leskovec¹, Kevin J. Lang, Anirban Dasgupta, Michael W. Mahoney¹•Institutions (1)

08 Oct 2008-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities.

Abstract: A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. We employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the "best" possible community--according to the conductance measure--over a wide range of size scales. We study over 100 large real-world social and information networks. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales; and communities of larger size scales gradually "blend into" the expander-like core of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as testbeds of community detection algorithms. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network datasets.
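
The conductance measure mentioned above can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets, and uses the common definition: edges leaving the set divided by the smaller of the two volumes (sums of degrees). The function name and the toy graph are hypothetical, and this only scores a given set; it does not search for the best set at each size scale as the paper's profile plot does.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph {node: set(neighbors)}."""
    inside = set(subset)
    # Edges with exactly one endpoint inside the subset.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    # Volume = sum of degrees on each side of the cut.
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Toy graph (illustrative only): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

tight = conductance(adj, {0, 1, 2})   # one triangle: low conductance
single = conductance(adj, {0})        # a lone vertex: every edge leaves the set
```

Lower values indicate a more "community-like" set: here one triangle scores 1/7, while a single vertex scores 1.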

1,555 citations

Posted Content•

Spectral Sparsification of Graphs

Daniel A. Spielman¹, Shang-Hua Teng•Institutions (1)

Yale University¹

29 Aug 2008-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors introduce a new notion of graph sparsification based on spectral similarity of graph Laplacians, and prove that every graph has a spectral sparsifier of nearly linear size.

Abstract: We introduce a new notion of graph sparsification based on spectral similarity of graph Laplacians: spectral sparsification requires that the Laplacian quadratic form of the sparsifier approximate that of the original. This is equivalent to saying that the Laplacian of the sparsifier is a good preconditioner for the Laplacian of the original. We prove that every graph has a spectral sparsifier of nearly linear size. Moreover, we present an algorithm that produces spectral sparsifiers in time $\tilde{O}(m)$, where $m$ is the number of edges in the original graph. This construction is a key component of a nearly-linear time algorithm for solving linear equations in diagonally-dominant matrices. Our sparsification algorithm makes use of a nearly-linear time algorithm for graph partitioning that satisfies a strong guarantee: if the partition it outputs is very unbalanced, then the larger part is contained in a subgraph of high conductance.
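
To make the "Laplacian quadratic form" concrete, here is a minimal sketch that evaluates $x^T L x = \sum_{(u,v)} w_{uv}(x_u - x_v)^2$ directly from a weighted edge list. The function name, the toy graphs, and the test vector are hypothetical. Spot-checking one vector only illustrates the quantity being compared; a spectral sparsifier must keep the two forms close for every vector.

```python
def laplacian_quadratic_form(weighted_edges, x):
    """Evaluate x^T L x for an undirected graph given as (u, v, weight) triples."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in weighted_edges)


# Original graph: a unit-weight triangle on vertices 0, 1, 2.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
# Hypothetical candidate with fewer edges: drop (0, 2) and reweight the rest.
path = [(0, 1, 1.5), (1, 2, 1.5)]

x = [1.0, 0.0, -1.0]
q_original = laplacian_quadratic_form(triangle, x)   # 1 + 1 + 4
q_candidate = laplacian_quadratic_form(path, x)      # 1.5 + 1.5
```

For this particular vector the candidate captures only half of the original form (3.0 versus 6.0), so it would be a poor approximation in this direction.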

330 citations

Posted Content•

Multi-Armed Bandits in Metric Spaces

Robert Kleinberg¹, Aleksandrs Slivkins², Eli Upfal³•Institutions (3)

Cornell University¹, Microsoft², Brown University³

29 Sep 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, the authors studied a general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.

Abstract: In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions which enable the design of efficient solutions. In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the "Lipschitz MAB problem". We present a complete solution for the multi-armed problem in this setting. That is, for every metric space (L,X) we define an isometry invariant which bounds from below the performance of Lipschitz MAB algorithms for X, and we present an algorithm which comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.

329 citations

Posted Content•

An Improved Approximation Algorithm for the Column Subset Selection Problem

Christos Boutsidis¹, Michael W. Mahoney², Petros Drineas¹•Institutions (2)

Rensselaer Polytechnic Institute¹, Stanford University²

22 Dec 2008-arXiv: Data Structures and Algorithms

TL;DR: A novel two-stage algorithm that runs in O(min{mn^2, m^2n}) time and returns as output an m x k matrix C consisting of exactly k columns of A. Its Frobenius norm bound is roughly O(√(k!)) better than the best previous algorithmic result, while its spectral norm bound is not directly comparable to the best previously existing bounds.

Abstract: We consider the problem of selecting the best subset of exactly $k$ columns from an $m \times n$ matrix $A$. We present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2,m^2n\})$ time and returns as output an $m \times k$ matrix $C$ consisting of exactly $k$ columns of $A$. In the first (randomized) stage, the algorithm randomly selects $\Theta(k \log k)$ columns according to a judiciously-chosen probability distribution that depends on information in the top-$k$ right singular subspace of $A$. In the second (deterministic) stage, the algorithm applies a deterministic column-selection procedure to select and return exactly $k$ columns from the set of columns selected in the first stage. Let $C$ be the $m \times k$ matrix containing those $k$ columns, let $P_C$ denote the projection matrix onto the span of those columns, and let $A_k$ denote the best rank-$k$ approximation to the matrix $A$. Then, we prove that, with probability at least 0.8, $$ \|A - P_C A\|_F \leq \Theta(k \log^{1/2} k) \|A-A_k\|_F. $$ This Frobenius norm bound is only a factor of $\sqrt{k \log k}$ worse than the best previously existing existential result and is roughly $O(\sqrt{k!})$ better than the best previous algorithmic result for the Frobenius norm version of this Column Subset Selection Problem (CSSP). We also prove that, with probability at least 0.8, $$ \|A - P_C A\|_2 \leq \Theta(k \log^{1/2} k)\|A-A_k\|_2 + \Theta(k^{3/4}\log^{1/4}k)\|A-A_k\|_F. $$ This spectral norm bound is not directly comparable to the best previously existing bounds for the spectral norm version of this CSSP. Our bound depends on $\|A-A_k\|_F$, whereas previous results depend on $\sqrt{n-k}\|A-A_k\|_2$; if these two quantities are comparable, then our bound is asymptotically worse by a $(k \log k)^{1/4}$ factor.

293 citations

Posted Content•

A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning

Daniel A. Spielman, Shang-Hua Teng

18 Sep 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, a local clustering algorithm is proposed to find a good subset of vertices whose internal connections are significantly richer than its external connections near a given vertex, and the running time of this algorithm is nearly linear in the size of the cluster it outputs.

Abstract: We study the design of local algorithms for massive graphs. A local algorithm is one that finds a solution containing or near a given vertex without looking at the whole graph. We present a local clustering algorithm. Our algorithm finds a good cluster--a subset of vertices whose internal connections are significantly richer than its external connections--near a given vertex. The running time of our algorithm, when it finds a non-empty local cluster, is nearly linear in the size of the cluster it outputs. Our clustering algorithm could be a useful primitive for handling massive graphs, such as social networks and web-graphs. As an application of this clustering algorithm, we present a partitioning algorithm that finds an approximate sparsest cut with nearly optimal balance. Our algorithm takes time nearly linear in the number of edges of the graph. Using the partitioning algorithm of this paper, we have designed a nearly-linear time algorithm for constructing spectral sparsifiers of graphs, which we in turn use in a nearly-linear time algorithm for solving linear systems in symmetric, diagonally-dominant matrices. The linear system solver also leads to a nearly linear-time algorithm for approximating the second-smallest eigenvalue and corresponding eigenvector of the Laplacian matrix of a graph. These other results are presented in two companion papers.

261 citations

Posted Content•

Submodular approximation: sampling-based algorithms and lower bounds

Zoya Svitkina¹, Lisa Fleischer²•Institutions (2)

Cornell University¹, Dartmouth College²

07 May 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, the authors introduced several generalizations of classical computer science problems obtained by replacing simpler objective functions with general submodular functions, and established upper and lower bounds for the approximability of these problems with a polynomial number of queries to a function-value oracle.

Abstract: We introduce several generalizations of classical computer science problems obtained by replacing simpler objective functions with general submodular functions. The new problems include submodular load balancing, which generalizes load balancing or minimum-makespan scheduling, submodular sparsest cut and submodular balanced cut, which generalize their respective graph cut problems, as well as submodular function minimization with a cardinality lower bound. We establish upper and lower bounds for the approximability of these problems with a polynomial number of queries to a function-value oracle. The approximation guarantees for most of our algorithms are of the order of sqrt(n/ln n). We show that this is the inherent difficulty of the problems by proving matching lower bounds. We also give an improved lower bound for the problem of approximately learning a monotone submodular function. In addition, we present an algorithm for approximately learning submodular functions with special structure, whose guarantee is close to the lower bound. Although quite restrictive, the class of functions with this structure includes the ones that are used for lower bounds both by us and in previous work. This demonstrates that if there are significantly stronger lower bounds for this problem, they rely on more general submodular functions.

154 citations

Posted Content•

A constructive proof of the Lovasz Local Lemma

Robin A. Moser¹•Institutions (1)

ETH Zurich¹

27 Oct 2008-arXiv: Data Structures and Algorithms

TL;DR: This paper gives a randomized algorithm that finds a satisfying assignment to every k-CNF formula in which each clause has a neighbourhood of at most the asymptotic optimum of 2^(k-5)-1 other clauses and that runs in expected time polynomial in the size of the formula, irrespective of k.

Abstract: The Lovasz Local Lemma [EL75] is a powerful tool to prove the existence of combinatorial objects meeting a prescribed collection of criteria. The technique can directly be applied to the satisfiability problem, yielding that a k-CNF formula in which each clause has common variables with at most 2^(k-2) other clauses is always satisfiable. All hitherto known proofs of the Local Lemma are non-constructive and do thus not provide a recipe as to how a satisfying assignment to such a formula can be efficiently found. In his breakthrough paper [Bec91], Beck demonstrated that if the neighbourhood of each clause be restricted to O(2^(k/48)), a polynomial time algorithm for the search problem exists. Alon simplified and randomized his procedure and improved the bound to O(2^(k/8)) [Alo91]. Srinivasan presented in [Sri08] a variant that achieves a bound of essentially O(2^(k/4)). In [Mos08], we improved this to O(2^(k/2)). In the present paper, we give a randomized algorithm that finds a satisfying assignment to every k-CNF formula in which each clause has a neighbourhood of at most the asymptotic optimum of 2^(k-5)-1 other clauses and that runs in expected time polynomial in the size of the formula, irrespective of k. If k is considered a constant, we can also give a deterministic variant. In contrast to all previous approaches, our analysis does not anymore invoke the standard non-constructive versions of the Local Lemma and can therefore be considered an alternative, constructive proof of it.
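
The flavour of a randomized search for a satisfying assignment can be sketched as a simple resampling loop: start from a random assignment and, while some clause is violated, redraw the variables of one violated clause. This is a simplified sequential variant written for illustration, not the paper's exact procedure or its analysis. The clause encoding (signed integers), the function names, the toy formula, and the `max_steps` cutoff are all assumptions.

```python
import random


def violated(clause, assignment):
    """A clause (list of signed ints) is violated when every literal is false."""
    return all(assignment[abs(lit)] != (lit > 0) for lit in clause)


def resample_until_satisfied(clauses, num_vars, rng, max_steps=10_000):
    """Redraw the variables of a violated clause until none is violated.

    Returns a satisfying assignment {var: bool}, or None if the step
    budget (a safety cutoff, not part of the method) runs out.
    """
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    for _ in range(max_steps):
        bad = next((c for c in clauses if violated(c, assignment)), None)
        if bad is None:
            return assignment
        for lit in bad:
            # Fresh, independent coin flip for each variable of the clause.
            assignment[abs(lit)] = rng.random() < 0.5
    return None


# Toy 3-CNF formula (illustrative): literal 2 means x2, literal -2 means NOT x2.
clauses = [[1, 2, 3], [-1, 2, 4], [-2, -3, 4]]
result = resample_until_satisfied(clauses, 4, random.Random(0))
```

The neighbourhood condition in the abstract is what bounds the expected number of resampling steps; the sketch above does not check that condition.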

147 citations

Posted Content•

Algorithms for Secretary Problems on Graphs and Hypergraphs

Nitish Korula¹, Martin Pál²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Google²

07 Jul 2008-arXiv: Data Structures and Algorithms

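TL;DR: This paper develops an 8-competitive algorithm for a secretary problem on edge-weighted bipartite graphs, improving on the 16-competitive algorithm of Dimitrov and Plaxton for the transversal matroid special case, and gives a 2e-competitive algorithm for the secretary problem on graphic matroids, where, with edges appearing online, the goal is to find a maximum-weight acyclic subgraph of a given graph.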

Abstract: We examine several online matching problems, with applications to Internet advertising reservation systems. Consider an edge-weighted bipartite graph G, with partite sets L, R. We develop an 8-competitive algorithm for the following secretary problem: Initially given R, and the size of L, the algorithm receives the vertices of L sequentially, in a random order. When a vertex l \in L is seen, all edges incident to l are revealed, together with their weights. The algorithm must immediately either match l to an available vertex of R, or decide that l will remain unmatched. Dimitrov and Plaxton show a 16-competitive algorithm for the transversal matroid secretary problem, which is the special case with weights on vertices, not edges. (Equivalently, one may assume that for each l \in L, the weights on all edges incident to l are identical.) We use a similar algorithm, but simplify and improve the analysis to obtain a better competitive ratio for the more general problem. Perhaps of more interest is the fact that our analysis is easily extended to obtain competitive algorithms for similar problems, such as to find disjoint sets of edges in hypergraphs where edges arrive online. We also introduce secretary problems with adversarially chosen groups. Finally, we give a 2e-competitive algorithm for the secretary problem on graphic matroids, where, with edges appearing online, the goal is to find a maximum-weight acyclic subgraph of a given graph.

116 citations

Posted Content•

Almost 2-SAT is Fixed-Parameter Tractable

Igor Razgon¹, Barry O'Sullivan¹•Institutions (1)

University College Cork¹

08 Jan 2008-arXiv: Data Structures and Algorithms

TL;DR: The fixed-parameter tractability of the 2-CNF deletion problem was shown in this article, where the authors proposed an algorithm that solves this problem in O(15^k*k*m^3) time.

Abstract: We consider the following problem. Given a 2-CNF formula, is it possible to remove at most $k$ clauses so that the resulting 2-CNF formula is satisfiable? This problem is known to different research communities in Theoretical Computer Science under the names 'Almost 2-SAT', 'All-but-$k$ 2-SAT', '2-CNF deletion', '2-SAT deletion'. The status of fixed-parameter tractability of this problem is a long-standing open question in the area of Parameterized Complexity. We resolve this open question by proposing an algorithm which solves this problem in $O(15^k*k*m^3)$ and thus we show that this problem is fixed-parameter tractable.
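
The problem statement can be pinned down with a brute-force checker. Deleting at most $k$ clauses makes the formula satisfiable exactly when some assignment violates at most $k$ clauses, so it suffices to try every assignment. This sketch is exponential in the number of variables and is not the paper's fixed-parameter algorithm; the function name and the signed-integer clause encoding are assumptions made for illustration.

```python
from itertools import product


def almost_2sat_bruteforce(clauses, num_vars, k):
    """True iff removing at most k clauses leaves a satisfiable 2-CNF formula.

    Clauses are pairs of signed ints: 3 means x3, -3 means NOT x3.
    """
    for bits in product([False, True], repeat=num_vars):
        # Count the clauses this assignment fails to satisfy.
        unsatisfied = sum(
            1
            for clause in clauses
            if not any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
        )
        if unsatisfied <= k:
            return True
    return False


# All four clauses over x1, x2: every assignment violates exactly one of them.
all_pairs = [(1, 2), (1, -2), (-1, 2), (-1, -2)]
needs_deletion = almost_2sat_bruteforce(all_pairs, 2, 0)   # unsatisfiable as is
one_is_enough = almost_2sat_bruteforce(all_pairs, 2, 1)    # delete any one clause
```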

113 citations

Posted Content•

Faster Approximate Lossy Generalized Flow via Interior Point Algorithms

Samuel I. Daitch¹, Daniel A. Spielman¹•Institutions (1)

Yale University¹

06 Mar 2008-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors presented a faster algorithm for generalized network flow problems, in which the flow out of an edge differs from the flow into the edge by a constant factor.

Abstract: We present faster approximation algorithms for generalized network flow problems. A generalized flow is one in which the flow out of an edge differs from the flow into the edge by a constant factor. We limit ourselves to the lossy case, when these factors are at most 1. Our algorithm uses a standard interior-point algorithm to solve a linear program formulation of the network flow problem. The system of linear equations that arises at each step of the interior-point algorithm takes the form of a symmetric M-matrix. We present an algorithm for solving such systems in nearly linear time. The algorithm relies on the Spielman-Teng nearly linear time algorithm for solving linear systems in diagonally-dominant matrices. For a graph with $m$ edges, our algorithm obtains an additive $\epsilon$ approximation of the maximum generalized flow and minimum cost generalized flow in time $\tilde{O}(m^{3/2} \log(1/\epsilon))$. In many parameter ranges, this improves over previous algorithms by a factor of approximately $m^{1/2}$. We also obtain a similar improvement for exactly solving the standard min-cost flow problem.

112 citations

Posted Content•

Nearly Tight Low Stretch Spanning Trees

Ittai Abraham¹, Yair Bartal¹, Ofer Neiman¹•Institutions (1)

Hebrew University of Jerusalem¹

14 Aug 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, it was shown that any graph with $n$ points has a distribution $\mathcal{T}$ over spanning trees such that for any edge $(u,v)$ the expected stretch $E_{T \sim \mathcal{T}}[d_T(u,v)/d_G(u,v)]$ is bounded by $\tilde{O}(\log n)$.

Abstract: We prove that any graph $G$ with $n$ points has a distribution $\mathcal{T}$ over spanning trees such that for any edge $(u,v)$ the expected stretch $E_{T \sim \mathcal{T}}[d_T(u,v)/d_G(u,v)]$ is bounded by $\tilde{O}(\log n)$. Our result is obtained via a new approach of building ``highways'' between portals and a new strong diameter probabilistic decomposition theorem.

Posted Content•

Simpler Analyses of Local Search Algorithms for Facility Location

Anupam Gupta, Kanat Tangwongsan¹•Institutions (1)

Carnegie Mellon University¹

15 Sep 2008-arXiv: Data Structures and Algorithms

TL;DR: A proof of the $k$-median result which avoids the ``coupling'' argument and can be used in other settings where the Arya et al. arguments have been used.

Abstract: We study local search algorithms for metric instances of facility location problems: the uncapacitated facility location problem (UFL), as well as uncapacitated versions of the $k$-median, $k$-center and $k$-means problems. All these problems admit natural local search heuristics: for example, in the UFL problem the natural moves are to open a new facility, close an existing facility, and to swap a closed facility for an open one; in $k$-medians, we are allowed only swap moves. The local-search algorithm for $k$-median was analyzed by Arya et al. (SIAM J. Comput. 33(3):544-562, 2004), who used a clever ``coupling'' argument to show that local optima have cost at most a constant times the global optimum. They also used this argument to show that the local search algorithm for UFL is a 3-approximation; their techniques have since been applied to other facility location problems. In this paper, we give a proof of the $k$-median result which avoids this coupling argument. These arguments can be used in other settings where the Arya et al. arguments have been used. We also show that for the problem of opening $k$ facilities $F$ to minimize the objective function $\Phi_p(F) = \big(\sum_{j \in V} d(j, F)^p\big)^{1/p}$, the natural swap-based local-search algorithm is a $\Theta(p)$-approximation. This implies constant-factor approximations for $k$-medians (when $p=1$), and $k$-means (when $p = 2$), and an $O(\log n)$-approximation algorithm for the $k$-center problem (which is essentially $p = \log n$).
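
The objective $\Phi_p(F)$ and the swap move can be sketched on a toy instance. This sketch assumes the clients and candidate facilities are the same set of points on a line with distance $|j - f|$, and it accepts any strictly improving swap, ignoring the running-time safeguards a careful implementation would need. The function names and the instance are hypothetical.

```python
def phi_p(points, facilities, p):
    """Phi_p(F) = (sum over clients j of d(j, F)^p)^(1/p), with d(j, f) = |j - f|."""
    return sum(min(abs(j - f) for f in facilities) ** p for j in points) ** (1.0 / p)


def swap_local_search(points, start, p):
    """Swap one open facility for a closed one while that lowers Phi_p."""
    current = list(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for candidate in points:
                if candidate in current:
                    continue
                # Close current[i] and open `candidate` instead.
                trial = current[:i] + [candidate] + current[i + 1:]
                if phi_p(points, trial, p) < phi_p(points, current, p):
                    current, improved = trial, True
    return sorted(current)


points = [0, 1, 10]
cost_before = phi_p(points, [0], 1)                # distances 0, 1, 10
local_optimum = swap_local_search(points, [0], 1)  # k = 1, p = 1
```

With $k = 1$ and $p = 1$, the search moves the single facility from 0 (cost 11) to 1 (cost 10) and then stops, since no further swap improves the cost.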

Posted Content•

Optimal Succinctness for Range Minimum Queries

Johannes Fischer¹•Institutions (1)

University of Tübingen¹

15 Dec 2008-arXiv: Data Structures and Algorithms

TL;DR: This work shows how to preprocess A into a scheme of size 2n+o(n) bits that allows answering range minimum queries on A in constant time, and improves on LCA-computation in BPS- or DFUDS-encoded trees.

Abstract: For a static array A of n ordered objects, a range minimum query asks for the position of the minimum between two specified array indices. We show how to preprocess A into a scheme of size 2n+o(n) bits that allows answering range minimum queries on A in constant time. This space is asymptotically optimal in the important setting where access to A is not permitted after the preprocessing step. Our scheme can be computed in linear time, using only n + o(n) additional bits at construction time. An interesting by-product is that we also improve on LCA-computation in BPS- or DFUDS-encoded trees.
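
For readers unfamiliar with the query itself, here is a minimal sketch of a standard sparse table, which also answers range minimum queries in constant time after preprocessing. It is a different and far less space-efficient technique than the succinct scheme described above: it stores O(n log n) indices and needs access to A at query time. Function names are hypothetical.

```python
def build_sparse_table(a):
    """table[j][i] = index of the leftmost minimum of a[i : i + 2**j]."""
    n = len(a)
    table = [list(range(n))]  # level 0: every index is its own minimum
    j = 1
    while (1 << j) <= n:
        prev, half = table[j - 1], 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        j += 1
    return table


def range_min_index(a, table, lo, hi):
    """Position of the leftmost minimum in a[lo..hi], both ends inclusive."""
    j = (hi - lo + 1).bit_length() - 1
    # Two overlapping blocks of length 2**j cover the whole range.
    left, right = table[j][lo], table[j][hi - (1 << j) + 1]
    return left if a[left] <= a[right] else right


a = [3, 1, 4, 1, 5, 9, 2, 6]
table = build_sparse_table(a)
whole = range_min_index(a, table, 0, 7)
middle = range_min_index(a, table, 2, 4)
```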

Posted Content•

Betweenness Centrality : Algorithms and Lower Bounds

Shiva Kintali¹•Institutions (1)

Georgia Institute of Technology¹

11 Sep 2008-arXiv: Data Structures and Algorithms

TL;DR: This paper presents a randomized parallel algorithm and an algebraic method for computing betweenness centrality of all nodes in a network and proves that any path-comparison based algorithm cannot compute betweenness in less than O(nm) time.

Abstract: One of the most fundamental problems in large scale network analysis is to determine the importance of a particular node in a network. Betweenness centrality is the most widely used metric to measure the importance of a node in a network. In this paper, we present a randomized parallel algorithm and an algebraic method for computing betweenness centrality of all nodes in a network. We prove that any path-comparison based algorithm cannot compute betweenness in less than O(nm) time.
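
As background for the O(nm) figure, the following sketch computes betweenness centrality with the well-known Brandes accumulation scheme, which runs one breadth-first search per source on an unweighted graph. It is not the randomized parallel or algebraic method the paper presents. The adjacency-dictionary input format and the function name are assumptions, and scores are halved so that each unordered pair of endpoints counts once.

```python
from collections import deque


def betweenness_centrality(adj):
    """Betweenness of every node in an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path_scores = betweenness_centrality(path)
cycle_scores = betweenness_centrality(cycle)
```

On the three-node path only the middle node lies between a pair, so it scores 1. On the four-cycle each node carries half of the shortest paths between its two neighbours, so every node scores 0.5.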

Posted Content•

Twice-Ramanujan Sparsifiers

Joshua Batson, Daniel A. Spielman, Nikhil Srivastava

01 Aug 2008-arXiv: Data Structures and Algorithms

TL;DR: In this article, it was shown that every graph has a spectral sparsifier with a number of edges linear in its number of vertices, and that sparsifiers of arbitrary graphs can be viewed as generalizations of expander graphs.

Abstract: We prove that every graph has a spectral sparsifier with a number of edges linear in its number of vertices. As linear-sized spectral sparsifiers of complete graphs are expanders, our sparsifiers of arbitrary graphs can be viewed as generalizations of expander graphs. In particular, we prove that for every $d>1$ and every undirected, weighted graph $G=(V,E,w)$ on $n$ vertices, there exists a weighted graph $H=(V,F,\tilde{w})$ with at most $\lceil d(n-1) \rceil$ edges such that for every $x \in \mathbb{R}^{V}$, \[ x^{T}L_{G}x \leq x^{T}L_{H}x \leq \left(\frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}}\right)\cdot x^{T}L_{G}x \] where $L_{G}$ and $L_{H}$ are the Laplacian matrices of $G$ and $H$, respectively. Thus, $H$ approximates $G$ spectrally at least as well as a Ramanujan expander with $dn/2$ edges approximates the complete graph. We give an elementary deterministic polynomial time algorithm for constructing $H$.
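
The trade-off in the stated bound is easy to tabulate. The sketch below plugs a choice of $d$ and $n$ into the two expressions from the abstract: the edge budget $\lceil d(n-1) \rceil$ and the approximation factor $(d+1+2\sqrt{d})/(d+1-2\sqrt{d})$. It only evaluates the formulas and does not construct a sparsifier; the function name is hypothetical.

```python
import math


def sparsifier_guarantee(d, n):
    """Return (edge budget, approximation factor) from the stated bound, d > 1."""
    if d <= 1:
        raise ValueError("the bound is stated for d > 1")
    max_edges = math.ceil(d * (n - 1))
    root = 2.0 * math.sqrt(d)
    # The denominator equals (sqrt(d) - 1)^2, which is positive for d > 1.
    factor = (d + 1 + root) / (d + 1 - root)
    return max_edges, factor


edges_d4, factor_d4 = sparsifier_guarantee(4, 101)
edges_d9, factor_d9 = sparsifier_guarantee(9, 101)
```

For $n = 101$, choosing $d = 4$ allows 400 edges with factor 9, while $d = 9$ allows 900 edges with factor 4: more edges buy a tighter approximation.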

Posted Content•

Characterizing Truthful Multi-Armed Bandit Mechanisms

Moshe Babaioff¹, Y. Sharma², Aleksandrs Slivkins¹•Institutions (2)

Microsoft¹, Cornell University²

12 Dec 2008-arXiv: Data Structures and Algorithms

TL;DR: This work considers a multi-round auction setting motivated by pay-per-click auctions for Internet advertising, and investigates how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful.

Abstract: We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer's goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the "multi-armed bandit problem", and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by "regret", the difference in social welfare between the algorithm and the benchmark which always selects the same "best" advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties -- essentially, they must separate exploration from exploitation -- and they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.

Posted Content•

Kernel(s) for Problems With no Kernel: On Out-Trees With Many Leaves

Henning Fernau¹, Fedor V. Fomin², Daniel Lokshtanov², Daniel Raible¹, Saket Saurabh², Yngve Villanger²•Institutions (2)

University of Trier¹, University of Bergen²

27 Oct 2008-arXiv: Data Structures and Algorithms

TL;DR: This work gives the first polynomial kernel for Rooted k-Leaf-Out-Branching, a variant of k-Leaf-Out-Branching where the root of the tree searched for is also a part of the input, and its results are the first separating many-to-one (Karp) kernelization from Turing kernelization.

Abstract: The $k$-Leaf Out-Branching problem is to find an out-branching (i.e. a rooted oriented spanning tree) with at least $k$ leaves in a given digraph. The problem has recently received much attention from the viewpoint of parameterized algorithms {alonLNCS4596,AlonFGKS07fsttcs,BoDo2,KnLaRo}. In this paper we step aside and take a kernelization based approach to the $k$-Leaf-Out-Branching problem. We give the first polynomial kernel for Rooted $k$-Leaf-Out-Branching, a variant of $k$-Leaf-Out-Branching where the root of the tree searched for is also a part of the input. Our kernel has cubic size and is obtained using extremal combinatorics. For the $k$-Leaf-Out-Branching problem we show that no polynomial kernel is possible unless the polynomial hierarchy collapses to the third level, by applying a recent breakthrough result by Bodlaender et al. {BDFH08} in a non-trivial fashion. However, our positive results for Rooted $k$-Leaf-Out-Branching immediately imply that the seemingly intractable $k$-Leaf-Out-Branching problem admits a data reduction to $n$ independent $O(k^3)$ kernels. These two results, tractability and intractability side by side, are the first separating many-to-one kernelization from Turing kernelization. This answers affirmatively an open problem regarding "cheat kernelization" raised in {IWPECOPEN08}.

Posted Content•

An Optimal Bloom Filter Replacement Based on Matrix Solving

Ely Porat¹•Institutions (1)

Bar-Ilan University¹

11 Apr 2008-arXiv: Data Structures and Algorithms

TL;DR: This work suggests a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom Filters, and suggests a data structure that requires only nk bits of space, has O(n^2) preprocessing time, and has O(log n) query time.

Abstract: We suggest a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom Filters. The space requirements of the dictionary we suggest are much smaller than those of a hashtable. We allow storing n keys, each mapped to a value which is a string of k bits. Our suggested method requires nk + o(n) bits of space to store the dictionary, and O(n) time to produce the data structure, and allows answering a membership query in O(1) memory probes. The dictionary size does not depend on the size of the keys. However, reducing the space requirements of the data structure comes at a certain cost. Our dictionary has a small probability of a one-sided error. When attempting to obtain the value for a key that is stored in the dictionary we always get the correct answer. However, when testing for membership of an element that is not stored in the dictionary, we may get an incorrect answer, and when requesting the value of such an element we may get a certain random value. Our method is based on solving equations in GF(2^k) and using several hash functions. Another significant advantage of our suggested method is that we do not require using sophisticated hash functions. We only require pairwise independent hash functions. We also suggest a data structure that requires only nk bits of space, has O(n^2) preprocessing time, and has O(log n) query time. However, this data structure requires uniform hash functions. In order to replace a Bloom Filter of n elements with an error probability of 2^{-k}, we require nk + o(n) memory bits, O(1) query time, O(n) preprocessing time, and only pairwise independent hash functions. Even the most advanced previously known Bloom Filter would require nk+O(n) space and uniform hash functions, so our method is significantly less space consuming, especially when k is small.

Posted Content•

Search Space Contraction in Canonical Labeling of Graphs

Adolfo Piperno

30 Apr 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, the individualization-refinement paradigm for computing a canonical labeling and the automorphism group of a graph is investigated, and a new algorithmic design aimed at reducing the size of the associated search space is introduced.

Abstract: The individualization-refinement paradigm for computing a canonical labeling and the automorphism group of a graph is investigated. A new algorithmic design aimed at reducing the size of the associated search space is introduced, and a new tool, named "Traces", is presented, together with experimental results and comparisons with existing software, such as McKay's "nauty". It is shown that the approach presented here leads to a huge reduction in the search space, thereby making computation feasible for several classes of graphs which are hard for all the main canonical labeling tools in the literature.

Posted Content•

Optimal Tracking of Distributed Heavy Hitters and Quantiles

Ke Yi¹, Qin Zhang²•Institutions (2)

Hong Kong University of Science and Technology¹, National Research Foundation of South Africa²

01 Dec 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, the authors consider the problem of tracking heavy hitters and quantiles in the distributed streaming model, and give algorithms with worst-case communication cost O(k/ε · log n) for both problems, where n is the total number of items in the stream, k is the number of sites, and ε is the approximation error.

Abstract: We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let $A$ be a multiset of elements, drawn from the universe $U=\{1,...,u\}$. For a given $0 \le \phi \le 1$, the $\phi$-heavy hitters are those elements of $A$ whose frequency in $A$ is at least $\phi |A|$; the $\phi$-quantile of $A$ is an element $x$ of $U$ such that at most $\phi|A|$ elements of $A$ are smaller than $x$ and at most $(1-\phi)|A|$ elements of $A$ are greater than $x$. Suppose the elements of $A$ are received at $k$ remote sites over time, and each of the sites has a two-way communication channel to a designated coordinator, whose goal is to track the set of $\phi$-heavy hitters and the $\phi$-quantile of $A$ approximately at all times with minimum communication. We give tracking algorithms with worst-case communication cost $O(k/\epsilon \cdot \log n)$ for both problems, where $n$ is the total number of items in $A$, and $\epsilon$ is the approximation error. This substantially improves upon the previously known algorithms. We also give matching lower bounds on the communication costs for both problems, showing that our algorithms are optimal. We also consider a more general version of the problem where we simultaneously track the $\phi$-quantiles for all $0 \le \phi \le 1$.
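
The two definitions in this abstract can be made concrete with a short sketch. It computes exact $\phi$-heavy hitters and a $\phi$-quantile of a static multiset held in one place; it is not the distributed tracking algorithm, and the function names are hypothetical.

```python
from collections import Counter

def heavy_hitters(items, phi):
    """Elements whose frequency in `items` is at least phi * |items|."""
    threshold = phi * len(items)
    return {x for x, c in Counter(items).items() if c >= threshold}

def is_phi_quantile(items, x, phi):
    """Check the two-sided condition from the definition for a candidate x."""
    n = len(items)
    smaller = sum(1 for a in items if a < x)
    greater = sum(1 for a in items if a > x)
    return smaller <= phi * n and greater <= (1 - phi) * n

def phi_quantile(items, phi):
    """Return the smallest element of `items` that is a phi-quantile."""
    for x in sorted(set(items)):
        if is_phi_quantile(items, x, phi):
            return x
    return None  # only reached for an empty multiset
```

A $\phi$-quantile need not be unique: several elements can satisfy both inequalities at once, and this sketch simply returns the smallest one.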

Posted Content•

Succinct Data Structures for Retrieval and Approximate Membership

Martin Dietzfelbinger¹, Rasmus Pagh²•Institutions (2)

Technische Universität Ilmenau¹, IT University of Copenhagen²

26 Mar 2008-arXiv: Data Structures and Algorithms

TL;DR: It is shown that for any k, query time O(k) can be achieved using space that is within a factor 1 + e^{-k} of optimal, asymptotically for large n.

Abstract: The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U ->{0,1}^r that has specified values on the elements of a given set S, a subset of U, |S|=n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k query time O(k) can be achieved using space that is within a factor 1+e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. The time to construct the data structure is O(n), expected. A main technical ingredient is to utilize existing tight bounds on the probability of almost square random matrices with rows of low weight to have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods, and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM.

Posted Content•

Design by Measure and Conquer, A Faster Exact Algorithm for Dominating Set

Johan M. M. van Rooij¹, Hans L. Bodlaender¹•Institutions (1)

Utrecht University¹

20 Feb 2008-arXiv: Data Structures and Algorithms

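TL;DR: Measure and conquer is proposed as a tool for designing algorithms, not only for analysing them: solving the resulting quasiconvex program by computer gives a running-time bound and can also suggest a new reduction rule, yielding a new, possibly faster algorithm and making this a form of computer-aided algorithm design.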

Abstract: The measure and conquer approach has proven to be a powerful tool to analyse exact algorithms for combinatorial problems, like Dominating Set and Independent Set. In this paper, we propose to use measure and conquer also as a tool in the design of algorithms. In an iterative process, we can obtain a series of branch and reduce algorithms. A mathematical analysis of an algorithm in the series with measure and conquer results in a quasiconvex programming problem. The solution by computer to this problem not only gives a bound on the running time, but also can give a new reduction rule, thus giving a new, possibly faster algorithm. This makes design by measure and conquer a form of computer aided algorithm design. When we apply the methodology to a Set Cover modelling of the Dominating Set problem, we obtain the currently fastest known exact algorithms for Dominating Set: an algorithm that uses $O(1.5134^n)$ time and polynomial space, and an algorithm that uses $O(1.5063^n)$ time.

Posted Content•

Domination in graphs with bounded propagation: algorithms, formulations and hardness results

Ashkan Aazami¹•Institutions (1)

University of Waterloo¹

15 Feb 2008-arXiv: Data Structures and Algorithms

TL;DR: In this article, the author introduces a hierarchy of problems between the dominating set problem and the power dominating set (PDS) problem, called the $\ell$-round power dominating set problem; in PDS, the goal is to find a minimum size set of nodes that power dominates all the nodes.

Abstract: We introduce a hierarchy of problems between the \textsc{Dominating Set} problem and the \textsc{Power Dominating Set} (PDS) problem called the $\ell$-round power dominating set ($\ell$-round PDS, for short) problem. For $\ell=1$, this is the \textsc{Dominating Set} problem, and for $\ell\geq n-1$, this is the PDS problem; here $n$ denotes the number of nodes in the input graph. In PDS the goal is to find a minimum size set of nodes $S$ that power dominates all the nodes, where a node $v$ is power dominated if (1) $v$ is in $S$ or it has a neighbor in $S$, or (2) $v$ has a neighbor $u$ such that $u$ and all of its neighbors except $v$ are power dominated. Note that rule (1) is the same as for the \textsc{Dominating Set} problem, and that rule (2) is a type of propagation rule that applies iteratively. The $\ell$-round PDS problem has the same set of rules as PDS, except we apply rule (2) in ``parallel'' in at most $\ell-1$ rounds. We prove that $\ell$-round PDS cannot be approximated better than $2^{\log^{1-\epsilon}{n}}$ even for $\ell=4$ in general graphs. We provide a dynamic programming algorithm to solve $\ell$-round PDS optimally in polynomial time on graphs of bounded tree-width. We present a PTAS (polynomial time approximation scheme) for $\ell$-round PDS on planar graphs for $\ell=O(\tfrac{\log{n}}{\log{\log{n}}})$. Finally, we give integer programming formulations for $\ell$-round PDS.
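
Rules (1) and (2) can be sketched as a small simulation that, for a given set $S$, computes which nodes are power dominated after at most $\ell-1$ parallel rounds of rule (2). This only evaluates a candidate $S$; it does not find a minimum one. The adjacency-dictionary representation and the name `power_dominated` are assumptions made for illustration.

```python
def power_dominated(adj, S, ell):
    """adj: dict node -> set of neighbours; S: chosen nodes; ell: round budget."""
    dominated = set(S)
    for s in S:
        dominated |= adj[s]  # rule (1): S and its neighbours
    for _ in range(ell - 1):  # rule (2), applied in at most ell - 1 parallel rounds
        newly = set()
        for u in dominated:
            missing = adj[u] - dominated
            if len(missing) == 1:  # u and all of its neighbours but one are dominated
                newly |= missing
        if not newly:
            break  # nothing propagates any further
        dominated |= newly
    return dominated
```

With `ell=1` the loop body never runs, so the result is exactly the set dominated in the ordinary Dominating Set sense, matching the statement that the $\ell=1$ case is Dominating Set.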

Posted Content•

Minimum Leaf Out-branching and Related Problems

Gregory Gutin¹, Igor Razgon², Eun Jung Kim¹•Institutions (2)

Royal Holloway, University of London¹, University College Cork²

13 Jan 2008-arXiv: Data Structures and Algorithms

TL;DR: It is proved that MinLOB is polynomial-time solvable for acyclic digraphs and that one of the three parameterizations considered is fixed-parameter tractable (FPT); transformations from the minimum path covering and maximum internal out-tree problems imply that some parameterizations of those two problems are FPT as well.

Abstract: Given a digraph $D$, the Minimum Leaf Out-Branching problem (MinLOB) is the problem of finding in $D$ an out-branching with the minimum possible number of leaves, i.e., vertices of out-degree 0. We prove that MinLOB is polynomial-time solvable for acyclic digraphs. In general, MinLOB is NP-hard and we consider three parameterizations of MinLOB. We prove that two of them are NP-complete for every value of the parameter, but the third one is fixed-parameter tractable (FPT). The FPT parametrization is as follows: given a digraph $D$ of order $n$ and a positive integral parameter $k$, check whether $D$ contains an out-branching with at most $n-k$ leaves (and find such an out-branching if it exists). We find a problem kernel of order $O(k^2)$ and construct an algorithm of running time $O(2^{O(k\log k)}+n^6),$ which is an `additive' FPT algorithm. We also consider transformations from two related problems, the minimum path covering and the maximum internal out-tree problems into MinLOB, which imply that some parameterizations of the two problems are FPT as well.

Posted Content•

Spanning directed trees with many leaves

Noga Alon, Fedor V. Fomin, Gregory Gutin¹, Michael Krivelevich², Saket Saurabh¹•Institutions (2)

Royal Holloway, University of London¹, University of Bergen²

05 Mar 2008-arXiv: Data Structures and Algorithms

TL;DR: In this paper, the authors obtained two combinatorial results on the number of leaves in out-branchings in strongly connected digraphs with minimum in-degree at least 3.

Abstract: The {\sc Directed Maximum Leaf Out-Branching} problem is to find an out-branching (i.e. a rooted oriented spanning tree) in a given digraph with the maximum number of leaves. In this paper, we obtain two combinatorial results on the number of leaves in out-branchings. We show that - every strongly connected $n$-vertex digraph $D$ with minimum in-degree at least 3 has an out-branching with at least $(n/4)^{1/3}-1$ leaves; - if a strongly connected digraph $D$ does not contain an out-branching with $k$ leaves, then the pathwidth of its underlying graph UG($D$) is $O(k\log k)$. Moreover, if the digraph is acyclic, the pathwidth is at most $4k$. The last result implies that it can be decided in time $2^{O(k\log^2 k)}\cdot n^{O(1)}$ whether a strongly connected digraph on $n$ vertices has an out-branching with at least $k$ leaves. On acyclic digraphs the running time of our algorithm is $2^{O(k\log k)}\cdot n^{O(1)}$.

Posted Content•

Geodesic Fréchet Distance Inside a Simple Polygon

Atlas F. Cook, Carola Wenk

20 Feb 2008-arXiv: Data Structures and Algorithms

TL;DR: An alternative to parametric search that applies to both the non-geodesic and geodesic Fréchet optimization problems is presented; this randomized approach is based on a variant of red-blue intersections and is appealing due to its elegance and practical efficiency when compared to parametric search.

Abstract: We unveil an alluring alternative to parametric search that applies to both the non-geodesic and geodesic Fréchet optimization problems. This randomized approach is based on a variant of red-blue intersections and is appealing due to its elegance and practical efficiency when compared to parametric search. We present the first algorithm for the geodesic Fréchet distance between two polygonal curves $A$ and $B$ inside a simple bounding polygon $P$. The geodesic Fréchet decision problem is solved almost as fast as its non-geodesic sibling and requires $O(N^2 \log k)$ time and $O(k+N)$ space after $O(k)$ preprocessing, where $N$ is the larger of the complexities of $A$ and $B$ and $k$ is the complexity of $P$. The geodesic Fréchet optimization problem is solved by a randomized approach in $O(k+N^2 \log kN \log N)$ expected time and $O(k+N^2)$ space. This runtime is only a logarithmic factor larger than the standard non-geodesic Fréchet algorithm (Alt and Godau 1995). Results are also presented for the geodesic Fréchet distance in a polygonal domain with obstacles and the geodesic Hausdorff distance for sets of points or sets of line segments inside a simple polygon $P$.

Posted Content•

The 1-fixed-endpoint Path Cover Problem is Polynomial on Interval Graph

Katerina Asdre, Stavros D. Nikolopoulos

26 Jun 2008-arXiv: Data Structures and Algorithms

TL;DR: It is shown that the 1PC problem can be solved in polynomial time on the class of interval graphs, generalizing the 1HP problem which has been proved to be NP-complete even for small classes of graphs.

Abstract: We consider a variant of the path cover problem, namely, the $k$-fixed-endpoint path cover problem, or kPC for short, on interval graphs. Given a graph $G$ and a subset $\mathcal{T}$ of $k$ vertices of $V(G)$, a $k$-fixed-endpoint path cover of $G$ with respect to $\mathcal{T}$ is a set of vertex-disjoint paths $\mathcal{P}$ that covers the vertices of $G$ such that the $k$ vertices of $\mathcal{T}$ are all endpoints of the paths in $\mathcal{P}$. The kPC problem is to find a $k$-fixed-endpoint path cover of $G$ of minimum cardinality; note that, if $\mathcal{T}$ is empty the stated problem coincides with the classical path cover problem. In this paper, we study the 1-fixed-endpoint path cover problem on interval graphs, or 1PC for short, generalizing the 1HP problem which has been proved to be NP-complete even for small classes of graphs. Motivated by a work of Damaschke, where he left both 1HP and 2HP problems open for the class of interval graphs, we show that the 1PC problem can be solved in polynomial time on the class of interval graphs. The proposed algorithm is simple, runs in $O(n^2)$ time, requires linear space, and also enables us to solve the 1HP problem on interval graphs within the same time and space complexity.
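
The definition of a $k$-fixed-endpoint path cover can be checked mechanically. The sketch below only verifies that a given set of paths satisfies the definition; it is not the $O(n^2)$ algorithm from the abstract and does not test minimality. The adjacency-dictionary input and the function name are assumptions made for illustration.

```python
def is_fixed_endpoint_path_cover(adj, paths, T):
    """adj: dict vertex -> set of neighbours; paths: list of vertex lists."""
    seen = set()
    endpoints = set()
    for path in paths:
        if not path:
            return False
        for v in path:
            if v in seen:  # paths must be vertex-disjoint
                return False
            seen.add(v)
        for a, b in zip(path, path[1:]):
            if b not in adj.get(a, ()):  # consecutive vertices must be adjacent
                return False
        endpoints.add(path[0])
        endpoints.add(path[-1])
    # every vertex is covered, and every vertex of T is an endpoint of some path
    return seen == set(adj) and set(T) <= endpoints
```

For example, on the path graph 0-1-2-3 with $\mathcal{T}=\{1\}$, the single Hamiltonian path is rejected because vertex 1 is internal, while the two paths `[1, 0]` and `[2, 3]` are accepted.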

Posted Content•

An Improved Randomized Truthful Mechanism for Scheduling Unrelated Machines

Pinyan Lu¹, Changyuan Yu•Institutions (1)

Tsinghua University¹

20 Feb 2008-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors give a 1.6737-approximation randomized truthful mechanism for scheduling on two unrelated machines, improving the 1.75-approximation of Nisan and Ronen, and generalize the result to a $0.8368m$-approximation mechanism for task scheduling with $m$ machines.

Abstract: We study the scheduling problem on unrelated machines in the mechanism design setting. This problem was proposed and studied in the seminal paper (Nisan and Ronen 1999), where they gave a 1.75-approximation randomized truthful mechanism for the case of two machines. We improve this result by a 1.6737-approximation randomized truthful mechanism. We also generalize our result to a $0.8368m$-approximation mechanism for task scheduling with $m$ machines, which improves the previous best upper bound of $0.875m$ (Mu'alem and Schapira 2007).

Posted Content•

A new distance for high level RNA secondary structure comparison

Julien Allali¹, Marie-France Sagot²•Institutions (2)

L'Abri¹, Institut national des sciences Appliquées de Lyon²

22 Oct 2008-arXiv: Data Structures and Algorithms

TL;DR: An algorithm is described for comparing two RNA secondary structures coded in the form of trees; it introduces two new operations, called node fusion and edge fusion, besides the tree edit operations of deletion, insertion, and relabeling classically used in the literature.

Abstract: We describe an algorithm for comparing two RNA secondary structures coded in the form of trees that introduces two new operations, called node fusion and edge fusion, besides the tree edit operations of deletion, insertion, and relabeling classically used in the literature. This allows us to address some serious limitations of the more traditional tree edit operations when the trees represent RNAs and what is searched for is a common structural core of two RNAs. Although the algorithm complexity has an exponential term, this term depends only on the number of successive fusions that may be applied to the same node, not on the total number of fusions. The algorithm therefore remains efficient in practice and is used for illustrative purposes on ribosomal as well as on other types of RNAs.

Posted Content•

Balanced Families of Perfect Hash Functions and Their Applications

Noga Alon¹, Shai Gutner¹•Institutions (1)

Tel Aviv University¹

28 May 2008-arXiv: Data Structures and Algorithms

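TL;DR: In this paper, it is shown that for any constant δ > 1, a δ-balanced (n,k)-family of perfect hash functions of size 2^{O(k log log k)} log n can be constructed in time 2^{O(k log log k)} n log n; via color-coding, this gives deterministic polynomial-time approximate counting of simple paths and cycles of size k in a graph with n vertices.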

Abstract: The construction of perfect hash functions is a well-studied topic. In this paper, this concept is generalized with the following definition. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of perfect hash functions if for every $S \subseteq [n]$, $|S|=k$, the number of functions that are 1-1 on $S$ is between $T/\delta$ and $\delta T$ for some constant $T>0$. The standard definition of a family of perfect hash functions requires that there will be at least one function that is 1-1 on $S$, for each $S$ of size $k$. In the new notion of balanced families, we require the number of 1-1 functions to be almost the same (taking $\delta$ to be close to 1) for every such $S$. Our main result is that for any constant $\delta > 1$, a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $2^{O(k \log \log k)} \log n$ can be constructed in time $2^{O(k \log \log k)} n \log n$. Using the technique of color-coding we can apply our explicit constructions to devise approximation algorithms for various counting problems in graphs. In particular, we exhibit a deterministic polynomial time algorithm for approximating both the number of simple paths of length $k$ and the number of simple cycles of size $k$ for any $k \leq O(\frac{\log n}{\log \log \log n})$ in a graph with $n$ vertices. The approximation is up to any fixed desirable relative error.
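
The definition of a $\delta$-balanced family can be checked by brute force on tiny instances. The sketch below is not the paper's construction: it counts, for every $k$-subset $S$, how many functions of a given family are 1-1 on $S$, and reports the smallest $\delta$ for which the family is balanced. Functions are represented as tuples over zero-based indices, and the names are hypothetical. The smallest $\delta$ is $\sqrt{\max/\min}$ of the counts, attained with $T=\sqrt{\max\cdot\min}$.

```python
from itertools import combinations
from math import inf, sqrt

def injective_counts(family, n, k):
    """For each k-subset S of range(n), count the functions that are 1-1 on S."""
    counts = []
    for S in combinations(range(n), k):
        counts.append(sum(1 for f in family if len({f[x] for x in S}) == k))
    return counts

def best_delta(family, n, k):
    """Smallest delta such that all counts lie in [T/delta, delta*T] for some T > 0."""
    counts = injective_counts(family, n, k)
    lo, hi = min(counts), max(counts)
    if lo == 0:
        return inf  # some S has no 1-1 function: not even a perfect hash family
    return sqrt(hi / lo)
```

The family of all functions from $[n]$ to $[k]$ is perfectly balanced ($\delta = 1$) but has size $k^n$; the point of the main result is to get $\delta$ close to 1 with a far smaller family.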
