
Showing papers in "ACM Transactions on Algorithms in 2010"


Journal ArticleDOI
Kenneth L. Clarkson
TL;DR: These results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
Abstract: The problem of maximizing a concave function f(x) in the unit simplex Δ can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a k-dimensional face of Δ such that f(x(k)) ≥ f(x*) − O(1/k). Here f(x*) is the maximum value of f in Δ, and the constant factor depends on f. This algorithm and analysis were known before, and are related to problems of statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ϵ-coresets was shown for the minimum enclosing ball problem by means of a simple greedy algorithm. Similar greedy algorithms, which are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
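
As a concrete illustration, here is a minimal sketch of the greedy (Frank-Wolfe) step the abstract describes, for maximizing a concave f over the unit simplex given its gradient. The step size 2/(t+2), the starting vertex, and the example objective are assumptions for illustration, not Clarkson's exact formulation.

```python
# Minimal Frank-Wolfe sketch on the unit simplex (illustrative assumptions:
# step size 2/(t+2), start at a vertex, access to the gradient of f).
import numpy as np

def frank_wolfe_simplex(grad, dim, iters):
    """Approximately maximize a concave f over the unit simplex.

    Each iteration moves toward the best vertex e_i, so after k iterations
    x lies on a k-dimensional face (at most k+1 nonzero coordinates).
    """
    x = np.zeros(dim)
    x[0] = 1.0                      # start at a vertex of the simplex
    for t in range(iters):
        g = grad(x)
        i = int(np.argmax(g))       # linear maximization over the simplex
        step = 2.0 / (t + 2)        # standard step size giving O(1/k) error
        x = (1 - step) * x
        x[i] += step
    return x

# Example: maximize the concave f(x) = -||x - c||^2 over the simplex.
c = np.array([0.2, 0.5, 0.3, 0.0])
x = frank_wolfe_simplex(lambda x: -2 * (x - c), dim=4, iters=100)
print(x)   # close to c; sparsity is bounded by the iteration count
```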

456 citations


Journal ArticleDOI
TL;DR: It is proved that, given an undirected graph G on n vertices and an integer k, one can compute in polynomial time a graph G′ with at most 4k^2 vertices and an integer k′ such that G has a feedback vertex set of size at most k iff G′ has one of size at most k′.
Abstract: We prove that given an undirected graph G on n vertices and an integer k, one can compute, in polynomial time in n, a graph G′ with at most 4k^2 vertices and an integer k′ such that G has a feedback vertex set of size at most k iff G′ has a feedback vertex set of size at most k′. This result improves a previous O(k^11) kernel of Burrage et al., and a more recent cubic kernel of Bodlaender. This problem was communicated by Fellows.

204 citations


Journal ArticleDOI
TL;DR: An infinite family of (unweighted) tournaments is exhibited for which the above algorithm (irrespective of how ties are broken) has an approximation ratio of 5 − ε.
Abstract: We consider the following simple algorithm for the feedback arc set problem in weighted tournaments: order the vertices by their weighted indegrees. We show that this algorithm has an approximation guarantee of 5 if the weights satisfy probability constraints (for any pair of vertices u and v, w_uv + w_vu = 1). Special cases of the feedback arc set problem in such weighted tournaments include the feedback arc set problem in unweighted tournaments and rank aggregation. To complement the upper bound, for any constant ε > 0, we exhibit an infinite family of (unweighted) tournaments for which the aforesaid algorithm (irrespective of how ties are broken) has an approximation ratio of 5 − ε.
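
The ordering rule is concrete enough to sketch. Below is a minimal Python version under the abstract's probability-constraint assumption (w_uv + w_vu = 1); the arbitrary tie-breaking and the dense weight-matrix representation are illustrative choices, not the paper's presentation.

```python
# Hedged sketch of ordering by weighted indegree for feedback arc set in a
# weighted tournament; back arcs in the computed order form the solution.
def indegree_ordering_fas(n, w):
    """n vertices; w[u][v] is the weight of arc u->v, with w[u][v]+w[v][u]=1."""
    indeg = [sum(w[u][v] for u in range(n) if u != v) for v in range(n)]
    order = sorted(range(n), key=lambda v: indeg[v])   # low indegree first
    pos = {v: i for i, v in enumerate(order)}
    # cost paid: arcs pointing backwards in the computed order
    cost = sum(w[u][v] for u in range(n) for v in range(n)
               if u != v and pos[u] > pos[v])
    return order, cost

# 3-cycle tournament: every ordering must pay exactly one back arc.
w = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(indegree_ordering_fas(3, w))
```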

142 citations


Journal ArticleDOI
TL;DR: This article develops a couple of new techniques for constructing (α, β)-spanners, including an additive (1,6)-spanner of size O(n^{4/3}) built by an economical agent that assigns costs and values to paths in the graph; this path-buying algorithm can be parameterized in different ways to yield other sparseness-distortion tradeoffs.
Abstract: An (α, β)-spanner of an unweighted graph G is a subgraph H that distorts distances in G up to a multiplicative factor of α and an additive term β. It is well known that any graph contains a (multiplicative) (2k−1, 0)-spanner of size O(n^{1+1/k}) and an (additive) (1,2)-spanner of size O(n^{3/2}). However, no other additive spanners are known to exist. In this article we develop a couple of new techniques for constructing (α, β)-spanners. Our first result is an additive (1,6)-spanner of size O(n^{4/3}). The construction algorithm can be understood as an economical agent that assigns costs and values to paths in the graph, purchasing affordable paths and ignoring expensive ones, which are intuitively well approximated by paths already purchased. We show that this path-buying algorithm can be parameterized in different ways to yield other sparseness-distortion tradeoffs. Our second result addresses the problem of which (α, β)-spanners can be computed efficiently, ideally in linear time. We show that, for any k, a (k, k−1)-spanner with size O(kn^{1+1/k}) can be found in linear time, and, further, that in a distributed network the algorithm terminates in a constant number of rounds. Previous spanner constructions with similar performance had roughly twice the multiplicative distortion.

123 citations


Journal ArticleDOI
TL;DR: In this paper, an O(n log^2 n)-time, linear-space algorithm was proposed to find the distances from a node s to all nodes in a directed planar graph with positive and negative arc lengths.
Abstract: We give an O(n log^2 n)-time, linear-space algorithm that, given a directed planar graph with positive and negative arc-lengths, and given a node s, finds the distances from s to all nodes.

117 citations


Journal ArticleDOI
TL;DR: In this paper, a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D was studied, and a linear time (1+ϵ)-approximation algorithm was given for the problem in an arbitrary metric space with bounded doubling dimension.
Abstract: We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors D(P,C) = ∑_{p ∈ P} min_{c ∈ C} D(p,c) is minimized. The main result in this article can be stated as follows: there exists a (1+ϵ)-approximation algorithm for the k-median problem with respect to D if the 1-median problem can be approximated within a factor of (1+ϵ) by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time n · 2^{O(mk log(mk/ϵ))}, where m is a constant that depends only on ϵ and D. Using this characterization, we obtain the first linear-time (1+ϵ)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
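
The sampling property the result hinges on is easy to illustrate for the squared-Euclidean (k-means-type) case, where the exact 1-median of a sample is its centroid. The sample size and the choice of D below are assumptions for illustration; the paper treats general dissimilarity measures.

```python
# Hedged sketch: solve the 1-median problem on a constant-size random
# sample exactly (for squared Euclidean D, the optimum is the centroid)
# and use that center for the whole point set.
import random

def D(p, c):                                   # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(p, c))

def sampled_1median(points, sample_size=25, rng=random):
    sample = rng.sample(points, min(sample_size, len(points)))
    dim = len(points[0])
    # exact 1-median of the sample under squared Euclidean = centroid
    center = tuple(sum(p[i] for p in sample) / len(sample) for i in range(dim))
    return center, sum(D(p, center) for p in points)

pts = [(random.gauss(0, 1), random.gauss(5, 2)) for _ in range(10000)]
center, cost = sampled_1median(pts)
print(center)   # near the true mean, using only a constant-size sample
```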

107 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that for any fixed w ≥ 1, there is a polynomial-time algorithm that, given a hypergraph H with fractional hypertree width at most w, computes a fractional hypertree decomposition of width O(w^3) for H.
Abstract: Fractional hypertree width is a hypergraph measure similar to tree width and hypertree width. Its algorithmic importance comes from the fact that, as shown in previous work, Constraint Satisfaction Problems (CSP) and various problems in database theory are polynomial-time solvable if the input contains a bounded-width fractional hypertree decomposition of the hypergraph of the constraints. In this article, we show that for every fixed w ≥ 1, there is a polynomial-time algorithm that, given a hypergraph H with fractional hypertree width at most w, computes a fractional hypertree decomposition of width O(w^3) for H. This means that polynomial-time algorithms relying on bounded-width fractional hypertree decompositions no longer need to be given a decomposition explicitly in the input, since an appropriate decomposition can be computed in polynomial time. Therefore, if H is a class of hypergraphs with bounded fractional hypertree width, then a CSP restricted to instances whose structure is in H is polynomial-time solvable. This makes bounded fractional hypertree width the most general known hypergraph property that makes CSP, Boolean conjunctive queries, and conjunctive query containment polynomial-time solvable.

106 citations


Journal ArticleDOI
TL;DR: A simple algorithmic model is introduced for massive, unordered, distributed (mud) computation, as implemented by Google's MapReduce and Apache's Hadoop, and it is shown that in principle, mud algorithms are equivalent in power to symmetric streaming algorithms.
Abstract: A common approach for dealing with large datasets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive datasets, however, even making a single pass over the data is prohibitive. Therefore, streaming computations must be distributed over many machines. In practice, obtaining significant speedups using distributed computation has numerous challenges including synchronization, load balancing, overcoming processor failures, and data distribution. Successful systems in practice such as Google's MapReduce and Apache's Hadoop address these problems by only allowing a certain class of highly distributable tasks defined by local computations that can be applied in any order to the input. The fundamental question that arises is: how does the class of computational tasks supported by these systems differ from the class for which streaming solutions exist? We introduce a simple algorithmic model for massive, unordered, distributed (mud) computation, as implemented by these systems. We show that in principle, mud algorithms are equivalent in power to symmetric streaming algorithms. More precisely, we show that any symmetric (order-invariant) function that can be computed by a streaming algorithm can also be computed by a mud algorithm, with comparable space and communication complexity. Our simulation uses Savitch's theorem and therefore has superpolynomial time complexity. We extend our simulation result to some natural classes of approximate and randomized streaming algorithms. We also give negative results, using communication complexity arguments to prove that extensions to private randomness, promise problems, and indeterminate functions are impossible. We also introduce an extension of the mud model to multiple keys and multiple rounds.
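
To make the model concrete, here is a minimal sketch of a mud-style computation: a per-record local map plus an order-invariant, associative merge, so the answer is the same however records are split across machines. The function names and the toy aggregate (maximum, a symmetric function) are illustrative assumptions, not the paper's formalism.

```python
# Hedged sketch of a mud ("massive, unordered, distributed") computation.
from functools import reduce

def local_map(record):        # Phi: record -> message
    return record

def merge(a, b):              # order-invariant, associative aggregator
    return max(a, b)

def mud_compute(shards):
    """Each shard is processed independently, in any order, then merged."""
    partials = [reduce(merge, map(local_map, shard)) for shard in shards]
    return reduce(merge, partials)

print(mud_compute([[3, 1, 4], [1, 5], [9, 2, 6]]))  # 9, however data is split
```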

103 citations


Journal ArticleDOI
TL;DR: This work proposes a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published, and extends algorithms to allow an ε fraction of points to remain unclustered, that is, deleted from the anonymized publication.
Abstract: Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of deidentifying records is to remove identifying fields such as social security number, name, etc. However, recent research has shown that a large fraction of the U.S. population can be identified using nonkey attributes (called quasi-identifiers) such as date of birth, gender, and zip code. The k-anonymity model protects privacy by requiring that nonkey attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we impose the constraint that each cluster must contain no fewer than a prespecified number of data records. This technique is more general since we have a much larger choice for cluster centers than k-anonymity. In many cases, it lets us release a lot more information without compromising privacy. We also provide constant factor approximation algorithms to come up with such a clustering. This is the first set of algorithms for the anonymization problem where the performance is independent of the anonymity parameter k. We further observe that a few outlier points can significantly increase the cost of anonymization. Hence, we extend our algorithms to allow an ε fraction of points to remain unclustered, that is, deleted from the anonymized publication. Thus, by not releasing a small fraction of the database records, we can ensure that the data published for analysis has less distortion and hence is more useful. Our approximation algorithms for new clustering objectives are of independent interest and could be applicable in other clustering scenarios as well.
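
As a toy illustration of the publication scheme only (not the paper's constant-factor approximation algorithm), the sketch below groups sorted one-dimensional quasi-identifiers into clusters of at least r records and publishes only cluster centers and sizes.

```python
# Hedged sketch: publish cluster centers, each covering >= r records.
def anonymize(records, r):
    assert len(records) >= r
    records = sorted(records)                 # 1-D quasi-identifiers
    clusters = [records[i:i + r] for i in range(0, len(records) - r + 1, r)]
    clusters[-1].extend(records[len(clusters) * r:])  # fold leftovers in
    # publish only (center, size) pairs; individual rows are suppressed
    return [(sum(c) / len(c), len(c)) for c in clusters]

ages = [23, 25, 26, 31, 33, 35, 41, 44]
print(anonymize(ages, r=3))   # every published center covers >= 3 records
```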

90 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that the minimum length-bounded cut problem is NP-hard to approximate within a factor of 1.1377 for L ≥ 5 in the case of node-cuts and for L≥ 4 in the cases of edge-cuts.
Abstract: For a given number L, an L-length-bounded edge-cut (node-cut, respectively) in a graph G with source s and sink t is a set C of edges (nodes, respectively) such that no s-t-path of length at most L remains in the graph after removing the edges (nodes, respectively) in C. An L-length-bounded flow is a flow that can be decomposed into flow paths of length at most L. In contrast to classical flow theory, we describe instances for which the minimum L-length-bounded edge-cut (node-cut, respectively) is Θ(n^{2/3})-times (Θ(√n)-times, respectively) larger than the maximum L-length-bounded flow, where n denotes the number of nodes; this is the worst case. We show that the minimum length-bounded cut problem is NP-hard to approximate within a factor of 1.1377 for L ≥ 5 in the case of node-cuts and for L ≥ 4 in the case of edge-cuts. We also describe algorithms with approximation ratio O(min{L, n/L}) ⊆ O(√n) in the node case and O(min{L, n^2/L^2, √m}) ⊆ O(n^{2/3}) in the edge case, where m denotes the number of edges. Concerning L-length-bounded flows, we show that in graphs with unit-capacities and general edge lengths it is NP-complete to decide whether there is a fractional length-bounded flow of a given value. We analyze the structure of optimal solutions and present further complexity results.

82 citations


Journal ArticleDOI
TL;DR: In this paper, the authors give a (2−1/t)-approximation algorithm for Minimum Vertex Cover, which also works when a t-interval representation of the given graph is absent.
Abstract: Multiple-interval graphs are a natural generalization of interval graphs where each vertex may have more than one interval associated with it. We initiate the study of optimization problems in multiple-interval graphs by considering three classical problems: Minimum Vertex Cover, Minimum Dominating Set, and Maximum Clique. We describe applications for each one of these problems, and then proceed to discuss approximation algorithms for them. Our results can be summarized as follows: Let t be the number of intervals associated with each vertex in a given multiple-interval graph. For Minimum Vertex Cover, we give a (2−1/t)-approximation algorithm which also works when a t-interval representation of our given graph is absent. Following this, we give a t^2-approximation algorithm for Minimum Dominating Set which adapts well to more general variants of the problem. We then proceed to prove that Maximum Clique is NP-hard already for 3-interval graphs, and provide a (t^2−t+1)/2-approximation algorithm for general values of t ≥ 2, using bounds proven for the so-called transversal number of t-interval families.

Journal ArticleDOI
TL;DR: This is the first study of the efficiency of taxes in atomic congestion games; it also shows how to compute taxes in time polynomial in the size of the game by solving convex quadratic programs.
Abstract: We study congestion games where players aim to access a set of resources. Each player has a set of possible strategies and each resource has a function associating the latency it incurs to the players using it. Players are non-cooperative and each wishes to follow a strategy that minimizes her own latency with no regard to the global optimum. Previous work has studied the impact of this selfish behavior on system performance. In this article, we study the question of how much the performance can be improved if players are forced to pay taxes for using resources. Our objective is to extend the original game so that selfish behavior does not deteriorate performance. We consider atomic congestion games with linear latency functions and present both negative and positive results. Our negative results show that optimal system performance cannot be achieved even in very simple games. On the positive side, we show that there are ways to assign taxes that can improve the performance of linear congestion games by forcing players to follow strategies where the total latency suffered is within a factor of 2 of the minimum possible; this result is shown to be tight. Furthermore, even in cases where in the absence of taxes the system behavior may be very poor, we show that the total disutility of players (latency plus taxes) is not much larger than the optimal total latency. Besides existential results, we show how to compute taxes in time polynomial in the size of the game by solving convex quadratic programs. Similar questions have been extensively studied in the model of non-atomic congestion games. To the best of our knowledge, this is the first study of the efficiency of taxes in atomic congestion games.

Journal ArticleDOI
TL;DR: This work designs a deterministic algorithm with a competitive ratio of 7/4 for the one-dimensional case, provides the first nontrivial deterministic lower bound, improves the randomized lower bound, and proves the first lower bounds for higher dimensions.
Abstract: We continue the study of the online unit clustering problem, introduced by Chan and Zarrabi-Zadeh (Workshop on Approximation and Online Algorithms 2006, LNCS 4368, p. 121--131. Springer, 2006). We design a deterministic algorithm with a competitive ratio of 7/4 for the one-dimensional case. This is the first deterministic algorithm that beats the bound of 2. It also has a better competitive ratio than the previous randomized algorithms. Moreover, we provide the first non-trivial deterministic lower bound, improve the randomized lower bound, and prove the first lower bounds for higher dimensions.

Journal ArticleDOI
TL;DR: In this article, the authors considered the lower-bounded facility location problem with lower bound constraints for the number of clients assigned to a facility in the case that this facility is opened.
Abstract: We study the lower-bounded facility location problem which generalizes the classical uncapacitated facility location problem in that it comes with lower bound constraints for the number of clients assigned to a facility in the case that this facility is opened. This problem was introduced independently in the papers by Karger and Minkoff [2000] and by Guha et al. [2000], both of which give bicriteria approximation algorithms for it. These bicriteria algorithms come within a constant factor of the optimal solution cost, but they also violate the lower bound constraints by a constant factor. Our result in this article is the first true approximation algorithm for the lower-bounded facility location problem which respects the lower bound constraints and achieves a constant approximation ratio for the objective function. The main technical idea for the design of the algorithm is a reduction to the capacitated facility location problem, which has known constant-factor approximation algorithms.

Journal ArticleDOI
TL;DR: The first nontrivial lower bounds on time-space trade-offs for the selection problem are established, and deterministic lower bounds for I/O-efficient algorithms are obtained as well.
Abstract: We establish the first nontrivial lower bounds on time-space trade-offs for the selection problem. We prove that any comparison-based randomized algorithm for finding the median requires Ω(n log log_S n) expected time in the RAM model (or more generally in the comparison branching program model), if we have S bits of extra space besides the read-only input array. This bound is tight for all S > log n, and remains true even if the array is given in a random order. Our result thus answers a 16-year-old question of Munro and Raman [1996], and also complements recent lower bounds that are restricted to sequential access, as in the multipass streaming model [Chakrabarti et al. 2008b]. We also prove that any comparison-based, deterministic, multipass streaming algorithm for finding the median requires Ω(n log*(n/s) + n log_s n) worst-case time (in scanning plus comparisons), if we have s cells of space. This bound is also tight for all s > log^2 n. We get deterministic lower bounds for I/O-efficient algorithms as well. The proofs in this article are self-contained and do not rely on communication complexity techniques.

Journal ArticleDOI
TL;DR: A space lower bound of Ω(ε^{-2}/log^2(ε^{-1})) is shown, demonstrating that the algorithm is near-optimal in terms of its dependency on ε, and it is shown that multiplicative approximation of the kth-order entropy requires close to linear space for k ≥ 1.
Abstract: We describe a simple algorithm for approximating the empirical entropy of a stream of m values up to a multiplicative factor of (1+ε) using a single pass, O(ε^{-2} log(δ^{-1}) log m) words of space, and O(log ε^{-1} + log log δ^{-1} + log log m) processing time per item in the stream. Our algorithm is based upon a novel extension of a method introduced by Alon et al. [1999]. This improves over previous work on this problem. We show a space lower bound of Ω(ε^{-2}/log^2(ε^{-1})), demonstrating that our algorithm is near-optimal in terms of its dependency on ε. We show that generalizing to multiplicative approximation of the kth-order entropy requires close to linear space for k ≥ 1. In contrast we show that additive approximation is possible in a single pass using only polylogarithmic space. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph.
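
The AMS-style building block behind such estimators is easy to sketch: sample a uniform stream position, count how often that value recurs from there on, and average an unbiased per-sample estimate. The toy below stores the whole stream and omits the paper's extensions (such as special handling of a dominant value); all names are illustrative.

```python
# Hedged sketch of the basic AMS-style unbiased entropy estimator.
import math, random

def entropy_estimate(stream, samples=1000, rng=random):
    m = len(stream)
    def f(r):                      # f(r) = (r/m) log2(m/r), with f(0) = 0
        return 0.0 if r == 0 else (r / m) * math.log2(m / r)
    total = 0.0
    for _ in range(samples):
        j = rng.randrange(m)                   # uniform stream position
        r = stream[j:].count(stream[j])        # occurrences from j onward
        total += m * (f(r) - f(r - 1))         # unbiased for the entropy
    return total / samples

s = [random.choice("aabc") for _ in range(2000)]
print(entropy_estimate(s))   # close to the empirical entropy (here ~1.5 bits)
```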

Journal ArticleDOI
TL;DR: The Compressed Permuterm Index is proposed, which solves the Tolerant Retrieval problem in time proportional to the length of the searched pattern, and space close to the kth order empirical entropy of the indexed dictionary.
Abstract: The Permuterm index [Garfield 1976] is a time-efficient and elegant solution to the string dictionary problem in which pattern queries may possibly include one wild-card symbol (called the Tolerant Retrieval problem). Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size. In this article we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in time proportional to the length of the searched pattern, and space close to the kth order empirical entropy of the indexed dictionary. We also design a dynamic version of this index that allows us to efficiently manage insertion in, and deletion from, the dictionary of individual strings. The result is based on a simple variant of the Burrows-Wheeler Transform, defined on a dictionary of strings of variable length, that allows us to efficiently solve the Tolerant Retrieval problem via known (dynamic) compressed indexes [Navarro and Mäkinen 2007]. We will complement our theoretical study with a significant set of experiments that show that the Compressed Permuterm Index supports fast queries within a space occupancy that is close to the one achievable by compressing the string dictionary via gzip or bzip. This improves known approaches based on Front-Coding [Witten et al. 1999] by more than 50% in absolute space occupancy, still guaranteeing comparable query time.
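
For intuition, here is the classical (uncompressed) permuterm idea that the paper compresses: index every rotation of word + '$', and answer a one-wildcard query by rotating it so the wildcard becomes a suffix, reducing to prefix search. The sorted-list implementation below is a toy stand-in for the paper's BWT-based structure.

```python
# Hedged sketch of the classical permuterm index (plain sorted rotations).
import bisect

def build_permuterm(words):
    rots = []
    for w in words:
        t = w + "$"
        rots += [(t[i:] + t[:i], w) for i in range(len(t))]
    rots.sort()
    return rots

def wildcard_search(index, pattern):
    """pattern contains exactly one '*', e.g. 'al*a' or 'pre*'."""
    head, tail = pattern.split("*")
    key = tail + "$" + head          # rotate so the wildcard is a suffix
    i = bisect.bisect_left(index, (key,))
    out = set()
    while i < len(index) and index[i][0].startswith(key):
        out.add(index[i][1]); i += 1
    return sorted(out)

idx = build_permuterm(["algebra", "alpha", "alps", "beta"])
print(wildcard_search(idx, "al*a"))   # ['algebra', 'alpha']
```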

Journal ArticleDOI
TL;DR: This article focuses on an interesting subclass of allocation algorithms, the task-independent algorithms, and gives a lower bound of (n+1)/2 that holds for every (not only monotone) allocation algorithm that takes independent decisions.
Abstract: Scheduling on unrelated machines is one of the most general and classical variants of the task scheduling problem. Fractional scheduling is the LP-relaxation of the problem, which is polynomially solvable in the nonstrategic setting, and is a useful tool to design deterministic and randomized approximation algorithms. The mechanism design version of the scheduling problem was introduced by Nisan and Ronen. In this article, we consider the mechanism design version of the fractional variant of this problem. We give lower bounds for any fractional truthful mechanism. Our lower bounds also hold for any (randomized) mechanism for the integral case. In the positive direction, we propose a truthful mechanism that achieves approximation 3/2 for 2 machines, matching the lower bound. This is the first new tight bound on the approximation ratio of this problem, after the tight bound of 2, for 2 machines, obtained by Nisan and Ronen. For n machines, our mechanism achieves an approximation ratio of (n+1)/2. Motivated by the fact that all the known deterministic and randomized mechanisms for the problem assign each task independently from the others, we focus on an interesting subclass of allocation algorithms, the task-independent algorithms. We give a lower bound of (n+1)/2 that holds for every (not only monotone) allocation algorithm that takes independent decisions. Under this consideration, our truthful independent mechanism is the best that we can hope for from this family of algorithms.

Journal ArticleDOI
TL;DR: This article studies randomized sublinear algorithms that approximate the Hamming distance between a given function and the closest monotone function, and presents algorithms for distance approximation to monotonicity of functions over one dimension, two dimensions, and the k-dimensional hypercube.
Abstract: In this article we study the problem of approximating the distance of a function f: [n]^d → R to monotonicity, where [n] = {1,…,n} and R is some fully ordered range. Namely, we are interested in randomized sublinear algorithms that approximate the Hamming distance between a given function and the closest monotone function. We allow both an additive error, parameterized by δ, and a multiplicative error. Previous work on distance approximation to monotonicity focused on the one-dimensional case and the only explicit extension to higher dimensions was with a multiplicative approximation factor exponential in the dimension d. Building on Goldreich et al. [2000] and Dodis et al. [1999], in which there are better implicit results for the case n=2, we describe a reduction from the case of functions over the d-dimensional hypercube [n]^d to the case of functions over the k-dimensional hypercube [n]^k, where 1 ≤ k ≤ d. The quality of estimation that this reduction provides is linear in ⌈d/k⌉ and logarithmic in the size of the range |R| (if the range is infinite or just very large, then log |R| can be replaced by d log n). Using this reduction and a known distance approximation algorithm for the one-dimensional case, we obtain a distance approximation algorithm for functions over the d-dimensional hypercube, with any range R, which has a multiplicative approximation factor of O(d log |R|). For the case of a binary range, we present algorithms for distance approximation to monotonicity of functions over one dimension, two dimensions, and the k-dimensional hypercube (for any k ≥ 1). Applying these algorithms and the reduction described before, we obtain a variety of distance approximation algorithms for Boolean functions over the d-dimensional hypercube which suggest a trade-off between quality of estimation and efficiency of computation. In particular, the multiplicative error ranges between O(d) and O(1).
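
For the one-dimensional case, the exact quantity has a clean characterization worth noting: the Hamming distance from a sequence to the closest monotone (non-decreasing) function is n minus the length of a longest non-decreasing subsequence. The O(n log n) sketch below computes it exactly; the sublinear algorithms above approximate it without reading all of f.

```python
# Hedged sketch: exact 1-D distance to monotonicity via patience sorting.
import bisect

def distance_to_monotone(f):
    tails = []                     # tails[k] = smallest possible tail of a
    for x in f:                    # non-decreasing subsequence of length k+1
        i = bisect.bisect_right(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(f) - len(tails)     # number of values that must change

print(distance_to_monotone([1, 5, 2, 3, 3, 0, 7]))  # 2 (fix the 5 and the 0)
```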

Journal ArticleDOI
TL;DR: In this paper, a polylogarithmic protocol for Byzantine agreement and leader election in the asynchronous full information model with a nonadaptive malicious adversary was proposed, and it is shown to tolerate up to (1/3 − ϵ) ⋅ n faulty processors.
Abstract: We resolve two long-standing open problems in distributed computation by describing polylogarithmic protocols for Byzantine agreement and leader election in the asynchronous full information model with a nonadaptive malicious adversary. All past protocols for asynchronous Byzantine agreement had been exponential, and no protocol for asynchronous leader election had been known. Our protocols tolerate up to (1/3 − ϵ) ⋅ n faulty processors, for any positive constant ϵ. They are Monte Carlo, succeeding with probability 1 − o(1) for Byzantine agreement, and constant probability for leader election. A key technical contribution of our article is a new approach for emulating Feige's lightest bin protocol, even with adversarial message scheduling.

Journal ArticleDOI
TL;DR: An automata-theoretic approach is described that is able to solve problems about the competitive ratio of online algorithms, and the memory they require, by reducing them to questions about determinization and approximated determinization of weighted automata.
Abstract: We describe an automata-theoretic approach for the competitive analysis of online algorithms. Our approach is based on weighted automata, which assign to each input word a cost in R≥0. By relating the “unbounded look ahead” of optimal offline algorithms with nondeterminism, and relating the “no look ahead” of online algorithms with determinism, we are able to solve problems about the competitive ratio of online algorithms, and the memory they require, by reducing them to questions about determinization and approximated determinization of weighted automata.

Journal ArticleDOI
TL;DR: An upper bound of k^2 log n is established on the number of bits used in a label in a labeling scheme for the vertex connectivity function on general graphs.
Abstract: This article studies labeling schemes for the vertex connectivity function on general graphs. We consider the problem of assigning short labels to the nodes of any n-node graph in such a way that given the labels of any two nodes u and v, one can decide whether u and v are k-vertex connected in G, that is, whether there exist k vertex-disjoint paths connecting u and v. This article establishes an upper bound of k^2 log n on the number of bits used in a label. The best previous upper bound for the label size of such a labeling scheme is 2^k log n.

Journal ArticleDOI
TL;DR: In this paper, an O(min{√n·log k, √k})-approximation algorithm for the k-forest problem was given.
Abstract: The k-forest problem is a common generalization of both the k-MST and the dense-k-subgraph problems. Formally, given a metric space on n vertices V, with m demand pairs ⊆ V × V and a “target” k ≤ m, the goal is to find a minimum cost subgraph that connects at least k pairs. In this paper, we give an O(min{√n·log k, √k})-approximation algorithm for k-forest, improving on the previous best ratio of O(min{n^{2/3}, √m} log n) by Segev and Segev. We then apply our algorithm for k-forest to obtain approximation algorithms for several Dial-a-Ride problems. The basic Dial-a-Ride problem is the following: given an n-point metric space with m objects each with its own source and destination, and a vehicle capable of carrying at most k objects at any time, find the minimum length tour that uses this vehicle to move each object from its source to destination. We want the tour to be non-preemptive: that is, each object, once picked up at its source, is dropped only at its destination. We prove that an α-approximation algorithm for the k-forest problem implies an O(α·log^2 n)-approximation algorithm for Dial-a-Ride. Using our results for k-forest, we get an O(min{√n, √k}·log^2 n)-approximation algorithm for Dial-a-Ride. The only previous result known for Dial-a-Ride was an O(√k log n)-approximation by Charikar and Raghavachari; our results give a different proof of a similar approximation guarantee; in fact, when the vehicle capacity k is large, we give a slight improvement on their results. The reduction from Dial-a-Ride to the k-forest problem is fairly robust, and allows us to obtain approximation algorithms (with the same guarantee) for some interesting generalizations of Dial-a-Ride.

Journal ArticleDOI
TL;DR: Hierarchical facility costs model multilevel service installation; here a multilevel version of the problem is considered, and a constant factor approximation algorithm, independent of the number of levels, is given for the case of identical costs on all facilities.
Abstract: We introduce a facility location problem with submodular facility cost functions, and give an O(log n) approximation algorithm for it. Then we focus on a special case of submodular costs, called hierarchical facility costs, and give a (4.237 + ϵ)-approximation algorithm using local search. The hierarchical facility costs model multilevel service installation. Shmoys et al. [2004] gave a constant factor approximation algorithm for a two-level version of the problem. Here we consider a multilevel problem, and give a constant factor approximation algorithm, independent of the number of levels, for the case of identical costs on all facilities.

Journal ArticleDOI
TL;DR: It is shown that a simple recursion describes the Hamilton cycle and that the cycle can be generated by an iterative algorithm that uses O(n) space.
Abstract: We show how to construct an explicit Hamilton cycle in the directed Cayley graph C→({σ_n, σ_{n−1}} : S_n), where σ_k is the rotation (1 2 … k). The existence of such cycles was shown by Jackson [1996] but the proof only shows that a certain directed graph is Eulerian, and Knuth [2005] asks for an explicit construction. We show that a simple recursion describes our Hamilton cycle and that the cycle can be generated by an iterative algorithm that uses O(n) space. Moreover, the algorithm produces each successive edge of the cycle in constant time; such algorithms are said to be loopless. Finally, our Hamilton cycle can be used to construct an explicit universal cycle for the (n−1)-permutations of an n-set, or as the basis of an efficient algorithm for generating every n-permutation of an n-set within a circular array or linked list.

Journal ArticleDOI
TL;DR: In this article, the authors presented an alternative to parametric search that applies to both the nongeodesic and geodesic Fréchet optimization problems, which is based on a variant of red-blue intersections.
Abstract: We present an alternative to parametric search that applies to both the nongeodesic and geodesic Fréchet optimization problems. This randomized approach is based on a variant of red-blue intersections and is appealing due to its elegance and practical efficiency when compared to parametric search. We introduce the first algorithm to compute the geodesic Fréchet distance between two polygonal curves A and B inside a simple bounding polygon P. The geodesic Fréchet decision problem is solved almost as fast as its nongeodesic sibling in O(N^2 log k) time and O(k+N) space after O(k) preprocessing, where N is the larger of the complexities of A and B and k is the complexity of P. The geodesic Fréchet optimization problem is solved by a randomized approach in O(k + N^2 log kN log N) expected time and O(k + N^2) space. This runtime is only a logarithmic factor larger than the standard nongeodesic Fréchet algorithm [Alt and Godau 1995]. Results are also presented for the geodesic Fréchet distance in a polygonal domain with obstacles and the geodesic Hausdorff distance for sets of points or sets of line segments inside a simple polygon P.
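
For background, the discrete, nongeodesic cousin of these distances has a compact dynamic program (in the style of Eiter and Mannila's classical algorithm). The sketch below computes the discrete Fréchet distance between two point sequences; it makes no attempt at the paper's continuous geodesic machinery.

```python
# Hedged sketch: discrete Frechet distance between two polylines.
import math
from functools import lru_cache

def discrete_frechet(A, B):
    """Discrete Frechet distance between point sequences A and B."""
    d = lambda p, q: math.dist(p, q)

    @lru_cache(maxsize=None)
    def c(i, j):   # min over couplings of the max leash length up to (i, j)
        if i == 0 and j == 0:
            return d(A[0], B[0])
        opts = []
        if i > 0: opts.append(c(i - 1, j))
        if j > 0: opts.append(c(i, j - 1))
        if i > 0 and j > 0: opts.append(c(i - 1, j - 1))
        return max(min(opts), d(A[i], B[j]))

    return c(len(A) - 1, len(B) - 1)

print(discrete_frechet([(0, 0), (1, 0), (2, 0)],
                       [(0, 1), (1, 1), (2, 1)]))   # 1.0
```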

Journal ArticleDOI
TL;DR: An O(√opt)-approximation algorithm for the maximum leaf spanning arborescence problem is presented, where opt is the number of leaves in an optimal spanning arborescence; incorporating an O(1)-approximation algorithm for willow graphs as a subroutine in a local improvement algorithm gives the bound for general directed graphs.
Abstract: We present an O(√opt)-approximation algorithm for the maximum leaf spanning arborescence problem, where opt is the number of leaves in an optimal spanning arborescence. The result is based upon an O(1)-approximation algorithm for a special class of directed graphs called willows. Incorporating the method for willow graphs as a subroutine in a local improvement algorithm gives the bound for general directed graphs.

Journal ArticleDOI
TL;DR: It is obtained that the class of planar graphs belongs to category (1); in contrast, outerplanar and series-parallel graphs belong to category (2).
Abstract: Let C be a class of labeled connected graphs, and let Cn be a graph drawn uniformly at random from graphs in C that contain exactly n vertices. Denote by b(e; Cn) the number of blocks (i.e., maximal biconnected subgraphs) of Cn that contain exactly e vertices, and let lb(Cn) be the number of vertices in a largest block of Cn. We show that under certain general assumptions on C, Cn belongs with high probability to one of the following categories: (1) lb(Cn) ∼ cn, for some explicitly given c = c(C), and the second largest block is of order n^α, where 1 > α = α(C); or (2) lb(Cn) = O(log n), that is, all blocks contain at most logarithmically many vertices. Moreover, in both cases we show that the quantity b(e; Cn) is concentrated for all e and we determine its expected value. As a corollary we obtain that the class of planar graphs belongs to category (1). In contrast to that, outerplanar and series-parallel graphs belong to category (2).

Journal ArticleDOI
TL;DR: The standard notion of a family of perfect hash functions is generalized to the new notion of balanced families, which requires the number of functions that are 1-1 on each set S to be almost the same (taking δ to be close to 1) for every such S.
Abstract: The construction of perfect hash functions is a well-studied topic. In this article, this concept is generalized with the following definition. We say that a family of functions from [n] to [k] is a δ-balanced (n,k)-family of perfect hash functions if for every S ⊆ [n], |S| = k, the number of functions that are 1-1 on S is between T/δ and δT for some constant T > 0. The standard definition of a family of perfect hash functions requires that there will be at least one function that is 1-1 on S, for each S of size k. In the new notion of balanced families, we require the number of 1-1 functions to be almost the same (taking δ to be close to 1) for every such S. Our main result is that for any constant δ > 1, a δ-balanced (n,k)-family of perfect hash functions of size 2^{O(k log log k)} log n can be constructed in time 2^{O(k log log k)} n log n. Using the technique of color-coding we can apply our explicit constructions to devise approximation algorithms for various counting problems in graphs. In particular, we exhibit a deterministic polynomial-time algorithm for approximating both the number of simple paths of length k and the number of simple cycles of size k for any k ≤ O(log n/log log log n) in a graph with n vertices. The approximation is up to any fixed desirable relative error.
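
The color-coding step that balanced families derandomize is easy to sketch: color the vertices with k colors and count "colorful" k-vertex paths by dynamic programming over color subsets; since a fixed k-path is colorful with probability k!/k^k, rescaling the expected colorful count estimates the number of simple k-paths. The random-coloring toy below stands in for the paper's explicit balanced families.

```python
# Hedged sketch of color-coding for counting simple k-vertex paths.
import random

def count_colorful_paths(adj, colors, k):
    n = len(adj)
    # dp[(S, v)] = number of colorful paths using color set S, ending at v
    dp = {(frozenset([colors[v]]), v): 1 for v in range(n)}
    for _ in range(k - 1):
        ndp = {}
        for (S, v), cnt in dp.items():
            for u in adj[v]:
                if colors[u] not in S:        # extend only to a new color
                    key = (S | {colors[u]}, u)
                    ndp[key] = ndp.get(key, 0) + cnt
        dp = ndp
    return sum(dp.values()) // 2              # each path counted from both ends

# 4-cycle, k = 3: the simple 3-vertex paths are the four "corners".
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
trials = [count_colorful_paths(adj, [random.randrange(3) for _ in range(4)], 3)
          for _ in range(200)]
# E[colorful count] = (#simple k-paths) * k!/k^k; invert to estimate
print(sum(trials) / len(trials) * (3 ** 3) / 6)   # approximately 4
```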

Journal ArticleDOI
TL;DR: This article begins the study of the problem of optimizing the minimum value in the presence of observable distributions, shows that this problem is NP-hard, and presents greedy algorithms with good performance bounds.
Abstract: In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system optimization and monitoring schemes can be improved by spending resources such as time or bandwidth in observing or resolving the values of these parameters. In a resource-constrained situation, deciding which parameters to observe in order to best optimize the expected system performance (or in general, optimize the expected value of a certain objective function) itself becomes an interesting optimization problem. In this article, we initiate the study of such problems, which we term “model-driven optimization”. In particular, we study the problem of optimizing the minimum value in the presence of observable distributions. We show that this problem is NP-hard, and present greedy algorithms with good performance bounds. The proofs of the performance bounds are via novel submodularity arguments and connections to covering integer programs.