
Showing papers in "Algorithmica in 2001"


Journal ArticleDOI
TL;DR: An approximation algorithm is developed for the problem of computing the dense k-vertex subgraph of a given graph, namely, the subgraph on k vertices with the most edges, with approximation ratio O(n^δ) for some δ < 1/3.
Abstract: This paper considers the problem of computing the dense k-vertex subgraph of a given graph, namely, the subgraph on k vertices with the most edges. An approximation algorithm is developed for the problem, with approximation ratio O(n^δ), for some δ < 1/3.

596 citations


Journal ArticleDOI
TL;DR: In this paper, a new Reactive Local Search (RLS) algorithm is proposed for the solution of the maximum-clique problem, which is based on local search complemented by a feedback (history-sensitive) scheme to determine the amount of diversification.
Abstract: A new Reactive Local Search (RLS) algorithm is proposed for the solution of the Maximum-Clique problem. RLS is based on local search complemented by a feedback (history-sensitive) scheme to determine the amount of diversification. The reaction acts on the single parameter that decides the temporary prohibition of selected moves in the neighborhood, in a manner inspired by Tabu Search. The performance obtained in computational tests appears to be significantly better than that of all algorithms tested at the second DIMACS implementation challenge. The worst-case complexity per iteration of the algorithm is O(max{n, m}), where n and m are the numbers of nodes and edges of the graph. In practice, when a vertex is moved, the number of operations tends to be proportional to its number of missing edges, and therefore the iterations are particularly fast in dense graphs.
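As a rough illustration of the prohibition mechanism described above, here is a minimal Python sketch of a local search for maximum clique with a fixed prohibition tenure; the function name and the fixed tenure are simplifications of ours (RLS itself tunes the tenure reactively from the search history).

```python
import random

def prohibition_local_search_clique(adj, iterations=10000, tenure=10, seed=0):
    """Simplified prohibition-based local search for maximum clique (sketch).

    adj maps each vertex to the set of its neighbours.  A vertex dropped from
    the current clique is prohibited from re-entering for `tenure` iterations;
    the actual RLS algorithm adapts this tenure reactively.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    clique, best = set(), set()
    prohibited_until = {v: -1 for v in vertices}

    for t in range(iterations):
        # Vertices adjacent to every clique member and currently allowed.
        candidates = [v for v in vertices
                      if v not in clique and prohibited_until[v] <= t
                      and clique <= adj[v]]
        if candidates:
            clique.add(rng.choice(candidates))        # expand the clique
        elif clique:
            v = rng.choice(list(clique))              # stuck: drop a vertex
            clique.remove(v)
            prohibited_until[v] = t + tenure          # ...and prohibit it
        if len(clique) > len(best):
            best = set(clique)
    return best
```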

262 citations


Journal ArticleDOI
TL;DR: An O(n^5 log n)-time algorithm for determining whether for some translated copy the resemblance gets below a given ρ is presented, thus improving the previous result of Alt, Mehlhorn, Wagener, and Welzl by a factor of almost n.
Abstract: Let A and B be two sets of n objects in R^d, and let Match be a (one-to-one) matching between A and B. Let min(Match), max(Match), and Σ(Match) denote the length of the shortest edge, the length of the longest edge, and the sum of the lengths of the edges of Match, respectively. Bottleneck matching, a matching that minimizes max(Match), is suggested as a convenient way of measuring the resemblance between A and B. Several algorithms for computing, as well as approximating, this resemblance are proposed. The running time of all the algorithms involving planar objects is roughly O(n^1.5). For instance, if the objects are points in the plane, the running time of the exact algorithm is O(n^1.5 log n). A semidynamic data structure for answering containment problems for a set of congruent disks in the plane is developed. This data structure may be of independent interest. Next, the problem of finding a translation of B that maximizes the resemblance to A under the bottleneck matching criterion is considered. When A and B are point-sets in the plane, an O(n^5 log n)-time algorithm for determining whether for some translated copy the resemblance gets below a given ρ is presented, thus improving the previous result of Alt, Mehlhorn, Wagener, and Welzl by a factor of almost n. This result is used to compute the smallest such ρ in time O(n^5 log^2 n), and an efficient approximation scheme for this problem is also given. The uniform matching problem (also called the balanced assignment problem, or the fair matching problem) is to find Match*_U, a matching that minimizes max(Match) - min(Match). A minimum deviation matching Match*_D is a matching that minimizes (1/n)Σ(Match) - min(Match). Algorithms for computing Match*_U and Match*_D in roughly O(n^{10/3}) time are presented. These algorithms are more efficient than the previous O(n^4)-time algorithms of Martello, Pulleyblank, Toth, and de Werra, and of Gupta and Punnen, who studied these problems for general bipartite graphs.
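The planar algorithms above rely on geometric data structures to reach roughly O(n^1.5) time; as a generic (and slower) illustration of what a bottleneck matching is, the sketch below binary-searches over the candidate distances and tests each threshold with a simple augmenting-path bipartite matching. All names are ours and this is not the paper's algorithm.

```python
from math import hypot

def kuhn_perfect_matching(n, allowed):
    """Return True iff the bipartite graph given by allowed[i] (lists of
    right-vertex indices reachable from left vertex i) has a perfect matching."""
    match_right = [-1] * n

    def try_augment(i, seen):
        for j in allowed[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    return all(try_augment(i, set()) for i in range(n))

def bottleneck_matching_cost(A, B):
    """Smallest r such that A and B admit a perfect matching using only pairs
    at distance <= r.  A generic threshold/binary-search sketch, not the
    paper's O(n^1.5 log n) geometric algorithm."""
    n = len(A)
    dist = [[hypot(a[0] - b[0], a[1] - b[1]) for b in B] for a in A]
    thresholds = sorted({d for row in dist for d in row})
    lo, hi = 0, len(thresholds) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        r = thresholds[mid]
        allowed = [[j for j in range(n) if dist[i][j] <= r] for i in range(n)]
        if kuhn_perfect_matching(n, allowed):
            hi = mid
        else:
            lo = mid + 1
    return thresholds[lo]

print(bottleneck_matching_cost([(0, 0), (2, 0)], [(0, 1), (5, 0)]))  # -> 3.0
```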

187 citations


Journal ArticleDOI
TL;DR: It is shown that a simple threat-based strategy is optimal and its competitive ratio is determined which yields, for realistic values of the problem parameters, surprisingly low competitive ratios.
Abstract: This paper is concerned with the time series search and one-way trading problems. In the (time series) search problem a player is searching for the maximum (or minimum) price in a sequence that unfolds sequentially, one price at a time. Once during this game the player can decide to accept the current price p in which case the game ends and the player's payoff is p . In the one-way trading problem a trader is given the task of trading dollars to yen. Each day, a new exchange rate is announced and the trader must decide how many dollars to convert to yen according to the current rate. The game ends when the trader trades his entire dollar wealth to yen and his payoff is the number of yen acquired.
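The abstract does not spell out the threat-based strategy itself. As a small related illustration only, the classical reservation-price policy for the search problem with prices known to lie in [m, M] accepts the first price of at least sqrt(Mm) and is sqrt(M/m)-competitive; the sketch below shows that simpler policy, not the paper's threat-based strategy, and the function name is ours.

```python
from math import sqrt

def reservation_price_search(prices, m, M):
    """Accept the first price >= sqrt(M*m); fall back to the last price.

    Classical reservation-price policy for the time series search problem with
    prices known to lie in [m, M]; it is sqrt(M/m)-competitive.  Shown as an
    illustration only, not the threat-based strategy analysed in the paper.
    """
    threshold = sqrt(M * m)
    for p in prices:
        if p >= threshold:
            return p
    return prices[-1]   # forced to accept the final quoted price

# Example: prices known to lie in [1, 100], so the reservation price is 10.
print(reservation_price_search([4, 7, 12, 90], m=1, M=100))  # -> 12
```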

179 citations


Journal ArticleDOI
TL;DR: A new, quite general model for branching dynamical systems is introduced; the contraction method can be applied in this model, which includes many classical examples of random trees and gives a general framework for further applications.
Abstract: In this paper we give an introduction to the analysis of algorithms by the contraction method. By means of this method several interesting classes of recursions can be analyzed as particular cases of our general framework. We introduce the main steps of this technique, which is based on contraction properties of the algorithm with respect to suitable probability metrics. Typically the limiting distribution is characterized as a fixed point of a limiting operator on the class of probability distributions. We explain this method in the context of several “divide and conquer” algorithms. In the second part of the paper we introduce a new, quite general model for branching dynamical systems and explain how the contraction method can be applied in this model. This model includes many classical examples of random trees and gives a general framework for further applications.
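A typical instance of such a fixed-point characterization is the one for the normalized number of Quicksort comparisons; the display below is a standard formulation of that equation, shown here as an illustration of the framework rather than quoted from the paper.

```latex
% Limiting fixed-point equation for the normalized Quicksort comparison count:
% U is uniform on [0,1]; X' is an independent copy of X, both independent of U.
X \overset{d}{=} U\,X + (1-U)\,X' + C(U),
\qquad C(u) = 1 + 2u\ln u + 2(1-u)\ln(1-u).
```

The contraction method then shows that this equation has a unique solution among centered distributions with finite second moment and that the normalized Quicksort costs converge to it.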

177 citations


Journal ArticleDOI
TL;DR: This paper proves that if the input polygon has no holes, there is a constant δ > 0 such that no polynomial-time algorithm can achieve an approximation ratio of 1 + δ for each of these guard problems, and it shows inapproximability of the POINT GUARD problem for polygons with holes.
Abstract: Past research on art gallery problems has concentrated almost exclusively on bounds on the numbers of guards needed in the worst case in various settings. On the complexity side, fewer results are available. For instance, it has long been known that placing a smallest number of guards for a given input polygon is NP -hard. In this paper we initiate the study of the approximability of several types of art gallery problems.

166 citations


Journal ArticleDOI
TL;DR: This paper considers the problem of efficiently serving a sequence of requests presented in an on-line fashion located at points of a metric space, called the On-Line Travelling Salesman Problem, and derives a lower bound on the competitive ratio of 2 on the real line.
Abstract: In this paper the problem of efficiently serving a sequence of requests presented in an on-line fashion located at points of a metric space is considered. We call this problem the On-Line Travelling Salesman Problem (OLTSP). It has a variety of relevant applications in logistics and robotics. We consider two versions of the problem. In the first one the server is not required to return to the departure point after all presented requests have been served. For this problem we derive a lower bound of 2 on the competitive ratio on the real line. In addition, a 2.5-competitive algorithm for a wide class of metric spaces and a 7/3-competitive algorithm for the real line are provided. For the other version of the problem, in which returning to the departure point is required, we present an optimal 2-competitive algorithm for the above-mentioned general class of metric spaces. If in this case the metric space is the real line, we present a 1.75-competitive algorithm, which compares with a lower bound of approximately 1.64.

154 citations


Journal ArticleDOI
TL;DR: A hierarchical data structure for representing a digital terrain which contains approximations of the terrain at different levels of detail based on triangulations of the underlying two-dimensional space using right-angled triangles is described.
Abstract: We describe a hierarchical data structure for representing a digital terrain (height field) which contains approximations of the terrain at different levels of detail. The approximations are based on triangulations of the underlying two-dimensional space using right-angled triangles. The methods we discuss permit a single approximation to have a varying level of accuracy across the surface. Thus, for example, the area close to an observer may be represented with greater detail than areas which lie outside the observer's field of view. We discuss the application of this hierarchical data structure to the problem of interactive terrain visualization. We point out some of the advantages of this method in terms of memory usage and speed.

153 citations


Journal ArticleDOI
TL;DR: An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries (“ternary search tries”) to determine the probabilistic behaviour of the main parameters.
Abstract: Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries (“ternary search tries”). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely, size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.

130 citations


Journal ArticleDOI
TL;DR: A method of selection based on the typification principle is presented: it creates a result with fewer objects but preserves the initial pattern of distribution, as well as the similarities and differences between groups of buildings with regard to density, size, and orientation.
Abstract: Cartographic generalization aims to represent geographical information on a map whose specifications are different from those of the original database. Generalization often implies scale reduction, which generates legibility problems. To be readable at a smaller scale, geographical objects often need to be enlarged, which generates problems of overlapping features or map congestion. To manage this problem with respect to buildings, we present a method of selection based on the typification principle that creates a result with fewer objects but preserves the initial pattern of distribution. For this we use a proximity graph on the building set, which is analysed and segmented with respect to various criteria taken from Gestalt theory. This analysis provides geographical information that is attached to each group of buildings, such as the mean size of buildings, the shape of the group, and the density. This information is independent of scale. The information from the analysis stage is used to define methods to represent the groups at the target scale. The aim is to preserve the pattern as far as possible, as well as the similarities and differences between the groups with regard to density, size, and orientation of buildings. We present some results that have been obtained using the Stratege platform, developed in the COGIT laboratory at the Institut Geographique National, Paris.

129 citations


Journal ArticleDOI
Guy Kortsarz
TL;DR: It is proved that, for every fixed k, approximating the spanner problem is at least as hard as approximating the set-cover problem.
Abstract: A k-spanner of a connected graph G=(V,E) is a subgraph G' consisting of all the vertices of V and a subset of the edges, with the additional property that the distance between any two vertices in G' is larger than the distance in G by no more than a factor of k. This paper concerns the hardness of finding spanners with a number of edges close to the optimum. It is proved that for every fixed k, approximating the spanner problem is at least as hard as approximating the set-cover problem. We also consider a weighted version of the spanner problem, and prove an essential difference between the approximability of the case k = 2 and the case k ≥ 5.

Journal ArticleDOI
TL;DR: This work describes an efficient algorithm to multicolor optimally any weighted even or odd length cycle representing a cellular network, and demonstrates an approximation algorithm which guarantees that no more than 4/3 times the minimum number of required colors are used.
Abstract: A cellular network is generally modeled as a subgraph of the triangular lattice. In the static frequency assignment problem, each vertex of the graph is a base station in the network, and has associated with it an integer weight that represents the number of calls that must be served at the vertex by assigning distinct frequencies per call. The edges of the graph model interference constraints for frequencies assigned to neighboring stations. The static frequency assignment problem can be abstracted as a graph multicoloring problem. We describe an efficient algorithm to multicolor optimally any weighted even or odd length cycle representing a cellular network. This result is further extended to any outerplanar graph. For the problem of multicoloring an arbitrary connected subgraph of the triangular lattice, we demonstrate an approximation algorithm which guarantees that no more than 4/3 times the minimum number of required colors are used. Further, we show that this algorithm can be implemented in a distributed manner, where each station needs to have knowledge only of the weights at a small neighborhood.
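For the even-cycle case mentioned above, the optimum span equals the largest demand sum over an edge, and an optimal multicoloring is easy to write down: one parity class takes colors from the bottom of the palette and the other from the top. The sketch below (function name ours) illustrates only this even-cycle case, not the odd-cycle algorithm or the 4/3-approximation for triangular-lattice subgraphs.

```python
def multicolor_even_cycle(weights):
    """Optimally multicolor an even-length weighted cycle.

    weights[i] is the demand of vertex i (vertices 0..n-1 around the cycle,
    n even).  Vertices of one parity class take colors from the bottom of the
    palette, the others from the top; adjacent color sets are disjoint because
    span >= w(u) + w(v) holds for every edge.  Even-cycle case only.
    """
    n = len(weights)
    assert n % 2 == 0, "sketch handles even cycles only"
    span = max(weights[i] + weights[(i + 1) % n] for i in range(n))
    colors = []
    for i, w in enumerate(weights):
        if i % 2 == 0:
            colors.append(set(range(1, w + 1)))                # bottom of palette
        else:
            colors.append(set(range(span - w + 1, span + 1)))  # top of palette
    return span, colors

# Example: demands 3, 2, 4, 1 on a 4-cycle need exactly 6 colors.
print(multicolor_even_cycle([3, 2, 4, 1])[0])  # -> 6
```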

Journal ArticleDOI
TL;DR: In this paper, a bicriteria approximation algorithm was proposed for the Steiner tree problem with two objectives: the total cost of the edges and nodes in the network and the maximum degree of any node in the graph.
Abstract: We study network-design problems with two different design objectives: the total cost of the edges and nodes in the network and the maximum degree of any node in the network. A prototypical example is the degree-constrained node-weighted Steiner tree problem: We are given an undirected graph G(V,E), with a non-negative integral function d that specifies an upper bound d(v) on the degree of each vertex v∈V in the Steiner tree to be constructed, nonnegative costs on the nodes, and a subset of k nodes called terminals. The goal is to construct a Steiner tree T containing all the terminals such that the degree of any node v in T is at most the specified upper bound d(v) and the total cost of the nodes in T is minimum. Our main result is a bicriteria approximation algorithm whose output is approximate in terms of both the degree and cost criteria—the degree of any node v∈V in the output Steiner tree is O(d(v)log k) and the cost of the tree is O(log k) times that of a minimum-cost Steiner tree that obeys the degree bound d(v) for each node v. Our result extends to the more general problem of constructing one-connected networks such as generalized Steiner forests. We also consider the special case in which the edge costs obey the triangle inequality and present simple approximation algorithms with better performance guarantees.

Journal ArticleDOI
TL;DR: Two variants of the classic knapsack problem, the CMKP and the FPP, are shown to provide efficient solutions for two fundamental problems arising in multimedia storage subsystems.
Abstract: We study two variants of the classic knapsack problem, in which we need to place items of different types in multiple knapsacks; each knapsack has a limited capacity, and a bound on the number of different types of items it can hold: in the class-constrained multiple knapsack problem (CMKP) we wish to maximize the total number of packed items; in the fair placement problem (FPP) our goal is to place the same (large) portion from each set. We look for a perfect placement, in which both problems are solved optimally. We first show that the two problems are NP-hard; we then consider some special cases, where a perfect placement exists and can be found in polynomial time. For other cases, we give approximate solutions. Finally, we give a nearly optimal solution for the CMKP. Our results for the CMKP and the FPP are shown to provide efficient solutions for two fundamental problems arising in multimedia storage subsystems.

Journal ArticleDOI
Uwe Rösler
TL;DR: This paper develops general tools for the analysis of stochastic divide and conquer algorithms and analyse the average performance and the running time distribution of the (2k + 1)-median version of Quicksort.
Abstract: This paper develops general tools for the analysis of stochastic divide and conquer algorithms. We concentrate on the average performance and the distribution of the running time of the algorithm. As a special example we analyse the average performance and the running time distribution of the (2k + 1)-median version of Quicksort.
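For concreteness, the variant analysed here chooses each pivot as the median of a sample of 2k+1 elements; a minimal sketch (function name ours):

```python
import random

def quicksort_median_of_2k_plus_1(xs, k=1, rng=random.Random(0)):
    """Quicksort whose pivot is the median of a random sample of 2k+1 elements
    (the variant whose running-time distribution the paper analyses)."""
    if len(xs) <= 2 * k + 1:
        return sorted(xs)
    sample = rng.sample(xs, 2 * k + 1)
    pivot = sorted(sample)[k]                       # median of the sample
    smaller = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    larger = [x for x in xs if x > pivot]
    return (quicksort_median_of_2k_plus_1(smaller, k, rng)
            + equal
            + quicksort_median_of_2k_plus_1(larger, k, rng))

print(quicksort_median_of_2k_plus_1([5, 3, 8, 1, 9, 2, 7], k=1))
# -> [1, 2, 3, 5, 7, 8, 9]
```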

Journal ArticleDOI
TL;DR: The maximum uniquely restricted matching problem is shown to be NP-complete for bipartite graphs, split graphs, and hence for chordal graphs and comparability graphs, but can be solved in linear time for threshold graphs, proper interval graphs, cacti, and block graphs.
Abstract: A matching in a graph is a set of edges no two of which share a common vertex. In this paper we introduce a new, specialized type of matching which we call uniquely restricted matchings, originally motivated by the problem of determining a lower bound on the rank of a matrix having a specified zero/ non-zero pattern. A uniquely restricted matching is defined to be a matching M whose saturated vertices induce a subgraph which has only one perfect matching, namely M itself. We introduce the two problems of recognizing a uniquely restricted matching and of finding a maximum uniquely restricted matching in a given graph, and present algorithms and complexity results for certain special classes of graphs. We demonstrate that testing whether a given matching M is uniquely restricted can be done in O(|M||E|) time for an arbitrary graph G=(V,E) and in linear time for cacti, interval graphs, bipartite graphs, split graphs and threshold graphs. The maximum uniquely restricted matching problem is shown to be NP-complete for bipartite graphs, split graphs, and hence for chordal graphs and comparability graphs, but can be solved in linear time for threshold graphs, proper interval graphs, cacti and block graphs.
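A matching is uniquely restricted exactly when there is no alternating cycle with respect to it; in the bipartite case this can be checked by orienting matched edges from one side to the other, the remaining edges the other way, and testing the resulting digraph for acyclicity. The sketch below illustrates that bipartite special case only (names and conventions ours), not the paper's O(|M||E|) routine for general graphs.

```python
def is_uniquely_restricted_bipartite(edges, matching):
    """Test whether matching M is uniquely restricted in a bipartite graph.

    edges: list of (u, v) pairs with u in the left class, v in the right class
           (matching edges must appear in this list as well);
    matching: the edges of M.
    M is uniquely restricted iff there is no M-alternating cycle; in bipartite
    graphs this is equivalent to acyclicity of the digraph that orients matched
    edges left->right and all other edges right->left, restricted to saturated
    vertices.  Sketch of the idea only.
    """
    saturated = {v for e in matching for v in e}
    matching_set = {frozenset(e) for e in matching}
    succ = {v: [] for v in saturated}
    for u, v in edges:
        if u not in saturated or v not in saturated:
            continue
        if frozenset((u, v)) in matching_set:
            succ[u].append(v)               # matched edge: left -> right
        else:
            succ[v].append(u)               # unmatched edge: right -> left

    WHITE, GRAY, BLACK = 0, 1, 2
    state = {v: WHITE for v in succ}

    def has_cycle(v):                       # iterative DFS back-edge test
        stack = [(v, iter(succ[v]))]
        state[v] = GRAY
        while stack:
            node, it = stack[-1]
            for w in it:
                if state[w] == GRAY:
                    return True
                if state[w] == WHITE:
                    state[w] = GRAY
                    stack.append((w, iter(succ[w])))
                    break
            else:
                state[node] = BLACK
                stack.pop()
        return False

    return not any(state[v] == WHITE and has_cycle(v) for v in succ)
```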

Journal ArticleDOI
TL;DR: This work obtains curious explicit evaluations for certain moments of the Airy distribution, including moments of orders -1, -3, -5, etc., as well as +1/3, -5/3, -11/3, etc.
Abstract: The Airy distribution (of the "area" type) occurs as a limit distribution of cumulative parameters in a number of combinatorial structures, like path length in trees, area below walks, displacement in parking sequences, and it is also related to basic graph and polyomino enumeration. We obtain curious explicit evaluations for certain moments of the Airy distribution, including moments of orders -1, -3, -5, etc., as well as +1/3, -5/3, -11/3, etc. and -7/3, -13/3, -19/3, etc. Our proofs are based on integral transforms of the Laplace and Mellin type and they rely essentially on "non-probabilistic" arguments like analytic continuation. A by-product of this approach is the existence of relations between moments of the Airy distribution, the asymptotic expansion of the Airy function Ai(z) at +∞, and power symmetric functions of the zeros -α_k of Ai(z).

Journal ArticleDOI
TL;DR: A quite general model of source that comes from dynamical systems theory is introduced, and the main tool is the generalized Ruelle operator, which can be viewed as a “generating” operator for fundamental intervals (associated to information sharing common prefixes).
Abstract: A quite general model of source that comes from dynamical systems theory is introduced. Within this model, some basic problems of algorithmic information theory contexts are analysed. The main tool is a new object, the generalized Ruelle operator, which can be viewed as a "generating" operator for fundamental intervals (associated to information sharing common prefixes). Its dominant spectral objects are linked with important parameters of the source, such as the entropy, and play a central role in all the results.

Journal ArticleDOI
TL;DR: It is shown that this crust has the properties of the original, that the resulting skeleton has many practical uses, and the usefulness of the combined diagram is illustrated with various applications.
Abstract: We wish to extract the topology from scanned maps. In previous work [GNY] this was done by extracting a skeleton from the Voronoi diagram, but this required vertex labelling and was only usable for polygon maps. We wished to take the crust algorithm of Amenta et al. [ABE] and modify it to extract the skeleton from unlabelled vertices. We find that by reducing the algorithm to a local test on the original Voronoi diagram we may extract both a crust and a skeleton simultaneously, using a variant of the Quad-Edge structure of [GS]. We show that this crust has the properties of the original, and that the resulting skeleton has many practical uses. We illustrate the usefulness of the combined diagram with various applications.

Journal ArticleDOI
TL;DR: A set of rules that simplify the conflict graph without reducing the size of an optimal solution is presented, and the application of these rules combined with a simple heuristic yields near-optimal solutions.
Abstract: The general label-placement problem consists in labeling a set of features (points, lines, regions) given a set of candidates (rectangles, circles, ellipses, irregularly shaped labels) for each feature. The problem arises when annotating classical cartographical maps, diagrams, or graph drawings. The size of a labeling is the number of features that receive pairwise nonintersecting candidates. Finding an optimal solution, i.e., a labeling of maximum size, is NP-hard. We present an approach to attack the problem in its full generality. The key idea is to separate the geometric part from the combinatorial part of the problem. The latter is captured by the conflict graph of the candidates. We present a set of rules that simplify the conflict graph without reducing the size of an optimal solution. Combining the application of these rules with a simple heuristic yields near-optimal solutions. We study competing algorithms and do a thorough empirical comparison on point-labeling data. The new algorithm we suggest is fast, simple, and effective.

Journal ArticleDOI
TL;DR: This work has developed algorithms to compute approximations of shortest paths on non-convex polyhedra in both the unweighted and weighted domain and shows that the algorithms perform much better in practice and the accuracy is near-optimal.
Abstract: One common problem in computational geometry is that of computing shortest paths between two points in a constrained domain. In the context of Geographical Information Systems (GIS), terrains are often modeled as Triangular Irregular Networks (TIN), which are a special class of non-convex polyhedra. It is often necessary to compute shortest paths on the TIN surface which take into account various weights according to the terrain features. We have developed algorithms to compute approximations of shortest paths on non-convex polyhedra in both the unweighted and weighted domains. The algorithms are based on placing Steiner points along the TIN edges and then creating a graph in which we apply Dijkstra's shortest path algorithm. For two points s and t on a non-convex polyhedral surface P, our analysis bounds the approximate weighted shortest path cost as ||Π'(s,t)|| ≤ ||Π(s,t)|| + W|L|, where |L| denotes the length of the longest edge of P and W denotes the largest weight of any face of P. The worst-case time complexity is bounded by O(n^5). An alternate algorithm, based on geometric spanners, is also provided; it ensures that ||Π'(s,t)|| ≤ β(||Π(s,t)|| + W|L|) for some fixed constant β > 1, and it runs in O(n^3 log n) worst-case time. We also present detailed experimental results which show that the algorithms perform much better in practice and that the accuracy is near-optimal.
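A bare-bones version of the discretization described above might look as follows: Steiner points are placed on every triangle edge, nodes on the boundary of a common face are joined with cost equal to the face weight times the Euclidean length, and Dijkstra's algorithm is run on the resulting graph. This is a simplified sketch with names of our choosing, not the authors' implementation.

```python
import heapq
from math import dist

def approx_weighted_shortest_path(triangles, weights, s, t, steiner_per_edge=3):
    """Approximate weighted shortest path on a triangulated (TIN-like) surface.

    triangles: list of faces, each a tuple of three 3D points;
    weights:   positive weight per face;
    s, t:      3D points assumed to coincide with triangulation vertices.
    """
    edge_cache = {}

    def edge_points(a, b):
        key = tuple(sorted((tuple(a), tuple(b))))      # canonical edge orientation
        if key not in edge_cache:
            p, q = key
            edge_cache[key] = [tuple(p[k] + j / (steiner_per_edge + 1) * (q[k] - p[k])
                                     for k in range(3))
                               for j in range(1, steiner_per_edge + 1)]
        return edge_cache[key]

    graph = {}                                         # node -> list of (nbr, cost)
    for tri, w in zip(triangles, weights):
        boundary = [tuple(v) for v in tri]
        for i in range(3):
            boundary += edge_points(tri[i], tri[(i + 1) % 3])
        for i in range(len(boundary)):                 # connect nodes of this face
            for j in range(i + 1, len(boundary)):
                p, q = boundary[i], boundary[j]
                c = w * dist(p, q)
                graph.setdefault(p, []).append((q, c))
                graph.setdefault(q, []).append((p, c))

    best, heap = {tuple(s): 0.0}, [(0.0, tuple(s))]    # Dijkstra from s
    while heap:
        d, u = heapq.heappop(heap)
        if u == tuple(t):
            return d
        if d > best[u]:
            continue
        for v, c in graph.get(u, []):
            if d + c < best.get(v, float("inf")):
                best[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return float("inf")
```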

Journal ArticleDOI
TL;DR: The main focus of the work was on designing and engineering a triangulation algorithm that is completely reliable, easy to implement, and fast in practice; different strategies (geometric hashing, bounding-volume trees) for speeding up the ear-clipping process in practice are reported.
Abstract: We discuss a triangulation algorithm that is based on repeatedly clipping ears of a polygon. The main focus of our work was on designing and engineering an algorithm that is (1) completely reliable, (2) easy to implement, and (3) fast in practice. The algorithm was implemented in ANSI C, based on floating-point arithmetic. Due to a series of heuristics that get applied as a back-up for the standard ear-clipping process if the code detects deficiencies in the input polygon, our triangulation code can handle any type of polygonal input data, be it simple or not. Based on our implementation we report on different strategies (geometric hashing, bounding-volume trees) for speeding up the ear-clipping process in practice. The code has been tuned accordingly, and cpu-time statistics document that it tends to be faster than other popular triangulation codes. All engineering details that ensure the reliability and efficiency of the triangulation code are described in full detail. We also report experimental data on how different strategies for avoiding sliver triangles affect the cpu-time consumption of the algorithm. Our code, named FIST as an acronym for fast industrial-strength triangulation, forms the core of a package for triangulating the faces of three-dimensional polyhedra, and it has been successfully incorporated into several industrial graphics packages, including an implementation for Java 3D by Sun Microsystems.
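For reference, the core ear-clipping step that FIST builds on can be sketched in a few lines for a simple CCW polygon; none of the paper's robustness heuristics, hashing, or bounding-volume speed-ups appear here, and the function names are ours.

```python
def triangulate_ear_clipping(poly):
    """Triangulate a simple polygon given as a CCW list of (x, y) vertices.

    Textbook O(n^3) ear clipping: repeatedly find a convex vertex whose ear
    triangle contains no other polygon vertex and clip it.
    """
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def point_in_triangle(p, a, b, c):
        return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0

    verts = list(poly)
    triangles = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            prev, cur, nxt = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(prev, cur, nxt) <= 0:            # reflex or collinear: not an ear
                continue
            ear = (prev, cur, nxt)
            if any(point_in_triangle(p, *ear) for p in verts if p not in ear):
                continue
            triangles.append(ear)                     # clip the ear
            del verts[i]
            break
        else:
            raise ValueError("no ear found; polygon may not be simple and CCW")
    triangles.append(tuple(verts))
    return triangles

print(triangulate_ear_clipping([(0, 0), (4, 0), (4, 3), (0, 3)]))
```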

Journal ArticleDOI
TL;DR: A randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text and the number of matches is counted for each position.
Abstract: We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case, and to higher dimensions.
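One way to realize the convolution-based estimation idea is to map every symbol to a random point on the complex unit circle, cross-correlate the text with the conjugated pattern via FFTs, and take real parts: matching positions contribute 1, mismatching positions contribute 0 in expectation, so averaging a few independent repetitions estimates the score vector. The sketch below illustrates this generic idea under those assumptions; the paper's exact estimator and variance analysis may differ, and the names are ours.

```python
import numpy as np

def estimate_match_scores(text, pattern, repetitions=8, seed=0):
    """Estimate, for every alignment k, the number of positions where
    pattern matches text[k : k+M] (the score vector).

    Symbols are mapped to random unit-modulus complex numbers; the real part
    of the FFT-based cross-correlation is an unbiased estimate of the match
    count at each offset, and repetitions are averaged to reduce the variance.
    """
    rng = np.random.default_rng(seed)
    N, M = len(text), len(pattern)
    alphabet = sorted(set(text) | set(pattern))
    size = N + M                                  # FFT length >= N + M - 1
    acc = np.zeros(N - M + 1)

    for _ in range(repetitions):
        phase = {s: np.exp(1j * rng.uniform(0, 2 * np.pi)) for s in alphabet}
        tx = np.array([phase[c] for c in text])
        py = np.conj([phase[c] for c in pattern])[::-1]   # reversed, conjugated
        corr = np.fft.ifft(np.fft.fft(tx, size) * np.fft.fft(py, size))
        # corr[k + M - 1] = sum_j tx[k + j] * conj(phase[pattern[j]])
        acc += corr[M - 1 : N].real

    return acc / repetitions

text, pattern = "abracadabra", "abr"
print(np.round(estimate_match_scores(text, pattern), 1))
```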

Journal ArticleDOI
TL;DR: This work gives a 2/3 approximation algorithm for one version of the geometric dispersion problem, which is strongly polynomial in the size of the input, i.e., its running time does not depend on the area of P .
Abstract: We consider problems of distributing a number of points within a polygonal region P, such that the points are "far away" from each other. Problems of this type have been considered before for the case where the possible locations form a discrete set. Dispersion problems are closely related to packing problems. While Hochbaum and Maass [20] have given a polynomial-time approximation scheme for packing, we show that geometric dispersion problems cannot be approximated arbitrarily well in polynomial time, unless P = NP. A special case of this observation solves an open problem by Rosenkrantz et al. [31]. We give a 2/3 approximation algorithm for one version of the geometric dispersion problem. This algorithm is strongly polynomial in the size of the input, i.e., its running time does not depend on the area of P. We also discuss extensions and open problems.

Journal ArticleDOI
TL;DR: In this paper, a sequence of new linear-time, bounded-space, on-line bin packing algorithms, the K-Bounded Best Fit algorithms (BBF_K), is presented.
Abstract: We present a sequence of new linear-time, bounded-space, on-line bin packing algorithms, the K-Bounded Best Fit algorithms (BBF_K). They are based on the Θ(n log n) Best Fit algorithm in much the same way as the Next-K Fit algorithms are based on the Θ(n log n) First Fit algorithm. Unlike the Next-K Fit algorithms, whose asymptotic worst-case ratios approach the limiting value of 17/10 from above as K → ∞ but never reach it, these new algorithms have worst-case ratio 17/10 for all K ≥ 2. They also have substantially better average performance than their bounded-space competition, as we have determined based on extensive experimental results summarized here for instances with item sizes drawn independently and uniformly from intervals of the form (0, u], 0 < u ≤ 1. Indeed, for each u < 1, it appears that there exists a fixed memory bound K(u) such that BBF_{K(u)} obtains significantly better packings on average than does the First Fit algorithm, even though the latter requires unbounded storage and has a significantly greater running time. For u = 1, BBF_K can still outperform First Fit (and essentially equal Best Fit) if K is allowed to grow slowly. We provide both theoretical and experimental results concerning the growth rates required.
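A compact sketch of the packing rule, under our reading of it (each item goes to the tightest-fitting open bin; when nothing fits and K bins are already open, the fullest bin is closed before a new one is opened):

```python
def bounded_best_fit(items, K=2, capacity=1.0):
    """K-Bounded Best Fit bin packing (sketch of the rule as we read it).

    Each item is placed in the open bin with the least residual capacity that
    still fits it.  If no open bin fits the item and K bins are already open,
    the fullest open bin is closed first.  Returns the number of bins used.
    """
    open_bins, closed = [], []            # open_bins holds current load levels
    for item in items:
        candidates = [i for i, load in enumerate(open_bins)
                      if load + item <= capacity + 1e-12]
        if candidates:
            i = max(candidates, key=lambda i: open_bins[i])   # tightest fit
            open_bins[i] += item
        else:
            if len(open_bins) == K:
                fullest = max(range(K), key=lambda i: open_bins[i])
                closed.append(open_bins.pop(fullest))
            open_bins.append(item)
    return len(closed) + len(open_bins)

print(bounded_best_fit([0.6, 0.5, 0.5, 0.4, 0.3, 0.7], K=2))  # -> 3
```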

Journal ArticleDOI
TL;DR: A method for clustering geo-referenced data suitable for applications in spatial data mining, based on the medoid method, which incorporates both proximity and density information to achieve high-quality clusters in subquadratic time.
Abstract: In this paper we present a method for clustering geo-referenced data suitable for applications in spatial data mining, based on the medoid method. The medoid method is related to k-means, with the restriction that cluster representatives be chosen from among the data elements. Although the medoid method in general produces clusters of high quality, especially in the presence of noise, it is often criticized for the Ω(n^2) time that it requires. Our method incorporates both proximity and density information to achieve high-quality clusters in subquadratic time; it does not require that the user specify the number of clusters in advance. The time bound is achieved by means of a fast approximation to the medoid objective function, using Delaunay triangulations to store proximity information.
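As background for the objective being approximated, a plain k-medoids loop (alternating assignment and medoid update) is sketched below; this is the quadratic-time baseline, while the paper's contribution is the Delaunay-based approximation of the medoid objective that avoids the full Ω(n^2) work. Names are ours.

```python
from math import dist
import random

def k_medoids(points, k, iterations=20, seed=0):
    """Plain k-medoids clustering (alternating assignment / medoid update).

    Repeatedly assign each point to its nearest medoid, then replace each
    medoid by the cluster member minimising the sum of distances to the
    cluster.  Quadratic-time baseline, not the paper's accelerated method.
    """
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            i = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[i].append(p)
        new_medoids = [                       # medoid update step
            min(c, key=lambda m: sum(dist(m, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters
```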

Journal ArticleDOI
TL;DR: It is proved that appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel–Ziv code.
Abstract: For a Markovian source, we analyze the Lempel–Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models: In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., as a prefix of the previous (i-1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, called the Gilbert–Kadota model, a fixed number of phrases is generated according to the Lempel–Ziv algorithm, thus producing a sequence of a variable (random) length. In the last model, known also as the Lempel–Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees that are of interest to other algorithms such as sorting, searching, and pattern matching. In this paper we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that the appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel–Ziv code. For the Markov Independent model, this finding is established by analytic methods (i.e., generating functions, Mellin transform, and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.
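The parsing rule common to the three models is simple to state in code: each new phrase is the shortest prefix of the remaining input that has not occurred as a phrase before (the final phrase may be a repeat). A minimal sketch, with names of our choosing:

```python
def lempel_ziv_parse(s):
    """Parse s into phrases, each being the shortest prefix of the remaining
    input not previously seen as a phrase (the last phrase may repeat one)."""
    phrases, seen = [], set()
    i = 0
    while i < len(s):
        j = i + 1
        while j <= len(s) and s[i:j] in seen:
            j += 1
        phrase = s[i:j]          # shortest unseen prefix, or the final remainder
        phrases.append(phrase)
        seen.add(phrase)
        i = j
    return phrases

print(lempel_ziv_parse("ababaaab"))  # -> ['a', 'b', 'ab', 'aa', 'ab']
```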

Journal ArticleDOI
TL;DR: This work is an excellent example of a complex theoretical analysis of algorithms being used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.
Abstract: We study a recent algorithm for fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on a very fast kernel which is able to search short patterns using a nondeterministic finite automaton, which is simulated using bit-parallelism. A number of techniques to extend this kernel for longer patterns are presented in that work. However, the techniques can be integrated in many ways and the optimal interplay among them is by no means obvious. The solution to this problem starts at a very low level, by obtaining basic probabilistic information about the problem which was not previously known, and ends by integrating analytical results with empirical data to obtain the optimal heuristic. The conclusions obtained via analysis are experimentally confirmed. We also improve many of the techniques and obtain a combined heuristic which is faster than the original work. This work is an excellent example of a complex theoretical analysis of algorithms being used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.

Journal ArticleDOI
TL;DR: In this article, the authors consider the online load balancing problem where there are m identical machines (servers) and a sequence of jobs and show that for the sum of the squares the greedy algorithm performs within 4/3 of the optimum, and no on-line algorithm achieves a better competitive ratio.
Abstract: We consider the on-line load balancing problem where there are m identical machines (servers) and a sequence of jobs. The jobs arrive one by one and should be assigned to one of the machines in an on-line fashion. The goal is to minimize the sum (over all machines) of the squares of the loads, instead of the traditional maximum load. We show that for the sum of the squares the greedy algorithm performs within 4/3 of the optimum, and no on-line algorithm achieves a better competitive ratio. Interestingly, we show that the performance of Greedy is not monotone in the number of machines. More specifically, the competitive ratio is 4/3 for any number of machines divisible by 3 but strictly less than 4/3 in all the other cases (although it approaches 4/3 for a large number of machines). To prove that Greedy is optimal, we show a lower bound of 4/3 for any algorithm for three machines. Surprisingly, we provide a new on-line algorithm that performs within 4/3 - δ of the optimum, for some fixed δ > 0, for any sufficiently large number of machines. This implies that the asymptotic competitive ratio of our new algorithm is strictly better than the competitive ratio of any possible on-line algorithm. Such a phenomenon is not known to occur for the classic maximum load problem. Minimizing the sum of the squares is equivalent to minimizing the load vector with respect to the l_2 norm. We extend our techniques and analyze the exact competitive ratio of Greedy with respect to the l_p norm. This ratio turns out to be 2 - Θ((ln p)/p). We show that Greedy is optimal for two machines but design an algorithm whose asymptotic competitive ratio is better than the ratio of Greedy.
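The greedy algorithm analysed here admits a one-line description: assign each arriving job to the machine where it increases the sum of squared loads the least, which is always a currently least-loaded machine since (L+p)^2 - L^2 grows with L. A minimal sketch (names ours):

```python
import heapq

def greedy_sum_of_squares(jobs, m):
    """On-line Greedy for minimising the sum of squared machine loads.

    Each job goes to a currently least-loaded machine, which is exactly the
    machine where it increases the sum of squares the least.  Returns the
    final loads and the resulting objective value.
    """
    loads = [0.0] * m
    heap = [(0.0, i) for i in range(m)]       # (load, machine) min-heap
    heapq.heapify(heap)
    for p in jobs:
        load, i = heapq.heappop(heap)          # least-loaded machine
        loads[i] = load + p
        heapq.heappush(heap, (loads[i], i))
    return loads, sum(l * l for l in loads)

loads, cost = greedy_sum_of_squares([3, 3, 2, 2, 2], m=3)
print(loads, cost)   # -> [5.0, 3.0, 4.0] 50.0
```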

Journal ArticleDOI
TL;DR: In this paper, a new approach for finding an optimal edge ranking of a tree is presented, improving the time complexity to linear; this is the fastest known algorithm for the problem.
Abstract: Given a tree, finding an optimal node ranking and finding an optimal edge ranking are interesting computational problems. The former problem already has a linear time algorithm in the literature. For the latter, only recently polynomial time algorithms have been revealed, and the best known algorithm requires more than quadratic time. In this paper we present a new approach for finding an optimal edge ranking of a tree, improving the time complexity to linear.