
Showing papers in "SIAM Journal on Computing in 2003"


Journal ArticleDOI
TL;DR: This work proposes a fully functional identity-based encryption (IBE) scheme based on bilinear maps between groups, gives precise definitions for secure IBE schemes, and presents several applications for such systems.
Abstract: We propose a fully functional identity-based encryption (IBE) scheme. The scheme has chosen ciphertext security in the random oracle model assuming a variant of the computational Diffie--Hellman problem. Our system is based on bilinear maps between groups. The Weil pairing on elliptic curves is an example of such a map. We give precise definitions for secure IBE schemes and give several applications for such systems.

5,110 citations


Journal ArticleDOI
TL;DR: A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
Abstract: In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$. We show by a matching lower bound that this is the best possible. We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of N strategies, then our algorithm approaches the per-round payoff of the strategy at the rate $O((\log N)^{1/2} T^{-1/2})$. Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate $O(T^{-1/2})$.
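The adversarial guarantee comes from the authors' Exp3 family of algorithms. The following is a simplified sketch of the core exponential-weighting idea with importance-weighted reward estimates; the parameter choices and reward interface here are illustrative, not the paper's tuned constants:

```python
import math
import random

def exp3(K, T, reward_fn, gamma=0.1, seed=0):
    """Simplified Exp3 sketch: exponential weights over K arms.

    reward_fn(t, arm) -> reward in [0, 1]; may be chosen adversarially.
    Returns the total reward collected over T plays.
    """
    rng = random.Random(seed)
    weights = [1.0] * K
    total = 0.0
    for t in range(T):
        wsum = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        x = reward_fn(t, arm)
        total += x
        # Importance-weighted estimate keeps the update unbiased:
        # only the pulled arm's weight changes.
        xhat = x / probs[arm]
        weights[arm] *= math.exp(gamma * xhat / K)
    return total
```

Against an oblivious sequence in which one arm dominates, the per-round reward quickly concentrates on that arm, matching the exploration/exploitation trade-off described above.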

2,370 citations


Journal ArticleDOI
TL;DR: The proposed data structure for representing all distances in a graph is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v.
Abstract: Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems to Internet routing. Some of these applications involve huge graphs and yet require fast query answering. We propose a new data structure for representing all distances in a graph. The data structure is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v. Our labels are based on 2-hop covers of the shortest paths, or of all paths, in a graph. For shortest paths, such a cover is a collection S of shortest paths such that, for every two vertices u and v, there is a shortest path from u to v that is a concatenation of two paths from S. We describe an efficient algorithm for finding an almost optimal 2-hop cover of a given collection of paths. Our approach is general and can be applied to directed or undirected graphs, exact or approximate shortest paths, or to reachability queries. We study the proposed data structure using a combination of theoretical and experimental means. We implemented our algorithm and checked the size of the resulting data structure on several real-life networks from different application areas. Our experiments show that the total size of the labels is typically not much larger than the network itself, and is usually considerably smaller than an explicit representation of the transitive closure of the network.
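A 2-hop distance query is just a minimum over the hubs shared by the two labels. A minimal sketch of the query side (the toy labels are built by hand for a path a - b - c; the paper's contribution is the algorithm that constructs small labels, which is not shown here):

```python
def hub_distance(label_u, label_v):
    """2-hop distance query: minimum of d(u,h) + d(h,v) over hubs h
    present in both labels. Each label maps hub -> distance to that hub."""
    return min(
        (du + label_v[h] for h, du in label_u.items() if h in label_v),
        default=float("inf"),
    )

# Hand-built labels for the path a - b - c, using b as the shared hub.
labels = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"b": 1, "c": 0},
}
```

The total size of all labels plays the role of the "2-hop cover" size that the paper's algorithm minimizes.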

457 citations


Journal ArticleDOI
TL;DR: A space-efficient, one-pass algorithm for approximating the L1-difference between two functions, when the function values ai and bi are given as data streams, and their order is chosen by an adversary.
Abstract: Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and the monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce "synopses" or "sketches" for further processing. Moreover, network-generated massive data sets are often distributed: Several different, physically separated network elements may receive or generate data streams that, together, comprise one logical data set; to be of use in operations, the streams must be analyzed locally and their synopses sent to a central operations facility. The enormous scale, distributed nature, and one-pass processing requirement on the data sets of interest must be addressed with new algorithmic techniques. We present one fundamental new technique here: a space-efficient, one-pass algorithm for approximating the L1-difference $\sum_i|a_i-b_i|$ between two functions, when the function values ai and bi are given as data streams, and their order is chosen by an adversary. Our main technical innovation, which may be of interest outside the realm of massive data stream algorithmics, is a method of constructing families $\{V_j(s)\}$ of limited-independence random variables that are range-summable, by which we mean that $\sum_{j=0}^{c-1} V_j(s)$ is computable in time polylog(c) for all seeds s. Our L1-difference algorithm can be viewed as a "sketching" algorithm, in the sense of [Broder et al., J. Comput. System Sci., 60 (2000), pp. 630--659], and our technique performs better than that of Broder et al. when used to approximate the symmetric difference of two sets with small symmetric difference.
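To illustrate what "range-summable" means, here is a toy family of $\pm 1$ variables $V_j(s) = (-1)^{\langle j, s\rangle}$ (parity of the bits shared by j and the seed s). This is a much simpler family than the limited-independence construction in the paper, but it shows the key property: the prefix sum $\sum_{j=0}^{c-1} V_j(s)$ is computable in O(log c) time by dyadic decomposition rather than O(c) time naively.

```python
def fast_range_sum(c, s):
    """Compute sum_{j=0}^{c-1} (-1)^popcount(j & s) in O(log c) time.

    [0, c) is split into dyadic blocks, one per set bit of c; within a
    block the free low bits contribute 2^k if s has no set bit there,
    and 0 otherwise (the signs cancel in pairs).
    """
    total = 0
    sign = 1  # parity of (fixed high bits of j) & s, tracked as we descend
    for k in range(max(c.bit_length(), 1) - 1, -1, -1):
        if (c >> k) & 1:
            low_mask = (1 << k) - 1
            if s & low_mask == 0:
                total += sign * (1 << k)
            # Continuing to match c fixes bit k of j to 1; that flips
            # the parity iff s has bit k set.
            if (s >> k) & 1:
                sign = -sign
    return total

def brute_range_sum(c, s):
    """O(c) reference implementation for checking."""
    return sum((-1) ** bin(j & s).count("1") for j in range(c))
```

The paper's family additionally needs limited independence across seeds; this sketch only demonstrates the range-summability mechanics.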

226 citations


Journal ArticleDOI
TL;DR: Domatic number is made the first natural maximization problem (known to the authors) that is provably approximable to within polylogarithmic factors but no better.
Abstract: A set of vertices in a graph is a dominating set if every vertex outside the set has a neighbor in the set. The domatic number problem is that of partitioning the vertices of a graph into the maximum number of disjoint dominating sets. Let n denote the number of vertices, $\delta$ the minimum degree, and $\Delta$ the maximum degree. We show that every graph has a domatic partition with $(1 - o(1))(\delta + 1)/\ln n$ dominating sets and, moreover, that such a domatic partition can be found in polynomial time. This implies a $(1 + o(1))\ln n$-approximation algorithm for domatic number, since the domatic number is always at most $\delta + 1$. We also show this to be essentially best possible. Namely, extending the approximation hardness of set cover by combining multiprover protocols with zero-knowledge techniques, we show that for every $\epsilon > 0$, a $(1 - \epsilon)\ln n$-approximation implies that $NP \subseteq DTIME(n^{O(\log\log n)})$. This makes domatic number the first natural maximization problem (known to the authors) that is provably approximable to within polylogarithmic factors but no better. We also show that every graph has a domatic partition with $(1 - o(1))(\delta + 1)/\ln \Delta$ dominating sets, where the "o(1)" term goes to zero as $\Delta$ increases. This can be turned into an efficient algorithm that produces a domatic partition of $\Omega(\delta/\ln \Delta)$ sets.
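A domatic partition can be checked directly from the definitions above. A small sketch (the example graph, the 6-cycle, is chosen for illustration; since its minimum degree is 2, its domatic number is at most 3, and the partition below attains that):

```python
def is_dominating(adj, subset):
    """True iff every vertex is in `subset` or has a neighbor in it."""
    s = set(subset)
    return all(v in s or any(u in s for u in adj[v]) for v in adj)

def is_domatic_partition(adj, parts):
    """True iff `parts` partitions the vertex set into dominating sets."""
    flat = [v for p in parts for v in p]
    return (sorted(flat) == sorted(adj)
            and all(is_dominating(adj, p) for p in parts))

# The 6-cycle: vertex v is adjacent to v-1 and v+1 (mod 6).
cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
```

The paper's algorithms produce such partitions with $\Omega(\delta/\ln \Delta)$ parts; this snippet only verifies one.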

195 citations


Journal ArticleDOI
TL;DR: A theory of resource-bounded dimension is developed using gales, which are natural generalizations of martingales, and when the resource bound $\Delta$ (a parameter of the theory) is unrestricted, the resulting dimension is precisely the classical Hausdorff dimension.
Abstract: A theory of resource-bounded dimension is developed using gales, which are natural generalizations of martingales. When the resource bound $\Delta$ (a parameter of the theory) is unrestricted, the resulting dimension is precisely the classical Hausdorff dimension (sometimes called "fractal dimension"). Other choices of the parameter $\Delta$ yield internal dimension theories in E, E2, ESPACE, and other complexity classes, and in the class of all decidable problems. In general, if $\mathcal{C}$ is such a class, then every set X of languages has a dimension in $\mathcal{C}$, which is a real number $\dim (X \mid \mathcal{C}) \in [0, 1]$. Along with the elements of this theory, two preliminary applications are presented: For every real number $0 \le \alpha \le \frac 1 2$, the set ${\rm FREQ}(\le \alpha)$, consisting of all languages that asymptotically contain at most $\alpha$ of all strings, has dimension $\mathcal{H}(\alpha)$---the binary entropy of $\alpha$---in E and in E2. For every real number $0 \le \alpha \le 1$, the set ${\rm SIZE}(\alpha \frac {2^n} n)$, consisting of all languages decidable by Boolean circuits of at most $\alpha \frac {2^n} n$ gates, has dimension $\alpha$ in ESPACE.

167 citations


Journal ArticleDOI
TL;DR: This work addresses the challenge of computing the similarity of two strings in subquadratic time for metrics which use a scoring matrix of unrestricted weights and presents an algorithm for comparing two run-length encoded strings of length m and n, compressed into m' and n' runs, respectively, in O(m'n + n'm) complexity.
Abstract: Given two strings of size $n$ over a constant alphabet, the classical algorithm for computing the similarity between two sequences [D. Sankoff and J. B. Kruskal, eds., Time Warps, String Edits, and Macromolecules, Addison--Wesley, Reading, MA, 1983; T. F. Smith and M. S. Waterman, J. Molec. Biol., 147 (1981), pp. 195--197] uses a dynamic programming matrix and compares the two strings in $O(n^2)$ time. We address the challenge of computing the similarity of two strings in subquadratic time for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel--Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an $O(n^2 / \log n)$ algorithm for an input of constant alphabet size. For most texts, the time complexity is actually $O(h n^2 / \log n)$, where $h \le 1$ is the entropy of the text. We also present an algorithm for comparing two run-length encoded strings of length m and n, compressed into m' and n' runs, respectively, in O(m'n + n'm) complexity. This result extends to all distance or similarity scoring schemes that use an additive gap penalty.
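The quadratic baseline being accelerated is the standard global-alignment dynamic program with an arbitrary substitution score and an additive gap penalty. A compact sketch (the unit scoring used in the test is a hypothetical choice; the paper allows any scoring matrix):

```python
def global_similarity(a, b, score, gap):
    """O(|a|*|b|) global alignment: score(x, y) for aligning two symbols,
    `gap` (typically negative) for an insertion or deletion."""
    m, n = len(a), len(b)
    prev = [j * gap for j in range(n + 1)]  # row i-1 of the DP matrix
    for i in range(1, m + 1):
        cur = [i * gap] + [0] * n
        for j in range(1, n + 1):
            cur[j] = max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitute
                         prev[j] + gap,                            # delete
                         cur[j - 1] + gap)                         # insert
        prev = cur
    return prev[n]
```

The paper replaces the cell-by-cell fill of this matrix with Lempel--Ziv-induced blocks to reach $O(n^2/\log n)$ time.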

156 citations


Journal ArticleDOI
TL;DR: The external interval tree is presented, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals that uses a weight-balancing technique for efficient worst-case manipulation of balanced trees.
Abstract: In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, have recently been used to develop several efficient external data structures.

154 citations


Journal ArticleDOI
TL;DR: It is shown that duality of two monotone CNFs can be disproved with limited nondeterminism; more precisely, this is feasible in polynomial time with $O(\log^2 n/\log\log n)$ suitably guessed bits.
Abstract: We consider the problem of dualizing a monotone CNF (equivalently, computing all minimal transversals of a hypergraph) whose associated decision problem is a prominent open problem in NP-completeness. We present a number of new polynomial time, respectively, output-polynomial time results for significant cases, which largely advance the tractability frontier and improve on previous results. Furthermore, we show that duality of two monotone CNFs can be disproved with limited nondeterminism. More precisely, this is feasible in polynomial time with $O(\log^2 n/\log\log n)$ suitably guessed bits. This result sheds new light on the complexity of this important problem.
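The underlying decision problem asks whether two monotone CNFs f and g are mutually dual, i.e., whether f(x) = ¬g(¬x) for every assignment x. A brute-force sketch of that test (exponential in the number of variables, for illustration only; the point of the paper is to do far better):

```python
from itertools import product

def eval_cnf(cnf, assignment):
    """cnf: list of clauses, each a set of variable indices (monotone:
    all literals positive). assignment: dict var -> bool."""
    return all(any(assignment[v] for v in clause) for clause in cnf)

def are_dual(f, g, n):
    """Brute-force duality check over all 2^n assignments:
    dual iff f(x) != g(complement of x) everywhere."""
    for bits in product([False, True], repeat=n):
        x = dict(enumerate(bits))
        neg_x = {v: not b for v, b in x.items()}
        if eval_cnf(f, x) == eval_cnf(g, neg_x):
            return False  # a witness that f and g are not dual
    return True
```

For example, the dual of $x_0 \vee x_1$ is $x_0 \wedge x_1$, written in CNF as the two unit clauses $\{x_0\}$ and $\{x_1\}$.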

149 citations


Journal ArticleDOI
TL;DR: This paper presents an algebraic characterization of the important class of discrete cosine and sine transforms as decomposition matrices of certain regular modules associated with four series of Chebyshev polynomials.
Abstract: It is known that the discrete Fourier transform (DFT) used in digital signal processing can be characterized in the framework of the representation theory of algebras, namely, as the decomposition matrix for the regular module ${\mathbb{C}}[Z_n] = {\mathbb{C}}[x]/(x^n - 1)$. This characterization provides deep insight into the DFT and can be used to derive and understand the structure of its fast algorithms. In this paper we present an algebraic characterization of the important class of discrete cosine and sine transforms as decomposition matrices of certain regular modules associated with four series of Chebyshev polynomials. Then we derive most of their known algorithms by pure algebraic means. We identify the mathematical principle behind each algorithm and give insight into its structure. Our results show that the connection between algebra and digital signal processing is stronger than previously understood.

128 citations


Journal ArticleDOI
TL;DR: It follows that for those relaxations known to be efficiently computable, namely, for r=O(1), the value of the relaxation is comparable to the theta function.
Abstract: Lovász and Schrijver [SIAM J. Optim., 1 (1991), pp. 166--190] devised a lift-and-project method that produces a sequence of convex relaxations for the problem of finding in a graph an independent set (or a clique) of maximum size. Each relaxation in the sequence is tighter than the one before it, while the first relaxation is already at least as strong as the Lovász theta function [IEEE Trans. Inform. Theory, 25 (1979), pp. 1--7]. We show that on a random graph $G_{n,1/2}$, the value of the rth relaxation in the sequence is roughly $\sqrt{n/2^r}$, almost surely. It follows that for those relaxations known to be efficiently computable, namely, for r=O(1), the value of the relaxation is comparable to the theta function. Furthermore, a perfectly tight relaxation is almost surely obtained only at the $r=\Theta(\log n)$ relaxation in the sequence.

Journal ArticleDOI
TL;DR: In this article, the authors introduce a new variant of the k-median problem, called the online median problem, in which facilities are placed one at a time, a facility cannot be moved once it is placed, and the total number of facilities to be placed, k, is not known in advance.
Abstract: We introduce a natural variant of the (metric uncapacitated) k-median problem that we call the online median problem. Whereas the k-median problem involves optimizing the simultaneous placement of k facilities, the online median problem imposes the following additional constraints: the facilities are placed one at a time, a facility cannot be moved once it is placed, and the total number of facilities to be placed, k, is not known in advance. The objective of an online median algorithm is to minimize the competitive ratio, that is, the worst-case ratio of the cost of an online placement to that of an optimal offline placement. Our main result is a constant-competitive algorithm for the online median problem running in time that is linear in the input size. In addition, we present a related, though substantially simpler, constant-factor approximation algorithm for the (metric uncapacitated) facility location problem that runs in time linear in the input size. The latter algorithm is similar in spirit to the recent primal-dual-based facility location algorithm of Jain and Vazirani, but our approach is more elementary and yields an improved running time. While our primary focus is on problems which ask us to minimize the weighted average service distance to facilities, we also show that our results can be generalized to hold, to within constant factors, for more general objective functions. For example, we show that all of our approximation results hold, to within constant factors, for the k-means objective function.

Journal ArticleDOI
TL;DR: The scaling scheme with the push/relabel framework is combined to yield a faster combinatorial algorithm for submodular function minimization that improves over the previously best known bound by essentially a linear factor in the size of the underlying ground set.
Abstract: Combinatorial strongly polynomial algorithms for minimizing submodular functions have been developed by Iwata, Fleischer, and Fujishige (IFF) and by Schrijver. The IFF algorithm employs a scaling scheme for submodular functions, whereas Schrijver's algorithm achieves strongly polynomial bound with the aid of distance labeling. Subsequently, Fleischer and Iwata have described a push/relabel version of Schrijver's algorithm to improve its time complexity. This paper combines the scaling scheme with the push/relabel framework to yield a faster combinatorial algorithm for submodular function minimization. The resulting algorithm improves over the previously best known bound by essentially a linear factor in the size of the underlying ground set.

Journal ArticleDOI
TL;DR: A natural generalization of the algorithm for the abelian case to the nonabelian case is analyzed and it is shown that the algorithm determines the normal core of a hidden subgroup: in particular, normal subgroups can be determined.
Abstract: The hidden subgroup problem is the foundation of many quantum algorithms. An efficient solution is known for the problem over abelian groups, employed by both Simon's algorithm and Shor's factoring and discrete log algorithms. The nonabelian case, however, remains open; an efficient solution would give rise to an efficient quantum algorithm for graph isomorphism. We fully analyze a natural generalization of the algorithm for the abelian case to the nonabelian case and show that the algorithm determines the normal core of a hidden subgroup: in particular, normal subgroups can be determined. We show, however, that this immediate generalization of the abelian algorithm does not efficiently solve graph isomorphism.

Journal ArticleDOI
TL;DR: This work presents exact algorithms for reconstructing the ancestral doubled genome in linear time, minimizing the number of rearrangement mutations required to derive the observed order of genes along the present-day chromosomes.
Abstract: The genome can be modeled as a set of strings (chromosomes) of distinguished elements called genes. Genome duplication is an important source of new gene functions and novel physiological pathways. Originally (ancestrally), a duplicated genome contains two identical copies of each chromosome, but through the genomic rearrangement mutational processes of reciprocal translocation (prefix and/or suffix exchanges between chromosomes) and substring reversals, this simple doubled structure is disrupted. At the time of observation, each of the chromosomes resulting from the accumulation of rearrangements can be decomposed into a succession of conserved segments, such that each segment appears exactly twice in the genome. We present exact algorithms for reconstructing the ancestral doubled genome in linear time, minimizing the number of rearrangement mutations required to derive the observed order of genes along the present-day chromosomes. Somewhat different techniques are required for a translocations-only model, a translocations/reversals model, both of these in the multichromosomal context (eukaryotic nuclear genomes), and a reversals-only model for single chromosome prokaryotic and organellar genomes. We apply these methods to the yeast genome, which is thought to have doubled, and to the liverwort mitochondrial genome, whose duplicate genes are unlikely to have arisen by genome doubling.

Journal ArticleDOI
TL;DR: This paper shows how to force a lower bound of $\sqrt{3}-\epsilon$ for any positive $\epsilon$, reducing the gap between the performance of known algorithms and the lower bound.
Abstract: The problem considered here is the same as the one discussed in [G. Galambos and G. J. Woeginger, eds., SIAM J. Comput., 22 (1993), pp. 349--355]. It is an m-machine online scheduling problem in which we wish to minimize the competitive ratio for the makespan objective. In this paper, we show that $\sqrt{3}$ is a lower bound on this competitive ratio for m=4. In particular, we show how to force a lower bound of $\sqrt{3}-\epsilon$ for any positive $\epsilon$. This reduces the gap between the performance of known algorithms [S. Albers, in Proceedings of the 29th Annual ACM Symposium on Theory of Computing, ACM, New York, 1997, pp. 130--139] and the lower bound. The method used introduces an approach to building the task master's strategy.

Journal ArticleDOI
TL;DR: It is shown that any concurrent zero-knowledge protocol for a nontrivial language must use at least $\tilde\Omega(\log n)$ rounds of interaction, which is the first bound to rule out the possibility of constant-round concurrentzero-knowledge when proven via black-box simulation.
Abstract: We show that any concurrent zero-knowledge protocol for a nontrivial language (i.e., for a language outside ${\cal BPP}$), whose security is proven via black-box simulation, must use at least $\tilde\Omega(\log n)$ rounds of interaction. This result achieves a substantial improvement over previous lower bounds and is the first bound to rule out the possibility of constant-round concurrent zero-knowledge when proven via black-box simulation. Furthermore, the bound is polynomially related to the number of rounds in the best known concurrent zero-knowledge protocol for languages in ${\cal NP}$ (which is established via black-box simulation).

Journal ArticleDOI
TL;DR: The main result is a lower bound of $\Omega(m^2 \log m)$ for the size of any arithmetic circuit for the product of two matrices, over the real or complex numbers, as long as the circuit does not use products with field elements larger than 1.
Abstract: Our main result is a lower bound of $\Omega(m^2 \log m)$ for the size of any arithmetic circuit for the product of two matrices, over the real or complex numbers, as long as the circuit does not use products with field elements of absolute value larger than 1 (where m × m is the size of each matrix). That is, our lower bound is superlinear in the number of inputs and is applied for circuits that use addition gates, product gates, and products with field elements of absolute value up to 1. We also prove size-depth tradeoffs for such circuits: We show that if a circuit, as above, is of depth d, then its size is $\Omega(m^{2+ 1/O(d)})$.

Journal ArticleDOI
TL;DR: An approximation algorithm is presented for the problem of finding a minimum-cost k-vertex connected spanning subgraph, assuming that the number of vertices is at least $6k^2$; the approximation guarantee is six times the kth harmonic number.
Abstract: We present an approximation algorithm for the problem of finding a minimum-cost k-vertex connected spanning subgraph, assuming that the number of vertices is at least $6k^2$. The approximation guarantee is six times the kth harmonic number (which is O(log k)), and this is also an upper bound on the integrality ratio for a standard linear programming relaxation.

Journal ArticleDOI
TL;DR: The first main result is that if a macro tree translation is of linear size increase, then the translation is MSO definable (i.e., definable in monadic second-order logic) and if it is, then an equivalent MSO transducer can be constructed.
Abstract: The first main result is that if a macro tree translation is of linear size increase, i.e., if the size of every output tree is linearly bounded by the size of the corresponding input tree, then the translation is MSO definable (i.e., definable in monadic second-order logic). This gives a new characterization of the MSO definable tree translations in terms of macro tree transducers: they are exactly the macro tree translations of linear size increase. The second main result is that given a macro tree transducer, it can be decided whether or not its translation is MSO definable, and if it is, then an equivalent MSO transducer can be constructed. Similar results hold for attribute grammars, which define a subclass of the macro tree translations.

Journal ArticleDOI
TL;DR: An exponential gap between quantum and classical sampling complexity for the set-disjointness function is established and several variants of a definition of sampling complexity are given.
Abstract: Sampling is an important primitive in probabilistic and quantum algorithms. In the spirit of communication complexity, given a function $f: X \times Y \rightarrow \{0,1\}$ and a probability distribution ${\cal D}$ over $X \times Y$, we define the sampling complexity of $(f, {\cal D})$ as the minimum number of bits that Alice and Bob must communicate for Alice to pick $x \in X$ and Bob to pick $y \in Y$ as well as a value $z$ such that the resulting distribution of $(x,y,z)$ is close to the distribution $({\cal D}, f({\cal D}))$. In this paper we initiate the study of sampling complexity, in both the classical and quantum models. We give several variants of a definition. We completely characterize some of these variants and give upper and lower bounds on others. In particular, this allows us to establish an exponential gap between quantum and classical sampling complexity for the set-disjointness function.

Journal ArticleDOI
TL;DR: An algorithm that constructs asymptotically close-to-optimal schedules is presented, and it is shown how to achieve the largest known n for all values of h.
Abstract: The windows scheduling problem is defined by the positive integers n, h, and w1, ...,wn. There are n pages where the window wi is associated with page i, and h is the number of slotted channels available for broadcasting the pages. A schedule that solves the problem assigns pages to slots such that the gap between any two consecutive appearances of page i is at most wi slots. We investigate two optimization problems. (i) The optimal windows scheduling problem: given w1, ..., wn find a schedule in which h is minimized. (ii) The optimal harmonic windows scheduling problem: given h find a schedule for the windows wi = i in which n is maximized. The former is a formulation of the problem of minimizing the bandwidth in push systems that support guaranteed delay, and the latter is a formulation of the problem of minimizing the startup delay in media-on-demand systems. For the optimal windows scheduling problem we present an algorithm that constructs asymptotically close to optimal schedules, and for the optimal harmonic windows scheduling problem we show how to achieve the largest known n's for all values of h.
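Feasibility of a cyclic schedule is exactly the gap condition above: in the cyclic repetition of the slot sequence, consecutive appearances of page i must be at most $w_i$ slots apart. A small verifier sketch for a single channel (h = 1; the example schedule and windows are hypothetical, not drawn from the paper):

```python
def valid_window_schedule(schedule, windows):
    """schedule: list of page ids filling one channel's slots, repeated
    cyclically. windows: dict page -> window w_i. Checks that the gap
    between consecutive appearances of each page is at most w_i."""
    L = len(schedule)
    positions = {}
    for slot, page in enumerate(schedule):
        positions.setdefault(page, []).append(slot)
    for page, w in windows.items():
        pos = positions.get(page)
        if not pos:
            return False  # page never broadcast
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        gaps.append(pos[0] + L - pos[-1])  # wrap-around gap
        if max(gaps) > w:
            return False
    return True
```

In the harmonic problem ($w_i = i$), maximizing the number of pages packed into such a cyclic schedule directly bounds the startup delay of a media-on-demand system.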

Journal ArticleDOI
TL;DR: Nondeterministic quantum algorithms for Boolean functions f, which have positive acceptance probability on input x iff f(x)=1, are studied; among the consequences, the quantum communication complexities of the equality and disjointness functions are n+1 if no error probability is allowed.
Abstract: We study nondeterministic quantum algorithms for Boolean functions f. Such algorithms have positive acceptance probability on input x iff f(x)=1. In the setting of query complexity, we show that the nondeterministic quantum complexity of a Boolean function is equal to its "nondeterministic polynomial" degree. We also prove a quantum-vs.-classical gap of 1 vs. n for nondeterministic query complexity for a total function. In the setting of communication complexity, we show that the nondeterministic quantum complexity of a two-party function is equal to the logarithm of the rank of a nondeterministic version of the communication matrix. This implies that the quantum communication complexities of the equality and disjointness functions are n+1 if we do not allow any error probability. We also exhibit a total function in which the nondeterministic quantum communication complexity is exponentially smaller than its classical counterpart.

Journal ArticleDOI
TL;DR: The phylogenetic kth root problem (PRk) is an evolutionary tree reconstruction problem in which the goal is to find a phylogenetic tree T from a given graph G such that T has no degree-2 internal nodes and the external nodes (i.e., leaves) of T are exactly the vertices of G. The computational complexity of the problem is open.
Abstract: Given a set of species and their similarity data, an important problem in evolutionary biology is how to reconstruct a phylogeny (also called an evolutionary tree) so that species are close in the phylogeny if and only if they have high similarity. Assume that the similarity data are represented as a graph G = (V, E), where each vertex represents a species and two vertices are adjacent if they represent species of high similarity. The phylogeny reconstruction problem can then be abstracted as the problem of finding a (phylogenetic) tree T from the given graph G such that (1) T has no degree-2 internal nodes, (2) the external nodes (i.e., leaves) of T are exactly the elements of V, and (3) $(u, v) \in E$ if and only if $d_T(u, v) \le k$ for some fixed threshold k, where $d_T(u,v)$ denotes the distance between u and v in tree T. This is called the phylogenetic kth root problem (PRk), and such a tree T, if it exists, is called a phylogenetic kth root of graph G. The computational complexity of PRk is open, except

Journal ArticleDOI
TL;DR: A polynomial-time approximation scheme is presented to settle the problem of finding a substring s of length L that distinguishes the bad strings from the good strings.
Abstract: Consider two sets of strings, ${\cal B}$ (bad genes) and ${\cal G}$ (good genes), as well as two integers $d_b$ and $d_g$ ($d_b\leq d_g$). A frequently occurring problem in computational biology (and other fields) is to find a (distinguishing) substring s of length L that distinguishes the bad strings from good strings, i.e., such that for each string $s_i\in {\cal B}$ there exists a length-L substring ti of si with $d(s, t_i)\leq d_b$ (close to bad strings), and for every substring ui of length L of every string $g_i\in {\cal G}$, $d(s, u_i)\geq d_g$ (far from good strings).We present a polynomial time approximation scheme to settle the problem; i.e., for any constant $\epsilon >0$, the algorithm finds a string s of length L such that for every $s_i\in {\cal B}$ there is a length-L substring ti of si with $d(t_i, s)\leq (1+\epsilon) d_b$, and for every substring ui of length L of every $g_i\in {\cal G}$, $d(u_i, s)\geq (1-\epsilon) d_g$ if a solution to the original pair ($d_b\leq d_g$) exists. Since the...

Journal ArticleDOI
TL;DR: The known algorithms for computing a selected entry of the extended Euclidean algorithm for integers and, consequently, for the modular and numerical rational number reconstruction problems are accelerated.
Abstract: We accelerate the known algorithms for computing a selected entry of the extended Euclidean algorithm for integers and, consequently, for the modular and numerical rational number reconstruction problems. The acceleration is from quadratic to nearly linear time, matching the known complexity bound for the integer gcd, which our algorithm computes as a special case.
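For contrast with the accelerated method, the classical quadratic-time route to rational number reconstruction runs the extended Euclidean remainder/cofactor sequence until the remainder drops below a bound. A sketch of that baseline (the function name and bound convention are illustrative, not the paper's):

```python
def rational_reconstruction(a, m, bound):
    """Find n, d with n/d == a (mod m), |n| <= bound, 0 < d <= bound,
    via the classical (quadratic-time) extended Euclidean sequence.
    Invariant: r_i == t_i * a (mod m) at every step."""
    r0, r1 = m, a % m
    t0, t1 = 0, 1
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if t1 == 0 or abs(t1) > bound:
        return None  # no admissible reconstruction
    # normalize so the denominator is positive
    return (r1, t1) if t1 > 0 else (-r1, -t1)
```

For example, 3/7 reduces to 29 modulo 100 (since 7 * 29 = 203 = 3 mod 100), and the sequence recovers the pair (3, 7).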

Journal ArticleDOI
TL;DR: New algorithms for the variable-sized online bin packing problem are presented and upper bounds for them are shown which improve on the best previous upper bounds and show the first general lower bounds for this problem.
Abstract: In the variable-sized online bin packing problem, one has to assign items to bins one by one. The bins are drawn from some fixed set of sizes, and the goal is to minimize the sum of the sizes of the bins used. We present new algorithms for this problem and show upper bounds for them which improve on the best previous upper bounds. We also show the first general lower bounds for this problem. The case in which bins of two sizes, 1 and $\alpha \in (0,1)$, are used is studied in detail. This investigation leads us to the discovery of several interesting fractal-like curves.
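To make the cost model concrete, here is a plain first-fit heuristic for the variable-sized setting; this is an illustrative baseline only, not one of the paper's improved algorithms:

```python
def first_fit_variable(items, sizes=(1.0, 0.5)):
    """Assign items online, first-fit style: reuse the first open bin
    with room, else open a new bin of the smallest size that fits.
    Cost is the sum of the capacities of all bins opened."""
    bins = []  # each bin is [capacity, current load]
    for x in items:
        for b in bins:
            if b[1] + x <= b[0]:
                b[1] += x
                break
        else:
            cap = min((c for c in sizes if c >= x), default=None)
            if cap is None:
                raise ValueError("item larger than every bin size")
            bins.append([cap, x])
    return sum(b[0] for b in bins)
```

With sizes 1 and α = 0.5, three items of size 0.4 each force three α-bins under first-fit (cost 1.5), which is the kind of trade-off between opening small and large bins that the analysis studies.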

Journal ArticleDOI
TL;DR: This work presents a deterministic algorithm that verifies whether a given m by m bipartite graph G is regular, in the sense of Szemeredi, and makes use of linear-sized expanders to accomplish a suitable form of deterministic sampling.
Abstract: We present a deterministic algorithm ${\cal A}$ that, in O(m2) time, verifies whether a given m by m bipartite graph G is regular, in the sense of Szemeredi [Regular partitions of graphs, in Problemes Combinatoires et Theorie des Graphes (Orsay, 1976), Colloques Internationaux CNRS 260, CNRS, Paris, 1978, pp. 399--401]. In the case in which G is not regular enough, our algorithm outputs a witness to this irregularity. Algorithm ${\cal A}$ may be used as a subroutine in an algorithm that finds an $\varepsilon$-regular partition of a given n-vertex graph $\Gamma$ in time O(n2). This time complexity is optimal, up to a constant factor, and improves upon the bound O(M(n)), proved by Alon et al. [The algorithmic aspects of the regularity lemma, J. Algorithms, 16 (1994), pp. 80--109], where M(n)=O(n2.376) is the time required to square a 0--1 matrix over the integers. Our approach is elementary, except that it makes use of linear-sized expanders to accomplish a suitable form of deterministic sampling.
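The property being verified can be stated by brute force on a tiny example: a bipartite pair (U, W) is ε-regular if every pair of subsets A ⊆ U, B ⊆ W with |A| ≥ ε|U| and |B| ≥ ε|W| has density within ε of d(U, W). The exponential-time sketch below is only a definition check; the paper's contribution is an equivalent deterministic test in O(m²) time:

```python
import math
from itertools import chain, combinations

def density(edges, A, B):
    """Edge density between vertex subsets A and B."""
    e = sum(1 for u in A for v in B if (u, v) in edges)
    return e / (len(A) * len(B))

def subsets_at_least(nodes, k):
    """All subsets of nodes of size at least k."""
    return chain.from_iterable(combinations(nodes, r)
                               for r in range(k, len(nodes) + 1))

def is_eps_regular(edges, U, W, eps):
    """Brute-force check of eps-regularity for a tiny bipartite graph."""
    d = density(edges, U, W)
    for A in subsets_at_least(U, max(1, math.ceil(eps * len(U)))):
        for B in subsets_at_least(W, max(1, math.ceil(eps * len(W)))):
            if abs(density(edges, A, B) - d) > eps:
                return False
    return True
```

A complete bipartite graph is ε-regular for any ε, while a single edge between two 2-element sides already violates 0.5-regularity.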

Journal ArticleDOI
TL;DR: A quasi-polynomial time approximation scheme (QPTAS) for this problem when the instance is a weighted tree, when the nodes lie in $\mathbb{R}^d$ for some fixed d, and for planar graphs is presented.
Abstract: The minimum latency problem, also known as the traveling repairman problem, is a variant of the traveling salesman problem in which the starting node of the tour is given and the goal is to minimize the sum of the arrival times at the other nodes. We present a quasi-polynomial time approximation scheme (QPTAS) for this problem when the instance is a weighted tree, when the nodes lie in $\mathbb{R}^d$ for some fixed d, and when the graph is planar. We also present a polynomial time constant factor approximation algorithm for the general metric case. The currently best polynomial time approximation algorithm for general metrics, due to Goemans and Kleinberg, computes a 3.59-approximation.
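The objective differs from TSP tour length: each node contributes its arrival time, so early nodes are counted many times over. A small sketch computing the latency of a given tour (the distance-matrix representation is an assumption for illustration):

```python
def latency(dist, tour):
    """Sum of arrival times at all nodes after the start,
    where dist[u][v] is the length of edge (u, v)."""
    t, total = 0, 0
    for u, v in zip(tour, tour[1:]):
        t += dist[u][v]   # arrival time at v
        total += t
    return total
```

On a unit-length path 0-1-2, visiting in order [0, 1, 2] gives latency 1 + 2 = 3, while [0, 2, 1] gives 2 + 3 = 5: the visit order matters beyond the total distance traveled.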

Journal ArticleDOI
TL;DR: New techniques to combine algorithms for unfair metrical task systems are introduced and these techniques are applied to obtain improved randomized online algorithms for metricaltask systems on arbitrary metric spaces.
Abstract: Unfair metrical task systems are a generalization of online metrical task systems. In this paper we introduce new techniques to combine algorithms for unfair metrical task systems and apply these techniques to obtain improved randomized online algorithms for metrical task systems on arbitrary metric spaces.
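In a metrical task system, an algorithm occupies a state of a metric space and, for each arriving task, pays the movement cost to its chosen state plus that state's processing cost for the task. A sketch of the cost of a given (offline) schedule, with hypothetical argument names:

```python
def mts_cost(metric, schedule, tasks, start):
    """Cost of serving a task sequence: for each task, the distance
    moved in the metric plus the task's processing cost in the chosen
    state. metric[u][v] is the distance; tasks[i][s] is the cost of
    serving task i in state s."""
    cost, state = 0, start
    for task, nxt in zip(tasks, schedule):
        cost += metric[state][nxt] + task[nxt]
        state = nxt
    return cost
```

With two states at distance 1 and tasks that alternate which state is expensive, moving every step costs only the movement, while staying put pays the processing penalty; online algorithms must balance the two without seeing future tasks.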