
Showing papers in "SIAM Journal on Computing in 2005"


Journal ArticleDOI
TL;DR: It is shown that any set of relations used to specify the allowed forms of constraints can be associated with a finite universal algebra, and it is explored how the computational complexity of the corresponding constraint satisfaction problem is connected to the properties of this algebra.
Abstract: Many natural combinatorial problems can be expressed as constraint satisfaction problems. This class of problems is known to be NP-complete in general, but certain restrictions on the form of the constraints can ensure tractability. Here we show that any set of relations used to specify the allowed forms of constraints can be associated with a finite universal algebra and we explore how the computational complexity of the corresponding constraint satisfaction problem is connected to the properties of this algebra. Hence, we completely translate the problem of classifying the complexity of restricted constraint satisfaction problems into the language of universal algebra. We introduce a notion of "tractable algebra," and investigate how the tractability of an algebra relates to the tractability of the smaller algebras which may be derived from it, including its subalgebras and homomorphic images. This allows us to reduce significantly the types of algebras which need to be classified. Using our results we also show that if the decision problem associated with a given collection of constraint types can be solved efficiently, then so can the corresponding search problem. We then classify all finite strictly simple surjective algebras with respect to tractability, obtaining a dichotomy theorem which generalizes Schaefer's dichotomy for the generalized satisfiability problem. Finally, we suggest a possible general algebraic criterion for distinguishing the tractable and intractable cases of the constraint satisfaction problem.

674 citations
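The algebra attached to a constraint language is built from its polymorphisms: operations that, applied coordinatewise to tuples of each allowed relation, always yield another tuple of that relation. As a small illustration (the brute-force check and all names here are ours, not the paper's), the sketch below tests whether an operation is a polymorphism of a relation; the disequality relation underlying graph 3-coloring has no majority polymorphism, in line with its intractability.

```python
from itertools import product

def is_polymorphism(op, arity, relation):
    """Check whether `op` (an `arity`-ary operation on the domain) preserves
    `relation`: applying `op` coordinatewise to any `arity` tuples of the
    relation must yield a tuple that is again in the relation."""
    rel = set(relation)
    width = len(relation[0])
    for rows in product(rel, repeat=arity):
        image = tuple(op(*(row[i] for row in rows)) for i in range(width))
        if image not in rel:
            return False
    return True

# The disequality relation over {0,1,2} encodes an edge in graph 3-coloring.
neq = [(a, b) for a in range(3) for b in range(3) if a != b]

def majority(x, y, z):
    """Return the repeated value if one exists, else (arbitrarily) x."""
    if x == y or x == z:
        return x
    return y if y == z else x

print(is_polymorphism(majority, 3, neq))  # False: no majority polymorphism
```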


Journal ArticleDOI
TL;DR: The result presents for the first time an efficient index whose size is provably linear in the size of the text in the worst case, and for many scenarios, the space is actually sublinear in practice.
Abstract: The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text $T$ consisting of $n$ symbols drawn from a fixed alphabet $\Sigma$. The text $T$ can be represented in $n \lg |\Sigma|$ bits by encoding each symbol with $\lg |\Sigma|$ bits. The goal is to support fast online queries for searching any string pattern $P$ of $m$ symbols, with $T$ being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require $\Omega(n \lg n)$ additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need $\Omega(n)$ memory words, each of $\Omega(\lg n)$ bits. These indexes are larger than the text itself by a multiplicative factor of $\Omega(\lg_{|\Sigma|} n)$, which is significant when $\Sigma$ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in $O(m \lg |\Sigma|)$ time or in $O(m + \lg n)$ time, plus an output-sensitive cost $O(\mathit{occ})$ for listing the $\mathit{occ}$ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast $O(m / \lg_{|\Sigma|} n + \lg_{|\Sigma|}^\epsilon n)$ search time in the worst case, for any constant $0 < \epsilon \leq 1$, using at most $(\epsilon^{-1} + O(1)) \, n \lg |\Sigma|$ bits of storage. Our result thus presents for the first time an efficient index whose size is provably linear in the size of the text in the worst case, and for many scenarios, the space is actually sublinear in practice. As a concrete example, the compressed suffix array for a typical 100 MB ASCII file can require 30--40 MB or less, while the raw suffix array requires 500 MB. Our theoretical bounds improve both time and space of previous indexing schemes. Listing the pattern occurrences introduces a sublogarithmic slowdown factor in the output-sensitive cost, giving $O(\mathit{occ} \, \lg_{|\Sigma|}^\epsilon n)$ time as a result. When the patterns are sufficiently long, we can use auxiliary data structures in $O(n \lg |\Sigma|)$ bits to obtain a total search bound of $O(m / \lg_{|\Sigma|} n + \mathit{occ})$ time, which is optimal.

559 citations
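For context on what is being compressed: a plain suffix array already supports the queries described above, but at an $\Omega(n \lg n)$-bit cost. A minimal sketch of that baseline (naive construction, binary search over suffixes; the `key` argument to `bisect` needs Python 3.10+, and none of this reflects the paper's compressed representation):

```python
import bisect

def build_suffix_array(text):
    # Naive O(n^2 log n) construction; the paper's index stores an
    # equivalent structure in roughly n lg |Sigma| bits instead.
    return sorted(range(len(text)), key=lambda i: text[i:])

def search(text, sa, pattern):
    # Binary search for the range of suffixes starting with `pattern`,
    # O(m log n) character comparisons plus O(occ) to report matches.
    m = len(pattern)
    lo = bisect.bisect_left(sa, pattern, key=lambda i: text[i:i + m])
    hi = bisect.bisect_right(sa, pattern, key=lambda i: text[i:i + m])
    return sorted(sa[lo:hi])  # starting positions of the occ occurrences

text = "mississippi"
sa = build_suffix_array(text)
print(search(text, sa, "ssi"))  # [2, 5]
```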


Journal ArticleDOI
TL;DR: A polynomial time approximation scheme (PTAS) for MKP, which appears to be the strongest special case of GAP that is not APX-hard, and a PTAS-preserving reduction from an arbitrary instance of MKP to an instance with $O(\log n)$ distinct sizes and profits.
Abstract: The multiple knapsack problem (MKP) is a natural and well-known generalization of the single knapsack problem and is defined as follows. We are given a set of $n$ items and $m$ bins (knapsacks) such that each item $i$ has a profit $p(i)$ and a size $s(i)$, and each bin $j$ has a capacity $c(j)$. The goal is to find a subset of items of maximum profit such that they have a feasible packing in the bins. MKP is a special case of the generalized assignment problem (GAP) where the profit and the size of an item can vary based on the specific bin that it is assigned to. GAP is APX-hard, and a 2-approximation for it is implicit in the work of Shmoys and Tardos [Math. Program. A, 62 (1993), pp. 461-474]; thus far, this was also the best known approximation for MKP. The main result of this paper is a polynomial time approximation scheme (PTAS) for MKP. Apart from its inherent theoretical interest as a common generalization of the well-studied knapsack and bin packing problems, it appears to be the strongest special case of GAP that is not APX-hard. We substantiate this by showing that slight generalizations of MKP are APX-hard. Thus our results help demarcate the boundary at which instances of GAP become APX-hard. An interesting aspect of our approach is a PTAS-preserving reduction from an arbitrary instance of MKP to an instance with $O(\log n)$ distinct sizes and profits.

333 citations
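To make the problem statement concrete, here is a tiny exponential-time reference solver (our illustration only; the paper's contribution is a PTAS, not this enumeration), which tries every assignment of items to bins or to "unpacked":

```python
from itertools import product

def mkp_brute_force(profits, sizes, capacities):
    """Exact solver for tiny MKP instances: (m+1)^n assignments."""
    n, m = len(profits), len(capacities)
    best = 0
    for assign in product(range(-1, m), repeat=n):  # -1 means "not packed"
        load = [0] * m
        profit, feasible = 0, True
        for item, b in enumerate(assign):
            if b >= 0:
                load[b] += sizes[item]
                profit += profits[item]
                if load[b] > capacities[b]:
                    feasible = False
                    break
        if feasible:
            best = max(best, profit)
    return best

# Two bins of capacity 10: packing {6,4} and {5,3} yields total profit 18.
print(mkp_brute_force([6, 5, 4, 3], [6, 5, 4, 3], [10, 10]))  # 18
```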


Journal ArticleDOI
TL;DR: A quantum algorithm for the dihedral hidden subgroup problem (DHSP) with time and query complexity $2^{O(\sqrt{\log N})}$.
Abstract: We present a quantum algorithm for the dihedral hidden subgroup problem (DHSP) with time and query complexity $2^{O(\sqrt{\log N})}$. In this problem an oracle computes a function $f$ on the dihedral group $D_N$ which is invariant under a hidden reflection in $D_N$. By contrast, the classical query complexity of DHSP is $O(\sqrt{N})$. The algorithm also applies to the hidden shift problem for an arbitrary finitely generated abelian group. The algorithm begins as usual with a quantum character transform, which in the case of $D_N$ is essentially the abelian quantum Fourier transform. This yields the name of a group representation of $D_N$, which is not by itself useful, and a state in the representation, which is a valuable but indecipherable qubit. The algorithm proceeds by repeatedly pairing two unfavorable qubits to make a new qubit in a more favorable representation of $D_N$. Once the algorithm obtains certain target representations, direct measurements reveal the hidden subgroup.

277 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the complexity of tile self-assembly under various generalizations of the tile self-assembly model and provided a lower bound of $\Omega(N^{1/k}/k)$ on the tile complexity of thin rectangles for the standard model.
Abstract: In this paper, we study the complexity of self-assembly under models that are natural generalizations of the tile self-assembly model. In particular, we extend Rothemund and Winfree's study of the tile complexity of tile self-assembly [Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, Portland, OR, 2000, pp. 459--468]. They provided a lower bound of $\Omega(\frac{\log N}{\log\log N})$ on the tile complexity of assembling an $N\times N$ square for almost all N. Adleman et al. [Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, Heraklion, Greece, 2001, pp. 740--748] gave a construction which achieves this bound. We consider whether the tile complexity for self-assembly can be reduced through several natural generalizations of the model. One of our results is a tile set of size $O(\sqrt{\log N})$ which assembles an $N\times N$ square in a model which allows flexible glue strength between nonequal glues. This result is matched for almost all N by a lower bound dictated by Kolmogorov complexity. For three other generalizations, we show that the $\Omega(\frac{\log N}{\log\log N})$ lower bound applies to $N\times N$ squares. At the same time, we demonstrate that there are some other shapes for which these generalizations allow reduced tile sets. Specifically, for thin rectangles with length N and width k, we provide a tighter lower bound of $\Omega(\frac{N^{1/k}}{k})$ for the standard model, yet we also give a construction which achieves $O(\frac{\log N}{\log\log N})$ complexity in a model in which the temperature of the tile system is adjusted during assembly. We also investigate the problem of verifying whether a given tile system uniquely assembles into a given shape; we show that this problem is NP-hard for three of the generalized models.

225 citations


Journal ArticleDOI
TL;DR: This paper describes an extension of the classical Perceptron algorithm, called second-order perceptron, and analyzes its performance within the mistake bound model of on-line learning.
Abstract: Kernel-based linear-threshold algorithms, such as support vector machines and Perceptron-like algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called second-order Perceptron, and analyze its performance within the mistake bound model of on-line learning. The bound achieved by our algorithm depends on the sensitivity to second-order data information and is the best known mistake bound for (efficient) kernel-based linear-threshold classifiers to date. This mistake bound, which strictly generalizes the well-known Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the second-order Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.

221 citations
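A minimal sketch of the second-order Perceptron in a standard primal form (maintain the correlation matrix of the examples on which mistakes were made, and whiten the current instance before taking the sign); the parameter `a` below plays the role of the sensitivity parameter discussed above, while the variable names and toy data are ours:

```python
import numpy as np

def second_order_perceptron(X, y, a=1.0):
    """Run one pass over (X, y) and return the number of mistakes.
    `a` controls sensitivity to the eigenvalue spread of the data
    correlation matrix; the paper also analyzes adaptive and
    parameterless (pseudoinverse-based) variants."""
    d = X.shape[1]
    S = np.zeros((d, d))  # correlation matrix of past mistake examples
    w = np.zeros(d)       # sum of y_i * x_i over past mistakes
    mistakes = 0
    for x, label in zip(X, y):
        A = a * np.eye(d) + S + np.outer(x, x)  # include current example
        margin = w @ np.linalg.solve(A, x)      # "whitened" inner product
        pred = 1.0 if margin >= 0 else -1.0
        if pred != label:
            S += np.outer(x, x)
            w += label * x
            mistakes += 1
    return mistakes

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]))
print(second_order_perceptron(X, y))  # few mistakes on separable data
```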


Journal ArticleDOI
TL;DR: This paper proves the correctness of the gravitational algorithm in the fully asynchronous model and analyzes its convergence rate and establishes its convergence in the presence of crash faults.
Abstract: This paper considers the convergence problem in autonomous mobile robot systems. A natural algorithm for the problem requires the robots to move towards their center of gravity. This paper proves the correctness of the gravitational algorithm in the fully asynchronous model. It also analyzes its convergence rate and establishes its convergence in the presence of crash faults.

217 citations
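For intuition, a toy synchronous simulation of the center-of-gravity rule (the paper's analysis covers the much harder fully asynchronous model, in which robots observe stale configurations and move at uncoordinated times; the step fraction here is our simplification):

```python
import random

def gravity_step(points, fraction=0.5):
    """One synchronous round: every robot moves part of the way toward
    the center of gravity of all robots.  Under synchronous moves the
    center itself stays fixed, so the robots contract toward it."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x + fraction * (cx - x), y + fraction * (cy - y))
            for x, y in points]

robots = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(5)]
for _ in range(20):
    robots = gravity_step(robots)
print(robots)  # all five points are now clustered near the initial center
```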


Journal ArticleDOI
TL;DR: These are the first known PTASs for $\mathcal{NP}$-hard optimization problems on disk graphs; they are based on a novel recursive subdivision of the plane that allows applying a shifting strategy on different levels simultaneously, so that a dynamic programming approach becomes feasible.
Abstract: A disk graph is the intersection graph of a set of disks with arbitrary diameters in the plane. For the case that the disk representation is given, we present polynomial-time approximation schemes (PTASs) for the maximum weight independent set problem (selecting disjoint disks of maximum total weight) and for the minimum weight vertex cover problem in disk graphs. These are the first known PTASs for $\mathcal{NP}$-hard optimization problems on disk graphs. They are based on a novel recursive subdivision of the plane that allows applying a shifting strategy on different levels simultaneously, so that a dynamic programming approach becomes feasible. The PTASs for disk graphs represent a common generalization of previous results for planar graphs and unit disk graphs. They can be extended to intersection graphs of other "disk-like" geometric objects (such as squares or regular polygons), also in higher dimensions.

202 citations


Journal ArticleDOI
TL;DR: Improved combinatorial approximation algorithms are presented for the uncapacitated facility location problem; a variant of the capacitated facility location problem is also considered, with improved approximation algorithms presented for it.
Abstract: We present improved combinatorial approximation algorithms for the uncapacitated facility location problem. Two central ideas in most of our results are cost scaling and greedy improvement. We present a simple greedy local search algorithm which achieves an approximation ratio of $2.414+\epsilon$ in $\tilde{O}(n^2/\epsilon)$ time. This also yields a bicriteria approximation tradeoff of $(1+\gamma,1+2/\gamma)$ for facility cost versus service cost which is better than previously known tradeoffs and close to the best possible. Combining greedy improvement and cost scaling with a recent primal-dual algorithm for facility location due to Jain and Vazirani, we get an approximation ratio of $1.853$ in $\tilde{O}(n^3)$ time. This is very close to the approximation guarantee of the best known algorithm which is linear programming (LP)-based. Further, combined with the best known LP-based algorithm for facility location, we get a very slight improvement in the approximation factor for facility location, achieving $1.728$. We also consider a variant of the capacitated facility location problem and present improved approximation algorithms for this.

202 citations
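As a hedged sketch of the local-search ingredient on its own (the paper combines such greedy improvement with cost scaling and a primal-dual starting solution to reach the ratios above), the following toggles single facilities open or closed while the total cost keeps dropping; the instance data and names are illustrative:

```python
def local_search_facility_location(open_costs, dist):
    """dist[f][c] = cost of serving client c from facility f.
    Repeatedly apply the single-facility toggle that most decreases
    total cost = opening costs + service costs; stop at a local optimum."""
    n_f, n_c = len(open_costs), len(dist[0])

    def total_cost(open_set):
        if not open_set:
            return float("inf")
        service = sum(min(dist[f][c] for f in open_set) for c in range(n_c))
        return sum(open_costs[f] for f in open_set) + service

    open_set = set()
    while True:
        current = total_cost(open_set)
        best_set, best_cost = None, current
        for f in range(n_f):
            candidate = open_set ^ {f}  # toggle facility f
            cost = total_cost(candidate)
            if cost < best_cost:
                best_set, best_cost = candidate, cost
        if best_set is None:
            return open_set, current
        open_set = best_set

open_costs = [3, 3, 10]
dist = [[1, 4, 7], [6, 1, 2], [5, 5, 1]]
print(local_search_facility_location(open_costs, dist))  # ({0, 1}, 10)
```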


Journal ArticleDOI
TL;DR: In this article, it was shown that the clique-width of a graph G with treewidth k is at most $3 \cdot 2^{k-1}$ and, more importantly, that there is an exponential lower bound on this relationship.
Abstract: Treewidth is generally regarded as one of the most useful parameterizations of a graph's construction. Clique-width is a similar parameterization that shares one of the powerful properties of treewidth, namely: if a graph is of bounded treewidth (or clique-width), then there is a polynomial time algorithm for any graph problem expressible in monadic second order logic, using quantifiers on vertices (in the case of clique-width you must assume a clique-width parse expression is given). In studying the relationship between treewidth and clique-width, Courcelle and Olariu [Discrete Appl. Math., 101 (2000), pp. 77--114] showed that any graph of bounded treewidth is also of bounded clique-width; in particular, for any graph G with treewidth k, the clique-width of G is at most $4 \cdot 2^{k-1} + 1$. In this paper, we improve this result by showing that the clique-width of G is at most $3 \cdot 2^{k-1}$ and, more importantly, that there is an exponential lower bound on this relationship. In particular, for any k, there is a graph G with treewidth equal to k, where the clique-width of G is at least $2^{\lfloor k/2\rfloor - 1}$.

185 citations


Journal ArticleDOI
TL;DR: This work provides an optimal two-stage algorithm that uses a number of tests of the same order as the information-theoretic lower bound for the problem, as well as efficient algorithms for the case in which there is a Bernoulli probability distribution on the possible sets.
Abstract: Group testing refers to the situation in which one is given a set of objects ${\cal O}$, an unknown subset ${\cal P}\subseteq {\cal O}$, and the task of determining ${\cal P}$ by asking queries of the type "does ${\cal P}$ intersect ${\cal Q}$?," where ${\cal Q}$ is a subset of ${\cal O}$. Group testing is a basic search paradigm that occurs in a variety of situations such as quality control testing, searching in storage systems, multiple access communications, and data compression, among others. Group testing procedures have been recently applied in computational molecular biology, where they are used for screening libraries of clones with hybridization probes and sequencing by hybridization. Motivated by particular features of group testing algorithms used in biological screening, we study the efficiency of two-stage group testing procedures. Our main result is the first optimal two-stage algorithm that uses a number of tests of the same order as the information-theoretic lower bound on the problem. We also provide efficient algorithms for the case in which there is a Bernoulli probability distribution on the possible sets ${\cal P}$, and an optimal algorithm for the case in which the outcome of tests may be unreliable because of the presence of "inhibitory" items in ${\cal O}$. Our results depend on a combinatorial structure introduced in this paper. We believe that it will prove useful in other contexts, too.
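A hedged illustration of the two-stage paradigm, with random pools in place of the paper's optimal combinatorial structure: the first stage tests pools, and the second stage individually tests every object not exonerated by some negative pool.

```python
import random

def two_stage_group_testing(n, positives, pools_per_item=8, n_pools=40):
    """Return the recovered positive set and the total number of tests.
    `positives` plays the role of the unknown set P; pool membership is
    random here, whereas the paper designs it combinatorially."""
    pools = [set() for _ in range(n_pools)]
    for item in range(n):
        for p in random.sample(range(n_pools), pools_per_item):
            pools[p].add(item)

    # Stage 1: a pool tests positive iff it intersects P.
    results = [bool(pool & positives) for pool in pools]

    # Every item that occurs in some negative pool cannot be in P.
    candidates = set(range(n))
    for pool, positive in zip(pools, results):
        if not positive:
            candidates -= pool

    # Stage 2: test the surviving candidates individually.
    found = {c for c in candidates if c in positives}
    return found, n_pools + len(candidates)

found, tests = two_stage_group_testing(100, positives={3, 42})
print(found, tests)  # recovers {3, 42} with far fewer than 100 tests
```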

Journal ArticleDOI
TL;DR: It is proved that the integrality ratio of the metric relaxation is at least $c\sqrt{\lg k}$ for a positive c for infinitely many k; the results improve some of the results of Kleinberg and Tardos and further the understanding of how to use metric relaxations.
Abstract: In the 0-extension problem, we are given a weighted graph with some nodes marked as terminals and a semimetric on the set of terminals. Our goal is to assign the rest of the nodes to terminals so as to minimize the sum, over all edges, of the product of the edge's weight and the distance between the terminals to which its endpoints are assigned. This problem generalizes the multiway cut problem of Dahlhaus et al. [SIAM J. Comput., 23 (1994), pp. 864--894] and is closely related to the metric labeling problem introduced by Kleinberg and Tardos [Proceedings of the 40th IEEE Annual Symposium on Foundations of Computer Science, New York, 1999, pp. 14--23]. We present approximation algorithms for 0-Extension. In arbitrary graphs, we present an O(log k)-approximation algorithm, k being the number of terminals. We also give O(1)-approximation guarantees for weighted planar graphs. Our results are based on a natural metric relaxation of the problem previously considered by Karzanov [European J. Combin., 19 (1998), pp. 71--101]. It is similar in flavor to the linear programming relaxation of Garg, Vazirani, and Yannakakis [SIAM J. Comput., 25 (1996), pp. 235--251] for the multicut problem, and similar to relaxations for other graph partitioning problems. We prove that the integrality ratio of the metric relaxation is at least $c \sqrt{\lg k}$ for a positive c for infinitely many k. Our results improve some of the results of Kleinberg and Tardos, and they further our understanding on how to use metric relaxations.

Journal ArticleDOI
TL;DR: It is shown how estimates on the number of components in various subgraphs of G can be used to estimate the weight of its MST, and a nearly matching lower bound of $\Omega( dw \varepsilon^{-2} )$ is proved on the probe and time complexity of any approximation algorithm for MST weight.
Abstract: We present a probabilistic algorithm that, given a connected graph G (represented by adjacency lists) of average degree d, with edge weights in the set $\{1,\ldots,w\}$, and given a parameter $0<\varepsilon<1/2$, estimates in time $O(dw\varepsilon^{-2}\log\frac{dw}{\varepsilon})$ the weight of the minimum spanning tree (MST) of G with a relative error of at most $\varepsilon$. Note that the running time does not depend on the number of vertices in G. We also prove a nearly matching lower bound of $\Omega(dw\varepsilon^{-2})$ on the probe and time complexity of any approximation algorithm for MST weight. The essential component of our algorithm is a procedure for estimating in time $O(d\varepsilon^{-2}\log\frac{d}{\varepsilon})$ the number of connected components of an unweighted graph to within an additive error of $\varepsilon n$. (This becomes $O(\varepsilon^{-2}\log\frac{1}{\varepsilon})$ for $d=O(1)$.) The time bound is shown to be tight up to within the $\log\frac{d}{\varepsilon}$ factor. Our connected-components algorithm picks $O(1/\varepsilon^2)$ vertices in the graph and then grows "local spanning trees" whose sizes are specified by a stochastic process. From the local information collected in this way, the algorithm is able to infer, with high confidence, an estimate of the number of connected components. We then show how estimates on the number of components in various subgraphs of G can be used to estimate the weight of its MST.
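The component estimator rests on the identity that the number of connected components equals $\sum_v 1/|C(v)|$, where $C(v)$ is the component containing $v$. A simplified sketch of the sampling idea (ours: a fixed exploration cutoff instead of the paper's stochastic stopping process):

```python
import random
from collections import deque

def estimate_components(adj, eps=0.1, samples=500):
    """Estimate the number of connected components of the graph given by
    adjacency lists `adj`.  Each sampled vertex v contributes 1/|C(v)|,
    but components are explored only up to 2/eps vertices, so very large
    components are (harmlessly) counted as contributing about 0."""
    n = len(adj)
    cutoff = int(2 / eps)
    total = 0.0
    for _ in range(samples):
        v = random.randrange(n)
        seen, queue = {v}, deque([v])
        while queue and len(seen) <= cutoff:
            for u in adj[queue.popleft()]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        if len(seen) <= cutoff:        # component fully explored
            total += 1.0 / len(seen)   # v's share of its component
    return total / samples * n

# Three components: a 3-path, a triangle, and an isolated vertex.
adj = [[1], [0, 2], [1], [4, 5], [3, 5], [3, 4], []]
print(estimate_components(adj))  # close to 3
```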

Journal ArticleDOI
TL;DR: A distributed algorithm is presented which constructs a minimum-weight spanning tree in O(log log n) communication rounds, where in each round any process can send a message to every other process.
Abstract: We consider a simple model for overlay networks, where all n processes are connected to all other processes, and each message contains at most O(log n) bits. For this model, we present a distributed algorithm which constructs a minimum-weight spanning tree in O(log log n) communication rounds, where in each round any process can send a message to every other process. If message size is $\Theta(n^\epsilon)$ for some $\epsilon>0$, then the number of communication rounds is $O(\log{1\over\epsilon})$.

Journal ArticleDOI
TL;DR: A flow labeling scheme using $O(\log n\cdot\log {\hat{\omega}}+\log^2 n)$-bit labels is presented for general n-vertex graphs with maximum (integral) capacity ${\hat{\omega}}$ and is shown to be asymptotically optimal.
Abstract: This paper studies labeling schemes for flow and connectivity functions. A flow labeling scheme using $O(\log n\cdot\log {\hat{\omega}}+\log^2 n)$-bit labels is presented for general n-vertex graphs with maximum (integral) capacity ${\hat{\omega}}$. This is shown to be asymptotically optimal. For edge-connectivity, this yields a tight bound of $\Theta(\log^2 n)$ bits. A k-vertex connectivity labeling scheme is then given for general n-vertex graphs using at most 3 log n bits for k = 2, 5 log n bits for k = 3, and 2k log n bits for k > 3. Finally, a lower bound of $\Omega (k\log n)$ is established for k -vertex connectivity on n-vertex graphs, where k is polylogarithmic in n.

Journal ArticleDOI
TL;DR: In this article, a multilayered probabilistically checkable proof (PCP) construction that extends the Raz verifier is presented, which enables the authors to prove that the problem is NP-hard to approximate within a factor of $(k-1-\epsilon)$ for arbitrary constants $\epsilon > 0$ and $k \ge 3$.
Abstract: Given a k-uniform hypergraph, the Ek-Vertex-Cover problem is to find the smallest subset of vertices that intersects every hyperedge. We present a new multilayered probabilistically checkable proof (PCP) construction that extends the Raz verifier. This enables us to prove that Ek-Vertex-Cover is NP-hard to approximate within a factor of $(k-1-\epsilon)$ for arbitrary constants $\epsilon>0$ and $k\ge 3$. The result is nearly tight as this problem can be easily approximated within factor k. Our construction makes use of the biased long-code and is analyzed using combinatorial properties of s-wise t-intersecting families of subsets. We also give a different proof that shows an inapproximability factor of $\lfloor \frac{k}{2} \rfloor - \epsilon$. In addition to being simpler, this proof also works for superconstant values of k up to $(\log N)^{1/c}$, where $c > 1$ is a fixed constant and N is the number of hyperedges.

Journal ArticleDOI
TL;DR: It is shown that delayed simulation---unlike fair simulation---preserves the automaton language upon quotienting and allows substantially better state space reduction than direct simulation.
Abstract: We give efficient algorithms, improving optimal known bounds, for computing a variety of simulation relations on the state space of a Büchi automaton. Our algorithms are derived via a unified and simple parity-game framework. This framework incorporates previously studied notions like fair and direct simulation, but also a new natural notion of simulation called delayed simulation, which we introduce for the purpose of state space reduction. We show that delayed simulation---unlike fair simulation---preserves the automaton language upon quotienting and allows substantially better state space reduction than direct simulation. Using our parity-game approach, which relies on an algorithm by Jurdziński, we give efficient algorithms for computing all of the above simulations. In particular, we obtain an $O(mn^3)$-time and $O(mn)$-space algorithm for computing both the delayed and the fair simulation relations. The best prior algorithm for fair simulation requires time and space $O(n^6)$. Our framework also allows one to compute bisimulations: we compute the fair bisimulation relation in $O(mn^3)$ time and $O(mn)$ space, whereas the best prior algorithm for fair bisimulation requires time and space $O(n^{10})$.

Journal ArticleDOI
TL;DR: It is proved that the queue-number is bounded by the tree-width, thus resolving an open problem due to Ganley and Heath and disproving a conjecture of Pemmaraju.
Abstract: A queue layout of a graph consists of a total order of the vertices, and a partition of the edges into queues, such that no two edges in the same queue are nested. The minimum number of queues in a queue layout of a graph is its queue-number. A three-dimensional (straight-line grid) drawing of a graph represents the vertices by points in $\mathbb{Z}^3$ and the edges by noncrossing line-segments. This paper contributes three main results: (1) It is proved that the minimum volume of a certain type of three-dimensional drawing of a graph G is closely related to the queue-number of G. In particular, if G is an n-vertex member of a proper minor-closed family of graphs (such as a planar graph), then G has a $\mathcal{O}(1) \times \mathcal{O}(1) \times \mathcal{O}(n)$ drawing if and only if G has a $\mathcal{O}(1)$ queue-number. (2) It is proved that the queue-number is bounded by the tree-width, thus resolving an open problem due to Ganley and Heath [Discrete Appl. Math., 109 (2001), pp. 215--221] and disproving a conjecture of Pemmaraju [Exploring the Powers of Stacks and Queues via Graph Layouts, Ph. D. thesis, Virginia Polytechnic Institute and State University, Blacksburg, VA, 1992]. This result provides renewed hope for the positive resolution of a number of open problems in the theory of queue layouts. (3) It is proved that graphs of bounded tree-width have three-dimensional drawings with $\mathcal{O}(n)$ volume. This is the most general family of graphs known to admit three-dimensional drawings with $\mathcal{O}(n)$ volume. The proofs depend upon our results regarding track layouts and tree-partitions of graphs, which may be of independent interest.
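The no-nesting condition that defines a queue layout is easy to check directly; a small quadratic-time sketch (our illustration):

```python
def is_queue_layout(order, queues):
    """order: dict mapping each vertex to its position in the total order.
    queues: list of queues, each a list of edges (u, v).  Valid iff no two
    edges in the same queue nest, i.e. a1 < a2 and b2 < b1 never holds."""
    for queue in queues:
        spans = sorted((min(order[u], order[v]), max(order[u], order[v]))
                       for u, v in queue)
        for i in range(len(spans)):
            for j in range(i + 1, len(spans)):
                (a1, b1), (a2, b2) = spans[i], spans[j]
                if a1 < a2 and b2 < b1:  # edge j strictly inside edge i
                    return False
    return True

order = {v: i for i, v in enumerate("abcd")}
print(is_queue_layout(order, [[("a", "d"), ("b", "c")]]))    # False: nested
print(is_queue_layout(order, [[("a", "d")], [("b", "c")]]))  # True: 2 queues
```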

Journal ArticleDOI
TL;DR: Two dynamic search trees attaining near-optimal performance on any hierarchical memory are presented, matching the performance of the B-tree for $B = \Omega(\log N \log\log N)$.
Abstract: This paper presents two dynamic search trees attaining near-optimal performance on any hierarchical memory. The data structures are independent of the parameters of the memory hierarchy, e.g., the number of memory levels, the block-transfer size at each level, and the relative speeds of memory levels. The performance is analyzed in terms of the number of memory transfers between two memory levels with an arbitrary block-transfer size of B; this analysis can then be applied to every adjacent pair of levels in a multilevel memory hierarchy. Both search trees match the optimal search bound of $\Theta(1+\log_{B+1}N)$ memory transfers. This bound is also achieved by the classic B-tree data structure on a two-level memory hierarchy with a known block-transfer size B. The first search tree supports insertions and deletions in $\Theta(1+\log_{B+1}N)$ amortized memory transfers, which matches the B-tree's worst-case bounds. The second search tree supports scanning S consecutive elements optimally in $\Theta(1+S/B)$ memory transfers and supports insertions and deletions in $\Theta(1+\log_{B+1}N + \frac{\log^2N}{B})$ amortized memory transfers, matching the performance of the B-tree for $B = \Omega(\log N \log\log N)$.
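A standard way to achieve such independence from the block-transfer size is the van Emde Boas layout: cut a perfect binary tree at half its height, lay out the top half recursively, then each bottom subtree recursively, so that for every block size $B$ some level of the recursion fits whole subtrees inside blocks and a root-to-leaf search costs $O(\log_B N)$ transfers. The sketch below (our illustration of the general technique, not the paper's two dynamic structures) computes the layout order by heap index:

```python
def veb_layout(height):
    """Return the van Emde Boas memory order of the nodes of a perfect
    binary tree with `height` levels, nodes named by heap index (root=1,
    children of i are 2i and 2i+1)."""
    def recurse(root, h):
        if h == 1:
            return [root]
        top_h = h // 2  # split the tree at (roughly) half its height
        order = recurse(root, top_h)
        # the 2^top_h bottom subtrees are rooted at root's descendants
        # exactly top_h levels below it
        for i in range(1 << top_h):
            order += recurse(root * (1 << top_h) + i, h - top_h)
        return order
    return recurse(1, height)

print(veb_layout(3))  # [1, 2, 4, 5, 3, 6, 7]: each half-tree is contiguous
```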

Journal ArticleDOI
TL;DR: Sufficient conditions for linear properties to be hard to test are provided, and it is proved that there are 3CNF formulae (with O(n) clauses) such that testing for the associated property requires $\Omega(n)$ queries, even with adaptive tests.
Abstract: For a Boolean formula $\phi$ on n variables, the associated property $P_\phi$ is the collection of n-bit strings that satisfy $\phi$. We study the query complexity of tests that distinguish (with high probability) between strings in $P_\phi$ and strings that are far from $P_\phi$ in Hamming distance. We prove that there are 3CNF formulae (with O(n) clauses) such that testing for the associated property requires $\Omega(n)$ queries, even with adaptive tests. This contrasts with 2CNF formulae, whose associated properties are always testable with $O(\sqrt{n})$ queries [E. Fischer et al., Monotonicity testing over general poset domains, in Proceedings of the 34th Annual ACM Symposium on Theory of Computing, ACM, New York, 2002, pp. 474--483]. Notice that for every negative instance (i.e., an assignment that does not satisfy $\phi$) there are three bit queries that witness this fact. Nevertheless, finding such a short witness requires reading a constant fraction of the input, even when the input is very far from satisfying the formula that is associated with the property. A property is linear if its elements form a linear space. We provide sufficient conditions for linear properties to be hard to test, and in the course of the proof include the following observations which are of independent interest: In the context of testing for linear properties, adaptive two-sided error tests have no more power than nonadaptive one-sided error tests. Moreover, without loss of generality, any test for a linear property is a linear test. A linear test verifies that a portion of the input satisfies a set of linear constraints, which define the property, and rejects if and only if it finds a falsified constraint. A linear test is by definition nonadaptive and, when applied to linear properties, has a one-sided error. Random low density parity check codes (which are known to have linear distance and constant rate) are not locally testable. In fact, testing such a code of length n requires $\Omega(n)$ queries.

Journal ArticleDOI
TL;DR: It is shown that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2} \log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant.
Abstract: We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2} \log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega(n^{1/(2\gamma^2)})$. We next consider a combined oracle model in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for $\gamma$-multiplicative approximation to the entropy that runs in $O((\gamma^2 \log^2{n})/(h^2 (\gamma-1)^2))$ time for distributions with entropy $\Omega(h)$; for such distributions, we also show a lower bound of $\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))$. Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
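For contrast with the sublinear-sample algorithms above, the naive generation-oracle baseline is the plug-in estimate: the entropy of the empirical distribution of the samples. A sketch (ours; it needs many samples and carries no multiplicative guarantee):

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Entropy (in bits) of the empirical distribution of the samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

dist = [0.5, 0.25, 0.125, 0.125]  # true entropy = 1.75 bits
samples = random.choices(range(4), weights=dist, k=100_000)
print(plugin_entropy(samples))    # close to 1.75
```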

Journal ArticleDOI
TL;DR: It is shown that any black-box construction beating the authors' efficiency bound would yield the unconditional existence of a one-way function and thus, in particular, prove $P \neq NP$.
Abstract: A central focus of modern cryptography is the construction of efficient, high-level cryptographic tools (e.g., encryption schemes) from weaker, low-level cryptographic primitives (e.g., one-way functions). Of interest are both the existence of such constructions and their efficiency. Here, we show essentially tight lower bounds on the best possible efficiency of any black-box construction of some fundamental cryptographic tools from the most basic and widely used cryptographic primitives. Our results hold in an extension of the model introduced by Impagliazzo and Rudich and improve and extend earlier results of Kim, Simon, and Tetali. We focus on constructions of pseudorandom generators, universal one-way hash functions, and digital signatures based on one-way permutations, as well as constructions of public- and private-key encryption schemes based on trapdoor permutations. In each case, we show that any black-box construction beating our efficiency bound would yield the unconditional existence of a one-way function and thus, in particular, prove $P \neq NP$.

Journal ArticleDOI
TL;DR: In this article, the problem of job scheduling on a variable voltage processor with $d$ discrete voltage/speed levels is considered and an algorithm which constructs a minimum energy schedule for $n$ jobs in $O(d n\log n)$ time is given.
Abstract: We consider the problem of job scheduling on a variable voltage processor with $d$ discrete voltage/speed levels. We give an algorithm which constructs a minimum energy schedule for $n$ jobs in $O(d n\log n)$ time. Previous approaches solve this problem by first computing the optimal continuous solution in $O(n^3)$ time and then adjusting the speed to discrete levels. In our approach, the optimal discrete solution is characterized and computed directly from the inputs. We also show that $O(n\log n)$ time is required; hence the algorithm is optimal for fixed $d$.
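The two-phase approach mentioned above rounds a continuous-speed schedule to the discrete levels; as a worked example of that rounding step (our code, not the paper's direct characterization), any intermediate speed can be emulated by splitting the interval between the two adjacent levels so the same work finishes exactly at the deadline:

```python
def split_between_levels(work, deadline, levels):
    """Emulate the ideal continuous speed s = work/deadline using the two
    adjacent discrete speed levels.  Returns (t_lo, t_hi, lo, hi) with
    t_lo * lo + t_hi * hi == work and t_lo + t_hi == deadline."""
    s = work / deadline
    lo = max((l for l in levels if l <= s), default=None)
    hi = min((l for l in levels if l >= s), default=None)
    if lo is None or hi is None:
        raise ValueError("required speed is outside the available levels")
    if lo == hi:
        return deadline, 0.0, lo, hi
    t_hi = (work - lo * deadline) / (hi - lo)
    return deadline - t_hi, t_hi, lo, hi

# s = 2.5: run 2.0 time units at speed 2 and 2.0 at speed 3 (2*2 + 2*3 = 10).
print(split_between_levels(work=10.0, deadline=4.0, levels=[1, 2, 3, 4]))
```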

Journal ArticleDOI
TL;DR: A rigorous self-contained proof of results along the lines of Ajtai's seminal work is presented, and it is shown how the underlying reduction implies the existence of collision resistant cryptographic hash functions based on the worst-case inapproximability of the shortest vector problem within the same factors.
Abstract: Lattices have received considerable attention as a potential source of computational hardness to be used in cryptography, after a breakthrough result of Ajtai [in Proceedings of the 28th Annual ACM Symposium on Theory of Computing, Philadelphia, PA, 1996, pp. 99--108] connecting the average-case and worst-case complexity of various lattice problems. The purpose of this paper is twofold. On the expository side, we present a rigorous self-contained proof of results along the lines of Ajtai's seminal work. At the same time, we explore to what extent Ajtai's original results can be quantitatively improved. As a by-product, we define a random class of lattices such that computing short nonzero vectors in the class with nonnegligible probability is at least as hard as approximating the length of the shortest nonzero vector in any n-dimensional lattice within worst-case approximation factors $\gamma(n) = n^{3} \omega(\sqrt{\log n\log\log n})$. This improves previously known best connection factor $\gamma(n) = n^{4+\epsilon}$ [J.-Y. Cai and A. P. Nerurkar, in Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, Miami Beach, FL, 1997, pp. 468--477]. We also show how our reduction implies the existence of collision resistant cryptographic hash functions based on the worst-case inapproximability of the shortest vector problem within the same factors $\gamma(n) = n^{3} \omega(\sqrt{\log n\log\log n})$. In the process we distill various new lattice problems that might be of independent interest, related to the covering radius, the bounded distance decoding problem, approximate counting of lattice points inside convex bodies, and the efficient construction of lattices with good geometric and algorithmic decoding properties. We also show how further investigation of these new lattice problems might lead to even stronger connections between the average-case and worst-case complexity of the shortest vector problem, possibly leading to connection factors as low as $\gamma(n) = n^{1.5} \omega(\log n)$.

Journal ArticleDOI
TL;DR: The algorithm takes the hierarchy-based approach invented by Thorup and, if the ratio between the maximum and minimum edge lengths is bounded by $n^{(\log n)^{O(1)}}$, solves the single-source problem in $O(m + n \log\log n)$ time.
Abstract: We present a new scheme for computing shortest paths on real-weighted undirected graphs in the fundamental comparison-addition model. In an efficient preprocessing phase our algorithm creates a linear-size structure that facilitates single-source shortest path computations in $O(m \log \alpha)$ time, where $\alpha = \alpha(m,n)$ is the very slowly growing inverse-Ackermann function, m the number of edges, and n the number of vertices. As special cases our algorithm implies new bounds on both the all-pairs and single-source shortest paths problems. We solve the all-pairs problem in $O(mn \log \alpha(m,n))$ time and, if the ratio between the maximum and minimum edge lengths is bounded by $n^{(\log n)^{O(1)}}$, we can solve the single-source problem in $O(m + n \log\log n)$ time. Both these results are theoretical improvements over Dijkstra's algorithm, which was the previous best for real-weighted undirected graphs. Our algorithm takes the hierarchy-based approach invented by Thorup.

Journal ArticleDOI
TL;DR: The first constant-factor approximation algorithms are derived for parallel, identical machine scheduling problems in which the jobs are subject to precedence constraints and release dates and the processing times are governed by independent probability distributions.
Abstract: We consider parallel, identical machine scheduling problems, where the jobs are subject to precedence constraints and release dates, and where the processing times of jobs are governed by independent probability distributions. Our objective is to minimize the expected value of the total weighted completion time. Building upon a linear programming relaxation by Mohring, Schulz, and Uetz [J. ACM, 46 (1999), pp. 924--942] and a delayed list scheduling algorithm by Chekuri et al. [SIAM J. Comput., 31 (2001), pp. 146--166], we derive the first constant-factor approximation algorithms for this model.

Journal ArticleDOI
TL;DR: The first deterministic online algorithm that is better than 2-competitive is presented, and a modified greedy algorithm is developed, called semigreedy, and it is proved that it achieves a competitive ratio of $17/9 \approx 1.89$.
Abstract: We study a basic buffer management problem that arises in network switches. Consider $m$ input ports, each of which is equipped with a buffer (queue) of limited capacity. Data packets arrive online and can be stored in the buffers if space permits; otherwise packet loss occurs. In each time step the switch can transmit one packet from one of the buffers to the output port. The goal is to maximize the number of transmitted packets. Simple arguments show that any work-conserving algorithm, which serves any nonempty buffer, is 2-competitive. Azar and Richter recently presented a randomized online algorithm and gave lower bounds for deterministic and randomized strategies. In practice, greedy algorithms are very important because they are fast, use little extra memory, and reduce packet loss by always serving a longest queue. In this paper we first settle the competitive performance of the entire family of greedy strategies. We prove that greedy algorithms are not better than 2-competitive no matter how ties are broken. Our lower bound proof uses a new recursive construction for building adversarial buffer configurations that may be of independent interest. We also give improved lower bounds for deterministic and randomized online algorithms. In this paper we present the first deterministic online algorithm that is better than 2-competitive. We develop a modified greedy algorithm, called semigreedy, and prove that it achieves a competitive ratio of $17/9 \approx 1.89$. The new algorithm is simple, fast, and uses little extra memory. Only when the risk of packet loss is low does it not serve the longest queue. Additionally we study scenarios when an online algorithm is granted additional resources. We consider resource augmentation with respect to memory and speed; i.e., an online algorithm may be given larger buffers or higher transmission rates. We analyze greedy and other online strategies.
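A toy simulation of the greedy longest-queue policy analyzed above (unit-size packets, one transmission per step; the arrival pattern and parameters are ours):

```python
import random

def simulate_greedy(arrivals, buffer_size, m):
    """arrivals: per time step, a list of input ports receiving a packet.
    Packets that find their buffer full are lost; each step one packet is
    transmitted from a longest nonempty queue.  Returns the throughput."""
    queues = [0] * m  # only queue lengths matter for counting throughput
    sent = 0
    for step in arrivals:
        for port in step:
            if queues[port] < buffer_size:
                queues[port] += 1  # buffered; otherwise the packet is lost
        longest = max(range(m), key=lambda i: queues[i])
        if queues[longest] > 0:
            queues[longest] -= 1
            sent += 1
    return sent

random.seed(1)
arrivals = [[random.randrange(4) for _ in range(random.randrange(3))]
            for _ in range(100)]
print(simulate_greedy(arrivals, buffer_size=5, m=4))
```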

Journal ArticleDOI
TL;DR: It is shown how to maintain a data structure on trees which allows insertion of leaves and internal nodes, deletion of leaves and of internal nodes with only one child, and least common ancestor queries, all in worst-case constant time; the Dietz--Sleator "cup-filling" scheduling methodology is also generalized.
Abstract: We show how to maintain a data structure on trees which allows for the following operations, all in worst-case constant time: insertion of leaves and internal nodes, deletion of leaves, deletion of internal nodes with only one child, and determining the least common ancestor of any two nodes. We also generalize the Dietz--Sleator "cup-filling" scheduling methodology, which may be of independent interest.

Journal ArticleDOI
TL;DR: An algorithm is presented to compute, for any connected planar graph G, an orderly pair consisting of an embedded planar graph H isomorphic to G and an orderly spanning tree of H.
Abstract: We introduce and study orderly spanning trees of plane graphs. This algorithmic tool generalizes canonical orderings, which exist only for triconnected plane graphs. Although not every plane graph admits an orderly spanning tree, we provide an algorithm to compute an orderly pair for any connected planar graph G, consisting of an embedded planar graph H isomorphic to G, and an orderly spanning tree of H. We also present several applications of orderly spanning trees: (1) a new constructive proof for Schnyder's realizer theorem, (2) the first algorithm for computing an area-optimal 2-visibility drawing of a planar graph, and (3) the most compact known encoding of a planar graph with O(1)-time query support. All algorithms in this paper run in linear time.

Journal ArticleDOI
TL;DR: It is proved that this problem is decidable or undecidable for quantum finite automata depending on whether recognition is defined by strict or nonstrict thresholds, in contrast with probabilistic finite automata, for which both kinds of thresholds lead to undecidable problems.
Abstract: We study the following decision problem: is the language recognized by a quantum finite automaton empty or nonempty? We prove that this problem is decidable or undecidable depending on whether recognition is defined by strict or nonstrict thresholds. This result is in contrast with the corresponding situation for probabilistic finite automata, for which it is known that strict and nonstrict thresholds both lead to undecidable problems.